1
Xiong C, Zhang M, Yang H, Wei X, Zhao C, Zhang J. Modelling cell type-specific lncRNA regulatory network in autism with Cycle. BMC Bioinformatics 2024; 25:307. [PMID: 39333906 PMCID: PMC11430139 DOI: 10.1186/s12859-024-05933-0] [Received: 06/01/2024] [Accepted: 09/17/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND Autism spectrum disorder (ASD) is a class of complex neurodevelopmental disorders with high genetic heterogeneity. Long non-coding RNAs (lncRNAs) are vital regulators that perform specific functions within diverse cell types and play pivotal roles in neurological diseases including ASD. Therefore, exploring lncRNA regulation would contribute to deciphering ASD molecular mechanisms. Existing computational methods utilize bulk transcriptomics data to identify lncRNA regulation across all samples; this can reveal the commonalities of lncRNA regulation in ASD but ignores the specificity of lncRNA regulation across various cell types. RESULTS Here, we present Cycle (Cell type-specific lncRNA regulatory network) to construct the landscape of cell type-specific lncRNA regulation in ASD. We have found that each ASD cell type is unique in lncRNA regulation, and that more than one-third of the cell type-specific lncRNA regulatory networks are scale-free, while all of them are small-world. Across 17 ASD cell types, we have discovered 19 rewired and 11 stable modules, along with eight rewired and three stable hubs within the constructed cell type-specific lncRNA regulatory networks. Enrichment analysis reveals that the discovered rewired and stable modules and hubs are closely related to ASD. Furthermore, more similar ASD cell types tend to be connected with higher strength in the constructed cell similarity network. Finally, the comparison results demonstrate that Cycle is a potential method for uncovering cell type-specific lncRNA regulation. CONCLUSION Overall, these results illustrate that Cycle is a promising method to model the landscape of cell type-specific lncRNA regulation, and provides insights into understanding the heterogeneity of lncRNA regulation between various ASD cell types.
Affiliation(s)
- Chenchen Xiong
- School of Engineering, Dali University, Dali, Yunnan, China
- Beijing CapitalBio Pharma Technology Co.,Ltd., Beijing, China
- Haolin Yang
- School of Engineering, Dali University, Dali, Yunnan, China
- Xuemei Wei
- School of Engineering, Dali University, Dali, Yunnan, China
- Chunwen Zhao
- School of Engineering, Dali University, Dali, Yunnan, China
- Junpeng Zhang
- School of Engineering, Dali University, Dali, Yunnan, China
2
Shi K, Huang K, Li L, Liu Q, Zhang Y, Zheng H. Predicting microbe-disease association based on graph autoencoder and inductive matrix completion with multi-similarities fusion. Front Microbiol 2024; 15:1438942. [PMID: 39355422 PMCID: PMC11443509 DOI: 10.3389/fmicb.2024.1438942] [Received: 05/27/2024] [Accepted: 08/02/2024] [Indexed: 10/03/2024]
Abstract
Background Clinical studies have demonstrated that microbes play a crucial role in human health and disease. The identification of microbe-disease interactions can provide insights into pathogenesis and promote the diagnosis, treatment, and prevention of disease. Although many computational methods have been designed to screen novel microbe-disease associations, accurate and efficient methods are still lacking because of data inconsistency, underutilization of prior information, and limited model performance. Methods In this study, we proposed an improved deep learning-based framework, named GIMMDA, to identify latent microbe-disease associations, which is based on graph autoencoder and inductive matrix completion. By co-training the information from microbe and disease space, the new representations of microbes and diseases are used to reconstruct microbe-disease associations in an end-to-end framework. In particular, a similarity fusion strategy is used to improve prediction performance. Results The experimental results show that the performance of GIMMDA is competitive with that of existing state-of-the-art methods on 3 datasets (i.e., HMDAD, Disbiome, and multiMDA). In particular, it performs best, with an area under the receiver operating characteristic curve (AUC) of 0.9735, 0.9156, and 0.9396 on these 3 datasets, respectively. The results also confirm that different similarity fusions can improve the prediction performance. Furthermore, case studies on two diseases, i.e., asthma and obesity, validate the effectiveness and reliability of our proposed model. Conclusion The proposed GIMMDA model shows a strong capability in predicting microbe-disease associations. We expect that GIMMDA will help identify potential microbe-related diseases in the future.
Affiliation(s)
- Kai Shi
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent Systems, Guilin University of Technology, Guilin, China
- Kai Huang
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
- Lin Li
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
- Qiaohui Liu
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
- Yi Zhang
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
- Huilin Zheng
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
3
Mehryary F, Nastou K, Ohta T, Jensen LJ, Pyysalo S. STRING-ing together protein complexes: corpus and methods for extracting physical protein interactions from the biomedical literature. Bioinformatics 2024; 40:btae552. [PMID: 39276156 PMCID: PMC11441320 DOI: 10.1093/bioinformatics/btae552] [Received: 03/01/2024] [Revised: 07/01/2024] [Accepted: 09/12/2024] [Indexed: 09/16/2024]
Abstract
MOTIVATION Understanding biological processes relies heavily on curated knowledge of physical interactions between proteins. Yet, a notable gap remains between the information stored in databases of curated knowledge and the plethora of interactions documented in the scientific literature. RESULTS To bridge this gap, we introduce ComplexTome, a manually annotated corpus designed to facilitate the development of text-mining methods for the extraction of complex formation relationships among biomedical entities targeting the downstream semantics of the physical interaction subnetwork of the STRING database. This corpus comprises 1287 documents with ∼3500 relationships. We train a novel relation extraction model on this corpus and find that it can highly reliably identify physical protein interactions (F1-score = 82.8%). We additionally enhance the model's capabilities through unsupervised trigger word detection and apply it to extract relations and trigger words for these relations from all open publications in the domain literature. This information has been fully integrated into the latest version of the STRING database. AVAILABILITY AND IMPLEMENTATION We provide the corpus, code, and all results produced by the large-scale runs of our systems on biomedical literature via Zenodo https://doi.org/10.5281/zenodo.8139716, GitHub https://github.com/farmeh/ComplexTome_extraction, and the latest version of the STRING database https://string-db.org/.
Affiliation(s)
- Farrokh Mehryary
- TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
- Tomoko Ohta
- Textimi, 1-37-13 Kitazawa, Tokyo, Setagaya-ku 155-0031, Japan
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
- Sampo Pyysalo
- TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
4
Chu S, Duan G, Yan C. PGCNMDA: Learning node representations along paths with graph convolutional network for predicting miRNA-disease associations. Methods 2024; 229:71-81. [PMID: 38909974 DOI: 10.1016/j.ymeth.2024.06.007] [Received: 04/30/2024] [Revised: 05/26/2024] [Accepted: 06/16/2024] [Indexed: 06/25/2024]
Abstract
Identifying miRNA-disease associations (MDAs) is crucial for improving the diagnosis and treatment of various diseases. However, biological experiments can be time-consuming and expensive. To overcome these challenges, computational approaches have been developed, with Graph Convolutional Networks (GCNs) showing promising results in MDA prediction. The success of GCN-based methods relies on learning a meaningful spatial operator to extract effective node feature representations. To enhance the inference of MDAs, we propose a novel method called PGCNMDA, which employs graph convolutional networks with a learning graph spatial operator from paths. This approach enables the generation of meaningful spatial convolutions from paths in GCN, leading to improved prediction performance. On HMDD v2.0, PGCNMDA obtains a mean AUC of 0.9229 and an AUPRC of 0.9206 under 5-fold cross-validation (5-CV), and a mean AUC of 0.9235 and an AUPRC of 0.9212 under 10-fold cross-validation (10-CV). Additionally, the AUC of PGCNMDA reaches 0.9238 under global leave-one-out cross-validation (GLOOCV). On HMDD v3.2, PGCNMDA obtains a mean AUC of 0.9413 and an AUPRC of 0.9417 under 5-CV, and a mean AUC of 0.9419 and an AUPRC of 0.9425 under 10-CV. Furthermore, the AUC of PGCNMDA reaches 0.9415 under GLOOCV. The results show that PGCNMDA is superior to the other compared methods. In addition, the case studies on pancreatic neoplasms, thyroid neoplasms and leukemia show that 50, 50 and 48 of the top 50 predicted miRNAs linked to these diseases are confirmed, respectively. This further validates the effectiveness and feasibility of PGCNMDA in practical applications.
Affiliation(s)
- Shuang Chu
- School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China.
- Guihua Duan
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
- Cheng Yan
- School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China.
5
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
6
Valerio M, Inno A, Zambelli A, Cortesi L, Lorusso D, Viassolo V, Verzè M, Nicolis F, Gori S. Deep Neural Network Integrated into Network-Based Stratification (D3NS): A Method to Uncover Cancer Subtypes from Somatic Mutations. Cancers (Basel) 2024; 16:2845. [PMID: 39199616 PMCID: PMC11352240 DOI: 10.3390/cancers16162845] [Received: 07/19/2024] [Revised: 08/06/2024] [Accepted: 08/12/2024] [Indexed: 09/01/2024]
Abstract
(1) Background: The identification of tumor subtypes is fundamental in precision medicine for accurate diagnoses and personalized therapies. Cancer development is often driven by the accumulation of somatic mutations that can cause alterations in tissue functions and morphologies. In this work, a method based on a deep neural network integrated into a network-based stratification framework (D3NS) is proposed to stratify tumors according to somatic mutations. (2) Methods: This approach leverages the power of deep neural networks to detect hidden information in the data by combining the knowledge contained in a network of gene interactions, as typical of network-based stratification methods. D3NS was applied using real-world data from The Cancer Genome Atlas for bladder, ovarian, and kidney cancers. (3) Results: This technique allows for the identification of tumor subtypes characterized by different survival rates and significant associations with several clinical outcomes (tumor stage, grade or response to therapy). (4) Conclusion: D3NS can provide a base model in cancer research and could be considered as a useful tool for tumor stratification, offering potential support in clinical settings.
Affiliation(s)
- Matteo Valerio
- Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
- Alessandro Inno
- Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
- Alberto Zambelli
- Medical Oncology Unit, IRCCS Istituto Clinico Humanitas and Department of Biomedical Sciences, Humanitas University, 20089 Rozzano, Milan, Italy
- Laura Cortesi
- Oncology, Hematology, and Respiratory Diseases, Azienda Ospedaliera-Universitaria, Policlinico di Modena, 41124 Modena, Italy
- Domenica Lorusso
- Gynecologic Oncology Unit, Humanitas San Pio X, Milan and Humanitas University, Pieve Emanuele, 20090 Milan, Italy
- Valeria Viassolo
- Medical Genetics, Medical Direction, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
- Matteo Verzè
- Medical Direction, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
- Fabrizio Nicolis
- Medical Direction, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
- Stefania Gori
- Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
7
Shor B, Schneidman-Duhovny D. Integrative modeling meets deep learning: Recent advances in modeling protein assemblies. Curr Opin Struct Biol 2024; 87:102841. [PMID: 38795564 DOI: 10.1016/j.sbi.2024.102841] [Received: 02/29/2024] [Revised: 04/24/2024] [Accepted: 04/27/2024] [Indexed: 05/28/2024]
Abstract
Recent progress in protein structure prediction based on deep learning has revolutionized the field of structural biology. Beyond single proteins, it has also enabled high-throughput prediction of structures of protein-protein interactions. Despite the success in predicting complex structures, large macromolecular assemblies still require specialized approaches. Here we describe recent advances in modeling macromolecular assemblies using integrative and hierarchical approaches. We highlight applications that predict protein-protein interactions and challenges in modeling complexes based on the interaction networks, including the prediction of complex stoichiometry and heterogeneity.
Affiliation(s)
- Ben Shor
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel. https://twitter.com/ben_shor
- Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
8
Sullivan KA, Miller JI, Townsend A, Morgan M, Lane M, Pavicic M, Shah M, Cashman M, Jacobson DA. MENTOR: Multiplex Embedding of Networks for Team-Based Omics Research. bioRxiv 2024:2024.07.17.603821. [PMID: 39091782 PMCID: PMC11291001 DOI: 10.1101/2024.07.17.603821] [Indexed: 08/04/2024]
Abstract
While the proliferation of data-driven omics technologies has continued to accelerate, methods of identifying relationships among large-scale changes from omics experiments have stagnated. It is therefore imperative to develop methods that can identify key mechanisms among one or more omics experiments in order to advance biological discovery. To solve this problem, here we describe the network-based algorithm MENTOR - Multiplex Embedding of Networks for Team-Based Omics Research. We demonstrate MENTOR's utility as a supervised learning approach to successfully partition a gene set containing multiple ontological functions into their respective functions. Subsequently, we used MENTOR as an unsupervised learning approach to identify important biological functions pertaining to the host genetic architectures in Populus trichocarpa associated with microbial abundance of multiple taxa. Moreover, as open source software designed with scientific teams in mind, we demonstrate the ability to use the output of MENTOR to facilitate distributed interpretation of omics experiments.
9
Zhao Y, Xiang J, Shi X, Jia P, Zhang Y, Li M. MDDOmics: multi-omics resource of major depressive disorder. Database (Oxford) 2024; 2024:baae042. [PMID: 38917209 PMCID: PMC11197964 DOI: 10.1093/database/baae042] [Received: 12/22/2023] [Revised: 03/02/2024] [Accepted: 05/29/2024] [Indexed: 06/27/2024]
Abstract
Major depressive disorder (MDD) is a pressing global health issue. Its pathogenesis remains elusive, but numerous studies have revealed its intricate associations with various biological factors. Consequently, there is an urgent need for a comprehensive multi-omics resource to help researchers conduct multi-omics data analysis for MDD. To address this issue, we constructed the MDDOmics database (Major Depressive Disorder Omics; https://www.csuligroup.com/MDDOmics/), which integrates an extensive collection of published multi-omics data related to MDD. The database contains 41 222 entries of MDD research results and several original datasets, including Single Nucleotide Polymorphisms, genes, non-coding RNAs, DNA methylations, metabolites and proteins, and offers various interfaces for searching and visualization. We also provide extensive downstream analyses of the collected MDD data, including differential analysis, enrichment analysis and disease-gene prediction. The database also incorporates multi-omics data for bipolar disorder, schizophrenia and anxiety disorder, because of the challenge in differentiating MDD from similar psychiatric disorders. In conclusion, by leveraging the rich content and online interfaces of MDDOmics, researchers can conduct more comprehensive analyses of MDD and similar disorders from various perspectives, thereby gaining a deeper understanding of potential MDD biomarkers and intricate disease pathogenesis. Database URL: https://www.csuligroup.com/MDDOmics/.
Affiliation(s)
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
- Ju Xiang
- School of Computer and Communication Engineering, Changsha University of Science and Technology, No.45 Chiling Road, Changsha 410114, China
- Xingyuan Shi
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
- Pengzhen Jia
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
- Yan Zhang
- Department of Psychiatry, and National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, No.139 Renmin Road Central, Changsha 410011, China
- Min Li
- School of Computer Science and Engineering, Central South University, No.932 South Lushan Road, Changsha 410083, China
10
Stincone P, Naimi A, Saviola AJ, Reher R, Petras D. Decoding the molecular interplay in the central dogma: An overview of mass spectrometry-based methods to investigate protein-metabolite interactions. Proteomics 2024; 24:e2200533. [PMID: 37929699 DOI: 10.1002/pmic.202200533] [Received: 07/07/2023] [Revised: 10/15/2023] [Accepted: 10/23/2023] [Indexed: 11/07/2023]
Abstract
With the emergence of next-generation nucleotide sequencing and mass spectrometry-based proteomics and metabolomics tools, we have comprehensive and scalable methods to analyze the genes, transcripts, proteins, and metabolites of a multitude of biological systems. Despite the fascinating new molecular insights at the genome, transcriptome, proteome and metabolome scale, we are still far from fully understanding cellular organization, cell cycles and biology at the molecular level. Significant advances in sensitivity and depth for both sequencing and mass spectrometry-based methods allow analysis at the single-cell and single-molecule level. At the same time, new tools are emerging that enable the investigation of molecular interactions throughout the central dogma of molecular biology. In this review, we provide an overview of established and recently developed mass spectrometry-based tools to probe metabolite-protein interactions, from individual interaction pairs to interactions at the proteome-metabolome scale.
Affiliation(s)
- Paolo Stincone
- University of Tuebingen, CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Infection Medicine, Tuebingen, Germany
- University of Tuebingen, Center for Plant Molecular Biology, Tuebingen, Germany
- Amira Naimi
- University of Marburg, Institute of Pharmaceutical Biology and Biotechnology, Marburg, Germany
- Raphael Reher
- University of Marburg, Institute of Pharmaceutical Biology and Biotechnology, Marburg, Germany
- Daniel Petras
- University of Tuebingen, CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Infection Medicine, Tuebingen, Germany
- University of California Riverside, Department of Biochemistry, Riverside, USA
11
Cox RM, Papoulas O, Shril S, Lee C, Gardner T, Battenhouse AM, Lee M, Drew K, McWhite CD, Yang D, Leggere JC, Durand D, Hildebrandt F, Wallingford JB, Marcotte EM. Ancient eukaryotic protein interactions illuminate modern genetic traits and disorders. bioRxiv 2024:2024.05.26.595818. [PMID: 38853926 PMCID: PMC11160598 DOI: 10.1101/2024.05.26.595818] [Indexed: 06/11/2024]
Abstract
All eukaryotes share a common ancestor from roughly 1.5 to 1.8 billion years ago, a single-celled, swimming microbe known as LECA, the Last Eukaryotic Common Ancestor. Nearly half of the genes in modern eukaryotes were present in LECA, and many current genetic diseases and traits stem from these ancient molecular systems. To better understand these systems, we compared genes across modern organisms and identified a core set of 10,092 shared protein-coding gene families likely present in LECA, a quarter of which are uncharacterized. We then integrated >26,000 mass spectrometry proteomics analyses from 31 species to infer how these proteins interact in higher-order complexes. The resulting interactome describes the biochemical organization of LECA, revealing both known and new assemblies. We analyzed these ancient protein interactions to find new human gene-disease relationships for bone density and congenital birth defects, demonstrating the value of ancestral protein interactions for guiding functional genetics today.
Affiliation(s)
- Rachael M Cox
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Ophelia Papoulas
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Shirlee Shril
  - Division of Nephrology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02215, USA
- Chanjae Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Tynan Gardner
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Anna M Battenhouse
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Muyoung Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
- Claire D McWhite
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- David Yang
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Janelle C Leggere
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, 4400 5th Avenue Pittsburgh, PA 15213, USA
- Friedhelm Hildebrandt
  - Division of Nephrology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02215, USA
- John B Wallingford
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA

12
Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics 2024; 40:btae306. [PMID: 38715444 PMCID: PMC11256965 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024] Open
Abstract
MOTIVATION Exploring potential associations between diseases can help in understanding pathological mechanisms of diseases and facilitating the discovery of candidate biomarkers and drug targets, thereby promoting disease diagnosis and treatment. Some computational methods have been proposed for measuring disease similarity. However, these methods describe diseases without considering their latent multi-molecule regulation and valuable supervision signal, resulting in limited biological interpretability and efficiency to capture association patterns. RESULTS In this study, we propose a new computational method named DiSMVC. Different from existing predictors, DiSMVC designs a supervised graph collaborative framework to measure disease similarity. Multiple bio-entity associations related to genes and miRNAs are integrated via cross-view graph contrastive learning to extract informative disease representation, and then association pattern joint learning is implemented to compute disease similarity by incorporating phenotype-annotated disease associations. The experimental results show that DiSMVC can draw discriminative characteristics for disease pairs, and outperform other state-of-the-art methods. As a result, DiSMVC is a promising method for predicting disease associations with molecular interpretability. AVAILABILITY AND IMPLEMENTATION Datasets and source codes are available at https://github.com/Biohang/DiSMVC.
Affiliation(s)
- Hang Wei
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Shuai Wu
  - School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Yina Jiang
  - Department of Basic Medicine, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
- Bin Liu
  - Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China

13
Wang M, Yan X, Dong Y, Li X, Gao B. Machine learning and multi-omics data reveal driver gene-based molecular subtypes in hepatocellular carcinoma for precision treatment. PLoS Comput Biol 2024; 20:e1012113. [PMID: 38728362 PMCID: PMC11230636 DOI: 10.1371/journal.pcbi.1012113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 07/08/2024] [Accepted: 04/24/2024] [Indexed: 05/12/2024] Open
Abstract
The heterogeneity of Hepatocellular Carcinoma (HCC) poses a barrier to effective treatment. Stratifying highly heterogeneous HCC into molecular subtypes with similar features is crucial for personalized anti-tumor therapies. Although driver genes play pivotal roles in cancer progression, their potential in HCC subtyping has been largely overlooked. This study aims to utilize driver genes to construct HCC subtype models and unravel their molecular mechanisms. Utilizing a novel computational framework, we expanded the initially identified 96 driver genes to 1192 based on mutational aspects and an additional 233 considering driver dysregulation. These genes were subsequently employed as stratification markers for further analyses. A novel multi-omics subtype classification algorithm was developed, leveraging mutation and expression data of the identified stratification genes. This algorithm successfully categorized HCC into two distinct subtypes, CLASS A and CLASS B, demonstrating significant differences in survival outcomes. Integrating multi-omics and single-cell data unveiled substantial distinctions between these subtypes regarding transcriptomics, mutations, copy number variations, and epigenomics. Moreover, our prognostic model exhibited excellent predictive performance in training and external validation cohorts. Finally, a 10-gene classification model for these subtypes identified TTK as a promising therapeutic target with robust classification capabilities. This comprehensive study provides a novel perspective on HCC stratification, offering crucial insights for a deeper understanding of its pathogenesis and the development of promising treatment strategies.
Affiliation(s)
- Meng Wang
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Xinyue Yan
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Yanan Dong
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Xiaoqin Li
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Bin Gao
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China

14
Han J, Kang MJ, Lee S. DRSPRING: Graph convolutional network (GCN)-Based drug synergy prediction utilizing drug-induced gene expression profile. Comput Biol Med 2024; 174:108436. [PMID: 38643597 DOI: 10.1016/j.compbiomed.2024.108436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 04/01/2024] [Accepted: 04/07/2024] [Indexed: 04/23/2024]
Abstract
Great efforts have been made over the years to identify novel drug pairs with synergistic effects. Although numerous computational approaches have been proposed to analyze diverse types of biological big data, the pharmacogenomic profiles, presumably the most direct proxy of drug effects, have been rarely used due to the data sparsity problem. In this study, we developed a composite deep-learning-based model that predicts the drug synergy effect utilizing pharmacogenomic profiles as well as molecular properties. A graph convolutional network (GCN) was used to represent and integrate the chemical structure, genetic interactions, drug-target information, and gene expression profiles of cell lines. The insufficient amount of pharmacogenomic data, i.e., drug-induced expression profiles from the LINCS project, was addressed by augmenting the data with the predicted profiles. Our method learned and predicted the Loewe synergy score in the DrugComb database and achieved a better or comparable performance compared to other published methods in a benchmark test. We also investigated the contribution of various input features, which highlighted the value of basal gene expression and pharmacogenomic profiles of each cell line. Importantly, DRSPRING (DRug Synergy PRediction by INtegrated GCN) can be applied to any drug pairs and any cell lines, greatly expanding its applicability compared to previous methods.
Affiliation(s)
- Jiyeon Han
  - Department of Bio-Information Science, Ewha Womans University, Seoul, 03760, Republic of Korea
- Min Ji Kang
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea
- Sanghyuk Lee
  - Department of Bio-Information Science, Ewha Womans University, Seoul, 03760, Republic of Korea
  - Department of Life Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea

15
Wright SN, Colton S, Schaffer LV, Pillich RT, Churas C, Pratt D, Ideker T. State of the Interactomes: an evaluation of molecular networks for generating biological insights. bioRxiv 2024:2024.04.26.587073. [PMID: 38746239 PMCID: PMC11092493 DOI: 10.1101/2024.04.26.587073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP and SIGNOR demonstrate strong interaction prediction performance. These findings provide a benchmark for interactomes across diverse network biology applications and clarify factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.

16
He J, Li M, Qiu J, Pu X, Guo Y. HOPEXGB: A Consensual Model for Predicting miRNA/lncRNA-Disease Associations Using a Heterogeneous Disease-miRNA-lncRNA Information Network. J Chem Inf Model 2024; 64:2863-2877. [PMID: 37604142 DOI: 10.1021/acs.jcim.3c00856] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/23/2023]
Abstract
Predicting disease-related microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is crucial to find new biomarkers for the prevention, diagnosis, and treatment of complex human diseases. Computational predictions for miRNA/lncRNA-disease associations are of great practical significance, since traditional experimental detection is expensive and time-consuming. In this paper, we proposed a consensual machine-learning technique-based prediction approach to identify disease-related miRNAs and lncRNAs by high-order proximity preserved embedding (HOPE) and eXtreme Gradient Boosting (XGB), named HOPEXGB. By connecting lncRNA, miRNA, and disease nodes based on their correlations and relationships, we first created a heterogeneous disease-miRNA-lncRNA (DML) information network to achieve an effective fusion of information on similarities, correlations, and interactions among miRNAs, lncRNAs, and diseases. In addition, a more rational negative data set was generated based on the similarities of unknown associations with the known ones, so as to effectively reduce the false negative rate in the data set for model construction. By 10-fold cross-validation, HOPE shows better performance than other graph embedding methods. The final consensual HOPEXGB model yields robust performance with a mean prediction accuracy of 0.9569 and also demonstrates high sensitivity and specificity advantages compared to lncRNA/miRNA-specific predictions. Moreover, it is superior to other existing methods and gives promising performance on the external testing data, indicating that integrating the information on lncRNA-miRNA interactions and the similarities of lncRNAs/miRNAs is beneficial for improving the prediction performance of the model. Finally, case studies on lung, stomach, and breast cancers indicate that HOPEXGB could be a powerful tool for preclinical biomarker detection and bioexperiment preliminary screening for the diagnosis and prognosis of cancers. 
HOPEXGB is publicly available at https://github.com/airpamper/HOPEXGB.
Affiliation(s)
- Jian He
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiangguo Qiu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, China

17
Lapcik P, Stacey RG, Potesil D, Kulhanek P, Foster LJ, Bouchal P. Global Interactome Mapping Reveals Pro-tumorigenic Interactions of NF-κB in Breast Cancer. Mol Cell Proteomics 2024; 23:100744. [PMID: 38417630 PMCID: PMC10988130 DOI: 10.1016/j.mcpro.2024.100744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 02/01/2024] [Accepted: 02/23/2024] [Indexed: 03/01/2024] Open
Abstract
The NF-κB pathway is involved in inflammation; however, recent data also show its role in cancer development and progression, including metastasis. To understand the role of NF-κB interactome dynamics in cancer, we study the complexity of the breast cancer interactome in a luminal A breast cancer model and its rearrangement associated with NF-κB modulation. Liquid chromatography-mass spectrometry measurement of 160 size-exclusion chromatography fractions identifies 5460 protein groups. A total of 7568 interactions among these proteins have been reconstructed by the PrInCE algorithm, of which 2564 have been validated in independent datasets. NF-κB modulation leads to rearrangement of protein complexes involved in NF-κB signaling and immune response, cell cycle regulation, and DNA replication. The central NF-κB transcription regulator RELA co-elutes with interactors of the NF-κB activator PRMT5, and these complexes are confirmed by AlphaPulldown prediction. A complementary immunoprecipitation experiment recapitulates RELA interactions with other NF-κB factors, associating NF-κB inhibition with lower binding of NF-κB activators to RELA. This study describes a network of pro-tumorigenic protein interactions and their rearrangement upon NF-κB inhibition with potential therapeutic implications in tumors with high NF-κB activity.
Affiliation(s)
- Petr Lapcik
  - Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
- R Greg Stacey
  - Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- David Potesil
  - Proteomics Core Facility, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Petr Kulhanek
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Leonard J Foster
  - Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
  - Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Pavel Bouchal
  - Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic

18
Deritei D, Inuzuka H, Castaldi PJ, Yun JH, Xu Z, Anamika WJ, Asara JM, Guo F, Zhou X, Glass K, Wei W, Silverman EK. HHIP protein interactions in lung cells provide insight into COPD pathogenesis. bioRxiv 2024:2024.04.01.586839. [PMID: 38617310 PMCID: PMC11014494 DOI: 10.1101/2024.04.01.586839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide. The primary causes of COPD are environmental, including cigarette smoking; however, genetic susceptibility also contributes to COPD risk. Genome-Wide Association Studies (GWASes) have revealed more than 80 genetic loci associated with COPD, leading to the identification of multiple COPD GWAS genes. However, the biological relationships between the identified COPD susceptibility genes are largely unknown. Genes associated with a complex disease are often in close network proximity, i.e. their protein products often interact directly with each other and/or similar proteins. In this study, we use affinity purification mass spectrometry (AP-MS) to identify protein interactions with HHIP, a well-established COPD GWAS gene which is part of the sonic hedgehog pathway, in two disease-relevant lung cell lines (IMR90 and 16HBE). To better understand the network neighborhood of HHIP, its proximity to the protein products of other COPD GWAS genes, and its functional role in COPD pathogenesis, we create HUBRIS, a protein-protein interaction network compiled from 8 publicly available databases. We identified both common and cell type-specific protein-protein interactors of HHIP. We find that our newly identified interactions shorten the network distance between HHIP and the protein products of several COPD GWAS genes, including DSP, MFAP2, TET2, and FBLN5. These new shorter paths include proteins that are encoded by genes involved in extracellular matrix and tissue organization. We found and validated interactions to proteins that provide new insights into COPD pathobiology, including CAVIN1 (IMR90) and TP53 (16HBE). The newly discovered HHIP interactions with CAVIN1 and TP53 implicate HHIP in response to oxidative stress.

19
Pudjihartono M, Golovina E, Fadason T, O'Sullivan JM, Schierding W. Links between melanoma germline risk loci, driver genes and comorbidities: insight from a tissue-specific multi-omic analysis. Mol Oncol 2024; 18:1031-1048. [PMID: 38308491 PMCID: PMC10994230 DOI: 10.1002/1878-0261.13599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 11/15/2023] [Accepted: 01/22/2024] [Indexed: 02/04/2024] Open
Abstract
Genome-wide association studies (GWAS) have associated 76 loci with the risk of developing melanoma. However, understanding the molecular basis of such associations has remained a challenge because most of these loci are in non-coding regions of the genome. Here, we integrated data on epigenomic markers, three-dimensional (3D) genome organization, and expression quantitative trait loci (eQTL) from melanoma-relevant tissues and cell types to gain novel insights into the mechanisms underlying melanoma risk. This integrative approach revealed a total of 151 target genes, both near and far away from the risk loci in linear sequence, with known and novel roles in the etiology of melanoma. Using protein-protein interaction networks, we identified proteins that interact, directly or indirectly, with the products of the target genes. The interacting proteins were enriched for known melanoma driver genes. Further integration of these target genes into tissue-specific gene regulatory networks revealed patterns of gene regulation that connect melanoma to its comorbidities. Our study provides novel insights into the biological implications of genetic variants associated with melanoma risk.
Affiliation(s)
- Justin M. O'Sullivan
  - Liggins Institute, The University of Auckland, New Zealand
  - The Maurice Wilkins Centre, The University of Auckland, New Zealand
  - Australian Parkinson's Mission, Garvan Institute of Medical Research, Sydney, Australia
  - MRC Lifecourse Epidemiology Unit, University of Southampton, UK
  - Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore City, Singapore
- William Schierding
  - Liggins Institute, The University of Auckland, New Zealand
  - The Maurice Wilkins Centre, The University of Auckland, New Zealand

20
Chen Y, Zhang L. Hi-GeoMVP: a hierarchical geometry-enhanced deep learning model for drug response prediction. Bioinformatics 2024; 40:btae204. [PMID: 38614131 PMCID: PMC11060866 DOI: 10.1093/bioinformatics/btae204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/11/2024] [Accepted: 04/11/2024] [Indexed: 04/15/2024] Open
Abstract
MOTIVATION Personalized cancer treatments require accurate drug response predictions. Existing deep learning methods show promise but higher accuracy is needed to serve the purpose of precision medicine. The prediction accuracy can be improved by using not only topological but also geometrical information about drugs. RESULTS A novel deep learning methodology for drug response prediction is presented, named Hi-GeoMVP. It synthesizes hierarchical drug representation with multi-omics data, leveraging graph neural networks and variational autoencoders for detailed drug and cell line representations. Multi-task learning is employed to make better predictions, while both 2D and 3D molecular representations capture comprehensive drug information. Testing on the GDSC dataset confirms Hi-GeoMVP's enhanced performance, surpassing prior state-of-the-art methods by improving the Pearson correlation coefficient from 0.934 to 0.941 and decreasing the root mean square error from 0.969 to 0.931. In the blind tests, Hi-GeoMVP demonstrated robustness, outperforming the best previous models with a superior Pearson correlation coefficient in the drug-blind test. These results underscore Hi-GeoMVP's capabilities in drug response prediction, implying its potential for precision medicine. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/matcyr/Hi-GeoMVP.
Affiliation(s)
- Yurui Chen
  - Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
- Louxin Zhang
  - Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore

21
Niemira M, Bielska A, Chwialkowska K, Raczkowska J, Skwarska A, Erol A, Zeller A, Sokolowska G, Toczydlowski D, Sidorkiewicz I, Mariak Z, Reszec J, Lyson T, Moniuszko M, Kretowski A. Circulating serum miR-362-3p and miR-6721-5p as potential biomarkers for classification patients with adult-type diffuse glioma. Front Mol Biosci 2024; 11:1368372. [PMID: 38455766 PMCID: PMC10918470 DOI: 10.3389/fmolb.2024.1368372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 02/05/2024] [Indexed: 03/09/2024] Open
Abstract
According to the fifth edition of the WHO Classification of Tumours of the Central Nervous System (CNS) published in 2021, the grade 4 glioma classification includes IDH-mutant astrocytomas and wild-type IDH glioblastomas. Unfortunately, despite precision oncology development, the prognosis for patients with grade 4 glioma remains poor, indicating an urgent need for better diagnostic and therapeutic strategies. Circulating miRNAs, besides being important regulators of cancer development, could serve as promising diagnostic biomarkers for patients with grade 4 glioma. Here, we propose a two-miRNA serum screening signature (miR-362-3p and miR-6721-5p) for non-invasive classification of identified glioma cases into grade 4 and lower-grade gliomas. A total of 102 samples were included in this study, comprising 78 grade 4 glioma cases and 24 grade 2-3 glioma subjects. Using the NanoString platform, seven miRNAs were identified as differentially expressed (DE), which was subsequently confirmed via RT-qPCR analysis. Next, numerous combinations of DE miRNAs were employed to develop classification models. The dual panel of miR-362-3p and miR-6721-5p displayed the highest diagnostic value to differentiate grade 4 patients and lower grade cases with an AUC of 0.867. Additionally, this signature also had a high AUC = 0.854 in the verification cohorts by RT-qPCR and an AUC = 0.842 using external data from the GEO public database. The functional annotation analyses of predicted DE miRNA target genes showed their primary involvement in the STAT3 and HIF-1 signalling pathways and the signalling pathway of pluripotency of stem cells and glioblastoma-related pathways. For additional exploration of miRNA expression patterns correlated with glioma, we performed the Weighted Gene Co-expression Network Analysis (WGCNA). We showed that the modules most associated with glioma grade contained as many as six DE miRNAs.
In conclusion, this study presents the first evidence of serum miRNA expression profiling in adult-type diffuse glioma using a classification based on the WHO 2021 guidelines. We expect that the discovered dual miR-362-3p and miR-6721-5p signatures have the potential to be utilised for grading gliomas in clinical applications.
Affiliation(s)
- Magdalena Niemira
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Agnieszka Bielska
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Karolina Chwialkowska
  - Centre for Bioinformatics and Data Analysis, Medical University of Bialystok, Bialystok, Poland
- Justyna Raczkowska
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Anna Skwarska
  - Albert Einstein College of Medicine, Cancer Center, Bronx, NY, United States
- Anna Erol
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Anna Zeller
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Gabriela Sokolowska
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Damian Toczydlowski
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Iwona Sidorkiewicz
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Zenon Mariak
  - Department of Neurosurgery, Medical University of Bialystok, Bialystok, Poland
- Joanna Reszec
  - Department of Medical Pathology, Medical University of Bialystok, Bialystok, Poland
- Tomasz Lyson
  - Department of Neurosurgery, Medical University of Bialystok, Bialystok, Poland
- Marcin Moniuszko
  - Centre of Regenerative Medicine, Medical University of Bialystok, Bialystok, Poland
- Adam Kretowski
  - Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland

22
Chen J, Ikeda SI, Yang Y, Zhang Y, Ma Z, Liang Y, Negishi K, Tsubota K, Kurihara T. Scleral remodeling during myopia development in mice eyes: a potential role of thrombospondin-1. Mol Med 2024; 30:25. [PMID: 38355399 PMCID: PMC10865574 DOI: 10.1186/s10020-024-00795-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 01/30/2024] [Indexed: 02/16/2024] Open
Abstract
BACKGROUND Scleral extracellular matrix (ECM) remodeling plays a crucial role in the development of myopia, particularly in ocular axial elongation. Thrombospondin-1 (THBS1), also known as TSP-1, is a significant cellular protein involved in matrix remodeling in various tissues. However, the specific role of THBS1 in myopia development remains unclear. METHOD We employed the HumanNet database to predict genes related to myopic sclera remodeling, followed by screening and visualization of the predicted genes using bioinformatics tools. To investigate the potential target gene Thbs1, we utilized lens-induced myopia models in male C57BL/6J mice and performed Western blot analysis to detect the expression level of scleral THBS1 during myopia development. Additionally, we evaluated the effects of scleral THBS1 knockdown on myopia development through AAV sub-Tenon's injection. The refractive status and axial length were measured using a refractometer and SD-OCT system. RESULTS During lens-induced myopia, THBS1 protein expression in the sclera was downregulated, particularly in the early stages of myopia induction. Moreover, the mice in the THBS1 knockdown group exhibited altered myopia development, with changes in both refraction and axial length compared to the control group. Western blotting analysis confirmed the effectiveness of AAV-mediated knockdown, demonstrating a decrease in COLA1 expression and an increase in MMP9 levels in the sclera. CONCLUSION Our findings indicate that scleral THBS1 levels decreased during myopia development and subsequent THBS1 knockdown showed a decrease in scleral COLA1 expression. Taken together, these results suggest that THBS1 plays a role in maintaining the homeostasis of scleral extracellular matrix, and the reduction of THBS1 may promote the remodeling process and then affect ocular axial elongation during myopia progression.
Collapse
Affiliation(s)
- Junhan Chen
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Shin-Ichi Ikeda
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Yajing Yang
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
| | - Yan Zhang
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Ziyan Ma
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Yifan Liang
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Kazuno Negishi
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Kazuo Tsubota
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan.
- Tsubota Laboratory, Inc, 34 Shinanomachi, Shinjuku-ku, Tokyo, 160-0016, Japan.
- Toshihide Kurihara
- Laboratory of Photobiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan.
- Department of Ophthalmology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan.
|
23
|
Buzzao D, Castresana-Aguirre M, Guala D, Sonnhammer ELL. Benchmarking enrichment analysis methods with the disease pathway network. Brief Bioinform 2024; 25:bbae069. [PMID: 38436561 PMCID: PMC10939300 DOI: 10.1093/bib/bbae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/10/2024] [Accepted: 02/03/2024] [Indexed: 03/05/2024] Open
Abstract
Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.
Affiliation(s)
- Davide Buzzao
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
|
24
|
Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remains challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes. RESULTS We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature, and present benchmark experiments to expand on these insights selecting as case study three diseases and five molecular networks. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values if the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
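The abstract above contrasts gene-level scores derived from GWAS P-values with an unweighted set of seed genes. The sketch below illustrates that idea with a random walk with restart, one common information diffusion scheme; it is not the authors' implementation. The gene names, the toy network, the P-values, the `-log10(p)` scoring, and the `restart` parameter are all hypothetical choices made for illustration.

```python
import math


def propagate(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to a set of neighbour nodes.
    seed_scores: dict mapping nodes to non-negative starting weights.
    Returns a dict of steady-state scores that sum to 1.
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive seed score")
    # Normalise the seed weights into a restart distribution.
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: its walking mass stays where it is.
                nxt[u] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain A - B - C - D - E.
network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
# Hypothetical gene-level P-values, converted to weights with -log10(p).
p_values = {"A": 1e-8, "B": 0.5, "C": 0.5, "D": 0.5, "E": 0.01}
weights = {gene: -math.log10(p) for gene, p in p_values.items()}

scores = propagate(network, weights)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

In this toy setting, gene B ranks above gene D even though both have the same uninformative P-value, because B neighbours the strongly associated gene A. Replacing `weights` with equal values for a chosen seed set would give the unweighted variant that the abstract compares against.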
Affiliation(s)
- Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany
|
25
|
Zheng R, Xu Z, Zeng Y, Wang E, Li M. SPIDE: A single cell potency inference method based on the local cell-specific network entropy. Methods 2023; 220:90-97. [PMID: 37952704 DOI: 10.1016/j.ymeth.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 11/14/2023] Open
Abstract
For a given single-cell RNA-seq dataset, it is critical to pinpoint key cellular stages and quantify cells' differentiation potency along a differentiation pathway in a time-course manner. Currently, several methods based on the entropy of gene functions or PPI networks have been proposed to solve the problem. Nevertheless, these methods still suffer from inaccurate interactions and noise originating from the scRNA-seq profile. In this study, we proposed a cell potency inference method based on cell-specific network entropy, called SPIDE. SPIDE introduces the local weighted cell-specific network for each cell to maintain cell heterogeneity and calculates the entropy by incorporating gene expression with network structure. We compared three cell entropy estimation models on eight scRNA-seq datasets. The results show that SPIDE obtains conclusions consistent with real cell differentiation potency on most datasets. Moreover, SPIDE accurately recovers the continuous changes of potency during cell differentiation and significantly correlates with the stemness of tumor cells in colorectal cancer. To conclude, our study provides a universal and accurate framework for cell entropy estimation, which deepens our understanding of cell differentiation, the development of diseases and other related biological research.
Affiliation(s)
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ziwei Xu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yanping Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Edwin Wang
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary T2N 4N1, Alberta, Canada
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
|
26
|
Li R, Wu J, Li G, Liu J, Xuan J, Zhu Q. Mdwgan-gp: data augmentation for gene expression data based on multiple discriminator WGAN-GP. BMC Bioinformatics 2023; 24:427. [PMID: 37957576 PMCID: PMC10644641 DOI: 10.1186/s12859-023-05558-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 11/06/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Although gene expression data play significant roles in biological and medical studies, their applications are hampered by the difficulty and high expense of gathering them through biological experiments. Generating high-quality gene expression data with computational methods is therefore an urgent problem. WGAN-GP, a generative adversarial network-based method, has been successfully applied in augmenting gene expression data. However, mode collapse or over-fitting may take place for small training samples because only one discriminator is adopted in the method. RESULTS In this study, an improved data augmentation approach, MDWGAN-GP, a generative adversarial network model with multiple discriminators, is proposed. In addition, a novel method is devised for enriching training samples based on a linear graph convolutional network. Extensive experiments were implemented on real biological data. CONCLUSIONS The experimental results demonstrate that, compared with other state-of-the-art methods, the MDWGAN-GP method can produce higher-quality generated gene expression data in most cases.
Affiliation(s)
- Rongyuan Li
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China.
- Gaoshi Li
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
- Junbo Xuan
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
- Qi Zhu
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
|
27
|
Niemira M, Erol A, Bielska A, Zeller A, Skwarska A, Chwialkowska K, Kuzmicki M, Szamatowicz J, Reszec J, Knapp P, Moniuszko M, Kretowski A. Identification of serum miR-1246 and miR-150-5p as novel diagnostic biomarkers for high-grade serous ovarian cancer. Sci Rep 2023; 13:19287. [PMID: 37935712 PMCID: PMC10630404 DOI: 10.1038/s41598-023-45317-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 10/18/2023] [Indexed: 11/09/2023] Open
Abstract
Epithelial ovarian cancer (EOC) is one of the leading cancers in women, with high-grade serous ovarian cancer (HGSOC) being the most common and lethal subtype of this disease. A vast majority of HGSOC are diagnosed at the late stage of the disease when the treatment and total recovery chances are low. Thus, there is an urgent need for novel, more sensitive and specific methods for early and routine HGSOC clinical diagnosis. In this study, we performed miRNA expression profiling using the NanoString miRNA assay in 34 serum samples from patients with HGSOC and 36 healthy women. We identified 13 miRNAs that were differentially expressed (DE). For additional exploration of expression patterns correlated with HGSOC, we performed weighted gene co-expression network analysis (WGCNA). As a result, we showed that the module most correlated with tumour size, nodule and metastasis contained 8 DE miRNAs. The panel including miR-1246 and miR-150-5p was identified as a signature that could discriminate HGSOC patients with AUCs of 0.98 and 1 for the training and test sets, respectively. Furthermore, the above two-miRNA panel had an AUC = 0.946 in the verification cohorts of RT-qPCR data and an AUC = 0.895 using external data from the GEO public database. Thus, the model we developed has the potential to markedly improve the diagnosis of ovarian cancer.
Affiliation(s)
- Magdalena Niemira
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland.
- Anna Erol
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Agnieszka Bielska
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Anna Zeller
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Anna Skwarska
- Cancer Center, Department of Oncology, Albert Einstein College of Medicine, Bronx, NY, USA
- Karolina Chwialkowska
- Centre for Bioinformatics and Data Analysis, Medical University of Bialystok, Bialystok, Poland
- Mariusz Kuzmicki
- Department of Gynecology and Gynecological Oncology, Medical University of Bialystok, Bialystok, Poland
- Jacek Szamatowicz
- Department of Gynecology and Gynecological Oncology, Medical University of Bialystok, Bialystok, Poland
- Joanna Reszec
- Department of Medical Pathomorphology, Medical University of Bialystok, Bialystok, Poland
- Pawel Knapp
- University Oncology Centre, University Clinical Hospital in Bialystok, Bialystok, Poland
- Marcin Moniuszko
- Department of Regenerative Medicine and Immune Regulation, Medical University of Bialystok, Bialystok, Poland
- Adam Kretowski
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
|
28
|
Bondi D, Bevere M, Piccirillo R, Sorci G, Di Felice V, Re Cecconi AD, D'Amico D, Pietrangelo T, Fulle S. Integrated procedures for accelerating, deepening, and leading genetic inquiry: A first application on human muscle secretome. Mol Genet Metab 2023; 140:107705. [PMID: 37837864 DOI: 10.1016/j.ymgme.2023.107705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 06/15/2023] [Accepted: 10/01/2023] [Indexed: 10/16/2023]
Abstract
PURPOSE Beyond classical procedures, bioinformatic-assisted approaches and computational biology offer unprecedented opportunities for scholars. However, these possibilities still need epistemological criticism, as well as standardized procedures. Topics with a huge body of data, in particular, may benefit from data science (DS)-assisted methods. Therefore, the current study combined expert-assisted and DS-assisted approaches to address the broad field of muscle secretome. We aimed to apply DS tools to fix the literature research, suggest investigation targets with a data-driven approach, predict possible scenarios, and define a workflow. METHODS Recognized scholars with expertise on myokines were invited to provide a list of the most important myokines. GeneRecommender, GeneMANIA, HumanNet, and STRING were selected as DS tools. Networks were built on STRING and GeneMANIA. The outcomes of DS tools included the top 5 recommendations. Each expert-led discussion was then integrated with a DS-led approach to provide further perspectives. RESULTS Among the results, 11 molecules had already been described as bona fide myokines in the literature, and 11 molecules were putative myokines. Most of the myokines and the putative myokines recommended by the DS tools were described as present in the cargo of extracellular vesicles. CONCLUSIONS Including both supervised and unsupervised learning methods, as well as encompassing algorithms focused on both protein interaction and gene, represents a comprehensive approach to tackle complex biomedical topics. DS-assisted methods for reviewing existing evidence, recommending targets of interest, and predicting original scenarios are worth exploring as in silico recommendations to be integrated with experts' ideas for optimizing molecular studies.
Affiliation(s)
- Danilo Bondi
- Department of Neuroscience, Imaging and Clinical Sciences, University "G. d'Annunzio" Chieti - Pescara, Chieti, Italy; Interuniversity Institute of Myology (IIM), Perugia, Italy.
- Michele Bevere
- Department of Neuroscience, Imaging and Clinical Sciences, University "G. d'Annunzio" Chieti - Pescara, Chieti, Italy.
- Rosanna Piccirillo
- Department of Neurosciences, Mario Negri Institute for Pharmacological Research IRCCS, Milan, Italy.
- Guglielmo Sorci
- Department of Medicine and Surgery, University of Perugia, Perugia, Italy; Interuniversity Institute of Myology (IIM), Perugia, Italy.
- Valentina Di Felice
- Department of Biomedicine, Neuroscience and Advanced Diagnostics, University of Palermo, Palermo, Italy.
- Andrea David Re Cecconi
- Department of Neurosciences, Mario Negri Institute for Pharmacological Research IRCCS, Milan, Italy.
- Daniela D'Amico
- Department of Biomedicine, Neuroscience and Advanced Diagnostics, University of Palermo, Palermo, Italy.
- Tiziana Pietrangelo
- Department of Neuroscience, Imaging and Clinical Sciences, University "G. d'Annunzio" Chieti - Pescara, Chieti, Italy; Interuniversity Institute of Myology (IIM), Perugia, Italy.
- Stefania Fulle
- Department of Neuroscience, Imaging and Clinical Sciences, University "G. d'Annunzio" Chieti - Pescara, Chieti, Italy; Interuniversity Institute of Myology (IIM), Perugia, Italy.
|
29
|
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories. RESULTS To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification. AVAILABILITY AND IMPLEMENTATION https://github.com/bixuehua/SSLpheno.
Affiliation(s)
- Xuehua Bi
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Weiyang Liang
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
30
|
Amgalan B, Day CP, Przytycka TM. Exploring tumor-normal cross-talk with TranNet: Role of the environment in tumor progression. PLoS Comput Biol 2023; 19:e1011472. [PMID: 37721939 PMCID: PMC10538798 DOI: 10.1371/journal.pcbi.1011472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/28/2023] [Accepted: 08/23/2023] [Indexed: 09/20/2023] Open
Abstract
There is a growing awareness that tumor-adjacent normal tissues used as control samples in cancer studies do not represent fully healthy tissues. Instead, they are intermediates between healthy tissues and tumors. The factors that contribute to the deviation of such control samples from the healthy state include exposure to tumor-promoting factors, tumor-related immune response, and other aspects of the tumor microenvironment. Characterizing the relation between gene expression of tumor-adjacent control samples and tumors is fundamental for understanding the roles of the microenvironment in tumor initiation and progression, as well as for identification of diagnostic and prognostic biomarkers for cancers. To address this demand, we developed and validated TranNet, a computational approach that utilizes gene expression in matched control and tumor samples to study the relation between their gene expression profiles. TranNet infers a sparse weighted bipartite graph from gene expression profiles of matched control samples to tumors. The results allow us to identify predictors (potential regulators) of this transition. To our knowledge, TranNet is the first computational method to infer such dependencies. We applied TranNet to the data of several cancer types and their matched control samples from The Cancer Genome Atlas (TCGA). Many predictors identified by TranNet are genes associated with regulation by the tumor microenvironment as they are enriched in G-protein coupled receptor signaling, cell-to-cell communication, immune processes, and cell adhesion. Correspondingly, targets of inferred predictors are enriched in pathways related to tissue remodelling (including the epithelial-mesenchymal transition (EMT)), immune response, and cell proliferation. This implies that the predictors are markers and potential stromal facilitators of tumor progression. Our results provide new insights into the relationships between tumor-adjacent control samples, tumors, and the tumor environment. Moreover, the set of predictors identified by TranNet will provide a valuable resource for future investigations.
Affiliation(s)
- Bayarbaatar Amgalan
- National Center for Biotechnology Information/National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Chi-Ping Day
- Laboratory of Cancer Biology and Genetics/Center for Cancer Research/National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Teresa M. Przytycka
- National Center for Biotechnology Information/National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
31
|
He Y, Xue Y, Wang J, Huang Y, Liu L, Huang Y, Gao YQ. Diffusion-enhanced characterization of 3D chromatin structure reveals its linkage to gene regulatory networks and the interactome. Genome Res 2023; 33:1354-1368. [PMID: 37491077 PMCID: PMC10547250 DOI: 10.1101/gr.277737.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 07/21/2023] [Indexed: 07/27/2023]
Abstract
The interactome networks at the DNA, RNA, and protein levels are crucial for cellular functions, and the diverse variations of these networks are heavily involved in the establishment of different cell states. We have developed a diffusion-based method, Hi-C to geometry (CTG), to obtain reliable geometric information on the chromatin from Hi-C data. CTG produces a consistent and reproducible framework for the 3D genomic structure and provides a reliable and quantitative understanding of the alterations of genomic structures under different cellular conditions. The genomic structure yielded by CTG serves as an architectural blueprint of the dynamic gene regulatory network, based on which cell-specific correspondence between gene-gene and corresponding protein-protein physical interactions, as well as transcription correlation, is revealed. We also find that gene fusion events are significantly enriched between genes of short CTG distances and are thus close in 3D space. These findings indicate that 3D chromatin structure is at least partially correlated with downstream processes such as transcription, gene regulation, and even regulatory networking through affecting protein-protein interactions.
Affiliation(s)
- Yueying He
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yue Xue
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Jingyao Wang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yupeng Huang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Lu Liu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Yanyi Huang
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China;
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
- Yi Qin Gao
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China;
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
|
32
|
Evangelista JE, Xie Z, Marino GB, Nguyen N, Clarke DJB, Ma’ayan A. Enrichr-KG: bridging enrichment analysis across multiple libraries. Nucleic Acids Res 2023; 51:W168-W179. [PMID: 37166973 PMCID: PMC10320098 DOI: 10.1093/nar/gkad393] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/06/2023] [Revised: 04/23/2023] [Accepted: 05/02/2023] [Indexed: 05/12/2023] Open
Abstract
Gene and protein set enrichment analysis is a critical step in the analysis of data collected from omics experiments. Enrichr is a popular gene set enrichment analysis web-server search engine that contains hundreds of thousands of annotated gene sets. While Enrichr has been useful in providing enrichment analysis with many gene set libraries from different categories, integrating enrichment results across libraries and domains of knowledge can further hypothesis generation. To this end, Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources. Enrichr-KG currently serves 26 gene set libraries from different categories that include transcription, pathways, ontologies, diseases/drugs, and cell types. To demonstrate the utility of Enrichr-KG we provide several case studies. Enrichr-KG is freely available at: https://maayanlab.cloud/enrichr-kg.
Affiliation(s)
- John Erol Evangelista
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Zhuorui Xie
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Giacomo B Marino
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Nhi Nguyen
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Daniel J B Clarke
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Avi Ma’ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
|
33
|
García-Cárdenas JM, Armendáriz-Castillo I, García-Cárdenas N, Pesantez-Coronel D, López-Cortés A, Indacochea A, Guerrero S. Data mining identifies novel RNA-binding proteins involved in colon and rectal carcinomas. Front Cell Dev Biol 2023; 11:1088057. [PMID: 37384253 PMCID: PMC10293682 DOI: 10.3389/fcell.2023.1088057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 02/13/2023] [Indexed: 06/30/2023] Open
Abstract
Colorectal adenocarcinoma (COREAD) is the second most deadly cancer and third most frequently encountered malignancy worldwide. Despite efforts in molecular subtyping and subsequent personalized COREAD treatments, multidisciplinary evidence suggests separating COREAD into colon cancer (COAD) and rectal cancer (READ). This new perspective could improve diagnosis and treatment of both carcinomas. RNA-binding proteins (RBPs), as critical regulators of every hallmark of cancer, could fulfill the need to identify sensitive biomarkers for COAD and READ separately. To detect new RBPs involved in COAD and READ progression, here we used a multidata integration strategy to prioritize tumorigenic RBPs. We analyzed and integrated 1) RBP genomic and transcriptomic alterations from 488 COAD and 155 READ patients, 2) ~10,000 raw associations between RBPs and cancer genes, 3) ~15,000 immunostainings, and 4) loss-of-function screens performed in 102 COREAD cell lines. Thus, we unraveled new putative roles of NOP56, RBM12, NAT10, FKBP1A, EMG1, and CSE1L in COAD and READ progression. Interestingly, FKBP1A and EMG1 have never been related to either of these carcinomas but presented tumorigenic features in other cancer types. Subsequent survival analyses highlighted the clinical relevance of FKBP1A, NOP56, and NAT10 mRNA expression to predict poor prognosis in COREAD and COAD patients. Further research should be performed to validate their clinical potential and to elucidate the molecular mechanisms through which they act in these malignancies.
Affiliation(s)
- Jennyfer M. García-Cárdenas
- Laboratorio de Ciencia de Datos Biomédicos, Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
| | - Isaac Armendáriz-Castillo
- Laboratorio de Ciencia de Datos Biomédicos, Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
- Facultad de Ingenierías y Ciencias Aplicadas, Universidad Internacional SEK, Quito, Ecuador
| | - David Pesantez-Coronel
- Medical Oncology Department Hospital Clinic and Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
| | - Andrés López-Cortés
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - Alberto Indacochea
- Medical Oncology Department Hospital Clinic and Translational Genomics and Targeted Therapies in Solid Tumors, IDIBAPS, Barcelona, Spain
| | - Santiago Guerrero
- Laboratorio de Ciencia de Datos Biomédicos, Escuela de Medicina, Facultad de Ciencias Médicas de la Salud y de la Vida, Universidad Internacional del Ecuador, Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
|
34
|
Pan X, Coban Akdemir ZH, Gao R, Jiang X, Sheynkman GM, Wu E, Huang JH, Sahni N, Yi SS. AD-Syn-Net: systematic identification of Alzheimer's disease-associated mutation and co-mutation vulnerabilities via deep learning. Brief Bioinform 2023; 24:bbad030. [PMID: 36752347 PMCID: PMC10025433 DOI: 10.1093/bib/bbad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 12/19/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023] Open
Abstract
Alzheimer's disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated and progressive mechanisms and its multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited by patient sample size and selection bias. There is a lack of comprehensive research aimed at identifying AD-associated risk mutations systematically. To address this challenge, we construct a large-scale AD mutation and co-mutation framework ('AD-Syn-Net'), and propose deep learning models named Deep-SMCI and Deep-CMCI, configured with fully connected layers, that are capable of effectively predicting cognitive impairment of subjects based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for proposing future targeted therapy for AD precision medicine. Our deep learning model codes are available open access here: https://github.com/Pan-Bio/AD-mutation-effectors.
Affiliation(s)
- Xingxin Pan
- Livestrong Cancer Institutes, and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
| | - Zeynep H Coban Akdemir
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Ruixuan Gao
- Departments of Chemistry and Biological Sciences, University of Illinois Chicago, Chicago, IL 60607, USA
| | - Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, USA
| | - Gloria M Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Center for Public Health Genomics, and UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
| | - Erxi Wu
- Livestrong Cancer Institutes, and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX 76502, USA
- Department of Surgery, Texas A & M University Health Science Center, College of Medicine, Temple, TX 76508, USA
- Department of Pharmaceutical Sciences, Texas A & M University Health Science Center, College of Pharmacy, College Station, TX 77843, USA
| | - Jason H Huang
- Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX 76502, USA
- Department of Surgery, Texas A & M University Health Science Center, College of Medicine, Temple, TX 76508, USA
| | - Nidhi Sahni
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - S Stephen Yi
- Livestrong Cancer Institutes, and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Oden Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, Austin, TX 78712, USA
- Interdisciplinary Life Sciences Graduate Programs (ILSGP), College of Natural Sciences, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Biomedical Engineering, Cockrell School of Engineering, The University of Texas at Austin, Austin, TX 78712, USA
|
35
|
Weaver DT, Scott JG. Crosstalkr: An open-source R package to facilitate drug target identification. bioRxiv 2023:2023.03.07.531526. [PMID: 36945602 PMCID: PMC10028947 DOI: 10.1101/2023.03.07.531526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
In the last few decades, interest in graph-based analysis of biological networks has grown substantially. Protein-protein interaction networks are one of the most common biological networks, and represent the molecular relationships between every known protein and every other known protein. Integration of these interactomic data into bioinformatic pipelines may increase the translational potential of discoveries made through analysis of multi-omic datasets. Crosstalkr provides a unified toolkit for drug target and disease subnetwork identification, two of the most common uses of protein-protein interaction networks. First, crosstalkr enables users to download and leverage high-quality protein-protein interaction networks from online repositories. Users can then filter these large networks into manageable subnetworks using a variety of methods. For example, network filtration can be done using random walks with restarts, starting at the user-provided seed proteins. Affinity scores from a given random walk with restarts are compared to a bootstrapped null distribution to assess statistical significance. Random walks are implemented using sparse matrix multiplication to facilitate fast execution. Next, users can perform in-silico repression experiments to assess the relative importance of nodes in their network. At this step, users can supply protein or gene expression data to make node rankings more meaningful. The default behavior evaluates the human interactome. However, users can evaluate more than 1000 non-human protein-protein interaction networks as a result of integration with StringDB. Crosstalkr is a free, open-source R package designed to allow users to integrate functional analysis using the protein-protein interaction network into existing bioinformatic pipelines. A beta version of crosstalkr is available on CRAN (https://cran.rstudio.com/web/packages/crosstalkr/index.html).
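The random walk with restarts mentioned in this abstract can be illustrated with a minimal Python sketch. This is not crosstalkr's API (crosstalkr is an R package); the function name `random_walk_with_restart`, the `restart` probability of 0.5, and the toy path graph are assumptions made only for illustration. The adjacency is stored sparsely, so each iteration touches only existing edges, which is the idea behind a sparse matrix-vector product.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state affinity scores for every node of an undirected graph.

    Hypothetical helper for illustration; not part of crosstalkr.
    """
    # Sparse adjacency: only edges that exist are stored.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    missing = [s for s in seeds if s not in neighbors]
    if missing:
        raise ValueError(f"seed nodes not in graph: {missing}")

    # Restart vector: uniform probability over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        # Sparse matrix-vector product with the column-normalized adjacency:
        # each node spreads its probability evenly over its neighbors.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with A as the only seed.
scores = random_walk_with_restart(
    [("A", "B"), ("B", "C"), ("C", "D")], seeds={"A"}, restart=0.5
)
```

On this toy graph the scores sum to one and fall off with distance from the seed. A significance test in the spirit of the abstract would rerun the walk from randomly drawn seed sets to build a null distribution for each node's score; that step is omitted here.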
Affiliation(s)
- Davis T. Weaver
- Case Western Reserve University School of Medicine, Cleveland, OH, 44106, USA
- Translational Hematology Oncology Research, Cleveland Clinic, Cleveland OH, 44106, USA
| | - Jacob G. Scott
- Case Western Reserve University School of Medicine, Cleveland, OH, 44106, USA
- Translational Hematology Oncology Research, Cleveland Clinic, Cleveland OH, 44106, USA
- Department of Physics, Case Western Reserve University, Cleveland, OH, 44106, USA
|
36
|
Pan J, Gao Y, Han H, Pan T, Guo J, Li S, Xu J, Li Y. Multi-omics characterization of RNA binding proteins reveals disease comorbidities and potential drugs in COVID-19. Comput Biol Med 2023; 155:106651. [PMID: 36805221 PMCID: PMC9916187 DOI: 10.1016/j.compbiomed.2023.106651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 02/02/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
COVID-19 has led to a devastating global health crisis, which emphasizes the urgent need to deepen our understanding of its molecular mechanisms and to identify potential antiviral drugs. Here, we comprehensively analyzed the transcriptomic and proteomic profiles of 178 COVID-19 patients, ranging from asymptomatic to critically ill. Our analyses found that RNA binding proteins (RBPs) were likely to be perturbed in infection. Interactome analysis revealed that RBPs interact with virus proteins and that the virus-interacting RBPs tend to be located in central regions of the human protein-protein interaction network. Functional enrichment analysis revealed that the virus-interacting RBPs were likely to be enriched in RNA transport, apoptosis and viral genome replication-related pathways. Based on network proximity analyses of 299 human complex-disease genes and COVID-19-related RBPs in the human interactome, we revealed significant associations between complex diseases and COVID-19. Network analysis also implicated potential antiviral drugs for treatment of COVID-19. In summary, our integrative characterization of COVID-19 patients may help provide evidence regarding pathophysiology and potential therapeutic strategies for COVID-19.
Affiliation(s)
- Jiwei Pan
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China
| | - Yueying Gao
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China
| | - Huirui Han
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China
| | - Tao Pan
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China
| | - Jing Guo
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China
| | - Si Li
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China
| | - Juan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
| | - Yongsheng Li
- NHC Key Laboratory of Tropical Disease Control, College of Biomedical Information and Engineering, Hainan Women and Children's Medical Center, Hainan Medical University, Haikou, 571199, China.
|
37
|
Khozyainova AA, Valyaeva AA, Arbatsky MS, Isaev SV, Iamshchikov PS, Volchkov EV, Sabirov MS, Zainullina VR, Chechekhin VI, Vorobev RS, Menyailo ME, Tyurin-Kuzmin PA, Denisov EV. Complex Analysis of Single-Cell RNA Sequencing Data. Biochemistry. Biokhimiia 2023; 88:231-252. [PMID: 37072324 PMCID: PMC10000364 DOI: 10.1134/s0006297923020074] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/23/2022] [Revised: 12/13/2022] [Accepted: 12/13/2022] [Indexed: 03/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a revolutionary tool for studying the physiology of normal and pathologically altered tissues. This approach provides information about molecular features (gene expression, mutations, chromatin accessibility, etc.) of cells, opens up the possibility to analyze the trajectories/phylogeny of cell differentiation and cell-cell interactions, and helps in discovery of new cell types and previously unexplored processes. From a clinical point of view, scRNA-seq facilitates deeper and more detailed analysis of molecular mechanisms of diseases and serves as a basis for the development of new preventive, diagnostic, and therapeutic strategies. The review describes different approaches to the analysis of scRNA-seq data, discusses the advantages and disadvantages of bioinformatics tools, provides recommendations and examples of their successful use, and suggests potential directions for improvement. We also emphasize the need for creating new protocols, including multiomics ones, for the preparation of DNA/RNA libraries of single cells with the purpose of more complete understanding of individual cells.
Affiliation(s)
- Anna A Khozyainova
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia.
| | - Anna A Valyaeva
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Mikhail S Arbatsky
- Laboratory of Artificial Intelligence and Bioinformatics, The Russian Clinical Research Center for Gerontology, Pirogov Russian National Medical University, Moscow, 129226, Russia
- School of Public Administration, Lomonosov Moscow State University, Moscow, 119991, Russia
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Sergey V Isaev
- Research Institute of Personalized Medicine, National Center for Personalized Medicine of Endocrine Diseases, National Medical Research Center for Endocrinology, Moscow, 117036, Russia
| | - Pavel S Iamshchikov
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
- Laboratory of Complex Analysis of Big Bioimage Data, National Research Tomsk State University, Tomsk, 634050, Russia
| | - Egor V Volchkov
- Department of Oncohematology, Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, 117198, Russia
| | - Marat S Sabirov
- Laboratory of Bioinformatics and Molecular Genetics, Koltzov Institute of Developmental Biology of the Russian Academy of Sciences, Moscow, 119334, Russia
| | - Viktoria R Zainullina
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
| | - Vadim I Chechekhin
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Rostislav S Vorobev
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
| | - Maxim E Menyailo
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
| | - Pyotr A Tyurin-Kuzmin
- Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Evgeny V Denisov
- Laboratory of Cancer Progression Biology, Cancer Research Institute, Tomsk National Research Medical Center, Russian Academy of Sciences, Tomsk, 634050, Russia
|
38
|
Poverennaya EV, Pyatnitskiy MA, Dolgalev GV, Arzumanian VA, Kiseleva OI, Kurbatov IY, Kurbatov LK, Vakhrushev IV, Romashin DD, Kim YS, Ponomarenko EA. Exploiting Multi-Omics Profiling and Systems Biology to Investigate Functions of TOMM34. Biology 2023; 12:198. [PMID: 36829477 PMCID: PMC9952762 DOI: 10.3390/biology12020198] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/06/2022] [Revised: 01/17/2023] [Accepted: 01/25/2023] [Indexed: 01/31/2023]
Abstract
Although modern biology is now in the post-genomic era with vastly increased access to high-quality data, the set of human genes with a known function remains far from complete. This is especially true for hundreds of mitochondria-associated genes, which are under-characterized and lack clear functional annotation. However, with the advent of multi-omics profiling methods coupled with systems biology algorithms, the cellular role of many such genes can be elucidated. Here, we report genes and pathways associated with TOMM34 (Translocase of Outer Mitochondrial Membrane 34), which plays a role in mitochondrial protein import as part of a cytosolic complex together with Hsp70/Hsp90 and is upregulated in various cancers. We identified genes, proteins, and metabolites altered in TOMM34-/- HepG2 cells. To our knowledge, this is the first attempt to study the functional capacity of TOMM34 using a multi-omics strategy. We demonstrate that TOMM34 affects various processes including oxidative phosphorylation, the citric acid cycle, and the metabolism of purine and several amino acids. Besides the analysis of already known pathways, we utilized a de novo network enrichment algorithm to extract novel perturbed subnetworks, thus obtaining evidence that TOMM34 potentially plays a role in several other cellular processes, including NOTCH-, MAPK-, and STAT3-signaling. Collectively, our findings provide new insights into TOMM34's cellular functions.
Affiliation(s)
| | - Mikhail A. Pyatnitskiy
- Institute of Biomedical Chemistry, Moscow 119121, Russia
- Faculty Of Computer Science, National Research University Higher School of Economics, Moscow 101000, Russia
| | - Yan S. Kim
- Institute of Biomedical Chemistry, Moscow 119121, Russia
|
39
|
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2023; 51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Citation(s) in RCA: 1496] [Impact Index Per Article: 1496.0] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022] Open
Abstract
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, covering both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Farrokh Mehryary
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Sampo Pyysalo
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, Seoul 03722, South Korea
- Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
- Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
40
|
Cha J, Lavi M, Kim J, Shomron N, Lee I. Imputation of single-cell transcriptome data enables the reconstruction of networks predictive of breast cancer metastasis. Comput Struct Biotechnol J 2023; 21:2296-2304. [PMID: 37035549 PMCID: PMC10073994 DOI: 10.1016/j.csbj.2023.03.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 03/21/2023] [Accepted: 03/21/2023] [Indexed: 03/30/2023] Open
Abstract
Single-cell transcriptome data provide a unique opportunity to explore the gene networks of a particular cell type. However, insufficient capture rate and high dimensionality of single-cell RNA sequencing (scRNA-seq) data challenge cell-type-specific gene network (CGN) reconstruction. Here, we demonstrated that the imputation of scRNA-seq data enables reconstruction of CGNs by effective retrieval of gene functional associations. We reconstructed CGNs for seven primary and nine metastatic breast cancer cell lines using scRNA-seq data with imputation. Key genes for primary or metastatic cell lines were prioritized based on network centrality measures and CGN hub genes that were presumed to be the major determinant of cell type characteristics. To identify novel genes in breast cancer metastasis, we used the average rank difference of centrality between the primary and metastatic cell lines. Genes predicted using CGN centrality analysis were more enriched for known breast cancer metastatic genes than those predicted using differential expression. The molecular chaperone CCT2 was identified as a novel gene for breast metastasis during knockdown assays of several candidate genes. Overall, our study demonstrated an effective CGN reconstruction technique with imputation of scRNA-seq data and the feasibility of identifying key genes for particular cell subsets using single-cell network analysis.
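The "average rank difference of centrality" described in this abstract can be sketched in a few lines of Python. The abstract does not say which centrality measures were used, so degree centrality is an assumption here, as are the function names and the toy networks; this is an illustration of the idea, not the authors' pipeline.

```python
def degree_centrality(edges):
    """Count the number of edges touching each gene."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def centrality_ranks(edges, genes):
    """Rank genes by degree (1 = most central); tied genes share the average rank."""
    deg = degree_centrality(edges)
    ordered = sorted(genes, key=lambda g: -deg.get(g, 0))
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and deg.get(ordered[j + 1], 0) == deg.get(ordered[i], 0):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for g in ordered[i:j + 1]:
            ranks[g] = avg
        i = j + 1
    return ranks


def average_rank_difference(primary_networks, metastatic_networks, genes):
    """Mean rank in primary networks minus mean rank in metastatic networks.

    A positive value means the gene is more central (smaller rank) in the
    metastatic networks than in the primary ones.
    """
    def mean_rank(networks):
        totals = {g: 0.0 for g in genes}
        for edges in networks:
            for g, r in centrality_ranks(edges, genes).items():
                totals[g] += r
        return {g: totals[g] / len(networks) for g in genes}

    prim = mean_rank(primary_networks)
    meta = mean_rank(metastatic_networks)
    return {g: prim[g] - meta[g] for g in genes}


# Toy example: gene A is the hub in the primary network, gene D in the metastatic one.
genes = ["A", "B", "C", "D"]
primary = [[("A", "B"), ("A", "C"), ("A", "D")]]
metastatic = [[("D", "A"), ("D", "B"), ("D", "C")]]
diff = average_rank_difference(primary, metastatic, genes)
```

In the toy example, D gains centrality in the metastatic network and receives a positive score, while A loses centrality and receives a negative one. Genes would then be prioritized by sorting on this difference.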
Affiliation(s)
- Junha Cha
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Michael Lavi
- Faculty of Medicine and Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv 69978, Israel
| | - Junhan Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Noam Shomron
- Faculty of Medicine and Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv 69978, Israel
- Corresponding author.
| | - Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- POSTECH Biotech Center, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea
- Corresponding author at: Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea.
|
41
|
Buzzao D, Castresana-Aguirre M, Guala D, Sonnhammer ELL. TOPAS, a network-based approach to detect disease modules in a top-down fashion. NAR Genom Bioinform 2022; 4:lqac093. [PMCID: PMC9706483 DOI: 10.1093/nargab/lqac093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 10/14/2022] [Accepted: 11/15/2022] [Indexed: 12/02/2022] Open
Abstract
A vast scenario of potential disease mechanisms and remedies is yet to be discovered. The field of Network Medicine has grown thanks to the massive amount of high-throughput data and the emerging evidence that disease-related proteins form ‘disease modules’. Relying on prior disease knowledge, network-based disease module detection algorithms aim at connecting the list of known disease associated genes by exploiting interaction networks. Most existing methods extend disease modules by iteratively adding connector genes in a bottom-up fashion, while top-down approaches remain largely unexplored. We have created TOPAS, an iterative approach that aims at connecting the largest number of seed nodes in a top-down fashion through connectors that guarantee the highest flow of a Random Walk with Restart in a network of functional associations. We used a corpus of 382 manually selected functional gene sets to benchmark our algorithm against SCA, DIAMOnD, MaxLink and ROBUST across four interactomes. We demonstrate that TOPAS outperforms competing methods in terms of Seed Recovery Rate, Seed to Connector Ratio and consistency during module detection. We also show that TOPAS achieves competitive performance in terms of biological relevance of detected modules and scalability.
Affiliation(s)
- Davide Buzzao
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
| | - Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
|
42
|
Gheorghe V, Hart T. Optimal construction of a functional interaction network from pooled library CRISPR fitness screens. BMC Bioinformatics 2022; 23:510. [PMID: 36443674 PMCID: PMC9707256 DOI: 10.1186/s12859-022-05078-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 11/23/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Functional interaction networks, where edges connect genes likely to operate in the same biological process or pathway, can be inferred from CRISPR knockout screens in cancer cell lines. Genes with similar knockout fitness profiles across a sufficiently diverse set of cell line screens are likely to be co-functional, and these "coessentiality" networks are increasingly powerful predictors of gene function and biological modularity. While several such networks have been published, most use different algorithms for each step of the network construction process. RESULTS In this study, we identify an optimal measure of functional interaction and test all combinations of options at each step (essentiality scoring, sample variance and covariance normalization, and similarity measurement) to identify best practices for generating a functional interaction network from CRISPR knockout data. We show that Bayes Factor and Ceres scores give the best results, that Ceres outperforms the newer Chronos scoring scheme, and that covariance normalization is a critical step in network construction. We further show that Pearson correlation, mathematically identical to ordinary least squares after covariance normalization, can be extended by using partial correlation to detect and amplify signals from "moonlighting" proteins which show context-dependent interaction with different partners. CONCLUSIONS We describe a systematic survey of methods for generating coessentiality networks from the Cancer Dependency Map data and provide a partial correlation-based approach for exploring context-dependent interactions.
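The contrast between Pearson and partial correlation mentioned in this abstract can be illustrated with a minimal Python sketch. It uses the textbook first-order partial correlation formula on toy vectors; the function names and data are assumptions for illustration only, and it does not reproduce the authors' covariance normalization or scoring steps.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    First-order formula:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy fitness profiles: x and y both follow a shared driver z plus
# independent components, so their raw correlation is driven entirely by z.
z = [-2, -2, 2, 2]
x = [-1, -3, 3, 1]   # z + [1, -1, 1, -1]
y = [-1, -3, 1, 3]   # z + [1, -1, -1, 1]

r_raw = pearson(x, y)                     # 0.8
r_given_z = partial_correlation(x, y, z)  # approximately 0
```

Here the raw correlation of 0.8 disappears once the shared driver is conditioned on. In a coessentiality setting, comparing the two values for a gene pair gives a rough indication of whether an apparent association depends on a third gene's profile.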
Affiliation(s)
- Veronica Gheorghe
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
| | - Traver Hart
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
43
|
Singh V, Pandey S, Bhardwaj A. From the reference human genome to human pangenome: Premise, promise and challenge. Front Genet 2022; 13:1042550. [PMID: 36437921 PMCID: PMC9684177 DOI: 10.3389/fgene.2022.1042550] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/12/2022] [Accepted: 10/21/2022] [Indexed: 11/11/2022]
Abstract
The Reference Human Genome remains the single most important resource for mapping genetic variations and assessing their impact. However, it is monophasic, incomplete and not representative of the variation that exists in the population. Given the extent of ethno-geographic diversity and the consequent diversity in clinical manifestations of these variations, population-specific references were developed over time. The dramatically plummeting cost of sequencing whole genomes and the advent of third-generation long-range sequencers allowing accurate, error-free, telomere-to-telomere assemblies of human genomes present us with a unique and unprecedented opportunity to develop a more composite standard reference consisting of a collection of multiple genomes that capture the maximal variation existing in the population, with the deepest annotation possible, enabling a realistic, reliable and actionable estimation of clinical significance of specific variations. The Human Pangenome Project thus is a logical next step promising a more accurate and global representation of genomic variations. The pangenome effort must be reciprocally complemented with precise variant discovery tools and exhaustive annotation to ensure unambiguous clinical assessment of the variant in ethno-geographical context. Here we discuss a broad roadmap, the challenges and the way forward in developing a universal pangenome reference, including data visualization techniques, integration of the prior knowledge base in the new graph-based architecture, and tools to submit, compare, query, annotate and retrieve relevant information from the pangenomes. The biggest challenge, however, will be the ethical, legal and social implications and the training of human resources in the new reference paradigm.
Affiliation(s)
- Vipin Singh
- University Institute of Biotechnology, Chandigarh University, Mohali, India
| | - Shweta Pandey
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Anshu Bhardwaj
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- *Correspondence: Anshu Bhardwaj
| |
|
44
|
Cha J, Yu J, Cho JW, Hemberg M, Lee I. scHumanNet: a single-cell network analysis platform for the study of cell-type specificity of disease genes. Nucleic Acids Res 2022; 51:e8. [PMID: 36350625 PMCID: PMC9881140 DOI: 10.1093/nar/gkac1042] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2022] [Revised: 09/19/2022] [Accepted: 10/25/2022] [Indexed: 11/10/2022]
Abstract
A major challenge in single-cell biology is identifying cell-type-specific gene functions, which may substantially improve precision medicine. Differential expression analysis of genes is a popular, yet insufficient approach, and complementary methods that associate function with cell type are required. Here, we describe scHumanNet (https://github.com/netbiolab/scHumanNet), a single-cell network analysis platform for resolving cellular heterogeneity across gene functions in humans. Based on cell-type-specific gene networks (CGNs) constructed under the guidance of the HumanNet reference interactome, scHumanNet displayed higher functional relevance to the cellular context than CGNs built by other methods on single-cell transcriptome data. Cellular deconvolution of gene signatures based on network compactness across cell types revealed breast cancer prognostic markers associated with T cells. scHumanNet could also prioritize genes associated with particular cell types using CGN centrality and identified the differential hubness of CGNs between disease and healthy conditions. We demonstrated the usefulness of scHumanNet by uncovering T-cell-specific functional effects of GITR, a prognostic gene for breast cancer, and functional defects in autism spectrum disorder genes specific for inhibitory neurons. These results suggest that scHumanNet will advance our understanding of cell-type specificity across human disease genes.
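The idea of comparing hubness between disease and healthy networks can be sketched as below. This is only an illustrative assumption about how such a comparison might look: weighted degree is used as the centrality measure, the function names `degree_centrality` and `differential_hubness` are hypothetical and not part of the scHumanNet package, and the edge lists are toy data.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Sum of edge weights per gene in an undirected weighted edge list."""
    degree = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        degree[gene_a] += weight
        degree[gene_b] += weight
    return dict(degree)


def differential_hubness(disease_edges, healthy_edges):
    """Per-gene centrality in the disease network minus the healthy network."""
    disease = degree_centrality(disease_edges)
    healthy = degree_centrality(healthy_edges)
    genes = set(disease) | set(healthy)
    return {g: disease.get(g, 0.0) - healthy.get(g, 0.0) for g in genes}


# Toy cell-type-specific networks; gene names and weights are invented.
healthy_edges = [("A", "B", 1.0), ("A", "C", 1.0)]
disease_edges = [("A", "B", 1.0)]

diff = differential_hubness(disease_edges, healthy_edges)
for gene in sorted(diff, key=diff.get):
    print(gene, diff[gene])
```

Genes with the most negative values lose the most connectivity in the disease network, which is one simple way to flag candidates for cell-type-specific functional defects.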
Affiliation(s)
- Junha Cha
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Jiwon Yu
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
| | - Jae-Won Cho
- Evergrande Center for Immunologic Disease, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA
| | - Martin Hemberg
- Correspondence may also be addressed to Martin Hemberg. Tel: +1 857 307 1422
| | - Insuk Lee
- To whom correspondence should be addressed. Tel: +82 2 2123 5559; Fax: +82 2 362 7265
| |
|
45
|
COVID-GWAB: A Web-Based Prediction of COVID-19 Host Genes via Network Boosting of Genome-Wide Association Data. Biomolecules 2022; 12:biom12101446. [PMID: 36291657 PMCID: PMC9599684 DOI: 10.3390/biom12101446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/01/2022] [Accepted: 10/02/2022] [Indexed: 11/17/2022]
Abstract
Host genetics affect both the susceptibility and response to viral infection. To search for host genes that contribute to COVID-19, the Host Genetics Initiative (HGI) was formed to investigate the genetic factors involved in COVID-19 via genome-wide association studies (GWAS). GWAS suffer from limited statistical power and, in general, only a few genes pass the conventional significance thresholds. This statistical limitation may be overcome by boosting weak association signals through integrating independent functional information such as molecular interactions. Additionally, the boosted results can be evaluated against various independent data for further connections to COVID-19. We present COVID-GWAB, a web-based tool to boost original GWAS signals from COVID-19 patients by taking the signals of the interactome neighbors. COVID-GWAB takes summary statistics from the COVID-19 HGI or user input data and reprioritizes candidate host genes for COVID-19 using HumanNet, a co-functional human gene network. The current version of COVID-GWAB provides the pre-processed data of releases 5, 6, and 7 of the HGI. Additionally, COVID-GWAB provides web interfaces for a summary of augmented GWAS signals, prediction evaluations by appearance frequency in COVID-19 literature, single-cell transcriptome data, and associated pathways. The web server also enables browsing the candidate gene networks.
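The general idea of boosting a gene's GWAS signal with the signals of its network neighbors can be sketched as follows. The scoring rule here (own score plus a weighted sum of neighbor scores, scaled by `alpha`) is an assumption chosen for illustration and is not taken from the COVID-GWAB implementation; `boost_scores`, the gene names, the scores, and the edge weights are all hypothetical.

```python
def boost_scores(gwas_scores, network, alpha=0.5):
    """Add a weighted share of each neighbor's score to a gene's own score.

    gwas_scores: gene -> association score (for example, -log10 p-value).
    network: gene -> {neighbor: edge weight}; assumed symmetric.
    alpha: how strongly neighbor signal contributes (illustrative choice).
    """
    boosted = {}
    for gene, score in gwas_scores.items():
        neighbor_signal = sum(
            weight * gwas_scores.get(neighbor, 0.0)
            for neighbor, weight in network.get(gene, {}).items()
        )
        boosted[gene] = score + alpha * neighbor_signal
    return boosted


# Toy input: GENE2 is weak on its own but neighbors a strong gene.
gwas_scores = {"GENE1": 6.0, "GENE2": 1.0, "GENE3": 0.5}
network = {
    "GENE1": {"GENE2": 0.9},
    "GENE2": {"GENE1": 0.9, "GENE3": 0.2},
    "GENE3": {"GENE2": 0.2},
}

boosted = boost_scores(gwas_scores, network)
ranked = sorted(boosted, key=boosted.get, reverse=True)
print(ranked)
```

In this toy case the weakly associated gene gains most of its boosted score from a strongly associated neighbor, which is the effect that lets sub-threshold candidates be reprioritized.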
|
46
|
Chen Y, Hu Y, Hu X, Feng C, Chen M. CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure. Bioinformatics 2022; 38:4380-4386. [PMID: 35900147 DOI: 10.1093/bioinformatics/btac520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 06/16/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Quantifying the similarity of human diseases provides guiding insights into the discovery of microscopic mechanisms from a macro scale. Previous work demonstrated that better performance can be gained by integrating multiview data sources or applying machine learning techniques. However, designing an efficient framework to extract and incorporate information from different biological data using deep learning models remains unexplored. RESULTS We present CoGO, a Contrastive learning framework to predict disease similarity based on Gene network and Ontology structure, which incorporates the gene interaction network and gene ontology (GO) domain knowledge using graph deep learning models. First, graph deep learning models are applied to encode the features of genes and GO terms from separate graph structure data. Next, gene and GO features are projected to a common embedding space via a nonlinear projection. Then a cross-view contrastive loss is applied to maximize the agreement of corresponding gene-GO associations and yield meaningful gene representations. Finally, CoGO infers the similarity between diseases by the cosine similarity of disease representation vectors derived from related gene embeddings. In our experiments, CoGO outperforms the most competitive baseline method on both AUROC and AUPRC, with an especially large improvement of 19.57% in AUPRC (0.7733). The prediction results are significantly consistent with those of other disease similarity studies and are thus credible. Furthermore, we conduct a detailed case study of the top similar disease pairs, which are supported by other studies. Empirical results show that CoGO achieves strong performance on the disease similarity problem. AVAILABILITY AND IMPLEMENTATION https://github.com/yhchen1123/CoGO.
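The final inference step described in this abstract, cosine similarity between disease vectors derived from gene embeddings, can be illustrated with a small sketch. The abstract does not say how gene embeddings are pooled into a disease vector, so mean pooling is an assumption here; the embeddings, gene sets, and helper names (`mean_vector`, `cosine_similarity`) are hypothetical and do not come from the CoGO repository.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed pooling step)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional gene embeddings and disease gene sets.
gene_embedding = {
    "GENE1": [0.9, 0.1, 0.0],
    "GENE2": [0.8, 0.2, 0.1],
    "GENE3": [0.0, 0.1, 0.9],
}
disease_genes = {
    "disease_x": ["GENE1", "GENE2"],
    "disease_y": ["GENE2"],
    "disease_z": ["GENE3"],
}

disease_vector = {
    name: mean_vector([gene_embedding[g] for g in genes])
    for name, genes in disease_genes.items()
}
sim_xy = cosine_similarity(disease_vector["disease_x"], disease_vector["disease_y"])
sim_xz = cosine_similarity(disease_vector["disease_x"], disease_vector["disease_z"])
print(sim_xy > sim_xz)
```

Diseases that share genes with nearby embeddings end up with a higher cosine similarity than diseases whose genes lie in a different region of the embedding space.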
Affiliation(s)
- Yuhao Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Yanshi Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Xiaotian Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Cong Feng
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China; Biomedical Big Data Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China; Institute of Hematology, Zhejiang University, Hangzhou, 310058, China
| |
|
47
|
Mitra R, Adams CM, Eischen CM. Systematic lncRNA mapping to genome-wide co-essential modules uncovers cancer dependency on uncharacterized lncRNAs. eLife 2022; 11:e77357. [PMID: 35695878 PMCID: PMC9191893 DOI: 10.7554/elife.77357] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/26/2022] [Accepted: 05/17/2022] [Indexed: 12/03/2022]
Abstract
Quantification of gene dependency across hundreds of cell lines using genome-scale CRISPR screens has revealed co-essential pathways/modules and critical functions of uncharacterized genes. In contrast to protein-coding genes, robust CRISPR-based loss-of-function screens are lacking for long noncoding RNAs (lncRNAs), which are key regulators of many cellular processes, leaving many essential lncRNAs unidentified and uninvestigated. Integrating copy number, epigenetic, and transcriptomic data of >800 cancer cell lines with CRISPR-derived co-essential pathways, our method recapitulates known essential lncRNAs and predicts proliferation/growth dependency of 289 poorly characterized lncRNAs. Analyzing lncRNA dependencies across 10 cancer types and their expression alteration by diverse growth inhibitors across cell types, we prioritize 30 high-confidence pan-cancer proliferation/growth-regulating lncRNAs. Further evaluating two previously uncharacterized top proliferation-suppressive lncRNAs (PSLR-1, PSLR-2) showed they are transcriptionally regulated by p53, induced by multiple cancer treatments, and significantly correlate to increased cancer patient survival. These lncRNAs modulate G2 cell cycle-regulating genes within the FOXM1 transcriptional network, inducing a G2 arrest and inhibiting proliferation and colony formation. Collectively, our results serve as a powerful resource for exploring lncRNA-mediated regulation of cellular fitness in cancer, circumventing current limitations in lncRNA research.
Affiliation(s)
- Ramkrishna Mitra
- Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, United States
| | - Clare M Adams
- Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, United States
| | - Christine M Eischen
- Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, United States
| |
|
48
|
Zou H, Pan T, Gao Y, Chen R, Li S, Guo J, Tian Z, Xu G, Xu J, Ma Y, Li Y. Pan-cancer assessment of mutational landscape in intrinsically disordered hotspots reveals potential driver genes. Nucleic Acids Res 2022; 50:e49. [PMID: 35061901 PMCID: PMC9122534 DOI: 10.1093/nar/gkac028] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/28/2021] [Revised: 12/22/2021] [Accepted: 01/10/2022] [Indexed: 11/13/2022]
Abstract
Large-scale cancer genome sequencing has enabled the cataloging of somatic mutations; however, the mutational impact on intrinsically disordered protein regions (IDRs) has not been systematically investigated to date. Here, we comprehensively characterized the mutational landscapes of IDRs and found that IDRs have higher mutation frequencies across diverse cancers. We thus developed a computational method, ROI-Driver, to identify putative driver genes enriched for IDR and domain hotspots in cancer. Numerous well-known cancer-related oncogenes or tumor suppressors that play important roles in cancer signaling regulation, development and immune response were identified at a higher resolution. In particular, the incorporation of IDR structures helps in the identification of novel potential driver genes that play central roles in human protein-protein interaction networks. Interestingly, we found that the putative driver genes with IDR hotspots were significantly enriched with predicted phase separation propensities, suggesting that IDR mutations disrupt phase separation in key cellular pathways. We also identified an appreciable number of clinically relevant genes enriched for IDR mutational hotspots that exhibited differential expression patterns and are associated with cancer patient survival. In summary, combinations of mutational effects on IDRs significantly increase the sensitivity of driver detection and are likely to open new therapeutic avenues for various cancers.
Affiliation(s)
- Haozhe Zou
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
| | - Tao Pan
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Yueying Gao
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Renwei Chen
- Hainan Women and Children’s Medical Center, Hainan Medical University, Haikou 571199, China
| | - Si Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Jing Guo
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Zhanyu Tian
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Gang Xu
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Juan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
| | - Yanlin Ma
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
| | - Yongsheng Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Provincial Key Laboratory for Human Reproductive Medicine and Genetic Research, International Technology Cooperation Base ‘China–Myanmar Joint Research Center for Prevention and Treatment of Regional Major Disease’ by the Ministry of Science and Technology of China, Hainan Provincial Clinical Research Center for Thalassemia, The First Affiliated Hospital of Hainan Medical University, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
- Hainan Women and Children’s Medical Center, Hainan Medical University, Haikou 571199, China
| |
|
49
|
Mancuso CA, Bills PS, Krum D, Newsted J, Liu R, Krishnan A. GenePlexus: a web-server for gene discovery using network-based machine learning. Nucleic Acids Res 2022; 50:W358-W366. [PMID: 35580053 PMCID: PMC9252732 DOI: 10.1093/nar/gkac335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/30/2022] [Indexed: 11/28/2022]
Abstract
Biomedical researchers take advantage of high-throughput, high-coverage technologies to routinely generate sets of genes of interest across a wide range of biological conditions. Although these technologies have directly shed light on the molecular underpinnings of various biological processes and diseases, the list of genes from any individual experiment is often noisy and incomplete. Additionally, interpreting these lists of genes can be challenging in terms of how they are related to each other and to other genes in the genome. In this work, we present GenePlexus (https://www.geneplexus.net/), a web-server that allows a researcher to utilize a powerful, network-based machine learning method to gain insights into their gene set of interest and additional functionally similar genes. Once a user uploads their own set of human genes and chooses between a number of different human network representations, GenePlexus provides predictions of how associated every gene in the network is to the input set. The web-server also provides interpretability through network visualization and comparison to other machine learning models trained on thousands of known process/pathway and disease gene sets. GenePlexus is free and open to all users without the need for registration.
Affiliation(s)
- Christopher A Mancuso
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Patrick S Bills
- Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
| | - Douglas Krum
- Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
| | - Jacob Newsted
- Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
| | - Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| |
|