151. Dafniet B, Cerisier N, Boezio B, Clary A, Ducrot P, Dorval T, Gohier A, Brown D, Audouze K, Taboureau O. Development of a chemogenomics library for phenotypic screening. J Cheminform 2021; 13:91. [PMID: 34819133 PMCID: PMC8611952 DOI: 10.1186/s13321-021-00569-1] [Received: 08/14/2021] [Accepted: 11/06/2021]
Abstract
With the development of advanced technologies in cell-based phenotypic screening, phenotypic drug discovery (PDD) strategies have re-emerged as promising approaches in the identification and development of novel and safe drugs. However, phenotypic screening does not rely on knowledge of specific drug targets and needs to be combined with chemical biology approaches to identify therapeutic targets and mechanisms of action induced by drugs and associated with an observable phenotype. In this study, we developed a system pharmacology network integrating drug-target-pathway-disease relationships as well as morphological profiles from an existing high content imaging-based high-throughput phenotypic profiling assay known as “Cell Painting”. Furthermore, from this network, a chemogenomic library of 5000 small molecules that represent a large and diverse panel of drug targets involved in diverse biological effects and diseases has been developed. Such a platform and a chemogenomic library could assist in the target identification and mechanism deconvolution of some phenotypic assays. The usefulness of the platform is illustrated through examples.
Affiliation(s)
- Bryan Dafniet
  - Université de Paris, INSERM U1133, CNRS UMR8251, 75006, Paris, France
- Natacha Cerisier
  - Université de Paris, INSERM U1133, CNRS UMR8251, 75006, Paris, France
- Batiste Boezio
  - Université de Paris, INSERM U1133, CNRS UMR8251, 75006, Paris, France
- Anaelle Clary
  - Institut de Recherche Servier, 125 Chemin de Ronde, 78290, Croissy-sur-Seine, France
- Pierre Ducrot
  - Institut de Recherche Servier, 125 Chemin de Ronde, 78290, Croissy-sur-Seine, France
- Thierry Dorval
  - Institut de Recherche Servier, 125 Chemin de Ronde, 78290, Croissy-sur-Seine, France
- Arnaud Gohier
  - Institut de Recherche Servier, 125 Chemin de Ronde, 78290, Croissy-sur-Seine, France
- David Brown
  - Institut de Recherche Servier, 125 Chemin de Ronde, 78290, Croissy-sur-Seine, France
- Karine Audouze
  - Université de Paris, INSERM UMR S-1124, 75006, Paris, France
- Olivier Taboureau
  - Université de Paris, INSERM U1133, CNRS UMR8251, 75006, Paris, France
152. Lin R, Zhong X, Zhou Y, Geng H, Hu Q, Huang Z, Hu J, Fu XD, Chen L, Chen JY. R-loopBase: a knowledgebase for genome-wide R-loop formation and regulation. Nucleic Acids Res 2021; 50:D303-D315. [PMID: 34792163 PMCID: PMC8728142 DOI: 10.1093/nar/gkab1103] [Received: 08/11/2021] [Revised: 09/28/2021] [Accepted: 10/21/2021]
Abstract
R-loops play versatile roles in many physiological and pathological processes, and are of great interest to scientists in multiple fields. However, controversy about their genomic localization and incomplete understanding of their regulatory network raise great challenges for R-loop research. Here, we present R-loopBase (https://rloopbase.nju.edu.cn) to tackle these pressing issues by systematic integration of genomics and literature data. First, based on 107 high-quality genome-wide R-loop mapping datasets generated by 11 different technologies, we present a reference set of human R-loop zones for high-confidence R-loop localization, and spot conservative genomic features associated with R-loop formation. Second, through literature mining and multi-omics analyses, we curate the most comprehensive list of R-loop regulatory proteins and their targeted R-loops in multiple species to date. These efforts help reveal a global regulatory network of R-loop dynamics and its potential links to the development of cancers and neurological diseases. Finally, we integrate billions of functional genomic annotations, and develop interactive interfaces to search, visualize, download and analyze R-loops and R-loop regulators in a well-annotated genomic context. R-loopBase allows all users, including those with little bioinformatics background, to utilize these data for their own research. We anticipate R-loopBase will become a one-stop resource for the R-loop community.
Affiliation(s)
- Ruoyao Lin
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
- Xiaoming Zhong
  - Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
- Yongli Zhou
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
- Huichao Geng
  - Hubei Key Laboratory of Cell Homeostasis, RNA Institute, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Qingxi Hu
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
- Zhihao Huang
  - Hubei Key Laboratory of Cell Homeostasis, RNA Institute, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Jun Hu
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
- Xiang-Dong Fu
  - Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Liang Chen
  - Hubei Key Laboratory of Cell Homeostasis, RNA Institute, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Jia-Yu Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China
153. Wu J, Zhao M, Li T, Sun J, Chen Q, Yin C, Jia Z, Zhao C, Lin G, Ni Y, Xie G, Shi J, He K. HFIP: an integrated multi-omics data and knowledge platform for the precision medicine of heart failure. Database: The Journal of Biological Databases and Curation 2021; 2021:6427587. [PMID: 34791105 PMCID: PMC8607296 DOI: 10.1093/database/baab076] [Received: 04/13/2021] [Revised: 10/14/2021] [Accepted: 11/09/2021]
Abstract
As the terminal clinical phenotype of almost all types of cardiovascular diseases, heart failure (HF) is a complex and heterogeneous syndrome leading to considerable morbidity and mortality. Existing HF-related omics studies mainly focus on case/control comparisons, small cohorts of special subtypes, etc., and a large amount of multi-omics data and knowledge has been generated. However, it is difficult for researchers to obtain biological and clinical insights from these scattered data and knowledge. In this paper, we built the Heart Failure Integrated Platform (HFIP) for data exploration, fusion analysis and visualization by collecting and curating existing multi-omics data and knowledge from various public sources, and also provided an auto-updating mechanism for future integration. The developed HFIP contained 253 datasets (7842 samples), multiple analysis flows, and 14 independent tools. In addition, based on the integration of existing databases and literature, a knowledge base for HF was constructed with a scoring system for evaluating the relationship between molecular signals and HF. The knowledge base includes 1956 genes and annotation information. The literature mining module was developed to help researchers survey the hotspots and contexts in basic and clinical research. HFIP can be used as a data-driven and knowledge-guided platform for the basic and clinical research of HF. Database URL: http://heartfailure.medical-bigdata.com
Affiliation(s)
- Jing Wu
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Min Zhao
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Tao Li
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Jinxiu Sun
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Qi Chen
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Chengliang Yin
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Zhilong Jia
  - Research Center of Artificial Intelligence, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Chenghui Zhao
  - Research Center of Biomedical Engineering, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Gui Lin
  - Ping An Healthcare Technology, 316-1 Laoshan Road, Beijing 200120, China
- Yuan Ni
  - Ping An Healthcare Technology, 316-1 Laoshan Road, Beijing 200120, China
- Guotong Xie
  - Ping An Healthcare Technology, 316-1 Laoshan Road, Beijing 200120, China
  - Ping An Healthcare and Technology Co, Ltd, 316-1 Laoshan Road, Shanghai 200120, China
  - Ping An International Smart City Technology Co, Ltd, 5033 Yitian Road, Shenzhen 518046, China
- Jinlong Shi
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
- Kunlun He
  - Research Center of Medical Big Data, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100853, China
154. Urban M, Cuzick A, Seager J, Wood V, Rutherford K, Venkatesh SY, Sahu J, Iyer SV, Khamari L, De Silva N, Martinez MC, Pedro H, Yates AD, Hammond-Kosack KE. PHI-base in 2022: a multi-species phenotype database for Pathogen-Host Interactions. Nucleic Acids Res 2021; 50:D837-D847. [PMID: 34788826 PMCID: PMC8728202 DOI: 10.1093/nar/gkab1037] [Received: 09/21/2021] [Revised: 10/13/2021] [Accepted: 11/08/2021]
Abstract
Since 2005, the Pathogen–Host Interactions Database (PHI-base) has manually curated experimentally verified pathogenicity, virulence and effector genes from fungal, bacterial and protist pathogens, which infect animal, plant, fish, insect and/or fungal hosts. PHI-base (www.phi-base.org) is devoted to the identification and presentation of phenotype information on pathogenicity and effector genes and their host interactions. Specific gene alterations that did not alter the in host interaction phenotype are also presented. PHI-base is invaluable for comparative analyses and for the discovery of candidate targets in medically and agronomically important species for intervention. Version 4.12 (September 2021) contains 4387 references, and provides information on 8411 genes from 279 pathogens, tested on 228 hosts in 18,190 interactions. This provides a 24% increase in gene content since Version 4.8 (September 2019). Bacterial and fungal pathogens represent the majority of the interaction data, with a 54:46 split of entries, whilst protists, protozoa, nematodes and insects represent 3.6% of entries. Host species consist of approximately 54% plants and 46% others of medical, veterinary and/or environmental importance. PHI-base data is disseminated to UniProtKB, FungiDB and Ensembl Genomes. PHI-base will migrate to a new gene-centric version (version 5.0) in early 2022. This major development is briefly described.
Affiliation(s)
- Martin Urban
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden AL5 2JQ, UK
- Alayne Cuzick
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden AL5 2JQ, UK
- James Seager
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden AL5 2JQ, UK
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Kim Rutherford
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Jashobanta Sahu
  - Molecular Connections, Kandala Mansions, Kariappa Road, Basavanagudi, Bengaluru 560 004, India
- S Vijaylakshmi Iyer
  - Molecular Connections, Kandala Mansions, Kariappa Road, Basavanagudi, Bengaluru 560 004, India
- Lokanath Khamari
  - Molecular Connections, Kandala Mansions, Kariappa Road, Basavanagudi, Bengaluru 560 004, India
- Nishadi De Silva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Manuel Carbajo Martinez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Helder Pedro
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew D Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kim E Hammond-Kosack
  - Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden AL5 2JQ, UK
155. Cabili MN, Lawson J, Saltzman A, Rushton G, O’Rourke P, Wilbanks J, Rodriguez LL, Nyronen T, Courtot M, Donnelly S, Philippakis AA. Empirical validation of an automated approach to data use oversight. Cell Genomics 2021; 1:100031. [PMID: 36778584 PMCID: PMC9903839 DOI: 10.1016/j.xgen.2021.100031] [Received: 02/28/2021] [Revised: 06/30/2021] [Accepted: 08/07/2021]
Abstract
The current paradigm for data use oversight of biomedical datasets is onerous, extending the timescale and resources needed to obtain access for secondary analyses, thus hindering scientific discovery. For a researcher to utilize a controlled-access dataset, a data access committee must review her research plans to determine whether they are consistent with the data use limitations (DULs) specified by the informed consent form. The newly created GA4GH data use ontology (DUO) holds the potential to streamline this process by making data use oversight computable. Here, we describe an open-source software platform, the Data Use Oversight System (DUOS), that connects with DUO terminology to enable automated data use oversight. We analyze dbGaP data acquired since 2006, finding an exponential increase in data access requests, which will not be sustainable with current manual oversight review. We perform an empirical evaluation of DUOS and DUO on selected datasets from the Broad Institute's data repository. We were able to structure 118/123 of the evaluated DULs (96%) and 52/52 (100%) of research proposals using DUO terminology, and we find that DUOS' automated data access adjudication in all cases agreed with the DAC manual review. This first empirical evaluation of the feasibility of automated data use oversight demonstrates comparable accuracy to human-based data access oversight in real-world data governance.
Affiliation(s)
- Moran N. Cabili
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan Lawson
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Andrea Saltzman
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Greg Rushton
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Tommi Nyronen
  - ELIXIR Finland, CSC - IT Center for Science, Espoo, Finland
- Mélanie Courtot
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stacey Donnelly
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
  - Corresponding author
- Anthony A. Philippakis
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
  - Corresponding author
156. Schriml LM, Munro JB, Schor M, Olley D, McCracken C, Felix V, Baron JA, Jackson R, Bello SM, Bearer C, Lichenstein R, Bisordi K, Dialo NC, Giglio M, Greene C. The Human Disease Ontology 2022 update. Nucleic Acids Res 2021; 50:D1255-D1261. [PMID: 34755882 PMCID: PMC8728220 DOI: 10.1093/nar/gkab1063] [Received: 09/22/2021] [Revised: 10/13/2021] [Accepted: 10/18/2021]
Abstract
The Human Disease Ontology (DO) (www.disease-ontology.org) database has significantly expanded its disease content and enhanced its userbase and website since the DO’s 2018 Nucleic Acids Research DATABASE issue paper. Conservatively, based on available resource statistics, terms from the DO have been annotated to over 1.5 million biomedical data elements and citations, a 10× increase in the past 5 years. The DO, funded as an NHGRI Genomic Resource, plays a key role in disease knowledge organization, representation, and standardization, serving as a reference framework for multiscale biomedical data integration and analysis across thousands of clinical, biomedical and computational research projects and genomic resources around the world. This update reports on the addition of 1,793 new disease terms, a 14% increase in textual definitions and the integration of 22,137 new SubClassOf axioms defining disease-to-disease connections representing the DO’s complex disease classification. The DO’s updated website provides multifaceted etiology searching, enhanced documentation and educational resources.
Affiliation(s)
- Lynn M Schriml
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- James B Munro
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Mike Schor
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Dustin Olley
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Carrie McCracken
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Victor Felix
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- J Allen Baron
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Susan M Bello
  - Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- Michelle Giglio
  - University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Carol Greene
  - University of Maryland School of Medicine, Baltimore, MD, USA
157. Gaudelet T, Day B, Jamasb AR, Soman J, Regep C, Liu G, Hayter JBR, Vickers R, Roberts C, Tang J, Roblin D, Blundell TL, Bronstein MM, Taylor-King JP. Utilizing graph machine learning within drug discovery and development. Brief Bioinform 2021; 22:bbab159. [PMID: 34013350 PMCID: PMC8574649 DOI: 10.1093/bib/bbab159] [Received: 12/09/2020] [Revised: 04/01/2021] [Accepted: 04/05/2021]
Abstract
Graph machine learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures and the functional relationships between them, and to integrate multi-omic datasets, amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarize work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest GML will become a modelling framework of choice within biomedical machine learning.
Affiliation(s)
- Ben Day
  - Relation Therapeutics, London, UK
  - The Computer Laboratory, University of Cambridge, UK
- Arian R Jamasb
  - Relation Therapeutics, London, UK
  - The Computer Laboratory, University of Cambridge, UK
  - Department of Biochemistry, University of Cambridge, UK
- Jian Tang
  - Mila, the Quebec AI Institute, Canada
  - HEC Montreal, Canada
- David Roblin
  - Relation Therapeutics, London, UK
  - Juvenescence, London, UK
  - The Francis Crick Institute, London, UK
- Michael M Bronstein
  - Relation Therapeutics, London, UK
  - Department of Computing, Imperial College London, UK
  - Twitter, UK
158. Yang JJ, Grissa D, Lambert CG, Bologa CG, Mathias SL, Waller A, Wild DJ, Jensen LJ, Oprea TI. TIGA: target illumination GWAS analytics. Bioinformatics 2021; 37:3865-3873. [PMID: 34086846 PMCID: PMC11025677 DOI: 10.1093/bioinformatics/btab427] [Received: 10/31/2020] [Revised: 05/12/2021] [Accepted: 06/03/2021]
Abstract
MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
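The abstract names meanRank scores as a way to aggregate multivariate evidence. A minimal sketch of one plausible reading follows: each evidence variable is ranked separately across the gene-trait associations (rank 1 = strongest), and the ranks are averaged per association. The variable names (`n_studies`, `n_snps`, `citation_score`), the toy values, and the tie handling are assumptions made for illustration; they are not taken from the TIGA pipeline.

```python
from statistics import mean


def rank_descending(values):
    """Return 1-based ranks (1 = largest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def mean_rank(evidence):
    """Average the per-variable ranks for each association.

    `evidence` maps a variable name to a list of values, one per association,
    where a higher value is assumed to mean stronger evidence.
    """
    per_variable = [rank_descending(vals) for vals in evidence.values()]
    return [mean(column) for column in zip(*per_variable)]


# Hypothetical evidence for three gene-trait associations.
evidence = {
    "n_studies": [5, 2, 2],
    "n_snps": [10, 30, 4],
    "citation_score": [1.5, 0.9, 2.0],
}
scores = mean_rank(evidence)  # lower mean rank = stronger aggregate evidence
```

Averaging ranks rather than raw values keeps variables on different scales from dominating one another, which is the usual motivation for rank-based aggregation.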
Affiliation(s)
- Jeremy J Yang
  - Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
  - Integrative Data Science Laboratory, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47408, USA
- Dhouha Grissa
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Christophe G Lambert
  - Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Cristian G Bologa
  - Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Stephen L Mathias
  - Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Anna Waller
  - Department of Pathology, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- David J Wild
  - Integrative Data Science Laboratory, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47408, USA
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
- Tudor I Oprea
  - Division of Translational Informatics, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
159. Wang W, Han R, Zhang M, Wang Y, Wang T, Wang Y, Shang X, Peng J. A network-based method for brain disease gene prediction by integrating brain connectome and molecular network. Brief Bioinform 2021; 23:6415315. [PMID: 34727570 DOI: 10.1093/bib/bbab459] [Received: 07/30/2021] [Revised: 09/18/2021] [Accepted: 10/07/2021]
Abstract
Brain disease gene identification is critical for revealing the biological mechanism and developing drugs for brain diseases. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted for narrowing down the search space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For the consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are utilized to predict brain disease genes based on a support vector machine-based model. We evaluate brainMI on four brain diseases: Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves scores of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes than three existing state-of-the-art methods.
Affiliation(s)
- Wei Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Ruijiang Han
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Menghan Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yuxian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
160. Ringwald M, Richardson JE, Baldarelli RM, Blake JA, Kadin JA, Smith C, Bult CJ. Mouse Genome Informatics (MGI): latest news from MGD and GXD. Mamm Genome 2021; 33:4-18. [PMID: 34698891 PMCID: PMC8913530 DOI: 10.1007/s00335-021-09921-0] [Received: 07/16/2021] [Accepted: 09/21/2021]
Abstract
The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI's mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models with human diseases. MGI maintains mouse anatomy and phenotype ontologies, contributes to the development of the Gene Ontology and Disease Ontology, and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI's two major knowledgebases. Here, we highlight some of the recent changes and enhancements to MGD and GXD that have been implemented in response to changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org.
|
161
|
Alachram H, Chereda H, Beißbarth T, Wingender E, Stegmaier P. Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks. PLoS One 2021; 16:e0258623. [PMID: 34653224 PMCID: PMC8519453 DOI: 10.1371/journal.pone.0258623] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/04/2020] [Accepted: 10/01/2021] [Indexed: 11/18/2022] Open
Abstract
Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of substituting synonymous terms with their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on large breast cancer gene expression data and on other cancer datasets. Performances of the resulting models were compared to Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed sufficiently well on the metastatic event prediction tasks compared to other networks. Such performance was good enough to validate the utility of our generated word embeddings in constructing biological networks. Word representations as produced by text mining algorithms like word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
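As an illustration of how cosine similarity between word vectors can be turned into a gene-gene network, the following minimal sketch thresholds pairwise similarities to obtain edges. It is not the authors' pipeline: the function names, the toy three-dimensional vectors, the gene labels, and the threshold of 0.8 are all hypothetical, and real word2vec embeddings would have hundreds of dimensions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_network(embeddings, threshold):
    """Return (term_a, term_b, similarity) edges for every pair at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Hypothetical toy embeddings; labels and values are made up for illustration.
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.9, 0.1, 0.0],
    "GENE_C": [0.0, 1.0, 0.0],
}

edges = build_network(toy_embeddings, threshold=0.8)
for a, b, sim in edges:
    print(f"{a} -- {b}: {sim:.3f}")
```

The threshold controls network density: a higher value keeps only strongly related term pairs, at the cost of dropping weaker but possibly meaningful relations.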
Affiliation(s)
- Halima Alachram
- Department of Medical Bioinformatics, University Medical Center, Göttingen, Lower Saxony, Germany
| | - Hryhorii Chereda
- Department of Medical Bioinformatics, University Medical Center, Göttingen, Lower Saxony, Germany
| | - Tim Beißbarth
- Department of Medical Bioinformatics, University Medical Center, Göttingen, Lower Saxony, Germany
|
162
|
Ontology-Based Reasoning for Educational Assistance in Noncommunicable Chronic Diseases. Computers 2021. [DOI: 10.3390/computers10100128] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Noncommunicable chronic diseases (NCDs) affect a large part of the population. Since the emergence of COVID-19, its most severe cases have affected people with NCDs, increasing the mortality rate. For this reason, it is necessary to develop personalized solutions to support healthcare considering the specific characteristics of individuals. This paper proposes an ontology to represent the knowledge of educational assistance in NCDs. The purpose of the ontology is to support educational practices and systems oriented towards preventing and monitoring these diseases. The ontology is implemented in Protégé 5.5.0 in Web Ontology Language (OWL) format, together with defined competency questions, SWRL rules, and SPARQL queries. The current version of the ontology includes 138 classes, 31 relations, 6 semantic rules, and 575 axioms. The ontology serves as an NCD knowledge base and supports automatic reasoning. Evaluations performed on a demo dataset demonstrated the effectiveness of the ontology. SWRL rules were used to define accurate axioms, improving the correct classification and inference of six instantiated individuals. As a scientific contribution, this study presents the first ontology for educational assistance in NCDs.
|
163
|
Baltoumas FA, Zafeiropoulou S, Karatzas E, Paragkamian S, Thanati F, Iliopoulos I, Eliopoulos AG, Schneider R, Jensen LJ, Pafilis E, Pavlopoulos GA. OnTheFly 2.0: a text-mining web application for automated biomedical entity recognition, document annotation, network and functional enrichment analysis. NAR Genom Bioinform 2021; 3:lqab090. [PMID: 34632381 PMCID: PMC8494211 DOI: 10.1093/nargab/lqab090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/14/2021] [Revised: 09/09/2021] [Accepted: 09/20/2021] [Indexed: 02/06/2023] Open
Abstract
Extracting and processing information from documents is of great importance, as many experimental results and findings are stored in local files. Therefore, extracting and analyzing biomedical terms from such files in an automated way is essential. In this article, we present OnTheFly2.0, a web application for extracting biomedical entities from individual files such as plain texts, office documents, PDF files or images. OnTheFly2.0 can generate informative summaries in popup windows containing knowledge related to the identified terms along with links to various databases. It uses the EXTRACT tagging service to perform named entity recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and gene ontology terms. Multiple files can be analyzed, and identified terms such as proteins or genes can be explored through functional enrichment analysis or be associated with diseases and PubMed entries. Finally, protein-protein and protein-chemical networks can be generated with the use of STRING and STITCH services. To demonstrate its capacity for knowledge discovery, we interrogated published meta-analyses of clinical biomarkers of severe COVID-19 and uncovered inflammatory and senescence pathways that impact disease pathogenesis. OnTheFly2.0 currently supports 197 species and is available at http://bib.fleming.gr:3838/OnTheFly/ and http://onthefly.pavlopouloslab.info.
Affiliation(s)
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Sofia Zafeiropoulou
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Savvas Paragkamian
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003 Heraklion, Crete, Greece
| | - Foteini Thanati
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
| | - Ioannis Iliopoulos
- Department of Basic Sciences, School of Medicine, University of Crete, Heraklion 71003, Crete, Greece
| | - Aristides G Eliopoulos
- Department of Biology, School of Medicine, National and Kapodistrian University of Athens, Athens, 70013, Greece
| | - Reinhard Schneider
- University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, L-4365, Luxembourg
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003 Heraklion, Crete, Greece
| | - Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
|
164
|
Reyes-Peña C, Tovar M, Bravo M, Motz R. An ontology network for Diabetes Mellitus in Mexico. J Biomed Semantics 2021; 12:19. [PMID: 34625104 PMCID: PMC8500829 DOI: 10.1186/s13326-021-00252-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Accepted: 09/14/2021] [Indexed: 12/04/2022] Open
Abstract
Background: Medical experts in the domain of Diabetes Mellitus (DM) acquire specific knowledge from diabetic patients through monitoring and interaction. This allows them to know the disease and information about other conditions or comorbidities, treatments, and typical consequences in the Mexican population. This indicates that an expert in a domain knows technical information about the domain and contextual factors that interact with it in the real world, contributing to new knowledge generation. For capturing and managing information about DM, it is necessary to design and implement techniques and methods that allow: determining the most relevant conceptual dimensions and their correct organization, the integration of existing medical and clinical information from different resources, and the generation of structures that represent the deduction process of the doctor. An Ontology Network is a collection of ontologies of diverse knowledge domains which can be interconnected by meta-relations. This article describes an Ontology Network for representing DM in Mexico, designed by a proposed methodology. The information used for building the Ontology Network includes ontological resource reuse and non-ontological resource transformation for ontology design, and ontology extension by natural language processing techniques. This is medical information extracted from vocabularies, taxonomies, medical dictionaries, and ontologies, among others. Additionally, a set of semantic rules has been defined within the Ontology Network to derive new knowledge. Results: An Ontology Network for DM in Mexico has been built from six well-defined domains, resulting in new classes, using ontological and non-ontological resources to offer a semantic structure for assisting in the medical diagnosis process. The network comprises 1367 classes, 20 object properties, 63 data properties, and 4268 individuals from seven different ontologies. Ontology Network evaluation was carried out by verifying the purpose of its design and some quality criteria. Conclusions: The composition of the Ontology Network offers a set of well-defined ontological modules, facilitating the reuse of one or more of them. The inclusion of international vocabularies such as SNOMED CT or ICD-10 reinforces the representation by international standards. It increases the semantic interoperability of the network, providing the opportunity to integrate other ontologies with the same vocabularies. The ontology network design methodology offers a guide for ontology developers on how to use ontological and non-ontological resources in order to exploit the maximum of information and knowledge from a set of domains, whether or not they share information.
Affiliation(s)
- Cecilia Reyes-Peña
- Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, Av. San Claudio, Puebla, Mexico
| | - Mireya Tovar
- Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, Av. San Claudio, Puebla, Mexico
| | - Maricela Bravo
- Universidad Autonoma Metropolitana, Av. San Pablo No. 180, Mexico City, Mexico
| | - Regina Motz
- Universidad de la Republica, Julio Herrera y Reissig 565, Montevideo, Uruguay
|
165
|
Rosário-Ferreira N, Guimarães V, Costa VS, Moreira IS. SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations. BMC Bioinformatics 2021; 22:482. [PMID: 34607568 PMCID: PMC8491382 DOI: 10.1186/s12859-021-04397-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2021] [Accepted: 09/24/2021] [Indexed: 12/24/2022] Open
Abstract
Background: Blood cancers (BCs) are responsible for over 720 K yearly deaths worldwide. Their prevalence and mortality rate underscore the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared qualitatively and quantitatively against the DisGeNET resource. Results: We obtained the DDAs via co-mention using SicknessMiner, or via gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results, and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool for extracting disease-disease relationships from a raw input corpus. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04397-w.
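The co-mention idea and the two overlap figures reported above (share of reference associations recovered, share of mined associations specific to the pipeline) can be sketched as follows. This is a simplified illustration under stated assumptions, not SicknessMiner itself: the function names, the toy documents, and the reference pairs are hypothetical, and disease mentions are assumed to be already recognized and normalized.

```python
from itertools import combinations


def co_mention_pairs(documents):
    """Collect unordered disease pairs that appear together in at least one document."""
    pairs = set()
    for diseases in documents:
        # Sort so that each pair has one canonical orientation.
        for a, b in combinations(sorted(set(diseases)), 2):
            pairs.add((a, b))
    return pairs


def overlap_stats(mined, reference):
    """Return (fraction of reference pairs recovered, fraction of mined pairs not in reference)."""
    shared = mined & reference
    recovered = len(shared) / len(reference) if reference else 0.0
    specific = len(mined - reference) / len(mined) if mined else 0.0
    return recovered, specific


# Hypothetical input: each document is the list of normalized disease names found in it.
docs = [
    ["leukemia", "anemia"],
    ["lymphoma", "leukemia", "anemia"],
    ["myeloma"],
]
# Hypothetical reference associations, stored in the same sorted orientation.
reference = {
    ("anemia", "leukemia"),
    ("leukemia", "lymphoma"),
    ("anemia", "myeloma"),
}

mined = co_mention_pairs(docs)
recovered, specific = overlap_stats(mined, reference)
print(f"recovered {recovered:.0%} of reference pairs; {specific:.0%} of mined pairs are pipeline-specific")
```

Both sets must use the same normalized identifiers and pair orientation, otherwise matching associations are counted as distinct.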
Affiliation(s)
- Nícia Rosário-Ferreira
- CQC - Coimbra Chemistry Center, Chemistry Department, Faculty of Science and Technology, University of Coimbra, 3004-535, Coimbra, Portugal.,CNC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
| | - Victor Guimarães
- Department of Sciences, University of Porto, Porto, Portugal.,INESC-TEC - Centre of Advanced Computing Systems, Porto, Portugal
| | - Vítor S Costa
- Department of Sciences, University of Porto, Porto, Portugal.,INESC-TEC - Centre of Advanced Computing Systems, Porto, Portugal
| | - Irina S Moreira
- Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456, Coimbra, Portugal.,CNC - Center for Neuroscience and Cell Biology, CIBB - Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
|
166
|
Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021; 11:biom11101430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022] Open
Abstract
In the assembly of biological networks it is important to provide reliable interactions in an effort to have the most accurate possible representation of real-life systems. Commonly, the data used to build a network comes from diverse high-throughput assays; however, most of the interaction data is available through scientific literature. This has become a challenge with the notable increase in scientific literature being published, as it is hard for human curators to track all recent discoveries without using efficient tools to help them identify these interactions in an automatic way. This can be overcome by using text mining approaches, which are capable of extracting knowledge from scientific documents. One of the most important tasks in text mining for biological network building is relation extraction, which identifies relations between the entities of interest. Many interaction databases already use text mining systems, and the development of these tools will lead to more reliable networks, as well as the possibility to personalize the networks by selecting the desired relations. This review will focus on different approaches of automatic information extraction from biomedical text that can be used to enhance existing networks or create new ones, such as state-of-the-art deep learning approaches, focusing on cancer as a case study.
|
167
|
Liu Z, Liu J, Liu X, Wang X, Xie Q, Zhang X, Kong X, He M, Yang Y, Deng X, Yang L, Qi Y, Li J, Liu Y, Yuan L, Diao L, He F, Li D. CTR-DB, an omnibus for patient-derived gene expression signatures correlated with cancer drug response. Nucleic Acids Res 2021; 50:D1184-D1199. [PMID: 34570230 PMCID: PMC8728209 DOI: 10.1093/nar/gkab860] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/15/2021] [Revised: 09/08/2021] [Accepted: 09/15/2021] [Indexed: 12/26/2022] Open
Abstract
To date, only some cancer patients can benefit from chemotherapy and targeted therapy. Drug resistance continues to be a major and challenging problem facing current cancer research. Rapidly accumulated patient-derived clinical transcriptomic data with cancer drug response bring opportunities for exploring molecular determinants of drug response, but meanwhile pose challenges for data management, integration, and reuse. Here we present the Cancer Treatment Response gene signature DataBase (CTR-DB, http://ctrdb.ncpsb.org.cn/), a unique database for basic and clinical researchers to access, integrate, and reuse clinical transcriptomes with cancer drug response. CTR-DB has collected and uniformly reprocessed 83 patient-derived pre-treatment transcriptomic source datasets with manually curated cancer drug response information, involving 28 histological cancer types, 123 drugs, and 5139 patient samples. These data are browsable, searchable, and downloadable. Moreover, CTR-DB supports single-dataset exploration (including differential gene expression, receiver operating characteristic curve, functional enrichment, sensitizing drug search, and tumor microenvironment analyses), and multiple-dataset combination and comparison, as well as biomarker validation function, which provide insights into the drug resistance mechanism, predictive biomarker discovery and validation, drug combination, and resistance mechanism heterogeneity.
Affiliation(s)
- Zhongyang Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.,College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
| | - Jiale Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Xinyue Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Xun Wang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Qiaosheng Xie
- Department of Radiation Oncology, China-Japan Friendship Hospital, Beijing 100029, China
| | - Xinlei Zhang
- Beijing Geneworks Technology Co., Ltd., Beijing 100101, China
| | - Xiangya Kong
- Beijing Geneworks Technology Co., Ltd., Beijing 100101, China
| | - Mengqi He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Yuting Yang
- Department of Immunology, Medical College of Qingdao University, Qingdao 266071, China
| | - Xinru Deng
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Lele Yang
- College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
| | - Yaning Qi
- College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
| | - Jiajun Li
- College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
| | - Yuan Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Liying Yuan
- College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
| | - Lihong Diao
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Dong Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.,College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China
|
168
|
Li R, Qu H, Wang S, Chater JM, Wang X, Cui Y, Yu L, Zhou R, Jia Q, Traband R, Wang M, Xie W, Yuan D, Zhu J, Zhong WD, Jia Z. CancerMIRNome: an interactive analysis and visualization database for miRNome profiles of human cancer. Nucleic Acids Res 2021; 50:D1139-D1146. [PMID: 34500460 PMCID: PMC8728249 DOI: 10.1093/nar/gkab784] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 08/01/2021] [Revised: 08/22/2021] [Accepted: 08/30/2021] [Indexed: 12/13/2022] Open
Abstract
MicroRNAs (miRNAs), which play critical roles in gene regulatory networks, have emerged as promising diagnostic and prognostic biomarkers for human cancer. In particular, circulating miRNAs that are secreted into circulation exist in remarkably stable forms, and have enormous potential to be leveraged as non-invasive biomarkers for early cancer detection. Novel and user-friendly tools are desperately needed to facilitate data mining of the vast amount of miRNA expression data from The Cancer Genome Atlas (TCGA) and large-scale circulating miRNA profiling studies. To fill this void, we developed CancerMIRNome, a comprehensive database for the interactive analysis and visualization of miRNA expression profiles based on 10 554 samples from 33 TCGA projects and 28 633 samples from 40 public circulating miRNome datasets. A series of cutting-edge bioinformatics tools and machine learning algorithms have been packaged in CancerMIRNome, allowing for the pan-cancer analysis of a miRNA of interest across multiple cancer types and the comprehensive analysis of miRNome profiles to identify dysregulated miRNAs and develop diagnostic or prognostic signatures. The data analysis and visualization modules will greatly facilitate the exploitation of these valuable resources and promote translational application of miRNA biomarkers in cancer. The CancerMIRNome database is publicly available at http://bioinfo.jialab-ucr.org/CancerMIRNome.
Affiliation(s)
- Ruidong Li
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.,Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, CA, USA
| | - Han Qu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
| | - Shibo Wang
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
| | - John M Chater
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
| | - Xuesong Wang
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.,Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, CA, USA
| | - Yanru Cui
- College of Agronomy, Hebei Agricultural University, Baoding, China
| | - Lei Yu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.,Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, CA, USA
| | - Rui Zhou
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
| | - Qiong Jia
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.,Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, CA, USA
| | - Ryan Traband
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
| | - Meiyue Wang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Weibo Xie
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Dongbo Yuan
- Department of Urology, Guizhou Provincial People's Hospital, Guizhou, China
| | - Jianguo Zhu
- Department of Urology, Guizhou Provincial People's Hospital, Guizhou, China
| | - Wei-De Zhong
- Department of Urology, Guangdong Key Laboratory of Clinical Molecular Medicine and Diagnostics, Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China.,Urology Key Laboratory of Guangdong Province, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou Medical University, Guangzhou, China.,Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, China
| | - Zhenyu Jia
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.,Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, CA, USA
|
169
|
Koşaloğlu-Yalçın Z, Blazeska N, Carter H, Nielsen M, Cohen E, Kufe D, Conejo-Garcia J, Robbins P, Schoenberger SP, Peters B, Sette A. The Cancer Epitope Database and Analysis Resource: A Blueprint for the Establishment of a New Bioinformatics Resource for Use by the Cancer Immunology Community. Front Immunol 2021; 12:735609. [PMID: 34504503 PMCID: PMC8421848 DOI: 10.3389/fimmu.2021.735609] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/02/2021] [Accepted: 08/09/2021] [Indexed: 12/17/2022] Open
Abstract
Recent years have witnessed a dramatic rise in interest in cancer epitopes in general and particularly neoepitopes, antigens that are encoded by somatic mutations that arise as a consequence of tumorigenesis. There is also an interest in the specific T cell and B cell receptors recognizing these epitopes, as they have therapeutic applications. They can also aid in basic studies to infer the specificity of T cells or B cells characterized in bulk and single-cell sequencing data. The resurgence of interest in T cell and B cell epitopes emphasizes the need to catalog all cancer epitope-related data linked to the biological, immunological, and clinical contexts and, most importantly, to make this information freely available to the scientific community in a user-friendly format. In parallel, there is also a need to develop resources for epitope prediction and analysis tools that provide researchers access to predictive strategies and provide objective evaluations of their performance. For example, such tools should enable researchers to identify epitopes that can be effectively used for immunotherapy or in defining biomarkers to predict the outcome of checkpoint blockade therapies. We present here a detailed vision, blueprint, and work plan for the development of a new resource, the Cancer Epitope Database and Analysis Resource (CEDAR). CEDAR will provide a freely accessible, comprehensive collection of cancer epitope and receptor data curated from the literature and provide easily accessible epitope and T cell/B cell target prediction and analysis tools. The curated cancer epitope data will provide a transparent benchmark dataset that can be used to assess how well prediction tools perform and to develop new prediction tools relevant to the cancer research community.
MESH Headings
- Antigens, Neoplasm/genetics
- Antigens, Neoplasm/immunology
- Computational Biology
- Databases, Genetic
- Epitopes, B-Lymphocyte
- Epitopes, T-Lymphocyte
- Humans
- Immunotherapy
- Mutation
- Neoplasms/genetics
- Neoplasms/immunology
- Neoplasms/therapy
- Receptors, Antigen, B-Cell/genetics
- Receptors, Antigen, B-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Tumor Microenvironment
Affiliation(s)
- Zeynep Koşaloğlu-Yalçın
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, United States
| | - Nina Blazeska
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, United States
| | - Hannah Carter
- Department of Medicine, University of California San Diego, La Jolla, CA, United States
- Moore’s Cancer Center, University of California San Diego, La Jolla, CA, United States
| | - Morten Nielsen
- Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, San Martín, Argentina
| | - Ezra Cohen
- Moore’s Cancer Center, University of California San Diego, La Jolla, CA, United States
| | - Donald Kufe
- Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, United States
| | - Jose Conejo-Garcia
- Department of Gynecologic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Paul Robbins
- National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Stephen P. Schoenberger
- Laboratory of Cellular Immunology, La Jolla Institute for Immunology, La Jolla, CA, United States
| | - Bjoern Peters
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, United States
- Department of Medicine, University of California San Diego, La Jolla, CA, United States
| | - Alessandro Sette
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, United States
- Department of Medicine, University of California San Diego, La Jolla, CA, United States
|
170
|
Kafkas Ş, Althubaiti S, Gkoutos GV, Hoehndorf R, Schofield PN. Linking common human diseases to their phenotypes; development of a resource for human phenomics. J Biomed Semantics 2021; 12:17. [PMID: 34425897 PMCID: PMC8383460 DOI: 10.1186/s13326-021-00249-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 07/30/2021] [Indexed: 11/11/2022] Open
Abstract
Background In recent years a large volume of clinical genomics data has become available due to rapid advances in sequencing technologies. Efficient exploitation of this genomics data requires linkage to patient phenotype profiles. Current resources providing disease-phenotype associations are not comprehensive, and they often do not have broad coverage of the disease terminologies, particularly ICD-10, which is still the primary terminology used in clinical settings. Methods We developed two approaches to gather disease-phenotype associations. First, we used a text mining method that utilizes semantic relations in phenotype ontologies, and applies statistical methods to extract associations between diseases in ICD-10 and phenotype ontology classes from the literature. Second, we developed a semi-automatic way to collect ICD-10–phenotype associations from existing resources containing known relationships. Results We generated four datasets. Two of them are independent datasets linking diseases to their phenotypes based on text mining and semi-automatic strategies. The remaining two datasets are generated from these datasets and cover a subset of ICD-10 classes of common diseases contained in UK Biobank. We extensively validated our text mined and semi-automatically curated datasets by: comparing them against an expert-curated validation dataset containing disease–phenotype associations, measuring their similarity to disease–phenotype associations found in public databases, and assessing how well they could be used to recover gene–disease associations using phenotype similarity. Conclusion We find that our text mining method can produce phenotype annotations of diseases that are correct but often too general to have significant information content, or too specific to accurately reflect the typical manifestations of the sporadic disease. On the other hand, the datasets generated from integrating multiple knowledgebases are more complete (i.e., cover more of the required phenotype annotations for a given disease). We make all data freely available at 10.5281/zenodo.4726713. Supplementary Information The online version contains supplementary material available at (10.1186/s13326-021-00249-x).
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
| | - Sara Althubaiti
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
| | - Georgios V Gkoutos
- Health Data Research UK, Midlands site, Edgbaston, Birmingham, B15 2TT, United Kingdom.,Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia.
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
|
171
|
Baltoumas FA, Zafeiropoulou S, Karatzas E, Koutrouli M, Thanati F, Voutsadaki K, Gkonta M, Hotova J, Kasionis I, Hatzis P, Pavlopoulos GA. Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review. Biomolecules 2021; 11:1245. [PMID: 34439912 PMCID: PMC8391349 DOI: 10.3390/biom11081245] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/28/2021] [Revised: 08/16/2021] [Accepted: 08/18/2021] [Indexed: 02/06/2023] Open
Abstract
Technological advances in high-throughput techniques have resulted in tremendous growth of complex biological datasets providing evidence regarding various biomolecular interactions. To cope with this data flood, computational approaches, web services, and databases have been implemented to deal with issues such as data integration, visualization, exploration, organization, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more and more difficult for an end user to know what the scope and focus of each repository is and how redundant the information between them is. Several repositories have a more general scope, while others focus on specialized aspects, such as specific organisms or biological systems. Unfortunately, many of these databases are self-contained or poorly documented and maintained. For a clearer view, in this article we provide a comprehensive categorization, comparison and evaluation of such repositories for different bioentity interaction types. We discuss most of the publicly available services based on their content, sources of information, data representation methods, user-friendliness, scope and interconnectivity, and we comment on their strengths and weaknesses. We aim for this review to reach a broad readership varying from biomedical beginners to experts and serve as a reference article in the field of Network Biology.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Sofia Zafeiropoulou
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Mikaela Koutrouli
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Foteini Thanati
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Kleanthi Voutsadaki
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Maria Gkonta
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Joana Hotova
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Ioannis Kasionis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
| | - Pantelis Hatzis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; (S.Z.); (E.K.); (M.K.); (F.T.); (K.V.); (M.G.); (J.H.); (I.K.); (P.H.)
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
|
172
|
Turewicz M, Frericks-Zipper A, Stepath M, Schork K, Ramesh S, Marcus K, Eisenacher M. BIONDA: a free database for a fast information on published biomarkers. Bioinformatics Advances 2021; 1:vbab015. [PMID: 36700097 PMCID: PMC9710600 DOI: 10.1093/bioadv/vbab015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/26/2021] [Revised: 07/11/2021] [Indexed: 01/28/2023]
Abstract
Summary Because of the steadily increasing and already manually unmanageable total number of biomarker-related articles in biomedical research, there is a need for intelligent systems that extract all relevant information from biomedical texts and provide it as structured information to researchers in a user-friendly way. To address this, BIONDA was implemented as a free text mining-based online database for molecular biomarkers including genes, proteins and miRNAs and for all kinds of diseases. The contained structured information on published biomarkers is extracted automatically from Europe PMC publication abstracts and high-quality sources like UniProt and Disease Ontology. This allows frequent content updates. Availability and implementation BIONDA is freely accessible via a user-friendly web application at http://bionda.mpc.ruhr-uni-bochum.de. The current BIONDA code is available at GitHub via https://github.com/mpc-bioinformatics/bionda. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Michael Turewicz
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Anika Frericks-Zipper
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Markus Stepath
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Karin Schork
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Spoorti Ramesh
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Katrin Marcus
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
| | - Martin Eisenacher
- Medizinisches Proteom-Center, Ruhr University Bochum, Bochum 44801, Germany.,Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany
|
173
|
Pan Y, Lei X, Zhang Y. Association predictions of genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, radiomics, drug, symptoms, environment factor, and disease networks: A comprehensive approach. Med Res Rev 2021; 42:441-461. [PMID: 34346083 DOI: 10.1002/med.21847] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/26/2020] [Revised: 05/22/2021] [Accepted: 07/07/2021] [Indexed: 12/12/2022]
Abstract
Currently, multi-omics research, spanning genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, and radiomics, is a hot spot. The relationship between multi-omics data, drugs, and diseases has received extensive attention from researchers. At the same time, multi-omics can effectively predict the diagnosis, prognosis, and treatment of diseases. In essence, these research entities, such as genes, RNAs, proteins, microbes, metabolites, pathways as well as pathological and medical imaging data, can all be represented by networks at different levels. Some computer science and biology scholars have tried to use computational methods to explore the potential relationships between biological entities. We summarize a comprehensive research strategy: build a multi-omics heterogeneous network covering multimodal data, and use currently popular computational methods to make predictions. In this study, we first introduce methods for calculating the similarity of biological entities at the data level, and then discuss multimodal data fusion and methods of feature extraction. Finally, the challenges and opportunities at this stage are summarized. Some scholars have used such a framework to calculate and predict. We also summarize them and discuss the challenges. We hope that our review can help scholars who are interested in the fields of bioinformatics, biomedical imaging, and computer research.
Affiliation(s)
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
|
174
|
Zhang J, Liu L, Xu T, Zhang W, Li J, Rao N, Le TD. Time to infer miRNA sponge modules. Wiley Interdisciplinary Reviews-RNA 2021; 13:e1686. [PMID: 34342388 DOI: 10.1002/wrna.1686] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/06/2021] [Revised: 07/14/2021] [Accepted: 07/14/2021] [Indexed: 01/01/2023]
Abstract
Inferring competing endogenous RNA (ceRNA) or microRNA (miRNA) sponge modules is a challenging and meaningful task for revealing ceRNA regulation mechanisms at the module level. Modules in this context refer to groups of miRNA sponges which compete with one another and act as functional units for achieving biological processes. The recent development of computational methods based on heterogeneous data provides a novel way to discern the competitive effects of miRNA sponges on human complex diseases. This article aims to provide a comprehensive perspective of miRNA sponge module discovery methods. We first review the publicly available databases of cancer-related miRNA sponges, as the miRNA sponges involved in human cancers contribute to the discovery of cancer-associated modules. Then we review the existing computational methods for inferring miRNA sponge modules. Furthermore, we conduct an assessment on the performance of the module discovery methods with the pan-cancer dataset, and the comparison study indicates that it is useful to infer biologically meaningful miRNA sponge modules by directly mapping heterogeneous data to the competitive modules. Finally, we discuss the future directions and associated challenges in developing in silico methods to infer miRNA sponge modules. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Small Molecule-RNA Interactions; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Junpeng Zhang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.,School of Engineering, Dali University, Dali, Yunnan, China
| | - Lin Liu
- UniSA STEM, University of South Australia, Mawson Lakes, South Australia, Australia
| | - Taosheng Xu
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China
| | - Wu Zhang
- School of Agriculture and Biological Sciences, Dali University, Dali, Yunnan, China
| | - Jiuyong Li
- UniSA STEM, University of South Australia, Mawson Lakes, South Australia, Australia
| | - Nini Rao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| | - Thuc Duy Le
- UniSA STEM, University of South Australia, Mawson Lakes, South Australia, Australia
|
175
|
Kurbatova N, Swiers R. Disease ontologies for knowledge graphs. BMC Bioinformatics 2021; 22:377. [PMID: 34289807 PMCID: PMC8296689 DOI: 10.1186/s12859-021-04173-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2021] [Accepted: 05/05/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Data integration to build a biomedical knowledge graph is a challenging task. There are multiple disease ontologies used in data sources and publications, each having its hierarchy. A common task is to map between ontologies, find disease clusters and finally build a representation of the chosen disease area. There is a shortage of published resources and tools to facilitate interactive, efficient and flexible cross-referencing and analysis of multiple disease ontologies commonly found in data sources and research. RESULTS Our results are represented as a knowledge graph solution that uses disease ontology cross-references and facilitates switching between ontology hierarchies for data integration and other tasks. CONCLUSIONS Grakn core with pre-installed "Disease ontologies for knowledge graphs" facilitates the biomedical knowledge graph build and provides an elegant solution for the multiple disease ontologies problem.
Affiliation(s)
- Natalja Kurbatova
- Data Infrastructure & Tools, Data Science & Artificial Intelligence, R&D, AstraZeneca, Cambridge, UK.
| | - Rowan Swiers
- Quantitative Biology, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
|
176
|
Poon CL, Chen CY. Exploring the Impact of Cerebrovascular Disease and Major Depression on Non-diseased Human Tissue Transcriptomes. Front Genet 2021; 12:696836. [PMID: 34349785 PMCID: PMC8327210 DOI: 10.3389/fgene.2021.696836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
Background Complex diseases develop through the combination of multiple factors and the complicated interactions between them. Inflammation has recently been associated with many complex diseases and may cause long-term damage to the human body. In this study, we examined whether two types of complex disease, cerebrovascular disease (CVD) or major depression (MD), systematically altered the transcriptomes of non-diseased human tissues and whether inflammation is linked to identifiable molecular signatures, using post-mortem samples from the Genotype-Tissue Expression (GTEx) project. Results Following a series of differential expression analyses, dozens to hundreds of differentially expressed genes (DEGs) were identified in multiple tissues between subjects with and without a history of CVD or MD. DEGs from these disease-associated tissues (the visceral adipose, tibial artery, caudate, and spinal cord for CVD; and the hypothalamus, putamen, and spinal cord for MD) were further analyzed for functional enrichment. Many pathways associated with immunological events were enriched in the upregulated DEGs of the CVD-associated tissues, as were the neurological and metabolic pathways in DEGs of the MD-associated tissues. Eight gene-tissue pairs were found to overlap with those prioritized by our transcriptome-wide association studies, indicating a potential genetic effect on gene expression for circulating cytokine phenotypes. Conclusion Cerebrovascular disease and major depression cause detectable changes in the gene expression of non-diseased tissues, suggesting that a possible long-term impact of diseases, lifestyles and environmental factors may together contribute to the appearance of "transcriptomic scars" on the human body. Furthermore, inflammation is probably one of the systemic and long-lasting effects of cerebrovascular events.
Affiliation(s)
- Chi-Lam Poon
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan.,Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, United States
| | - Cho-Yi Chen
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan.,Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
|
177
|
Notaro M, Frasca M, Petrini A, Gliozzo J, Casiraghi E, Robinson PN, Valentini G. HEMDAG: a family of modular and scalable hierarchical ensemble methods to improve Gene Ontology term prediction. Bioinformatics 2021; 37:4526-4533. [PMID: 34240108 DOI: 10.1093/bioinformatics/btab485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 06/15/2021] [Accepted: 07/04/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). "Hierarchy-unaware" classifiers, also known as "flat" methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while "hierarchy-aware" approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO. RESULTS To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide "TPR-safe" predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges. AVAILABILITY Fully-tested R code freely available at https://anaconda.org/bioconda/r-hemdag. Tutorial and documentation at https://hemdag.readthedocs.io. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marco Notaro
- AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
| | - Marco Frasca
- AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
| | - Alessandro Petrini
- AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
| | - Jessica Gliozzo
- AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
| | - Elena Casiraghi
- AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, US
| | - Giorgio Valentini
- AnacletoLab-Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy.,CINI, National Laboratory in Artificial Intelligence and Intelligent Systems-AIIS, Roma, Italy.,Data Science Research Center, Università degli Studi di Milano, Milano, 20133, Italy
|
178
|
Korbolina EE, Bryzgalov LO, Ustrokhanova DZ, Postovalov SN, Poverin DV, Damarov IS, Merkulova TI. A Panel of rSNPs Demonstrating Allelic Asymmetry in Both ChIP-seq and RNA-seq Data and the Search for Their Phenotypic Outcomes through Analysis of DEGs. Int J Mol Sci 2021; 22:ijms22147240. [PMID: 34298860 PMCID: PMC8303726 DOI: 10.3390/ijms22147240] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/31/2021] [Revised: 06/24/2021] [Accepted: 06/30/2021] [Indexed: 12/12/2022] Open
Abstract
Currently, the detection of the allele asymmetry of gene expression from RNA-seq data or the transcription factor binding from ChIP-seq data is one of the approaches used to identify the functional genetic variants that can affect gene expression (regulatory SNPs or rSNPs). In this study, we searched for rSNPs using the data for human pulmonary arterial endothelial cells (PAECs) available from the Sequence Read Archive (SRA). Allele-asymmetric binding and expression events are analyzed in paired ChIP-seq data for H3K4me3 mark and RNA-seq data obtained for 19 individuals. Two statistical approaches, weighted z-scores and predicted probabilities, were used to improve the efficiency of finding rSNPs. In total, we identified 14,266 rSNPs associated with both allele-specific binding and expression. Among them, 645 rSNPs were associated with GWAS phenotypes; 4746 rSNPs were reported as eQTLs by GTEx, and 11,536 rSNPs were located in 374 candidate transcription factor binding motifs. Additionally, we searched for the rSNPs associated with gene expression using an SRA RNA-seq dataset for 281 clinically annotated human postmortem brain samples and detected eQTLs for 2505 rSNPs. Based on these results, we conducted Gene Ontology (GO), Disease Ontology (DO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses and constructed the protein-protein interaction networks to represent the top-ranked biological processes with a possible contribution to the phenotypic outcome.
Affiliation(s)
- Elena E. Korbolina
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, 630090 Novosibirsk, Russia; (L.O.B.); (I.S.D.); (T.I.M.)
| | - Leonid O. Bryzgalov
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, 630090 Novosibirsk, Russia; (L.O.B.); (I.S.D.); (T.I.M.)
- VECTOR-BEST, PO BOX 492, 630117 Novosibirsk, Russia
| | - Diana Z. Ustrokhanova
- Department of Information Biology, The Novosibirsk State University, 1 Pirogova St, 630090 Novosibirsk, Russia
| | - Sergey N. Postovalov
- Department of Theoretical and Applied Informatics, The Novosibirsk State Technical University, 630073 Novosibirsk, Russia; (S.N.P.); (D.V.P.)
| | - Dmitry V. Poverin
- Department of Theoretical and Applied Informatics, The Novosibirsk State Technical University, 630073 Novosibirsk, Russia; (S.N.P.); (D.V.P.)
| | - Igor S. Damarov
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, 630090 Novosibirsk, Russia; (L.O.B.); (I.S.D.); (T.I.M.)
| | - Tatiana I. Merkulova
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Science, 10 Lavrentyeva Prospekt, 630090 Novosibirsk, Russia; (L.O.B.); (I.S.D.); (T.I.M.)
- Department of Information Biology, The Novosibirsk State University, 1 Pirogova St, 630090 Novosibirsk, Russia
|
179
|
Figueiredo RQ, Raschka T, Kodamullil AT, Hofmann-Apitius M, Mubeen S, Domingo-Fernández D. Towards a global investigation of transcriptomic signatures through co-expression networks and pathway knowledge for the identification of disease mechanisms. Nucleic Acids Res 2021; 49:7939-7953. [PMID: 34197603 PMCID: PMC8373148 DOI: 10.1093/nar/gkab556] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/18/2021] [Revised: 05/17/2021] [Accepted: 06/11/2021] [Indexed: 12/17/2022] Open
Abstract
We attempt to address a key question in the joint analysis of transcriptomic data: can we correlate the patterns we observe in transcriptomic datasets to known interactions and pathway knowledge to broaden our understanding of disease pathophysiology? We present a systematic approach that sheds light on the patterns observed in hundreds of transcriptomic datasets from over sixty indications by using pathways and molecular interactions as a template. Our analysis employs transcriptomic datasets to construct dozens of disease specific co-expression networks, alongside a human protein-protein interactome network. Leveraging the interoperability between these two network templates, we explore patterns both common and particular to these diseases on three different levels. Firstly, at the node-level, we identify most and least common proteins across diseases and evaluate their consistency against the interactome as a proxy for their prevalence in the scientific literature. Secondly, we overlay both network templates to analyze common correlations and interactions across diseases at the edge-level. Thirdly, we explore the similarity between patterns observed at the disease-level and pathway knowledge to identify signatures associated with specific diseases and indication areas. Finally, we present a case scenario in schizophrenia, where we show how our approach can be used to investigate disease pathophysiology.
Affiliation(s)
- Rebeca Queiroz Figueiredo
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
| | - Tamara Raschka
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany.,Fraunhofer Center for Machine Learning, Germany
| | - Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany.,Causality Biomodels, Kinfra Hi-Tech Park, Kalamassery, Cochin, Kerala, India
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
| | - Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany.,Fraunhofer Center for Machine Learning, Germany
| | - Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany.,Fraunhofer Center for Machine Learning, Germany.,Enveda Biosciences, Boulder, CO 80301, USA
|
180
|
Yan VKC, Li X, Ye X, Ou M, Luo R, Zhang Q, Tang B, Cowling BJ, Hung I, Siu CW, Wong ICK, Cheng RCK, Chan EW. Drug Repurposing for the Treatment of COVID-19: A Knowledge Graph Approach. Advanced Therapeutics 2021; 4:2100055. [PMID: 34179346 PMCID: PMC8212091 DOI: 10.1002/adtp.202100055] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/26/2021] [Revised: 04/20/2021] [Indexed: 12/19/2022]
Abstract
Identifying effective drug treatments for COVID-19 is essential to reduce morbidity and mortality. Although a number of existing drugs have been proposed as potential COVID-19 treatments, effective data platforms and algorithms for prioritizing drug candidates for evaluation, and the application of knowledge graphs to drug repurposing, have not been adequately explored. A COVID-19 knowledge graph is developed by integrating 14 public bioinformatic databases containing information on drugs, genes, proteins, viruses, diseases, symptoms and their linkages. An algorithm is developed to extract hidden linkages connecting drugs and COVID-19 from the knowledge graph, and to generate and rank proposed drug candidates for repurposing as treatments for COVID-19 by integrating three scores for each drug: motif scores, knowledge graph PageRank scores, and knowledge graph embedding scores. The knowledge graph contains over 48 000 nodes and 13 37 000 edges, including 13 563 molecules in the DrugBank database. From the 5624 molecules identified by the motif-discovery algorithms, ranking results show that 112 drug molecules had the top 2% scores, of which 50 are existing drugs with other indications approved by health administrations. The proposed drug candidates serve to generate hypotheses for future evaluation in clinical trials and observational studies.
Affiliation(s)
- Vincent K. C. Yan
- Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, University of Hong Kong, Hong Kong Special Administrative Region, 1/F, Jockey Club Building for Interdisciplinary Research, 5 Sassoon Road, Pokfulam, Hong Kong SAR, China
| | - Xiaodong Li
- Department of Computer Science, Faculty of Engineering, University of Hong Kong, Hong Kong Special Administrative Region, CB303, Chow Yei Ching Building, Pokfulam, Hong Kong SAR, China
| | - Xuxiao Ye
- Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, University of Hong Kong, Hong Kong Special Administrative Region, 1/F, Jockey Club Building for Interdisciplinary Research, 5 Sassoon Road, Pokfulam, Hong Kong SAR, China
| | - Min Ou
- Department of Computer Science, Faculty of Engineering, University of Hong Kong, Hong Kong Special Administrative Region, CB303, Chow Yei Ching Building, Pokfulam, Hong Kong SAR, China
| | - Ruibang Luo
- Department of Computer Science, Faculty of Engineering, University of Hong Kong, Hong Kong Special Administrative Region, CB303, Chow Yei Ching Building, Pokfulam, Hong Kong SAR, China
| | - Qingpeng Zhang
- School of Data Science, City University of Hong Kong, Hong Kong Special Administrative Region, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR, China
| | - Bo Tang
- Department of Computer Science and Engineering, Southern University of Science and Technology, 1088 Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong 518055, China
| | - Benjamin J. Cowling
- Division of Epidemiology and Biostatistics, School of Public Health, University of Hong Kong, Hong Kong Special Administrative Region, 21 Sassoon Road, Pokfulam, Hong Kong SAR, China
| | - Ivan Hung
- Division of Infectious Diseases, Department of Medicine, LKS Faculty of Medicine, University of Hong Kong, Hong Kong Special Administrative Region, 102 Pokfulam Road, Hong Kong SAR, China
| | - Chung Wah Siu
- Division of Cardiology, Department of Medicine, University of Hong Kong, Hong Kong Special Administrative Region, 102 Pokfulam Road, Hong Kong SAR, China
| | - Ian C. K. Wong
- Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, University of Hong Kong, Hong Kong Special Administrative Region, 1/F, Jockey Club Building for Interdisciplinary Research, 5 Sassoon Road, Pokfulam, Hong Kong SAR, China
- Laboratory of Data Discovery for Health, 12/F, Building 19W, 19 Science Park West Avenue, Hong Kong SAR, China
- Department of Pharmacy, The University of Hong Kong‐Shenzhen Hospital, 1 Haiyuanyi Road, Futian District, Shenzhen 518009, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, 5/F, Key Laboratory Platform Building, Shenzhen Virtual University Park No.6, Nanshan, Shenzhen 518057, China
- UCL School of Pharmacy, 29–39 Brunswick Square, London, UK
| | - Reynold C. K. Cheng
- Department of Computer Science, Faculty of Engineering, University of Hong Kong, Hong Kong Special Administrative Region, CB303, Chow Yei Ching Building, Pokfulam, Hong Kong SAR, China
| | - Esther W. Chan
- Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, University of Hong Kong, Hong Kong Special Administrative Region, 1/F, Jockey Club Building for Interdisciplinary Research, 5 Sassoon Road, Pokfulam, Hong Kong SAR, China
- Laboratory of Data Discovery for Health, 12/F, Building 19W, 19 Science Park West Avenue, Hong Kong SAR, China
- Department of Pharmacy, The University of Hong Kong‐Shenzhen Hospital, 1 Haiyuanyi Road, Futian District, Shenzhen 518009, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, 5/F, Key Laboratory Platform Building, Shenzhen Virtual University Park No.6, Nanshan, Shenzhen 518057, China
|
181
|
Lopes de Souza P, Lopes de Souza W, Ferreira Pires L. ScrumOntoBDD: Agile software development based on scrum, ontologies and behaviour-driven development. JOURNAL OF THE BRAZILIAN COMPUTER SOCIETY 2021. [DOI: 10.1186/s13173-021-00114-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
When developing a Learning Management System (LMS) using Scrum, we noticed that it was quite often necessary to redefine some system behaviour scenarios, due to ambiguities in the requirement specifications, or due to misinterpretations of stories reported by the Product Owners (POs). The definition of test suites was also cumbersome, resulting in test suites that were incomplete or did not at all comply with the system requirements. Based on this experience and to deal with these problems, in this paper, we propose the ScrumOntoBDD approach to agile software development, which combines Scrum, ontologies and Behaviour-Driven Development (BDD). This approach is centred on the concepts and techniques of Scrum and BDD and focuses on the planning and analysis phases of the software life cycle, since the BDD tools currently provide little support to these phases, while most of the problems during the LMS development were found exactly there. We claim that our approach improves the software development practices in this respect. Furthermore, ScrumOntoBDD employs ontologies in order to reduce ambiguities intrinsic to the use of a natural language as a BDD ubiquitous language. In this paper, we illustrate and systematically evaluate our approach, showing that it is beneficial since it improves the communication between members of an agile development team.
|
182
|
Vieira LM, Jorge NAN, de Sousa JB, Setubal JC, Stadler PF, Walter MEMT. Competing Endogenous RNA in Colorectal Cancer: An Analysis for Colon, Rectum, and Rectosigmoid Junction. Front Oncol 2021; 11:681579. [PMID: 34178670 PMCID: PMC8222815 DOI: 10.3389/fonc.2021.681579] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/16/2021] [Accepted: 04/22/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Colorectal cancer (CRC) is a heterogeneous cancer. Its treatment depends on its anatomical site and distinguishes between colon, rectum, and rectosigmoid junction cancer. This study aimed to identify diagnostic and prognostic biomarkers using networks of CRC-associated transcripts that can be built based on competing endogenous RNAs (ceRNA). METHODS RNA expression and clinical information data of patients with colon, rectum, and rectosigmoid junction cancer were obtained from The Cancer Genome Atlas (TCGA). The RNA expression profiles were assessed through bioinformatics analysis, and a ceRNA network was constructed for each CRC site. A functional enrichment analysis was performed to assess the functional roles of the ceRNA networks in the prognosis of colon, rectum, and rectosigmoid junction cancer. Finally, to verify the impact of the ceRNA networks on prognosis, an overall survival analysis was performed. RESULTS The study identified various CRC site-specific prognostic biomarkers: hsa-miR-1271-5p, NRG1, hsa-miR-130a-3p, SNHG16, and hsa-miR-495-3p in the colon; E2F8 in the rectum; and DMD and hsa-miR-130b-3p in the rectosigmoid junction. We also identified different biological pathways that highlight differences in CRC behavior at different anatomical sites, thus reinforcing the importance of correctly identifying the tumor site. CONCLUSIONS Several potential prognostic markers for colon, rectum, and rectosigmoid junction cancer were found. CeRNA networks could provide a better understanding of the differences between, and common factors in, the prognosis of colon, rectum, and rectosigmoid junction cancer.
Affiliation(s)
- Lucas Maciel Vieira
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, University of Brasília, Brasília, Brazil
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig, Germany
| | | | - João Batista de Sousa
- Division of Coloproctology, Department of Surgery, University of Brasília School of Medicine, Brasília, Brazil
| | - João Carlos Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, Brazil
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM, United States
|
183
|
Zuo ZL, Cao RF, Wei PJ, Xia JF, Zheng CH. Double matrix completion for circRNA-disease association prediction. BMC Bioinformatics 2021; 22:307. [PMID: 34103016 PMCID: PMC8185931 DOI: 10.1186/s12859-021-04231-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/10/2021] [Accepted: 05/28/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Circular RNAs (circRNAs) are a class of single-stranded RNA molecules with a closed-loop structure. A growing body of research has shown that circRNAs are closely related to the development of diseases. Because biological experiments to verify circRNA-disease associations are time-consuming and resource-intensive, a reliable computational method is needed to predict candidate circRNA-disease associations and so make those experiments more efficient. RESULTS In this paper, we propose a double matrix completion method (DMCCDA) for predicting potential circRNA-disease associations. First, we constructed a similarity matrix of circRNA and disease according to circRNA sequence information and semantic disease information. We also built a Gaussian interaction profile similarity matrix for circRNA and disease based on experimentally verified circRNA-disease associations. Then, the corresponding circRNA sequence similarity and semantic similarity of disease are used to update the association matrix from the perspective of circRNA and disease, respectively, by matrix multiplication. Finally, from the perspective of circRNA and disease, matrix completion is used to update the matrix block, which is formed by splicing the association matrix obtained in the previous step with the corresponding Gaussian similarity matrix. Compared with other approaches, the DMCCDA model achieves relatively good results in leave-one-out cross-validation and five-fold cross-validation. Additionally, the results of the case studies illustrate the effectiveness of the DMCCDA model. CONCLUSION The results show that our method works well for recommending potential circRNAs associated with a disease for biological experiments.
Affiliation(s)
- Zong-Lan Zuo
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
| | - Rui-Fen Cao
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
- Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian, China
| | - Pi-Jing Wei
- Institute of Physical Science and Information Technology, Anhui University, Hefei, China
| | - Jun-Feng Xia
- Institute of Physical Science and Information Technology, Anhui University, Hefei, China
| | - Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China.
|
184
|
Lou P, Dong Y, Jimeno Yepes A, Li C. A representation model for biological entities by fusing structured axioms with unstructured texts. Bioinformatics 2021; 37:1156-1163. [PMID: 33107905 DOI: 10.1093/bioinformatics/btaa913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 09/04/2020] [Accepted: 10/13/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Structured semantic resources, for example, biological knowledge bases and ontologies, formally define biological concepts, entities and their semantic relationships, manifested as structured axioms and unstructured texts (e.g. textual definitions). The resources contain accurate expressions of biological reality and have been used by machine-learning models to assist intelligent applications like knowledge discovery. The current methods use both the axioms and definitions as plain texts in representation learning (RL). However, since the axioms are machine-readable while the natural language is human-understandable, differences in the meaning of tokens and in structure prevent the representations from encoding the desired biological knowledge. RESULTS We propose ERBK, an RL model of bio-entities. Instead of using the axioms and definitions as a textual corpus, our method uses a knowledge graph embedding method and deep convolutional neural models to encode the axioms and definitions, respectively. The representations could not only encode more underlying biological knowledge but also be further applied to zero-shot settings where existing approaches fall short. Experimental evaluations show that ERBK outperforms the existing methods for predicting protein-protein interactions and gene-disease associations. Moreover, ERBK still maintains promising performance in zero-shot settings. We believe the representations and the method have certain generality and could extend to other types of bio-relation. AVAILABILITY AND IMPLEMENTATION The source code is available at the gitlab repository https://gitlab.com/BioAI/erbk. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peiliang Lou
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Key Laboratory of Intelligent Networks and Network Security (Xi'an Jiaotong University), Ministry of Education, Xi'an, Shaanxi 710049, China
| | - YuXin Dong
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
| | | | - Chen Li
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; National Engineering Lab for Big Data Analytics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
|
185
|
Mohammed Y, Michaud SA, Pětrošová H, Yang J, Ganguly M, Schibli D, Flenniken AM, Nutter LMJ, Adissu HA, Lloyd KCK, McKerlie C, Borchers CH. Proteotyping of knockout mouse strains reveals sex- and strain-specific signatures in blood plasma. NPJ Syst Biol Appl 2021; 7:25. [PMID: 34050187 PMCID: PMC8163790 DOI: 10.1038/s41540-021-00184-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2020] [Accepted: 04/25/2021] [Indexed: 11/24/2022] Open
Abstract
We proteotyped blood plasma from 30 mouse knockout strains and corresponding wild-type mice from the International Mouse Phenotyping Consortium. We used targeted proteomics with internal standards to quantify 375 proteins in 218 samples. Our results provide insights into the manifested effects of each gene knockout at the plasma proteome level. We first investigated possible contamination by erythrocytes during sample preparation and labeled, in one case, up to 11 differential proteins as erythrocyte-originated. Second, we showed that differences in baseline protein abundance between female and male mice were evident in all mice, emphasizing the necessity to include both sexes in basic research, target discovery, and preclinical effect and safety studies. Next, we identified the protein signature of each gene knockout and performed functional analyses for all knockout strains. Further, to demonstrate how proteome analysis identifies the effect of gene deficiency beyond traditional phenotyping tests, we provide an in-depth analysis of two strains, C8a-/- and Npc2+/-. The proteins encoded by these genes are well characterized, providing good validation of our method in homozygous and heterozygous knockout mice. Ig alpha chain C region, a poorly characterized protein, was among the differentiating proteins in C8a-/-. In Npc2+/- mice, where histopathology and traditional tests failed to differentiate heterozygous from wild-type mice, our data showed significant differences in various lysosomal storage disease-related proteins. Our results demonstrate how to combine absolute quantitative proteomics with mouse gene knockout strategies to systematically study the effect of protein absence. The approach used here for blood plasma is applicable to all tissue protein extracts.
Affiliation(s)
- Yassene Mohammed
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada.
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands.
| | - Sarah A Michaud
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada.
| | - Helena Pětrošová
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada
| | - Juncong Yang
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada
| | - Milan Ganguly
- The Center for Phenogenomics, Toronto, ON, Canada
- The Hospital for Sick Children, Toronto, ON, Canada
| | - David Schibli
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada
| | - Ann M Flenniken
- The Center for Phenogenomics, Toronto, ON, Canada
- Sinai Health Lunenfeld-Tanenbaum Research Institute, Toronto, ON, Canada
| | - Lauryl M J Nutter
- The Center for Phenogenomics, Toronto, ON, Canada
- The Hospital for Sick Children, Toronto, ON, Canada
| | | | - K C Kent Lloyd
- Department of Surgery, School of Medicine, and Mouse Biology Program, University of California, Davis, CA, USA
| | | | - Christoph H Borchers
- Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, QC, Canada.
- Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, QC, Canada.
- Department of Data Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia.
|
186
|
Havrilla JM, Liu C, Dong X, Weng C, Wang K. PhenCards: a data resource linking human phenotype information to biomedical knowledge. Genome Med 2021; 13:91. [PMID: 34034817 PMCID: PMC8147460 DOI: 10.1186/s13073-021-00909-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/03/2021] [Accepted: 05/13/2021] [Indexed: 02/07/2023] Open
Abstract
We present PhenCards (https://phencards.org), a database and web server intended as a one-stop shop for previously disconnected biomedical knowledge related to human clinical phenotypes. Users can query human phenotype terms or clinical notes. PhenCards obtains relevant disease/phenotype prevalence and co-occurrence, drug, procedural, pathway, literature, grant, and collaborator data. PhenCards recommends the most probable genetic diseases and candidate genes based on phenotype terms from clinical notes. PhenCards facilitates exploration of phenotype, e.g., which drugs cause or are prescribed for patient symptoms, which genes likely cause specific symptoms, and which comorbidities co-occur with phenotypes.
Affiliation(s)
- James M Havrilla
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
| | - Xiangchen Dong
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
|
187
|
Hayman DJ, Modebadze T, Charlton S, Cheung K, Soul J, Lin H, Hao Y, Miles CG, Tsompani D, Jackson RM, Briggs MD, Piróg KA, Clark IM, Barter MJ, Clowry GJ, LeBeau FEN, Young DA. Increased hippocampal excitability in miR-324-null mice. Sci Rep 2021; 11:10452. [PMID: 34001919 PMCID: PMC8129095 DOI: 10.1038/s41598-021-89874-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/18/2020] [Accepted: 04/30/2021] [Indexed: 12/31/2022] Open
Abstract
MicroRNAs are non-coding RNAs that act to downregulate the expression of target genes by translational repression and degradation of messenger RNA molecules. Individual microRNAs have the ability to specifically target a wide array of gene transcripts, therefore allowing each microRNA to play key roles in multiple biological pathways. miR-324 is a microRNA predicted to target thousands of RNA transcripts and is expressed far more highly in the brain than in any other tissue, suggesting that it may play a role in one or multiple neurological pathways. Here we present data from the first global miR-324-null mice, in which increased excitability and interictal discharges were identified in vitro in the hippocampus. RNA sequencing was used to identify differentially expressed genes in miR-324-null mice which may contribute to this increased hippocampal excitability, and 3'UTR luciferase assays and western blotting revealed that two of these, Suox and Cd300lf, are novel direct targets of miR-324. Characterisation of microRNAs that produce an effect on neurological activity, such as miR-324, and identification of the pathways they regulate will allow a better understanding of the processes involved in normal neurological function and in turn may present novel pharmaceutical targets in treating neurological disease.
Affiliation(s)
- Dan J Hayman
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Tamara Modebadze
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Sarah Charlton
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Kat Cheung
- Bioinformatics Support Unit, Faculty of Medical Sciences, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Jamie Soul
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Hua Lin
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Yao Hao
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
- Orthopedics Department, First Hospital of Shanxi Medical University, Yingze District, Taiyuan, 030000, China
| | - Colin G Miles
- Translational and Clinical Research Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Dimitra Tsompani
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Robert M Jackson
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Michael D Briggs
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Katarzyna A Piróg
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Ian M Clark
- School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Matt J Barter
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Gavin J Clowry
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - Fiona E N LeBeau
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
| | - David A Young
- Biosciences Institute, Newcastle University, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK.
|
188
|
Savojardo C, Babbi G, Martelli PL, Casadio R. Mapping OMIM Disease-Related Variations on Protein Domains Reveals an Association Among Variation Type, Pfam Models, and Disease Classes. Front Mol Biosci 2021; 8:617016. [PMID: 34026820 PMCID: PMC8138129 DOI: 10.3389/fmolb.2021.617016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 04/09/2021] [Indexed: 12/23/2022] Open
Abstract
Human genome resequencing projects provide an unprecedented amount of data about single-nucleotide variations occurring in protein-coding regions and often leading to observable changes in the covalent structure of gene products. For many of these variations, links to Online Mendelian Inheritance in Man (OMIM) genetic diseases are available and are reported in many databases that collect human variation data, such as Humsavar. However, current knowledge of the molecular mechanisms that lead to disease is, in many cases, still limited. For understanding the complex mechanisms behind disease onset, the identification of putative models that consider the protein structure and physicochemical features of the variations can be useful in many contexts, including early diagnosis and prognosis. In this study, we investigate the occurrence and distribution of human disease–related variations in the context of Pfam domains. The aim of this study is the identification and characterization of Pfam domains that are statistically more likely to be associated with disease-related variations. The study takes into consideration 2,513 human protein sequences with 22,763 disease-related variations. We describe patterns of disease-related variation types in biunivocal relation with Pfam domains, which are possible markers for linking Pfam domains to OMIM diseases. Furthermore, we take advantage of the specific association between disease-related variation types and Pfam domains for clustering diseases according to the Human Disease Ontology, and we establish a relation among variation types, Pfam domains, and disease classes. We find that Pfam models are specific markers of patterns of variation types and that they can serve to bridge genes, diseases, and disease classes. Data are available as Supplementary Material for 1,670 Pfam models, including 22,763 disease-related variations associated with 3,257 OMIM diseases.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
|
189
|
Sobczyk MK, Gaunt TR, Paternoster L. MendelVar: gene prioritization at GWAS loci using phenotypic enrichment of Mendelian disease genes. Bioinformatics 2021; 37:1-8. [PMID: 33836063 PMCID: PMC8034535 DOI: 10.1093/bioinformatics/btaa1096] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/19/2020] [Revised: 11/30/2020] [Accepted: 01/08/2021] [Indexed: 11/26/2022] Open
Abstract
Motivation: Gene prioritization at human GWAS loci is challenging due to linkage-disequilibrium and long-range gene regulatory mechanisms. However, identifying the causal gene is crucial to enable identification of potential drug targets and better understanding of molecular mechanisms. Mapping GWAS traits to known phenotypically relevant Mendelian disease genes near a locus is a promising approach to gene prioritization. Results: We present MendelVar, a comprehensive tool that integrates knowledge from four databases on Mendelian disease genes with enrichment testing for a range of associated functional annotations such as Human Phenotype Ontology, Disease Ontology and variants from ClinVar. This open web-based platform enables users to strengthen the case for causal importance of phenotypically matched candidate genes at GWAS loci. We demonstrate the use of MendelVar in post-GWAS gene annotation for type 1 diabetes, type 2 diabetes, blood lipids and atopic dermatitis. Availability and implementation: MendelVar is freely available at https://mendelvar.mrcieu.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M K Sobczyk
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
| | - T R Gaunt
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
| | - L Paternoster
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
|
190
|
The microRNA analysis portal is a next-generation tool for exploring and analyzing miRNA-focused data in the literature. Sci Rep 2021; 11:9007. [PMID: 33903708 PMCID: PMC8076240 DOI: 10.1038/s41598-021-88617-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/22/2020] [Accepted: 04/07/2021] [Indexed: 02/02/2023] Open
Abstract
MicroRNAs constitute a class of noncoding small RNAs involved in the posttranscriptional regulation of many biological pathways. In recent years, microRNAs have also been associated with regulation across kingdoms, demonstrating that exogenous miRNAs can function in mammals in a fashion similar to mammalian miRNAs. The growing interest in microRNAs and the increasing amount of literature and molecular and biomedical data available make it difficult to identify records of interest and keep up to date with novel findings. For these reasons, we developed the microRNA Analysis Portal (MAP). MAP selects relevant miRNA-focused articles from PubMed, links biomedical and molecular data and applies bioinformatics modules. At the time of this writing, MAP represents the richest, most complete and integrated database focused on microRNAs. MAP also integrates an updated version of MirCompare (2.0), a computational platform used for selecting plant microRNAs on the basis of their ability to regulate mammalian genes. Both MAP and MirCompare functionalities were used to predict that microRNAs from Moringa oleifera have putative roles across kingdoms by regulating human genes coding for proteins of the immune system. Starting from a selection of 94 human microRNAs, MirCompare selected 6 Moringa oleifera functional homologs. The subsequent prediction of human targets and areas of functional enrichment highlighted the central involvement of these genes in regulating immune system processes, particularly the host-virus interaction processes in hepatitis B, cytomegalovirus, papillomavirus and coronavirus. This use case showed how MAP can help users perform complex queries without any computational background. MAP is available at http://stablab.uniroma2.it/MAP.
|
191
|
Pirch S, Müller F, Iofinova E, Pazmandi J, Hütter CVR, Chiettini M, Sin C, Boztug K, Podkosova I, Kaufmann H, Menche J. The VRNetzer platform enables interactive network analysis in Virtual Reality. Nat Commun 2021; 12:2432. [PMID: 33893283 PMCID: PMC8065164 DOI: 10.1038/s41467-021-22570-w] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/29/2020] [Accepted: 03/09/2021] [Indexed: 12/17/2022] Open
Abstract
Networks provide a powerful representation of interacting components within complex systems, making them ideal for visually and analytically exploring big data. However, the size and complexity of many networks render static visualizations on typically-sized paper or screens impractical, resulting in proverbial ‘hairballs’. Here, we introduce a Virtual Reality (VR) platform that overcomes these limitations by facilitating the thorough visual, and interactive, exploration of large networks. Our platform allows maximal customization and extendibility, through the import of custom code for data analysis, integration of external databases, and design of arbitrary user interface elements, among other features. As a proof of concept, we show how our platform can be used to interactively explore genome-scale molecular networks to identify genes associated with rare diseases and understand how they might contribute to disease development. Our platform represents a general purpose, VR-based data exploration platform for large and diverse data types by providing an interface that facilitates the interaction between human intuition and state-of-the-art analysis methods. Data-rich networks can be difficult to interpret beyond a certain size. Here, the authors introduce a platform that uses virtual reality to allow the visual exploration of large networks, while interfacing with data repositories and other analytical methods to improve the interpretation of big data.
Affiliation(s)
- Sebastian Pirch
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Felix Müller
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Eugenia Iofinova
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Julia Pazmandi
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria.,Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
| | - Christiane V R Hütter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Martin Chiettini
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Celine Sin
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Kaan Boztug
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria.,St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria.,St. Anna Children's Hospital, Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria.,Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria
| | - Iana Podkosova
- Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
| | - Hannes Kaufmann
- Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
| | - Jörg Menche
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.,Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria.,Faculty of Mathematics, University of Vienna, Vienna, Austria.
| |
|
192
|
Zhang J, Liu L, Xu T, Zhang W, Zhao C, Li S, Li J, Rao N, Le TD. miRSM: an R package to infer and analyse miRNA sponge modules in heterogeneous data. RNA Biol 2021; 18:2308-2320. [PMID: 33822666 DOI: 10.1080/15476286.2021.1905341] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022] Open
Abstract
In molecular biology, microRNA (miRNA) sponges are RNA transcripts which compete with other RNA transcripts for binding with miRNAs. Research has shown that miRNA sponges have a fundamental impact on tissue development and disease progression. Generally, to achieve a specific biological function, miRNA sponges tend to form modules or communities in a biological system. Until now, however, there is still a lack of tools to aid researchers to infer and analyse miRNA sponge modules from heterogeneous data. To fill this gap, we develop an R/Bioconductor package, miRSM, for facilitating the procedure of inferring and analysing miRNA sponge modules. miRSM provides a collection of 50 co-expression analysis methods to identify gene co-expression modules (which are candidate miRNA sponge modules), four module discovery methods to infer miRNA sponge modules and seven modular analysis methods for investigating miRNA sponge modules. miRSM will enable researchers to quickly apply new datasets to infer and analyse miRNA sponge modules, and will consequently accelerate the research on miRNA sponges.
Affiliation(s)
- Junpeng Zhang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.,School of Engineering, Dali University, Dali, Yunnan, China
| | - Lin Liu
- UniSA STEM, University of South Australia, Mawson Lakes, SA, Australia
| | - Taosheng Xu
- Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China
| | - Wu Zhang
- School of Agriculture and Biological Sciences, Dali University, Dali, Yunnan, China
| | - Chunwen Zhao
- School of Engineering, Dali University, Dali, Yunnan, China
| | - Sijing Li
- School of Engineering, Dali University, Dali, Yunnan, China
| | - Jiuyong Li
- UniSA STEM, University of South Australia, Mawson Lakes, SA, Australia
| | - Nini Rao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| | - Thuc Duy Le
- UniSA STEM, University of South Australia, Mawson Lakes, SA, Australia
| |
|
193
|
Chen CH, Lu F, Yang WJ, Yang PE, Chen WM, Kang ST, Huang YS, Kao YC, Feng CT, Chang PC, Wang T, Hsieh CA, Lin YC, Jen Huang JY, Wang LHC. A novel platform for discovery of differentially expressed microRNAs in patients with repeated implantation failure. Fertil Steril 2021; 116:181-188. [PMID: 33823989 DOI: 10.1016/j.fertnstert.2021.01.055] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/11/2020] [Revised: 01/29/2021] [Accepted: 01/29/2021] [Indexed: 12/22/2022]
Abstract
OBJECTIVE: To identify predictor microRNAs (miRNAs) from patients with repeated implantation failure (RIF). DESIGN: Systemic analysis of miRNA profiles from the endometrium of patients undergoing in vitro fertilization (IVF). SETTING: University research institute, private IVF center, and molecular testing laboratory. PATIENT(S): Twenty-five infertile patients in the discovery cohort and 11 patients in the validation cohort. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): A signature set of miRNAs associated with the risk of RIF. RESULT(S): We designed a reproductive disease-related PanelChip to assess endometrial miRNA profiles in patients undergoing IVF. Three major miRNA signatures, including hsa-miR-20b-5p, hsa-miR-155-5p, and hsa-miR-718, were identified using infinite combination signature search algorithm analysis from 25 patients in the discovery cohort undergoing IVF. These miRNAs were used as biomarkers in the validation cohort of 11 patients. Finally, the 3-miRNA signature was capable of predicting patients with RIF with an accuracy >90%. CONCLUSION(S): Our findings indicated that specific endometrial miRNAs can be applied as diagnostic biomarkers to predict RIF. Such information will definitely help to increase the success rate of implantation practice.
Affiliation(s)
- Ching Hung Chen
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, Taiwan; Department of Obstetrics and Gynecology, Ton Yen General Hospital, Hsinchu, Taiwan; Taiwan IVF Group Center for Reproductive Medicine and Infertility, Hsinchu, Taiwan
| | - Farn Lu
- Department of Obstetrics and Gynecology, Ton Yen General Hospital, Hsinchu, Taiwan; Taiwan IVF Group Center for Reproductive Medicine and Infertility, Hsinchu, Taiwan
| | - Wen Jui Yang
- Department of Obstetrics and Gynecology, Ton Yen General Hospital, Hsinchu, Taiwan; Taiwan IVF Group Center for Reproductive Medicine and Infertility, Hsinchu, Taiwan
| | - Yi Chi Kao
- Quark Biosciences, Inc., Hsinchu, Taiwan
| | - Chi An Hsieh
- Taiwan IVF Group Center for Reproductive Medicine and Infertility, Hsinchu, Taiwan
| | - Yu Chun Lin
- Taiwan IVF Group Center for Reproductive Medicine and Infertility, Hsinchu, Taiwan
| | - Jack Yu Jen Huang
- Department of Obstetrics and Gynecology, Ton Yen General Hospital, Hsinchu, Taiwan; Taiwan IVF Group Center for Reproductive Medicine and Infertility, Hsinchu, Taiwan; Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Stanford University, Stanford, California
| | - Lily Hui-Ching Wang
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, Taiwan; Department of Medical Science, National Tsing Hua University, Hsinchu, Taiwan.
| |
|
194
|
Li Y, Jiang Y, Zhang Y, Li N, Yin Q, Liu L, Lv X, Liu Y, Li A, Fang B, Li J, Ye H, Yang G, Cui X, Liu Y, Qu Y, Li C, Li J, Li D, Gai Z, Wang S, Zhan F, Liang M. Abnormal upregulation of cardiovascular disease biomarker PLA2G7 induced by proinflammatory macrophages in COVID-19 patients. Sci Rep 2021; 11:6811. [PMID: 33762651 PMCID: PMC7990942 DOI: 10.1038/s41598-021-85848-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2020] [Accepted: 03/08/2021] [Indexed: 01/07/2023] Open
Abstract
A high rate of cardiovascular disease (CVD) has been reported among patients with coronavirus disease 2019 (COVID-19). Importantly, CVD, as one of the comorbidities, could also increase the risk of severe COVID-19. Here we identified phospholipase A2 group VII (PLA2G7), a well-studied CVD biomarker, as a hub gene in COVID-19 through an integrated hypothesis-free genomic analysis on nasal swabs (n = 486) from patients with COVID-19. PLA2G7 was further found to be predominantly expressed by proinflammatory macrophages in lungs emerging with progression of COVID-19. In the validation stage, PLA2G7 RNA was detected in nasal swabs from both COVID-19 and pneumonia patients, but not from healthy individuals. The positive rate of PLA2G7 was correlated not only with viral loads but also with the severity of pneumonia in non-COVID-19 patients. Serum protein levels of PLA2G7 were found to be elevated beyond the normal limit in COVID-19 patients, especially among re-positive patients. We identified and validated that PLA2G7, a biomarker for CVD, was abnormally elevated in COVID-19 at both the nucleotide and protein levels. These findings provide insight into the prevalence of cardiovascular involvement seen in patients with COVID-19. PLA2G7 could be a potential prognostic and therapeutic target in COVID-19.
Affiliation(s)
- Yang Li
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Yongzhong Jiang
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430065, China
| | - Yi Zhang
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Naizhe Li
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Qiangling Yin
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Linlin Liu
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430065, China
| | - Xin Lv
- Qilu Children's Hospital, Cheeloo College of Medicine, Shandong University and Jinan Children's Hospital, Jinan, 250022, China
| | - Yan Liu
- Department of Microbiology, School of Basic Medical Science, Anhui Medical University, Hefei, 230032, China
| | - Aqian Li
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Bin Fang
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430065, China
| | - Jiajia Li
- The Center for Scientific Research of the First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China
| | - Hengping Ye
- Xiantao Center for Disease Control and Prevention, Xiantao, 433000, China
| | - Gang Yang
- Xiangyang Center for Disease Control and Prevention, Xiangyang, 441000, China
| | - Xiaoxian Cui
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, 200336, China
| | - Yang Liu
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Yuanyuan Qu
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Chuan Li
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Jiandong Li
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Dexin Li
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China
| | - Zhongtao Gai
- Qilu Children's Hospital, Cheeloo College of Medicine, Shandong University and Jinan Children's Hospital, Jinan, 250022, China
| | - Shiwen Wang
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China.
- CDC-WIV Joint Research Center for Emerging Diseases and Biosafety, Wuhan, 430071, China.
| | - Faxian Zhan
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430065, China.
| | - Mifang Liang
- NHC Key Laboratory of Medical Virology and Viral Diseases, National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, 102206, China.
- CDC-WIV Joint Research Center for Emerging Diseases and Biosafety, Wuhan, 430071, China.
| |
|
195
|
Bernasconi A, Canakoglu A, Masseroli M, Pinoli P, Ceri S. A review on viral data sources and search systems for perspective mitigation of COVID-19. Brief Bioinform 2021; 22:664-675. [PMID: 33348368 PMCID: PMC7799334 DOI: 10.1093/bib/bbaa359] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/01/2020] [Revised: 10/09/2020] [Accepted: 11/09/2020] [Indexed: 12/26/2022] Open
Abstract
With the outbreak of the COVID-19 disease, the research community is making unprecedented efforts to better understand and mitigate the effects of the pandemic. In this context, we review the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV2, the virus responsible for the COVID-19 disease, which have been deposited into the most important repositories of viral sequences. Organizations that were already present in the virus domain are now dedicating special interest to the emergence of the COVID-19 pandemic, by emphasizing specific SARS-CoV2 data and services. At the same time, novel organizations and resources were born in this critical period to serve specifically the purposes of COVID-19 mitigation while laying the research groundwork for countering possible future pandemics. Accessibility and integration of viral sequence data, possibly in conjunction with the human host genotype and clinical data, are paramount to better understand the COVID-19 disease and mitigate its effects. Few examples of host-pathogen integrated datasets exist so far, but we expect them to grow together with the knowledge of COVID-19 disease; once such datasets are available, useful integrative surveillance mechanisms can be put in place by observing how common variants distribute in time and space, relating them to the phenotypic impact evidenced in the literature.
|
196
|
A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinformatics 2021; 22:136. [PMID: 33745450 PMCID: PMC7983260 DOI: 10.1186/s12859-021-04073-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 01/28/2021] [Accepted: 03/11/2021] [Indexed: 01/01/2023] Open
Abstract
Background: Numerous studies have demonstrated that long non-coding RNAs are related to many human diseases. Therefore, it is crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it is still challenging to learn efficient low-dimensional representations from high-dimensional features of lncRNAs and diseases to predict unknown lncRNA-disease associations accurately. Results: We proposed an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders. Variational graph autoencoders (VGAE) infer representations from features of lncRNAs and diseases respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. These two kinds of autoencoders are trained alternately by adopting the variational expectation maximization algorithm. The integration of both the VGAE for graph representation learning and the alternate training via variational inference strengthens the capability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of predicting unknown lncRNA-disease associations. Further analysis shows that the designed co-training framework of lncRNA and disease for VGAELDA solves a geometric matrix completion problem for capturing efficient low-dimensional representations via a deep learning approach. Conclusion: Cross validations and numerical experiments illustrate that VGAELDA outperforms the current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04073-z.
|
197
|
Silberstein M, Nesbit N, Cai J, Lee PH. Pathway analysis for genome-wide genetic variation data: Analytic principles, latest developments, and new opportunities. J Genet Genomics 2021; 48:173-183. [PMID: 33896739 PMCID: PMC8286309 DOI: 10.1016/j.jgg.2021.01.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/04/2020] [Revised: 01/24/2021] [Accepted: 01/25/2021] [Indexed: 12/23/2022]
Abstract
Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori, biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for the studies of gene expression data, it has become a powerful analytic procedure for in-depth mining of genome-wide genetic variation data. Astonishing discoveries were made in the past years, uncovering genes and biological mechanisms underlying common and complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for newer generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation of this powerful analytic strategy, as well as an update of the state-of-the-art in recent method developments. The goal of this review is threefold: (1) introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) review the merits and the shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) discuss remaining challenges and future directions for further method developments.
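As a rough illustration of the basic enrichment step that pathway analysis builds on, the sketch below computes a one-sided hypergeometric (over-representation) p-value for the overlap between a list of associated genes and a pathway gene set. This is a generic textbook test, not a method taken from the review; the function name `hypergeom_enrichment_p` and all toy numbers are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_hits, n_overlap):
    """One-sided over-representation p-value.

    Probability of seeing at least `n_overlap` pathway genes when
    `n_hits` genes are drawn without replacement from a universe of
    `n_universe` genes, of which `n_pathway` belong to the pathway.
    """
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 genes in total, 5 in the pathway,
# 4 associated genes, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(n_universe=20, n_pathway=5, n_hits=4, n_overlap=3)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice this test is repeated across many gene sets, so the resulting p-values would need multiple-testing correction before interpretation.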
Affiliation(s)
- Micah Silberstein
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Nicholas Nesbit
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jacquelyn Cai
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Phil H Lee
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| |
|
198
|
Ruiz C, Zitnik M, Leskovec J. Identification of disease treatment mechanisms through the multiscale interactome. Nat Commun 2021; 12:1796. [PMID: 33741907 PMCID: PMC7979814 DOI: 10.1038/s41467-021-21770-8] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 05/22/2020] [Accepted: 02/04/2021] [Indexed: 12/12/2022] Open
Abstract
Most diseases disrupt multiple proteins, and drugs treat such diseases by restoring the functions of the disrupted proteins. How drugs restore these functions, however, is often unknown as a drug's therapeutic effects are not limited to the proteins that the drug directly targets. Here, we develop the multiscale interactome, a powerful approach to explain disease treatment. We integrate disease-perturbed proteins, drug targets, and biological functions into a multiscale interactome network. We then develop a random walk-based method that captures how drug effects propagate through a hierarchy of biological functions and physical protein-protein interactions. On three key pharmacological tasks, the multiscale interactome predicts drug-disease treatment, identifies proteins and biological functions related to treatment, and predicts genes that alter a treatment's efficacy and adverse reactions. Our results indicate that physical interactions between proteins alone cannot explain treatment since many drugs treat diseases by affecting the biological functions disrupted by the disease rather than directly targeting disease proteins or their regulators. We provide a general framework for explaining treatment, even when drugs seem unrelated to the diseases they are recommended for.
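To make the idea of effects propagating through a network concrete, here is a minimal sketch of a plain random walk with restart on a tiny graph. It is a generic technique shown under stated assumptions, not the authors' multiscale interactome algorithm; the node names, the toy graph, and the `restart` value are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to a seed node with probability `restart` and otherwise
    moves to a uniformly chosen neighbour of its current node."""
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(r)
    for _ in range(max_iter):
        nxt = {n: restart * r[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Dangling node: send its remaining mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * r[n]
                continue
            share = (1 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical chain: drug - protein P1 - biological function F1 - protein P2.
toy_graph = {
    "drug": ["P1"],
    "P1": ["drug", "F1"],
    "F1": ["P1", "P2"],
    "P2": ["F1"],
}
scores = random_walk_with_restart(toy_graph, seeds={"drug"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes closer to the seed receive higher scores, which is the intuition behind comparing a drug's propagation profile with a disease's profile.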
Affiliation(s)
- Camilo Ruiz
- Computer Science Department, Stanford University, Stanford, CA, USA
- Bioengineering Department, Stanford University, Stanford, CA, USA
| | - Marinka Zitnik
- Biomedical Informatics Department, Harvard University, Boston, MA, USA
| | - Jure Leskovec
- Computer Science Department, Stanford University, Stanford, CA, USA.
- Chan Zuckerberg Biohub, San Francisco, CA, USA.
| |
|
199
|
Carter JM, Ang DA, Sim N, Budiman A, Li Y. Approaches to Identify and Characterise the Post-Transcriptional Roles of lncRNAs in Cancer. Noncoding RNA 2021; 7:19. [PMID: 33803328 PMCID: PMC8005986 DOI: 10.3390/ncrna7010019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/29/2021] [Revised: 02/28/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
It is becoming increasingly evident that the non-coding genome and transcriptome exert great influence over their coding counterparts through complex molecular interactions. Among non-coding RNAs (ncRNA), long non-coding RNAs (lncRNAs) in particular present increased potential to participate in dysregulation of post-transcriptional processes through both RNA and protein interactions. Since such processes can play key roles in contributing to cancer progression, it is desirable to continue expanding the search for lncRNAs impacting cancer through post-transcriptional mechanisms. The sheer diversity of mechanisms requires diverse resources and methods that have been developed and refined over the past decade. We provide an overview of computational resources as well as proven low-to-high throughput techniques to enable identification and characterisation of lncRNAs in their complex interactive contexts. As more cancer research strategies evolve to explore the non-coding genome and transcriptome, we anticipate this will provide a valuable primer and perspective of how these technologies have matured and will continue to evolve to assist researchers in elucidating post-transcriptional roles of lncRNAs in cancer.
Affiliation(s)
- Jean-Michel Carter
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore 637551, Singapore
| | - Daniel Aron Ang
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore 637551, Singapore
| | - Nicholas Sim
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore 637551, Singapore
| | - Andrea Budiman
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore 637551, Singapore
| | - Yinghui Li
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, Singapore 637551, Singapore
- Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore 138673, Singapore
| |
|
200
|
Barros M, Moitinho A, Couto FM. Hybrid semantic recommender system for chemical compounds in large-scale datasets. J Cheminform 2021; 13:15. [PMID: 33622374 PMCID: PMC7903631 DOI: 10.1186/s13321-021-00495-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/03/2020] [Accepted: 02/10/2021] [Indexed: 12/16/2022] Open
Abstract
The large, and increasing, number of chemical compounds poses challenges to the exploration of such datasets. In this work, we propose the usage of recommender systems to identify compounds of interest to scientific researchers. Our approach consists of a hybrid recommender model suitable for implicit feedback datasets and focused on retrieving a ranked list according to the relevance of the items. The model integrates collaborative-filtering algorithms for implicit feedback (Alternating Least Squares and Bayesian Personalized Ranking) and a new content-based algorithm, using the semantic similarity between the chemical compounds in the ChEBI ontology. The algorithms were assessed on an implicit dataset of chemical compounds, CheRM-20, with more than 16,000 items (chemical compounds). The hybrid model was able to improve the results of the collaborative-filtering algorithms, by more than ten percentage points in most of the assessed evaluation metrics.
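The abstract does not state how the collaborative-filtering and content-based scores are combined, so the following is only one plausible sketch: both score sets are min-max normalised and blended with a weight before ranking. The blending rule, the `weight` parameter, the function names, and the compound identifiers are all assumptions for illustration, not the paper's implementation.

```python
def min_max(scores):
    """Rescale a dict of scores to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (value - lo) / (hi - lo) for item, value in scores.items()}


def hybrid_rank(cf_scores, content_scores, weight=0.5):
    """Rank items by a weighted blend of normalised collaborative-filtering
    scores and content-based (e.g. semantic similarity) scores.
    Items missing from one source contribute 0 for that source."""
    cf, cb = min_max(cf_scores), min_max(content_scores)
    items = set(cf) | set(cb)
    blended = {
        item: weight * cf.get(item, 0.0) + (1 - weight) * cb.get(item, 0.0)
        for item in items
    }
    # Highest blended score first; ties are broken by item name.
    return sorted(blended, key=lambda item: (-blended[item], item))


# Hypothetical scores for three compounds and one user.
cf_scores = {"compound_a": 0.9, "compound_b": 0.2, "compound_c": 0.5}
content_scores = {"compound_a": 0.1, "compound_b": 0.8, "compound_c": 0.7}
ranking = hybrid_rank(cf_scores, content_scores, weight=0.5)
print(ranking)
```

Setting `weight=1.0` reduces the ranking to the collaborative-filtering order alone, which gives a simple baseline for checking whether the content-based signal changes the ranked list.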
Affiliation(s)
- Marcia Barros
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal.,CENTRA, Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal.
| | - Andre Moitinho
- CENTRA, Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
| | - Francisco M Couto
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
| |
|