Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Doğan T. HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ 2018;6:e5298. [PMID: 30083448 PMCID: PMC6076985 DOI: 10.7717/peerj.5298] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/04/2018] [Accepted: 07/03/2018] [Indexed: 01/24/2023]

Cited by Other Article(s)

Ulusoy E, Doğan T. Mutual annotation-based prediction of protein domain functions with Domain2GO. Protein Sci 2024;33:e4988. [PMID: 38757367 PMCID: PMC11099699 DOI: 10.1002/pro.4988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/25/2024] [Accepted: 03/30/2024] [Indexed: 05/18/2024]

Abstract

Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. 
Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.
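
The co-annotation idea described in this abstract can be illustrated with a minimal sketch. It is not the Domain2GO implementation: the function names, the `min_support` threshold, the toy identifiers, and the simple co-occurrence ratio used as a score are all assumptions made for illustration, and the statistical resampling step mentioned in the abstract is omitted.

```python
from collections import Counter
from itertools import product


def co_annotation_counts(protein_domains, protein_go):
    """Count how often each (domain, GO term) pair is annotated to the same protein."""
    pair_counts = Counter()
    domain_counts = Counter()
    for protein, domains in protein_domains.items():
        go_terms = protein_go.get(protein, set())
        domain_counts.update(domains)                    # proteins carrying each domain
        pair_counts.update(product(domains, go_terms))   # co-annotated pairs
    return pair_counts, domain_counts


def domain_go_scores(pair_counts, domain_counts, min_support=2):
    """Score a pair as the fraction of the domain's proteins that also carry the GO term.

    min_support is a hypothetical filter that drops pairs seen in too few proteins.
    """
    return {
        (domain, term): n / domain_counts[domain]
        for (domain, term), n in pair_counts.items()
        if n >= min_support
    }


def predict_go(domains, scores):
    """Propagate domain-associated GO terms to a protein, keeping the best score per term."""
    predicted = {}
    for (domain, term), score in scores.items():
        if domain in domains:
            predicted[term] = max(predicted.get(term, 0.0), score)
    return predicted


# Toy annotations with placeholder identifiers (not real domain or GO accessions).
protein_domains = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D2"}}
protein_go = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:B"}, "P3": {"GO:B"}}

pairs, doms = co_annotation_counts(protein_domains, protein_go)
scores = domain_go_scores(pairs, doms)
prediction = predict_go({"D1"}, scores)
```

In this toy example only the pairs supported by two proteins survive the filter, so a new protein carrying `D1` alone inherits `GO:A`.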

Novoa J, López-Ibáñez J, Chagoyen M, Ranea JAG, Pazos F. CoMentG: comprehensive retrieval of generic relationships between biomedical concepts from the scientific literature. Database (Oxford) 2024;2024:baae025. [PMID: 38564426 PMCID: PMC10986793 DOI: 10.1093/database/baae025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 03/01/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024]

Cankara F, Doğan T. ASCARIS: Positional feature annotation and protein structure-based representation of single amino acid variations. Comput Struct Biotechnol J 2023;21:4743-4758. [PMID: 37822561 PMCID: PMC10562615 DOI: 10.1016/j.csbj.2023.09.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 09/15/2023] [Accepted: 09/15/2023] [Indexed: 10/13/2023]

Abstract

Background

Genomic variations may cause deleterious effects on protein functionality and perturb biological processes. Elucidating the effects of variations is critical for developing novel treatment strategies for diseases of genetic origin. Computational approaches have been aiding the work in this field by modeling and analyzing the mutational landscape. However, new approaches are required, especially for accurate representation and data-centric analysis of sequence variations.

Method

In this study, we propose ASCARIS (Annotation and StruCture-bAsed RepresentatIon of Single amino acid variations), a method for the featurization (i.e., quantitative representation) of single amino acid variations (SAVs), which could be used for a variety of purposes, such as predicting their functional effects or building multi-omics-based integrative models. ASCARIS utilizes the direct and spatial correspondence between the location of the SAV on the sequence/structure and 30 different types of positional feature annotations (e.g., active/lipidation/glycosylation sites; calcium/metal/DNA binding, inter/transmembrane regions, etc.), along with structural features and physicochemical properties. The main novelty of this method lies in constructing reusable numerical representations of SAVs via functional annotations.

Results

We statistically analyzed the relationship between these features and the consequences of variations and found that each carries information in this regard. To investigate potential applications of ASCARIS, we trained variant effect prediction models that utilize our SAV representations as input. We carried out an ablation study and a comparison against the state-of-the-art methods and observed that ASCARIS has a competing and complementary performance against widely-used predictors. ASCARIS can be used alone or in combination with other approaches to represent SAVs from a functional perspective. ASCARIS is available as a programmatic tool at https://github.com/HUBioDataLab/ASCARIS and as a web-service at https://huggingface.co/spaces/HUBioDataLab/ASCARIS.

Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023;15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023]

Abstract

The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized under three items: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results and, as a result, should be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties (e.g., structures) of proteins are unused during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the requirement for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
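
The warning about random splitting can be made concrete with a small sketch of a group-aware split, in which whole clusters of similar samples are assigned to either the training or the test fold. This is not the network analysis-based strategy developed in the study: the cluster labels are assumed to be given, and the function name, parameters, and toy data are hypothetical.

```python
import random


def group_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that no group (e.g., a cluster of similar compounds)
    contributes members to both the training and the test fold."""
    groups = sorted({group_of[s] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))  # hold out at least one group
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of[s] not in test_groups]
    test = [s for s in samples if group_of[s] in test_groups]
    return train, test


# Toy data: six compounds in three assumed similarity clusters.
samples = ["c1", "c2", "c3", "c4", "c5", "c6"]
group_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 1, "c5": 2, "c6": 2}

train, test = group_split(samples, group_of, test_fraction=0.34)
```

A purely random split would typically place near-duplicates of test samples in the training fold, which is the memorization effect described in finding (i).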

Börner K, Bueckle A, Herr BW, Cross LE, Quardokus EM, Record EG, Ju Y, Silverstein JC, Browne KM, Jain S, Wasserfall CH, Jorgensen ML, Spraggins JM, Patterson NH, Weber GM. Tissue registration and exploration user interfaces in support of a human reference atlas. Commun Biol 2022;5:1369. [PMID: 36513738 PMCID: PMC9747802 DOI: 10.1038/s42003-022-03644-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 06/27/2022] [Indexed: 12/14/2022]

Affiliation(s)

Katy Börner Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Andreas Bueckle Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Bruce W Herr Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Leonard E Cross Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Ellen M Quardokus Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Elizabeth G Record Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Yingnan Ju Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Jonathan C Silverstein Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
Kristen M Browne Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Sanjay Jain Department of Medicine, Department of Pathology & Immunology, Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO, USA
Clive H Wasserfall Departments of Pathology and Pediatrics, University of Florida, Gainesville, FL, USA
Marda L Jorgensen Departments of Pathology and Pediatrics, University of Florida, Gainesville, FL, USA
Jeffrey M Spraggins Mass Spectrometry Research Center, Vanderbilt University, Nashville, TN, USA
N Heath Patterson Mass Spectrometry Research Center, Vanderbilt University, Nashville, TN, USA
Griffin M Weber Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

Ciray F, Doğan T. Machine learning-based prediction of drug approvals using molecular, physicochemical, clinical trial, and patent-related features. Expert Opin Drug Discov 2022;17:1425-1441. [PMID: 36444655 DOI: 10.1080/17460441.2023.2153830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

Özsarı G, Rifaioglu AS, Atakan A, Doğan T, Martin MJ, Çetin Atalay R, Atalay V. SLPred: a multi-view subcellular localization prediction tool for multi-location human proteins. Bioinformatics 2022;38:4226-4229. [PMID: 35801913 DOI: 10.1093/bioinformatics/btac458] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/23/2022] [Revised: 06/08/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022]

Investigation of Genetic Causes in Patients with Congenital Heart Disease in Qatar: Findings from the Sidra Cardiac Registry. Genes (Basel) 2022;13:genes13081369. [PMID: 36011280 PMCID: PMC9407366 DOI: 10.3390/genes13081369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/25/2022] [Revised: 07/18/2022] [Accepted: 07/25/2022] [Indexed: 02/04/2023]

CoMent: relationships between biomedical concepts inferred from the scientific literature. J Mol Biol 2022;434:167568. [DOI: 10.1016/j.jmb.2022.167568] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Revised: 03/18/2022] [Accepted: 03/22/2022] [Indexed: 01/22/2023]

Herman I, Jolly A, Du H, Dawood M, Abdel-Salam GMH, Marafi D, Mitani T, Calame DG, Coban-Akdemir Z, Fatih JM, Hegazy I, Jhangiani SN, Gibbs RA, Pehlivan D, Posey JE, Lupski JR. Quantitative dissection of multilocus pathogenic variation in an Egyptian infant with severe neurodevelopmental disorder resulting from multiple molecular diagnoses. Am J Med Genet A 2022;188:735-750. [PMID: 34816580 PMCID: PMC8837671 DOI: 10.1002/ajmg.a.62565] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/22/2021] [Revised: 10/11/2021] [Accepted: 10/18/2021] [Indexed: 12/19/2022]

Affiliation(s)

Isabella Herman Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, Texas, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Texas Children's Hospital, Houston, Texas, 77030, USA
Angad Jolly Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, 77030, USA
Haowei Du Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA
Moez Dawood Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, 77030, USA
Ghada M. H. Abdel-Salam Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
Dana Marafi Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Department of Pediatrics, Faculty of Medicine, Kuwait University, P.O. Box 24923, 13110 Safat, Kuwait; Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Tadahiro Mitani Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA
Daniel G. Calame Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, Texas, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Texas Children's Hospital, Houston, Texas, 77030, USA
Zeynep Coban-Akdemir Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Jawid M. Fatih Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA
Ibrahim Hegazy Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
Shalini N. Jhangiani Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, 77030, USA
Richard A. Gibbs Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, 77030, USA
Davut Pehlivan Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, Texas, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Texas Children's Hospital, Houston, Texas, 77030, USA
Jennifer E. Posey Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA
James R. Lupski Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA; Texas Children's Hospital, Houston, Texas, 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030

Liu L, Mamitsuka H, Zhu S. HPODNets: deep graph convolutional networks for predicting human protein-phenotype associations. Bioinformatics 2022;38:799-808. [PMID: 34672333 DOI: 10.1093/bioinformatics/btab729] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/21/2021] [Revised: 09/18/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]

Doğan T, Akhan Güzelcan E, Baumann M, Koyas A, Atas H, Baxendale IR, Martin M, Cetin-Atalay R. Protein domain-based prediction of drug/compound-target interactions and experimental validation on LIM kinases. PLoS Comput Biol 2021;17:e1009171. [PMID: 34843456 PMCID: PMC8659301 DOI: 10.1371/journal.pcbi.1009171] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/06/2021] [Revised: 12/09/2021] [Accepted: 11/09/2021] [Indexed: 12/23/2022]

Abstract

Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing developmental time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins' structure/function, and bias in system training datasets. Here, we propose a new method "DRUIDom" (DRUg Interacting Domain prediction) to identify bio-interactions between drug candidate compounds and targets by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying their interactions. As such, other proteins containing the same mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including those mapped to domains in the previous step, are clustered based on their molecular similarities, and their domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points, obtained from public databases, are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting drug/compound-target pairs (~2.9M data points), and used as training data for calculating parameters of compound-domain mappings, which led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound-protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics. 
We showed that LIMK-inhibitor-2 and its derivatives significantly block the cancer cell migration through inhibition of LIMK phosphorylation and the downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrated that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound-domain relationships. Datasets, results, and the source code of DRUIDom are fully-available at: https://github.com/cansyl/DRUIDom.
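
The first step described in this abstract, in which proteins sharing a mapped domain become new candidate targets, can be sketched as a set operation. This is an illustration only and not the DRUIDom code: the function name, the input dictionaries, and the toy identifiers are hypothetical, and the statistical mapping of compounds to domains and the clustering-based propagation between similar compounds are assumed to have been done elsewhere.

```python
def candidate_targets(compound_domains, protein_domains, known_targets):
    """For each compound, list proteins that contain at least one of its mapped
    domains and are not already among its known targets."""
    candidates = {}
    for compound, mapped in compound_domains.items():
        hits = {p for p, doms in protein_domains.items() if doms & mapped}
        candidates[compound] = hits - known_targets.get(compound, set())
    return candidates


# Toy inputs with placeholder names.
compound_domains = {"cmpd1": {"kinase_dom"}}
protein_domains = {
    "protA": {"kinase_dom", "LIM_dom"},
    "protB": {"kinase_dom"},
    "protC": {"SH2_dom"},
}
known_targets = {"cmpd1": {"protA"}}

result = candidate_targets(compound_domains, protein_domains, known_targets)
```

Here `protB` is proposed for `cmpd1` because it shares the mapped domain, while `protA` is excluded as an already known target.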

Pourreza Shahri M, Kahanda I. Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes. BMC Bioinformatics 2021;22:500. [PMID: 34656098 PMCID: PMC8520253 DOI: 10.1186/s12859-021-04421-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022]

Abstract

Background

Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Results

In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework can incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.

Conclusions

This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

Liu L, Zhu S. Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review. PHENOMICS (CHAM, SWITZERLAND) 2021;1:171-185. [PMID: 36939789 PMCID: PMC9590544 DOI: 10.1007/s43657-021-00019-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 06/05/2021] [Accepted: 06/16/2021] [Indexed: 12/01/2022]

DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier. PLoS Comput Biol 2020;16:e1008453. [PMID: 33206638 PMCID: PMC7710064 DOI: 10.1371/journal.pcbi.1008453] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/16/2020] [Revised: 12/02/2020] [Accepted: 10/20/2020] [Indexed: 12/21/2022]

Abstract

Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network-based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top-performing CAFA2 methods as well as several state-of-the-art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.

Gene–phenotype associations can help to understand the underlying mechanisms of many genetic diseases. However, experimental identification, often involving animal models, is time-consuming and expensive. Computational methods that predict gene–phenotype associations can be used instead. We developed DeepPheno, a novel approach for predicting the phenotypes resulting from a loss of function of a single gene. We use gene functions and gene expression as information to predict phenotypes. Our method uses a neural network classifier that is able to account for hierarchical dependencies between phenotypes. We extensively evaluate our method and compare it with related approaches, and we show that DeepPheno results in better performance in several evaluations. Furthermore, we found that many of the new predictions made by our method have been added to phenotype association databases released one year later. Overall, DeepPheno simulates some aspects of human physiology and how molecular and physiological alterations lead to abnormal phenotypes.
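
One common way to respect hierarchical dependencies between ontology terms is to propagate prediction scores upward, so that a parent term never scores lower than any of its descendants. The sketch below shows that generic post-processing step. It is not DeepPheno's classifier, which the abstract describes as a neural network, and the function name and placeholder term identifiers are assumptions.

```python
def propagate_scores(scores, parents):
    """Return a copy of scores in which every ancestor term scores at least
    as high as each of its descendants."""
    out = dict(scores)

    def lift(term, value):
        # Raise each parent to at least `value`, then continue up the hierarchy.
        for parent in parents.get(term, ()):
            if out.get(parent, 0.0) < value:
                out[parent] = value
                lift(parent, value)

    for term, value in scores.items():
        lift(term, value)
    return out


# Toy hierarchy with placeholder identifiers: leaf -> mid -> root, sibling -> mid.
parents = {"HP:leaf": ["HP:mid"], "HP:sibling": ["HP:mid"], "HP:mid": ["HP:root"]}
raw = {"HP:leaf": 0.9, "HP:sibling": 0.2, "HP:mid": 0.4, "HP:root": 0.1}

consistent = propagate_scores(raw, parents)
```

After propagation, `HP:mid` and `HP:root` both take the leaf's score of 0.9, while `HP:sibling` keeps its own lower score.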

Systematic identification of genetic systems associated with phenotypes in patients with rare genomic copy number variations. Hum Genet 2020;140:457-475. [PMID: 32778951 DOI: 10.1007/s00439-020-02214-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/14/2020] [Accepted: 07/30/2020] [Indexed: 01/02/2023]

Köhler S, Carmody L, Vasilevsky N, Jacobsen JOB, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D, Cipriani V, Balhoff JP, Conlin T, Blau H, Baynam G, Palmer R, Gratian D, Dawkins H, Segal M, Jansen AC, Muaz A, Chang WH, Bergerson J, Laulederkind SJF, Yüksel Z, Beltran S, Freeman AF, Sergouniotis PI, Durkin D, Storm AL, Hanauer M, Brudno M, Bello SM, Sincan M, Rageth K, Wheeler MT, Oegema R, Lourghi H, Della Rocca MG, Thompson R, Castellanos F, Priest J, Cunningham-Rundles C, Hegde A, Lovering RC, Hajek C, Olry A, Notarangelo L, Similuk M, Zhang XA, Gómez-Andrés D, Lochmüller H, Dollfus H, Rosenzweig S, Marwaha S, Rath A, Sullivan K, Smith C, Milner JD, Leroux D, Boerkoel CF, Klion A, Carter MC, Groza T, Smedley D, Haendel MA, Mungall C, Robinson PN. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res 2020;47:D1018-D1027. [PMID: 30476213 PMCID: PMC6324074 DOI: 10.1093/nar/gky1105] [Citation(s) in RCA: 412] [Impact Index Per Article: 103.0] [Received: 09/17/2018] [Accepted: 10/24/2018] [Indexed: 12/12/2022]

Affiliation(s)

Sebastian Köhler Charité Centrum für Therapieforschung, Charité-Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin 10117, Germany; Einstein Center Digital Future, Berlin 10117, Germany; Monarch Initiative, monarchinitiative.org
Leigh Carmody Monarch Initiative, monarchinitiative.org; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Nicole Vasilevsky Monarch Initiative, monarchinitiative.org; Oregon Health & Science University, Portland, OR 97217, USA
Julius O B Jacobsen Monarch Initiative, monarchinitiative.org; Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
Daniel Danis Monarch Initiative, monarchinitiative.org; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Jean-Philippe Gourdine Monarch Initiative, monarchinitiative.org; Oregon Health & Science University, Portland, OR 97217, USA
Michael Gargano Monarch Initiative, monarchinitiative.org; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Nomi L Harris Monarch Initiative, monarchinitiative.org; Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Nicolas Matentzoglu Monarch Initiative, monarchinitiative.org; European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
Julie A McMurry Monarch Initiative, monarchinitiative.org; Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
David Osumi-Sutherland Monarch Initiative, monarchinitiative.org; European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
Valentina Cipriani Monarch Initiative, monarchinitiative.org; William Harvey Research Institute, Queen Mary University College of London; UCL Genetics Institute, University College of London; UCL Institute of Ophthalmology, University College of London
James P Balhoff Monarch Initiative, monarchinitiative.org; Renaissance Computing Institute, University of North Carolina at Chapel Hill
Tom Conlin Monarch Initiative, monarchinitiative.org; Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
Hannah Blau Monarch Initiative, monarchinitiative.org; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Gareth Baynam Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia; School of Paediatrics and Telethon Kids Institute, University of Western Australia, Perth, WA, Australia; Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA, Australia; Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia; The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
Richard Palmer Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia
Dylan Gratian Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia
Hugh Dawkins The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
Michael Segal SimulConsult, Chestnut Hill, MA, USA
Anna C Jansen Neurogenetics Research Group, Vrije Universiteit Brussel, Brussels, Belgium; Pediatric Neurology Unit, Department of Pediatrics, UZ Brussel, Brussels, Belgium
Ahmed Muaz Monarch Initiative, monarchinitiative.org; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
Willie H Chang Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
Jenna Bergerson National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Stanley J F Laulederkind Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin & Marquette University, 8701 Watertown Plank Road Milwaukee, WI 53226, USA
Zafer Yüksel Bioscientia GmbH, Ingelheim, Germany
Sergi Beltran CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
Alexandra F Freeman National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Panagiotis I Sergouniotis University of Manchester & Manchester Royal Eye Hospital, Manchester, UK
Daniel Durkin The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Andrea L Storm ICF, Rockville, MD, USA; National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
Marc Hanauer INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
Michael Brudno Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
Susan M Bello The Jackson Laboratory, Bar Harbor, ME, USA
Murat Sincan Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
Kayli Rageth Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
Matthew T Wheeler Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
Renske Oegema Department of Genetics, University Medical Center Utrecht, the Netherlands
Halima Lourghi INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
Maria G Della Rocca ICF, Rockville, MD, USA; National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
Rachel Thompson Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
Francisco Castellanos The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
James Priest Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
Charlotte Cunningham-Rundles Mount Sinai School of Medicine, New York, NY, USA
Ayushi Hegde The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Ruth C Lovering Institute of Cardiovascular Science, University College London, UK
Catherine Hajek Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
Annie Olry INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
Luigi Notarangelo National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Morgan Similuk National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Xingmin A Zhang Monarch Initiative, monarchinitiative.org; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
David Gómez-Andrés Child Neurology Unit. Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
Hanns Lochmüller CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain; Department of Neuropediatrics and Muscle Disorders, Medical Center-University of Freiburg, Faculty of Medicine, Freiburg, Germany; Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada; Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
Hélène Dollfus Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
Sergio Rosenzweig Immunology Service, Department of Laboratory Medicine, NIH Clinical Center, Bethesda, MD, USA
Shruti Marwaha Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
Ana Rath INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
Kathleen Sullivan Department of Pediatrics, Division of Allergy Immunology, The Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, 3615 Civic Center Boulevard, Philadelphia, PA 19104, USA
Cynthia Smith The Jackson Laboratory, Bar Harbor, ME, USA
Joshua D Milner National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Dorothée Leroux Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
Cornelius F Boerkoel Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
Amy Klion National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Melody C Carter National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Tudor Groza Monarch Initiative, monarchinitiative.org; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
Damian Smedley Monarch Initiative, monarchinitiative.org; Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
Melissa A Haendel Monarch Initiative, monarchinitiative.org; Oregon Health & Science University, Portland, OR 97217, USA; Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
Chris Mungall Monarch Initiative, monarchinitiative.org; Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Peter N Robinson Monarch Initiative, monarchinitiative.org; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA


Liu L, Huang X, Mamitsuka H, Zhu S. HPOLabeler: improving prediction of human protein–phenotype associations by learning to rank. Bioinformatics 2020;36:4180-4188. [DOI: 10.1093/bioinformatics/btaa284] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 04/05/2020] [Accepted: 04/30/2020] [Indexed: 12/23/2022] Open

Abstract

MOTIVATION

Annotating human proteins with abnormal phenotypes has become an important topic. The Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities encountered in human diseases. As of November 2019, fewer than 4000 proteins had been annotated with HPO terms. A computational approach for accurately predicting protein–HPO associations would therefore be important, yet no method outperformed a simple Naive approach in the second Critical Assessment of Functional Annotation, 2013–2014 (CAFA2).

RESULTS

We present HPOLabeler, which is able to use a wide variety of evidence, such as protein–protein interaction (PPI) networks, Gene Ontology, InterPro, trigram frequency and HPO term frequency, in the framework of learning to rank (LTR). LTR has proved to be powerful for solving large-scale, multi-label ranking problems in bioinformatics. Given an input protein, LTR outputs a ranked list of HPO terms from a series of input scores given to the candidate HPO terms by component learning models (logistic regression, nearest neighbor and a Naive method), which are trained on the multiple sources of evidence. We evaluated HPOLabeler extensively, mainly through two experiments, cross validation and temporal validation, in which HPOLabeler significantly outperformed all component models and competing methods, including the current state-of-the-art method. We further found that (i) PPI is the most informative for prediction among the diverse data sources and (ii) the low prediction performance in temporal validation might be caused by incomplete annotation of new proteins.

AVAILABILITY AND IMPLEMENTATION

http://issubmission.sjtu.edu.cn/hpolabeler/

CONTACT

zhusf@fudan.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Shen F, Peng S, Fan Y, Wen A, Liu S, Wang Y, Wang L, Liu H. HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology. J Biomed Inform 2019;96:103246. [PMID: 31255713 DOI: 10.1016/j.jbi.2019.103246] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2019] [Revised: 06/25/2019] [Accepted: 06/26/2019] [Indexed: 11/25/2022]

Abstract

BACKGROUND

In precision medicine, deep phenotyping is defined as the precise and comprehensive analysis of phenotypic abnormalities, aiming to acquire a better understanding of the natural history of a disease and its genotype-phenotype associations. Detecting phenotypic relevance is an important task when translating precision medicine into clinical practice, especially for patient stratification tasks based on deep phenotyping. In our previous work, we developed node embeddings for the Human Phenotype Ontology (HPO) to assist in phenotypic relevance measurement incorporating distributed semantic representations. However, the derived HPO embeddings hold only distributed representations for IS-A relationships among nodes, hampering the ability to fully explore the graph.

METHODS

In this study, we developed a framework, HPO2Vec+, to enrich the produced HPO embeddings with heterogeneous knowledge resources (i.e., DECIPHER, OMIM, and Orphanet) for detecting phenotypic relevance. Specifically, we parsed disease-phenotype associations contained in these three resources to enrich non-inheritance relationships among phenotypic nodes in the HPO. To generate node embeddings for the HPO, node2vec was applied to perform node sampling on the enriched HPO graphs based on random walks, followed by feature learning over the sampled nodes to generate enriched node embeddings. Four HPO embeddings were generated based on different graph structures, which we hereafter label as HPOEmb-Original, HPOEmb-DECIPHER, HPOEmb-OMIM, and HPOEmb-Orphanet. We evaluated the derived embeddings quantitatively through an HPO link prediction task with four edge embedding operations and six machine learning algorithms. The resulting best embeddings were then evaluated for patient stratification of 10 rare diseases using electronic health records (EHR) collected at Mayo Clinic. We assessed our framework qualitatively by visualizing phenotypic clusters and conducting a use case study on primary hyperoxaluria (PH), a rare disease, on the task of inferring relevant phenotypes given 22 annotated PH-related phenotypes.
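
The abstract does not say which four edge embedding operations were used. As an illustration only, the sketch below assumes the four binary operators commonly paired with node2vec for link prediction (average, Hadamard, weighted-L1, weighted-L2). The function names, the toy node identifiers, and the two-dimensional vectors are hypothetical and are not taken from HPO2Vec+.

```python
from typing import Callable, Dict, List

Vector = List[float]


def average(u: Vector, v: Vector) -> Vector:
    """Element-wise mean of the two node embeddings."""
    return [(a + b) / 2.0 for a, b in zip(u, v)]


def hadamard(u: Vector, v: Vector) -> Vector:
    """Element-wise product of the two node embeddings."""
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u: Vector, v: Vector) -> Vector:
    """Element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u: Vector, v: Vector) -> Vector:
    """Element-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


# Assumed operator set; the source only states that four operations were used.
EDGE_OPERATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "average": average,
    "hadamard": hadamard,
    "weighted_l1": weighted_l1,
    "weighted_l2": weighted_l2,
}


def edge_features(embeddings: Dict[str, Vector], source: str, target: str,
                  operator: str = "hadamard") -> Vector:
    """Build the feature vector for a candidate edge from two node embeddings."""
    return EDGE_OPERATORS[operator](embeddings[source], embeddings[target])


# Hypothetical toy embeddings for two phenotype nodes (not real HPO identifiers).
toy_embeddings = {"node_a": [1.0, 2.0], "node_b": [3.0, 0.5]}
```

In a link prediction setup of this kind, the resulting edge vector would be passed to a binary classifier that decides whether the edge exists; the classifier itself is outside the scope of this sketch.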

RESULTS

The quantitative link prediction task shows that HPOEmb-Orphanet achieved an optimal AUROC of 0.92 and an average precision of 0.94. In addition, HPOEmb-Orphanet achieved an optimal F1 score of 0.86. The quantitative patient similarity measurement task indicates that HPOEmb-Orphanet achieved the highest average detection rate for similar patients over 10 rare diseases and performed better than other similarity measures implemented by an existing tool, HPOSim, especially for pairwise patients with fewer shared common phenotypes. The qualitative evaluation shows that the enriched HPO embeddings are generally able to detect relationships among nodes with fine granularity and HPOEmb-Orphanet is particularly good at associating phenotypes across different disease systems. For the use case of detecting relevant phenotypic characterizations for given PH related phenotypes, HPOEmb-Orphanet outperformed the other three HPO embeddings by achieving the highest average P@5 of 0.81 and the highest P@10 of 0.79. Compared to seven conventional similarity measurements provided by HPOSim, HPOEmb-Orphanet is able to detect more relevant phenotypic pairs, especially for pairs not in inheritance relationships.
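
For readers unfamiliar with the P@5 and P@10 figures, the following is a minimal sketch of precision at k under its standard definition: the fraction of the top k ranked items that are relevant. The function name and the toy phenotype labels are hypothetical; how the authors judged relevance is not described in this abstract.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set: Set[str] = set(relevant)
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant_set)
    # Divide by k (not len(top_k)) so that short rankings are penalized.
    return hits / k


# Hypothetical ranking of candidate phenotypes for one query phenotype.
toy_ranking = ["p1", "p2", "p3", "p4", "p5"]
toy_relevant = {"p1", "p3", "p4", "p9"}
toy_p_at_5 = precision_at_k(toy_ranking, toy_relevant, 5)
```

An average P@5 such as the one reported would then be the mean of this value over all query phenotypes.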

CONCLUSION

We drew the following conclusions based on the evaluation results. First, with additional non-inheritance edges, enriched HPO embeddings can detect more associations between fine-granularity phenotypic nodes regardless of their topological structures in the HPO graph. Second, HPOEmb-Orphanet can not only achieve the optimal performance in link prediction and patient stratification based on phenotypic similarity, but is also able to detect relevant phenotypes closer to domain experts' judgments than other embeddings and conventional similarity measurements. Third, incorporating heterogeneous knowledge resources does not necessarily result in better performance for detecting relevant phenotypes. From a clinical perspective, in our use case study, clinical-oriented knowledge resources (e.g., Orphanet) can achieve better performance in detecting relevant phenotypic characterizations compared to biomedical-oriented knowledge resources (e.g., DECIPHER and OMIM).
