Reference Citation Analysis

For: Jain A, Kihara D. Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences. Bioinformatics 2019;35:753-759. [PMID: 30165572 DOI: 10.1093/bioinformatics/bty704] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/05/2018] [Revised: 07/30/2018] [Accepted: 08/23/2018] [Indexed: 02/03/2023] Open

Cited by Other Article(s)

Panda P, Giri SJ, Sherman LA, Kihara D, Aryal UK. Proteomic changes orchestrate metabolic acclimation of a unicellular diazotrophic cyanobacterium during light-dark cycle and nitrogen fixation states. bioRxiv 2024:2024.07.30.605809. [PMID: 39131303 PMCID: PMC11312527 DOI: 10.1101/2024.07.30.605809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]

Panda P, Giri SJ, Sherman L, Kihara D, Aryal UK. Proteomic analysis of unicellular cyanobacterium Crocosphaera subtropica ATCC 51142 under extended light or dark growth. bioRxiv 2024:2024.07.29.605499. [PMID: 39131394 PMCID: PMC11312443 DOI: 10.1101/2024.07.29.605499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]

Ulusoy E, Doğan T. Mutual annotation-based prediction of protein domain functions with Domain2GO. Protein Sci 2024;33:e4988. [PMID: 38757367 PMCID: PMC11099699 DOI: 10.1002/pro.4988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/25/2024] [Accepted: 03/30/2024] [Indexed: 05/18/2024]

Abstract

Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. 
Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.

Giri SJ, Ibtehaz N, Kihara D. GO2Sum: generating human-readable functional summary of proteins from GO terms. NPJ Syst Biol Appl 2024;10:29. [PMID: 38491038 PMCID: PMC10943200 DOI: 10.1038/s41540-024-00358-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024] Open

Bukhman YV, Morin PA, Meyer S, Chu LF, Jacobsen JK, Antosiewicz-Bourget J, Mamott D, Gonzales M, Argus C, Bolin J, Berres ME, Fedrigo O, Steill J, Swanson SA, Jiang P, Rhie A, Formenti G, Phillippy AM, Harris RS, Wood JMD, Howe K, Kirilenko BM, Munegowda C, Hiller M, Jain A, Kihara D, Johnston JS, Ionkov A, Raja K, Toh H, Lang A, Wolf M, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography. Mol Biol Evol 2024;41:msae036. [PMID: 38376487 PMCID: PMC10919930 DOI: 10.1093/molbev/msae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 01/11/2024] [Accepted: 01/22/2024] [Indexed: 02/21/2024] Open

Affiliation(s)

Yury V Bukhman Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Phillip A Morin Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
Susanne Meyer Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
Li-Fang Chu Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA Department of Comparative Biology and Experimental Medicine, University of Calgary, Calgary, Canada
Jeff K Jacobsen V.E. Enterprises, Arcata, CA, USA
Jessica Antosiewicz-Bourget Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Daniel Mamott Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Maylie Gonzales Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
Cara Argus Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Jennifer Bolin Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Mark E Berres University of Wisconsin Biotechnology Center, Bioinformatics Resource Center, University of Wisconsin - Madison, Madison, WI 53706, USA
Olivier Fedrigo Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
John Steill Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Scott A Swanson Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Peng Jiang Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH, USA Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH, USA Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
Arang Rhie Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
Giulio Formenti Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
Adam M Phillippy Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
Robert S Harris Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
Jonathan M D Wood Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
Bogdan M Kirilenko LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany Senckenberg Research Institute, 60325 Frankfurt, Germany Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
Chetan Munegowda LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany Senckenberg Research Institute, 60325 Frankfurt, Germany Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
Michael Hiller LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany Senckenberg Research Institute, 60325 Frankfurt, Germany Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
Aashish Jain Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
Daisuke Kihara Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
J Spencer Johnston Department of Entomology, Texas A&M University, College Station, TX 77843, USA
Alexander Ionkov Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Kalpana Raja Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
Huishi Toh Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
Aimee Lang Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
Magnus Wolf Institute for Evolution and Biodiversity (IEB), University of Muenster, 48149, Muenster, Germany Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
Erich D Jarvis Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
James A Thomson Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI 53726, USA
Mark J P Chaisson Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA
Ron Stewart Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA

Bukhman YV, Meyer S, Chu LF, Abueg L, Antosiewicz-Bourget J, Balacco J, Brecht M, Dinatale E, Fedrigo O, Formenti G, Fungtammasan A, Giri SJ, Hiller M, Howe K, Kihara D, Mamott D, Mountcastle J, Pelan S, Rabbani K, Sims Y, Tracey A, Wood JMD, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. Chromosome level genome assembly of the Etruscan shrew Suncus etruscus. Sci Data 2024;11:176. [PMID: 38326333 PMCID: PMC10850158 DOI: 10.1038/s41597-024-03011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024] Open

Affiliation(s)

Yury V Bukhman Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA.
Susanne Meyer Neuroscience Research Institute, University of California - Santa Barbara, 494 UCEN Rd, Isla Vista, CA, 93117, USA
Li-Fang Chu Department of Comparative Biology and Experimental Medicine, University of Calgary, 2500 University Drive NW, Calgary, Alberta, T2N 1N4, Canada
Linelle Abueg Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Jessica Antosiewicz-Bourget Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
Jennifer Balacco Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Michael Brecht BCCN/Humboldt University Berlin, Philippstr, 13 House 6, 10115, Berlin, Germany
Erica Dinatale Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, 72076, Tübingen, Germany
Olivier Fedrigo Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Giulio Formenti Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
Arkarachai Fungtammasan DNAnexus Inc., 1975 W El Camino Real, Mountain View, CA, 94040, USA
Swagarika Jaharlal Giri Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA
Michael Hiller LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Daisuke Kihara Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA Department of Biological Sciences, Purdue University, 249 S. Martin Jischke Dr., West Lafayette, IN, 47907, USA
Daniel Mamott Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
Jacquelyn Mountcastle Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Sarah Pelan Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Keon Rabbani Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
Ying Sims Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Alan Tracey Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Jonathan M D Wood Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Erich D Jarvis Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
James A Thomson Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA, 93106, USA Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI, 53726, USA
Mark J P Chaisson Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
Ron Stewart Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA

Giri SJ, Ibtehaz N, Kihara D. GO2Sum: Generating Human Readable Functional Summary of Proteins from GO Terms. bioRxiv 2023:2023.11.10.566665. [PMID: 38014080 PMCID: PMC10680659 DOI: 10.1101/2023.11.10.566665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023;6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023] Open

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations. bioRxiv 2023:2023.08.23.554486. [PMID: 37662252 PMCID: PMC10473699 DOI: 10.1101/2023.08.23.554486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]

Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023] Open

Song J, Sun J, Wang Y, Ding Y, Zhang S, Ma X, Chang F, Fan B, Liu H, Bao C, Meng W. CeRNA network identified hsa-miR-17-5p, hsa-miR-106a-5p and hsa-miR-2355-5p as potential diagnostic biomarkers for tuberculosis. Medicine (Baltimore) 2023;102:e33117. [PMID: 36930090 PMCID: PMC10019109 DOI: 10.1097/md.0000000000033117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 02/08/2023] [Indexed: 03/18/2023] Open

Abstract

This study aims to analyze the regulatory non-coding RNAs in the pathological process of tuberculosis (TB) and to identify novel diagnostic biomarkers. A longitudinal study was conducted in 5 newly diagnosed pulmonary tuberculosis patients; peripheral blood samples were collected before and after 6 months of anti-TB treatment. After whole transcriptome sequencing, the differentially expressed RNAs (DE RNAs) were filtered with |log2(fold change)| > log2(1.5) and P value < .05 as screening criteria. Functional annotation was then performed by gene ontology enrichment analysis, and pathway enrichment analysis was conducted using the Kyoto Encyclopedia of Genes and Genomes database. Finally, the competitive endogenous RNA (ceRNA) regulatory network was established according to the interaction of ceRNA pairs and miRNA-mRNA pairs. Five young women were recruited and completed this study. Based on the differential expression analysis, a total of 1469 mRNAs, 996 long non-coding RNAs, 468 circular RNAs, and 86 miRNAs were identified as DE RNAs. Functional annotation demonstrated that those DE-mRNAs were strongly involved in the cellular process (n = 624), metabolic process (n = 513), single-organism process (n = 505), cell (n = 651), cell part (n = 650), organelle (n = 569), and binding (n = 629). Enrichment pathway analysis revealed that the differentially expressed genes were mainly enriched in HTLV-I infection, T cell receptor signaling pathway, glycosaminoglycan biosynthesis-heparan sulfate/heparin, and Hippo signaling pathway. CeRNA networks revealed that hsa-miR-17-5p, hsa-miR-106a-5p and hsa-miR-2355-5p might be regarded as potential diagnostic biomarkers for TB. Immunomodulation-related genes are differentially expressed in TB patients, and hsa-miR-106a-5p, hsa-miR-17-5p, hsa-miR-2355-5p might serve as potential diagnostic biomarkers.

Machine learning for the identification of respiratory viral attachment machinery from sequences data. PLoS One 2023;18:e0281642. [PMID: 36862685 PMCID: PMC9980812 DOI: 10.1371/journal.pone.0281642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 01/27/2023] [Indexed: 03/03/2023] Open

Yan TC, Yue ZX, Xu HQ, Liu YH, Hong YF, Chen GX, Tao L, Xie T. A systematic review of state-of-the-art strategies for machine learning-based protein function prediction. Comput Biol Med 2023;154:106446. [PMID: 36680931 DOI: 10.1016/j.compbiomed.2022.106446] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/22/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]

Efremenko E, Aslanli A, Lyagin I. Advanced Situation with Recombinant Toxins: Diversity, Production and Application Purposes. Int J Mol Sci 2023;24:ijms24054630. [PMID: 36902061 PMCID: PMC10003545 DOI: 10.3390/ijms24054630] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/30/2022] [Revised: 02/14/2023] [Accepted: 02/20/2023] [Indexed: 03/04/2023] Open

Suleman MT, Khan YD. m1A-pred: Prediction of Modified 1-methyladenosine Sites in RNA Sequences through Artificial Intelligence. Comb Chem High Throughput Screen 2022;25:2473-2484. [PMID: 35718969 DOI: 10.2174/1386207325666220617152743] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/20/2022] [Revised: 04/06/2022] [Accepted: 04/11/2022] [Indexed: 01/27/2023]

Toh H, Yang C, Formenti G, Raja K, Yan L, Tracey A, Chow W, Howe K, Bergeron LA, Zhang G, Haase B, Mountcastle J, Fedrigo O, Fogg J, Kirilenko B, Munegowda C, Hiller M, Jain A, Kihara D, Rhie A, Phillippy AM, Swanson SA, Jiang P, Clegg DO, Jarvis ED, Thomson JA, Stewart R, Chaisson MJP, Bukhman YV. A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes. BMC Biol 2022;20:245. [DOI: 10.1186/s12915-022-01427-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 09/29/2022] [Indexed: 11/09/2022] Open

Suleman MT, Alkhalifah T, Alturise F, Khan YD. DHU-Pred: accurate prediction of dihydrouridine sites using position and composition variant features on diverse classifiers. PeerJ 2022;10:e14104. [PMID: 36320563 PMCID: PMC9618264 DOI: 10.7717/peerj.14104] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2022] [Accepted: 09/01/2022] [Indexed: 01/21/2023] Open

Fenoy E, Edera AA, Stegmayer G. Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks. Brief Bioinform 2022;23:6618242. [PMID: 35758229 DOI: 10.1093/bib/bbac232] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/25/2022] [Revised: 05/13/2022] [Accepted: 05/18/2022] [Indexed: 11/13/2022] Open

Abstract

A representation method is an algorithm that calculates numerical feature vectors for samples in a dataset. Such vectors, also known as embeddings, define a relatively low-dimensional space able to efficiently encode high-dimensional data. Very recently, many types of learned data representations based on machine learning have appeared and are being applied to several tasks in bioinformatics. In particular, protein representation learning methods integrate different types of protein information (sequence, domains, etc.), in supervised or unsupervised learning approaches, and provide embeddings of protein sequences that can be used for downstream tasks. One task of special interest is the automatic function prediction of the huge number of novel proteins that are being discovered nowadays and are still totally uncharacterized. However, despite its importance, to date there has been no fair benchmark study of the predictive performance of existing proposals on the same large set of proteins and for very concrete and common bioinformatics tasks. This lack of benchmark studies prevents the community from using adequate predictive methods for accelerating the functional characterization of proteins. In this study, we performed a detailed comparison of protein sequence representation learning methods, explaining each approach and comparing them with an experimental benchmark on several bioinformatics tasks: (i) determining protein sequence similarity in the embedding space; (ii) inferring protein domains; and (iii) predicting ontology-based protein functions. We examine the advantages and disadvantages of each representation approach over the benchmark results. We hope the results and the discussion of this study can help the community to select the most adequate machine learning-based technique for protein representation according to the bioinformatics task at hand.

Kagaya Y, Flannery ST, Jain A, Kihara D. ContactPFP: Protein Function Prediction Using Predicted Contact Information. Front Bioinform 2022;2. [PMID: 35875419 PMCID: PMC9302406 DOI: 10.3389/fbinf.2022.896295] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022] Open

Abstract

Computational function prediction is one of the most important problems in bioinformatics as elucidating the function of genes is a central task in molecular biology and genomics. Most of the existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most available information for query proteins. There are attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationship of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure is not experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contact as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared to contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.

Reijnders MJMF, Waterhouse RM. CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation. PLoS Comput Biol 2022;18:e1010075. [PMID: 35560159 PMCID: PMC9132264 DOI: 10.1371/journal.pcbi.1010075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2021] [Revised: 05/25/2022] [Accepted: 04/04/2022] [Indexed: 11/29/2022] Open

Abstract

Characterising gene function for the ever-increasing number and diversity of species with annotated genomes relies almost entirely on computational prediction methods. These software are also numerous and diverse, each with different strengths and weaknesses as revealed through community benchmarking efforts. Meta-predictors that assess consensus and conflict from individual algorithms should deliver enhanced functional annotations. To exploit the benefits of meta-approaches, we developed CrowdGO, an open-source consensus-based Gene Ontology (GO) term meta-predictor that employs machine learning models with GO term semantic similarities and information contents. By re-evaluating each gene-term annotation, a consensus dataset is produced with high-scoring confident annotations and low-scoring rejected annotations. Applying CrowdGO to results from a deep learning-based, a sequence similarity-based, and two protein domain-based methods, delivers consensus annotations with improved precision and recall. Furthermore, using standard evaluation measures CrowdGO performance matches that of the community’s best performing individual methods. CrowdGO therefore offers a model-informed approach to leverage strengths of individual predictors and produce comprehensive and accurate gene functional annotations.

New technologies mean that we are able to read the genetic blueprints in the form of complete genome sequences from many different species. We are also able to use computational methods combined with evidence from experiments to map out the locations in the genomes of many thousands of genes and other important regions. However, discovering and characterising the biological functions of all these genes and their protein products requires considerably more experimental work. In order to gain insights into the possible functions of the many genes currently lacking functional information from experiments we must therefore rely on methods that computationally predict protein functions. Many different software tools have been developed to tackle this challenge, each with their own strengths and weaknesses as shown by several community-based competitions that assess the performance of the predictors. Taking advantage of powerful modern machine learning techniques, we developed CrowdGO, a new software that aims to combine predictions from several tools and produce comprehensive and accurate gene functional annotations. CrowdGO is able to computationally assess agreements and conflicts amongst annotations from different predictors to then re-evaluate the results and deliver enhanced predictions of protein functions.

Learning functional properties of proteins with language models. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00457-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022]

Yousafi Q, Sarfaraz A, Saad Khan M, Saleem S, Shahzad U, Abbas Khan A, Sadiq M, Ditta Abid A, Sohail Shahzad M, ul Hassan N. In silico annotation of unreviewed acetylcholinesterase (AChE) in some lepidopteran insect pest species reveals the causes of insecticide resistance. Saudi J Biol Sci 2021;28:2197-2209. [PMID: 33911936 PMCID: PMC8071828 DOI: 10.1016/j.sjbs.2021.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/04/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 02/07/2023] Open

Makrodimitris S, van Ham RCHJ, Reinders MJT. Automatic Gene Function Prediction in the 2020's. Genes (Basel) 2020;11:E1264. [PMID: 33120976 PMCID: PMC7692357 DOI: 10.3390/genes11111264] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/28/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 02/06/2023] Open

de Witt RN, Kroukamp H, Van Zyl WH, Paulsen IT, Volschenk H. QTL analysis of natural Saccharomyces cerevisiae isolates reveals unique alleles involved in lignocellulosic inhibitor tolerance. FEMS Yeast Res 2020;19:5528620. [PMID: 31276593 DOI: 10.1093/femsyr/foz047] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/02/2019] [Accepted: 07/03/2019] [Indexed: 12/13/2022] Open

You R, Yao S, Xiong Y, Huang X, Sun F, Mamitsuka H, Zhu S. NetGO: improving large-scale protein function prediction with massive network information. Nucleic Acids Res 2020;47:W379-W387. [PMID: 31106361 PMCID: PMC6602452 DOI: 10.1093/nar/gkz388] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 03/04/2019] [Revised: 04/24/2019] [Accepted: 05/01/2019] [Indexed: 01/19/2023] Open

Affiliation(s)

Ronghui You: School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
Shuwei Yao: School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
Yi Xiong: Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University
Xiaodi Huang: School of Computing and Mathematics, Charles Sturt University, Albury, NSW 2640, Australia
Fengzhu Sun: Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
Hiroshi Mamitsuka: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan; Department of Computer Science, Aalto University, Espoo and Helsinki, Finland
Shanfeng Zhu: School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China

NNTox: Gene Ontology-Based Protein Toxicity Prediction Using Neural Network. Sci Rep 2019;9:17923. [PMID: 31784686 PMCID: PMC6884647 DOI: 10.1038/s41598-019-54405-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/08/2019] [Accepted: 11/13/2019] [Indexed: 12/23/2022] Open

Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019;21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022] Open