Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 2009;74:566-82. [PMID: 18655063 DOI: 10.1002/prot.22172] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022]

Cited by Other Article(s)

Oladipo EK, Ojo TO, Elegbeleye OE, Bolaji OQ, Oyewole MP, Ogunlana AT, Olalekan EO, Abiodun B, Adediran DA, Obideyi OA, Olufemi SE, Salamatullah AM, Bourhia M, Younous YA, Adelusi TI. Exploring the nuclear proteins, viral capsid protein, and early antigen protein using immunoinformatic and molecular modeling approaches to design a vaccine candidate against Epstein Barr virus. Sci Rep 2024;14:16798. [PMID: 39039173 PMCID: PMC11263613 DOI: 10.1038/s41598-024-66828-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 07/04/2024] [Indexed: 07/24/2024]

Affiliation(s)

Elijah Kolawole Oladipo: Division of Vaccine Design and Development, Helix Biogen Institute, Ogbomoso, 210214, Nigeria; Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, 232104, Nigeria
Taiwo Ooreoluwa Ojo: Division of Vaccine Design and Development, Helix Biogen Institute, Ogbomoso, 210214, Nigeria; Computational Biology and Drug Discovery Laboratory, Department of Biochemistry, Ladoke Akintola University of Technology (LAUTECH), Ogbomoso, 210214, Nigeria
Oluwabamise Emmanuel Elegbeleye: Computational Biology and Drug Discovery Laboratory, Department of Biochemistry, Ladoke Akintola University of Technology (LAUTECH), Ogbomoso, 210214, Nigeria
Olawale Quadri Bolaji: Computational Biology and Drug Discovery Laboratory, Department of Biochemistry, Ladoke Akintola University of Technology (LAUTECH), Ogbomoso, 210214, Nigeria
Moyosoluwa Precious Oyewole: Division of Vaccine Design and Development, Helix Biogen Institute, Ogbomoso, 210214, Nigeria; Department of Biochemistry, Bowen University, Iwo, 232101, Nigeria
Abdeen Tunde Ogunlana: Institute of Advanced Medical Research and Training (IAMRAT), College of Medicine, University of Ibadan, Ibadan, 200005, Nigeria
Emmanuel Obanijesu Olalekan: Computational Biology and Drug Discovery Laboratory, Department of Biochemistry, Ladoke Akintola University of Technology (LAUTECH), Ogbomoso, 210214, Nigeria
Bamidele Abiodun: Computational Biology and Drug Discovery Laboratory, Department of Biochemistry, Ladoke Akintola University of Technology (LAUTECH), Ogbomoso, 210214, Nigeria
Daniel Adewole Adediran: Division of Vaccine Design and Development, Helix Biogen Institute, Ogbomoso, 210214, Nigeria
Oluwaseun Adeola Obideyi: Division of Vaccine Design and Development, Helix Biogen Institute, Ogbomoso, 210214, Nigeria
Seun Elijah Olufemi: Division of Vaccine Design and Development, Helix Biogen Institute, Ogbomoso, 210214, Nigeria
Ahmad Mohammad Salamatullah: Department of Food Science and Nutrition, College of Food and Agricultural Sciences, King Saud University, 11, P.O. Box 2460, 11451, Riyadh, Saudi Arabia
Mohammed Bourhia: Laboratory of Therapeutic and Organic Chemistry, Faculty of Pharmacy, University of Montpellier, Montpellier, 34000, France
Youssouf Ali Younous: Evangelical College, BP 1200, N'Djamena, Chad
Temitope Isaac Adelusi: Computational Biology and Drug Discovery Laboratory, Department of Biochemistry, Ladoke Akintola University of Technology (LAUTECH), Ogbomoso, 210214, Nigeria; Department of Surgery, School of Medicine, University of Connecticut Health, Farmington Ave, Farmington, CT, 06030, USA

Zhang C, Freddolino L. A large-scale assessment of sequence database search tools for homology-based protein function prediction. Brief Bioinform 2024;25:bbae349. [PMID: 39038936 PMCID: PMC11262835 DOI: 10.1093/bib/bbae349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 06/03/2024] [Accepted: 07/05/2024] [Indexed: 07/24/2024]

Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024;124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]

Giri SJ, Ibtehaz N, Kihara D. GO2Sum: generating human-readable functional summary of proteins from GO terms. NPJ Syst Biol Appl 2024;10:29. [PMID: 38491038 PMCID: PMC10943200 DOI: 10.1038/s41540-024-00358-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024]

Bukhman YV, Meyer S, Chu LF, Abueg L, Antosiewicz-Bourget J, Balacco J, Brecht M, Dinatale E, Fedrigo O, Formenti G, Fungtammasan A, Giri SJ, Hiller M, Howe K, Kihara D, Mamott D, Mountcastle J, Pelan S, Rabbani K, Sims Y, Tracey A, Wood JMD, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. Chromosome level genome assembly of the Etruscan shrew Suncus etruscus. Sci Data 2024;11:176. [PMID: 38326333 PMCID: PMC10850158 DOI: 10.1038/s41597-024-03011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024]

Affiliation(s)

Yury V Bukhman: Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
Susanne Meyer: Neuroscience Research Institute, University of California - Santa Barbara, 494 UCEN Rd, Isla Vista, CA, 93117, USA
Li-Fang Chu: Department of Comparative Biology and Experimental Medicine, University of Calgary, 2500 University Drive NW, Calgary, Alberta, T2N 1N4, Canada
Linelle Abueg: Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Jessica Antosiewicz-Bourget: Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
Jennifer Balacco: Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Michael Brecht: BCCN/Humboldt University Berlin, Philippstr, 13 House 6, 10115, Berlin, Germany
Erica Dinatale: Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, 72076, Tübingen, Germany
Olivier Fedrigo: Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Giulio Formenti: Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
Arkarachai Fungtammasan: DNAnexus Inc., 1975 W El Camino Real, Mountain View, CA, 94040, USA
Swagarika Jaharlal Giri: Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA
Michael Hiller: LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany; Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany; Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany
Kerstin Howe: Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Daisuke Kihara: Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA; Department of Biological Sciences, Purdue University, 249 S. Martin Jischke Dr., West Lafayette, IN, 47907, USA
Daniel Mamott: Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
Jacquelyn Mountcastle: Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
Sarah Pelan: Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Keon Rabbani: Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
Ying Sims: Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Alan Tracey: Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Jonathan M D Wood: Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
Erich D Jarvis: Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA; Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
James A Thomson: Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA; Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA, 93106, USA; Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI, 53726, USA
Mark J P Chaisson: Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
Ron Stewart: Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA

Barik TK, Swain SN, Sahu SK, Acharya UR, Metz HC, Rasgon JL. In Silico Characterization of Intracellular Localization Signals and Structural Features of Mosquito Densovirus (MDV) Viral Proteins. bioRxiv 2023:2023.12.13.571551. [PMID: 38168177 PMCID: PMC10760122 DOI: 10.1101/2023.12.13.571551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]

Giri SJ, Ibtehaz N, Kihara D. GO2Sum: Generating Human Readable Functional Summary of Proteins from GO Terms. bioRxiv 2023:2023.11.10.566665. [PMID: 38014080 PMCID: PMC10680659 DOI: 10.1101/2023.11.10.566665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023;6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations. bioRxiv 2023:2023.08.23.554486. [PMID: 37662252 PMCID: PMC10473699 DOI: 10.1101/2023.08.23.554486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]

Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023]

Rahman A, Sarker MT, Islam MA, Hossain MU, Hasan M, Susmi TF. Targeting Essential Hypothetical Proteins of Pseudomonas aeruginosa PAO1 for Mining of Novel Therapeutics: An In Silico Approach. Biomed Res Int 2023;2023:1787485. [PMID: 37090194 PMCID: PMC10119676 DOI: 10.1155/2023/1787485] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/26/2022] [Revised: 01/24/2023] [Accepted: 02/06/2023] [Indexed: 04/25/2023]

Ranjan A, Fahad MS, Deepak A. λ-Scaled-attention: A novel fast attention mechanism for efficient modeling of protein sequences. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]

Kagaya Y, Flannery ST, Jain A, Kihara D. ContactPFP: Protein Function Prediction Using Predicted Contact Information. Front Bioinform 2022;2. [PMID: 35875419 PMCID: PMC9302406 DOI: 10.3389/fbinf.2022.896295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]

Abstract

Computational function prediction is one of the most important problems in bioinformatics as elucidating the function of genes is a central task in molecular biology and genomics. Most of the existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most available information for query proteins. There are attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationship of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure is not experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contact as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared to contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.

Integrated bioinformatics based subtractive genomics approach to decipher the therapeutic function of hypothetical proteins from Salmonella typhi XDR H-58 strain. Biotechnol Lett 2022;44:279-298. [PMID: 35037232 PMCID: PMC8761513 DOI: 10.1007/s10529-021-03219-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/13/2021] [Accepted: 12/12/2021] [Indexed: 11/21/2022]

Saxena R, Bishnoi R, Singla D. Gene Ontology: application and importance in functional annotation of the genomic data. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00015-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Törönen P, Holm L. PANNZER-A practical tool for protein function prediction. Protein Sci 2022;31:118-128. [PMID: 34562305 PMCID: PMC8740830 DOI: 10.1002/pro.4193] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 08/03/2021] [Revised: 09/22/2021] [Accepted: 09/22/2021] [Indexed: 01/03/2023]

Vu TTD, Jung J. Protein function prediction with gene ontology: from traditional to deep learning models. PeerJ 2021;9:e12019. [PMID: 34513334 PMCID: PMC8395570 DOI: 10.7717/peerj.12019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/13/2021] [Accepted: 07/29/2021] [Indexed: 11/25/2022]

Abad-Navarro F, Quesada-Martínez M, Duque-Ramos A, Fernández-Breis JT. Analysis of readability and structural accuracy in SNOMED CT. BMC Med Inform Decis Mak 2020;20:284. [PMID: 33319711 PMCID: PMC7737250 DOI: 10.1186/s12911-020-01291-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Accepted: 10/13/2020] [Indexed: 11/18/2022]

Abstract

Background

The increasing adoption of ontologies in biomedical research and the growing number of ontologies available have made it necessary to assure the quality of these resources. Most of the well-established ontologies, such as the Gene Ontology or SNOMED CT, have their own quality assurance processes. These have demonstrated their usefulness for the maintenance of the resources but are unable to detect all of the modelling flaws in the ontologies. Consequently, the development of efficient and effective quality assurance methods is needed.

Methods

Here, we propose a series of quantitative metrics, based on processing the lexical regularities in the content of the ontology, to analyse readability and structural accuracy. The readability metrics account for the ratio of labels, descriptions, and synonyms associated with the ontology entities. The structural accuracy metrics evaluate how well two ontology modelling best practices are followed: (1) lexically suggest, logically define (LSLD), that is, whether what is expressed in natural language for humans is also available as logical axioms for machines; and (2) systematic naming, which accounts for the amount of label content shared by the classes in a given taxonomy.

Results

We applied the metrics to different versions of SNOMED CT. Both readability and structural accuracy metrics remained stable in time but could capture some changes in the modelling decisions in SNOMED CT. The value of the LSLD metric increased from 0.27 to 0.31, and the value of the systematic naming metric was around 0.17. We analysed the readability and structural accuracy in the SNOMED CT July 2019 release. The results showed that the fulfilment of the structural accuracy criteria varied among the SNOMED CT hierarchies. The value of the metrics for the hierarchies was in the range of 0–0.92 (LSLD) and 0.08–1 (systematic naming). We also identified the cases that did not meet the best practices.

Conclusions

We generated useful information about the engineering of the ontology, making the following contributions: (1) a set of readability metrics, (2) the use of lexical regularities to define structural accuracy metrics, and (3) the generation of quality assurance information for SNOMED CT.

Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. Int J Mol Sci 2020;21:ijms21082873. [PMID: 32326049 PMCID: PMC7216093 DOI: 10.3390/ijms21082873] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 03/18/2020] [Revised: 04/16/2020] [Accepted: 04/18/2020] [Indexed: 01/15/2023]

Khan IK, Jain A, Rawi R, Bensmail H, Kihara D. Prediction of protein group function by iterative classification on functional relevance network. Bioinformatics 2020;35:1388-1394. [PMID: 30192921 DOI: 10.1093/bioinformatics/bty787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/07/2018] [Revised: 08/28/2018] [Accepted: 09/04/2018] [Indexed: 11/14/2022]

Jain A, Kihara D. Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences. Bioinformatics 2019;35:753-759. [PMID: 30165572 DOI: 10.1093/bioinformatics/bty704] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/05/2018] [Revised: 07/30/2018] [Accepted: 08/23/2018] [Indexed: 02/03/2023]

NNTox: Gene Ontology-Based Protein Toxicity Prediction Using Neural Network. Sci Rep 2019;9:17923. [PMID: 31784686 PMCID: PMC6884647 DOI: 10.1038/s41598-019-54405-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/08/2019] [Accepted: 11/13/2019] [Indexed: 12/23/2022]

Imam N, Alam A, Ali R, Siddiqui MF, Ali S, Malik MZ, Ishrat R. In silico characterization of hypothetical proteins from Orientia tsutsugamushi str. Karp uncovers virulence genes. Heliyon 2019;5:e02734. [PMID: 31720472 PMCID: PMC6838952 DOI: 10.1016/j.heliyon.2019.e02734] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/05/2019] [Revised: 04/29/2019] [Accepted: 10/23/2019] [Indexed: 11/20/2022]

Abstract

Scrub typhus, also known as bush typhus, is a disease with symptoms similar to Chikungunya infection. It is caused by the gram-negative bacterium Orientia tsutsugamushi, which resides in its host, mites. The genome of Orientia tsutsugamushi str. Karp encodes 1,563 proteins, of which 344 are characterized as hypothetical. In the present study, we tried to identify the probable functions of these 344 hypothetical proteins (HPs). All the characterized HPs belong to various protein classes, such as enzymes, transporters, and binding proteins, or are associated with metabolic processes, catalytic activity, and kinase activity. These HPs were further analyzed for virulence factors, with 62 proteins identified as the most virulent among them. In addition, we studied the protein sequence similarity network to visualize functional trends across protein superfamilies in the context of sequence similarity; it shows great potential for generating testable hypotheses about protein structure-function relationships. Furthermore, we calculated topological properties of the network and found them to obey network power law distributions showing a fractal nature. We also identified two highly interconnected modules in the main network, which contained five hub proteins (KJV55465, KJV56211, KJV57212, KJV57203 and KJV57216) with a clustering coefficient of 1.0. The structural modeling (2D and 3D structure) of these five hub proteins was carried out and the catalytic site essential for their functioning was analyzed. The outcome of the present study may facilitate a better understanding of the mechanism of virulence, pathogenesis, and adaptability to the host, and up-to-date annotations will make unknown genes easy to identify and target for experimentation. The information on the functional attributes and virulence characteristics of these HPs is envisaged to facilitate effective development of novel antibacterial drug targets of Orientia tsutsugamushi.

Frasca M, Bianchi NC. Multitask Protein Function Prediction through Task Dissimilarity. IEEE/ACM Trans Comput Biol Bioinform 2019;16:1550-1560. [PMID: 28328509 DOI: 10.1109/tcbb.2017.2684127] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/06/2023]

Ding Z, Kihara D. Computational identification of protein-protein interactions in model plant proteomes. Sci Rep 2019;9:8740. [PMID: 31217453 PMCID: PMC6584649 DOI: 10.1038/s41598-019-45072-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/02/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022]

Gao R, Wang M, Zhou J, Fu Y, Liang M, Guo D, Nie J. Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation. Int J Mol Sci 2019;20:E2845. [PMID: 31212665 PMCID: PMC6600291 DOI: 10.3390/ijms20112845] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/05/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 01/28/2023]

Jain A, Gali H, Kihara D. Identification of Moonlighting Proteins in Genomes Using Text Mining Techniques. Proteomics 2018;18:e1800083. [PMID: 30260564 PMCID: PMC6404977 DOI: 10.1002/pmic.201800083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/28/2018] [Revised: 08/13/2018] [Indexed: 12/31/2022]

Doğan T. HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ 2018;6:e5298. [PMID: 30083448 PMCID: PMC6076985 DOI: 10.7717/peerj.5298] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/04/2018] [Accepted: 07/03/2018] [Indexed: 01/24/2023]

Abstract

Analysing the relationships between biomolecules and genetic diseases is a highly active area of research, where the aim is to identify the genes and their products that cause a particular disease due to functional changes originating from mutations. Biological ontologies are frequently employed in these studies, which provide researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for the identification of relationships between biomedical entities by automatically mapping phenotypic abnormality defining HPO terms with biomolecular function defining GO terms, where each association indicates the occurrence of the abnormality due to the loss of the biomolecular function expressed by the corresponding GO term. The proposed HPO2GO mappings were extracted by calculating the frequency of the co-annotations of the terms on the same genes/proteins, using already existing curated HPO and GO annotation sets. This was followed by the filtering of the unreliable mappings that could be observed due to chance, by statistical resampling of the co-occurrence similarity distributions. Furthermore, the biological relevance of the finalized mappings was discussed over selected cases, using the literature. The resulting HPO2GO mappings can be employed in different settings to predict and to analyse novel gene/protein—ontology term—disease relations. As an application of the proposed approach, HPO term—protein associations (i.e., HPO2protein) were predicted. In order to test the predictive performance of the method on a quantitative basis, and to compare it with the state-of-the-art, the CAFA2 challenge HPO prediction target protein set was employed. The results of the benchmark indicated the potential of the proposed approach, as HPO2GO performance was among the best (Fmax = 0.35). The automated cross ontology mapping approach developed in this work may be extended to other ontologies as well, to identify unexplored relation patterns at the systemic level. The datasets, results and the source code of HPO2GO are available for download at: https://github.com/cansyl/HPO2GO.

Ding Z, Kihara D. Computational Methods for Predicting Protein-Protein Interactions Using Various Protein Features. Curr Protoc Protein Sci 2018;93:e62. [PMID: 29927082 PMCID: PMC6097941 DOI: 10.1002/cpps.62] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/24/2022]

Yerneni S, Khan IK, Wei Q, Kihara D. IAS: Interaction Specific GO Term Associations for Predicting Protein-Protein Interaction Networks. IEEE/ACM Trans Comput Biol Bioinform 2018;15:1247-1258. [PMID: 26415209 DOI: 10.1109/tcbb.2015.2476809] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 06/05/2023]

Khan IK, Bhuiyan M, Kihara D. DextMP: deep dive into text for predicting moonlighting proteins. Bioinformatics 2018;33:i83-i91. [PMID: 28881966 PMCID: PMC5870774 DOI: 10.1093/bioinformatics/btx231] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/09/2023]

Zhang C, Zheng W, Freddolino PL, Zhang Y. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping. J Mol Biol 2018. [PMID: 29534977 DOI: 10.1016/j.jmb.2018.03.004] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022]

Ding Z, Wei Q, Kihara D. Computing and Visualizing Gene Function Similarity and Coherence with NaviGO. Methods Mol Biol 2018;1807:113-130. [PMID: 30030807 DOI: 10.1007/978-1-4939-8561-6_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/23/2022]

Rifaioglu AS, Doğan T, Saraç ÖS, Ersahin T, Saidi R, Atalay MV, Martin MJ, Cetin-Atalay R. Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants. Proteins 2017;86:135-151. [PMID: 29098713 DOI: 10.1002/prot.25416] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/09/2017] [Revised: 10/24/2017] [Accepted: 11/01/2017] [Indexed: 12/24/2022]

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules 2017;22:1732. [PMID: 29039790 PMCID: PMC6151571 DOI: 10.3390/molecules22101732] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 08/30/2017] [Revised: 10/11/2017] [Accepted: 10/11/2017] [Indexed: 11/25/2022] Open

Wei Q, Khan IK, Ding Z, Yerneni S, Kihara D. NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology. BMC Bioinformatics 2017;18:177. [PMID: 28320317 PMCID: PMC5359872 DOI: 10.1186/s12859-017-1600-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 06/10/2016] [Accepted: 03/11/2017] [Indexed: 12/25/2022] Open

In-silico prediction of dual function of DksA like hypothetical protein in V. cholerae O395 genome. Microbiol Res 2017;195:60-70. [DOI: 10.1016/j.micres.2016.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2016] [Revised: 11/04/2016] [Accepted: 11/05/2016] [Indexed: 11/20/2022]

Cozzetto D, Jones DT. Computational Methods for Annotation Transfers from Sequence. Methods Mol Biol 2017;1446:55-67. [PMID: 27812935 DOI: 10.1007/978-1-4939-3743-1_5] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 06/06/2023]

Wei Q, McGraw J, Khan I, Kihara D. Using PFP and ESG Protein Function Prediction Web Servers. Methods Mol Biol 2017;1611:1-14. [PMID: 28451967 DOI: 10.1007/978-1-4939-7015-5_1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/23/2022]

Khan I, McGraw J, Kihara D. MPFit: Computational Tool for Predicting Moonlighting Proteins. Methods Mol Biol 2017;1611:45-57. [PMID: 28451971 DOI: 10.1007/978-1-4939-7015-5_5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]

Huang G, Chu C, Huang T, Kong X, Zhang Y, Zhang N, Cai YD. Exploring Mouse Protein Function via Multiple Approaches. PLoS One 2016;11:e0166580. [PMID: 27846315 PMCID: PMC5112993 DOI: 10.1371/journal.pone.0166580] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/14/2016] [Accepted: 10/31/2016] [Indexed: 01/16/2023] Open

Abstract

Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.

Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016;34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]

Missing gene identification using functional coherence scores. Sci Rep 2016;6:31725. [PMID: 27552989 PMCID: PMC4995438 DOI: 10.1038/srep31725] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/21/2016] [Accepted: 07/22/2016] [Indexed: 11/18/2022] Open

Vidulin V, Šmuc T, Supek F. Extensive complementarity between gene function prediction methods. Bioinformatics 2016;32:3645-3653. [PMID: 27522084 DOI: 10.1093/bioinformatics/btw532] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/26/2016] [Revised: 07/11/2016] [Accepted: 08/09/2016] [Indexed: 12/22/2022] Open

Abstract

MOTIVATION

The number of sequenced genomes rises steadily but we still lack the knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions.

RESULTS

Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them.

AVAILABILITY AND IMPLEMENTATION

The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/

CONTACT

fran.supek@irb.hr

SUPPLEMENTARY INFORMATION

Supplementary materials are available at Bioinformatics online.
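
The integration strategy named in the RESULTS paragraph above (keep the single most-confident prediction per gene/function pair rather than requiring agreement between methods) can be illustrated with a minimal sketch. The abstract does not describe the actual pipeline, so everything here is an assumption for illustration: the function name `integrate_max_confidence`, the input layout, the method names, and the premise that confidence scores are directly comparable across methods.

```python
from typing import Dict, Tuple

# Hypothetical layout: predictions[method][(gene, go_term)] = confidence in [0, 1].
# Assumes scores from different methods are on a comparable scale.
Key = Tuple[str, str]
Predictions = Dict[str, Dict[Key, float]]


def integrate_max_confidence(predictions: Predictions) -> Dict[Key, Tuple[float, str]]:
    """For each (gene, GO term) pair, keep the highest confidence and its source method."""
    best: Dict[Key, Tuple[float, str]] = {}
    for method, scores in predictions.items():
        for key, conf in scores.items():
            # No agreement between methods is required: one confident vote is enough.
            if key not in best or conf > best[key][0]:
                best[key] = (conf, method)
    return best


# Illustrative, made-up inputs (not data from the paper).
example: Predictions = {
    "method_a": {("geneX", "GO:0000001"): 0.9, ("geneY", "GO:0000002"): 0.2},
    "method_b": {("geneX", "GO:0000001"): 0.4, ("geneZ", "GO:0000003"): 0.7},
}
integrated = integrate_max_confidence(example)
```

Because the methods tend to cover non-overlapping sets of genes and functions, taking the maximum keeps predictions that only one method can make, which an agreement rule would discard.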

Cao R, Cheng J. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods 2016;93:84-91. [PMID: 26370280 PMCID: PMC4894840 DOI: 10.1016/j.ymeth.2015.09.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 03/31/2015] [Revised: 09/03/2015] [Accepted: 09/10/2015] [Indexed: 11/30/2022] Open

GoFDR: A sequence alignment based method for predicting protein functions. Methods 2016;93:3-14. [DOI: 10.1016/j.ymeth.2015.08.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 06/01/2015] [Revised: 07/27/2015] [Accepted: 08/11/2015] [Indexed: 01/01/2023] Open

Das S, Orengo CA. Protein function annotation using protein domain family resources. Methods 2016;93:24-34. [DOI: 10.1016/j.ymeth.2015.09.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 07/02/2015] [Revised: 09/28/2015] [Accepted: 09/29/2015] [Indexed: 01/25/2023] Open

Frasca M, Bertoni A, Valentini G. UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions. J Comput Biol 2015;22:1057-74. [PMID: 26402488 DOI: 10.1089/cmb.2014.0110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022] Open

Khan IK, Wei Q, Chapman S, KC DB, Kihara D. The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches. Gigascience 2015;4:43. [PMID: 26380077 PMCID: PMC4570625 DOI: 10.1186/s13742-015-0083-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 12/31/2014] [Accepted: 08/27/2015] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

Functional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple, in-house AFP methods. Here, we report benchmark results of our methods obtained in the course of preparation for CAFA2 prior to submitting function predictions for CAFA2 targets.

RESULTS

For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performances using the original (older) and updated databases. Performance evaluation for PFP with different settings and ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performances of our prediction methods by enriching the predictions with prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.

CONCLUSIONS

Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms did not make much improvement. Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive powers of sequence-based function prediction methods in general.
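
The Fmax score mentioned in the CONCLUSIONS paragraph above is not defined in the abstract. As a rough sketch, the protein-centric definition commonly used in CAFA-style evaluations takes the maximum F-measure over confidence thresholds, with precision averaged over proteins that have at least one prediction at the threshold and recall averaged over all benchmark proteins. The function name `fmax`, the input layout, and the threshold grid are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Dict, Set


def fmax(
    predictions: Dict[str, Dict[str, float]],  # protein -> {GO term: confidence}
    truth: Dict[str, Set[str]],                # protein -> non-empty set of true GO terms
    steps: int = 100,                          # assumed threshold grid: 0.01, 0.02, ..., 1.00
) -> float:
    """Maximum over thresholds of the harmonic mean of averaged precision and recall."""
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            predicted = {term for term, s in predictions.get(protein, {}).items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision counts only proteins with at least one prediction at t.
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Illustrative, made-up inputs (not benchmark data).
toy_predictions = {"P1": {"GO:1": 0.9, "GO:2": 0.3}, "P2": {"GO:3": 0.8}}
toy_truth = {"P1": {"GO:1"}, "P2": {"GO:4"}}
toy_score = fmax(toy_predictions, toy_truth)
```

In this toy case the best threshold is one where only the correct high-confidence term for P1 survives, which gives precision 1 and recall 0.5.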

Frasca M. Automated gene function prediction through gene multifunctionality in biological networks. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.04.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]