1
Yoon KA, Kim WJ, Lee S, Yang HS, Lee BH, Lee SH. Comparative analyses of the venom components in the salivary gland transcriptomes and saliva proteomes of some heteropteran insects. Insect Science 2022; 29:411-429. [PMID: 34296820 DOI: 10.1111/1744-7917.12955] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/14/2021] [Revised: 07/16/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
Salivary gland-specific transcriptomes of nine heteropteran insects with distinct feeding strategies (predaceous, hematophagous, and phytophagous) were analyzed and annotated to compare and identify the venom components as well as their expression profiles. The transcriptional abundance of venom genes was verified via quantitative real-time PCR. Hierarchical clustering of 30 representative differentially expressed venom genes from the nine heteropteran species revealed unique groups of salivary gland-specific genes depending on their feeding strategy. The commonly transcribed genes included a paralytic neurotoxin (arginine kinase), digestive enzymes (cathepsin and serine protease), an anti-inflammatory protein (cystatin), hexamerin, and an odorant binding protein. Both predaceous and hematophagous (bed bug) heteropteran species showed relatively higher transcription levels of genes encoding proteins involved in proteolysis and cytolysis, whereas phytophagous heteropterans exhibited little or no expression of these genes, but had a high expression of vitellogenin, a multifunctional allergen. Saliva proteomes from four representative species were also analyzed. All venom proteins identified via saliva proteome analysis were annotated using salivary gland transcriptome data. The proteomic expression profiles of venom proteins were in good agreement with the salivary gland-specific transcriptomic profiles. Our results indicate that profiling of the salivary gland transcriptome provides important information on the composition and evolutionary features of venoms depending on their feeding strategy.
Affiliation(s)
- Kyungjae Andrew Yoon
  - Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Seungki Lee
  - National Institute of Biological Resources, Environmental Research Complex, Incheon, Korea
- Hee-Sun Yang
  - National Institute of Biological Resources, Environmental Research Complex, Incheon, Korea
- Byoung-Hee Lee
  - National Institute of Biological Resources, Environmental Research Complex, Incheon, Korea
- Si Hyeock Lee
  - Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
  - Department of Agricultural Biology, Seoul National University, Seoul, Korea
2
Zhao C, Liu T, Wang Z. PANDA2: protein function prediction using graph neural networks. NAR Genom Bioinform 2022; 4:lqac004. [PMID: 35118378 PMCID: PMC8808544 DOI: 10.1093/nargab/lqac004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/25/2021] [Revised: 11/20/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022] Open
Abstract
High-throughput sequencing technologies have generated massive numbers of protein sequences, but the annotation of these sequences still relies heavily on low-throughput, expensive biological experiments. Therefore, accurate and fast computational alternatives are needed to infer functional knowledge from protein sequences. The gene ontology (GO) directed acyclic graph (DAG) contains the hierarchical relationships between GO terms but is hard to integrate into machine learning algorithms for functional prediction. We developed a deep learning system named PANDA2 to predict protein functions, which uses a graph neural network to model the topology of the GO DAG and integrates features generated by transformer protein language models. Compared with the top 10 methods in CAFA3, PANDA2 ranked first in cellular component ontology (CCO), tied for first in biological process ontology (BPO) but with a higher coverage rate, and second in molecular function ontology (MFO). Compared with the recently developed predictors DeepGOPlus, GOLabeler, and DeepText2GO, and benchmarked on another independent dataset, PANDA2 ranked first in CCO, first in BPO, and second in MFO. PANDA2 can be freely accessed from http://dna.cs.miami.edu/PANDA2/.
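The hierarchical relationships in the GO DAG mentioned in this abstract can be illustrated with a small sketch. It is not PANDA2's graph neural network; it only shows, on a toy graph with made-up term names (`root`, `A`, `B`, `C`, `D` are assumptions, not real GO identifiers), how an annotation to one term implies annotations to all of its ancestors, which is the kind of structure a predictor has to respect.

```python
# Toy GO-like DAG: each term maps to its direct parents.
# Term names are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # two parents: this is a DAG, not a tree
    "D": ["C"],
}


def ancestors(term, parents):
    """Return every ancestor of `term` (excluding the term itself)."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(terms, parents):
    """Close a set of annotated terms under the ancestor relation."""
    closed = set(terms)
    for term in terms:
        closed |= ancestors(term, parents)
    return closed


if __name__ == "__main__":
    # Annotating a protein with "D" implies all terms above it.
    print(sorted(propagate({"D"}, PARENTS)))
```

Because `C` has two parents, annotating `D` reaches both `A` and `B`; a tree-shaped label hierarchy would miss one of them, which is one reason the DAG is awkward to encode in a flat classifier.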
Affiliation(s)
- Chenguang Zhao
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA
3
Sørensen PE, Baig S, Stegger M, Ingmer H, Garmyn A, Butaye P. Spontaneous Phage Resistance in Avian Pathogenic Escherichia coli. Front Microbiol 2021; 12:782757. [PMID: 34966369 PMCID: PMC8711792 DOI: 10.3389/fmicb.2021.782757] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Accepted: 11/23/2021] [Indexed: 01/19/2023] Open
Abstract
Avian pathogenic Escherichia coli (APEC) is one of the most important bacterial pathogens affecting poultry worldwide. The emergence of multidrug-resistant pathogens has renewed interest in the therapeutic use of bacteriophages (phages). However, a major concern for the successful implementation of phage therapy is the emergence of phage-resistant mutants. Understanding phage-host interactions, as well as the underlying mechanisms of resistance, has been shown to be essential for developing a successful phage therapy. Here, we demonstrate that the strictly lytic Escherichia phage vB_EcoM-P10 rapidly selected for resistance in the APEC ST95 O1 strain AM621. Whole-genome sequence analysis of 109 spontaneous phage-resistant mutant strains revealed 41 mutants with single-nucleotide polymorphisms (SNPs) in their core genome. In 32 of these, a single SNP was detected, while two SNPs were identified in a total of nine strains. In total, 34 unique SNPs were detected. In 42 strains, including 18 strains with SNP(s), gene losses spanning 17 different genes were detected. The genetic changes affected genes known to be involved in phage resistance (outer membrane protein A, lipopolysaccharide-, O-antigen-, or cell wall-related genes) as well as genes not previously linked to phage resistance, including two hypothetical genes. In several strains, we did not detect any genetic changes. Infecting phages were not able to overcome the phage resistance in host strains. However, the initial infection was shown to carry a substantial fitness cost for several mutant strains, with up to ∼65% decrease in overall growth. In conclusion, this study provides valuable insights into the phage-host interaction and phage resistance in APEC. Although acquired resistance to phages is frequently observed in pathogenic E. coli, it may be associated with loss of fitness, which could be exploited in phage therapy.
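The strain counts in this abstract can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated above; the function and key names are hypothetical, and the derived figures (in particular the number of strains with no detected change) are inferences under the assumption that SNPs and gene losses are the only change categories counted, not values reported in the abstract.

```python
def summarize_mutants(total, single_snp, double_snp, gene_loss, snp_and_gene_loss):
    """Cross-check strain counts by inclusion-exclusion (names are illustrative)."""
    with_snps = single_snp + double_snp            # strains carrying any SNP
    snp_calls = single_snp + 2 * double_snp        # SNP occurrences across strains
    any_change = with_snps + gene_loss - snp_and_gene_loss
    return {
        "strains_with_snps": with_snps,
        "snp_calls": snp_calls,
        "strains_with_any_change": any_change,
        "strains_without_detected_change": total - any_change,
    }


# Numbers taken from the abstract: 109 mutants, 32 with one SNP,
# 9 with two SNPs, 42 with gene losses, 18 of which also carry SNP(s).
summary = summarize_mutants(109, 32, 9, 42, 18)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Under these assumptions the 32 + 9 split reproduces the 41 SNP-carrying mutants, and the SNP occurrences outnumber the 34 unique SNPs, which suggests that some SNPs recur in more than one strain.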
Affiliation(s)
- Patricia E. Sørensen
  - Department of Pathobiology, Pharmacology and Zoological Medicine, Ghent University, Merelbeke, Belgium
  - Department of Biomedical Sciences, Ross University School of Veterinary Medicine, Basseterre, Saint Kitts and Nevis
- Sharmin Baig
  - Department of Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
- Marc Stegger
  - Department of Bacteria, Parasites and Fungi, Statens Serum Institut, Copenhagen, Denmark
- Hanne Ingmer
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- An Garmyn
  - Department of Pathobiology, Pharmacology and Zoological Medicine, Ghent University, Merelbeke, Belgium
- Patrick Butaye
  - Department of Pathobiology, Pharmacology and Zoological Medicine, Ghent University, Merelbeke, Belgium
  - Department of Biomedical Sciences, Ross University School of Veterinary Medicine, Basseterre, Saint Kitts and Nevis
4
Sandaruwan PD, Wannige CT. An improved deep learning model for hierarchical classification of protein families. PLoS One 2021; 16:e0258625. [PMID: 34669708 PMCID: PMC8528337 DOI: 10.1371/journal.pone.0258625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 10/01/2021] [Indexed: 12/28/2022] Open
Abstract
Although genes carry information, proteins are the main players in providing the functionalities of a living organism. A large number of different proteins are involved in every function that occurs in a cell. Their amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in structure or function. Protein characterization to identify structure and function is done accurately using laboratory experiments. With the rapidly increasing number of novel protein sequences, these experiments have become difficult to carry out because they are expensive, time-consuming, and laborious. Therefore, many computational classification methods have been introduced to classify proteins and predict their functional properties. As the performance of computational techniques has progressed, deep learning has come to play a key role in many areas. Deep learning models such as DeepFam and ProtCNN have recently been presented to classify proteins into their families. However, these models have been used only for non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam that classifies proteins hierarchically into different levels simultaneously with high accuracy. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved accuracies of 98.62% and 96.14% for the popular Pfam dataset and the COG dataset, respectively.
5
Kulmanov M, Zhapa-Camacho F, Hoehndorf R. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web. Nucleic Acids Res 2021; 49:W140-W146. [PMID: 34019664 PMCID: PMC8262746 DOI: 10.1093/nar/gkab373] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/22/2021] [Revised: 04/18/2021] [Accepted: 04/26/2021] [Indexed: 11/24/2022] Open
Abstract
Understanding the functions of proteins is crucial to understanding biological processes on a molecular level. Many more protein sequences are available than can be investigated experimentally. DeepGOPlus is a protein function prediction method based on deep learning and sequence similarity. DeepGOWeb makes the prediction model available through a website, an API, and the SPARQL query language for interoperability with databases that rely on Semantic Web technologies. DeepGOWeb provides accurate and fast predictions and ensures that predicted functions are consistent with the Gene Ontology; it can provide predictions for any protein and any function in the Gene Ontology. DeepGOWeb is freely available at https://deepgo.cbrc.kaust.edu.sa/.
Affiliation(s)
- Maxat Kulmanov
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Fernando Zhapa-Camacho
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
6
Wang Y, Zhang H, Zhong H, Xue Z. Protein domain identification methods and online resources. Comput Struct Biotechnol J 2021; 19:1145-1153. [PMID: 33680357 PMCID: PMC7895673 DOI: 10.1016/j.csbj.2021.01.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/08/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023] Open
Abstract
Protein domains are the basic units of proteins that can fold, function, and evolve independently. Knowledge of protein domains is critical for protein classification, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Thus, over the past two decades, a number of protein domain identification approaches have been developed, and a variety of protein domain databases have also been constructed. This review divides protein domain prediction methods into two categories, namely sequence-based and structure-based. These methods are introduced in detail, and their advantages and limitations are compared. Furthermore, this review also provides a comprehensive overview of popular online protein domain sequence and structure databases. Finally, we discuss potential improvements of these prediction methods.
Affiliation(s)
- Yan Wang
  - Institute of Medical Artificial Intelligence, Binzhou Medical College, Yantai, Shandong 264003, China
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Hang Zhang
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
7
Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020; 36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION: The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data cast doubt on the conclusions. RESULTS: We use experimental annotations from over 40 000 proteins, drawn from over 80 000 publications, to revisit the ortholog conjecture in two pairs of species: (i) Homo sapiens and Mus musculus and (ii) Saccharomyces cerevisiae and Schizosaccharomyces pombe. By making a distinction between questions about the evolution of function versus questions about the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction, though questions about the evolution of function remain difficult to address. In both pairs of species, we quantify the amount of information that would be ignored if paralogs are discarded, as well as the resulting loss in prediction accuracy. Taken as a whole, our results support the view that the types of homologs used for function transfer are largely irrelevant to the task of function prediction. Maximizing the amount of data used for this task, regardless of whether it comes from orthologs or paralogs, is most likely to lead to higher prediction accuracy. AVAILABILITY AND IMPLEMENTATION: https://github.com/predragradivojac/oc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moses Stamboulian
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Rafael F Guerrero
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Matthew W Hahn
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
8
Taheri S, Bouyer A. Community Detection in Social Networks Using Affinity Propagation with Adaptive Similarity Matrix. Big Data 2020; 8:189-202. [PMID: 32397731 DOI: 10.1089/big.2019.0143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
The community detection problem is a projection of data clustering in which only the network's topological properties are considered when measuring similarities among nodes. Finding the kernel nodes of communities and expanding each community from its kernel also helps in finding optimal communities. Among existing community detection approaches, the affinity propagation (AP)-based method has shown promising results and does not require any predefined information such as the number of clusters (communities). AP is an exemplar-based clustering method that defines the negative real-valued similarity measure sim(i, k) between data point i and exemplar k as the probability of k being the exemplar of data point i. According to our intuition, the value of sim(i, k) should not be identical to sim(k, i). In this study, a new version of AP using an adaptive similarity matrix, namely affinity propagation with adaptive similarity (APAS) matrix, is proposed, which can efficiently represent the leadership probabilities between data points. APAS adaptively transforms the symmetric similarity matrix into an asymmetric one. It outperforms the AP method in terms of accuracy. Extensive experiments conducted on artificial and real-world networks demonstrate the effectiveness of our approach.
Affiliation(s)
- Sona Taheri
  - Department of Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran
- Asgarali Bouyer
  - Department of Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran
9
Zhang Z, Wang Z, Dang Y, Wang J, Jayaprakash S, Wang H, He J. Transcriptomic Prediction of Pig Liver-Enriched Gene 1 Functions in a Liver Cell Line. Genes (Basel) 2020; 11:412. [PMID: 32290278 PMCID: PMC7230230 DOI: 10.3390/genes11040412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2020] [Revised: 03/27/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022] Open
Abstract
The newly identified liver-enriched gene 1 (LEG1) encodes a protein with a characteristic domain of unknown function 781 (DUF781/LEG1), constituting a protein family with only one member in mammals. A functional study in zebrafish suggested that LEG1 genes are involved in liver development, while the platypus LEG1 homolog, Monotreme Lactation Protein (MLP), which is enriched in the mammary gland and milk, acts as an antibacterial substance. However, no functional studies on eutherian LEG1s have been published to date. Thus, we here report the first functional prediction study at the cellular level. As previously reported, eutherian LEG1s can be classified into three paralogous groups. Pigs have all three LEG1 genes (pLEG1s), while humans and mice have retained only LEG1a. Hence, pLEG1s might represent an ideal model for studying LEG1 gene functions. RNA-seq was performed by the overexpression of pLEG1s and platypus MLP in HepG2 cells. Enrichment analysis showed that pLEG1a and pLEG1b might exhibit little function in liver cells; however, pLEG1c is probably involved in the endoplasmic reticulum (ER) stress response and protein folding. Additionally, gene set enrichment analysis revealed that platypus MLP shows antibacterial activity, confirming the functional study in platypus. Therefore, our study showed from the transcriptomic perspective that mammalian LEG1s have different functions in liver cells due to the subfunctionalization of paralogous genes.
Affiliation(s)
- Zhe Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Zizengchen Wang
  - Department of Veterinary Medicine, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Yanna Dang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Jinyang Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Sakthidasan Jayaprakash
  - Department of Chemical Engineering, Hindustan Institute of Technology and Science, Chennai 603103, India
- Huanan Wang
  - Department of Veterinary Medicine, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Jin He
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
10
Mishra S, Rastogi YP, Jabin S, Kaur P, Amir M, Khatun S. A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species. Comput Biol Chem 2019; 83:107147. [PMID: 31698160 DOI: 10.1016/j.compbiolchem.2019.107147] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/03/2019] [Revised: 10/05/2019] [Accepted: 10/09/2019] [Indexed: 01/06/2023]
Abstract
Protein function prediction is a crucial task in the post-genomics era because of the diverse, irreplaceable roles of proteins in biological systems. Traditional methods involved cost-intensive and time-consuming molecular biology techniques, but they could not keep up with the surge of sequencing data that followed the advent of cost-effective, advanced sequencing techniques. To keep the pace of annotation in line with that of data generation, there has been a shift to computational approaches based on homology, sequence- and structure-based features, protein-protein interaction networks, phylogenetic profiles, physicochemical properties, and so on. A combination of these features has proven promising for improving the accuracy of protein function prediction. In the present work, we employed a combination of sequence, physicochemical property, subsequence, and annotation features, with a total of 9890 features extracted and/or calculated for 171,212 reviewed prokaryotic proteins of 9 bacterial phyla from UniProtKB, to train a supervised deep learning ensemble model that categorizes the function of a bacterial hypothetical/unreviewed protein into 1739 GO terms as functional classes. The proposed system, being fully dedicated to bacterial organisms, is a novel attempt among existing machine learning-based protein function prediction systems, which are based on mixed organisms. Experimental results demonstrate the success of the proposed deep neural network-based ensemble model, with an F1 measure of 0.7912 on the prepared Test dataset 1 of reviewed proteins.
Affiliation(s)
- Sarthak Mishra
  - Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India
- Yash Pratap Rastogi
  - Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India
- Suraiya Jabin
  - Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India
- Punit Kaur
  - Department of Biophysics, All India Institute of Medical Sciences (AIIMS), New Delhi, 110 029, Delhi, India
- Mohammad Amir
  - Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India
- Shabnam Khatun
  - Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India
11
Bhat AS, Grishin NV. Predicting Sequence Features, Function, and Structure of Proteins Using MESSA. Current Protocols in Bioinformatics 2019; 67:e84. [PMID: 31524991 PMCID: PMC6750024 DOI: 10.1002/cpbi.84] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
MEta-Server for protein Sequence Analysis (MESSA) is a tool that facilitates widespread protein sequence analysis by gathering structural (local sequence properties and three-dimensional structure) and functional (annotations from SWISS-PROT, Gene Ontology terms, and enzyme classification) predictions for a query protein of interest. MESSA uses multiple well-established tools to offer consensus-based predictions on important aspects of protein sequence analysis. Being freely available for noncommercial users and with a user-friendly interface, MESSA serves as an umbrella platform that overcomes the absence of a comprehensive tool for predictive protein analysis. This article reveals how to access MESSA via the Web and shows how to input a protein sequence to analyze using the MESSA web server. It also includes a detailed explanation of the output from MESSA to aid in better interpretation of results. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Archana S. Bhat
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
12
Saha S, Chatterjee P, Basu S, Nasipuri M, Plewczynski D. FunPred 3.0: improved protein function prediction using protein interaction network. PeerJ 2019; 7:e6830. [PMID: 31198622 PMCID: PMC6535044 DOI: 10.7717/peerj.6830] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/20/2018] [Accepted: 03/21/2019] [Indexed: 11/23/2022] Open
Abstract
Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is done routinely at the population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Experimental identification of protein functions is a laborious, time-consuming, and resource-intensive task. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich information center for protein (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset include 4,554 unique proteins in 13,528 protein–protein interactions after the elimination of the self-replicating and the self-interacting protein pairs. Using the developed FunPred 3.0 tool, we achieve mean precision, recall, and F-score values of 0.55, 0.82, and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are available freely at: https://github.com/SovanSaha/FunPred-3.0.git.
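The precision, recall, and F-score values quoted in this abstract are related by the standard F-measure formula. The sketch below is a generic illustration of that formula, not code from the FunPred 3.0 repository; the function name `f_score` is an assumption. It also assumes the F-score is computed from the mean precision and recall, whereas an average of per-protein F-scores could differ slightly.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Mean precision and recall as quoted in the abstract.
    print(round(f_score(0.55, 0.82), 2))
```

With precision 0.55 and recall 0.82, the harmonic mean is 2 × 0.55 × 0.82 / (0.55 + 0.82) ≈ 0.658, which rounds to the 0.66 quoted in the abstract.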
Affiliation(s)
- Sovan Saha
  - Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, West Bengal, India
- Piyali Chatterjee
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Mita Nasipuri
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Dariusz Plewczynski
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland