1
Tavis S, Hettich RL. Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome. BMC Genomics 2024; 25:267. [PMID: 38468234 PMCID: PMC10926591 DOI: 10.1186/s12864-024-10082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/02/2024] [Indexed: 03/13/2024]
Abstract
In every omics experiment, genes or their products are identified for which even state of the art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses since these proteins can carry out functions which impact the engineering of organisms. As a consequence of predicting protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of Alphafold predictions for all Uniprot proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data, which annotated 1079 terms to 213 proteins. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
Affiliation(s)
- Steven Tavis
- Genome Science and Technology Graduate Program, University of Tennessee Knoxville, Knoxville, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
2
Arslan M. Whole-genome sequencing and genomic analysis of Norduz goat (Capra hircus). Mamm Genome 2023:10.1007/s00335-023-09990-3. [PMID: 37004528 DOI: 10.1007/s00335-023-09990-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 03/21/2023] [Indexed: 04/04/2023]
Abstract
Artificial and natural selective breeding of goats has resulted in many different goat breeds all around the world. The Norduz goat is one of these breeds, and it is a local goat breed of Turkey. The goats are favored for their pre-weaning viability and reproduction values compared to the regional breeds. Developments in sequencing technologies have made it possible to understand huge genomic structures and complex phenotypes. Until now, no such comprehensive study has been carried out to understand the genomic structure of the Norduz goat. In this study, next-generation sequencing was carried out to understand the genomic structure of the Norduz goat. Real-time PCR was used to evaluate prominent CNVs in the Norduz goat individuals. The whole genome of the goat was constructed at an average coverage of 33.1X. Under stringent filtering conditions, 9,757,980 SNPs, 1,536,715 InDels, and 290 CNVs were detected in the Norduz goat genome. Functional analysis of high-impact SNP variations showed that the classical complement activation biological process was affected significantly in the goat. CNVs in the goat genome were found in genes related to defense against viruses, immune response, and cell membrane transporters. GBP2, GBP5, and the mammalian ortholog GBP1, which are IFN-stimulated GTPases, were found at high copy numbers in the goats. To conclude, genetic variations mainly in immunological response processes suggest that the Norduz goat is an immunologically improved goat breed and that natural selection could play an important role in the genetic improvement of the goats.
Affiliation(s)
- Mevlüt Arslan
- Department of Genetics, Faculty of Veterinary Medicine, Van Yüzüncü Yıl University, Tuşba, 65080, Van, Turkey.
3
Lee T, Lee S, Kang M, Kim S. Deep hierarchical embedding for simultaneous modeling of GPCR proteins in a unified metric space. Sci Rep 2021; 11:9543. [PMID: 33953216 PMCID: PMC8100104 DOI: 10.1038/s41598-021-88623-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2021] [Accepted: 04/13/2021] [Indexed: 11/23/2022]
Abstract
GPCR proteins belong to diverse families of proteins that are defined at multiple hierarchical levels. Inspecting relationships between GPCR proteins on the hierarchical structure is important, since characteristics of a protein can be inferred from proteins with similar hierarchical information. However, modeling of GPCR families has been performed separately for each of the family, subfamily, and sub-subfamily levels. Relationships between GPCR proteins are ignored in these approaches, as they process the information in the proteins with several disconnected models. In this study, we propose DeepHier, a deep learning model that simultaneously learns representations of the GPCR family hierarchy from the protein sequences with a single unified model. A novel loss term based on metric learning is introduced to incorporate hierarchical relations between proteins. We tested our approach using a public GPCR sequence dataset. Metric distances in the deep feature space corresponded to the hierarchical family relations between GPCR proteins. Furthermore, we demonstrated that downstream tasks, such as phylogenetic reconstruction and motif discovery, are feasible in the constructed embedding space. These results show that hierarchical relations between sequences were successfully captured in both technical and biological aspects.
Affiliation(s)
- Taeheon Lee
- Looxid Labs, Seoul, 06628, Republic of Korea
- Sangseon Lee
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea
- Minji Kang
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, 08826, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea; Institute of Engineering Research, Seoul National University, Seoul, 08826, Republic of Korea.
4
Urits I, Viswanath O, Orhurhu V, Gress K, Charipova K, Kaye AD, Ngo A. The Utilization of Mu-Opioid Receptor Biased Agonists: Oliceridine, an Opioid Analgesic with Reduced Adverse Effects. Curr Pain Headache Rep 2019; 23:31. [PMID: 30880365 DOI: 10.1007/s11916-019-0773-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/29/2023]
Abstract
PURPOSE OF REVIEW: The purpose of this review is to summarize the current understanding of opioid pathways in mediating and/or modulating analgesia and adverse effects. Oliceridine is highlighted as a novel mu-opioid receptor agonist with selective activation of G protein and β-arrestin signaling pathways. RECENT FINDINGS: Oliceridine (TRV130; [(3-methoxythiophen-2-yl)methyl]({2-[(9R)-9-(pyridin-2-yl)-6-oxaspiro[4.5]decan-9-yl]ethyl})amine) is a novel MOR agonist that selectively activates G protein and β-arrestin signaling pathways. A growing body of evidence suggests that, compared to existing MOR agonists, oliceridine and other G protein-selective modulators may produce therapeutic analgesic effects with reduced adverse effects. Oliceridine provides the analgesic benefits of a pure opioid agonist while limiting related adverse effects mediated through the β-arrestin pathway. Recent insights into the function and structure of G protein-coupled receptors have led to the development of novel analgesic therapies.
Affiliation(s)
- Ivan Urits
- Beth Israel Deaconess Medical Center, Department of Anesthesia, Critical Care, and Pain Medicine, Harvard Medical School, 330 Brookline Ave, Boston, MA, 02215, USA.
- Omar Viswanath
- Valley Anesthesiology and Pain Consultants, Phoenix, AZ, USA; University of Arizona College of Medicine-Phoenix, Phoenix, AZ, USA; Creighton University School of Medicine, Omaha, NE, USA
- Vwaire Orhurhu
- Beth Israel Deaconess Medical Center, Department of Anesthesia, Critical Care, and Pain Medicine, Harvard Medical School, 330 Brookline Ave, Boston, MA, 02215, USA
- Kyle Gress
- Georgetown University School of Medicine, Washington, DC, USA
- Alan D Kaye
- Department of Anesthesiology, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Anh Ngo
- Beth Israel Deaconess Medical Center, Department of Anesthesia, Critical Care, and Pain Medicine, Harvard Medical School, 330 Brookline Ave, Boston, MA, 02215, USA
5
Black JB, Premont RT, Daaka Y. Feedback regulation of G protein-coupled receptor signaling by GRKs and arrestins. Semin Cell Dev Biol 2016; 50:95-104. [PMID: 26773211 PMCID: PMC4779377 DOI: 10.1016/j.semcdb.2015.12.015] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 10/23/2015] [Accepted: 12/19/2015] [Indexed: 12/16/2022]
Abstract
GPCRs are ubiquitous in mammalian cells and present intricate mechanisms for cellular signaling and communication. Mechanistically, GPCR signaling was identified to occur vectorially through heterotrimeric G proteins that are negatively regulated by GRK and arrestin effectors. Emerging evidence highlights additional roles for GRK and arrestin partners, and establishes the existence of interconnected feedback pathways that collectively define GPCR signaling. GPCRs influence cellular dynamics and can mediate pathologic development, such as cancer and cardiovascular remodeling. Hence, a better understanding of their overall signal regulation is of great translational interest, and research continues to exploit the pharmacologic potential for modulating their activity.
Affiliation(s)
- Joseph B Black
- Department of Anatomy and Cell Biology, University of Florida College of Medicine, Gainesville, FL 32610, United States
- Richard T Premont
- Department of Medicine, Duke University Medical Center, Durham, NC 27710, United States
- Yehia Daaka
- Department of Anatomy and Cell Biology, University of Florida College of Medicine, Gainesville, FL 32610, United States.
6
Rios S, Fernandez MF, Caltabiano G, Campillo M, Pardo L, Gonzalez A. GPCRtm: An amino acid substitution matrix for the transmembrane region of class A G Protein-Coupled Receptors. BMC Bioinformatics 2015; 16:206. [PMID: 26134144 PMCID: PMC4489126 DOI: 10.1186/s12859-015-0639-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/20/2015] [Accepted: 06/06/2015] [Indexed: 01/08/2023]
Abstract
Background: Protein sequence alignments and database search methods use standard scoring matrices calculated from amino acid substitution frequencies in general sets of proteins. These general-purpose matrices are not optimal for accurately aligning sequences with marked compositional biases, such as the hydrophobic transmembrane regions found in membrane proteins. In this work, an amino acid substitution matrix (GPCRtm) is calculated for the membrane-spanning segments of the G protein-coupled receptor (GPCR) rhodopsin family, one of the largest transmembrane protein families in humans, with great importance in health and disease. Results: The GPCRtm matrix reveals the amino acid compositional bias distinctive of the GPCR rhodopsin family and differs from other standard substitution matrices. These membrane receptors, as expected, are characterized by a high content of hydrophobic residues compared with globular proteins. On the other hand, the presence of polar and charged residues is higher than in average membrane proteins, displaying high frequencies of replacement within themselves. Conclusions: Analysis of amino acid frequencies and values obtained from the GPCRtm matrix reveals patterns of residue replacements different from other standard substitution matrices. GPCRs prioritize the reactivity properties of the amino acids over their bulkiness in the transmembrane regions. A distinctive feature is that charged and polar residues seem to evolve at different rates than other amino acids. This observation is related to the role of the transmembrane bundle in the binding of ligands, which in many cases involves electrostatic and hydrogen bond interactions. This new matrix can be useful in database searches and for the construction of more accurate sequence alignments of GPCRs. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0639-4) contains supplementary material, which is available to authorized users.
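The abstract above does not spell out how GPCRtm was computed, so the following is only a minimal sketch of how a family-specific log-odds substitution matrix can be derived from aligned columns. It assumes a BLOSUM-style scoring formula in half-bit units; the function name `log_odds_matrix`, the pseudocount, and the toy columns are illustrative assumptions and not the authors' procedure or data.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(alignment_columns, pseudocount=1.0):
    """Sketch of a BLOSUM-style log-odds matrix (assumed scheme, not GPCRtm itself).

    alignment_columns: iterable of strings, one string per alignment column.
    Returns a dict mapping (residue_a, residue_b) to a score in half-bits.
    """
    # Count every unordered residue pair observed within each column.
    pair_counts = Counter()
    for column in alignment_columns:
        residues = [r for r in column if r != "-"]  # skip gap characters
        for a, b in combinations(residues, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    alphabet = sorted({r for pair in pair_counts for r in pair})

    # Observed pair frequencies q_ab, smoothed with a pseudocount.
    q = {}
    for i, a in enumerate(alphabet):
        for b in alphabet[i:]:
            q[(a, b)] = pair_counts[(a, b)] + pseudocount
    total = sum(q.values())
    q = {pair: count / total for pair, count in q.items()}

    # Background frequency p_a = q_aa + half of every mixed pair containing a.
    p = dict.fromkeys(alphabet, 0.0)
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Score = 2 * log2(observed / expected), stored symmetrically.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        score = 2 * math.log2(freq / expected)
        scores[(a, b)] = score
        scores[(b, a)] = score
    return scores


# Toy alignment columns (hypothetical, for illustration only).
toy_columns = ["LLLL", "LLIV", "KKKR", "DDDE"]
toy_scores = log_odds_matrix(toy_columns)
```

In this toy input, leucine is conserved within columns and never paired with lysine, so `toy_scores[("L", "L")]` is positive and `toy_scores[("K", "L")]` is negative. A real matrix would need a large curated alignment and sequence weighting, which this sketch omits.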
Affiliation(s)
- Santiago Rios
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Marta F Fernandez
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Gianluigi Caltabiano
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Mercedes Campillo
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Leonardo Pardo
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Angel Gonzalez
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain.
7
Sinha S, Lynn AM. HMM-ModE: implementation, benchmarking and validation with HMMER3. BMC Res Notes 2014; 7:483. [PMID: 25073805 PMCID: PMC4236727 DOI: 10.1186/1756-0500-7-483] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/12/2013] [Accepted: 07/21/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both at the level of parsing the HMM profiles and of modifying emission probabilities, to upgrade HMM-ModE to HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation. RESULTS: The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold-standard set of families, and we report a significant reduction in the number of false positive hits over the default HMM profiles. When applied to GPCR sequences, the results showed an improvement in the accuracy of classification compared with other methods used to classify the family at different levels of their classification hierarchy. CONCLUSIONS: The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, being able to predict the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families.
Affiliation(s)
- Andrew Michael Lynn
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
8
Bioinformatics tools for predicting GPCR gene functions. Adv Exp Med Biol 2014; 796:205-24. [PMID: 24158807 DOI: 10.1007/978-94-007-7423-0_10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/12/2023]
Abstract
The automatic classification of GPCRs by bioinformatics methodology can provide functional information for new GPCRs in the whole 'GPCR proteome', and this information is important for the development of novel drugs. Since the GPCR proteome is classified hierarchically, general approaches to GPCR function prediction are based on hierarchical classification. Various computational tools have been developed to predict GPCR functions; rather than simple sequence searches, these tools use more powerful methods trained on learning datasets, such as the alignment-free methods, statistical model methods, and machine learning methods used in protein sequence analysis. The first stage of hierarchical function prediction involves the discrimination of GPCRs from non-GPCRs, and the second stage involves the classification of the predicted GPCR candidates into family, subfamily, and sub-subfamily levels. Then, further classification is performed according to their protein-protein interaction type: binding G-protein type, oligomerized partner type, etc. These methods have achieved predictive accuracies of around 90%. Finally, I describe future research topics for bioinformatics techniques in the functional prediction of GPCRs.
9
Port JA, Parker MS, Kodner RB, Wallace JC, Armbrust EV, Faustman EM. Identification of G protein-coupled receptor signaling pathway proteins in marine diatoms using comparative genomics. BMC Genomics 2013; 14:503. [PMID: 23883327 PMCID: PMC3727952 DOI: 10.1186/1471-2164-14-503] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 10/16/2012] [Accepted: 07/17/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND The G protein-coupled receptor (GPCR) signaling pathway plays an essential role in signal transmission and response to external stimuli in mammalian cells. Protein components of this pathway have been characterized in plants and simpler eukaryotes such as yeast, but their presence and role in unicellular photosynthetic eukaryotes have not been determined. We use a comparative genomics approach using whole genome sequences and gene expression libraries of four diatoms (Pseudo-nitzschia multiseries, Thalassiosira pseudonana, Phaeodactylum tricornutum and Fragilariopsis cylindrus) to search for evidence of GPCR signaling pathway proteins that share sequence conservation to known GPCR pathway proteins. RESULTS The majority of the core components of GPCR signaling were well conserved in all four diatoms, with protein sequence similarity to GPCRs, human G protein α- and β-subunits and downstream effectors. There was evidence for the Gγ-subunit and thus a full heterotrimeric G protein only in T. pseudonana. Phylogenetic analysis of putative diatom GPCRs indicated similarity but deep divergence to the class C GPCRs, with branches basal to the GABAB receptor subfamily. The extracellular and intracellular regions of these putative diatom GPCR sequences exhibited large variation in sequence length, and seven of these sequences contained the necessary ligand binding domain for class C GPCR activation. Transcriptional data indicated that a number of the putative GPCR sequences are expressed in diatoms under various stress conditions in culture, and that many of the GPCR-activated signaling proteins, including the G protein, are also expressed. CONCLUSIONS The presence of sequences in all four diatoms that code for the proteins required for a functional mammalian GPCR pathway highlights the highly conserved nature of this pathway and suggests a complex signaling machinery related to environmental perception and response in these unicellular organisms. 
The lack of evidence for some GPCR pathway proteins in one or more of the diatoms, such as the Gγ-subunit, may be due to differences in genome completeness and genome coverage for the four diatoms. The high divergence of putative diatom GPCR sequences to known class C GPCRs suggests these sequences may represent another, potentially ancestral, subfamily of class C GPCRs.
Affiliation(s)
- Jesse A Port
- Department of Environmental and Occupational Health Sciences, School of Public Health, University of Washington, Seattle, WA, USA
10
Classification of G proteins and prediction of GPCRs-G proteins coupling specificity using continuous wavelet transform and information theory. Amino Acids 2011; 43:793-804. [PMID: 22086210 DOI: 10.1007/s00726-011-1133-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/24/2011] [Accepted: 10/20/2011] [Indexed: 10/15/2022]
Abstract
The coupling between G protein-coupled receptors (GPCRs) and guanine nucleotide-binding proteins (G proteins) regulates various signal transductions from extracellular space into the cell. However, the coupling mechanism between GPCRs and G proteins is still unknown, and experimental determination of their coupling specificity and function is both expensive and time consuming. Therefore, it is important to develop a theoretical method to predict the coupling specificity between GPCRs and G proteins, as well as their function, using their primary sequences. In this study, a novel four-layer predictor (GPCRsG_CWTIT) based on support vector machine (SVM), continuous wavelet transform (CWT) and information theory (IT) is developed to classify G proteins and predict the coupling specificity between GPCRs and G proteins. SVM is used for construction of models. CWT and IT are used to characterize the primary structure of the protein. Performance of GPCRsG_CWTIT is evaluated with cross-validation tests on various working datasets. The overall accuracy for G proteins at the levels of class and family is 98.23 and 85.42%, respectively. The accuracy of the coupling specificity prediction varies from 74.60 to 94.30%. These results indicate that the proposed predictor is an effective and feasible tool to predict the coupling specificity between GPCRs and G proteins, as well as their functions, using only the full protein sequence. The establishment of such an accurate prediction method will facilitate drug discovery by improving the ability to identify and predict protein-protein interactions. GPCRsG_CWTIT and the dataset can be acquired freely on request from the authors.
11
Naveed M, Khan AU. GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble. Amino Acids 2011; 42:1809-23. [DOI: 10.1007/s00726-011-0902-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 09/03/2010] [Accepted: 03/26/2011] [Indexed: 11/27/2022]
12
Li Z, Zhou X, Dai Z, Zou X. Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm. BMC Bioinformatics 2010; 11:325. [PMID: 20550715 PMCID: PMC2905366 DOI: 10.1186/1471-2105-11-325] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 12/18/2009] [Accepted: 06/16/2010] [Indexed: 11/25/2022]
Abstract
Background: Because a priori knowledge about the function of G protein-coupled receptors (GPCRs) can provide useful information to pharmaceutical research, the determination of their function is a meaningful topic in protein science. However, with the rapid increase of GPCR sequences entering databanks, the gap between the number of known sequences and the number of known functions is widening rapidly, and it is both time-consuming and expensive to determine their function based only on experimental techniques. Therefore, it is vitally important to develop a computational method for quick and accurate classification of GPCRs. Results: In this study, a novel three-layer predictor based on support vector machine (SVM) and feature selection is developed for predicting and classifying GPCRs directly from amino acid sequence data. The maximum relevance minimum redundancy (mRMR) method is applied to pre-evaluate features with discriminative information, while a genetic algorithm (GA) is utilized to find the optimized feature subsets. SVM is used for the construction of classification models. The overall accuracies of the three-layer predictor at the levels of superfamily, family and subfamily are obtained by cross-validation tests on two non-redundant datasets. The results are about 0.5% to 16% higher than those of GPCR-CA and GPCRPred. Conclusion: The results, with high success rates, indicate that the proposed predictor is a useful automated tool for predicting GPCRs. GPCR-SVMFS, a corresponding executable program for GPCR prediction and classification, can be acquired freely on request from the authors.
Affiliation(s)
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, PR China
13
Suwa M, Ono Y. Computational overview of GPCR gene universe to support reverse chemical genomics study. Methods Mol Biol 2010; 577:41-54. [PMID: 19718507 DOI: 10.1007/978-1-60761-232-2_4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/20/2023]
Abstract
In order to support high-throughput screening for ligands of G-protein coupled receptors (GPCRs) by using bioinformatics technology, we introduce a database (SEVENS) with genome-scale annotation and software (GRIFFIN) that can simulate GPCR function. SEVENS ( http://sevens.cbrc.jp/ ) is an integrated database that includes GPCR genes that are identified with high accuracy (99.4% sensitivity and 96.6% specificity) from various types of genomes, by a pipeline that integrates such software as a gene finder, a sequence alignment tool, a motif and domain assignment tool, and a transmembrane helix (TMH) predictor. SEVENS provides the user a genome-scale overview of the "GPCR universe" with detailed information of chromosomal mapping, phylogenetic tree, protein sequence and structure, and experimental evidence, all of which are accessible via a user-friendly interface. GRIFFIN ( http://griffin.cbrc.jp/ ) can predict GPCR and G-protein coupling selectivity induced by ligand binding with high sensitivity and specificity (more than 87% on average), based on the support vector machine (SVM) and hidden Markov Model (HMM). SEVENS and GRIFFIN are expected to contribute to revealing the function of orphan and unknown GPCRs.
Affiliation(s)
- Makiko Suwa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
14
Qiu JD, Huang JH, Liang RP, Lu XQ. Prediction of G-protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: An approach from discrete wavelet transform. Anal Biochem 2009; 390:68-73. [DOI: 10.1016/j.ab.2009.04.009] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Received: 01/15/2009] [Revised: 03/27/2009] [Accepted: 04/06/2009] [Indexed: 10/20/2022]
15
Gupta R, Mittal A, Singh K. A novel and efficient technique for identification and classification of GPCRs. IEEE Trans Inf Technol Biomed 2008; 12:541-8. [PMID: 18632334 DOI: 10.1109/titb.2007.911308] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
G-protein coupled receptors (GPCRs) play a vital role in different biological processes, such as regulation of growth, death, and metabolism of cells. GPCRs are the focus of a significant amount of current pharmaceutical research, since they interact with more than 50% of prescription drugs. The dipeptide-based support vector machine (SVM) approach is the most accurate technique to identify and classify GPCRs. However, this approach has two major disadvantages. First, the dimension of the dipeptide-based feature vector is equal to 400. This large dimension makes the classification task inefficient in both computation and memory. Second, it does not consider the biological properties of the protein sequence for identification and classification of GPCRs. In this paper, we present a novel-feature-based SVM classification technique. The novel features are derived by applying a wavelet-based time series analysis approach to protein sequences. The proposed feature space summarizes the variance information of seven important biological properties of amino acids in a protein sequence. In addition, the dimension of the feature vector for the proposed technique is equal to 35. Experiments were performed on GPCR protein sequences available at the GPCRs Database. Our approach achieves an accuracy of 99.9%, 98.06%, 97.78%, and 94.08% for GPCR superfamily, families, subfamilies, and subsubfamilies (amine group), respectively, when evaluated using fivefold cross-validation. Further, an accuracy of 99.8%, 97.26%, and 97.84% was obtained when evaluated on unseen or recall datasets of GPCR superfamily, families, and subfamilies, respectively. Comparison with the dipeptide-based SVM technique shows the effectiveness of our approach.
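The 400-dimensional dipeptide feature vector mentioned in this abstract follows from the 20 x 20 ordered pairs of standard amino acids. The snippet below is a minimal sketch of that baseline feature, assuming normalized counts of adjacent residue pairs; the function name `dipeptide_composition` and the example peptide are illustrative, and the wavelet-based 35-dimensional features proposed by the authors are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 20 * 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence.

    Pairs containing non-standard characters are skipped (an assumption
    of this sketch).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical 10-residue peptide with nine distinct adjacent pairs.
features = dipeptide_composition("MKTAYIAKQR")
```

The vector length is 400 whatever the sequence length, which is what lets sequences of different lengths feed a fixed-input classifier such as an SVM. It is also why the abstract describes the representation as costly compared with a 35-dimensional alternative.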
Affiliation(s)
- Ravi Gupta
- Department of Electronics and Computer Engineering, Indian Institute of Technology-Roorkee, Roorkee 247667, India.
16
Lu F, Li J, Jiang Z. Computational identification and analysis of G protein-coupled receptor targets. Drug Dev Res 2007. [DOI: 10.1002/ddr.20148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
|
17
|
Guan CP, Jiang ZR, Zhou YH. Predicting the coupling specificity of GPCRs to G-proteins by support vector machines. Genomics Proteomics & Bioinformatics 2006; 3:247-51. [PMID: 16689694 PMCID: PMC5173181 DOI: 10.1016/s1672-0229(05)03035-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
G-protein coupled receptors (GPCRs) represent one of the most important classes of drug targets for the pharmaceutical industry and play important roles in cellular signal transduction. Predicting the coupling specificity of GPCRs to G-proteins is vital for further understanding the mechanism of signal transduction and the function of the receptors within a cell, which can provide new clues for pharmaceutical research and development. In this study, features based on the amino acid composition and physicochemical properties of full-length GPCR sequences have been analyzed and extracted. Based on these features, classifiers have been developed to predict the coupling specificity of GPCRs to G-proteins using support vector machines. The testing results show that this method could obtain better prediction accuracy.
|
18
|
Guo Y, Li M, Lu M, Wen Z, Huang Z. Predicting G-protein coupled receptors-G-protein coupling specificity based on autocross-covariance transform. Proteins 2006; 65:55-60. [PMID: 16865706 DOI: 10.1002/prot.21097] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/08/2022]
Abstract
Determining the coupling specificity of G-protein coupled receptors (GPCRs) is very important for further understanding the functions of receptors. A successful method in this area will benefit both basic research and drug discovery practice. Previously published methods rely on transmembrane topology prediction at the training step, and some even at the prediction step. However, the transmembrane topology predicted by even the best algorithm is not highly accurate. In this study, we developed a new method, an autocross-covariance (ACC) transform based support vector machine (SVM), to predict coupling specificity between GPCRs and G-proteins. The primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids, and the data are transformed into a uniform matrix by applying the ACC transform. SVMs for nonpromiscuous coupled GPCRs and promiscuous coupled GPCRs were trained and validated by the jackknife test, and the results thus obtained are very promising. All classifiers were also evaluated on the test datasets with good performance. Besides the high prediction accuracy, the most important feature of this method is that it does not require any transmembrane topology prediction at either the training or the prediction step, but only the primary sequences of proteins. The results indicate that this relatively simple method is applicable. Academic users can freely download the prediction program at http://www.scucic.net/group/database/Service.asp.
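The idea of turning variable-length sequences into a uniform matrix can be sketched as follows. This is a minimal illustration of an auto- and cross-covariance transform, not the authors' implementation: the abstract does not name the descriptors, so the two properties used here (Kyte-Doolittle hydropathy and a crude side-chain charge) are placeholders, and the normalization by `n - lag` is one common convention assumed for the sketch.

```python
# Placeholder descriptors (assumption): Kyte-Doolittle hydropathy and a crude
# side-chain charge at neutral pH. The paper's actual properties are not given.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}
DESCRIPTORS = {aa: (h, CHARGE.get(aa, 0.0)) for aa, h in HYDROPATHY.items()}


def acc_transform(sequence, max_lag):
    """Return a fixed-length list of auto/cross covariances for lags 1..max_lag."""
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    # One numeric series per property, each centered on its own mean.
    columns = list(zip(*[DESCRIPTORS[r] for r in sequence]))
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    d = len(centered)
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):        # j == k gives auto-covariance
            for k in range(d):    # j != k gives cross-covariance
                s = sum(centered[j][i] * centered[k][i + lag] for i in range(n - lag))
                features.append(s / (n - lag))
    return features


short = acc_transform("MKTAYIAKQRLE", max_lag=3)        # toy sequences
longer = acc_transform("MKTAYIAKQRLEDDKLLV", max_lag=3)
print(len(short), len(longer))
```

Both calls return `max_lag * d * d` values regardless of sequence length, which is what allows sequences of different lengths to be fed to the same SVM.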
Affiliation(s)
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, People's Republic of China
|
19
|
Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J. Classifying G protein-coupled receptors and nuclear receptors on the basis of protein power spectrum from fast Fourier transform. Amino Acids 2006; 30:397-402. [PMID: 16773242 DOI: 10.1007/s00726-006-0332-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Received: 11/23/2005] [Accepted: 01/04/2006] [Indexed: 10/24/2022]
Abstract
As potential drug targets, G-protein coupled receptors (GPCRs) and nuclear receptors (NRs) are a focus of pharmaceutical research. It is of great practical significance to develop an automated and reliable method to facilitate the identification of novel receptors. In this study, a fast Fourier transform-based support vector machine method is proposed to classify GPCRs and NRs from the hydrophobicity of proteins. The models for all the GPCR families and NR subfamilies were trained and validated using the jackknife test, and the results thus obtained are quite promising. The method also performed well on independent GPCR and NR datasets, which indicates its applicability. Two web servers implementing the prediction are available at http://chem.scu.edu.cn/blast/Pred-GPCR and http://chem.scu.edu.cn/blast/Pred-NR.
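A power spectrum of a hydrophobicity profile can be sketched as below. The abstract does not say which hydrophobicity scale was used, so the Kyte-Doolittle scale here is an assumption, as are the function name and the toy sequence. A naive discrete Fourier transform is used to keep the example dependency-free; an FFT computes the same coefficients faster.

```python
import cmath

# Kyte-Doolittle hydropathy values (assumed scale; the paper's scale is not stated).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def power_spectrum(sequence):
    """Return the power at frequencies k = 0..n//2 of the hydropathy profile."""
    series = [KYTE_DOOLITTLE[r] for r in sequence]
    n = len(series)
    mean = sum(series) / n
    series = [v - mean for v in series]  # remove the constant (k = 0) component
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT coefficient; O(n^2) overall, fine for a short illustration.
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(series))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


spec = power_spectrum("LKLKLKLK")  # toy sequence alternating every residue
peak = max(range(len(spec)), key=spec.__getitem__)
print(peak)
```

For this strictly alternating toy sequence the power concentrates at the highest frequency, `k = n/2`. The abstract does not describe how spectra from proteins of different lengths are brought to a common size before classification, so that step is left out here.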
Affiliation(s)
- Y-Z Guo
- College of Chemistry, Sichuan University, Chengdu, China
|
20
|
Surgand JS, Rodrigo J, Kellenberger E, Rognan D. A chemogenomic analysis of the transmembrane binding cavity of human G-protein-coupled receptors. Proteins 2006; 62:509-38. [PMID: 16294340 DOI: 10.1002/prot.20768] [Citation(s) in RCA: 189] [Impact Index Per Article: 10.5] [Indexed: 11/10/2022]
Abstract
The amino acid sequences of 369 human nonolfactory G-protein-coupled receptors (GPCRs) have been aligned at the seven-transmembrane domain (TM) and used to extract the nature of 30 critical residues presumed, from the X-ray structure of bovine rhodopsin bound to retinal, to line the TM binding cavity of ground-state receptors. Interestingly, the clustering of human GPCRs from these 30 residues mirrors the recently described phylogenetic tree of full-sequence human GPCRs (Fredriksson et al., Mol Pharmacol 2003;63:1256-1272) with few exceptions. A TM cavity could be found for all investigated GPCRs, with physicochemical properties matching those of their cognate ligands. The current approach allows a very fast comparison of most human GPCRs from the focused perspective of the predicted TM cavity and makes it easy to detect key residues that drive ligand selectivity or promiscuity.
|
21
|
Yabuki Y, Muramatsu T, Hirokawa T, Mukai H, Suwa M. GRIFFIN: a system for predicting GPCR-G-protein coupling selectivity using a support vector machine and a hidden Markov model. Nucleic Acids Res 2005; 33:W148-53. [PMID: 15980445 PMCID: PMC1160255 DOI: 10.1093/nar/gki495] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
We describe a novel system, GRIFFIN (G-protein and Receptor Interaction Feature Finding INstrument), that predicts G-protein coupled receptor (GPCR) and G-protein coupling selectivity based on a support vector machine (SVM) and a hidden Markov model (HMM) with high sensitivity and specificity. Based on our assumption that whole structural segments of ligands, GPCRs and G-proteins are essential to determine GPCR and G-protein coupling, various quantitative features were selected for ligands, GPCRs and G-protein complex structures, and those parameters that are the most effective in selecting G-protein type were used as feature vectors in the SVM. The main part of GRIFFIN includes a hierarchical SVM classifier using the feature vectors, which is useful for Class A GPCRs, the major family. For the opsins and olfactory subfamilies of Class A and other minor families (Classes B, C, frizzled and smoothened), the binding G-protein is predicted with high accuracy using the HMM. Applying this system to known GPCR sequences, each binding G-protein is predicted with high sensitivity and specificity (>85% on average). GRIFFIN is freely available and allows users to easily execute this reliable prediction of G-proteins.
Affiliation(s)
- Yukimitsu Yabuki
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Information and Mathematical Science Laboratory (IMS) Inc., Meikei Building, 1-5-21 Otsuka, Bunkyo-ku, Tokyo 112-0012, Japan
- Takahiko Muramatsu
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Nara Institute of Science and Technology, Graduate School of Information Science, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan
- Takatsugu Hirokawa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Hidehito Mukai
- Mitsubishi Kagaku Institute of Life Sciences, 11 Minamiooya, Machida, Tokyo 194-8511, Japan
- Makiko Suwa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Nara Institute of Science and Technology, Graduate School of Information Science, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan
- To whom correspondence should be addressed. Tel: +81 3 3599 8051; Fax: +81 3 3599 8081
|
22
|
Huang Y, Cai J, Ji L, Li Y. Classifying G-protein coupled receptors with bagging classification tree. Comput Biol Chem 2004; 28:275-80. [PMID: 15548454 DOI: 10.1016/j.compbiolchem.2004.08.001] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 03/30/2004] [Revised: 08/05/2004] [Accepted: 08/06/2004] [Indexed: 11/17/2022]
Abstract
G-protein coupled receptors (GPCRs) play a key role in different biological processes, such as regulation of growth, death and metabolism of cells. They are major therapeutic targets of numerous prescribed drugs. However, the ligand specificity of many receptors is unknown and there is little structural information available. Bioinformatics may offer one approach to bridge the gap between sequence data and functional knowledge of a receptor. In this paper, we use a bagging classification tree algorithm to predict the type of the receptor based on its amino acid composition. The prediction is performed for GPCRs at the sub-family and sub-sub-family levels. In a cross-validation test, we achieved an overall predictive accuracy of 91.1% for GPCR sub-family classification, and 82.4% for sub-sub-family classification. These results demonstrate the applicability of this relatively simple method and its potential for improving prediction accuracy.
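Bagging over amino acid composition can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: depth-1 trees (decision stumps) stand in for full classification trees to keep the example short, and the sequences, the labels `family_1` and `family_2`, and all function names are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional amino acid composition (fraction of each residue)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def fit_stump(X, y):
    """Depth-1 tree: (feature, threshold, left_label, right_label) with fewest errors."""
    majority = Counter(y).most_common(1)[0][0]
    best_err, best = len(y) + 1, (0, float("inf"), majority, majority)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best_err, best = err, (f, thr, l_lab, r_lab)
    return best


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    model = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        model.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return model


def bagging_predict(model, x):
    """Majority vote over all stumps."""
    votes = Counter(l if x[f] <= thr else r for f, thr, l, r in model)
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical sequences and labels, not real GPCRs).
train = [("LLLLLLVV", "family_1"), ("LLLLLVVV", "family_1"), ("LLLLLLLV", "family_1"),
         ("KKKKKKEE", "family_2"), ("KKKKKEEE", "family_2"), ("KKKKKKKE", "family_2")]
X = [composition(s) for s, _ in train]
y = [lab for _, lab in train]
model = bagging_fit(X, y)
print(bagging_predict(model, composition("KKKKEEEE")))
```

Averaging many trees fit on resampled data reduces the variance of any single tree, which is the usual motivation for bagging when the input features are as simple as residue fractions.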
Affiliation(s)
- Ying Huang
- Department of Automation, MOE Key Laboratory of Bioinformatics, Institute of Bioinformatics, Tsinghua University, Beijing 10084, China.
|