1
Baumann C, Zerbe O. The role of leucine and isoleucine in tuning the hydropathy of class A GPCRs. Proteins 2024; 92:15-23. [PMID: 37497770 DOI: 10.1002/prot.26559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 06/19/2023] [Accepted: 07/11/2023] [Indexed: 07/28/2023]
Abstract
Leucine and Isoleucine are two amino acids that differ only by the positioning of one methyl group. This small difference can have important consequences in α-helices, as the β-branching of Ile results in helix destabilization. We set out to investigate whether there are general trends for the occurrences of Leu and Ile residues in the structures and sequences of class A GPCRs (G protein-coupled receptors). GPCRs are integral membrane proteins in which α-helices span the plasma membrane seven times and which play a crucial role in signal transmission. We found that Leu side chains are generally more exposed at the protein surface than Ile side chains. We explored whether this difference might be attributed to different functions of the two amino acids and tested if Leu tunes the hydrophobicity of the transmembrane domain based on the Wimley-White whole-residue hydrophobicity scales. Leu content decreases the variation in hydropathy between receptors and correlates with the non-Leu receptor hydropathy. Both measures indicate that hydropathy is tuned by Leu. To test this idea further, we generated protein sequences with random amino acid compositions using a simple numerical model, in which hydropathy was tuned by adjusting the number of Leu residues. The model was able to replicate the observations made with class A GPCR sequences. We speculate that the hydropathy of transmembrane domains of class A GPCRs is tuned by Leu (and to some lesser degree by Lys and Val) to facilitate correct insertion into membranes and/or to stably anchor the receptors within membranes.
Affiliation(s)
- Oliver Zerbe
- Department of Chemistry, University of Zurich, Zurich, Switzerland
2
Jaster AM, González-Maeso J. Mechanisms and molecular targets surrounding the potential therapeutic effects of psychedelics. Mol Psychiatry 2023; 28:3595-3612. [PMID: 37759040 DOI: 10.1038/s41380-023-02274-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/30/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
Psychedelics, also known as classical hallucinogens, have been investigated for decades due to their potential therapeutic effects in the treatment of neuropsychiatric and substance use disorders. The results from clinical trials have shown promise for the use of psychedelics to alleviate symptoms of depression and anxiety, as well as to promote substantial decreases in the use of nicotine and alcohol. While these studies provide compelling evidence for the powerful subjective experience and prolonged therapeutic adaptations, the underlying molecular reasons for these robust and clinically meaningful improvements are still poorly understood. Preclinical studies assessing the targets and circuitry of the post-acute effects of classical psychedelics are ongoing. Current literature is split between a serotonin 5-HT2A receptor (5-HT2AR)-dependent or -independent signaling pathway, as researchers are attempting to harness the mechanisms behind the sustained post-acute therapeutically relevant effects. A combination of molecular, behavioral, and genetic techniques in neuropharmacology has begun to show promise for elucidating these mechanisms. As the field progresses, increasing evidence points towards the importance of the subjective experience induced by psychedelic-assisted therapy, but without further cross validation between clinical and preclinical research, the why behind the experience and its translational validity may be lost.
Affiliation(s)
- Alaina M Jaster
- Department of Physiology and Biophysics, Virginia Commonwealth University School of Medicine, Richmond, VA, 23298, USA
- Department of Pharmacology and Toxicology, Virginia Commonwealth University School of Medicine, Richmond, VA, 23298, USA
- Javier González-Maeso
- Department of Physiology and Biophysics, Virginia Commonwealth University School of Medicine, Richmond, VA, 23298, USA
3
To Affinity and Beyond: A Personal Reflection on the Design and Discovery of Drugs. Molecules 2022; 27:molecules27217624. [DOI: 10.3390/molecules27217624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/19/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022] Open
Abstract
Faced with new and as yet unmet medical need, the stark underperformance of the pharmaceutical discovery process is well described if not perfectly understood. Driven primarily by profit rather than societal need, the search for new pharmaceutical products—small molecule drugs, biologicals, and vaccines—is neither properly funded nor sufficiently systematic. Many innovative approaches remain significantly underused and severely underappreciated, while dominant methodologies are replete with problems and limitations. Design is a component of drug discovery that is much discussed but seldom realised. In and of itself, technical innovation alone is unlikely to fulfil all the possibilities of drug discovery if the necessary underlying infrastructure remains unaltered. A fundamental revision in attitudes, with greater reliance on design powered by computational approaches, as well as a move away from the commercial imperative, is thus essential to capitalise fully on the potential of pharmaceutical intervention in healthcare.
4
Kabir MN, Wong L. EnsembleFam: towards more accurate protein family prediction in the twilight zone. BMC Bioinformatics 2022; 23:90. [PMID: 35287576 PMCID: PMC8919565 DOI: 10.1186/s12859-022-04626-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Accepted: 03/02/2022] [Indexed: 11/30/2022] Open
Abstract
Background: Current protein family modeling methods like profile Hidden Markov Model (pHMM), k-mer based methods, and deep learning-based methods do not provide very accurate protein function prediction for proteins in the twilight zone, due to low sequence similarity to reference proteins with known functions. Results: We present a novel method EnsembleFam, aiming at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. EnsembleFam trains three separate Support Vector Machine (SVM) classifiers for each family using these features, and an ensemble prediction is made to classify novel proteins into these families. Extensive experiments are conducted using the Clusters of Orthologous Groups (COG) dataset and G Protein-Coupled Receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides a much more accurate prediction for twilight zone proteins. Conclusions: EnsembleFam, a machine learning method to model protein families, can be used to better identify members with very low sequence homology. Using EnsembleFam protein functions can be predicted using just sequence information with better accuracy than state-of-the-art methods.
Affiliation(s)
- Mohammad Neamul Kabir
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore.
- Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
5
Fan Y, Lu X, Zhao J, Fu H, Liu Y. Estimating individualized treatment rules for treatments with hierarchical structure. Electron J Stat 2022. [DOI: 10.1214/21-ejs1948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yiwei Fan
- Center for Applied Statistics, School of Statistics, Renmin University of China, China
- Xiaoling Lu
- Center for Applied Statistics, School of Statistics, Renmin University of China, China
- Junlong Zhao
- School of Statistics, Beijing Normal University, China
- Haoda Fu
- Advanced Analytics and Data Sciences, Eli Lilly and Company, U.S.A
- Yufeng Liu
- Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, U.S.A
6
Velloso JPL, Ascher DB, Pires DEV. pdCSM-GPCR: predicting potent GPCR ligands with graph-based signatures. Bioinformatics Advances 2021; 1:vbab031. [PMID: 34901870 PMCID: PMC8651072 DOI: 10.1093/bioadv/vbab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/27/2021] [Revised: 09/30/2021] [Accepted: 11/02/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION: G protein-coupled receptors (GPCRs) can selectively bind to many types of ligands, ranging from light-sensitive compounds, ions, hormones, pheromones and neurotransmitters, modulating cell physiology. Considering their role in many essential cellular processes, they are one of the most targeted protein families, with over a third of all approved drugs modulating GPCR signalling. Despite this, the large diversity of receptors and their multipass transmembrane architectures make the identification and development of novel specific, and safe GPCR ligands a challenge. While computational approaches have the potential to assist GPCR drug development, they have presented limited performance and generalization capabilities. Here, we explored the use of graph-based signatures to develop pdCSM-GPCR, a method capable of rapidly and accurately screening potential GPCR ligands. RESULTS: Bioactivity data (IC50, EC50, Ki and Kd) for individual GPCRs were curated. After curation, we used the data for developing predictive models for 36 major GPCR targets, across 4 classes (A, B, C and F). Our models compose the most comprehensive computational resource for GPCR bioactivity prediction to date. Across stratified 10-fold cross-validation and blind tests, our approach achieved Pearson's correlations of up to 0.89, significantly outperforming previous methods. Interpreting our results, we identified common important features of potent GPCRs ligands, which tend to have bicyclic rings, leading to higher levels of aromaticity. We believe pdCSM-GPCR will be an invaluable tool to assist screening efforts, enriching compound libraries and ranking candidates for further experimental validation. AVAILABILITY AND IMPLEMENTATION: pdCSM-GPCR predictive models and datasets used have been made available via a freely accessible and easy-to-use web server at http://biosig.unimelb.edu.au/pdcsm_gpcr/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- João Paulo L Velloso
- Fundação Oswaldo Cruz, Instituto René Rachou, Belo Horizonte 30190-009, Brazil; Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne 3052, Australia; Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK. To whom correspondence should be addressed.
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne 3053, Australia. To whom correspondence should be addressed.
7
Sandaruwan PD, Wannige CT. An improved deep learning model for hierarchical classification of protein families. PLoS One 2021; 16:e0258625. [PMID: 34669708 PMCID: PMC8528337 DOI: 10.1371/journal.pone.0258625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 10/01/2021] [Indexed: 12/28/2022] Open
Abstract
Although genes carry information, proteins are the main role player in providing all the functionalities of a living organism. Massive amounts of different proteins involve in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in their structure or function. Protein characterization to identify protein structure and function is done accurately using laboratory experiments. With the rapidly increasing huge amount of novel protein sequences, these experiments have become difficult to carry out since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods are introduced to classify proteins and predict their functional properties. With the progress of the performance of the computational techniques, deep learning plays a key role in many areas. Novel deep learning models such as DeepFam, ProtCNN have been presented to classify proteins into their families recently. However, these deep learning models have been used to carry out the non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam with high accuracy to classify proteins hierarchically into different levels simultaneously. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved an accuracy of 98.62% and 96.14% for the popular Pfam dataset and COG dataset respectively.
8
Ling C, Wei X, Shen Y, Zhang H. Development and validation of multiple machine learning algorithms for the classification of G-protein-coupled receptors using molecular evolution model-based feature extraction strategy. Amino Acids 2021; 53:1705-1714. [PMID: 34562175 DOI: 10.1007/s00726-021-03080-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/08/2021] [Accepted: 09/13/2021] [Indexed: 11/25/2022]
Abstract
Machine learning is one of the most potential ways to realize the function prediction of the incremental large-scale G-protein-coupled receptors (GPCR). Prior research reveals that the key to determining the overall classification accuracy of GPCR is extracting valuable features and filtering out redundancy. To achieve a more efficient classification model, we put the feature synonym problem into consideration and create a new method based on functional word clustering and integration. Through evaluating the evolution correlation between features using the transition scores in mature molecular substitution matrices, candidate features are clustered into synonym groups. Each group of the clustered features is then integrated and represented by a unique key functional word. These retained key functional words are used to form a feature knowledge base. The original GPCR sequences are then transferred into feature vectors based on a feature re-extraction strategy according to the features in the knowledge base before the training and testing stage. We create multiple machine learning models based on Naïve Bayesian (NB), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP) algorithms. The established model is applied to classify two public data sets containing 8354 and 12,731 GPCRs, respectively. These models achieve significant performance in almost all evaluation criteria in comparison with state-of-the art. This work demonstrated the potential of the novel feature extraction strategy and provided an effective theoretical design for the hierarchical classification of GPCRs.
Affiliation(s)
- Cheng Ling
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Xiaolin Wei
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Yitian Shen
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Haoyu Zhang
- School of Information Engineering, Zhejiang Ocean University, Zhoushan, China
9
Akon M, Akon M, Kabir M, Rahman MS, Rahman MS. ADACT: a tool for analysing (dis)similarity among nucleotide and protein sequences using minimal and relative absent words. Bioinformatics 2021; 37:1468-1470. [PMID: 33016997 DOI: 10.1093/bioinformatics/btaa853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/21/2019] [Revised: 09/09/2020] [Accepted: 09/21/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Researchers and practitioners use a number of popular sequence comparison tools that use many alignment-based techniques. Due to high time and space complexity and length-related restrictions, researchers often seek alignment-free tools. Recently, some interesting ideas, namely, Minimal Absent Words (MAW) and Relative Absent Words (RAW), have received much interest among the scientific community as distance measures that can give us alignment-free alternatives. This drives us to structure a framework for analysing biological sequences in an alignment-free manner. RESULTS: In this application note, we present Alignment-free Dissimilarity Analysis & Comparison Tool (ADACT), a simple web-based tool that computes the analogy among sequences using a varied number of indexes through the distance matrix, species relation list and phylogenetic tree. This tool basically combines absent word (MAW or RAW) computation, dissimilarity measures, species relationship and thus brings all required software in one platform for the ease of researchers and practitioners alike in the field of bioinformatics. We have also developed a restful API. AVAILABILITY AND IMPLEMENTATION: ADACT has been hosted at http://research.buet.ac.bd/ADACT/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
10
Solís KH, Romero-Ávila MT, Guzmán-Silva A, García-Sáinz JA. The LPA3 Receptor: Regulation and Activation of Signaling Pathways. Int J Mol Sci 2021; 22:ijms22136704. [PMID: 34201414 PMCID: PMC8269014 DOI: 10.3390/ijms22136704] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/24/2021] [Revised: 06/08/2021] [Accepted: 06/12/2021] [Indexed: 12/17/2022] Open
Abstract
The lysophosphatidic acid 3 receptor (LPA3) participates in different physiological actions and in the pathogenesis of many diseases through the activation of different signal pathways. Knowledge of the regulation of the function of the LPA3 receptor is a crucial element for defining its roles in health and disease. This review describes what is known about the signaling pathways activated in terms of its various actions. Next, we review knowledge on the structure of the LPA3 receptor, the domains found, and the roles that the latter might play in ligand recognition, signaling, and cellular localization. Currently, there is some information on the action of LPA3 in different cells and whole organisms, but very little is known about the regulation of its function. Areas in which there is a gap in our knowledge are indicated in order to further stimulate experimental work on this receptor and on other members of the LPA receptor family. We are convinced that knowledge on how this receptor is activated, the signaling pathways employed and how the receptor internalization and desensitization are controlled will help design new therapeutic interventions for treating diseases in which the LPA3 receptor is implicated.
11
Lee T, Lee S, Kang M, Kim S. Deep hierarchical embedding for simultaneous modeling of GPCR proteins in a unified metric space. Sci Rep 2021; 11:9543. [PMID: 33953216 PMCID: PMC8100104 DOI: 10.1038/s41598-021-88623-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2021] [Accepted: 04/13/2021] [Indexed: 11/23/2022] Open
Abstract
GPCR proteins belong to diverse families of proteins that are defined at multiple hierarchical levels. Inspecting relationships between GPCR proteins on the hierarchical structure is important, since characteristics of the protein can be inferred from proteins in similar hierarchical information. However, modeling of GPCR families has been performed separately for each of the family, subfamily, and sub-subfamily level. Relationships between GPCR proteins are ignored in these approaches as they process the information in the proteins with several disconnected models. In this study, we propose DeepHier, a deep learning model to simultaneously learn representations of GPCR family hierarchy from the protein sequences with a unified single model. Novel loss term based on metric learning is introduced to incorporate hierarchical relations between proteins. We tested our approach using a public GPCR sequence dataset. Metric distances in the deep feature space corresponded to the hierarchical family relation between GPCR proteins. Furthermore, we demonstrated that further downstream tasks, like phylogenetic reconstruction and motif discovery, are feasible in the constructed embedding space. These results show that hierarchical relations between sequences were successfully captured in both of technical and biological aspects.
Affiliation(s)
- Taeheon Lee
- Looxid Labs, Seoul, 06628, Republic of Korea
- Sangseon Lee
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea
- Minji Kang
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, 08826, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea; Institute of Engineering Research, Seoul National University, Seoul, 08826, Republic of Korea
12
Lengger B, Jensen MK. Engineering G protein-coupled receptor signalling in yeast for biotechnological and medical purposes. FEMS Yeast Res 2021; 20:5673487. [PMID: 31825496 PMCID: PMC6977407 DOI: 10.1093/femsyr/foz087] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/02/2019] [Accepted: 12/09/2019] [Indexed: 12/13/2022] Open
Abstract
G protein-coupled receptors (GPCRs) comprise the largest class of membrane proteins in the human genome, with a common denominator of seven-transmembrane domains largely conserved among eukaryotes. Yeast is naturally armoured with three different GPCRs for pheromone and sugar sensing, with the pheromone pathway being extensively hijacked for characterising heterologous GPCR signalling in a model eukaryote. This review focusses on functional GPCR studies performed in yeast and on the elucidated hotspots for engineering, and discusses both endogenous and heterologous GPCR signalling. Key emphasis will be devoted to studies describing important engineering parameters to consider for successful coupling of GPCRs to the yeast mating pathway. We also review the various means of applying yeast for studying GPCRs, including the use of yeast armed with heterologous GPCRs as a platform for (i) deorphanisation of orphan receptors, (ii) metabolic engineering of yeast for production of bioactive products and (iii) medical applications related to pathogen detection and drug discovery. Finally, this review summarises the current challenges related to expression of functional membrane-bound GPCRs in yeast and discusses the opportunities to continue capitalising on yeast as a model chassis for functional GPCR signalling studies.
Affiliation(s)
- Bettina Lengger
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, Kgs. Lyngby, 2800, Denmark
- Michael K Jensen
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, Kgs. Lyngby, 2800, Denmark
13
Du N, Shang J, Sun Y. Improving protein domain classification for third-generation sequencing reads using deep learning. BMC Genomics 2021; 22:251. [PMID: 33836667 PMCID: PMC8033682 DOI: 10.1186/s12864-021-07468-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/28/2020] [Accepted: 02/19/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: With the development of third-generation sequencing (TGS) technologies, people are able to obtain DNA sequences with lengths from 10s to 100s of kb. These long reads allow protein domain annotation without assembly, thus can produce important insights into the biological functions of the underlying data. However, the high error rate in TGS data raises a new challenge to established domain analysis pipelines. The state-of-the-art methods are not optimized for noisy reads and have shown unsatisfactory accuracy of domain classification in TGS data. New computational methods are still needed to improve the performance of domain prediction in long noisy reads. RESULTS: In this work, we introduce ProDOMA, a deep learning model that conducts domain classification for TGS reads. It uses deep neural networks with 3-frame translation encoding to learn conserved features from partially correct translations. In addition, we formulate our problem as an open-set problem and thus our model can reject reads not containing the targeted domains. In the experiments on simulated long reads of protein coding sequences and real TGS reads from the human genome, our model outperforms HMMER and DeepFam on protein domain classification. CONCLUSIONS: In summary, ProDOMA is a useful end-to-end protein domain analysis tool for long noisy reads without relying on error correction.
Affiliation(s)
- Nan Du
- Computer Science and Engineering, Michigan State University, East Lansing, 48824 USA
- Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, People’s Republic of China
- Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, People’s Republic of China
14
Yusuf SM, Zhang F, Zeng M, Li M. DeepPPF: A deep learning framework for predicting protein family. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.11.062] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/27/2022]
15
Khelifa MS, Skov LJ, Holst B. Biased Ghrelin Receptor Signaling and the Dopaminergic System as Potential Targets for Metabolic and Psychological Symptoms of Anorexia Nervosa. Front Endocrinol (Lausanne) 2021; 12:734547. [PMID: 34646236 PMCID: PMC8503187 DOI: 10.3389/fendo.2021.734547] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Accepted: 08/16/2021] [Indexed: 12/15/2022] Open
Abstract
Anorexia Nervosa (AN) is a complex disease that impairs the metabolic, mental and physiological health of affected individuals in a severe and sometimes lethal way. Many of the common symptoms in AN patients, such as reduced food intake, anxiety, impaired gut motility or overexercising are connected to both the orexigenic gut hormone ghrelin and the dopaminergic system. Targeting the ghrelin receptor (GhrR) to treat AN seems a promising possibility in current research. However, GhrR signaling is highly complex. First, the GhrR can activate four known intracellular pathways Gαq, Gαi/o, Gα12/13 and the recruitment of β-arrestin. Biased signaling provides the possibility to activate or inhibit only one or a subset of the intracellular pathways of a pleiotropic receptor. This allows specific targeting of physiological functions without adverse effects. Currently little is known on how biased signaling could specifically modulate GhrR effects. Second, GhrR signaling has been shown to be interconnected with the dopaminergic system, particularly in the context of AN symptoms. This review highlights that a biased agonist for the GhrR may be a promising target for the treatment of AN, however extensive and systematic translational studies are still needed and the connection to the dopaminergic system has to be taken into account.
16
Paki R, Nourani E, Farajzadeh D. Classification of G protein-coupled receptors using attention mechanism. Gene Reports 2020. [DOI: 10.1016/j.genrep.2020.100882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
17
Odoemelam CS, Percival B, Wallis H, Chang MW, Ahmad Z, Scholey D, Burton E, Williams IH, Kamerlin CL, Wilson PB. G-Protein coupled receptors: structure and function in drug discovery. RSC Adv 2020; 10:36337-36348. [PMID: 35517958 PMCID: PMC9057076 DOI: 10.1039/d0ra08003a] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/20/2020] [Accepted: 09/22/2020] [Indexed: 12/13/2022] Open
Abstract
The G-protein coupled receptor (GPCR) superfamily comprises similar proteins arranged into families or classes, making it one of the largest in the mammalian genome. GPCRs take part in many vital physiological functions, making them targets for numerous novel drugs. GPCRs share some distinctive features, such as the seven transmembrane domains, but they differ in the number of conserved residues in their transmembrane domain. Here we provide an introductory and accessible review detailing the computational advances in GPCR pharmacology and drug discovery. An overview is provided on family A-C GPCRs: their structural differences, GPCR signalling, allosteric binding and cooperativity. The dielectric constant (relative permittivity) of proteins is also discussed in the context of site-specific environmental effects.
Affiliation(s)
- Benita Percival
- Nottingham Trent University, 50 Shakespeare St, Nottingham NG1 4FQ, UK
- Helen Wallis
- Nottingham Trent University, 50 Shakespeare St, Nottingham NG1 4FQ, UK
- Ming-Wei Chang
- Nanotechnology and Integrated Bioengineering Centre, University of Ulster, Jordanstown Campus, Newtownabbey BT37 0QB, Northern Ireland, UK
- Zeeshan Ahmad
- De Montfort University, The Gateway, Leicester LE1 9BH, UK
- Dawn Scholey
- Nottingham Trent University, 50 Shakespeare St, Nottingham NG1 4FQ, UK
- Emily Burton
- Nottingham Trent University, 50 Shakespeare St, Nottingham NG1 4FQ, UK
- Ian H Williams
- Department of Chemistry, University of Bath, Claverton Down, Bath BA1 7AY, UK
|
18
|
Fan Y, Lu X, Liu Y, Zhao J. Angle-Based Hierarchical Classification Using Exact Label Embedding. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1801450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Yiwei Fan
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
- Xiaoling Lu
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
- Yufeng Liu
- Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Science, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, NC
- Junlong Zhao
- School of Statistics, Beijing Normal University, Beijing, China
|
19
|
Seo S, Oh M, Park Y, Kim S. DeepFam: deep learning based alignment-free method for protein family modeling and prediction. Bioinformatics 2019; 34:i254-i262. [PMID: 29949966 PMCID: PMC6022622 DOI: 10.1093/bioinformatics/bty275] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 11/24/2022] Open
Abstract
Motivation: A large number of newly sequenced proteins are generated by next-generation sequencing technologies, and assigning biochemical functions to these proteins is an important task. However, biological experiments are too expensive to characterize such a large number of protein sequences; thus protein function prediction is primarily done by computational modeling methods, such as profile Hidden Markov Model (pHMM) and k-mer based methods. Nevertheless, existing methods have some limitations: k-mer based methods are not accurate enough to assign protein functions and pHMM is not fast enough to handle the large number of protein sequences from numerous genome projects. Therefore, a more accurate and faster protein function prediction method is needed. Results: In this paper, we introduce DeepFam, an alignment-free method that can extract functional information directly from sequences without the need for multiple sequence alignments. In extensive experiments using the Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) dataset, DeepFam achieved better performance in terms of accuracy and runtime for predicting functions of proteins compared to the state-of-the-art methods, both alignment-free and alignment-based. Additionally, we showed that DeepFam is capable of capturing conserved regions to model protein families. In fact, DeepFam was able to detect conserved regions documented in the Prosite database while predicting functions of proteins. Our deep learning method will be useful in characterizing functions of the ever-increasing number of protein sequences. Availability and implementation: Code is available at https://bhi-kimlab.github.io/DeepFam.
Affiliation(s)
- Seokjun Seo
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
- Minsik Oh
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
- Youngjune Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; Bioinformatics Institute, Seoul National University, Seoul, Korea
|
20
|
Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018; 34:760-769. [PMID: 29069344 PMCID: PMC6030869 DOI: 10.1093/bioinformatics/btx680] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Received: 07/13/2017] [Accepted: 10/20/2017] [Indexed: 11/15/2022] Open
Abstract
Motivation: Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. Results: We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre’s ability to capture the functional difference of enzyme isoforms. Availability and implementation: The server can be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu Li
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Sheng Wang
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Ramzan Umarov
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Bingqing Xie
- Computer Science Department, Illinois Institute of Technology, Chicago, IL 60616, USA
- Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
- Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
- Xin Gao
- Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), Computer, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
|
21
|
GPCR-SAS: A web application for statistical analyses on G protein-coupled receptors sequences. PLoS One 2018; 13:e0199843. [PMID: 30044824 PMCID: PMC6059404 DOI: 10.1371/journal.pone.0199843] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/02/2018] [Accepted: 06/14/2018] [Indexed: 11/19/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are one of the largest protein families in mammals. They mediate signal transduction across cell membranes and are important targets for the pharmaceutical industry. The G Protein-Coupled Receptors-Sequence Analysis and Statistics (GPCR-SAS) web application provides a set of tools to perform comparative analysis of sequence positions between receptors, based on a curated structural-informed multiple sequence alignment. The analysis tools include: (i) percentage of occurrence of an amino acid or motif and entropy at a position or range of positions, (ii) covariance of two positions, (iii) correlation between two amino acids in two positions (or two sequence motifs in two ranges of positions), and (iv) snake-plot representation for a specific receptor or for the consensus sequence of a group of selected receptors. The analysis of conservation of residues and motifs across transmembrane (TM) segments may guide the design of more selective ligands or help to rationalize activation mechanisms, among others. As an example, here we analyze the amino acids of the "transmission switch", that initiates receptor activation following ligand binding. The tool is freely accessible at http://lmc.uab.cat/gpcrsas/.
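Analysis (i) in the abstract above, the percentage of occurrence of an amino acid and the entropy at an alignment position, can be illustrated with a minimal sketch. This is not the GPCR-SAS implementation: the function name `column_stats`, the toy alignment, and the choice of Shannon entropy in bits are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_stats(alignment, position):
    """Return (percent occurrence per residue, Shannon entropy in bits)
    for one column of an alignment given as equal-length strings.
    Hypothetical helper; gap characters, if present, are counted like
    any other symbol."""
    column = [seq[position] for seq in alignment]
    counts = Counter(column)
    total = len(column)
    percentages = {res: 100.0 * n / total for res, n in counts.items()}
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return percentages, entropy

# Toy alignment with made-up sequences, not real receptor data.
toy_alignment = ["DRYLA", "DRYVA", "ERYLA", "DRWLA"]
percentages, entropy = column_stats(toy_alignment, 0)
print(percentages, round(entropy, 3))
```

A fully conserved column gives an entropy of zero, and the value grows as the residues at that position become more varied.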
|
22
|
Yu B, Lou L, Li S, Zhang Y, Qiu W, Wu X, Wang M, Tian B. Prediction of protein structural class for low-similarity sequences using Chou’s pseudo amino acid composition and wavelet denoising. J Mol Graph Model 2017; 76:260-273. [DOI: 10.1016/j.jmgm.2017.07.012] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 04/21/2017] [Revised: 07/11/2017] [Accepted: 07/12/2017] [Indexed: 11/25/2022]
|
23
|
Cao L, Graauw MD, Yan K, Winkel L, Verbeek FJ. Hierarchical classification strategy for Phenotype extraction from epidermal growth factor receptor endocytosis screening. BMC Bioinformatics 2016; 17:196. [PMID: 27142862 PMCID: PMC4855371 DOI: 10.1186/s12859-016-1053-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2015] [Accepted: 04/13/2016] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Endocytosis is regarded as a mechanism of attenuating the epidermal growth factor receptor (EGFR) signaling and of receptor degradation. There is increasing evidence that breast cancer progression is associated with a defect in EGFR endocytosis. In order to find related ribonucleic acid (RNA) regulators in this process, high-throughput imaging with fluorescent markers is used to visualize the complex EGFR endocytosis process. Subsequently, a dedicated automatic image and data analysis system is developed and applied to extract the phenotype measurement and distinguish different developmental episodes from a huge number of images acquired through high-throughput imaging. For the image analysis, a phenotype measurement quantifies the important image information into distinct features or measurements. Therefore, the manner in which prominent measurements are chosen to represent the dynamics of the EGFR process becomes a crucial step for the identification of the phenotype. In the subsequent data analysis, classification is used to categorize each observation by making use of all prominent measurements obtained from image analysis. Therefore, a better-constructed classification strategy will help raise the performance of our image and data analysis system. RESULTS: In this paper, we illustrate an integrated analysis method for EGFR signalling through image analysis of microscopy images. Sophisticated wavelet-based texture measurements are used to obtain a good description of the characteristic stages in the EGFR signalling. A hierarchical classification strategy is designed to improve the recognition of phenotypic episodes of EGFR during endocytosis. Different strategies for normalization, feature selection and classification are evaluated.
CONCLUSIONS: The results of performance assessment clearly demonstrate that our hierarchical classification scheme combined with a selected set of features provides a notable improvement in the temporal analysis of EGFR endocytosis. Moreover, it is shown that the addition of the wavelet-based texture features contributes to this improvement. Our workflow can be applied to drug discovery to analyze defective EGFR endocytosis processes.
Affiliation(s)
- Lu Cao
- Imaging and Bio-informatics group, LIACS, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA The Netherlands
- The Department of Anatomy and Embryology, LUMC, Einthovenweg 20, Leiden, 2333 ZC The Netherlands
- Marjo de Graauw
- Division of Toxicology, LACDR, Leiden University, Einsteinweg 55, Leiden, 2333 CC The Netherlands
- Kuan Yan
- Imaging and Bio-informatics group, LIACS, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA The Netherlands
- Leah Winkel
- Biomechanics Laboratory, Erasmus MC, Wytemaweg 80, Rotterdam, 3015 CN The Netherlands
- Fons J. Verbeek
- Imaging and Bio-informatics group, LIACS, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA The Netherlands
|
24
|
Shalaeva DN, Galperin MY, Mulkidjanian AY. Eukaryotic G protein-coupled receptors as descendants of prokaryotic sodium-translocating rhodopsins. Biol Direct 2015; 10:63. [PMID: 26472483 PMCID: PMC4608122 DOI: 10.1186/s13062-015-0091-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/07/2015] [Accepted: 10/12/2015] [Indexed: 12/20/2022] Open
Abstract
Microbial rhodopsins and G-protein coupled receptors (GPCRs, which include animal rhodopsins) are two distinct (super)families of heptahelical (7TM) membrane proteins that share obvious structural similarities but no significant sequence similarity. Comparison of the recently solved high-resolution structures of the sodium-translocating bacterial rhodopsin and various Na+-binding GPCRs revealed striking similarity of their sodium-binding sites. This similarity allowed us to construct a structure-guided sequence alignment for the two (super)families, which highlighted their evolutionary relatedness. Our analysis supports a common underlying molecular mechanism for both families that involves a highly conserved aromatic residue playing a pivotal role in rotation of the 6th transmembrane helix. Reviewers: This article was reviewed by Oded Beja, G. P. S. Raghava and L. Aravind. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0091-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Daria N Shalaeva
- School of Physics, Osnabrueck University, 49069, Osnabrueck, Germany; School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119992, Russia
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Armen Y Mulkidjanian
- School of Physics, Osnabrueck University, 49069, Osnabrueck, Germany; School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119992, Russia; A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119992, Russia
|
25
|
Helix 8 of the angiotensin-II type 1A receptor interacts with phosphatidylinositol phosphates and modulates membrane insertion. Sci Rep 2015; 5:9972. [PMID: 26126083 PMCID: PMC5378882 DOI: 10.1038/srep09972] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/21/2015] [Accepted: 03/26/2015] [Indexed: 11/16/2022] Open
Abstract
The carboxyl-terminus of the type 1 angiotensin II receptor (AT1A) regulates receptor activation/deactivation, and the amphipathic Helix 8 within the carboxyl-terminus is a high-affinity interaction motif for plasma membrane lipids. We have used dual polarisation interferometry (DPI) to examine the role of phosphatidylinositides in the specific recognition of Helix 8 in the AT1A receptor. A synthetic peptide corresponding to Leu305 to Lys325 (Helix 8 AT1A) discriminated between PIPs and different charges on lipid membranes. Peptide binding to PtdIns(4)P-containing bilayers caused a dramatic change in the birefringence (a measure of membrane order) of the bilayer. Kinetic modelling showed that PtdIns(4)P is held above the bilayer until the mass of bound peptide reaches a threshold, after which the peptides insert further into the bilayer. This suggests that Helix 8 can respond to the presence of PI(4)P by withdrawing from the bilayer, resulting in a functional conformational change in the receptor.
|
26
|
Mani A, Ravindran R, Mannepalli S, Vang D, Luciw PA, Hogarth M, Khan IH, Krishnan VV. Data mining strategies to improve multiplex microbead immunoassay tolerance in a mouse model of infectious diseases. PLoS One 2015; 10:e0116262. [PMID: 25614982 PMCID: PMC4304816 DOI: 10.1371/journal.pone.0116262] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/09/2014] [Accepted: 12/04/2014] [Indexed: 11/25/2022] Open
Abstract
Multiplex methodologies, especially those with high-throughput capabilities, generate large volumes of data. Accumulation of such data (e.g., genomics, proteomics, metabolomics etc.) is fast becoming more common and thus requires the development and implementation of effective data mining strategies designed for biological and clinical applications. Multiplex microbead immunoassay (MMIA), on the xMAP or MagPix platform (Luminex), which is amenable to automation, offers a major advantage over conventional methods such as Western blot or ELISA for increasing the efficiency of serodiagnosis of infectious diseases. MMIA allows efficient detection of antibodies and/or antigens for a wide range of infectious agents simultaneously in host blood samples, in one reaction vessel. In the process, MMIA generates large volumes of data. In this report we demonstrate how applying data mining tools to this inherently large volume of data can improve the assay tolerance (measured in terms of sensitivity and specificity), by analysis of experimental data accumulated over a span of two years. The combination of prior knowledge with machine learning tools provides an efficient approach to improve the diagnostic power of the assay on a continuous basis. Furthermore, this study provides an in-depth knowledge base to study pathological trends of infectious agents in mouse colonies on a multivariate scale. Data mining techniques using serodetection of infections in mice, developed in this study, can be used as a general model for more complex applications in epidemiology and clinical translational research.
Affiliation(s)
- Akshay Mani
- Center for Comparative Medicine, University of California Davis, Davis, California, United States of America
- Resmi Ravindran
- Center for Comparative Medicine, University of California Davis, Davis, California, United States of America
- Soujanya Mannepalli
- Department of Chemistry, California State University, Fresno, California, United States of America
- Daniel Vang
- Center for Comparative Medicine, University of California Davis, Davis, California, United States of America
- Paul A Luciw
- Center for Comparative Medicine, University of California Davis, Davis, California, United States of America; Department of Pathology and Laboratory Medicine, University of California School of Medicine, Davis, California, United States of America
- Michael Hogarth
- Department of Pathology and Laboratory Medicine, University of California School of Medicine, Davis, California, United States of America
- Imran H Khan
- Center for Comparative Medicine, University of California Davis, Davis, California, United States of America; Department of Pathology and Laboratory Medicine, University of California School of Medicine, Davis, California, United States of America
- Viswanathan V Krishnan
- Center for Comparative Medicine, University of California Davis, Davis, California, United States of America; Department of Chemistry, California State University, Fresno, California, United States of America; Department of Pathology and Laboratory Medicine, University of California School of Medicine, Davis, California, United States of America
|
27
|
Cruz-Barbosa R, Vellido A, Giraldo J. The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs. Med Biol Eng Comput 2014; 53:137-49. [PMID: 25367737 DOI: 10.1007/s11517-014-1218-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/07/2013] [Accepted: 10/20/2014] [Indexed: 11/29/2022]
Abstract
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The tertiary structure of the transmembrane domain, a gate to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences on the basis of different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCRs types from the transformed data. This approach is relevant due to the existence of orphan proteins to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.
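The "basic amino acid composition representation" mentioned in the abstract above is an alignment-free transformation that ignores sequence ordering. A minimal sketch follows, assuming the standard 20-letter amino acid alphabet; the function name `aa_composition` and the example strings are hypothetical and not taken from the cited paper.

```python
# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(sequence):
    """Map a protein sequence to a 20-dimensional vector of relative
    amino acid frequencies. Characters outside the standard alphabet
    are ignored (an assumption of this sketch)."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]

# Sequences of different lengths map to vectors of the same size,
# so no alignment is needed before feeding them to a classifier.
vector = aa_composition("MKTLLVAGL")
print(len(vector))
```

Because only frequencies are kept, two sequences that are permutations of each other yield identical vectors, which is the loss of ordering information the abstract refers to.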
Affiliation(s)
- Raúl Cruz-Barbosa
- Computer Science Institute, Universidad Tecnológica de la Mixteca, Huajuapan, Oaxaca, México
|
28
|
Sahin ME, Can T, Son CD. GPCRsort-responding to the next generation sequencing data challenge: prediction of G protein-coupled receptor classes using only structural region lengths. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2014; 18:636-44. [PMID: 25133496 DOI: 10.1089/omi.2014.0073] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Next generation sequencing (NGS) and the attendant data deluge are increasingly impacting molecular life sciences research. Chief among the challenges and opportunities is to enhance our ability to classify molecular target data into meaningful and cohesive systematic nomenclature. In this vein, the G protein-coupled receptors (GPCRs) are the largest and most divergent receptor family and play a crucial role in a host of pathophysiological pathways. For the pharmaceutical industry, GPCRs are a major drug target and it is estimated that 60%-70% of all medicines in development today target GPCRs. Hence, they require an efficient and rapid classification to group the members according to their functions. In addition to NGS and the Big Data challenge we currently face, a growing number of orphan GPCRs further demands novel, rapid, and accurate classification of the receptors, since the current classification tools are inadequate and slow. This study presents the development of a new classification tool for GPCRs using the structural features derived from their primary sequences: GPCRsort. Comparison experiments with the current known GPCR classification techniques showed that GPCRsort is able to rapidly (in the order of minutes) classify uncharacterized GPCRs with 97.3% accuracy, whereas the best available technique's accuracy is 90.7%. GPCRsort is available in the public domain for postgenomics life scientists engaged in GPCR research with NGS: http://bioserver.ceng.metu.edu.tr/GPCRSort.
Affiliation(s)
- Mehmet Emre Sahin
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
|
29
|
Nantasenamat C, Simeon S, Owasirikul W, Songtawee N, Lapins M, Prachayasittikul V, Wikberg JES. Illuminating the origins of spectral properties of green fluorescent proteins via proteochemometric and molecular modeling. J Comput Chem 2014; 35:1951-66. [DOI: 10.1002/jcc.23708] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/27/2014] [Revised: 04/28/2014] [Accepted: 07/28/2014] [Indexed: 01/06/2023]
Affiliation(s)
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Department of Clinical Microbiology and Applied Technology; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Saw Simeon
- Center of Data Mining and Biomedical Informatics; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Wiwat Owasirikul
- Center of Data Mining and Biomedical Informatics; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Department of Radiological Technology; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Napat Songtawee
- Center of Data Mining and Biomedical Informatics; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Maris Lapins
- Department of Pharmaceutical Biosciences; Uppsala University; Uppsala Sweden
- Virapong Prachayasittikul
- Department of Clinical Microbiology and Applied Technology; Faculty of Medical Technology, Mahidol University; Bangkok 10700 Thailand
- Jarl E. S. Wikberg
- Department of Pharmaceutical Biosciences; Uppsala University; Uppsala Sweden
|
30
|
Bioinformatics tools for predicting GPCR gene functions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 796:205-24. [PMID: 24158807 DOI: 10.1007/978-94-007-7423-0_10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/12/2023]
Abstract
The automatic classification of GPCRs by bioinformatics methodology can provide functional information for new GPCRs in the whole 'GPCR proteome', and this information is important for the development of novel drugs. Since the GPCR proteome is classified hierarchically, general approaches to GPCR function prediction are based on hierarchical classification. Various computational tools have been developed to predict GPCR functions; these tools do not rely on simple sequence searches but on more powerful methods used in protein sequence analysis, such as alignment-free methods, statistical model methods, and machine learning methods, based on learning datasets. The first stage of hierarchical function prediction involves the discrimination of GPCRs from non-GPCRs, and the second stage involves the classification of the predicted GPCR candidates into family, subfamily, and sub-subfamily levels. Then, further classification is performed according to their protein-protein interaction type: binding G-protein type, oligomerized partner type, etc. These methods have achieved predictive accuracies of around 90%. Finally, I describe future research directions for bioinformatics techniques for the functional prediction of GPCRs.
|
31
|
Gao QB, Ye XF, He J. Classifying G-protein-coupled receptors to the finest subtype level. Biochem Biophys Res Commun 2013; 439:303-8. [DOI: 10.1016/j.bbrc.2013.08.023] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/31/2013] [Accepted: 08/08/2013] [Indexed: 11/17/2022]
|
32
|
Ghanemi A. Targeting G protein coupled receptor-related pathways as emerging molecular therapies. Saudi Pharm J 2013; 23:115-29. [PMID: 25972730 PMCID: PMC4420995 DOI: 10.1016/j.jsps.2013.07.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/05/2013] [Accepted: 07/29/2013] [Indexed: 12/20/2022] Open
Abstract
G protein coupled receptors (GPCRs) represent the most important targets in modern pharmacology because of the different functions they mediate, especially within the brain and peripheral nervous system, and also because of their functional and stereochemical properties. In this paper, we illustrate, via a variety of examples, recent advances concerning GPCR-related molecules that have been shown to play diverse roles in GPCR pathways and in pathophysiological phenomena. We exemplify how these GPCR pathways are, or might constitute, potential targets for different drugs to stimulate, modify, regulate or inhibit the cellular mechanisms that are hypothesized to govern some pathologic, physiologic, biologic and cellular or molecular aspects both in vivo and in vitro. Therefore, influencing such pathways will, undoubtedly, lead to different therapeutic applications based on the related pharmacological implications. Furthermore, such new properties can be applied in different fields. In addition to offering fruitful directions for future research, we hope the reviewed data, together with the elements found within the cited references, will inspire clinicians and researchers devoted to the study of GPCR properties.
Collapse
Affiliation(s)
- Abdelaziz Ghanemi
- Department of Pharmacology, China Pharmaceutical University, Nanjing 210009, China
|
33
|
Flower DR, Perrie Y. Identification of Candidate Vaccine Antigens In Silico. IMMUNOMIC DISCOVERY OF ADJUVANTS AND CANDIDATE SUBUNIT VACCINES 2013. [PMCID: PMC7120937 DOI: 10.1007/978-1-4614-5070-2_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
The identification of immunogenic whole-protein antigens is fundamental to the successful discovery of candidate subunit vaccines and their rapid, effective, and efficient transformation into clinically useful, commercially successful vaccine formulations. In the wider context of the experimental discovery of vaccine antigens, with particular reference to reverse vaccinology, this chapter adumbrates the principal computational approaches currently deployed in the hunt for novel antigens: genome-level prediction of antigens, antigen identification through the use of protein sequence alignment-based approaches, antigen detection through the use of subcellular location prediction, and the use of alignment-independent approaches to antigen discovery. Reference is also made to the recent emergence of various expert systems for protein antigen identification.
Affiliation(s)
- Darren R. Flower
- Aston Pharmacy School, School of Life and Health Sciences, University of Aston, Aston Triangle, Birmingham, B4 7ET United Kingdom
- Yvonne Perrie
- Aston Pharmacy School, School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham, B4 7ET United Kingdom
|
34
|
Cobanoglu MC, Saygin Y, Sezerman U. Classification of GPCRs using family specific motifs. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1495-1508. [PMID: 20876934 DOI: 10.1109/tcbb.2010.101] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
The classification of G-Protein Coupled Receptor (GPCR) sequences is an important problem that arises from the need to close the gap between the large number of orphan receptors and the relatively small number of annotated receptors. Equally important is the characterization of GPCR Class A subfamilies and gaining insight into the ligand interaction since GPCR Class A encompasses a very large number of drug-targeted receptors. In this work, we propose a method for Class A subfamily classification using sequence-derived motifs which characterizes the subfamilies by discovering receptor-ligand interaction sites. The motifs that best characterize a subfamily are selected by the Distinguishing Power Evaluation (DPE) technique we propose. The experiments performed on GPCR sequence databases show that our method outperforms state-of-the-art classification techniques for GPCR Class A subfamily prediction. An important contribution of our work is to discover key receptor-ligand interaction sites which is very important for drug design.
|
35
|
ur-Rehman Z, Khan A. G-protein-coupled receptor prediction using pseudo-amino-acid composition and multiscale energy representation of different physiochemical properties. Anal Biochem 2011; 412:173-82. [DOI: 10.1016/j.ab.2011.01.040] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 10/31/2010] [Revised: 01/26/2011] [Accepted: 01/27/2011] [Indexed: 11/28/2022]
|
36
|
Naveed M, Khan AU. GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble. Amino Acids 2011; 42:1809-23. [DOI: 10.1007/s00726-011-0902-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 09/03/2010] [Accepted: 03/26/2011] [Indexed: 11/27/2022]
|
37
|
Berger C, Montag C, Berndt S, Huster D. Optimization of Escherichia coli cultivation methods for high yield neuropeptide Y receptor type 2 production. Protein Expr Purif 2011; 76:25-35. [DOI: 10.1016/j.pep.2010.10.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/09/2010] [Revised: 10/20/2010] [Accepted: 10/27/2010] [Indexed: 12/11/2022]
|
38
|
Sarkar A, Kumar S, Sundar D. The G protein-coupled receptors in the pufferfish Takifugu rubripes. BMC Bioinformatics 2011; 12 Suppl 1:S3. [PMID: 21342560 PMCID: PMC3044285 DOI: 10.1186/1471-2105-12-s1-s3] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Guanine protein-coupled receptors (GPCRs) constitute a eukaryotic transmembrane protein family and function as "molecular switches" in the second messenger cascades and are found in all organisms between yeast and humans. They form the single, biggest drug-target family due to their versatility of action and their role in several physiological functions, being active players in detecting the presence of light, a variety of smells and tastes, amino acids, nucleotides, lipids, chemicals etc. in the environment of the cell. Comparative genomic studies on model organisms provide information on target receptors in humans and their function. The Japanese teleost Fugu has been identified as one of the smallest vertebrate genomes and a compact model to study the human genome, owing to the great similarity in its gene repertoire with that of human and other vertebrates. Thus the characterization of the GPCRs of Fugu would provide insights to the evolution of the vertebrate genome. RESULTS We classified the GPCRs in the Fugu genome and our analysis of its 316 membrane-bound receptors, available on the public databases as well as from literature, detected 298 GPCRs that were grouped into five main families according to the GRAFS classification system (namely, Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin). We also identified 18 other GPCRs that could not be grouped under the GRAFS family and hence were classified as 'Other 7TM' receptors. On comparison of the GPCR information from the Fugu genome with those in the human and chicken genomes, we detected 96.83% (306/316) and 96.51% (305/316) orthology in GPCRs among the Fugu-human genomes and Fugu-chicken genomes, respectively. CONCLUSIONS This study reveals the position of pisces in vertebrate evolution from the GPCR perspective. Fugu can act as a reference model for the human genome for other protein families as well, going by the high orthology observed for GPCRs between Fugu and human. 
The evolutionary comparison of GPCR sequences between key vertebrate classes of mammals, birds and fish will help in identifying key functional residues and motifs so as to fill in the blanks in the evolution of GPCRs in vertebrates.
Affiliation(s)
- Anita Sarkar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi, India
|
39
|
Donnellan PD, Kimbembe CC, Reid HM, Kinsella BT. Identification of a novel endoplasmic reticulum export motif within the eighth α-helical domain (α-H8) of the human prostacyclin receptor. BIOCHIMICA ET BIOPHYSICA ACTA-BIOMEMBRANES 2011; 1808:1202-18. [PMID: 21223948 DOI: 10.1016/j.bbamem.2011.01.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/09/2010] [Revised: 12/20/2010] [Accepted: 01/03/2011] [Indexed: 01/20/2023]
Abstract
The human prostacyclin receptor (hIP) undergoes agonist-dependent trafficking involving a direct interaction with Rab11a GTPase. The region of interaction was localised to a 14 residue Rab11a binding domain (RBD) within the proximal carboxyl-terminal (C)-tail domain of the hIP, consisting of Val(299)-Val(307) within the eighth helical domain (α-H8) adjacent to the palmitoylated residues at Cys(308)-Cys(311). However, the factors determining the anterograde transport of the newly synthesised hIP from the endoplasmic reticulum (ER) to the plasma membrane (PM) have not been identified. The aim of the current study was to identify the major ER export motif(s) within the hIP initially by investigating the role of Lys residues in its maturation and processing. Through site-directed and Ala-scanning mutational studies in combination with analyses of protein expression and maturation, functional analyses of ligand binding, agonist-induced intracellular signalling and confocal image analyses, it was determined that Lys(297), Arg(302) and Lys(304) located within α-H8 represent the critical determinants of a novel ER export motif of the hIP. Furthermore, while substitution of those critical residues significantly impaired maturation and processing of the hIP, replacement of the positively charged Lys with Arg residues, and vice versa, was functionally permissible. Hence, this study has identified a novel 8 residue ER export motif within the functionally important α-H8 of the hIP. This ER export motif, defined by "K/R(X)(4)K/R(X)K/R," has a strict requirement for positively charged, basic Lys/Arg residues at the 1st, 6th and 8th positions and appears to be evolutionarily conserved within IP sequences from mouse to man.
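The motif quoted in this abstract, "K/R(X)(4)K/R(X)K/R", can be read as an eight-residue pattern with Lys or Arg at positions 1, 6 and 8 and any residue elsewhere. A minimal sketch of a scan for that pattern follows; the regular expression is one reading of the notation, the function name `find_er_export_motif` is hypothetical, and the example peptide is invented for illustration (it is not the hIP sequence).

```python
import re

# One reading of "K/R(X)(4)K/R(X)K/R": K or R at positions 1, 6 and 8,
# any residue (X) at positions 2-5 and 7. The lookahead allows
# overlapping matches to be reported.
ER_EXPORT_MOTIF = re.compile(r"(?=([KR].{4}[KR].[KR]))")


def find_er_export_motif(sequence):
    """Return (1-based start position, matched 8-mer) for each match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in ER_EXPORT_MOTIF.finditer(seq)]


# Invented peptide for illustration only; not taken from any receptor.
example_peptide = "AAKLLLLRLKAA"
print(find_er_export_motif(example_peptide))
```

A hit from such a scan only says that a sequence is compatible with the pattern; whether the residues act as an export signal is an experimental question, as in the mutational work summarized above.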
MESH Headings
- Amino Acid Motifs
- Amino Acid Sequence
- Arginine/chemistry
- Arginine/genetics
- Arginine/metabolism
- Binding Sites
- Blotting, Western
- Calcium/metabolism
- Calnexin/metabolism
- Computational Biology
- Endoplasmic Reticulum/metabolism
- HEK293 Cells
- Humans
- Lysine/chemistry
- Lysine/genetics
- Lysine/metabolism
- Microscopy, Confocal
- Models, Molecular
- Molecular Sequence Data
- Mutagenesis, Site-Directed
- Mutation
- Protein Binding
- Protein Structure, Secondary
- Protein Structure, Tertiary
- Protein Transport
- Radioligand Assay
- Receptors, Epoprostenol/chemistry
- Receptors, Epoprostenol/genetics
- Receptors, Epoprostenol/metabolism
- Sequence Homology, Amino Acid
Affiliation(s)
- Peter D Donnellan
- School of Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Ireland
|
40
|
Atwood BK, Lopez J, Wager-Miller J, Mackie K, Straiker A. Expression of G protein-coupled receptors and related proteins in HEK293, AtT20, BV2, and N18 cell lines as revealed by microarray analysis. BMC Genomics 2011; 12:14. [PMID: 21214938 PMCID: PMC3024950 DOI: 10.1186/1471-2164-12-14] [Citation(s) in RCA: 293] [Impact Index Per Article: 22.5] [Received: 04/16/2010] [Accepted: 01/07/2011] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND G protein coupled receptors (GPCRs) are one of the most widely studied gene superfamilies. Thousands of GPCR research studies have utilized heterologous expression systems such as human embryonic kidney cells (HEK293). Though often treated as 'blank slates', these cell lines nevertheless endogenously express GPCRs and related signaling proteins. The outcome of a given GPCR study can be profoundly influenced by this largely unknown complement of receptors and/or signaling proteins. Little easily accessible information exists that describes the expression profiles of the GPCRs in cell lines. What is accessible is often limited in scope - of the hundreds of GPCRs and related proteins, one is unlikely to find information on expression of more than a dozen proteins in a given cell line. Microarray technology has allowed rapid analysis of mRNA levels of thousands of candidate genes, but though often publicly available, the results can be difficult to efficiently access or even to interpret. RESULTS To bridge this gap, we have used microarrays to measure the mRNA levels of a comprehensive profile of non-chemosensory GPCRs and over a hundred GPCR signaling related gene products in four cell lines frequently used for GPCR research: HEK293, AtT20, BV2, and N18. CONCLUSIONS This study provides researchers an easily accessible mRNA profile of the endogenous signaling repertoire that these four cell lines possess. This will assist in choosing the most appropriate cell line for studying GPCRs and related signaling proteins. It also provides a better understanding of the potential interactions between GPCRs and those signaling proteins.
Affiliation(s)
- Brady K Atwood
- Department of Psychological & Brain Sciences, The Gill Center for Biomolecular Science, Indiana University, Bloomington, Indiana, USA
|
41
|
Schaadt NS, Christoph J, Helms V. Classifying Substrate Specificities of Membrane Transporters from Arabidopsis thaliana. J Chem Inf Model 2010; 50:1899-905. [DOI: 10.1021/ci100243m] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Nadine S. Schaadt
- Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
- Jan Christoph
- Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
- Volkhard Helms
- Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
|
42
|
Peng ZL, Yang JY, Chen X. An improved classification of G-protein-coupled receptors using sequence-derived features. BMC Bioinformatics 2010; 11:420. [PMID: 20696050 PMCID: PMC3247138 DOI: 10.1186/1471-2105-11-420] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 03/01/2010] [Accepted: 08/09/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, it is therefore very valuable to develop a computational method to accurately predict GPCRs from the protein primary sequences. RESULTS We propose a new method called PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. The principal component analysis is first employed to reduce the dimension of the feature space to 32. Then, the resulting 32-dimensional feature vectors are fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The prediction at the first level determines whether a protein is a GPCR or a non-GPCR. If it is predicted to be a GPCR, then it will be further assigned to a certain family, subfamily, sub-subfamily and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers applied at five levels, a non-redundant dataset is carefully constructed, which contains 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at five levels (from the first to the fifth) can achieve up to 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further perform predictions on a dataset of 1238 GPCRs at the second level, and on another two datasets of 167 and 566 GPCRs respectively at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods compared.
CONCLUSIONS The comprehensive set of 1497 features is believed to be capable of capturing information about amino acid composition, sequence order as well as various physicochemical properties of proteins. Therefore, high accuracies are achieved when predicting GPCRs at all the five levels with our proposed method.
Affiliation(s)
- Zhen-Ling Peng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, T6G 2V4, Canada
|
43
|
Li Z, Zhou X, Dai Z, Zou X. Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm. BMC Bioinformatics 2010; 11:325. [PMID: 20550715 PMCID: PMC2905366 DOI: 10.1186/1471-2105-11-325] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 12/18/2009] [Accepted: 06/16/2010] [Indexed: 11/25/2022] Open
Abstract
Background Because a priori knowledge about the function of G protein-coupled receptors (GPCRs) can provide useful information to pharmaceutical research, the determination of their function is a meaningful topic in protein science. However, with the rapid increase of GPCR sequences entering databanks, the gap between the number of known sequences and the number of known functions is widening rapidly, and it is both time-consuming and expensive to determine their function based only on experimental techniques. Therefore, it is vitally important to develop a computational method for quick and accurate classification of GPCRs. Results In this study, a novel three-layer predictor based on support vector machine (SVM) and feature selection is developed for predicting and classifying GPCRs directly from amino acid sequence data. The maximum relevance minimum redundancy (mRMR) method is applied to pre-evaluate features with discriminative information, while a genetic algorithm (GA) is utilized to find the optimized feature subsets. SVM is used for the construction of classification models. The overall accuracies of the three-layer predictor at the levels of superfamily, family and subfamily are obtained by cross-validation tests on two non-redundant datasets. The results are about 0.5% to 16% higher than those of GPCR-CA and GPCRPred. Conclusion The results with high success rates indicate that the proposed predictor is a useful automated tool in predicting GPCRs. GPCR-SVMFS, a corresponding executable program for GPCR prediction and classification, can be acquired freely on request from the authors.
Affiliation(s)
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, PR China
|
44
|
Identification of protein functions using a machine-learning approach based on sequence-derived properties. Proteome Sci 2009; 7:27. [PMID: 19664241 PMCID: PMC2731080 DOI: 10.1186/1477-5956-7-27] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 03/13/2009] [Accepted: 08/09/2009] [Indexed: 02/07/2023] Open
Abstract
Background Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. Results A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. Conclusion We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. 
Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.
|
45
|
Huynh J, Thomas WG, Aguilar MI, Pattenden LK. Role of helix 8 in G protein-coupled receptors based on structure-function studies on the type 1 angiotensin receptor. Mol Cell Endocrinol 2009; 302:118-27. [PMID: 19418628 DOI: 10.1016/j.mce.2009.01.002] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 10/21/2022]
Abstract
G protein-coupled receptors (GPCRs) are transmembrane receptors that convert extracellular stimuli to intracellular signals. The type 1 angiotensin II receptor is a widely studied GPCR with roles in blood pressure regulation, water and salt balance and cell growth. The complex molecular and structural changes that underpin receptor activation and signaling are the focus of intense research. Increasingly, there is an appreciation that the plasma membrane participates in receptor function via direct, physical interactions that reciprocally modulate both lipid and receptor and provide microdomains for specialized activities. Reversible protein:lipid interactions are commonly mediated by amphipathic α-helices in proteins and one such motif - a short helix, referred to as helix VIII/8 (H8), located at the start of the carboxyl (C)-terminus of GPCRs - is gaining recognition for its importance to GPCR function. Here, we review the identification of H8 in GPCRs and examine its capacity to sense and interact with diverse proteins and lipid environments, most notably with acidic lipids that include phosphatidylinositol phosphates.
MESH Headings
- Binding Sites
- Humans
- Lipids/chemistry
- Protein Binding
- Protein Structure, Secondary
- Receptor, Angiotensin, Type 1/chemistry
- Receptor, Angiotensin, Type 1/metabolism
- Receptor, Angiotensin, Type 1/physiology
- Receptors, G-Protein-Coupled/chemistry
- Receptors, G-Protein-Coupled/metabolism
- Receptors, G-Protein-Coupled/physiology
- Signal Transduction
Affiliation(s)
- John Huynh
- School of Biomedical Sciences, The University of Queensland, Brisbane, St Lucia, Queensland, Australia
|
46
|
Davies MN, Secker A, Halling-Brown M, Moss DS, Freitas AA, Timmis J, Clark E, Flower DR. GPCRTree: online hierarchical classification of GPCR function. BMC Res Notes 2008; 1:67. [PMID: 18717986 PMCID: PMC2547103 DOI: 10.1186/1756-0500-1-67] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 08/08/2008] [Accepted: 08/21/2008] [Indexed: 11/25/2022] Open
Abstract
Background G protein-coupled receptors (GPCRs) play important physiological roles transducing extracellular signals into intracellular responses. Approximately 50% of all marketed drugs target a GPCR. There remains considerable interest in effectively predicting the function of a GPCR from its primary sequence. Findings Using techniques drawn from data mining and proteochemometrics, an alignment-free approach to GPCR classification has been devised. It uses a simple representation of a protein's physical properties. GPCRTree, a publicly-available internet server, implements an algorithm that classifies GPCRs at the class, sub-family and sub-subfamily level. Conclusion A selective top-down classifier was developed which assigns sequences within a GPCR hierarchy. Compared to other publicly available GPCR prediction servers, GPCRTree is considerably more accurate at every level of classification. The server has been available online since March 2008 at URL: .
Affiliation(s)
- Matthew N Davies
- The Jenner Institute, University of Oxford, Compton, Newbury, Berkshire, RG20 7NN, UK.
|
47
|
|