1
Zhang ZM, Guan ZX, Wang F, Zhang D, Ding H. Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families. Med Chem 2021; 16:594-604. [PMID: 31584374 DOI: 10.2174/1573406415666191004125551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/20/2019] [Revised: 06/18/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely involved in cell development, differentiation, reproduction, homeostasis, and metabolism. Based on alignments of their conserved domains, NRs are classified into seven or eight subfamilies: (1) NR1: thyroid hormone like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1 like (fushi tarazu-F1 like), (6) NR6: germ cell nuclear factor like (germ cell nuclear factor), and (7) NR0: knirps like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX like (DAX, SHP); alternatively, NR0 is divided into (7) NR7: knirps like and (8) NR8: DAX like. Different NR families have different structural features and functions. Because the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. This knowledge is essential for a proper understanding of normal and abnormal cellular mechanisms. In the post-genomic era, the number of proteins with known sequences has grown explosively, while conventional experimental methods for classifying NR families are costly and inefficient. This has created a growing need for bioinformatics tools that can effectively recognize NRs and their subfamilies in order to understand their biological functions. In this review, we summarize the application of machine learning methods to NR prediction from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.
Affiliation(s)
- Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
2
Kabir M, Ahmad S, Iqbal M, Hayat M. iNR-2L: A two-level sequence-based predictor developed via Chou's 5-steps rule and general PseAAC for identifying nuclear receptors and their families. Genomics 2019; 112:276-285. [PMID: 30779939 DOI: 10.1016/j.ygeno.2019.02.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/02/2018] [Revised: 01/09/2019] [Accepted: 02/07/2019] [Indexed: 12/25/2022]
Abstract
Nuclear receptor proteins (NRPs) play a vital role in regulating gene expression. With the rapid growth of NRPs in the post-genomic era, it is highly desirable to identify NRPs and their sub-families accurately from their primary sequences. Several conventional methods have been used to discriminate NRPs and their sub-families, but they did not achieve satisfactory results. Therefore, a new two-level computational model, "iNR-2L", was developed. Two discrete methods, namely dipeptide composition and tripeptide composition, were used to formulate NRP sequences. Both descriptor spaces were then merged to construct a hybrid space. Furthermore, the minimum redundancy maximum relevance (mRMR) feature selection technique was employed to select salient features and to reduce noise and redundancy. The experimental outcomes showed that the proposed model iNR-2L achieved outstanding results. It is anticipated that the proposed computational model may be a practical and effective tool for academia and the research community.
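The dipeptide and tripeptide encodings and the merged hybrid space described in the iNR-2L abstract can be sketched roughly as follows. This is a minimal Python illustration, not the authors' code: it assumes each composition is normalized by the number of valid windows and skips windows containing non-standard residues. The function names are hypothetical, and the mRMR selection step is not reproduced.

```python
from itertools import product

# Standard 20 amino acids in a fixed order; windows with other letters are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequencies of all 20**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # ignore windows containing non-standard residues
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]

def hybrid_features(seq: str) -> list[float]:
    """Concatenate dipeptide (400-D) and tripeptide (8000-D) compositions."""
    seq = seq.upper()
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)
```

The resulting 8,400-dimensional vector is very sparse for typical protein lengths, which is one reason a feature-selection step such as mRMR is applied before classification.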
Affiliation(s)
- Muhammad Kabir
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
- Saeed Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
3
Ikeda M, Sugihara M, Suwa M. SEVENS: a database for comprehensive GPCR genes obtained from genomes: -Update to 68 eukaryotes. Biophys Physicobiol 2018; 15:104-110. [PMID: 29892516 PMCID: PMC5992857 DOI: 10.2142/biophysico.15.0_104] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/16/2017] [Accepted: 03/25/2018] [Indexed: 12/01/2022]
Abstract
We report the development of the SEVENS database, which contains information on G-protein coupled receptor (GPCR) genes identified with high confidence levels (A, B, C, and D) from various eukaryotic genomes using a pipeline of bioinformatics software that includes a gene finder, a sequence alignment tool, a motif and domain assignment tool, and a transmembrane helix predictor. SEVENS compiles detailed information on GPCR genes, such as chromosomal mapping position, phylogenetic tree, sequence similarity to known genes, and protein function described by motifs/domains and transmembrane helices. This information is presented in a user-friendly interface. Because its gene findings are comprehensive across genomes, SEVENS contains a larger data set than previous databases and enables a genome-scale overview of all GPCR genes. We surveyed the complete genomes of 68 eukaryotes and found between 6 and 3,470 GPCR genes per genome (Level A data). Among these genes, the numbers of receptors for various molecules, including biological amines, peptides, and lipids, were conserved in mammals, birds, and fishes, whereas the numbers of odorant receptors and pheromone receptors were highly diverse in mammals. SEVENS is freely available at http://sevens.cbrc.jp or http://sevens.chem.aoyama.ac.jp.
Affiliation(s)
- Masami Ikeda
- Aoyama Gakuin University, College of Science and Engineering, Sagamihara, Kanagawa 252-5258, Japan
- Minoru Sugihara
- Meiji Pharmaceutical University, Pharmaceutical Education and Research Center, Kiyose, Tokyo 204-8588, Japan
- Makiko Suwa
- Aoyama Gakuin University, College of Science and Engineering, Sagamihara, Kanagawa 252-5258, Japan
4
Li YH, Xu JY, Tao L, Li XF, Li S, Zeng X, Chen SY, Zhang P, Qin C, Zhang C, Chen Z, Zhu F, Chen YZ. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity. PLoS One 2016; 11:e0155290. [PMID: 27525735 PMCID: PMC4985167 DOI: 10.1371/journal.pone.0155290] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 03/21/2016] [Accepted: 04/27/2016] [Indexed: 12/20/2022]
Abstract
Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employs a machine learning method to predict protein functional families from protein sequences irrespective of similarity, complementing similarity-based and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins with different functions. Since its publication in 2003, we have made major improvements to SVM-Prot: (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for classifying multiple proteins. Moreover, two additional machine learning approaches, k-nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
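The k-nearest neighbor approach mentioned in the SVM-Prot abstract can be illustrated with a plain majority-vote classifier over numeric descriptor vectors. This is a generic sketch, not SVM-Prot's implementation: it assumes Euclidean distance, and the function name and the family labels in the usage note are hypothetical placeholders.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query descriptor by majority vote among its k nearest neighbours."""
    # Sort training samples by Euclidean distance to the query (stable for ties).
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # most_common(1) returns [(label, count)] for the top-voted label.
    return votes.most_common(1)[0][0]
```

For example, with training vectors `[[0, 0], [0, 1], [1, 0], [9, 9], [9, 8]]` labelled `["enzyme", "enzyme", "enzyme", "receptor", "receptor"]`, the query `[0.5, 0.5]` is assigned `"enzyme"`. In practice the vectors would be sequence-derived descriptors rather than 2-D points.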
Affiliation(s)
- Ying Hong Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Jing Yu Xu
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Lin Tao
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Xiao Feng Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Shuang Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Xian Zeng
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Shang Ying Chen
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Peng Zhang
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Chu Qin
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Cheng Zhang
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Zhe Chen
- Zhejiang Key Laboratory of Gastro-intestinal Pathophysiology, Zhejiang Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, Hangzhou, P. R. China
- Feng Zhu
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Yu Zong Chen
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
5
Wang H, Hu X. Accurate prediction of nuclear receptors with conjoint triad feature. BMC Bioinformatics 2015; 16:402. [PMID: 26630876 PMCID: PMC4668603 DOI: 10.1186/s12859-015-0828-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/27/2015] [Accepted: 11/17/2015] [Indexed: 11/26/2022]
Abstract
Background: Nuclear receptors (NRs) form a large family of ligand-inducible transcription factors that regulate gene expression involved in numerous physiological phenomena, such as embryogenesis, homeostasis, cell growth and death. NR-related pathways are important targets of marketed drugs. Therefore, designing a reliable computational model for predicting NRs from amino acid sequences has become a significant biomedical problem.
Results: The conjoint triad feature (CTF) mainly considers neighbor relationships in protein sequences by encoding each sequence with the frequency distribution of triads (three consecutive amino acids) over a 7-letter reduced alphabet. In addition, chaos game representation (CGR) can investigate the patterns hidden in protein sequences and visually reveal previously unknown structure. In this paper, three methods, CTF, CGR, and amino acid composition (AAC), are applied to formulate the protein samples. By considering different combinations of the three methods, we study seven groups of features, and each group is evaluated by 10-fold cross-validation. Meanwhile, a new non-redundant dataset containing 474 NR sequences and 500 non-NR sequences is built based on the latest NucleaRDB database. In the numerical experiments, the group combining CTF and AAC gives the best result, with an accuracy of 96.30 % for identifying NRs from non-NRs. A protein classified as an NR is then passed to the second level, which classifies it into one of the eight main subfamilies. At the second level, the combined CTF and AAC features also give the best accuracy, 94.73 %. Subsequently, the proposed predictor is compared with two existing methods; with the introduction of our CTF-based method, the accuracies significantly increase to 98.79 % at the first level (NR-2L: 92.56 %; iNR-PhysChem: 98.18 %) and 93.71 % at the second level (NR-2L: 88.68 %; iNR-PhysChem: 92.45 %). Finally, each component of the CTF features is analyzed via a statistical significance test, and a simplified model using only the resulting top-50 significant features achieves an accuracy of 95.28 %.
Conclusions: The experimental results demonstrate that our CTF-based method is an effective way to predict nuclear receptor proteins. Furthermore, based on the analysis of relative importance, the top-50 significant features obtained from the statistical significance test are considered the “intrinsic features” for predicting NRs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0828-1) contains supplementary material, which is available to authorized users.
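The CTF encoding and the CTF + AAC combination described in the Wang and Hu abstract can be sketched as follows. This sketch assumes the widely used 7-class grouping of amino acids by dipole and side-chain volume (as in Shen et al., 2007); the abstract does not list the partition, so it may differ in the paper. It also uses plain frequency normalization rather than any min-max scaling, and the function names are hypothetical.

```python
# Assumed 7-class reduced alphabet (classic conjoint-triad grouping).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def conjoint_triad(seq: str) -> list[float]:
    """343-D frequency vector over triads of consecutive residue classes (7**3)."""
    classes = [CLASS_OF.get(aa) for aa in seq.upper()]  # None for non-standard residues
    counts = [0] * 343
    total = 0
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if None not in (a, b, c):
            counts[a * 49 + b * 7 + c] += 1  # base-7 index of the class triad
            total += 1
    return [n / total if total else 0.0 for n in counts]

def amino_acid_composition(seq: str) -> list[float]:
    """20-D frequency vector of the standard amino acids."""
    valid = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    return [valid.count(aa) / len(valid) if valid else 0.0 for aa in AMINO_ACIDS]

def ctf_aac(seq: str) -> list[float]:
    """Combined 363-D feature vector (CTF followed by AAC)."""
    return conjoint_triad(seq) + amino_acid_composition(seq)
```

Collapsing 20 residues into 7 classes shrinks the triad space from 8,000 to 343 dimensions. That reduction is the main motivation for the reduced alphabet.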
Affiliation(s)
- Hongchu Wang
- Department of Mathematics, South China Normal University, Guangzhou, 510631, P.R. of China
- Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, P.R. of China.
6
Isberg V, Mordalski S, Munk C, Rataj K, Harpsøe K, Hauser AS, Vroling B, Bojarski AJ, Vriend G, Gloriam DE. GPCRdb: an information system for G protein-coupled receptors. Nucleic Acids Res 2015; 44:D356-64. [PMID: 26582914 PMCID: PMC4702843 DOI: 10.1093/nar/gkv1178] [Citation(s) in RCA: 208] [Impact Index Per Article: 23.1] [Received: 09/30/2015] [Accepted: 10/22/2015] [Indexed: 12/30/2022]
Abstract
Recent developments in G protein-coupled receptor (GPCR) structural biology and pharmacology have greatly enhanced our knowledge of receptor structure-function relations, and have helped improve the scientific foundation for drug design studies. The GPCR database, GPCRdb, serves a dual role in disseminating and enabling new scientific developments by providing reference data, analysis tools and interactive diagrams. This paper highlights new features in the fifth major GPCRdb release: (i) GPCR crystal structure browsing, superposition and display of ligand interactions; (ii) direct deposition by users of point mutations and their effects on ligand binding; (iii) refined snake and helix box residue diagrams; and (iv) phylogenetic trees with receptor classification colour schemes. Under the hood, the entire GPCRdb front- and back-ends have been re-coded within one infrastructure, ensuring a smooth browsing experience and development. GPCRdb is available at http://www.gpcrdb.org/ and its open-source code at https://bitbucket.org/gpcr/protwis.
Affiliation(s)
- Vignir Isberg
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 162, DK-2100 Copenhagen, Denmark
- Stefan Mordalski
- Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Krakow, Poland
- Christian Munk
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 162, DK-2100 Copenhagen, Denmark
- Krzysztof Rataj
- Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Krakow, Poland
- Kasper Harpsøe
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 162, DK-2100 Copenhagen, Denmark
- Alexander S Hauser
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 162, DK-2100 Copenhagen, Denmark
- Bas Vroling
- Bio-Prodict B.V., Nieuwe Markstraat 54E, 6511 AA, Nijmegen, The Netherlands
- Andrzej J Bojarski
- Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Smetna 12, 31-343 Krakow, Poland
- Gert Vriend
- CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Geert Grooteplein Zuid 26-28, 6525 GA, Nijmegen, The Netherlands
- David E Gloriam
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 162, DK-2100 Copenhagen, Denmark
7
Kumar R, Kumari B, Srivastava A, Kumar M. NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families. Sci Rep 2014; 4:6810. [PMID: 25351274 PMCID: PMC5381360 DOI: 10.1038/srep06810] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 06/26/2014] [Accepted: 10/09/2014] [Indexed: 11/09/2022]
Abstract
Nuclear receptor proteins (NRPs) are transcription factors that regulate many vital cellular processes in animal cells. NRPs form a superfamily of phylogenetically related proteins and are divided into sub-families on the basis of their ligand characteristics and functions. In the post-genomic era, when new proteins are being added to databases in a high-throughput mode, it becomes imperative to identify new NRPs using information from the amino acid sequence alone. In this study we report an SVM-based two-level prediction system, NRfamPred, that uses the dipeptide composition of proteins as input. At the first level, NRfamPred screens whether the query protein is an NRP or a non-NRP; if the query protein belongs to the NRP class, prediction moves to the second level, which predicts its sub-family. Using leave-one-out cross-validation, we achieved an overall accuracy of 97.88% at the first level and 98.11% at the second level with dipeptide composition. Benchmarking on independent datasets showed that NRfamPred had accuracy comparable to that of other existing methods developed on the same dataset. Our method predicted the existence of 76 NRPs in the human proteome, of which 14 are novel NRPs. NRfamPred also predicted the sub-families of these 14 NRPs.
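The two-level cascade and the leave-one-out evaluation described in the NRfamPred abstract can be sketched in a few lines. To keep the sketch dependency-free, it swaps in a nearest-centroid classifier as a placeholder for NRfamPred's SVMs. This is explicitly not the published model. All function names and the `"NR"`/`"non-NR"` labels are hypothetical, and real inputs would be dipeptide-composition vectors.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], str]

def nearest_centroid(X: List[Vector], y: List[str]) -> Classifier:
    """Placeholder trainer (NOT an SVM): predicts the label of the closest class mean."""
    centroids = {}
    for label in sorted(set(y)):  # sorted so ties break deterministically
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

def two_level_predict(x: Vector, level1: Classifier, level2: Classifier) -> str:
    """Level 1 screens NR vs non-NR; only predicted NRs reach the sub-family model."""
    return level2(x) if level1(x) == "NR" else "non-NR"

def leave_one_out_accuracy(X: List[Vector], y: List[str], train) -> float:
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += model(X[i]) == y[i]
    return correct / len(X)
```

Replacing `nearest_centroid` with an SVM trainer would keep the cascade and the evaluation loop unchanged. The second-level model is trained only on NR sequences, so a non-NR query never reaches it.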
Affiliation(s)
- Ravindra Kumar
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
- Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
- Abhishikha Srivastava
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
- Manish Kumar
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
8
Zhao F, Guo X, Wang Y, Liu J, Lee WH, Zhang Y. Drug target mining and analysis of the Chinese tree shrew for pharmacological testing. PLoS One 2014; 9:e104191. [PMID: 25105297 PMCID: PMC4126716 DOI: 10.1371/journal.pone.0104191] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 01/10/2014] [Accepted: 07/10/2014] [Indexed: 01/05/2023]
Abstract
The discovery of new drugs requires the development of improved animal models for drug testing. The Chinese tree shrew is considered a realistic candidate model. To assess the potential of the Chinese tree shrew for pharmacological testing, we performed drug target prediction and analysis on genomic and transcriptomic scales. Using our pipeline, 3,482 proteins were predicted to be drug targets. Of these predicted targets, 446 and 1,049 proteins with the highest rank and total scores, respectively, included homologs of targets for cancer chemotherapy, depression, age-related decline and cardiovascular disease. Comparative analyses showed that more than half of the drug target proteins identified from the tree shrew genome were more similar to human targets than their mouse counterparts were. Target validation also demonstrated that the constitutive expression of the proteinase-activated receptors of tree shrew platelets is similar to that of human platelets but differs from that of mouse platelets. We developed an effective pipeline and search strategy for drug target prediction and for evaluating model-based target identification for drug testing. This work provides useful information for future studies of the Chinese tree shrew as a source of novel targets for drug discovery research.
Affiliation(s)
- Feng Zhao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Xiaolong Guo
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Yanjie Wang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Jie Liu
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Wen-hui Lee
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, PR China
- Yun Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, PR China
9
Bioinformatics tools for predicting GPCR gene functions. Adv Exp Med Biol 2014; 796:205-24. [PMID: 24158807 DOI: 10.1007/978-94-007-7423-0_10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/12/2023]
Abstract
The automatic classification of GPCRs by bioinformatics methodology can provide functional information for new GPCRs across the whole 'GPCR proteome', and this information is important for the development of novel drugs. Because the GPCR proteome is classified hierarchically, GPCR function prediction is generally based on hierarchical classification. Various computational tools have been developed to predict GPCR functions; rather than simple sequence searches, these tools use more powerful methods trained on learning datasets, such as alignment-free methods, statistical model methods, and machine learning methods used in protein sequence analysis. The first stage of hierarchical function prediction discriminates GPCRs from non-GPCRs, and the second stage classifies the predicted GPCR candidates into family, subfamily, and sub-subfamily levels. Further classification is then performed according to protein-protein interaction type: binding G-protein type, oligomerization partner type, etc. These methods have achieved predictive accuracies of around 90 %. Finally, I describe future research directions for bioinformatics techniques in GPCR function prediction.
10
Isberg V, Vroling B, van der Kant R, Li K, Vriend G, Gloriam D. GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 2013; 42:D422-5. [PMID: 24304901 PMCID: PMC3965068 DOI: 10.1093/nar/gkt1255] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 11/25/2022]
Abstract
For the past 20 years, the GPCRDB (G protein-coupled receptors database; http://www.gpcr.org/7tm/) has been a ‘one-stop shop’ for G protein-coupled receptor (GPCR)-related data. The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data, such as multiple sequence alignments and homology models. The GPCRDB also provides visualization and analysis tools, plus a number of query systems. In the latest GPCRDB release, all multiple sequence alignments, and >65 000 homology models, have been significantly improved, thanks to a recent flurry of GPCR X-ray structure data. Tools were introduced to browse X-ray structures, compare binding sites, profile similar receptors and generate amino acid conservation statistics. Snake plots and helix box diagrams can now be custom coloured (e.g. by chemical properties or mutation data) and saved as figures. A series of sequence alignment visualization tools has been added, and sequence alignments can now be created for subsets of sequences and sequence positions, and alignment statistics can be produced for any of these subsets.
Affiliation(s)
- Vignir Isberg
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100 Copenhagen, Denmark, Bio-Prodict B.V., Castellastraat 116, 6512 EZ, Nijmegen, The Netherlands and CMBI, NCMLS, Radboudumc Nijmegen Medical Centre, Geert Grooteplein Zuid 26-28, 6525 GA, Nijmegen, The Netherlands
11
Costa EP, Vens C, Blockeel H. Top-down clustering for protein subfamily identification. Evol Bioinform Online 2013; 9:185-202. [PMID: 23700359 PMCID: PMC3653887 DOI: 10.4137/ebo.s11609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree using a multiple alignment of the protein sequences as input, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and therefore more accurate subfamily identification, while also indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis. The novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
12
Elfferich P, van Royen M, van de Wijngaart D, Trapman J, Drop S, van den Akker E, Lusher S, Bosch R, Bunch T, Hughes I, Houtsmuller A, Cools M, Faradz S, Bisschop P, Bunck M, Oostdijk W, Brüggenwirth H, Brinkmann A. Variable Loss of Functional Activities of Androgen Receptor Mutants in Patients with Androgen Insensitivity Syndrome. Sex Dev 2013; 7:223-34. [DOI: 10.1159/000351820] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Accepted: 02/21/2013] [Indexed: 01/05/2023]
13
Xiao X, Wang P, Chou KC. iNR-PhysChem: a sequence-based predictor for identifying nuclear receptors and their subfamilies via physical-chemical property matrix. PLoS One 2012; 7:e30869. [PMID: 22363503 PMCID: PMC3283608 DOI: 10.1371/journal.pone.0030869] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 11/11/2011] [Accepted: 12/22/2011] [Indexed: 11/30/2022]
Abstract
Nuclear receptors (NRs) form a family of ligand-activated transcription factors that regulate a wide variety of biological processes, such as homeostasis, reproduction, development, and metabolism. Human genome contains 48 genes encoding NRs. These receptors have become one of the most important targets for therapeutic drug development. According to their different action mechanisms or functions, NRs have been classified into seven subfamilies. With the avalanche of protein sequences generated in the postgenomic age, we are facing the following challenging problems. Given an uncharacterized protein sequence, how can we identify whether it is a nuclear receptor? If it is, what subfamily it belongs to? To address these problems, we developed a predictor called iNR-PhysChem in which the protein samples were expressed by a novel mode of pseudo amino acid composition (PseAAC) whose components were derived from a physical-chemical matrix via a series of auto-covariance and cross-covariance transformations. It was observed that the overall success rate achieved by iNR-PhysChem was over 98% in identifying NRs or non-NRs, and over 92% in identifying NRs among the following seven subfamilies: NR1thyroid hormone like, NR2HNF4-like, NR3estrogen like, NR4nerve growth factor IB-like, NR5fushi tarazu-F1 like, NR6germ cell nuclear factor like, and NR0knirps like. These rates were derived by the jackknife tests on a stringent benchmark dataset in which none of protein sequences included has pairwise sequence identity to any other in a same subset. As a user-friendly web-server, iNR-PhysChem is freely accessible to the public at either http://www.jci-bioinfo.cn/iNR-PhysChem or http://icpr.jci.edu.cn/bioinfo/iNR-PhysChem. Also a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics involved in developing the predictor. 
It is anticipated that iNR-PhysChem may become a useful high throughput tool for both basic research and drug design.
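The auto-covariance and cross-covariance transformation mentioned above turns a variable-length property matrix (properties × sequence positions) into a fixed-length vector. The sketch below illustrates the idea in Python; it is not the iNR-PhysChem code. The function name `ac_cc_features`, the toy property table, and the normalisation by (L - g) are assumptions, and the published predictor may standardise properties or weight terms differently.

```python
from statistics import mean


def ac_cc_features(seq, props, max_lag):
    """Auto-covariance (AC) and cross-covariance (CC) descriptors of a sequence.

    seq     -- protein sequence in one-letter code
    props   -- {property_name: {residue: value}}; values are assumed inputs
    max_lag -- largest lag g to use; must be smaller than len(seq)
    Returns max_lag * len(props)**2 numbers, independent of sequence length.
    """
    names = sorted(props)
    # Physical-chemical property matrix: one row per property, one column per position.
    rows = {n: [props[n][aa] for aa in seq] for n in names}
    means = {n: mean(rows[n]) for n in names}
    length = len(seq)
    features = []
    for g in range(1, max_lag + 1):
        for a in names:
            for b in names:
                # a == b gives an auto-covariance term, a != b a cross-covariance term.
                total = sum(
                    (rows[a][j] - means[a]) * (rows[b][j + g] - means[b])
                    for j in range(length - g)
                )
                features.append(total / (length - g))
    return features
```

With a hypothetical two-property table `{"p1": {"A": 1.0, "C": -1.0}, "p2": {"A": 0.5, "C": 0.5}}`, `ac_cc_features("ACAC", table, 1)` returns `[-1.0, 0.0, 0.0, 0.0]`. The alternating sequence gives a perfectly anti-correlated lag-1 auto-covariance for `p1`, while the constant `p2` contributes nothing.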
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China.
14
Buzón V, Carbó LR, Estruch SB, Fletterick RJ, Estébanez-Perpiñá E. A conserved surface on the ligand binding domain of nuclear receptors for allosteric control. Mol Cell Endocrinol 2012; 348:394-402. [PMID: 21878368 DOI: 10.1016/j.mce.2011.08.012] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Received: 05/12/2011] [Revised: 08/08/2011] [Accepted: 08/12/2011] [Indexed: 12/26/2022]
Abstract
Nuclear receptors (NRs) form a large superfamily of transcription factors that participate in virtually every key biological process. They control development, fertility, and gametogenesis and are misregulated in many cancers. Their enormous functional plasticity as transcription factors relates in part to NR-mediated interactions with hundreds of coregulatory proteins upon ligand (e.g., hormone) binding to their ligand binding domains (LBD), or following covalent modification. Some coregulator association relates to the distinct residues that shape a coactivator binding pocket termed AF-2, a surface groove that primarily determines the preference and specificity of protein-protein interactions. However, the highly conserved AF-2 pocket in the NR superfamily appears to be insufficient to account for the NR subtype specificity that leads to fine transcriptional modulation in certain settings. Additional protein-protein interaction surfaces, most notably on the LBD, may contribute to modulating NR function. NR coregulators and chaperones, normally much larger than the NR itself, may also bind to such interfaces. In the case of the androgen receptor (AR) LBD surface, structural and functional data highlighted the presence of another site named BF-3, which lies at a distinct but topographically adjacent surface to AF-2. AR BF-3 is a hot spot for mutations involved in prostate cancer and androgen insensitivity syndromes, and some FDA-approved drugs bind at this site. Structural studies suggested an allosteric relationship between AF-2 and BF-3, as occupancy of the latter affected coactivator recruitment to AF-2. Physiologically relevant partners of AR BF-3 have not yet been described. The newly discovered site is highly conserved among the steroid receptor subclass but is also present in other NRs. Several missense mutations in the BF-3 regions of these human NRs are implicated in pathology and affect their function in vitro. 
The fact that the AR BF-3 pocket is a druggable site demonstrates its pharmacological potential. Compounds that may allosterically affect NR function by binding to BF-3 open promising avenues for developing type-specific NR modulators.
Affiliation(s)
- Víctor Buzón
- Institut de Biomedicina, Universitat de Barcelona, Baldiri Reixac 15-21, Parc Científic de Barcelona, 08028 Barcelona, Spain
15
Abstract
Ligand binding to receptors is a key step in the regulation of cellular function by neurotransmitters, hormones, and many drugs. Not surprisingly then, genome projects have found that families of receptor genes form the largest groups of functional genes in mammalian genomes. A large body of experimental data have thus been generated on receptor-ligand interactions, and in turn, numerous computational tools for the in silico prediction of receptor-ligand interactions have been developed. Websites containing ligand binding data and tools to assess and manipulate such data are available in the public domain. Such Websites provide a resource for experimentalists studying receptor binding and for scientists interested in utilizing large data sets for other purposes, which include modeling structure-function relationships, defining patterns of interactions of drugs with different receptors, and computational comparisons among receptors. The Websites include databases of receptor protein and nucleotide sequences for particular classes of receptors (such as G-protein-coupled receptors and nuclear receptors) and of experimental results from receptor-ligand binding assays, as well as computational tools for modeling the interactions between ligands and receptors and predicting the function of orphan receptors. In this chapter, we provide information and Uniform Resource Locators (URLs) for Websites that facilitate computational and experimental studies of receptor-ligand interactions. This list will be periodically updated at https://sites.google.com/site/receptorligandbinding/.
Affiliation(s)
- Brinda K Rana
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA.
16
Fanelli F, De Benedetti PG. Update 1 of: computational modeling approaches to structure-function analysis of G protein-coupled receptors. Chem Rev 2011; 111:PR438-535. [PMID: 22165845 DOI: 10.1021/cr100437t] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Indexed: 02/06/2023]
Affiliation(s)
- Francesca Fanelli
- Dulbecco Telethon Institute, University of Modena and Reggio Emilia, via Campi 183, 41125 Modena, Italy.
17
Vroling B, Thorne D, McDermott P, Joosten HJ, Attwood TK, Pettifer S, Vriend G. NucleaRDB: information system for nuclear receptors. Nucleic Acids Res 2011; 40:D377-80. [PMID: 22064856 PMCID: PMC3245090 DOI: 10.1093/nar/gkr960] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
The NucleaRDB is a Molecular Class-Specific Information System that collects, combines, validates and disseminates large amounts of heterogeneous data on nuclear hormone receptors. It contains both experimental and computationally derived data. The data and knowledge present in the NucleaRDB can be accessed using a number of different interactive and programmatic methods and query systems. A nuclear hormone receptor-specific PDF reader interface is available that can integrate the contents of the NucleaRDB with full-text scientific articles. The NucleaRDB is freely available at http://www.receptors.org/nucleardb.
Affiliation(s)
- Bas Vroling
- CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Nijmegen, Bio-Prodict, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
18
Toledo D, Ramon E, Aguilà M, Cordomí A, Pérez JJ, Mendes HF, Cheetham ME, Garriga P. Molecular mechanisms of disease for mutations at Gly-90 in rhodopsin. J Biol Chem 2011; 286:39993-40001. [PMID: 21940625 DOI: 10.1074/jbc.m110.201517] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 11/06/2022]
Abstract
Two different mutations at Gly-90 in the second transmembrane helix of the photoreceptor protein rhodopsin have been proposed to lead to different phenotypes. G90D has been classically associated with congenital night blindness, whereas the newly reported G90V substitution was linked to a retinitis pigmentosa phenotype. Here, we used Val/Asp replacements of the native Gly at position 90 to unravel the structure/function divergences caused by these mutations and the potential molecular mechanisms of inherited retinal disease. The G90V and G90D mutants have a similar conformation around the Schiff base linkage region in the dark state and the same regeneration kinetics with 11-cis-retinal, but G90V has dramatically reduced thermal stability compared with the G90D mutant rhodopsin. Like G90D, the G90V mutant also shows an altered photobleaching pattern and the capacity to activate Gt in the opsin state. Furthermore, regeneration of the G90V mutant with 9-cis-retinal was improved, achieving the same A280/A500 ratio as wild-type isorhodopsin. Hydroxylamine resistance was also recovered, indicating a compact structure around the Schiff base linkage, and thermal stability was substantially improved compared with the 11-cis-regenerated mutant. These results support the role of thermal instability and/or abnormal photoproduct formation in eliciting a retinitis pigmentosa phenotype. The improved stability and more compact structure of the G90V mutant when it was regenerated with 9-cis-retinal raise the possibility that this isomer or other modified retinoid analogues might be used in potential treatment strategies for mutants showing the same structural features.
Affiliation(s)
- Darwin Toledo
- Centre de Biotecnologia Molecular, Departament d'Enginyeria Química, Universitat Politècnica de Catalunya, 08222 Terrassa, Spain
19
Yera ER, Cleves AE, Jain AN. Chemical structural novelty: on-targets and off-targets. J Med Chem 2011; 54:6771-85. [PMID: 21916467 DOI: 10.1021/jm200666a] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/30/2022]
Abstract
Drug structures may be quantitatively compared based on 2D topological structural considerations and based on 3D characteristics directly related to binding. A framework for combining multiple similarity computations is presented along with its systematic application to 358 drugs with overlapping pharmacology. Given a new molecule along with a set of molecules sharing some biological effect, a single score based on comparison to the known set is produced, reflecting either 2D similarity, 3D similarity, or their combination. For prediction of primary targets, the benefit of 3D over 2D was relatively small, but for prediction of off-targets, the added benefit was large. In addition to assessing prediction, the relationship between chemical similarity and pharmacological novelty was studied. Drug pairs that shared high 3D similarity but low 2D similarity (i.e., a novel scaffold) were shown to be much more likely to exhibit pharmacologically relevant differences in terms of specific protein target modulation.
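One way to picture "a single score based on comparison to the known set" is to take the query's best match against every known molecule. The sketch below makes that concrete for the 2D side, using Tanimoto similarity on fingerprint bit sets. It is not the authors' method, which used their own 2D and 3D similarity computations. Here the 3D similarity is assumed to be computed elsewhere and passed in, and the linear weight `w3d` is a hypothetical way of combining the two.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints stored as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_against_known_set(query_fp, known_fps, sim3d=None, w3d=0.5):
    """Single score for a query molecule relative to a set of known actives.

    2D part: best Tanimoto match to any known molecule in known_fps.
    sim3d:   optional, precomputed best 3D similarity (assumed input).
    w3d:     hypothetical weight of the 3D term in a simple linear blend.
    """
    s2d = max(tanimoto(query_fp, fp) for fp in known_fps)
    if sim3d is None:
        return s2d
    return (1.0 - w3d) * s2d + w3d * sim3d
```

In this toy setting, a query that shares a scaffold-free bit pattern with no known active (low `s2d`) but scores highly in 3D corresponds to the "high 3D, low 2D" novel-scaffold case discussed in the abstract.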
Affiliation(s)
- Emmanuel R Yera
- University of California, San Francisco, Department of Bioengineering and Therapeutic Sciences, Helen Diller Family Comprehensive Cancer Center, San Francisco, California 94158, United States
20
Wang P, Xiao X, Chou KC. NR-2L: a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features. PLoS One 2011; 6:e23505. [PMID: 21858146 PMCID: PMC3156231 DOI: 10.1371/journal.pone.0023505] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Received: 02/17/2011] [Accepted: 07/19/2011] [Indexed: 11/18/2022]
Abstract
Nuclear receptors (NRs) are one of the most abundant classes of transcriptional regulators in animals. They regulate diverse functions, such as homeostasis, reproduction, development and metabolism. Therefore, NRs are a very important target for drug development. Nuclear receptors form a superfamily of phylogenetically related proteins and have been subdivided into different subfamilies due to their domain diversity. In this study, a two-level predictor, called NR-2L, was developed that can identify whether a query protein is a nuclear receptor based on its sequence information alone; if it is, the prediction automatically continues to assign it to one of the following seven subfamilies: (1) thyroid hormone like (NR1), (2) HNF4-like (NR2), (3) estrogen like (NR3), (4) nerve growth factor IB-like (NR4), (5) fushi tarazu-F1 like (NR5), (6) germ cell nuclear factor like (NR6), and (7) knirps like (NR0). The identification was made by the Fuzzy K nearest neighbor (FK-NN) classifier based on the pseudo amino acid composition formed by incorporating various physicochemical and statistical features derived from the protein sequences, such as amino acid composition, dipeptide composition, complexity factor, and low-frequency Fourier spectrum components. As a demonstration, on low-redundancy benchmark datasets derived from NucleaRDB and UniProt, the overall success rates achieved in the jackknife test were about 93% and 89% at the first and second levels, respectively. The high success rates indicate that the novel two-level predictor can be a useful vehicle for identifying NRs and their subfamilies. As a user-friendly web server, NR-2L is freely accessible at either http://icpr.jci.edu.cn/bioinfo/NR2L or http://www.jci-bioinfo.cn/NR2L. Each job submitted to NR-2L can contain up to 500 query protein sequences and be finished in less than 2 minutes. 
The fewer the query proteins, the shorter the time will usually be. All program code for NR-2L is available for non-commercial purposes upon request.
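The NR-2L workflow described above (sequence-derived features, a fuzzy K-NN classifier, and jackknife evaluation) can be sketched in miniature as below. This is not NR-2L's code. The assumptions are: only the 20-dimensional amino acid composition is used (NR-2L also adds dipeptide composition, a complexity factor, and low-frequency Fourier components); the membership weighting follows the classical fuzzy K-NN rule of Keller et al. with crisp training labels and fuzzifier m = 2; and all function names and toy sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """20-dimensional amino acid composition (fraction of each standard residue)."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def fuzzy_knn(x, train_x, train_y, k=3, m=2.0):
    """Fuzzy K-nearest-neighbour rule with crisp training memberships.

    Each of the k nearest neighbours votes for its class with weight
    1 / d**(2 / (m - 1)); votes are normalised to memberships summing to 1.
    Returns (predicted_label, {label: membership}).
    """
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_x, train_y))[:k]
    eps = 1e-12  # guards against division by zero for identical vectors
    weights = [(1.0 / (d + eps)) ** (2.0 / (m - 1.0)) for d, _ in nearest]
    total = sum(weights)
    memberships = {}
    for w, (_, label) in zip(weights, nearest):
        memberships[label] = memberships.get(label, 0.0) + w / total
    return max(memberships, key=memberships.get), memberships


def jackknife_accuracy(xs, ys, k=3, m=2.0):
    """Leave-one-out (jackknife) test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(xs)):
        pred, _ = fuzzy_knn(xs[i], xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], k, m)
        correct += pred == ys[i]
    return correct / len(xs)
```

A two-level predictor in this style would call `fuzzy_knn` once on an NR/non-NR training set and, only for sequences predicted as NRs, a second time on a training set labelled by subfamily.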
Affiliation(s)
- Pu Wang
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
- Gordon Life Science Institute, San Diego, California, United States of America
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
21
Lusher SJ, McGuire R, Azevedo R, Boiten JW, van Schaik RC, de Vlieg J. A molecular informatics view on best practice in multi-parameter compound optimization. Drug Discov Today 2011; 16:555-68. [DOI: 10.1016/j.drudis.2011.05.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 12/28/2010] [Revised: 02/25/2011] [Accepted: 05/06/2011] [Indexed: 01/30/2023]
22
Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences. Pharmaceuticals (Basel) 2011. [PMCID: PMC4055883 DOI: 10.3390/ph4040652] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs) is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS http://sevens.cbrc.jp/) that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and we have surveyed the sequence and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorants, pheromones, etc. differs significantly among species. The latter receptors tend to be of the single-exon or few-exon type and make up a high proportion of the GPCRs, whereas some families, such as Class B and Class C receptors, are long owing to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic halves of the transmembrane (TM) helices, while residues characteristic of each subfamily are found in the extracellular halves. Sixty-nine Protein Data Bank (PDB) entries of complete or fragmentary structures could be mapped onto the TM/loop regions of Class A GPCRs, covering 14 subfamilies.
23
McKenna NJ. Discovery-driven research and bioinformatics in nuclear receptor and coregulator signaling. Biochim Biophys Acta Mol Basis Dis 2010; 1812:808-17. [PMID: 21029773 DOI: 10.1016/j.bbadis.2010.10.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/02/2010] [Revised: 10/18/2010] [Accepted: 10/19/2010] [Indexed: 10/18/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of ligand-regulated transcription factors that interact with coregulators and other transcription factors to direct tissue-specific programs of gene expression. Recent years have witnessed a rapid acceleration of the output of high-content data platforms in this field, generating discovery-driven datasets that have collectively described: the organization of the NR superfamily (phylogenomics); the expression patterns of NRs, coregulators and their target genes (transcriptomics); ligand- and tissue-specific functional NR and coregulator sites in DNA (cistromics); the organization of nuclear receptors and coregulators into higher order complexes (proteomics); and their downstream effects on homeostasis and metabolism (metabolomics). Significant bioinformatics challenges lie ahead both in the integration of this information into meaningful models of NR and coregulator biology, as well as in the archiving and communication of datasets to the global nuclear receptor signaling community. While holding great promise for the field, the ascendancy of discovery-driven research in this field brings with it a collective responsibility for researchers, publishers and funding agencies alike to ensure the effective archiving and management of these data. This review will discuss factors lying behind the increasing impact of discovery-driven research, examples of high-content datasets and their bioinformatic analysis, as well as a summary of currently curated web resources in this field. This article is part of a Special Issue entitled: Translating nuclear receptors from health to disease.
Affiliation(s)
- Neil J McKenna
- Department of Molecular and Cellular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
24
Triplet T, Shortridge MD, Griep MA, Stark JL, Powers R, Revesz P. PROFESS: a PROtein function, evolution, structure and sequence database. Database (Oxford) 2010; 2010:baq011. [PMID: 20624718 PMCID: PMC2911846 DOI: 10.1093/database/baq011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/02/2009] [Revised: 06/03/2010] [Accepted: 06/06/2010] [Indexed: 11/13/2022]
Abstract
The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are approximately 1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein-protein interaction networks. Database URL: http://cse.unl.edu/~profess/
Affiliation(s)
- Thomas Triplet
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Matthew D. Shortridge
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Mark A. Griep
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Jaime L. Stark
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Robert Powers
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
- Peter Revesz
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115 and Department of Chemistry, University of Nebraska-Lincoln, Lincoln NE 68588-0304, USA
25
Stead LF, Wood IC, Westhead DR. KvDB; mining and mapping sequence variants in voltage-gated potassium channels. Hum Mutat 2010; 31:908-17. [DOI: 10.1002/humu.21295] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/26/2023]
26
Suwa M, Ono Y. Computational overview of GPCR gene universe to support reverse chemical genomics study. Methods Mol Biol 2010; 577:41-54. [PMID: 19718507 DOI: 10.1007/978-1-60761-232-2_4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/20/2023]
Abstract
In order to support high-throughput screening for ligands of G-protein coupled receptors (GPCRs) by using bioinformatics technology, we introduce a database (SEVENS) with genome-scale annotation and software (GRIFFIN) that can simulate GPCR function. SEVENS ( http://sevens.cbrc.jp/ ) is an integrated database that includes GPCR genes that are identified with high accuracy (99.4% sensitivity and 96.6% specificity) from various types of genomes, by a pipeline that integrates such software as a gene finder, a sequence alignment tool, a motif and domain assignment tool, and a transmembrane helix (TMH) predictor. SEVENS provides the user a genome-scale overview of the "GPCR universe" with detailed information of chromosomal mapping, phylogenetic tree, protein sequence and structure, and experimental evidence, all of which are accessible via a user-friendly interface. GRIFFIN ( http://griffin.cbrc.jp/ ) can predict GPCR and G-protein coupling selectivity induced by ligand binding with high sensitivity and specificity (more than 87% on average), based on the support vector machine (SVM) and hidden Markov Model (HMM). SEVENS and GRIFFIN are expected to contribute to revealing the function of orphan and unknown GPCRs.
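Sensitivity and specificity figures like those quoted for SEVENS and GRIFFIN are computed from a binary confusion matrix. The sketch below shows the standard definitions, with accuracy and the Matthews correlation coefficient (MCC) added for completeness. The function name is hypothetical, and the counts in the usage line are made up: they do not reproduce any result reported above.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard metrics for a binary predictor from its confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true positives recovered
    specificity = tn / (tn + fp)          # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}
```

For example, `binary_metrics(90, 10, 80, 20)` gives sensitivity 0.9, specificity 0.8, accuracy 0.85 and an MCC of about 0.704. MCC is often reported alongside the other metrics because, unlike accuracy, it stays informative when the positive and negative sets are very unbalanced, as with GPCR versus non-GPCR genes in a genome.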
Affiliation(s)
- Makiko Suwa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
27
Kuipers RK, Joosten HJ, van Berkel WJH, Leferink NGH, Rooijen E, Ittmann E, van Zimmeren F, Jochens H, Bornscheuer U, Vriend G, Martins dos Santos VAP, Schaap PJ. 3DM: Systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins 2010; 78:2101-13. [DOI: 10.1002/prot.22725] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/05/2022]
28
Langham JJ, Cleves AE, Spitzer R, Kirshner D, Jain AN. Physical binding pocket induction for affinity prediction. J Med Chem 2009; 52:6107-25. [PMID: 19754201 DOI: 10.1021/jm901096y] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 01/22/2023]
Abstract
Computational methods for predicting ligand affinity where no protein structure is known generally take the form of regression analysis based on molecular features that have only a tangential relationship to a protein/ligand binding event. Such methods have limited utility when structural variation moves beyond congeneric series. We present a novel approach based on the multiple-instance learning method of Compass, where a physical model of a binding site is induced from ligands and their corresponding activity data. The model consists of molecular fragments that can account for multiple positions of literal protein residues. We demonstrate the method on 5HT1a ligands by training on a series with limited scaffold variation and testing on numerous ligands with variant scaffolds. Predictive error was between 0.5 and 1.0 log units (0.7-1.4 kcal/mol), with statistically significant rank correlations. Accurate activity predictions of novel ligands were demonstrated using a validation approach where a small number of ligands of limited structural variation known at a fixed time point were used to make predictions on a blind test set of widely varying molecules, some discovered at a much later time point.
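The two error ranges quoted above (0.5 to 1.0 log units and 0.7 to 1.4 kcal/mol) are related by the standard link between binding free energy and a base-10 affinity constant. The worked conversion below assumes T = 298 K and that "log units" refers to log10 of an affinity constant such as Ki or Kd; the abstract does not state either explicitly.

```latex
\begin{aligned}
\Delta G^\circ &= -RT\ln K = -(\ln 10)\,RT\,\log_{10} K
  \;\Rightarrow\; |\Delta\Delta G^\circ| = (\ln 10)\,RT\,|\Delta\log_{10} K| \\
(\ln 10)\,RT &\approx 2.303 \times \left(1.987\times 10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right) \times 298\ \mathrm{K}
  \approx 1.36\ \mathrm{kcal\,mol^{-1}} \\
0.5 \times 1.36 &\approx 0.7\ \mathrm{kcal\,mol^{-1}}, \qquad 1.0 \times 1.36 \approx 1.4\ \mathrm{kcal\,mol^{-1}}
\end{aligned}
```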
Affiliation(s)
- James J Langham
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California 94158-9001, USA
29
Wohlfahrt G, Sipilä J, Pietilä LO. Field-based comparison of ligand and coactivator binding sites of nuclear receptors. Biopolymers 2009; 91:884-94. [DOI: 10.1002/bip.21273] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
30
Rashid M, Singla D, Sharma A, Kumar M, Raghava GPS. Hmrbase: a database of hormones and their receptors. BMC Genomics 2009; 10:307. [PMID: 19589147 PMCID: PMC2720991 DOI: 10.1186/1471-2164-10-307] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 09/23/2008] [Accepted: 07/09/2009] [Indexed: 12/04/2022]
Abstract
Background: Hormones are signaling molecules that play vital roles in various life processes, such as growth and differentiation, physiology, and reproduction. These molecules are mostly secreted by endocrine glands and transported to target organs through the bloodstream. Deficient or excessive levels of hormones are associated with several diseases, such as cancer, osteoporosis, and diabetes. Thus, it is important to collect and compile information about hormones and their receptors. Description: This manuscript describes a database called Hmrbase, which has been developed for managing information about hormones and their receptors. It is a highly curated database whose information has been collected from the literature and public databases. The current version of Hmrbase contains comprehensive information about ~2000 hormones, e.g., their function, source organism, receptors, mature sequences, and structures. Hmrbase also contains information about ~3000 hormone receptors, in terms of amino acid sequences, subcellular localizations, ligands, and post-translational modifications. One of the major features of this database is that it provides data about ~4100 hormone-receptor pairs. A number of online tools have been integrated into the database to provide facilities such as keyword search, structure-based search, mapping of given peptides onto hormone/receptor sequences, and sequence similarity search. The database also provides a number of external links to other resources/databases to help retrieve further related information. Conclusion: Owing to the high impact of endocrine research in the biomedical sciences, Hmrbase could become a leading data portal for researchers. 
The salient features of Hmrbase are hormone-receptor pair-related information, mapping of peptide stretches onto the protein sequences of hormones and receptors, Pfam domain annotations, categorical browsing options, online data submission, DrugPedia linkage, etc. Hmrbase is publicly available online.
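In its simplest form, the peptide-mapping facility mentioned above locates a short peptide inside a longer hormone or receptor sequence. The sketch below assumes exact string matching only, because the abstract does not say whether Hmrbase tolerates mismatches. `map_peptide` is a hypothetical name, not part of Hmrbase.

```python
def map_peptide(peptide, protein):
    """1-based start positions of every exact, possibly overlapping,
    occurrence of `peptide` in `protein`."""
    hits = []
    if not peptide:
        return hits
    start = protein.find(peptide)
    while start != -1:
        hits.append(start + 1)                     # report 1-based residue numbering
        start = protein.find(peptide, start + 1)   # advance one residue to allow overlaps
    return hits
```

For example, `map_peptide("KR", "MKRLKR")` returns `[2, 5]`. Advancing by one residue, rather than by the peptide length, keeps overlapping matches such as the two hits of `"AA"` in `"AAAB"`.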
Affiliation(s)
- Mamoon Rashid
- Bioinformatics Centre, Institute of Microbial Technology, Chandigarh-160036, India.
31
Gupta R, Mittal A, Singh K, Narang V, Roy S. Time-series approach to protein classification problem: WaVe-GPCR: wavelet variant feature for identification and classification of GPCR. IEEE Eng Med Biol Mag 2009; 28:32-37. [PMID: 19622422 DOI: 10.1109/memb.2009.932903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Affiliation(s)
- Ravi Gupta
- Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Uttarakhand, India.
32
BAFF-R promotes cell proliferation and survival through interaction with IKKbeta and NF-kappaB/c-Rel in the nucleus of normal and neoplastic B-lymphoid cells. Blood 2009; 113:4627-36. [PMID: 19258594 DOI: 10.1182/blood-2008-10-183467] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 12/13/2022]
Abstract
BLyS and its major receptor BAFF-R have been shown to be critical for development and homeostasis of normal B lymphocytes, and for cell growth and survival of neoplastic B lymphocytes, but the biologic mechanisms of this ligand/receptor-derived intracellular signaling pathway(s) have not been completely defined. We have discovered that the BAFF-R protein was present in the cell nucleus, in addition to its integral presence in the plasma membrane and cytoplasm, in both normal and neoplastic B cells. BAFF-R interacted with histone H3 and IKKbeta in the cell nucleus, enhancing histone H3 phosphorylation through IKKbeta. Nuclear BAFF-R was also associated with NF-kappaB/c-Rel and bound to NF-kappaB targeted promoters including BLyS, CD154, Bcl-xL, IL-8, and Bfl-1/A1, promoting the transcription of these genes. These observations suggested that in addition to activating NF-kappaB pathways in the plasma membrane, BAFF-R also promotes normal B-cell and B-cell non-Hodgkin lymphoma (NHL-B) survival and proliferation by functioning as a transcriptional regulator through a chromatin remodeling mechanism(s) and NF-kappaB association. Our studies provide an expanded conceptual view of the BAFF-R signaling, which should contribute a better understanding of the physiologic mechanisms involved in normal B-cell survival and growth, as well as in the pathophysiology of aggressive B-cell malignancies and autoimmune diseases.
33
Gao QB, Jin ZC, Ye XF, Wu C, He J. Prediction of nuclear receptors with optimal pseudo amino acid composition. Anal Biochem 2009; 387:54-9. [PMID: 19454254 DOI: 10.1016/j.ab.2009.01.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 09/06/2008] [Revised: 12/04/2008] [Accepted: 01/09/2009] [Indexed: 10/21/2022]
Abstract
Nuclear receptors are involved in multiple cellular signaling pathways that affect and regulate processes such as organ development and maintenance, ion transport, homeostasis, and apoptosis. In this article, an optimal pseudo amino acid composition based on physicochemical characters of amino acids is suggested to represent proteins for predicting the subfamilies of nuclear receptors. Six physicochemical characters of amino acids were adopted to generate the protein sequence features via the web server PseAAC. The optimal values of the rank of correlation factor and the weighting factor of PseAAC were determined to obtain the protein descriptor that leads to the best performance. A nonredundant dataset of nuclear receptors in four subfamilies was constructed to evaluate the method using support vector machines. An overall accuracy of 99.6% was achieved in both the fivefold cross-validation test and the jackknife test, and an overall accuracy of 98.4% was reached in a blind dataset test. The performance is very competitive with that of some previous methods.
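The two parameters tuned in this study, the rank of correlation λ (`lam`) and the weighting factor w, both enter the type-1 pseudo amino acid composition in Chou's formulation. The sketch below is a generic illustration of that formulation, not the PseAAC web server's code. The six physicochemical properties used by Gao et al. are not listed in the abstract, so `raw_props` is a placeholder input, and the default `w=0.05` is an arbitrary starting value.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(raw):
    """Rescale one property to zero mean and unit variance over the 20 residues."""
    vals = [raw[a] for a in AMINO_ACIDS]
    mu = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
    return {a: (raw[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, raw_props, lam, w=0.05):
    """Type-1 pseudo amino acid composition: 20 + lam components summing to 1.

    seq       -- sequence of standard residues, longer than lam
    raw_props -- list of {residue: value} dicts (placeholder property scales)
    lam       -- rank of correlation (number of sequence-order factors)
    w         -- weighting factor for the sequence-order factors
    """
    props = [standardize(p) for p in raw_props]
    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]

    def theta(a, b):
        # Mean squared difference of the standardized properties of residues a and b.
        return sum((h[b] - h[a]) ** 2 for h in props) / len(props)

    # k-th tier correlation factor: average over all residue pairs k positions apart.
    thetas = [sum(theta(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
              for k in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The first 20 components carry the composition and the last λ components carry sequence-order information, with w controlling their relative weight. Optimal values of λ and w could then be picked, for instance, by scanning a grid of values and keeping the pair with the best cross-validated accuracy; the abstract does not say exactly how the search was done.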
Affiliation(s)
- Qing-Bin Gao
- Department of Health Statistics, Second Military Medical University, Shanghai 200433, China
34
Grau B, Eilert JC, Munck S, Harz H. Adenosine induces growth-cone turning of sensory neurons. Purinergic Signal 2008; 4:357-64. [PMID: 18777107 PMCID: PMC2583205 DOI: 10.1007/s11302-008-9121-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/05/2008] [Accepted: 08/04/2008] [Indexed: 11/24/2022]
Abstract
The formation of appropriate connections between neurons and their specific targets is an essential step during development and repair of the nervous system. Growth cones are located at the leading edges of growing neurites and respond to environmental cues that guide them to their final targets. Directional information can be coded by concentration gradients of substrate-bound or diffusible guidance molecules. Here we show that concentration gradients of adenosine stimulate growth cones of sensory neurons (dorsal root ganglia) from chicken embryos to turn towards the adenosine source. This response is mediated by adenosine receptors. The subsequent signal transduction process involves cAMP. It may be speculated that the in vivo function of this response is concerned with the formation or the repair and regeneration of the peripheral nervous system.
Affiliation(s)
- Benjamin Grau
- Department of Cellular, Molecular, and Developmental Neurobiology, Cajal Institute, C.S.I.C., Ave. Dr. Arce 37, Madrid, 28002, Spain
35
Brown DP. Efficient functional clustering of protein sequences using the Dirichlet process. Bioinformatics 2008; 24:1765-71. [PMID: 18511467 DOI: 10.1093/bioinformatics/btn244] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION Automatic clustering of protein sequences is an important problem in computational biology. The recent explosion in genome sequences has given biological researchers a vast number of novel protein sequences. However, the majority of these sequences have no experimental evidence for their molecular function in the cell, and the responsibility for correctly annotating these sequences falls upon the bioinformatics community. Ideally, we would like to be able to group sequences of similar or identical molecular function in an automatic fashion, without relying on experimental evidence. RESULTS In this article I present a novel probabilistic framework that models subfamilies within a known protein family. Given a multiple sequence alignment, the model uses Dirichlet mixture densities to estimate amino acid preferences within subfamily clusters, and places a Dirichlet process prior on the overall set of clusters. Based on results from several datasets, the model breaks data accurately into functional subgroups. AVAILABILITY The algorithm is implemented as C++ software available at bpg-research.berkeley.edu/~duncanb/dpcluster/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
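The Dirichlet process prior mentioned in this abstract can be illustrated through its Chinese restaurant process form: each new sequence joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter alpha. The Python sketch below shows only this prior over partitions; it omits the Dirichlet mixture likelihood over alignment columns that the published model combines it with, and the function names and `alpha` values are hypothetical.

```python
import random


def crp_probabilities(cluster_sizes, alpha):
    """Predictive probabilities under a Dirichlet process (Chinese restaurant process) prior.

    Returns one probability per existing cluster, then the probability of opening a new one.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = sum(cluster_sizes)
    return [size / (n + alpha) for size in cluster_sizes] + [alpha / (n + alpha)]


def sample_partition(n_items, alpha, seed=0):
    """Draw a random partition of n_items sequences from the CRP prior alone (no likelihood)."""
    rng = random.Random(seed)
    sizes, labels = [], []
    for _ in range(n_items):
        probs = crp_probabilities(sizes, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster (candidate subfamily)
        else:
            sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, sizes
```

Larger `alpha` values favor more, smaller clusters. In a full sampler, these prior weights would be multiplied by each cluster's likelihood for the sequence before normalizing.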
Affiliation(s)
- Duncan P Brown
- Department of Bioengineering, UC Berkeley and Merck & Co, Inc, 1700 Owens St, San Francisco, CA 94158, USA.
36
Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of residue coupling in protein families. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:183-97. [PMID: 18451428 DOI: 10.1109/tcbb.2007.70225] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 05/26/2023]
Abstract
Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence structure-function relationships in protein families.
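As a simple illustration of the pairwise notion of coupling described in this abstract (amino acid combinations at two positions occurring more often than chance), the Python sketch below computes the mutual information between two alignment columns. This is a common baseline covariation statistic, not the probabilistic graphical model the authors propose, and the function name is hypothetical. Each column is passed as a string with one residue per aligned sequence.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (natural log) between two alignment columns of equal length."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi
```

Raw mutual information is biased upward for small alignments and does not separate direct from indirect couplings; these are known limitations of simple pairwise statistics.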
Affiliation(s)
- John Thomas
- Department of Computer Science, Dartmouth College, Sudikoff Laboratory, Hanover, NH 03755, USA.
37
Kedarisetti KD, Dick S, Kurgan L. Searching for factors that distinguish disease-prone and disease-resistant prions via sequence analysis. Bioinform Biol Insights 2008; 2:133-44. [PMID: 19812771 DOI: 10.4137/bbi.s550] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
The exact mechanisms of prion misfolding and the factors that predispose an individual to prion diseases are largely unknown. Our approach to identifying candidate factors in silico relies on contrasting the C-terminal domain of PrP(C) sequences from two groups of vertebrate species: those that have been found to suffer from prion diseases, and those that have not. We propose that any significant differences between the two groups are candidate factors that may predispose individuals to develop prion disease, which should be further analyzed by wet-lab investigations. Using an array of computational methods we identified possible point mutations that could predispose PrP(C) to misfold into PrP(Sc). Our results include confirmatory findings such as the V210I mutation, and new findings including P137M, G142D, G142N, D144P, K185T, V189I, H187Y and T191P mutations, which could impact structural stability. We also propose new hypotheses that give insights into the stability of helices 2 and 3. These include destabilizing effects of histidine and of the T188-T193 segment in helix 2 in the disease-prone prions, and a stabilizing effect of leucine on helix 3 in the disease-resistant prions.
Affiliation(s)
- Kanaka Durga Kedarisetti
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
38
Kemmer D, Podowski RM, Yusuf D, Brumm J, Cheung W, Wahlestedt C, Lenhard B, Wasserman WW. Gene characterization index: assessing the depth of gene annotation. PLoS One 2008; 3:e1440. [PMID: 18213364 PMCID: PMC2194620 DOI: 10.1371/journal.pone.0001440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/09/2007] [Accepted: 12/16/2007] [Indexed: 11/19/2022]
Abstract
Background We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. Methodology/Principal Findings The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. Conclusions/Significance The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/.
Affiliation(s)
- Danielle Kemmer
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Raf M. Podowski
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Dimas Yusuf
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Jochen Brumm
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Warren Cheung
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Claes Wahlestedt
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Molecular and Integrative Neurosciences Department, The Scripps Research Institute, Jupiter, Florida, United States of America
- Boris Lenhard
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Computational Biology Unit, Bergen Center for Computational Science, Sars International Centre for Marine Molecular Biology, Unifob AS, University of Bergen, Bergen, Norway
- Wyeth W. Wasserman
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- * To whom correspondence should be addressed. E-mail:
39
Metpally RPR, Vigneshwar R, Sowdhamini R. Genome inventory and analysis of nuclear hormone receptors in Tetraodon nigroviridis. J Biosci 2007; 32:43-50. [PMID: 17426379 DOI: 10.1007/s12038-007-0005-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Nuclear hormone receptors (NRs) form a large superfamily of ligand-activated transcription factors, which regulate genes underlying a wide range of (patho)physiological phenomena. The availability of the full genome sequence of Tetraodon nigroviridis facilitated a genome-wide analysis of the NRs in this fish genome. Seventy-one NRs were found in Tetraodon and were compared with mammalian and fish NR family members. In general, NRs are more highly represented in fish genomes than in mammalian ones. They showed high diversity across classes, as observed by phylogenetic analysis. Nucleotide substitution rates show strong negative selection among fish NRs except for pregnane X receptor (PXR), estrogen receptor (ER) and liver X receptor (LXR). This may be attributed to the crucial role these receptors play in the metabolism and detoxification of xenobiotic and endobiotic compounds, which might have resulted in slight positive selection. Chromosomal mapping and pairwise comparisons of NR distribution in Tetraodon and humans led to the identification of nine syntenic NR regions, three of which are common among fully sequenced vertebrate genomes. Gene structure analysis shows strong conservation of exon structures among orthologues, whereas paralogous members show different splicing patterns; intron gain or loss and the addition or substitution of exons played a major role in the evolution of the NR superfamily.
Affiliation(s)
- Raghu Prasad Rao Metpally
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India
40
Brown DP, Krishnamurthy N, Sjölander K. Automated protein subfamily identification and classification. PLoS Comput Biol 2007; 3:e160. [PMID: 17708678 PMCID: PMC1950344 DOI: 10.1371/journal.pcbi.0030160] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Received: 02/17/2006] [Accepted: 06/25/2007] [Indexed: 11/22/2022]
Abstract
Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. 
Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.
Affiliation(s)
- Duncan P Brown
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Nandini Krishnamurthy
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, California, United States of America
41
Sun K, Montana V, Chellappa K, Brelivet Y, Moras D, Maeda Y, Parpura V, Paschal BM, Sladek FM. Phosphorylation of a conserved serine in the deoxyribonucleic acid binding domain of nuclear receptors alters intracellular localization. Mol Endocrinol 2007; 21:1297-311. [PMID: 17389749 DOI: 10.1210/me.2006-0300] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.4] [Indexed: 11/19/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of transcription factors whose genomic functions are known to be activated by lipophilic ligands, but little is known about how to deactivate them or how to turn on their nongenomic functions. One obvious mechanism is to alter the nuclear localization of the receptors. Here, we show that protein kinase C (PKC) phosphorylates a highly conserved serine (Ser) between the two zinc fingers of the DNA binding domain of orphan receptor hepatocyte nuclear factor 4alpha (HNF4alpha). This Ser (S78) is adjacent to several positively charged residues (Arg or Lys), which we show here are involved in nuclear localization of HNF4alpha and are conserved in nearly all other NRs, along with the Ser/threonine (Thr). A phosphomimetic mutant of HNF4alpha (S78D) reduced DNA binding, transactivation ability, and protein stability. It also impaired nuclear localization, an effect that was greatly enhanced in the MODY1 mutant Q268X. Treatment of the hepatocellular carcinoma cell line HepG2 with PKC activator phorbol 12-myristate 13-acetate also resulted in increased cytoplasmic localization of HNF4alpha as well as decreased endogenous HNF4alpha protein levels in a proteasome-dependent fashion. We also show that PKC phosphorylates the DNA binding domain of other NRs (retinoic acid receptor alpha, retinoid X receptor alpha, and thyroid hormone receptor beta) and that phosphomimetic mutants of the same Ser/Thr result in cytoplasmic localization of retinoid X receptor alpha and peroxisome proliferator-activated receptor alpha. Thus, phosphorylation of this conserved Ser between the two zinc fingers may be a common mechanism for regulating the function of NRs.
Affiliation(s)
- Kai Sun
- Environmental Toxicology Graduate Program, University of California, Riverside, California 92521, USA
42
Kuniyeda K, Okuno T, Terawaki K, Miyano M, Yokomizo T, Shimizu T. Identification of the intracellular region of the leukotriene B4 receptor type 1 that is specifically involved in Gi activation. J Biol Chem 2007; 282:3998-4006. [PMID: 17158791 DOI: 10.1074/jbc.m610540200] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022]
Abstract
Many G-protein-coupled receptors can activate more than one G-protein subfamily member. Leukotriene B4 receptor type 1 (BLT1) is a high-affinity G-protein-coupled receptor for leukotriene B4 that functions in host defense, inflammation, and immunity. Previous studies have shown that BLT1 utilizes different G-proteins (the Gi family and G16 G-proteins) in mediating diverse cellular events and that truncation of the cytoplasmic tail of BLT1 does not impair activation of Gi and G16 proteins. To determine responsive regions of BLT1 for G-protein coupling, we performed an extensive mutagenesis study of its intracellular loops. Three intracellular loops (i1, i2, and i3) of BLT1 were found to be important for both Gi and G16 coupling, as judged by Gi-dependent guanosine 5'-(gamma-thio) triphosphate (GTPgammaS) binding and G16-dependent inositol phosphate accumulation assays. The i3-1 mutant, with a mutation at the i3 amino terminus, exhibited greatly reduced GTPgammaS binding but intact inositol phosphate accumulation triggered by leukotriene B4 stimulation. These results suggest that the i3-1 region is required only for Gi activation. Moreover, in the i3-1 mutant, the deficiency in Gi activation was accompanied by a loss of the high-affinity leukotriene B4 binding state seen with the wild-type receptor. A three-dimensional model of BLT1 built on the structure of bovine rhodopsin suggests that the i3-1 region may consist of the cytoplasmic end of transmembrane helix V, which extends the helix into the cytoplasm. From mutational studies and three-dimensional modeling, we propose that the extended cytoplasmic helix connected to transmembrane helix V of BLT1 might be a key region for selective activation of Gi proteins.
Affiliation(s)
- Kanako Kuniyeda
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
43
Odani N, Pfaff SL, Nakamura H, Funahashi JI. Cloning and developmental expression of a chick G-protein-coupled receptor SCGPR1. Gene Expr Patterns 2007; 7:375-80. [PMID: 17251065 DOI: 10.1016/j.modgep.2006.12.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/29/2006] [Revised: 12/11/2006] [Accepted: 12/12/2006] [Indexed: 10/23/2022]
Abstract
To identify spinal motor neuron subtype-specific transcripts, we employed a single-cell subtractive screen of mRNAs in chick embryos. We cloned a differentially expressed gene, which we termed spinal cord G-protein-coupled receptor 1 (SCGPR1) on the basis of its expression pattern, which changes dynamically in the developing spinal cord. The vertebrate orthologue of SCGPR1 is termed Gpr37 (GPCR/CNS1, ET(B)R-LP-1, Pael-R); however, the specific ligand of this receptor has not been identified. Recent studies indicate that Pael-R can associate with parkin, a ubiquitin ligase that accumulates in Lewy bodies in dopaminergic neurons and is associated with Parkinson's disease. Although SCGPR1 (Gpr37) expression has been examined in adult tissues, its embryonic expression has not been reported. Here, we have defined the expression pattern of SCGPR1 by in situ hybridization during chick development. SCGPR1 was first detected at HH stage 7 in the neural tube and notochord. As development progressed, SCGPR1 expression became restricted to the ventral neural tube. SCGPR1 expression was also present in the developing telencephalon, mesencephalon, retina, visceral-class motor neurons, myotome and thyroid invagination.
Affiliation(s)
- Noritaka Odani
- Department of Molecular Neurobiology, Graduate School of Life Sciences, Tohoku University, Sendai 980-8575, Japan
44
Wheeler D, Sneddon WB. Mutation of phenylalanine-34 of parathyroid hormone disrupts NHERF1 regulation of PTH type I receptor signaling. Endocrine 2006; 30:343-52. [PMID: 17526947 DOI: 10.1007/s12020-006-0013-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/16/2006] [Revised: 12/11/2006] [Accepted: 12/21/2006] [Indexed: 11/27/2022]
Abstract
Internalization of the PTH type I receptor (PTH1R) is regulated in a cell- and ligand-specific manner. We previously demonstrated that the sodium/proton exchanger regulatory factor type 1 (NHERF1; EBP50) is pivotal in determining the range of peptides that internalize the PTH1R. Antagonist PTH fragments can internalize the PTH1R in some kidney and bone cell models. PTH(7-34), which binds to, but does not activate, the PTH1R, internalizes the PTH1R in kidney distal tubule (DT) cells, where NHERF1 is not expressed. The effect of antagonist PTHrP peptides has not, to this point, been assessed. PTH1R internalization was measured by real-time confocal fluorescence microscopy of DT cells stably expressing 10⁵ EGFP-tagged PTH1R/cell. PTHrP(7-34) internalized the PTH1R in a manner indistinguishable from PTH(7-34). Introduction of NHERF1 into DT cells, however, blocked PTH(7-34)-induced, but not PTHrP(7-34)-induced, PTH1R internalization. To delineate the sequences within PTHrP that determine whether PTH1R internalization is affected by NHERF1, chimeric PTH/PTHrP fragments were tested for their ability to induce PTH1R internalization. PTH(7-21)/PTHrP(22-34), PTH(7-32)/PTHrP(33-34), and PTH(7-33)/PTHrP(34) at 1 microM each internalized the PTH1R by 50-70% in a NHERF1-independent manner. When the C terminus of PTHrP was replaced with homologous amino acids from PTH, NHERF1 inhibited PTH1R internalization. Simply mutating F34 to A in PTH was found to induce PTH1R internalization in a NHERF1-independent manner. None of the chimeric peptides activated the PTH1R, but all effectively competed with 1 nM PTH(1-34) in cyclic AMP assays. In addition, all chimeric peptides competed with radiolabeled PTH(1-34) in binding assays in DT cells. PTH(1-34) and PTHrP(7-34), but not PTH(7-34), efficiently recruited beta-arrestin1 to plasma membrane PTH1Rs.
We, therefore, conclude that PTH(1-34) and PTHrP(7-34) induce a conformational change in the PTH1R that promotes arrestin binding and dissociates NHERF1 from PTH1R internalization.
Affiliation(s)
- David Wheeler
- Department of Pharmacology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
45
Raviscioni M, He Q, Salicru EM, Smith CL, Lichtarge O. Evolutionary identification of a subtype specific functional site in the ligand binding domain of steroid receptors. Proteins 2006; 64:1046-57. [PMID: 16835908 DOI: 10.1002/prot.21074] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
Nuclear receptors are ubiquitous eukaryotic ligand-activated transcription factors that modulate gene expression through varied interactions. However, the highly conserved functional sites known today seem insufficient to explain receptor-specific recruitment of different coactivator and corepressor proteins and regulation of transcription. To search for new receptor-subtype-specific functional sites, we applied difference evolutionary trace (difference ET) analysis to the ligand binding domain of steroid receptors, a subgroup of the nuclear receptor (NR) family. This computational approach identified a new functional site located on a surface opposite to currently known protein-protein interaction sites and distinct from the ligand binding pocket. Strikingly, the literature shows that in vivo variations at residues in the new site are linked to androgen resistance and leukemia, and our own targeted mutations to this site lower but do not eradicate transcriptional activation by estrogen receptor alpha (ERalpha), with reduced ligand binding affinity and SRC-1 interaction. Thus, these data demonstrate that this evolutionarily important surface can function as an allosteric site that modulates some but not all receptor binding interactions. Evolutionary analysis further shows that this allosteric regulatory site is shared among all NRs from groups 2 (HNF4-like) and 4 (NGFIB-like), suggesting a role among many nuclear receptors. Its concave structure, hydrophobic composition, and residue variability among nuclear receptors further suggest that it would be amenable to specific drug design. This highlights the power of evolutionary information for the identification of new functional sites even in a protein family as well studied as NRs.
Affiliation(s)
- Michele Raviscioni
- W. M. Keck Center for Computational and Structural Biology, Baylor College of Medicine, Houston, Texas 77030, USA
46
Ye K, Lameijer EWM, Beukers MW, Ijzerman AP. A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors. Proteins 2006; 63:1018-30. [PMID: 16532452 DOI: 10.1002/prot.20899] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Residues in the transmembrane region of G protein-coupled receptors (GPCRs) are important for ligand binding and activation, but the function of individual positions is poorly understood. Using a sequence alignment of class A GPCRs (grouped in subfamilies), we propose a so-called "two-entropies analysis" to determine the potential role of individual positions in the transmembrane region of class A GPCRs. In our approach, such positions appear scattered but largely cluster according to their biological function. Compared with other bioinformatics approaches, such as the evolutionary trace method, the entropy-variability plot, and correlated mutation analysis, our method appears superior both qualitatively and quantitatively.
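A minimal Python sketch of the two per-position quantities such an analysis might compare, assuming (as the method's name suggests, but not confirmed by this abstract) that they are the Shannon entropy of an alignment column over all sequences and the unweighted mean of that column's entropy within each subfamily. The function names and the unweighted averaging are assumptions, not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (natural log) of the residue distribution in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def two_entropies(column, subfamily_labels):
    """Return (entropy over all sequences, unweighted mean entropy within subfamilies).

    column[k] is the residue of sequence k at this position; subfamily_labels[k] is its subfamily.
    """
    if len(column) != len(subfamily_labels) or not column:
        raise ValueError("column and labels must be non-empty and of equal length")
    groups = {}
    for residue, label in zip(column, subfamily_labels):
        groups.setdefault(label, []).append(residue)
    e_all = shannon_entropy(column)
    e_sub = sum(shannon_entropy(g) for g in groups.values()) / len(groups)
    return e_all, e_sub
```

Under this reading, a position with low entropy on both axes is conserved family-wide, while a position with low within-subfamily entropy but high overall entropy is a candidate for a subfamily-specific role.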
Affiliation(s)
- Kai Ye
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden University, Leiden, The Netherlands
47
Cox DL, Pan J, Singh RRP. A mechanism for copper inhibition of infectious prion conversion. Biophys J 2006; 91:L11-3. [PMID: 16698781 PMCID: PMC1483082 DOI: 10.1529/biophysj.106.083642] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 02/24/2006] [Accepted: 04/17/2006] [Indexed: 11/18/2022]
Abstract
We employ ab initio electronic structure calculations to obtain two structural models for copper bound in the strongest binding site of the noninfectious form of the prion protein. The models are compatible with available experimental constraints from electron spin resonance data. The bending of the peptide backbone attendant on copper binding is not compatible with the requisite straight beta-strand backbone structure for the same sequence contained in two recently proposed models of the prion protein structure in its infectious form. We hypothesize that copper binding at this site is protective against conversion to the infectious form, discuss experimental data that appear to support and conflict with our hypothesis, and propose tests using recombinant prion protein, genetically modified cultured neurons, and transgenic mice.
48
Cheng BYM, Carbonell JG, Klein-Seetharaman J. Protein classification based on text document classification techniques. Proteins 2006; 58:955-70. [PMID: 15645499 DOI: 10.1002/prot.20373] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.3] [Indexed: 11/09/2022]
Abstract
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to the extreme diversity among their members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) classifiers using alignment-based features have suggested that classifiers as complex as SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification, respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error (from 11.6 to 7.0% and from 13.7 to 7.6%) for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, we show that our method generalizes to other protein families by applying it to the nuclear receptor superfamily, attaining 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification, respectively.
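The text-classification pipeline described in this abstract (overlapping peptide n-grams as "words", chi-square feature selection, then Naive Bayes) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: n = 2, the number of kept features k, the presence/absence 2x2 chi-square statistic, add-one smoothing and the class name `NGramNaiveBayes` are all our choices, and the Decision Tree variant is not shown.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Count overlapping n-grams (short peptides of length n) in one sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def chi_square(feature, docs, labels, cls):
    """2x2 chi-square between presence of `feature` and membership of class `cls`."""
    n11 = n10 = n01 = n00 = 0  # present/in-class, present/other, absent/in-class, absent/other
    for counts, label in zip(docs, labels):
        present, member = feature in counts, label == cls
        if present and member:
            n11 += 1
        elif present:
            n10 += 1
        elif member:
            n01 += 1
        else:
            n00 += 1
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return (n11 + n10 + n01 + n00) * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k n-grams with the highest chi-square for any class (ties broken alphabetically)."""
    classes = sorted(set(labels))
    vocab = set().union(*docs)
    score = {f: max(chi_square(f, docs, labels, c) for c in classes) for f in vocab}
    return set(sorted(vocab, key=lambda f: (-score[f], f))[:k])


class NGramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over chi-square-selected n-gram counts."""

    def __init__(self, n=2, k=50):
        self.n, self.k = n, k

    def fit(self, seqs, labels):
        docs = [ngrams(s, self.n) for s in seqs]
        self.vocab = select_features(docs, labels, self.k)
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for cls in sorted(set(labels)):
            members = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(members) / len(docs))
            merged = Counter()
            for d in members:
                merged.update({f: c for f, c in d.items() if f in self.vocab})
            self.counts[cls] = merged
            self.totals[cls] = sum(merged.values())
        return self

    def predict(self, seq):
        doc = ngrams(seq, self.n)
        v = len(self.vocab)

        def log_posterior(cls):
            # Add-one (Laplace) smoothing over the selected vocabulary.
            score = self.log_prior[cls]
            for f, c in doc.items():
                if f in self.vocab:
                    score += c * math.log((self.counts[cls][f] + 1) / (self.totals[cls] + v))
            return score

        return max(self.log_prior, key=log_posterior)
```

For example, training with k = 4 on three A-rich toy sequences ("AAAAGG", "AAAGAA", "GAAAAA") and three C-rich ones ("CCCCTT", "CCTCCC", "TCCCCC") keeps the bigrams AA, CC, AG and CT, and the model then assigns "AAAAAA" to the A-rich class.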
Affiliation(s)
- Betty Yee Man Cheng
- Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania, USA
49
Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J. Classifying G protein-coupled receptors and nuclear receptors on the basis of protein power spectrum from fast Fourier transform. Amino Acids 2006; 30:397-402. [PMID: 16773242 DOI: 10.1007/s00726-006-0332-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Received: 11/23/2005] [Accepted: 01/04/2006] [Indexed: 10/24/2022]
Abstract
As potential drug targets, G-protein coupled receptors (GPCRs) and nuclear receptors (NRs) are a focus of pharmaceutical research. It is of great practical significance to develop an automated and reliable method to facilitate the identification of novel receptors. In this study, a fast Fourier transform (FFT)-based support vector machine (SVM) method was proposed to classify GPCRs and NRs from protein hydrophobicity. The models for all GPCR families and NR subfamilies were trained and validated using the jackknife test, with promising results. The method was also evaluated on independent GPCR and NR datasets, where it performed well, indicating its applicability. Two web servers implementing the prediction are available at http://chem.scu.edu.cn/blast/Pred-GPCR and http://chem.scu.edu.cn/blast/Pred-NR.
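The feature-extraction idea in this abstract (turn a protein into a hydrophobicity signal, take its FFT power spectrum, and validate by jackknife) can be sketched as below. This is a hedged illustration, not the authors' code: the Kyte-Doolittle scale, the fixed length of 64 with zero-padding or truncation, keeping the first 16 power values, and the helper names are all assumptions, and the SVM itself is left to a caller-supplied `train_and_predict` function.

```python
import cmath

# Kyte-Doolittle hydropathy scale, used here as a stand-in: the abstract does
# not name the hydrophobicity scale the authors actually used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum_features(seq, length=64, n_features=16):
    """Map a protein to a fixed-length vector of low-frequency hydrophobicity power.

    The residue-wise hydropathy signal is truncated or zero-padded to `length`
    (a power of two; 64 is an arbitrary choice), transformed with the FFT, and
    the first `n_features` values of |X_k|^2 / length are returned.
    Unknown residue letters are mapped to 0.0.
    """
    signal = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()][:length]
    signal += [0.0] * (length - len(signal))
    return [abs(c) ** 2 / length for c in fft(signal)[:n_features]]


def jackknife_accuracy(samples, labels, train_and_predict):
    """Leave-one-out (jackknife) accuracy.

    `train_and_predict(train_x, train_y, x)` must train on all samples except
    the held-out one and return the predicted label for `x`; in the paper this
    role is played by an SVM, which is not reproduced here.
    """
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += train_and_predict(train_x, train_y, samples[i]) == labels[i]
    return hits / len(samples)
```

The power spectrum discards phase, so the vector does not depend on where along the sequence a periodic hydrophobicity pattern starts, and a fixed `length` gives every protein a vector of the same size, which a kernel classifier such as an SVM requires.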
Affiliation(s)
- Y-Z Guo
- College of Chemistry, Sichuan University, Chengdu, China
50
Rezmann-Vitti LA, Nero TL, Jackman GP, Machida CA, Duke BJ, Louis WJ, Louis SNS. Role of Tyr356(7.43) and Ser190(4.57) in Antagonist Binding in the Rat β1-Adrenergic Receptor. J Med Chem 2006; 49:3467-77. [PMID: 16759089 DOI: 10.1021/jm050624l] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
Abstract
Site-directed mutagenesis and photoaffinity labeling experiments suggest the existence of at least two distinct binding orientations for aryloxypropanolamine competitive antagonists in the β-adrenergic receptor (β-AR), one where the aryloxy moiety is located near transmembrane α-helix 7 (tm 7) and another where it is near tm 5. To explore a hydrophobic pocket involving tms 1, 2, 3, and 7 for potential aryloxy interaction sites, we selected Tyr356(7.43) and Trp134(3.28) in the rat β1-AR for site-directed mutagenesis studies. Ser190(4.57) was also investigated, as the equivalent residues are known antagonist interaction sites in the muscarinic M1 and the dopamine D2 receptors. Binding affinities (pKi) of a series of structurally diverse aryloxypropanolamine competitive antagonists were determined for wild-type and Y356A, Y356F, W134A, and S190A mutant rat β1-ARs stably expressed in Chinese hamster ovary cells. To visualize possible antagonist/receptor interactions, the compounds were docked into a three-dimensional model of the wild-type rat β1-AR. The results indicate that Tyr356(7.43) is an important aromatic interaction site for five of the eight competitive antagonists studied, whereas none of the compounds appeared to interact directly with Trp134(3.28). Only two of the competitive antagonists interacted with Ser190(4.57) on tm 4. Overall, the results extend our understanding of how β1-AR competitive antagonists bind to the hydrophobic pocket involving tms 1, 2, 3, and 7; highlight the importance of Tyr356(7.43) in this binding pocket; and demonstrate the involvement of tm 4 in competitive antagonist binding.
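The antagonist affinities above are reported as pKi values, and mutant effects are read from how far pKi changes relative to the wild-type receptor. A minimal sketch of that arithmetic, using the standard definition pKi = -log10(Ki) with Ki in mol/L; the helper names are hypothetical, and the numbers in the usage note are illustrative, not data from the study.

```python
import math


def pki(ki_molar):
    """pKi = -log10(Ki), with the inhibition constant Ki in mol/L."""
    return -math.log10(ki_molar)


def affinity_shift(pki_wild_type, pki_mutant):
    """Compare one antagonist's affinity at wild-type and mutant receptors.

    Returns (delta_pki, fold_loss): a positive delta_pki means the mutation
    lowered affinity, i.e. Ki rose by a factor of 10 ** delta_pki.
    """
    delta = pki_wild_type - pki_mutant
    return delta, 10 ** delta
```

For hypothetical values of Ki = 1 nM (pKi 9.0) at wild type and Ki = 100 nM (pKi 7.0) at a mutant, `affinity_shift(9.0, 7.0)` gives a ΔpKi of 2.0, a 100-fold loss of affinity. The abstract does not state how large a shift the authors treated as evidence of a direct interaction.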
Affiliation(s)
- Linda A Rezmann-Vitti
- Department of Medicine, Austin Health, Clinical Pharmacology and Therapeutics Unit, The University of Melbourne, Heidelberg, 3084, Victoria, Australia