1
Rudenko V, Korotkov E. Search for Highly Divergent Tandem Repeats in Amino Acid Sequences. Int J Mol Sci 2021; 22:ijms22137096. [PMID: 34281150 PMCID: PMC8269118 DOI: 10.3390/ijms22137096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Revised: 06/25/2021] [Accepted: 06/28/2021] [Indexed: 11/29/2022]
Abstract
We report a Method to Search for Highly Divergent Tandem Repeats (MSHDTR) in protein sequences which considers pairwise correlations between adjacent residues. MSHDTR was compared with some previously developed methods for searching for tandem repeats (TRs) in amino acid sequences, such as T-REKS and XSTREAM, which focus on the identification of TRs with significant sequence similarity, whereas MSHDTR detects repeats that significantly diverged during evolution, accumulating deletions, insertions, and substitutions. The application of MSHDTR to a search of the Swiss-Prot databank revealed over 15 thousand TR-containing amino acid sequences that were difficult to find using the other methods. Among the detected TRs, the most representative were those with consensus lengths of two and seven residues; these TRs were subjected to cluster analysis and the classes of patterns were identified. All TRs detected in this study have been combined into a databank accessible over the WWW.
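For illustration only, here is a minimal sketch of the identity-based signal that similarity-focused tools exploit: the fraction of positions i where residue i equals residue i + p. This is not MSHDTR. According to the abstract, MSHDTR models pairwise correlations between adjacent residues and tolerates deletions, insertions and substitutions. The function names `lag_identity` and `best_period` are hypothetical.

```python
from __future__ import annotations


def lag_identity(seq: str, period: int) -> float:
    """Fraction of positions i with seq[i] == seq[i + period]."""
    if not 0 < period < len(seq):
        raise ValueError("period must be between 1 and len(seq) - 1")
    pairs = len(seq) - period
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs


def best_period(seq: str, max_period: int) -> tuple[int, float]:
    """Period in 1..max_period with the highest lag identity (first wins ties)."""
    scores = {p: lag_identity(seq, p) for p in range(1, max_period + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


print(best_period("ACDEACDEACDE", 6))   # prints (4, 1.0): perfect period-4 repeat
print(lag_identity("ACDEACGEACDQ", 4))  # prints 0.625: substitutions lower the score
```

A single insertion shifts every downstream residue off the lag. This is one reason such identity scores collapse on highly divergent repeats.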
Affiliation(s)
- Valentina Rudenko
- Center of Bioengineering Research Center of Biotechnology RAS, 119071 Moscow, Russia;
- Correspondence: ; Tel.: +7-926-7248271
- Eugene Korotkov
- Center of Bioengineering Research Center of Biotechnology RAS, 119071 Moscow, Russia;
- Moscow Engineering Physics Institute, National Research Nuclear University MEPhI, 115409 Moscow, Russia
2
Paladin L, Bevilacqua M, Errigo S, Piovesan D, Mičetić I, Necci M, Monzon AM, Fabre ML, Lopez JL, Nilsson JF, Rios J, Menna PL, Cabrera M, Buitron MG, Kulik MG, Fernandez-Alberti S, Fornasari MS, Parisi G, Lagares A, Hirsh L, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Res 2021; 49:D452-D457. [PMID: 33237313 PMCID: PMC7778985 DOI: 10.1093/nar/gkaa1097] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/15/2020] [Revised: 10/17/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022]
Abstract
The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared with the previous version is the hierarchical classification, which combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
Affiliation(s)
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Sara Errigo
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Ivan Mičetić
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Marco Necci
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Maria Laura Fabre
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Jose Luis Lopez
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Juliet F Nilsson
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Javier Rios
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Pablo Lorenzano Menna
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Maia Cabrera
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Martin Gonzalez Buitron
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Mariane Gonçalves Kulik
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Sebastian Fernandez-Alberti
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Maria Silvina Fornasari
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Antonio Lagares
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Univ. Montpellier, Montpellier, France
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
3
Tulub AA, Stefanov VE. Hidden symmetries of DNA molecule. J Theor Biol 2017; 416:144-148. [PMID: 28077290 DOI: 10.1016/j.jtbi.2017.01.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2016] [Revised: 01/02/2017] [Accepted: 01/03/2017] [Indexed: 11/29/2022]
Abstract
Although the DNA molecule has been studied exhaustively, we know very little about the role of DNA triplets in coding amino acids and stop codons. The paper aims to fill this gap by drawing on spintronic ideas and carrying out QM/MM computations on a full-turn DNA fragment. The computations reveal two hidden symmetries: the spin splitting (the Rashba effect) confined within each triplet, and the quantum "phase" link between the nature of each triplet (64 triplets in total) and the corresponding amino acid or one of the three stop codons. The hidden symmetries become evident upon binding the magnesium cofactor to DNA triplets in the 5'-3' and 3'-5' directions.
Affiliation(s)
- Alexander A Tulub
- Centre for Interdisciplinary Computational and Dynamical Analysis, University of Manchester, Oxford Road, Manchester M13 9PL, UK; Saint-Petersburg State University, Universitetskaya Emb. 7/9, 199034, Saint-Petersburg, RF, Russia.
- Vassily E Stefanov
- Saint-Petersburg State University, Universitetskaya Emb. 7/9, 199034, Saint-Petersburg, RF, Russia
4
Al Bataineh M, Al-qudah Z. A novel gene identification algorithm with Bayesian classification. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
5
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains. Biomed Res Int 2016; 2016:4628592. [PMID: 27747230 PMCID: PMC5056246 DOI: 10.1155/2016/4628592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2016] [Accepted: 08/25/2016] [Indexed: 11/24/2022]
Abstract
Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
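SymD compares a structure with circularly permuted copies of itself, one copy per shift. Because these comparisons are independent, the work can be parallelized. The sketch below uses a plain string as a stand-in for a structure and deals the shifts round-robin across workers. The round-robin scheme and all function names are assumptions for illustration, not the published Parallel-SymD scheduling. The sketch also converts the reported speedup into parallel efficiency (63 / 100 = 0.63).

```python
from __future__ import annotations


def circular_permute(residues: str, shift: int) -> str:
    """Rotate the sequence left by `shift` positions (a circular permutation)."""
    shift %= len(residues)
    return residues[shift:] + residues[:shift]


def partition_shifts(n_residues: int, n_workers: int) -> list[list[int]]:
    """Deal the non-trivial shifts 1..n-1 round-robin across workers (hypothetical scheme)."""
    buckets: list[list[int]] = [[] for _ in range(n_workers)]
    for shift in range(1, n_residues):
        buckets[(shift - 1) % n_workers].append(shift)
    return buckets


def parallel_efficiency(speedup: float, n_processors: int) -> float:
    """Efficiency is speedup divided by the number of processors."""
    return speedup / n_processors


print(circular_permute("ABCDE", 2))    # prints CDEAB
print(partition_shifts(5, 2))          # prints [[1, 3], [2, 4]]
print(parallel_efficiency(63, 100))    # prints 0.63
```

With 509 residues there are 508 non-trivial shifts. Spread over 100 workers, each worker gets 5 or 6 shifts.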
6
Yin C, Wang J. Periodic power spectrum with applications in detection of latent periodicities in DNA sequences. J Math Biol 2016; 73:1053-1079. [PMID: 26942584 DOI: 10.1007/s00285-016-0982-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/21/2015] [Revised: 02/19/2016] [Indexed: 12/27/2022]
Abstract
Periodic elements play important roles in genomic structures and functions, yet some complex periodic elements in genomes are difficult to detect by conventional methods such as digital signal processing and statistical analysis. We propose a periodic power spectrum (PPS) method for analyzing periodicities of DNA sequences. The PPS method employs periodic nucleotide distributions of DNA sequences and directly calculates power spectra at specific periodicities. The magnitude of a PPS reflects the strength of a signal at periodic positions. Compared with the Fourier transform, the PPS method avoids spectral leakage and reduces the background noise, which is high in the Fourier power spectrum. Thus, the PPS method can effectively capture hidden periodicities in DNA sequences. Using a sliding-window approach, the PPS method can precisely locate periodic regions in DNA sequences. We apply the PPS method to the detection of hidden periodicities in different genome elements, including exons, microsatellite DNA sequences, and whole genomes. The results show that the PPS method can minimize the impact of spectral leakage and thus capture true hidden periodicities in genomes. In addition, performance tests indicate that the PPS method is more effective and efficient than a fast Fourier transform. The computational complexity of the PPS algorithm is [Formula: see text]. Therefore, the PPS method may have a broad range of applications in genomic analysis. The MATLAB programs implementing the PPS method are available from MATLAB Central (http://www.mathworks.com/matlabcentral/fileexchange/55298).
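For contrast with PPS, here is a sketch of the conventional Fourier baseline. It assumes the common binary-indicator (Voss) mapping and is not the PPS formula itself. The power at DFT index k is the sum of |U_b(k)|^2 over the four nucleotide indicator sequences. A period p corresponds to k = N/p, which is an integer only when p divides N. Otherwise the peak spreads over neighboring bins, which is the spectral leakage the abstract refers to. `fourier_power` is a hypothetical name.

```python
import cmath


def fourier_power(dna: str, k: int) -> float:
    """Total power at DFT index k over the A, C, G, T indicator sequences."""
    n = len(dna)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at index k of the 0/1 indicator sequence for this base
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, ch in enumerate(dna.upper())
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


seq = "ATG" * 10                            # N = 30, period 3, so k = 30 / 3 = 10
print(round(fourier_power(seq, 10), 6))     # prints 300.0
print(round(fourier_power(seq, 1), 6))      # prints 0.0
```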
Affiliation(s)
- Changchuan Yin
- Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL, 60607-7045, USA.
- Jiasong Wang
- Department of Mathematics, Nanjing University, Nanjing, Jiangsu, 210093, China
7
Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022]
Abstract
Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR from either protein sequence or structure data is challenging due to the inherently high (biological) signal-to-noise ratio that is a key feature of this problem. Because early in silico analytic tools were key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
8
Do Viet P, Roche DB, Kajava AV. TAPO: A combined method for the identification of tandem repeats in protein structures. FEBS Lett 2015; 589:2611-9. [PMID: 26320412 DOI: 10.1016/j.febslet.2015.08.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/29/2015] [Revised: 08/10/2015] [Accepted: 08/13/2015] [Indexed: 10/23/2022]
Abstract
In recent years, new 3D structures of proteins containing tandem repeats (TRs) have emerged as a result of improved expression and crystallization strategies. Databases focused on structure classification (PDB, SCOP, CATH) do not provide an easy way to select these structures from the PDB. Several approaches have been developed, but no single approach identifies the whole range of 3D TRs. Here we describe the TAndem PrOtein detector (TAPO), which uses periodicities of atomic coordinates and other types of structural representation, including strings generated by conformational alphabets, residue contact maps, and arrangements of vectors of secondary structure elements. Benchmarking shows the superior performance of TAPO over existing programs. According to our analysis of the PDB using TAPO, 19% of proteins contain 3D TRs. This analysis allowed us to identify new families of 3D TRs, suggesting that TAPO can be used to regularly update the collection and classification of existing repetitive structures.
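TAPO combines several structural representations. As a generic illustration only, the sketch below recovers the period of a one-dimensional series from its autocorrelation. The series could be, for example, one coordinate of successive Cα atoms; that choice is an assumption. The biased estimator divides by the full signal energy, so multiples of the true period score lower and the fundamental period wins. This is not TAPO's scoring, and the function names are hypothetical.

```python
import math


def autocorrelation(signal: list, lag: int) -> float:
    """Biased autocorrelation at `lag`, normalized so that lag 0 gives 1.0."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    denom = sum(x * x for x in centered)
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom


def dominant_period(signal: list, min_lag: int, max_lag: int) -> int:
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(signal, lag))


# Synthetic stand-in for a coordinate that repeats every 7 residues
signal = [math.sin(2 * math.pi * i / 7) for i in range(70)]
print(dominant_period(signal, 2, 20))  # prints 7
```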
Affiliation(s)
- Phuong Do Viet
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Daniel B Roche
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Andrey V Kajava
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France.
9
Arango-Argoty GA, Jaramillo-Garzón JA, Castellanos-Domínguez G. Feature extraction by statistical contact potentials and wavelet transform for predicting subcellular localizations in gram negative bacterial proteins. J Theor Biol 2015; 364:121-30. [PMID: 25219623 DOI: 10.1016/j.jtbi.2014.08.051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/08/2013] [Revised: 08/27/2014] [Accepted: 08/28/2014] [Indexed: 11/16/2022]
Abstract
Predicting the localization of a protein has become a useful practice for inferring its function. Most reported methods for predicting subcellular localization in Gram-negative bacterial proteins use standard protein representations that generally do not take into account the distribution of the amino acids or the structural information of the proteins. Here, we propose a protein representation based on the structural information contained in pairwise statistical contact potentials. The wavelet transform decodes the information contained in the primary structure of the proteins, allowing the identification of patterns along the proteins, which are used to characterize the subcellular localizations. A support vector machine classifier is then trained to categorize them. Cellular compartments such as the periplasm and the extracellular medium are difficult to predict and have a high false negative rate. The wavelet-based method achieves high overall performance while maintaining a low false negative rate, particularly for "periplasm" and "extracellular medium". Our results suggest that the proposed protein characterization is a useful alternative to classical and state-of-the-art representations for representing and predicting protein sequences.
Affiliation(s)
- G A Arango-Argoty
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 3501 Fifth Ave, Pittsburgh, PA 15260, USA.
- J A Jaramillo-Garzón
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Research Center of the Instituto Tecnologico Metropolitano, Calle 73 No 76A-354, Medellín, Colombia
- G Castellanos-Domínguez
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia
10
Messaoudi I, Oueslati AE, Lachiri Z. Wavelet analysis of frequency chaos game signal: a time-frequency signature of the C. elegans DNA. EURASIP J Bioinform Syst Biol 2014; 2014:16. [PMID: 28194166 PMCID: PMC5270495 DOI: 10.1186/s13637-014-0016-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/30/2013] [Accepted: 08/26/2014] [Indexed: 11/10/2022]
Abstract
Bioinformatics poses many challenging tasks, and choosing a mapping technique for genomic sequences is one of the most delicate. A judicious choice helps in examining the distribution of periodic patterns that agree with the underlying structure of genomes. Nevertheless, the search for a coding technique that highlights all the information contained in DNA has not yet attracted the attention it deserves. In this paper, we propose a new mapping technique based on chaos game theory that we call the frequency chaos game signal (FCGS). The particularity of FCGS coding lies in exploiting the statistical properties of the genomic sequence itself, which may reflect important structural and organizational features of DNA. To demonstrate the usefulness of the FCGS approach in detecting different local periodic patterns, we use wavelet analysis because it provides access to information that can be obscured by other time-frequency methods such as Fourier analysis. Thus, we apply the continuous wavelet transform (CWT) with the complex Morlet wavelet as the mother wavelet function. Scalograms for the organism Caenorhabditis elegans (C. elegans) exhibit a multitude of periodic organizations of specific DNA sequences.
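Here is a minimal sketch of the classic chaos game representation (CGR) that FCGS builds on. It assumes the common corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0), and each nucleotide moves the current point halfway toward its corner. Counting points on a 2^k x 2^k grid gives k-mer frequencies, because after at least k steps a point's cell is determined by the last k nucleotides. The sketch does not reproduce how FCGS turns these frequencies into a signal, and the function names are hypothetical.

```python
from __future__ import annotations

# Assumed corner assignment (a common CGR convention)
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(dna: str) -> list[tuple[float, float]]:
    """CGR points: start at the centre and move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in dna.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr_counts(dna: str, k: int) -> dict[tuple[int, int], int]:
    """Count CGR points per cell of a 2^k x 2^k grid (one cell per k-mer)."""
    side = 2 ** k
    counts: dict[tuple[int, int], int] = {}
    # Skip the first k - 1 points, which do not yet encode a full k-mer
    for x, y in chaos_game(dna)[k - 1:]:
        cell = (min(int(x * side), side - 1), min(int(y * side), side - 1))
        counts[cell] = counts.get(cell, 0) + 1
    return counts


print(chaos_game("AG"))        # prints [(0.25, 0.25), (0.625, 0.625)]
print(fcgr_counts("ACGT", 1))  # one point in each quadrant
```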
Affiliation(s)
- Imen Messaoudi
- Ecole Nationale d'Ingénieurs de Tunis, LR Signal, Images et Technologies de l'Information, Université de Tunis El Manar, BP 37, le Belvédère, Tunis, 1002 Tunisia
- Afef Elloumi Oueslati
- Ecole Nationale d'Ingénieurs de Tunis, LR Signal, Images et Technologies de l'Information, Université de Tunis El Manar, BP 37, le Belvédère, Tunis, 1002 Tunisia
- Zied Lachiri
- Ecole Nationale d'Ingénieurs de Tunis, LR Signal, Images et Technologies de l'Information, Université de Tunis El Manar, BP 37, le Belvédère, Tunis, 1002 Tunisia; Département de Génie Physique et Instrumentation, INSAT, Centre Urbain Cedex, BP 676, Tunis, 1080 Tunisia
11
Rueda M, Orozco M, Totrov M, Abagyan R. BioSuper: a web tool for the superimposition of biomolecules and assemblies with rotational symmetry. BMC Struct Biol 2013; 13:32. [PMID: 24330655 PMCID: PMC3924234 DOI: 10.1186/1472-6807-13-32] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/21/2013] [Accepted: 12/03/2013] [Indexed: 12/02/2022]
Abstract
Background Most of the proteins in the Protein Data Bank (PDB) are oligomeric complexes consisting of two or more subunits that associate by rotational or helical symmetries. Despite the myriad of superimposition tools in the literature, we could not find any able to account for rotational symmetry and display the graphical results in the web browser. Results BioSuper is a free web server that superimposes and calculates the root mean square deviation (RMSD) of protein complexes displaying rotational symmetry. To the best of our knowledge, BioSuper is the first tool of its kind that provides immediate interactive visualization of the graphical results in the browser, biomolecule generator capabilities, different levels of atom selection, sequence-dependent and structure-based superimposition types, and is the only web tool that takes into account the equivalence of atoms in side chains displaying symmetry ambiguity. BioSuper uses ICM program functionality as a core for the superimpositions and displays the results as text, HTML tables and 3D interactive molecular objects that can be visualized in the browser or in Android and iOS platforms with a free plugin. Conclusions BioSuper is a fast and functional tool that allows for pairwise superimposition of proteins and assemblies displaying rotational symmetry. The web server was created after our own frustration when attempting to superimpose flexible oligomers. We strongly believe that its user-friendly and functional design will be of great interest for structural and computational biologists who need to superimpose oligomeric proteins (or any protein). BioSuper web server is freely available to all users at http://ablab.ucsd.edu/BioSuper.
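Under simplifying assumptions, BioSuper's handling of rotational symmetry can be illustrated as taking the minimum RMSD over cyclic relabelings of subunits. The sketch assumes that both assemblies are already in a common frame (no optimal rotation is fitted), that they have C_n cyclic symmetry, and that the subunits have matched atom lists. It does not use ICM, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rmsd(a: List[Point], b: List[Point]) -> float:
    """Root mean square deviation of two matched, pre-superimposed atom lists."""
    if not a or len(a) != len(b):
        raise ValueError("atom lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def symmetric_rmsd(ref: List[List[Point]], mob: List[List[Point]]) -> float:
    """Minimum RMSD over cyclic relabelings of the mobile assembly's subunits."""
    if len(ref) != len(mob):
        raise ValueError("assemblies must have the same number of subunits")
    flat_ref = [p for chain in ref for p in chain]
    best = math.inf
    for shift in range(len(mob)):
        relabeled = mob[shift:] + mob[:shift]  # cyclic relabeling of subunits
        flat_mob = [p for chain in relabeled for p in chain]
        best = min(best, rmsd(flat_ref, flat_mob))
    return best


ref = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
mob = [[(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
print(symmetric_rmsd(ref, mob))  # prints 0.0: same assembly, subunits labeled differently
```

Without the relabeling loop, the same two toy assemblies give an RMSD of sqrt(2), even though they are identical up to subunit labels.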
Affiliation(s)
- Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
12
Meng T, Soliman AT, Shyu ML, Yang Y, Chen SC, Iyengar SS, Yordy JS, Iyengar P. Wavelet analysis in current cancer genome research: a survey. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1442-1459. [PMID: 24407303 DOI: 10.1109/tcbb.2013.134] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 06/03/2023]
Abstract
With the rapid development of next-generation sequencing technology, the amount of cancer genome sequence data is increasing exponentially. This calls for efficient and effective algorithms that can identify patterns hidden in the raw data and may reveal the Achilles' heels of cancer. From a signal processing point of view, biological units of information, including DNA and protein sequences, have been viewed as one-dimensional signals. Researchers have therefore applied signal processing techniques to mine potentially significant patterns within these sequences. More specifically, in recent years wavelet transforms have become an important mathematical analysis tool with a wide and ever-increasing range of applications. The versatility of wavelet analytic techniques has forged new interdisciplinary bonds by offering common solutions to apparently diverse problems and providing a new unifying perspective on problems of cancer genome research. In this paper, we survey how wavelet analysis has been applied to cancer bioinformatics questions. Specifically, we discuss several approaches to representing biological sequence data numerically and methods of applying wavelet analysis to the numerical sequences.
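As a sketch, here is one of the simplest numeric representations followed by one level of a wavelet transform. The purine/pyrimidine mapping (A, G = +1; C, T = -1) feeds a single orthonormal Haar step. The mapping choice and function names are illustrative assumptions, not taken from the survey. Detail coefficients pick up period-2 alternation, and the Haar step preserves signal energy.

```python
import math

# Assumed purine/pyrimidine mapping
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def to_signal(dna: str) -> list:
    """Map a DNA string to a numeric signal."""
    return [PURINE_PYRIMIDINE[b] for b in dna.upper()]


def haar_step(x: list) -> tuple:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


print(haar_step(to_signal("AGCT")))  # smooth: detail is [0.0, 0.0]
print(haar_step(to_signal("ACGT")))  # alternating: approximation is [0.0, 0.0]
```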
Affiliation(s)
- Tao Meng
- University of Miami, Coral Gables
- John S Yordy
- University of Texas Southwestern Medical Center, Dallas
13
Rackovsky S. Sequence determinants of protein architecture. Proteins 2013; 81:1681-5. [PMID: 23720385 DOI: 10.1002/prot.24328] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/11/2013] [Revised: 04/28/2013] [Accepted: 05/09/2013] [Indexed: 11/07/2022]
Abstract
Delineation of the relationship between sequence and structure in proteins has proven elusive. Most studies of this problem use alignment methods and other approaches based on the characteristics of individual residues. It is demonstrated herein that the sequence-structure relationship is determined in significant part by global characteristics of sequence organization. Information encoded in complete sequences is required to distinguish proteins in different architectural groups. It is found that the statistically significant differences between sequences encoding different architectures are encoded in a surprisingly small set of low-wave-number sequence periodicities. It would therefore appear that unexpected simplicity in an appropriately defined Fourier space may be an inherent characteristic of the sequences of folded proteins.
Affiliation(s)
- S Rackovsky
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, 14853
14
15
Song NY, Yan H. Autoregressive and Iterative Hidden Markov Models for Periodicity Detection and Solenoid Structure Recognition in Protein Sequences. IEEE J Biomed Health Inform 2013; 17:436-41. [DOI: 10.1109/jbhi.2012.2235852] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
16
Arango-Argoty GA, Jaramillo-Garzón JA, Castellanos-Domínguez CG. Contact potentials via wavelet transform for prediction of subcellular localizations in gram negative bacterial proteins. Annu Int Conf IEEE Eng Med Biol Soc 2013; 2013:643-646. [PMID: 24109769 DOI: 10.1109/embc.2013.6609582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Predicting the localization of a protein has become a useful practice for inferring its function. Most reported methods for predicting subcellular localization in Gram-negative bacterial proteins have shown a low false positive rate. However, some subcellular compartments, such as "periplasm" and "extracellular medium", are difficult to predict and still show high false negative rates. In this paper, a method based on a representation derived from statistical contact potentials and the wavelet transform is presented. The wavelet-based method achieves high overall performance while keeping false positive and false negative rates low, particularly for periplasm and extracellular medium. The results suggest that contact potentials are a useful alternative for characterizing protein sequences.
17
Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE. RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics 2012; 28:3257-64. [PMID: 22962341 DOI: 10.1093/bioinformatics/bts550] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. RESULTS Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It reliably solves three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. The resulting method is very accurate, with 89.5% of solenoid proteins and 97.2% of non-solenoid proteins correctly classified. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q(2) value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.
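RAPHAEL reports its periodicity agreement as a Spearman correlation. The sketch below computes that statistic for data without ties, using rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference for each item. The inputs are toy values, not RAPHAEL's data, and the function names are hypothetical.

```python
from __future__ import annotations


def ranks(values: list[float]) -> list[int]:
    """1-based ranks; this sketch assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # prints 1.0
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))      # prints -1.0
```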
Affiliation(s)
- Ian Walsh
- Department of Biology, University of Padua, Viale G. Colombo 3, 35131 Padova, Italy
18
Nounou MN, Nounou HN, Meskin N, Datta A, Dougherty ER. Multiscale denoising of biological data: a comparative analysis. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1539-1544. [PMID: 22566476 DOI: 10.1109/tcbb.2012.67] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Measured microarray genomic and metabolic data are a rich source of information about the biological systems they represent. For example, time-series biological data can be used to construct dynamic genetic regulatory network models, which can be used to design intervention strategies to cure or manage major diseases. Also, copy number data can be used to determine the locations and extent of aberrations in chromosome sequences. Unfortunately, measured biological data are usually contaminated with errors that mask the important features in the data. Therefore, these noisy measurements need to be filtered to enhance their usefulness in practice. Wavelet-based multiscale filtering has been shown to be a powerful denoising tool. In this work, different batch as well as online multiscale filtering techniques are used to denoise biological data contaminated with white or colored noise. The performances of these techniques are demonstrated and compared to those of some conventional low-pass filters using two case studies. The first case study uses simulated dynamic metabolic data, while the second case study uses real copy number data. Simulation results show that significant improvement can be achieved using multiscale filtering over conventional filtering techniques.
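Here is a hedged sketch of the kind of wavelet-based multiscale filtering compared in this work: a textbook single-level Haar filter with soft thresholding of the detail coefficients. It is not the authors' batch or online algorithm. The threshold `t` is a hypothetical tuning parameter, and the function names are illustrative.

```python
import math


def haar_forward(x: list) -> tuple:
    """One level of the orthonormal Haar transform."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: list, detail: list) -> list:
    """Invert haar_forward exactly (up to floating-point rounding)."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t, setting it to zero if |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def denoise(x: list, t: float) -> list:
    """Threshold only the detail (high-frequency) coefficients, then reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])


print(denoise([1.0, 3.0, 5.0, 7.0], 10.0))  # pairs collapse to their means, about [2, 2, 6, 6]
```

With t = 0 the signal is reconstructed unchanged. Larger thresholds remove more high-frequency variation.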
Affiliation(s)
- M N Nounou
- Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar.
19
Arango-Argoty GA, Jaramillo-Garzón JA, Röthlisberger S, Castellanos-Dominguez CG. Prediction of protein subcellular localization based on variable-length motifs detection and dissimilarity based classification. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012; 2011:945-8. [PMID: 22254467 DOI: 10.1109/iembs.2011.6090213] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Predicting the function of unknown proteins is one of the principal goals of computational biology. The subcellular localization of a protein allows a better understanding of its structure and molecular function. Numerous prediction techniques have been developed, usually focusing on global information about the protein. However, predictions can also be made by identifying functional sub-sequence patterns known as motifs. For the motif discovery problem, many methods require aligned sequences and a fixed window size defined in advance. To address these problems, we propose a method based on variable-length motif characterization and detection using the continuous wavelet transform (CWT) and a dissimilarity space representation. To analyze the motifs generated by our approach, we divided the entire dataset into training (60%) and validation (40%) sets. A Support Vector Machine (SVM) classifier is used as the predictor for the validation set. The highest values, Sn = 82.58% and Sp = 92.86% across 10-fold cross validation, are obtained for endosome proteins. Average results of Sn = 74% and Sp = 75.58% are comparable to the current state of the art. For data sets with low identity (< 40%), motif characterization and localization based on the CWT shows good performance and yields interpretable subsequences for each subcellular localization.
Affiliation(s)
- G A Arango-Argoty
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, Campus La Nubia, Magdalena, Colombia.
20
Abstract
The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.
21
On the existence of wavelet symmetries in archaea DNA. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2011; 2012:673934. [PMID: 22481976 PMCID: PMC3310297 DOI: 10.1155/2012/673934] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 09/13/2011] [Revised: 10/27/2011] [Accepted: 10/29/2011] [Indexed: 11/19/2022]
Abstract
This paper deals with the complex unit-root representation of archaea DNA sequences and the analysis of symmetries in the wavelet coefficients of the digitized sequence. It is shown that, even for extremophile archaea, the distribution of nucleotides has to fulfill some (mathematical) constraints in such a way that the wavelet coefficients are symmetrically distributed with respect to the nucleotide distribution.
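A minimal sketch of a complex unit-root encoding of a DNA sequence, followed by the finest-scale Haar coefficients of the resulting walk in the complex plane. The assignment of A, C, G, T to the fourth roots of unity (1, i, -1, -i) and all function names are assumptions made for illustration; the paper's exact mapping and choice of wavelet may differ.

```python
import math

# Hypothetical assignment of nucleotides to the fourth roots of unity;
# the paper's exact mapping may differ.
ROOTS = {"A": 1 + 0j, "C": 1j, "G": -1 + 0j, "T": -1j}

def to_complex(seq):
    """Map a DNA string to complex unit roots (case-insensitive)."""
    return [ROOTS[b] for b in seq.upper()]

def cumulative_walk(values):
    """Running sum: the digitized sequence as a walk in the complex plane."""
    walk, total = [], 0j
    for v in values:
        total += v
        walk.append(total)
    return walk

def haar_details(signal):
    """Finest-scale orthonormal Haar detail coefficients of a complex signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    return [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]

if __name__ == "__main__":
    walk = cumulative_walk(to_complex("ACGTACGA"))
    print(walk)
    print(haar_details(walk))
```

The symmetry properties discussed in the abstract would then be studied on the distribution of such coefficients; that analysis is not reproduced here.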
22
Chua GH, Krishnan A, Li KB, Tomita M. Multiresolution analysis uncovers hidden conservation of properties in structurally and functionally similar proteins. J Bioinform Comput Biol 2011; 4:1245-67. [PMID: 17245813 DOI: 10.1142/s0219720006002442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/29/2006] [Revised: 09/13/2006] [Accepted: 09/13/2006] [Indexed: 11/18/2022]
Abstract
Physicochemical properties of amino acids are important factors in determining protein structure and function. Most approaches make use of properties averaged over entire domains or even whole proteins to analyze their structure or function. This level of coarseness tends to hide the richness of the variability in the different properties across functional domains. This paper studies the conservation of physicochemical properties in a functionally similar family of proteins using a novel wavelet-based technique known as multiresolution analysis. Such an analysis can help uncover characteristics that can otherwise remain hidden. We have studied the protein kinase family of sequences and our findings are as follows: (a) a number of different properties are conserved over the functional catalytic domain irrespective of the sequence identities; (b) conservation of properties can be observed at different frequency levels and they agree well with the known structural/functional properties of the subdomains for the protein kinase family; (c) structural differences between the different kinase family members are reflected in the waveforms; and (d) functionally important mutations show distortions in the waveforms of conserved properties. The potential usefulness of the above findings in identifying functionally similar sequences in the twilight and midnight zones is demonstrated through a simple prediction model for the protein kinase family which achieved a recall of 93.7% and a precision of 96.75% in cross-validation tests.
Affiliation(s)
- Gek-Huey Chua
- Bioinformatics Institute, 30, Biopolis Street, #07-01, Matrix, Singapore
23
CAGO: a software tool for dynamic visual comparison and correlation measurement of genome organization. PLoS One 2011; 6:e27080. [PMID: 22114666 PMCID: PMC3219657 DOI: 10.1371/journal.pone.0027080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/16/2011] [Accepted: 10/10/2011] [Indexed: 11/26/2022]
Abstract
CAGO (Comparative Analysis of Genome Organization) was developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns in noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG) format and allows users to interact with it using an SVG viewer through JavaScript. Signal analysis functions are implemented using the R statistical software and the discrete wavelet transform package waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago.
24
Rackovsky S. Spectral analysis of a protein conformational switch. PHYSICAL REVIEW LETTERS 2011; 106:248101. [PMID: 21770602 DOI: 10.1103/physrevlett.106.248101] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 01/12/2011] [Indexed: 05/31/2023]
Abstract
The existence of conformational switching in proteins, induced by single amino acid mutations, presents an important challenge to our understanding of the physics of protein folding. Sequence-local methods, commonly used to detect structural homology, are incapable of accounting for this phenomenon. We examine a set of proteins, derived from the G(A) and G(B) domains of Streptococcus protein G, which are known to show a dramatic conformational change as a result of single-residue replacement. It is shown that these sequences, which are almost identical locally, can have very different global patterns of physical properties. These differences are consistent with the observed complete change in conformation. These results suggest that sequence-local methods for identifying structural homology can be misleading. They point to the importance of global sequence analysis in understanding sequence-structure relationships.
Affiliation(s)
- S Rackovsky
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine of NYU, New York, New York 10029, USA.
25
Vo A, Nguyen N, Huang H. Solenoid and non-solenoid protein recognition using stationary wavelet packet transform. Bioinformatics 2010; 26:i467-73. [PMID: 20823309 PMCID: PMC2935422 DOI: 10.1093/bioinformatics/btq371] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
Motivation: Solenoid proteins are emerging as a protein class with properties intermediate between structured and intrinsically unstructured proteins. Containing repeating structural units, solenoid proteins are expected to share sequence similarities. However, in many cases, the sequence similarities are weak and non-detectable. Moreover, solenoids can be degenerate and vary widely in the number of units, which makes them difficult to detect. Recently, several solenoid repeat detection methods have been proposed, such as self-alignment of the sequence, spectral analysis and the discrete Fourier transform of the sequence. Although these methods have shown good performance on certain data sets, they often fail to detect repeats with weak similarities. In this article, we propose a new approach to recognize solenoid repeats and non-solenoid proteins using the stationary wavelet packet transform (SWPT). Our method has three advantages: (i) it naturally represents five main factors of protein structure and properties through wavelet analysis; (ii) it extracts novel wavelet features that can capture hidden components of solenoid sequence similarities and distinguish them from global proteins; (iii) it obtains statistical features that capture the repeating motifs of solenoid proteins. Results: Our method analyzes the characteristics of amino acid sequences in both the spectral and temporal domains using SWPT. Both global and local information about proteins is captured by the SWPT coefficients. We obtain and integrate wavelet-based and statistics-based features of amino acid sequences to improve the classification task. Our method is evaluated by comparison with state-of-the-art methods such as HHrepID and REPETITA. The experimental results show that our algorithm consistently outperforms them in area under the ROC curve. At the same false positive rate, the sensitivity of our WAVELET method is higher than that of the other methods.
Availability: http://www.naaan.org/anvo/Software/Software.htm
Contact: anphuocnhu.vo@mavs.uta.edu
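The stationary (undecimated) wavelet packet transform can be sketched with Haar filters and the à trous scheme: every band is split at each level without downsampling, so all bands keep the input length, and per-band energies can serve as simple features. This is an illustrative sketch under stated assumptions, not the authors' implementation: the input is assumed to be a numeric residue-property profile (the paper's five factors are not reproduced), the bands come out in natural rather than frequency order, and the energy features and function names are hypothetical.

```python
def swt_haar_split(x, shift):
    """One undecimated Haar split with circular boundary.
    `shift` is the filter dilation (1, 2, 4, ...) of the a-trous scheme.
    Filters are scaled by 1/2 so that total energy is preserved."""
    n = len(x)
    low = [(x[i] + x[(i + shift) % n]) / 2 for i in range(n)]
    high = [(x[i] - x[(i + shift) % n]) / 2 for i in range(n)]
    return low, high

def swpt(x, levels):
    """Stationary wavelet packet tree: split every band at every level.
    Returns 2**levels bands (natural order), each as long as the input."""
    bands = [list(x)]
    for j in range(levels):
        shift = 2 ** j
        next_bands = []
        for band in bands:
            next_bands.extend(swt_haar_split(band, shift))
        bands = next_bands
    return bands

def band_energies(x, levels=2):
    """Energy (sum of squares) of each packet band, a simple feature vector."""
    return [sum(v * v for v in band) for band in swpt(x, levels)]

if __name__ == "__main__":
    profile = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical property profile
    print(band_energies(profile, 2))
```

Each split maps a pair (a, b) to ((a+b)/2, (a-b)/2), whose squared magnitudes sum to (a² + b²)/2; summed over a circular signal this equals the input energy, so the band energies always add up to the energy of the profile.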
Affiliation(s)
- An Vo
- The Feinstein Institute for Medical Research, North Shore LIJ Health System, NY, USA.
26
Detecting internally symmetric protein structures. BMC Bioinformatics 2010; 11:303. [PMID: 20525292 PMCID: PMC2894822 DOI: 10.1186/1471-2105-11-303] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 02/23/2010] [Accepted: 06/03/2010] [Indexed: 11/30/2022]
Abstract
Background: Many functional proteins have a symmetric structure. Most of these are multimeric complexes, which are made of non-symmetric monomers arranged in a symmetric manner. However, there are also a large number of proteins that have a symmetric structure in the monomeric state. These internally symmetric proteins are interesting objects from the point of view of their folding, function, and evolution. Most algorithms that detect the internally symmetric proteins depend on finding repeating units of similar structure and do not use the symmetry information. Results: We describe a new method, called SymD, for detecting symmetric protein structures. The SymD procedure works by comparing the structure to its own copy after the copy is circularly permuted by all possible numbers of residues. The procedure is relatively insensitive to symmetry-breaking insertions and deletions and amplifies positive signals from symmetry. It finds 70% to 80% of the TIM barrel fold domains in the ASTRAL 40 domain database and 100% of the beta-propellers as symmetric. More globally, 10% to 15% of the proteins in the ASTRAL 40 domain database may be considered symmetric according to this procedure, depending on the precise cutoff value used to measure the degree of perfection of the symmetry. Symmetrical proteins occur in all structural classes and can have a closed, circular structure, a cylindrical barrel-like structure, or an open, helical structure. Conclusions: SymD is a sensitive procedure for detecting internally symmetric protein structures. Using this procedure, we estimate that 10% to 15% of the known protein domains may be considered symmetric. We also report an initial, overall view of the types of symmetries and symmetric folds that occur in the protein domain structure universe.
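The core idea of SymD, comparing an object with circularly permuted copies of itself for every possible shift, can be illustrated at the sequence level. SymD itself aligns three-dimensional structures; the sketch below is a simplified, hypothetical sequence analogue that scores residue identity between a sequence and each rotation of itself, and it does not reproduce SymD's structural scoring.

```python
def circular_self_identity(seq):
    """For every shift k in 1..n-1, the fraction of positions i
    with seq[i] == seq[(i + k) % n]."""
    n = len(seq)
    return {k: sum(seq[i] == seq[(i + k) % n] for i in range(n)) / n for k in range(1, n)}

def best_shift(seq):
    """Best-scoring shift in 1..n//2 (shifts k and n - k score identically).
    Ties are broken in favour of the smaller shift, i.e. the shortest repeat unit."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    scores = circular_self_identity(seq)
    candidates = range(1, len(seq) // 2 + 1)
    k = max(candidates, key=lambda s: (scores[s], -s))
    return k, scores[k]

if __name__ == "__main__":
    print(best_shift("ABCDE" * 4))  # a perfect four-fold internal repeat
```

Scoring every rotation, rather than searching for one repeat unit first, is what lets the comparison tolerate a few insertions or substitutions: a degraded repeat still produces a clear peak at the true shift, just below 1.0.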
27
Abstract
Computational studies of the relationships between protein sequence, structure, and folding have traditionally relied on purely local sequence representations. Here we show that global representations, on the basis of parameters that encode information about complete sequences, contain otherwise inaccessible information about the organization of sequences. By studying the spectral properties of these parameters, we demonstrate that amino acid physical properties fall into two distinct classes. One class comprises properties that favor sequentially localized interaction clusters. The other class comprises properties that favor globally distributed interactions. This observation provides a bridge between two classic models of protein folding (the collapse model and the nucleation model) and provides a basis for understanding how any degree of intermediacy between these two extremes can occur.
28
Marsella L, Sirocco F, Trovato A, Seno F, Tosatto SCE. REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform. Bioinformatics 2009; 25:i289-95. [PMID: 19478001 PMCID: PMC2687986 DOI: 10.1093/bioinformatics/btp232] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Indexed: 11/30/2022]
Abstract
Motivation: Proteins with solenoid repeats evolve more quickly than non-repetitive ones and their periodicity may be rapidly hidden at sequence level, while still evident in structure. In order to identify these repeats, we propose here a novel method based on a metric characterizing amino-acid properties (polarity, secondary structure, molecular volume, codon diversity, electric charge) using five previously derived numerical functions. Results: The five spectra of the candidate sequences coding for structural repeats, obtained by Discrete Fourier Transform (DFT), show common features allowing determination of repeat periodicity with excellent results. Moreover, it is possible to introduce a phase space parameterized by two quantities related to the Fourier spectra, which allows a clear distinction between a non-homologous set of globular proteins and proteins with solenoid repeats. The DFT method is shown to be competitive with other state-of-the-art methods in the detection of solenoid structures, while improving performance especially in the identification of periodicities, since it is able to recognize the actual repeat length in most cases. Moreover, it highlights the relevance of local structural propensities in determining solenoid repeats. Availability: A web tool implementing the algorithm presented in the article (REPETITA) is available with additional details on the data sets at the URL: http://protein.bio.unipd.it/repetita/. Contact: silvio.tosatto@unipd.it
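The DFT step behind this kind of periodicity detection can be sketched for a single property profile: map residues to numbers, remove the mean, compute the power spectrum, and read the repeat length as N/k at the strongest wavenumber k. REPETITA combines five property functions and additional spectral parameters, none of which are reproduced here. The two-value scale in the usage line is hypothetical, and a naive O(N²) DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math

def profile(seq, scale):
    """Map each residue to a number using a residue -> value dictionary."""
    return [scale[aa] for aa in seq]

def power_spectrum(x):
    """|DFT|^2 of the mean-centred signal for wavenumbers k = 1..N//2 (naive DFT)."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return {
        k: abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(xc))) ** 2
        for k in range(1, n // 2 + 1)
    }

def dominant_period(x):
    """Repeat length N / k at the wavenumber k with the largest power."""
    spec = power_spectrum(x)
    k = max(spec, key=spec.get)
    return len(x) / k

if __name__ == "__main__":
    hypothetical_scale = {"L": 1.0, "K": -1.0}
    print(dominant_period(profile("LLLKKKK" * 10, hypothetical_scale)))
```

Here the 70-residue profile repeats every 7 residues, its strongest non-constant wavenumber is k = 10, and the sketch reports a period of 7.0. For patterns whose harmonics carry more power than the fundamental, the strongest peak can instead point to a fraction of the true repeat length, which is one reason to combine several property scales.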
29
Chromosome-specific spatial periodicities in gene expression revealed by spectral analysis. J Theor Biol 2008; 256:333-42. [PMID: 19014953 DOI: 10.1016/j.jtbi.2008.10.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/20/2008] [Revised: 09/16/2008] [Accepted: 10/07/2008] [Indexed: 01/18/2023]
Abstract
Recent years have seen an unprecedented surge of research activity in studies of gene expression. This extensive work, however, has been almost uniformly focused on genome-wide gene expression and has largely ignored the fundamental fact that every gene has a specific chromosome location. We propose a novel method of spectral analysis for detecting hidden periodicities in gene expression signals ordered along the length of each chromosome. Using this method, we have discovered that each chromosome in rodents and humans has a unique periodic pattern of gene expression. The uncovered spatial periodicities in gene expression are tissue-specific in the sense that the largest differences in humans were observed between two normal tissues (brain and mammary gland) as well as between their tumor counterparts (glioma and breast cancer). The smallest differences resulted from the comparison of tumors (glioma and breast cancer) with their normal counterparts. All such effects do not extend to all chromosomes but are limited to only some of them. The estimated periods and amplitudes are identical for the genes located on the positive and negative DNA strands. While precise molecular mechanisms of chromosome-specific periodicities in gene expression have yet to be unraveled, their universal presence in different tissues adds another dimension to the current understanding of the genome organization.
30
Liu H, Wang R, Lu X, Chen J, Liu X, Ding L. A new approach to the prediction of transmembrane structures. Sci Bull (Beijing) 2008; 53:1011-1014. [PMID: 32214729 PMCID: PMC7088861 DOI: 10.1007/s11434-008-0055-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2007] [Accepted: 10/24/2007] [Indexed: 11/29/2022]
Abstract
About 20%–30% of genome products have been predicted to be membrane proteins, which have significant biological functions. Predicting the number and positions of transmembrane helical segments (TMHs) is an active topic in bioinformatics. In this paper, a new approach, maximum spectrum of continuous wavelet transform (MSCWT), is proposed to predict TMHs. Predictions for eight SARS-CoV membrane proteins indicate that MSCWT performs comparably to the TMpred software. Moreover, a test on a dataset of 131 proteins of known structure containing 548 TMHs shows that the prediction accuracy of MSCWT is 91.6% for TMHs and 89.3% for membrane proteins.
Affiliation(s)
- HongDe Liu
- College of Chemistry and Chemical Engineering, Northwest Normal University, Lanzhou, 730070 China
- Rui Wang
- College of Chemistry and Chemical Engineering, Northwest Normal University, Lanzhou, 730070 China
- XiaoQuan Lu
- College of Chemistry and Chemical Engineering, Northwest Normal University, Lanzhou, 730070 China
- Jing Chen
- College of Chemistry and Chemical Engineering, Northwest Normal University, Lanzhou, 730070 China
- Xiuhui Liu
- College of Chemistry and Chemical Engineering, Northwest Normal University, Lanzhou, 730070 China
- Lan Ding
- College of Life Science, Northwest Normal University, Lanzhou, 730070 China
31
Abstract
A quantitative, property-based approach to protein sequence analysis is presented, grounded in Fourier analysis and signal-processing methodologies. The resulting tools are applied to four protein structure families. We demonstrate the existence of architecture-specific, large amplitude periodicities in amino acid properties encoded in the sequences of proteins. These signals, whose statistical significance we establish, occur at well-defined wavenumbers, but are expressed in different physical properties in the various proteins which fold to a common architecture. This result explains the long-known convergence of unrelated sequences to a common fold. It is further suggested that these results provide a physical basis for the experimental observation that unrelated sequences that adopt similar architectures fold with similar rates.
Affiliation(s)
- S Rackovsky
- Department of Pharmacology and Biological Chemistry and Center for Biomathematical Sciences, Mount Sinai School of Medicine, One Gustave L. Levy Place, New York, New York 10029, USA.
32
Cattani C, D'Auria CR. Correlations in DNA sequences. JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES 2007. [DOI: 10.1080/02522667.2007.10699728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
33
Allen TE, Price ND, Joyce AR, Palsson BØ. Long-range periodic patterns in microbial genomes indicate significant multi-scale chromosomal organization. PLoS Comput Biol 2006; 2:e2. [PMID: 16410829 PMCID: PMC1326223 DOI: 10.1371/journal.pcbi.0020002] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 07/25/2005] [Accepted: 12/07/2005] [Indexed: 01/02/2023]
Abstract
Genome organization can be studied through analysis of chromosome position-dependent patterns in sequence-derived parameters. A comprehensive analysis of such patterns in prokaryotic sequences and genome-scale functional data has yet to be performed. We detected spatial patterns in sequence-derived parameters for 163 chromosomes occurring in 135 bacterial and 16 archaeal organisms using wavelet analysis. Pattern strength was found to correlate with organism-specific features such as genome size, overall GC content, and the occurrence of known motility and chromosomal binding proteins. Given additional functional data for Escherichia coli, we found significant correlations among chromosome position dependent patterns in numerous properties, some of which are consistent with previously experimentally identified chromosome macrodomains. These results demonstrate that the large-scale organization of most sequenced genomes is significantly nonrandom, and, moreover, that this organization is likely linked to genome size, nucleotide composition, and information transfer processes. Constraints on genome evolution and design are thus not solely dependent upon information content, but also upon an intricate multi-parameter, multi-length-scale organization of the chromosome.
Affiliation(s)
- Timothy E Allen
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Nathan D Price
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Andrew R Joyce
- Bioinformatics Program, University of California San Diego, La Jolla, California, United States of America
- Bernhard Ø Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- * To whom correspondence should be addressed. E-mail:
34
Selz KA, Samoylova TI, Samoylov AM, Vodyanoy VJ, Mandell AJ. Designing allosteric peptide ligands targeting a globular protein. Biopolymers 2006; 85:38-59. [PMID: 17009317 DOI: 10.1002/bip.20607] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
Patented signal analytic algorithms applied to hydrophobically transformed, numerical amino acid sequences have previously been used to design short, protein-targeted, L or D retro-inverso peptides. These peptides have demonstrated allosteric and/or indirect agonist effects on a variety of G-protein and tyrosine kinase coupled membrane receptors with 30% to over 80% hit rates. Here we extend these approaches to a globular protein target. We designed eight peptide ligands targeting an ELISA antibody responsive protein, beta-galactosidase (betaGAL). Three of the eight 14mer peptides allosterically activated betaGAL in ELISA assays. Using Bayesian statistics, the probability of this 38% hit rate occurring by chance was 2 × 10⁻⁹. These peptides demonstrated binding site competitive or noncompetitive interactions, suggesting allosteric site multiplicity with respect to their betaGAL binding-mediated ELISA signal. Kinetic studies demonstrated the temperature dependence of the betaGAL peptide binding functions. Using the van't Hoff relation, we found evidence for enthalpy-entropy compensation. This relation is often found for hydrophobic interactions in aqueous media, and is consistent with the postulated hydrophobic series encoding underlying our protein-targeted, peptide design methods. It appears that our algorithmic, hydrophobic autocovariance eigenvector template approach to the design of allosteric peptides targeting membrane receptors may also be applicable to the design of peptide ligands targeting nonmembrane involved globular proteins.
35
Wen ZN, Wang KL, Li ML, Nie FS, Yang Y. Analyzing functional similarity of protein sequences with discrete wavelet transform. Comput Biol Chem 2005; 29:220-8. [PMID: 15979042 DOI: 10.1016/j.compbiolchem.2005.04.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 10/09/2004] [Accepted: 04/14/2005] [Indexed: 10/25/2022]
Abstract
This paper applies discrete wavelet transform (DWT) with various protein substitution models to find functional similarity of proteins with low identity. A new metric, 'S' function, based on the DWT is proposed to measure the pair-wise similarity. We also develop a segmentation technique, combined with DWT, to handle long protein sequences. The results are compared with those using the pair-wise alignment and PSI-BLAST.
Affiliation(s)
- Zhi-ning Wen
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
36
Murray KB, Taylor WR, Thornton JM. Toward the detection and validation of repeats in protein structure. Proteins 2005; 57:365-80. [PMID: 15340924 DOI: 10.1002/prot.20202] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
We present a method called DAVROS to detect, localize, and validate repeating motifs in protein structure allowing for insertions and deletions. DAVROS uses the score matrix from a structural alignment program (SAP) to search for repeating motifs using an algorithm based on concepts from signal processing and the statistical properties of the alignments. The method was tested against a nonredundant Protein Data Bank, and each chain was assigned a score. For the top 50 chains ranked by score, 70% contain repeating motifs detected without error. These represent 14 types of fold covering alpha, beta, and alphabeta protein classes. A second data set comprising protein chains in different sequence families for triosephosphate isomerase (TIM) barrel, leucine-rich repeat (LRR), trefoil, and alpha-alpha barrel folds was used to assess the ability of DAVROS to detect all motifs within a specific fold. For the second test set, the percentage of motifs detected was highest for the LRR chains (88.7%) and least for the TIM barrels (60%). This variability results from the regularity of the LRR motif compared to the alphabeta units of the TIM barrel, which generally have many more indels. These reduce the strength of the repeat signal in the SAP matrix, making repeat detection more difficult.
Affiliation(s)
- Kevin B Murray
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
37
Abstract
Hydropathy plots or window averages over local stretches of the sequence of residue hydrophobicity have revealed patterns related to various protein tertiary structural features. This has enabled identification of regions of the sequence that are at the surface or within the interior of globular soluble proteins, regions located within the lipid bilayer of transmembrane proteins, portions of the sequence that characterize repeating motifs, as well as motifs that usefully characterize different protein structural families. This, therefore, provides one example of the generally expressed maxim that "sequence determines structure". On the other hand, a number of previous investigations have shown the rapidly varying values of residue hydrophobicity along the sequence to be distributed approximately randomly. So one might question just how much of the sequence actually determines structure. It is, therefore, of interest to extract that part of this rapidly varying distribution of residue hydrophobicity that is responsible for the longer wavelength variations that correlate with protein tertiary structural features and to determine their prevalence within the entire distribution. This is accomplished by a finite Fourier analysis of the sequence of residue hydrophobicity and of a new measure of residue distance from the protein interior. Calculations are performed on a number of globins, immunoglobulins, cuprodoxins, and papain-like structures. The spectral power of the Fourier amplitudes of the frequencies extracted, whose inverse transforms underlie the windowed values of residue hydrophobicity, is shown to be a small fraction of the total power of the hydrophobicity distribution and thereby consistent with a distribution that might appear to be predominantly random.
The wide range of sequence identity between proteins having the same fold, all exhibiting similar small fractions of power amplitude that correlate with the longer wavelength inside-to-outside excursions of the amino acid residues, supports the general contention that close sequence identity is an expression of a close evolutionary relationship rather than an expression of structural similarity. Practical implications of the present analysis for protein structure prediction and engineering are also described.
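The two operations discussed here, windowed hydropathy averaging and measuring how much of the spectral power sits at long wavelengths, can be sketched as follows. The Kyte-Doolittle scale serves only as an example hydrophobicity scale (the paper's own scale and its residue-distance measure are not reproduced), and `max_k` is a hypothetical cut-off separating long-wavelength from short-wavelength components.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values, used here only as an example scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydropathy(seq):
    """Per-residue hydropathy values for a protein sequence (case-insensitive)."""
    return [KD[aa] for aa in seq.upper()]

def window_average(values, window=9):
    """Sliding-window mean; element i averages values[i:i + window]."""
    if window > len(values):
        raise ValueError("window longer than sequence")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def low_frequency_power_fraction(values, max_k):
    """Fraction of the non-constant spectral power at wavenumbers k with
    min(k, N - k) <= max_k, i.e. wavelengths of at least N / max_k residues."""
    n = len(values)
    mean = sum(values) / n
    xc = [v - mean for v in values]
    power = [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(xc))) ** 2
             for k in range(n)]
    total = sum(power[1:])
    if total == 0:
        raise ValueError("constant profile has no non-constant power")
    low = sum(power[k] for k in range(1, n) if min(k, n - k) <= max_k)
    return low / total
```

Applied to a real hydropathy profile, a small value of `low_frequency_power_fraction` would correspond to the situation described above: the long-wavelength component that window averaging exposes carries only a small share of the total power.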
Affiliation(s)
- B David Silverman
- IBM Thomas J. Watson Research Center, P. O. Box 218, Yorktown Heights, New York 10598, USA.
38
Aggarwal A, Leong SH, Lee C, Kon OL, Tan P. Wavelet Transformations of Tumor Expression Profiles Reveals a Pervasive Genome-Wide Imprinting of Aneuploidy on the Cancer Transcriptome. Cancer Res 2005. [DOI: 10.1158/0008-5472.186.65.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Aneuploidy is frequently observed in many human cancers, but its global effects on the cancer transcriptome are controversial. We did a systematic and unbiased genome-wide survey to determine the extent to which a tumor's abnormal karyotype (chromosomal amplifications and deletions) is detectably “imprinted” onto that tumor's gene expression profile. By using a novel methodology employing wavelet transform signal-processing algorithms to identify genomic regions of coordinated gene expression (wavelet variance scanning), we analyzed a series of gastric cancer cell lines and identified more than 100 genomic regions exhibiting distinct patterns of subtle but significant coordinated transcription, ranging from tens to hundreds of genes. A large majority (80%) of these regions could be specifically localized to a site of detectable genomic amplification or deletion; reciprocally, up to 47% of the total aneuploidy in each of the individual cell lines could be directly inferred from the gene expression data. Genome-wide portraits of tumor aneuploidy can thus be successfully reconstructed solely from gene expression data, implying that the effects of aneuploidy must be pervasively and globally imprinted within the cancer transcriptome. Aneuploidy may contribute to tumor behavior not just by affecting the expression of a few key oncogenes and tumor suppressor genes but also by subtly altering the expression levels of hundreds of genes in the oncogenome.
Affiliation(s)
- Amit Aggarwal
- Cellular and Molecular Research,
- Department of Physiology, Faculty of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Cheryl Lee
- Division of Medical Science, National Cancer Centre,
- Oi Lian Kon
- Division of Medical Science, National Cancer Centre,
- Patrick Tan
- Cellular and Molecular Research,
- Genome Institute of Singapore,
- Department of Physiology, Faculty of Medicine, National University of Singapore, Singapore, Republic of Singapore
39
Williams G, Doherty P. Inter-residue distances derived from fold contact propensities correlate with evolutionary substitution costs. BMC Bioinformatics 2004; 5:153. [PMID: 15491497 PMCID: PMC526251 DOI: 10.1186/1471-2105-5-153] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/02/2004] [Accepted: 10/18/2004] [Indexed: 11/15/2022]
Abstract
Background: The wealth of information on protein structure has led to a variety of statistical analyses of the role played by individual amino acid types in the protein fold. In particular, the contact propensities between the various amino acids can be converted into folding energies that have proved useful in structure prediction. The present study addresses the relationship of protein folding propensities to the evolutionary relationship between residues.
Results: The contact preferences of residue types observed in a representative sample of protein structures are converted into a residue similarity matrix or inter-residue distance matrix. Remarkably, these distances correlate closely with evolutionary substitution costs. Residue vectors are derived from the distance matrix. The residue vectors give a concrete picture of the grouping of residues into families sharing properties crucial for protein folding.
Conclusions: Inter-residue distances have proved useful in showing the explicit relationship between contact preferences and evolutionary substitution rates. It is proposed that the distance matrix derived from structural analysis may be useful in aligning proteins where remote homologs share structural features. Residue vectors derived from the distance matrix illustrate the spatial arrangement of residues and point to ways in which they can be grouped.
Affiliation(s)
- Gareth Williams
- Wolfson Centre for Age-Related Diseases, The Wolfson Wing, Hodgkin Building, King's College London, London SE1 1UL, UK
- Patrick Doherty
- Wolfson Centre for Age-Related Diseases, The Wolfson Wing, Hodgkin Building, King's College London, London SE1 1UL, UK
40
Wang J, Crippen GM. Statistical mechanics of protein folding with separable energy functions. Biopolymers 2004; 74:214-20. [PMID: 15150796 DOI: 10.1002/bip.20077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
We have initiated an entirely new approach to statistical mechanical models of strongly interacting systems where the configurational parameters and the potential energy function are both constructed so that the canonical partition function can be evaluated analytically. For a simplified model of proteins consisting of a single, fairly short polypeptide chain without cross-links, we can adjust the energy parameters to favor the experimentally determined native state of seven proteins having diverse types of folds. Then 497 test proteins are predicted to have stable native folds, even though they are also structurally diverse, and 480 of them have no significant sequence similarity to any of the training proteins.
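Why a separable energy function makes the canonical partition function analytically tractable can be shown with a generic identity. Assume, for illustration only, that the energy is a sum of independent terms $\varepsilon_i(s_i)$ over $M$ configurational parameters $s_i$ taking discrete values (integrals replace the sums for continuous parameters); this is not Wang and Crippen's specific parameterization:

$$
Z=\sum_{s_1,\ldots,s_M}\exp\!\Big(-\beta\sum_{i=1}^{M}\varepsilon_i(s_i)\Big)
=\prod_{i=1}^{M}\Big(\sum_{s_i}e^{-\beta\,\varepsilon_i(s_i)}\Big),\qquad \beta=\frac{1}{k_BT}
$$

The free energy $F=-k_BT\ln Z$ is then a sum of one-parameter terms, and each parameter follows its own Boltzmann distribution $P_i(s_i)=e^{-\beta\varepsilon_i(s_i)}\big/\sum_{s}e^{-\beta\varepsilon_i(s)}$.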
Affiliation(s)
- Jianyong Wang
- College of Pharmacy, University of Michigan, Ann Arbor, MI 48109-1065, USA
41
Selz KA, Mandell AJ, Shlesinger MF, Arcuragi V, Owens MJ. Designing human m1 muscarinic receptor-targeted hydrophobic eigenmode matched peptides as functional modulators. Biophys J 2004; 86:1308-31. [PMID: 14990463 PMCID: PMC1303971 DOI: 10.1016/s0006-3495(04)74204-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 04/04/2003] [Accepted: 10/23/2003] [Indexed: 11/24/2022]
Abstract
A new proprietary de novo peptide design technique generated ten 15-residue peptides targeting and containing the leading nontransmembrane hydrophobic autocorrelation wavelengths, "modes", of the human m1 muscarinic cholinergic receptor, m1AChR. These modes were also shared by the m4AChR subtype (but not the m2, m3, or m5 subtypes) and the three-finger snake toxins that pseudoirreversibly bind m1AChR. The linear decomposition of the hydrophobically transformed m1AChR amino acid sequence yielded ordered eigenvectors of orthogonal hydrophobic variational patterns. The weighted sum of two eigenvectors formed the peptide design template. Amino acids were iteratively assigned to template positions randomly, within hydrophobic groups. One peptide demonstrated significant functional indirect agonist activity, and five produced significant positive allosteric modulation of atropine-reversible, direct-agonist-induced cellular activation in stably m1AChR-transfected Chinese hamster ovary cells, reflected in integrated extracellular acidification responses. The peptide positive allosteric ligands produced left-shifts and peptide concentration-response augmentation in integrated extracellular acidification response asymptotic sigmoidal functions and concentration-response behavior in Hill number indices of positive cooperativity. Peptide mode specificity was suggested by negative crossover experiments with human m2ACh and D2 dopamine receptors. Morlet wavelet transformation of the leading eigenvector-derived m1AChR eigenfunctions locates seven hydrophobic transmembrane segments and suggests possible extracellular loop locations for the peptide-receptor mode-matched, modulatory hydrophobic aggregation sites.
Affiliation(s)
- Karen A Selz
- Cielo Institute, Asheville, North Carolina 28804, USA.
42
Qiu J, Liang R, Zou X, Mo J. Prediction of Transmembrane Proteins Based on the Continuous Wavelet Transform. J Chem Inf Comput Sci 2004; 44:741-7. [PMID: 15032556 DOI: 10.1021/ci0303868] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
A novel method based on the continuous wavelet transform (CWT) for predicting the number and location of helices in membrane proteins is presented. Two bacterial proteins are chosen as examples to describe the prediction of transmembrane helices (HTM) using this method. The selection of an appropriate dilation and of the hydrophobicity data type is discussed. The results indicate that CWT is a promising approach for the prediction of HTM.
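The CWT idea can be illustrated with a short sketch that scans a hydrophobicity profile with a Mexican-hat wavelet at a single dilation and reports local maxima of the coefficients as candidate helix centres. The Mexican-hat wavelet, the Kyte-Doolittle scale, the dilation of 8 residues, the threshold of 5.0, the toy sequence, and all function names are assumptions for illustration; Qiu et al. discuss their own choices of dilation and hydrophobicity data.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed; the paper compares hydrophobicity data types).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet, up to a constant factor."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt_at_scale(signal, scale):
    """Wavelet coefficients W(scale, b) for every position b of the signal."""
    norm = 1.0 / math.sqrt(scale)
    return [
        norm * sum(x * mexican_hat((k - b) / scale) for k, x in enumerate(signal))
        for b in range(len(signal))
    ]


def predict_helix_centres(sequence, scale=8.0, threshold=5.0):
    """0-based positions of local coefficient maxima above threshold (hypothetical defaults)."""
    profile = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    w = cwt_at_scale(profile, scale)
    return [
        b for b in range(1, len(w) - 1)
        if w[b] > w[b - 1] and w[b] >= w[b + 1] and w[b] > threshold
    ]


# Hypothetical toy sequence: one 20-residue leucine stretch in a serine background.
toy = "S" * 30 + "L" * 20 + "S" * 30
print(predict_helix_centres(toy))       # a single centre near the middle of the stretch
print(predict_helix_centres("S" * 80))  # []
```

Because the Mexican hat integrates to zero, a uniform background contributes almost nothing away from the sequence ends, so the coefficients respond mainly to hydrophobic stretches whose width matches the chosen dilation.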
Affiliation(s)
- Jianding Qiu
- School of Chemistry and Chemical Engineering, Zhongshan University, Guangzhou 510275, People's Republic of China
43
Zbilut JP, Colosimo A, Conti F, Colafranceschi M, Manetti C, Valerio M, Webber CL, Giuliani A. Protein aggregation/folding: the role of deterministic singularities of sequence hydrophobicity as determined by nonlinear signal analysis of acylphosphatase and Abeta(1-40). Biophys J 2003; 85:3544-57. [PMID: 14645049 PMCID: PMC1303661 DOI: 10.1016/s0006-3495(03)74774-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Received: 07/14/2003] [Accepted: 08/07/2003] [Indexed: 11/30/2022]
Abstract
The problem of protein folding vs. aggregation was investigated in acylphosphatase and the amyloid protein Abeta(1-40) by means of nonlinear signal analysis of their chain hydrophobicity. Numerical descriptors of recurrence patterns provided the basis for statistical evaluation of folding/aggregation distinctive features. Static and dynamic approaches were used to elucidate conditions coincident with folding vs. aggregation using comparisons with known protein secondary structure classifications, site-directed mutagenesis studies of acylphosphatase, and molecular dynamics simulations of amyloid protein, Abeta(1-40). The results suggest that a feature derived from principal component space characterized by the smoothness of singular, deterministic hydrophobicity patches plays a significant role in the conditions governing protein aggregation.
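Recurrence quantification analysis, one of the nonlinear tools referred to above, can be sketched as follows: embed the hydrophobicity series in delay coordinates, mark pairs of embedded points closer than a radius, and report the recurrence rate (%REC) and the fraction of recurrent points lying on diagonal lines (%DET). The embedding dimension, delay, radius, minimum line length, Euclidean distance, and toy sequence are assumptions, not the settings used by Zbilut et al.

```python
import math


def embed(series, dim, delay=1):
    """Delay embedding: the i-th vector is (x_i, x_{i+delay}, ..., x_{i+(dim-1)*delay})."""
    count = len(series) - (dim - 1) * delay
    return [tuple(series[i + j * delay] for j in range(dim)) for i in range(count)]


def rqa(series, dim=3, radius=1.0, min_line=2):
    """Return (recurrence rate, determinism) as fractions, ignoring the main diagonal."""
    points = embed(series, dim)
    n = len(points)
    # rec[i][j] is True when two distinct embedded points lie within `radius`.
    rec = [[i != j and math.dist(points[i], points[j]) <= radius for j in range(n)]
           for i in range(n)]
    recurrent = sum(sum(row) for row in rec)
    # Count recurrent points on diagonal lines of length >= min_line in the upper
    # triangle, then double the count because the recurrence matrix is symmetric.
    on_lines = 0
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if rec[i][i + offset]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
        if run >= min_line:
            on_lines += run
    on_lines *= 2
    rate = recurrent / (n * (n - 1))
    determinism = on_lines / recurrent if recurrent else 0.0
    return rate, determinism


# Hypothetical toy sequence; Kyte-Doolittle values for L and S.
profile = [{"L": 3.8, "S": -0.8}[aa] for aa in "LLS" * 10]
rate, det = rqa(profile)
print(f"%REC = {rate:.3f}, %DET = {det:.3f}")
```

A strictly periodic pattern such as the toy sequence is almost entirely deterministic, while a series whose embedded points never come within the radius of each other gives zero recurrences.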
Affiliation(s)
- Joseph P Zbilut
- Department of Molecular Biophysics and Physiology, Rush Medical College, Chicago, Illinois, USA.
44
Allen TE, Herrgård MJ, Liu M, Qiu Y, Glasner JD, Blattner FR, Palsson BØ. Genome-scale analysis of the uses of the Escherichia coli genome: model-driven analysis of heterogeneous data sets. J Bacteriol 2003; 185:6392-9. [PMID: 14563874 PMCID: PMC219383 DOI: 10.1128/jb.185.21.6392-6399.2003] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/20/2022]
Abstract
The recent availability of heterogeneous high-throughput data types has increased the need for scalable in silico methods with which to integrate data related to the processes of regulation, protein synthesis, and metabolism. A sequence-based framework for modeling transcription and translation in prokaryotes has been established and has been extended to study the expression state of the entire Escherichia coli genome. The resulting in silico analysis of the expression state highlighted three facets of gene expression in E. coli: (i) the metabolic resources required for genome expression and protein synthesis were found to be relatively invariant under the conditions tested; (ii) effective promoter strengths were estimated at the genome scale by using global mRNA abundance and half-life data, revealing genes subject to regulation under the experimental conditions tested; and (iii) large-scale genome location-dependent expression patterns with approximately 600-kb periodicity were detected in the E. coli genome based on the 49 expression data sets analyzed. These results support the notion that a structured model-driven analysis of expression data yields additional information that can be subjected to commonly used statistical analyses. The integration of heterogeneous genome-scale data (i.e., sequence, expression data, and mRNA half-life data) is readily achieved in the context of an in silico model.
Affiliation(s)
- Timothy E Allen
- Department of Bioengineering, University of California-San Diego, La Jolla, California 92093-0412, USA
45
Abstract
Chemokine receptors represent a prime target for the development of novel therapeutic strategies in a variety of disease processes, including inflammation, allergy and neoplasia. Here we use maximum likelihood and bootstrap methods to investigate both the phylogenetic relationships in a large set of human chemokine receptor sequences and the relationships between chemokine receptors and their nearest neighbors. We found that the CCR and CXCR families are not homogeneous. We also provide evidence that angiotensin receptors are the closest neighbors. Other close neighbors include opioid, somatostatin and melanin-concentrating hormone receptors. The phylogenetic analysis suggests ancient paralogous relationships and establishes a link between immune, metabolic and neural systems modulation. We complement our findings with a structural analysis, based on wavelet methods, of the major branches of the chemokine receptor phylogeny. We hypothesize that receptors very close in the tree can form heterodimers. Our analyses reveal different characteristics of amino acid hydrophobicity and volume propensity in the different subfamilies. We also found that the second extra-cytoplasmic loop has higher rates of evolution than the internal loops and transmembrane segments, suggesting that selection, shifting, reassignments and broadening of receptor binding specificities involve mainly this loop.
Affiliation(s)
- Pietro Liò
- Department of Zoology, University of Cambridge, Cambridge, UK.
46
Mandell AJ, Selz KA, Owens MJ, Kinkead B, Shlesinger MF, Gutman DA, Arguragi V. Cellular and behavioral effects of D2 dopamine receptor hydrophobic eigenmode-targeted peptide ligands. Neuropsychopharmacology 2003; 28 Suppl 1:S98-107. [PMID: 12827150 DOI: 10.1038/sj.npp.1300134] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
Patterns in G-protein-coupled receptors' hydrophobically transformed amino-acid sequences can be computationally characterized as hierarchies of autocorrelation waves, "hydrophobic eigenmodes", using autocovariance matrix decomposition and all-poles power spectral and wavelet transformations. L- or D-amino acid (retro-inverso) 12-18 residue peptides targeting these modes can be designed using eigenvector templates derived from these computations. In all, 12 human long-form D2 dopamine receptor eigenmode-targeted 15-mer peptides were designed, synthesized, and shown to modulate and/or indirectly activate the extracellular acidification response, EAR, in stably receptor-transfected CHO and LtK cells, with an 83% hit rate. Representative L- and D-amino-acid retro-inverso peptides injected bilaterally in the nucleus accumbens demonstrated changes in rat exploratory behavior and prepulse inhibition similar to those observed following parenteral amphetamine. In contrast with geometric models used for ligand design, such as pharmacophores, the hydrophobic eigenmode approach to lead modulatory peptide design targets hydrophobic eigenmode-bearing subsequences, including those not visible from X-ray and NMR studies such as extracellular segments and loops.
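The autocovariance matrix decomposition mentioned above can be approximated by a generic, singular-spectrum-style sketch: build the Toeplitz lag-autocovariance matrix of a hydrophobicity series and diagonalize it, so that the leading eigenvectors describe the dominant hydrophobic variation patterns. The related Selz et al. entry describes its peptide design technique as proprietary, so this is only an assumed analogue; the window length, the cyclic Jacobi eigen-solver, the partial Kyte-Doolittle table, the toy sequence, and the function names are hypothetical illustrative choices.

```python
import math


def lag_covariance(series, window):
    """Toeplitz lag-autocovariance matrix (biased estimator, hence positive semidefinite)."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    acov = [sum(x[t] * x[t + lag] for t in range(n - lag)) / n for lag in range(window)]
    return [[acov[abs(i - j)] for j in range(window)] for i in range(window)]


def jacobi_eigh(matrix, max_sweeps=100, rel_tol=1e-24):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations, largest eigenvalue first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    frob2 = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * frob2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors as columns
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


# Kyte-Doolittle values for the residues in this hypothetical toy sequence.
HYDRO = {"L": 3.8, "I": 4.5, "V": 4.2, "K": -3.9, "E": -3.5, "G": -0.4, "S": -0.8}
toy = "LLKKEELLIIKSSGVVLKEELLIIKKGGSVVLLEEK"
values, vectors = jacobi_eigh(lag_covariance([HYDRO[aa] for aa in toy], window=8))
print(f"leading eigenvalue carries {values[0] / sum(values):.0%} of the trace")
print([round(x, 2) for x in vectors[0]])  # leading eigenvector, usable as a template
```

A linear-algebra library would normally replace the hand-written Jacobi solver; it is included only to keep the sketch dependency-free.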
Affiliation(s)
- Arnold J Mandell
- Cielo Institute, 486 Sunset Drive, Asheville, NC 28804-3727, USA.
47
48
Giuliani A, Benigni R, Colafranceschi M, Chandrashekar I, Cowsik SM. Large contact surface interactions between proteins detected by time series analysis methods: case study on C-phycocyanins. Proteins 2003; 51:299-310. [PMID: 12660998 DOI: 10.1002/prot.10366] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
A purely sequence-dependent approach to the modeling of protein-protein interaction was applied to the study of C-phycocyanin alphabeta dimers. The interacting pairs (alpha and beta subunits) share an almost complete structural homology, together with a general lack of sequence superposition; thus, they constitute a particularly relevant example for protein-protein interaction prediction. The present analysis is based on a description posited at an intermediate level between sequence and structure, that is, the hydrophobicity patterning along the chains. Based on the description of the sequence hydrophobicity patterns through a battery of nonlinear tools (recurrence quantification analysis and other sequence complexity descriptors), we were able to generate an explicit equation modeling alpha and beta monomers interaction; the model consisted of canonical correlation between the hydrophobicity autocorrelation structures of the interacting pairs. The general implications of this holistic approach to the modeling of protein-protein interactions, which considers the protein primary structures as a whole, are discussed.