1. Bhattacharyya T, Nayak S, Goswami S, Gadiyaram V, Mathew OK, Sowdhamini R. PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe. Database (Oxford) 2022; 2022:6566803. [PMID: 35411388; PMCID: PMC9216583; DOI: 10.1093/database/baac025] [Received: 06/30/2021] [Revised: 02/23/2022] [Accepted: 03/27/2022]
Abstract
Sequence alignments are models that capture the structural, functional and evolutionary relationships between proteins. Structure-guided sequence alignments are especially helpful for distantly related proteins, whose low sequence identity renders routine sequence alignment methods ineffective. The Protein Alignment organized as Structural Superfamilies (PASS2) database provides such alignments of protein domains within a superfamily, as defined by the Structural Classification of Proteins extended (SCOPe) database. The current update, PASS2.7, follows the latest release of SCOPe (2.07) and provides data for 14 323 protein domains that are <40% identical and are organized into 2024 superfamilies. Several useful features derived from the alignments, such as conserved secondary structural motifs, HMMs and residues conserved across the superfamily, are also reported. Protein domains that deviate from the rest of the members of a superfamily may compromise the quality of the alignment; we found this to be the case in ∼7% of the superfamilies considered. To identify such ‘outliers’ objectively and improve the alignments, this update uses a k-means-based unsupervised machine learning method to cluster superfamily members, with three features per domain: the length of the aligned domain, the Cα-RMSD derived from the rigid-body superposition of all members and the gaps contributed to the alignment. In a few cases, we have split the superfamily according to the predicted clusters and provided complete data for each cluster. A new feature in this update is absolutely conserved interactions (ACIs) between residue backbones and side chains, which are obtained by aligning protein structure networks using the structure-guided sequence alignments of superfamilies. ACIs provide valuable information about functionally important residues and the structure–function relationships of proteins. The ACIs and the corresponding conserved backbone and side-chain networks are marked separately on the superimposed structure.
Database URL
The updated version of the PASS2 database is available at http://caps.ncbs.res.in/pass2/.
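The outlier-detection step summarised above can be illustrated with a minimal Python sketch. It assumes the three per-domain features (aligned length, Cα-RMSD, gaps contributed) have already been computed. It z-scores them, runs plain Lloyd's k-means with k = 2 from a deterministic farthest-point start, and flags the members that fall outside the largest cluster. The function names, the choice of k, the domain IDs and the rule "smaller cluster = outliers" are illustrative assumptions, not the PASS2.7 implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

# Per-domain features assumed to be precomputed:
# (aligned length, C-alpha RMSD in angstroms, gaps contributed to the alignment)
Features = Tuple[float, float, float]


def standardize(rows: List[Features]) -> List[List[float]]:
    """Z-score each feature column so length, RMSD and gap counts carry comparable weight."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def dist2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centroids))))
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels or []


def flag_outliers(domains: Dict[str, Features], k: int = 2) -> List[str]:
    """Return domains outside the largest cluster (a hypothetical outlier rule)."""
    names = list(domains)
    labels = kmeans(standardize([domains[n] for n in names]), k)
    largest = max(set(labels), key=labels.count)
    return sorted(n for n, lab in zip(names, labels) if lab != largest)


if __name__ == "__main__":
    # Hypothetical SCOPe-style domain IDs and feature values.
    superfamily = {
        "d1aaa__": (120, 1.8, 4),
        "d1bbb__": (118, 2.0, 5),
        "d1ccc__": (125, 1.6, 3),
        "d1ddd__": (122, 2.2, 6),
        "d1eee__": (60, 6.5, 48),
    }
    print(flag_outliers(superfamily))  # ['d1eee__']
```

Standardising first matters here because aligned lengths run to the hundreds while RMSD values are a few ångströms; without it, the length column alone would dominate the distances.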
Affiliation(s)
- Teerna Bhattacharyya: National Centre for Biological Sciences (TIFR), Bellary Road, Bangalore, Karnataka 560065, India
- Soumya Nayak: National Centre for Biological Sciences (TIFR), Bellary Road, Bangalore, Karnataka 560065, India
- Smit Goswami: National Centre for Biological Sciences (TIFR), Bellary Road, Bangalore, Karnataka 560065, India
- Vasundhara Gadiyaram: National Centre for Biological Sciences (TIFR), Bellary Road, Bangalore, Karnataka 560065, India
- Oommen K Mathew: National Centre for Biological Sciences (TIFR), Bellary Road, Bangalore, Karnataka 560065, India
- Ramanathan Sowdhamini: National Centre for Biological Sciences (TIFR), Bellary Road, Bangalore, Karnataka 560065, India

2. Ghosh P, Bhattacharyya T, Mathew OK, Sowdhamini R. PASS2 version 6: a database of structure-based sequence alignments of protein domain superfamilies in accordance with SCOPe. Database (Oxford) 2019; 2019:5367127. [PMID: 30820573; PMCID: PMC6395796; DOI: 10.1093/database/baz028] [Received: 08/17/2018] [Revised: 02/03/2019] [Accepted: 02/06/2019]
Abstract
The number of protein structures is increasing owing to individual initiatives and the rapid development of structure determination techniques. Structure-based sequence alignments of distantly related proteins enable investigation of the structural, evolutionary and functional relationships between proteins and their domains, and can point to their common evolutionary origin. Protein Alignments organized as Structural Superfamilies (PASS2) is a database that provides such alignments for members of protein domain superfamilies of known structure that share less than 40% sequence identity. PASS2 has been continuously updated in accordance with the Structural Classification of Proteins (SCOP) and, now, the Structural Classification of Proteins - extended (SCOPe). The current update corresponds directly to SCOPe 2.06 and covers 2006 domain superfamilies of known structure and about 14 000 domains. Alignments have been augmented with features such as hidden Markov models, highly conserved residues, structural motifs and gene ontology terms, which are available for download. This update also introduces the concepts of 'extreme structural outliers' and 'split superfamilies'.
Affiliation(s)
- Pritha Ghosh: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, India
- Teerna Bhattacharyya: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, India
- Oommen K Mathew: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, India
- Ramanathan Sowdhamini: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, India

3. Gandhimathi A, Ghosh P, Hariharaputran S, Mathew OK, Sowdhamini R. PASS2 database for the structure-based sequence alignment of distantly related SCOP domain superfamilies: update to version 5 and added features. Nucleic Acids Res 2016; 44:D410-4. [PMID: 26553811; PMCID: PMC4702857; DOI: 10.1093/nar/gkv1205] [Received: 09/03/2015] [Revised: 10/16/2015] [Accepted: 10/24/2015]
Abstract
Structure-based sequence alignment is an essential step in assessing and analysing the relationships of distantly related proteins. PASS2 is a database that records such alignments for protein domain superfamilies and has been updated periodically. This update, named PASS2.5, corresponds directly to the SCOPe 2.04 release. All SCOPe structural domains that share less than 40% sequence identity, as defined by the ASTRAL compendium of protein structures, are included. The current version includes 1977 superfamilies and has been assembled using the structure-based sequence alignment protocol: an initial alignment is obtained with MATT and then refined with the COMPARER program. The JOY program has been used for structural annotation of the alignments. In this update, we have automated the protocol and added new features, such as mapping of GO terms, absolutely conserved residues among the domains in a superfamily, and a separate list of PDB entries absent from SCOPe 2.04 that were identified using HMM profiles built from the superfamily alignments. We have also made data presentation more user-friendly and added options for downloading more features. PASS2.5 is available at http://caps.ncbs.res.in/pass2/.
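The "absolutely conserved residues" feature can be pictured as a scan over alignment columns. The minimal Python sketch below assumes an alignment held as equal-length strings with '-' as the gap character. A column counts as absolutely conserved only when every member carries the same residue and none has a gap. Comparison is case-insensitive, on the assumption that annotated alignments may use lowercase letters. A second helper maps a column back to a residue number in one member. The function names and the toy alignment are hypothetical, not part of PASS2.

```python
from typing import List, Tuple


def absolutely_conserved(alignment: List[str], gap: str = "-") -> List[Tuple[int, str]]:
    """(1-based column, residue) for columns identical in every sequence and gap-free."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for col in range(width):
        # Upper-case so that annotation-style lower-case letters compare equal.
        residues = {seq[col].upper() for seq in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append((col + 1, residues.pop()))
    return conserved


def column_to_residue_index(aligned_seq: str, column: int, gap: str = "-") -> int:
    """Map a 1-based alignment column to the 1-based position in the ungapped sequence."""
    if aligned_seq[column - 1] == gap:
        raise ValueError("this sequence has a gap at the requested column")
    return column - aligned_seq[:column].count(gap)


if __name__ == "__main__":
    toy = ["GHKC-DA", "GYKCLDA", "GHRC-EA"]  # hypothetical three-member alignment
    print(absolutely_conserved(toy))           # [(1, 'G'), (4, 'C'), (7, 'A')]
    print(column_to_residue_index(toy[0], 7))  # 6
```

Mapping columns back to ungapped positions is what allows conserved columns to be located on each member's own structure.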
Affiliation(s)
- Arumugam Gandhimathi: National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore 560065, Karnataka, India
- Pritha Ghosh: National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore 560065, Karnataka, India
- Sridhar Hariharaputran: National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore 560065, Karnataka, India; Bharathidasan University, Palkalainagar, Tiruchirapalli 620024, Tamil Nadu, India
- Oommen K Mathew: National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore 560065, Karnataka, India; SASTRA University, Tirumalaisamudram, Thanjavur 613401, Tamil Nadu, India
- R Sowdhamini: National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore 560065, Karnataka, India

4. Mutt E, Rani SS, Sowdhamini R. Structural updates of alignment of protein domains and consequences on evolutionary models of domain superfamilies. BioData Min 2013; 6:20. [PMID: 24237883; PMCID: PMC4175504; DOI: 10.1186/1756-0381-6-20] [Received: 07/24/2012] [Accepted: 09/24/2013]
Abstract
BACKGROUND The influx of newly determined crystal structures into primary structural databases is increasing at a rapid pace. This leads to updates of primary databases and of the secondary databases that depend on them, which makes large-scale analysis of structures even more challenging. Hence, it is essential to compare the data that are replaced and the new data that are included between two updates. PASS2 is a database that retains structure-based sequence alignments of protein domain superfamilies and relies on the SCOP database for its hierarchy and definition of superfamily members. Since accurate alignments of distantly related proteins are useful evolutionary models for depicting variations within protein superfamilies, this study aims to trace the changes in data between PASS2 updates.
RESULTS In this study, differences in superfamily composition, family constituents and length variations between different versions of PASS2 have been tracked. Studying length variations in protein domains introduced by indels (insertions/deletions) is important because these indels act as evolutionary signatures, introducing variations in substrate specificity and domain interactions and sometimes even regulating protein stability. In classifying the nature and source of variations in the superfamilies during transitions between versions of PASS2, we observe increasing length-rigidity of the superfamilies in the recent version. To study such length-variant superfamilies in detail, an improved classification approach is also presented, which divides the superfamilies into distinct groups based on their extent of length variation.
CONCLUSIONS An objective study of the transitions between database updates, with detailed investigation of new and old members and examination of their structural alignments, is non-trivial. It will help researchers in designing experiments on specific superfamilies, in various modelling studies, in linking representative superfamily members to the rapidly expanding sequence space and in evaluating the effects of length variations of new members in drug target proteins. The improved objective classification scheme developed here should be useful in future for automatic analysis of length variation across database updates or between different secondary databases.
Affiliation(s)
- Eshita Mutt: National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, 560 065 Bangalore, India

5. Arumugam G, Nair AG, Hariharaputran S, Ramanathan S. Rebelling for a reason: protein structural "outliers". PLoS One 2013; 8:e74416. [PMID: 24073209; PMCID: PMC3779223; DOI: 10.1371/journal.pone.0074416] [Received: 03/27/2013] [Accepted: 07/31/2013]
Abstract
Analysis of structural variation in domain superfamilies can reveal constraints on protein evolution that aid protein structure prediction and classification. Structure-based sequence alignments of distantly related proteins, organized in the PASS2 database, provide clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences that are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies that contain structural outliers, or 'rebels', are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarity cannot be extrapolated from structural conservation alone. The implication for fold-function prediction is that functional annotations can be inherited only with very careful consideration, especially at low sequence identities.
Affiliation(s)
- Gandhimathi Arumugam: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vigyana Kendra Campus, Bangalore, India
- Anu G. Nair: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vigyana Kendra Campus, Bangalore, India
- Sridhar Hariharaputran: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vigyana Kendra Campus, Bangalore, India
- Sowdhamini Ramanathan: National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vigyana Kendra Campus, Bangalore, India

6. Gandhimathi A, Nair AG, Sowdhamini R. PASS2 version 4: an update to the database of structure-based sequence alignments of structural domain superfamilies. Nucleic Acids Res 2011; 40:D531-4. [PMID: 22123743; PMCID: PMC3245109; DOI: 10.1093/nar/gkr1096]
Abstract
Accurate structure-based sequence alignments of distantly related proteins are crucial for gaining insight into protein domains that belong to a superfamily. The PASS2 database provides alignments of proteins that are related at the superfamily level and characterized by low sequence identity. We report an automated, updated version of this superfamily alignment database, PASS2.4, consisting of 1961 superfamilies and 10 569 protein domains, in direct correspondence with the SCOP (1.75) database. This update focuses on database organization, improved methods for efficient structure-based sequence alignment and the analysis of extremely distantly related proteins within superfamilies. Such alignments make it possible to align family-specific functional residues, as shown for one example superfamily. The database of alignments and other related features can be accessed at http://caps.ncbs.res.in/pass2/.
Affiliation(s)
- A Gandhimathi: National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore 560 065, Karnataka, India

7. Veeramalai M, Gilbert D. A novel method for comparing topological models of protein structures enhanced with ligand information. Bioinformatics 2008; 24:2698-705. [DOI: 10.1093/bioinformatics/btn518]

8. Rekha N, Machado SM, Narayanan C, Krupa A, Srinivasan N. Interaction interfaces of protein domains are not topologically equivalent across families within superfamilies: Implications for metabolic and signaling pathways. Proteins 2005; 58:339-53. [PMID: 15562516; DOI: 10.1002/prot.20319]
Abstract
Using a data set of aligned protein domain superfamilies of known three-dimensional structure, we compared the location of interdomain interfaces on the tertiary folds between members of distantly related protein domain superfamilies. The data set analyzed comprises interfaces both between domains within a single polypeptide chain and between domains on two polypeptide chains. We observe that, in general, the interfaces between protein domains are formed at entirely different locations on the tertiary folds in such pairs. This variation in interface location occurs in protein domains involved in a wide range of functions, such as enzymes, adapters, and domains that bind protein ligands or cofactors. While basic biochemical functionality is preserved at the domain superfamily level, the effect of biochemical function on protein assemblies differs among these superfamily-related domains. In most cases, the divergence between proteins is coupled with domain recruitment, with different modes of interaction with the recruited domain. This is in complete contrast to closely related homologous protein domains, in which the interaction interfaces are almost always topologically equivalent. In a small subset of interacting domains within proteins related by remote homology, we observe that the relative positioning of domains with respect to one another is preserved. Based on the analysis of multidomain proteins of known or unknown structure, we suggest that variation in protein-protein interactions among members of a superfamily could serve as diverging points in otherwise parallel metabolic or signaling pathways. We discuss a few representative cases of diverging pathways involving domains in a superfamily.
Affiliation(s)
- N Rekha: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India

9.
Affiliation(s)
- N Srinivasan: Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India

10. Casbon JA, Saqi MAS. On single and multiple models of protein families for the detection of remote sequence relationships. BMC Bioinformatics 2006; 7:48. [PMID: 16448555; PMCID: PMC1397874; DOI: 10.1186/1471-2105-7-48] [Received: 04/22/2005] [Accepted: 01/31/2006]
Abstract
Background The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily by direct comparison of the two sequences. Methods that compare sequence profiles have been shown to improve the detection of such remote sequence relationships, but the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known members of the superfamily, or using multiple sequence-based profiles, each representing an individual member of the superfamily.
Results Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average over all superfamilies, using a truncated receiver operating characteristic (ROC5), we find that multiple domain models outperform single superfamily models, except at low error rates, where the two behave similarly. However, performance varies widely between superfamilies: for 12% of all superfamilies the ROC5 value of the superfamily model exceeds that of the domain models by more than 0.2, and for 10% of superfamilies the domain models show a similar improvement over the superfamily models.
Conclusion Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members. We find that, overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy.
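ROC5 scores a ranked list of hits by how many true superfamily members appear above each of the first five false positives. A minimal Python sketch of that truncated-ROC score follows. It uses the common ROC_n normalisation, dividing by n times the total number of true positives. Its handling of lists with fewer than n false positives is an assumption, and it is not the authors' benchmarking code.

```python
from typing import Sequence


def roc_n(ranked_is_true: Sequence[bool], n: int = 5) -> float:
    """Truncated ROC score (ROC_n) for hits sorted from best to worst score.

    ranked_is_true[i] is True when the i-th hit is a true superfamily member.
    For each of the first n false positives, count the true positives ranked
    above it; sum these counts and divide by n * (total true positives).
    """
    total_true = sum(1 for t in ranked_is_true if t)
    if total_true == 0 or n <= 0:
        raise ValueError("need at least one true positive and n > 0")
    tp_seen = fp_seen = area = 0
    for is_true in ranked_is_true:
        if is_true:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break
    # Assumption: if fewer than n false positives occur, the missing terms
    # are scored with the final true-positive count.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_true)


if __name__ == "__main__":
    T, F = True, False
    print(roc_n([T, T, T, F, F, F, F, F]))  # 1.0: every true hit ranks first
    print(roc_n([T, T, F, T, F, F, F, F]))  # (2+3+3+3+3)/(5*3) = 14/15
```

Because the score stops at the fifth false positive, it rewards models that place true members at the very top of the ranking.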
Affiliation(s)
- James A Casbon: Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK
- Mansoor AS Saqi: Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK

11.
Abstract
S4 is an automatically generated database of multiple structure-based sequence alignments of protein superfamilies in the SCOP database. All structural domains that do not share more than 40% sequence identity as defined by the ASTRAL compendium of protein structures are included. The alignments are constructed using pairwise structural alignments to generate residue equivalences that are then integrated into multiple alignments using sequence alignment tools. We describe the database and give examples showing how the automatically generated S4 alignments compare favourably to hand-crafted alignments. Available at: http://compbio.mds.qmw.ac.uk/S4.html.
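S4 and PASS2 rely on the ASTRAL compendium to define their <40% identity domain sets. To illustrate what such a non-redundancy filter does, the Python sketch below makes two assumptions. It defines identity as identical residues over the columns where both aligned sequences have a residue, and it selects domains greedily in a given priority order. Neither choice is claimed to be the ASTRAL procedure, and the function names and identity values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def nonredundant(names: List[str],
                 identity: Callable[[str, str], float],
                 threshold: float = 40.0) -> List[str]:
    """Greedily keep domains below `threshold` percent identity to every domain kept so far."""
    kept: List[str] = []
    for name in names:  # earlier entries win, e.g. if pre-sorted by structure quality
        if all(identity(name, other) < threshold for other in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical precomputed pairwise identities (looked up symmetrically).
    table: Dict[Tuple[str, str], float] = {
        ("d1", "d2"): percent_identity("ACDE-", "ACDF-"),  # 75.0
        ("d1", "d3"): 20.0,
        ("d2", "d3"): 25.0,
    }

    def lookup(x: str, y: str) -> float:
        return table.get((x, y), table.get((y, x), 0.0))

    print(nonredundant(["d1", "d2", "d3"], lookup))  # ['d1', 'd3']
```

The greedy order decides which member of a redundant pair survives, so in practice the input list would be sorted by whatever criterion (for example, structure quality) should take precedence.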
Affiliation(s)
- James Casbon: Institute for Cell and Molecular Science, Bart's and The London, Queen Mary's School of Medicine and Dentistry, University of London, 32 Newark Street, London E1 2AA, UK

12. Barik S. When proteome meets genome: the alpha helix and the beta strand of proteins are eschewed by mRNA splice junctions and may define the minimal indivisible modules of protein architecture. J Biosci 2004; 29:261-73. [PMID: 15381847; PMCID: PMC2367099; DOI: 10.1007/bf02702608]
Abstract
The significance of the intron-exon structure of genes is a mystery. As eukaryotic proteins are made up of modular functional domains, each exon was suspected to encode some form of module; however, the definition of a module remained vague. Comparison of pre-mRNA splice junctions with the three-dimensional architecture of the encoded proteins from different eukaryotes revealed that the junctions were far less likely to occur inside the alpha-helices and beta-strands of proteins than within the more flexible linker regions ('turns' and 'loops') connecting them. The splice junctions were equally distributed among the different types of linkers and throughout the linker sequence, although a slight preference for the central region of the linker was observed. The avoidance of the alpha-helix and the beta-strand by splice junctions suggests a selection pressure against their disruption, perhaps underscoring the investment made by nature in building these intricate secondary structures. A corollary is that the helix and the strand are the smallest integral architectural units of a protein and represent the minimal modules in the evolution of protein structure. These results should find use in comparative genomics, in the design of cloning strategies and in the mutual verification of genome sequences with protein structures.
Affiliation(s)
- Sailen Barik: Department of Biochemistry and Molecular Biology (MSB 2370), University of South Alabama, College of Medicine, 307 University Blvd., Mobile 36688-0002, USA

13. Improvement of alignment accuracy utilizing sequentially conserved motifs. BMC Bioinformatics 2004; 5:167. [PMID: 15509307; PMCID: PMC533867; DOI: 10.1186/1471-2105-5-167] [Received: 05/28/2004] [Accepted: 10/28/2004]
Abstract
BACKGROUND Multiple sequence alignment algorithms are very important tools in molecular biology today. Accurate alignment of proteins is central to several areas, such as homology modelling, docking studies, understanding evolutionary trends and studying structure-function relationships. In recent times, improvements to existing progressive programs and the implementation of new iterative algorithms have brought significant changes to this field.
RESULTS We report an alignment algorithm that combines a progressive dynamic algorithm, local substructure alignment and iterative refinement to achieve an improved, user-interactive tool. Large-scale benchmarking studies show that the FMALIGN server produces alignments that, besides preserving functional and structural conservation, have accuracy comparable to other popular multiple alignment programs.
CONCLUSIONS The FMALIGN server allows the user to fix conserved regions at equivalent positions in the alignment, greatly reducing the chance of global misalignment. FMALIGN is available at http://caps.ncbs.res.in/FMALIGN/Home.html.

14. Chakrabarti S, Sowdhamini R. Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modelling using distant relationships. FEBS Lett 2004; 569:31-6. [PMID: 15225604; DOI: 10.1016/j.febslet.2004.05.028] [Received: 04/07/2004] [Accepted: 05/13/2004]
Abstract
Structurally conserved regions, or structural templates, have been identified and examined for features such as amino acid content, solvent accessibility, secondary structure, non-polar interactions, residue packing and extent of structural deviation in 179 aligned members of superfamilies involving 1208 pairs of protein domains. An analysis of these structural features shows that the retention of secondary structural conservation and of similar hydrogen bonding patterns within the templates is 2.5 and 1.8 times higher, respectively, than in full-length alignments, suggesting that the templates form the minimum structural requirement of a superfamily. The identification and availability of structural templates are valuable in different areas of protein structure prediction and modelling, such as sensitive sequence searches, accurate sequence alignment and three-dimensional modelling on the basis of distant relationships.
Affiliation(s)
- Saikat Chakrabarti: National Centre for Biological Sciences, UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India

15. Bhaduri A, Pugalenthi G, Sowdhamini R. PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics 2004; 5:35. [PMID: 15059245; PMCID: PMC407847; DOI: 10.1186/1471-2105-5-35] [Received: 11/28/2003] [Accepted: 04/02/2004]
Abstract
Background The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated structural alignments for evolutionarily related, sequentially distant proteins.
Description An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalences have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments, and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members helps in understanding the diversification of family members. The search engine has been improved for simpler browsing of the database.
Conclusions The database resolves alignments among structural domains consisting of an evolutionarily diverged set of sequences. The availability of reliable sequence alignments of distantly related proteins despite poor sequence identity, together with single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and for understanding the structure-function relationships of individual superfamilies. PASS2 is accessible at
Affiliation(s)
- Anirban Bhaduri: National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK campus, Bellary Road, Bangalore, Karnataka 560 065, India
- Ganesan Pugalenthi: National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK campus, Bellary Road, Bangalore, Karnataka 560 065, India
- Ramanathan Sowdhamini: National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK campus, Bellary Road, Bangalore, Karnataka 560 065, India