1
Ensemble Deep Learning Models for Heart Disease Classification: A Case Study from Mexico. INFORMATION 2020. [DOI: 10.3390/info11040207] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/16/2023]
Abstract
Heart diseases rank among the leading causes of mortality in the world. They have various types including vascular, ischemic, and hypertensive heart disease. A large number of medical features are reported for patients in the Electronic Health Records (EHR) that allow physicians to diagnose and monitor heart disease. We collected a dataset from Medica Norte Hospital in Mexico that includes 800 records and 141 indicators such as age, weight, glucose, blood pressure rate, and clinical symptoms. The distribution of the collected records is highly unbalanced across the different types of heart disease: 17% of records have hypertensive heart disease, 16% have ischemic heart disease, 7% have mixed heart disease, and 8% have valvular heart disease. Herein, we propose an ensemble-learning framework of different neural network models, and a method of aggregating random under-sampling. To improve the performance of the classification algorithms, we implement a data preprocessing step with feature selection. Experiments were conducted with unidirectional and bidirectional neural network models, and the results showed that an ensemble classifier combining a BiLSTM or BiGRU model with a CNN model had the best classification performance, with accuracy and F1-score between 91% and 96% for the different types of heart disease. These results are competitive and promising for a heart disease dataset. We showed that an ensemble-learning framework based on deep models could overcome the problem of classifying an unbalanced heart disease dataset. Our proposed framework can lead to highly accurate models that are adapted for real clinical data and diagnostic use.
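The abstract does not spell out how the random under-sampling is aggregated, so the following is only a generic sketch of the idea: draw several balanced subsets of an unbalanced label set, train one model per subset, and combine their predictions by majority vote. The function names `undersampled_bags` and `majority_vote`, the `n_bags` and `seed` parameters, and the voting rule are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def undersampled_bags(labels, positive, n_bags=5, seed=0):
    """Return n_bags lists of record indices, each balanced between
    the `positive` class and all other records (hypothetical helper)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == positive]
    neg = [i for i, y in enumerate(labels) if y != positive]
    k = min(len(pos), len(neg))  # size of the smaller class
    # Each bag keeps k records of each class, drawn without replacement.
    return [sorted(rng.sample(pos, k) + rng.sample(neg, k)) for _ in range(n_bags)]


def majority_vote(votes):
    """Combine one predicted label per ensemble member into a single label."""
    return Counter(votes).most_common(1)[0][0]


# Example: 17% positive records, as reported for hypertensive heart disease.
labels = [1] * 17 + [0] * 83
bags = undersampled_bags(labels, positive=1, n_bags=3)
```

Each bag would then be used to train one ensemble member; the class ratio inside every bag is 1:1 regardless of the ratio in the full dataset.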

2
Gerard FCA, Ribeiro EDA, Leyrat C, Ivanov I, Blondel D, Longhi S, Ruigrok RWH, Jamin M. Modular organization of rabies virus phosphoprotein. J Mol Biol 2009; 388:978-96. [PMID: 19341745 DOI: 10.1016/j.jmb.2009.03.061] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.9] [Received: 11/12/2008] [Revised: 03/23/2009] [Accepted: 03/25/2009] [Indexed: 10/20/2022]
Abstract
A phosphoprotein (P) is found in all viruses of the Mononegavirales order. These proteins form homo-oligomers, fulfil similar roles in the replication cycles of the various viruses, but differ in their length and oligomerization state. Sequence alignments reveal no sequence similarity among proteins from viruses belonging to the same family. Sequence analysis and experimental data show that phosphoproteins from viruses of the Paramyxoviridae contain structured domains alternating with intrinsically disordered regions. Here, we used predictions of disorder and of secondary structure, together with an analysis of sequence conservation, to predict the domain organization of the phosphoprotein from Sendai virus, vesicular stomatitis virus (VSV) and rabies virus (RV P). We devised a new procedure for combining the results from multiple prediction methods and locating the boundaries between disordered regions and structured domains. To validate the proposed modular organization predicted for RV P and to confirm that the putative structured domains correspond to autonomous folding units, we used two-hybrid and biochemical approaches to characterize the properties of several fragments of RV P. We found that both central and C-terminal domains can fold in isolation, that the central domain is the oligomerization domain, and that the C-terminal domain binds to nucleocapsids. Our results suggest a conserved organization of P proteins in the Rhabdoviridae family in concatenated functional domains resembling that of the P proteins in the Paramyxoviridae family.
Affiliation(s)
- Francine C A Gerard
- UJF-EMBL-CNRS UMI 3265 - Unit of Virus Host Cell Interactions, Grenoble, France

3
Zhang GL, Khan AM, Srinivasan KN, Heiny AT, Lee KX, Kwoh CK, August JT, Brusic V. Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes. BMC Bioinformatics 2008; 9 Suppl 1:S19. [PMID: 18315850 PMCID: PMC2259420 DOI: 10.1186/1471-2105-9-s1-s19] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]
Abstract
BACKGROUND: T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunological hotspots, has been observed in several antigens. These clusters may be exploited to facilitate the development of epitope-based vaccines by selecting a small number of hotspots that can elicit all of the required T-cell activation functions. Given the large size of pathogen proteomes, including those of variant strains, computational tools are necessary for automated screening and selection of immunological hotspots. RESULTS: Hotspot Hunter is a web-based computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes through analysis of antigenic diversity. It allows screening and selection of hotspots specific to four common HLA supertypes, namely HLA class I A2, A3, B7 and class II DR. The system uses Artificial Neural Network and Support Vector Machine methods as predictive engines. Soft computing principles were employed to integrate the prediction results produced by both methods for robust prediction performance. Experimental validation of the predictions showed that Hotspot Hunter can successfully identify the majority of the real hotspots. Users can predict hotspots from a single protein sequence, or from a set of aligned protein sequences representing a pathogen proteome. The latter feature provides a global view of the localizations of the hotspots in the proteome set, enabling analysis of antigenic diversity and shift of hotspots across protein variants. The system also allows the integration of prediction results of the four supertypes for identification of hotspots common across multiple supertypes. The target selection feature of the system shortlists candidate peptide hotspots for the formulation of an epitope-based vaccine that could be effective against multiple variants of the pathogen and applicable to a large proportion of the human population. CONCLUSION: Hotspot Hunter is publicly accessible at http://antigen.i2r.a-star.edu.sg/hh/. It is a new-generation computational tool aiding in epitope-based vaccine design.
Affiliation(s)
- Guang Lan Zhang
- Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
- School of Computer Engineering, Nanyang Technological University, Singapore 639798
- Asif M Khan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597
- Department of Microbiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597
- Kellathur N Srinivasan
- Department of Pharmacology and Molecular Sciences, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Product Evaluation and Registration Division, Centre for Drug Administration, Health Sciences Authority, 11 Biopolis Way, #011-03 Helios, Singapore 138667
- AT Heiny
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597
- KX Lee
- Department of Microbiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597
- Chee Keong Kwoh
- School of Computer Engineering, Nanyang Technological University, Singapore 639798
- J Thomas August
- Department of Pharmacology and Molecular Sciences, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Vladimir Brusic
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA
- School of Land, Crop, and Food Sciences, University of Queensland, Brisbane 4072, Australia

4
Dor O, Zhou Y. Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training. Proteins 2007; 66:838-45. [PMID: 17177203 DOI: 10.1002/prot.21298] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022]
Abstract
An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q(3) (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one-month (CPU time) training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) is also obtained.
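Q(3) is defined in the abstract as the percentage of correctly classified residues. As a minimal sketch of that definition, the function below compares a predicted and an observed three-state string residue by residue. The function name `q3` and the one-letter state codes (H for helix, E for strand, C for other) are assumptions for illustration; the abstract does not specify an encoding.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state.

    Both arguments are equal-length strings with one state letter per residue,
    for example 'H' (helix), 'E' (strand), 'C' (other); the letters are an
    assumed convention.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For instance, a four-residue chain with two correctly predicted residues gives a Q(3) of 50%.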
Affiliation(s)
- Ofer Dor
- Department of Physiology and Biophysics, Center for Single Molecule Biophysics, Howard Hughes Medical Institute, State University of New York at Buffalo, Buffalo, New York 14214, USA

5
Wang Y, Xue Z, Xu J. Better prediction of the location of alpha-turns in proteins with support vector machine. Proteins 2006; 65:49-54. [PMID: 16894602 DOI: 10.1002/prot.21062] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
Abstract
We have developed a novel method named AlphaTurn to predict alpha-turns in proteins based on the support vector machine (SVM). The prediction was done on a data set of 469 nonhomologous proteins containing 967 alpha-turns. A great improvement in prediction performance was achieved by using multiple sequence alignment generated by PSI-BLAST as input instead of the single amino acid sequence. The introduction of secondary structure information predicted by PSIPRED also improved the prediction performance. Moreover, we handled the very uneven data set by combining the cost factor j with the "state-shifting" rule. This further promoted the prediction quality of our method. The final SVM model yielded a Matthews correlation coefficient (MCC) of 0.25 by a 10-fold cross-validation. To our knowledge, this MCC value is the highest obtained so far for predicting alpha-turns. An online Web server based on this method has been developed and can be freely accessed at http://bmc.hust.edu.cn/bioinformatics/ or http://210.42.106.80/.
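The Matthews correlation coefficient reported above is a standard measure for unbalanced two-class problems such as turn versus non-turn residues. A minimal sketch of its usual definition from confusion-matrix counts follows; the function name `mcc` and the choice to return 0.0 when the denominator vanishes are conventions assumed here, not details taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives. Returns a value in [-1, 1].
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value of 1 corresponds to perfect prediction, 0 to a prediction no better than chance, and -1 to complete disagreement.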
Affiliation(s)
- Yan Wang
- Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan City, China

6
Vlasov PK, Vlasova AV, Esipova NG, Tumanyan VG. Conformational properties of short oligopeptides: Prediction of the protein chain conformation. Biophysics (Nagoya-shi) 2006. [DOI: 10.1134/s0006350906070116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

7
Wilson CL, Boardman PE, Doig AJ, Hubbard SJ. Improved prediction for N-termini of alpha-helices using empirical information. Proteins 2005; 57:322-30. [PMID: 15340919 DOI: 10.1002/prot.20218] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.
Affiliation(s)
- Claire L Wilson
- Department of Biomolecular Sciences, University of Manchester Institute of Science and Technology, Manchester, United Kingdom

8
Dupuis F, Sadoc JF, Mornon JP. Protein secondary structure assignment through Voronoï tessellation. Proteins 2004; 55:519-28. [PMID: 15103616 DOI: 10.1002/prot.10566] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]
Abstract
We present a new automatic algorithm, named VoTAP (Voronoï Tessellation Assignment Procedure), which assigns secondary structures of a polypeptide chain using the list of alpha-carbon coordinates. This program uses three-dimensional Voronoï tessellation. This geometrical tool associates with each amino acid a Voronoï polyhedron, the faces of which unambiguously define contacts between residues. Using the face area, contacts between residues close together along the primary structure (low-order contacts) are divided into strong and normal ones. This new definition yields new contact matrices, which are analyzed and used to assign secondary structures. This assignment is performed in two stages. The first one uses contacts between residues close together along the primary structure and is based on data collected on a bank of 282 well-refined nonredundant structures. In this bank, associations were made between the prints defined by these low-order contacts and the assignments performed by different automatic methods. The second step focuses on the strand assignment and uses contacts between distant residues. Comparisons with several other automatic assignment methods are presented, and the influence of resolution on the assignment is investigated.
Affiliation(s)
- Franck Dupuis
- Laboratoire de Minéralogie Cristallographie Paris, CNRS UMR 7590, Universités Paris 6 et 7, Paris, France

9
Gijsbers R, Ceulemans H, Bollen M. Functional characterization of the non-catalytic ectodomains of the nucleotide pyrophosphatase/phosphodiesterase NPP1. Biochem J 2003; 371:321-30. [PMID: 12533192 PMCID: PMC1223305 DOI: 10.1042/bj20021943] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.1] [Received: 12/16/2002] [Revised: 01/14/2003] [Accepted: 01/17/2003] [Indexed: 02/07/2023]
Abstract
The ubiquitous nucleotide pyrophosphatases/phosphodiesterases NPP1-3 consist of a short intracellular N-terminal domain, a single transmembrane domain and a large extracellular part, comprising two somatomedin-B-like domains, a catalytic domain and a poorly defined C-terminal domain. We show here that the C-terminal domain of NPP1-3 is structurally related to a family of DNA/RNA non-specific endonucleases. However, none of the residues that are essential for catalysis by the endonucleases are conserved in NPP1-NPP3, suggesting that the nuclease-like domain of NPP1-3 does not represent a second catalytic domain. Truncation analysis revealed that the nuclease-like domain of NPP1 is required for protein stability, for the targeting of NPP1 to the plasma membrane and for the expression of catalytic activity. We also demonstrate that 16 conserved cysteines in the somatomedin-B-like domains of NPP1, in concert with two flanking cysteines, mediate the dimerization of NPP1. The K173Q polymorphism of NPP1, which maps to the second somatomedin-B-like domain and has been associated with the aetiology of insulin resistance, did not affect the dimerization or catalytic activity of NPP1, and did not endow NPP1 with an affinity for the insulin receptor. Our data suggest that the non-catalytic ectodomains contribute to the subunit structure, stability and function of NPP1-3.
Affiliation(s)
- Rik Gijsbers
- Afdeling Biochemie, Faculteit Geneeskunde, Katholieke Universiteit Leuven, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium

10
Wilson CL, Hubbard SJ, Doig AJ. A critical assessment of the secondary structure alpha-helices and their termini in proteins. Protein Eng Des Sel 2002; 15:545-54. [PMID: 12200536 DOI: 10.1093/protein/15.7.545] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Secondary structure prediction from amino acid sequence is a key component of protein structure prediction, with current accuracy at approximately 75%. We analysed two state-of-the-art secondary structure prediction methods, PHD and JPRED, comparing predictions with secondary structure assigned by the algorithms DSSP and STRIDE. The specific focus of our study was alpha-helix N-termini, as empirical free energy scales are available for residue preferences at N-terminal positions. Although these prediction methods perform well in general at predicting the alpha-helical locations and length distributions in proteins, they perform less well at predicting the correct helical termini. For example, although most predicted alpha-helices overlap a real alpha-helix (with relatively few completely missed or extra predicted helices), only one-third of JPRED and PHD predictions correctly identify the N-terminus. Analysis of neighbouring N-terminal sequences to predicted helical N-termini shows that the correct N-terminus is often within one or two residues. More importantly, the true N-terminal motif is, on average, more favourable as judged by our experimentally measured free energies. This suggests a simple, but powerful, strategy to improve secondary structure prediction using empirically derived energies to adjust the predicted output to a more favourable N-terminal sequence.
Affiliation(s)
- Claire L Wilson
- Department of Biomolecular Sciences, UMIST, P.O. Box 88, Manchester M60 1QD, UK

11
Krath BN, Hove-Jensen B. Implications of secondary structure prediction and amino acid sequence comparison of class I and class II phosphoribosyl diphosphate synthases on catalysis, regulation, and quaternary structure. Protein Sci 2001; 10:2317-24. [PMID: 11604537 PMCID: PMC2374067 DOI: 10.1110/ps.11801] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 10/14/2022]
Abstract
Spinach 5-phospho-D-ribosyl alpha-1-diphosphate (PRPP) synthase isozyme 4 was synthesized in Escherichia coli and purified to near homogeneity. The activity of the enzyme is independent of P(i); it is inhibited by ADP in a competitive manner, indicating a lack of an allosteric site; and it accepts ATP, dATP, GTP, CTP, and UTP as diphosphoryl donors. All of these properties are characteristic for class II PRPP synthases. K(m) values for ATP and ribose 5-phosphate are 77 and 48 microM, respectively. Gel filtration reveals a molecular mass of the native enzyme of approximately 110 kD, which is consistent with a homotrimer. Secondary structure prediction shows that spinach PRPP synthase isozyme 4 has a general folding similar to that of Bacillus subtilis class I PRPP synthase, for which the three-dimensional structure has been solved, as the position and extent of helices and beta-sheets of the two enzymes are essentially conserved. Amino acid sequence comparison reveals that residues of class I PRPP synthases interacting with allosteric inhibitors are not conserved in class II PRPP synthases. Similarly, residues important for oligomerization of the B. subtilis enzyme show little conservation in the spinach enzyme. In contrast, residues of the active site of B. subtilis PRPP synthase show extensive conservation in spinach PRPP synthase isozyme 4.
Affiliation(s)
- B N Krath
- Department of Biological Chemistry, Institute of Molecular Biology, University of Copenhagen, Copenhagen, Denmark

12
Simon I, Fiser A, Tusnády GE. Predicting protein conformation by statistical methods. BIOCHIMICA ET BIOPHYSICA ACTA 2001; 1549:123-36. [PMID: 11690649 DOI: 10.1016/s0167-4838(01)00253-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 10/18/2022]
Abstract
The unique folded structure makes a polypeptide a functional protein. The number of known sequences is about a hundred times larger than the number of known structures and the gap is increasing rapidly. The primary goal of all structure prediction methods is to obtain structure-related information on proteins whose structures have not been determined experimentally. Besides this goal, the development of accurate prediction methods helps to reveal principles of protein folding. Here we present a brief survey of protein structure predictions based on statistical analyses of known sequence and structure data. We discuss the background of these methods and attempt to elucidate the principles that govern structure formation of soluble and membrane proteins.
Affiliation(s)
- I Simon
- Institute of Enzymology, BRC, Hungarian Academy of Sciences, Budapest, Hungary.

13
Kell DB, Darby RM, Draper J. Genomic computing. Explanatory analysis of plant expression profiling data using machine learning. PLANT PHYSIOLOGY 2001; 126:943-951. [PMID: 11457944 PMCID: PMC1540126 DOI: 10.1104/pp.126.3.943] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.9] [Indexed: 05/23/2023]
Affiliation(s)
- D B Kell
- of Biological Sciences, University of Wales, Aberystwyth SY23 3DD, United Kingdom

14
Abstract
Methods predicting protein secondary structure improved substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current height of around 76% of all residues predicted correctly in one of the three states, helix, strand, and other. The past year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simple combining of existing methods. Divergent evolutionary profiles contain enough information not only to substantially improve prediction accuracy, but also to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on nonlocal conditions. An example is a method automatically identifying structural switches and thus finding a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the work horse for numerous methods aimed at predicting protein structure and function. Is the recent increase in accuracy significant enough to make predictions even more useful? Because the recent improvement yields a better prediction of segments, and in particular of beta strands, I believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.
Affiliation(s)
- B Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, New York 10032, USA

15
Gijsbers R, Ceulemans H, Stalmans W, Bollen M. Structural and catalytic similarities between nucleotide pyrophosphatases/phosphodiesterases and alkaline phosphatases. J Biol Chem 2001; 276:1361-8. [PMID: 11027689 DOI: 10.1074/jbc.m007552200] [Citation(s) in RCA: 133] [Impact Index Per Article: 5.8] [Indexed: 12/28/2022]
Abstract
Nucleotide pyrophosphatases/phosphodiesterases (NPPs) generate nucleoside 5'-monophosphates from a variety of nucleotides and their derivatives. Here we show by database analysis that these enzymes are conserved from eubacteria to higher eukaryotes. We also provide evidence for the existence of two additional members of the mammalian family of ecto-NPPs. Homology searches and alignment-assisted mutagenesis revealed that the catalytic core of NPPs assumes a fold similar to that of a superfamily of phospho-/sulfo-coordinating metalloenzymes comprising alkaline phosphatases, phosphoglycerate mutases, and arylsulfatases. Mutation of mouse NPP1 in some of its predicted metal-coordinating residues (D358N or H362Q) or in the catalytic site threonine (T238S) resulted in an enzyme that could still form the nucleotidylated catalytic intermediate but was hampered in the second step of catalysis. We also obtained data indicating that the ability of some mammalian NPPs to auto(de)phosphorylate is due to an intrinsic phosphatase activity, whereby the enzyme phosphorylated on Thr-238 represents the covalent intermediate of the phosphatase reaction. The results of site-directed mutagenesis suggested that the nucleotide pyrophosphatase/phosphodiesterase and the phosphatase activities of NPPs are mediated by a single catalytic site.
Affiliation(s)
- R Gijsbers
- Afdeling Biochemie, Faculteit Geneeskunde, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium

16
Martin AC. The ups and downs of protein topology; rapid comparison of protein structure. PROTEIN ENGINEERING 2000; 13:829-37. [PMID: 11239082 DOI: 10.1093/protein/13.12.829] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
Protein topology can be described at different levels. At the most fundamental level, it is a sequence of secondary structure elements (a "primary topology string"). Searching predicted primary topology strings against a library of strings from known protein structures is the basis of some protein fold recognition methods. Here a method known as TOPSCAN is presented for rapid comparison of protein structures. Rather than a simple two-letter alphabet (encoding strand and helix), more complex alphabets are used encoding direction, proximity, accessibility and length of secondary elements and loops in addition to secondary structure. Comparisons are made between the structural information content of primary topology strings and encodings which contain additional information ("secondary topology strings"). The algorithm is extremely fast, with a scan of a large domain against a library of more than 2000 secondary structure strings completing in approximately 30 s. Analysis of protein fold similarity using TOPSCAN at primary and secondary topology levels is presented.
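To make the idea of a primary topology string concrete, the sketch below collapses a per-residue secondary structure string into one letter per helix or strand element, using the simple two-letter alphabet mentioned in the abstract. This is an illustration only, not the TOPSCAN algorithm: the function name `primary_topology_string`, the per-residue codes (H, E, and C for everything else) and the `min_len` filter are all assumptions.

```python
from itertools import groupby


def primary_topology_string(per_residue, min_len=1):
    """Collapse a per-residue state string into one letter per element.

    per_residue: string such as 'CCHHHHCCEEEE' with assumed codes
                 'H' (helix), 'E' (strand) and any other letter for loops.
    min_len:     hypothetical filter dropping elements shorter than this.
    """
    elements = []
    for state, run in groupby(per_residue):
        length = len(list(run))
        # Keep only helices and strands; loops are ignored in this alphabet.
        if state in ("H", "E") and length >= min_len:
            elements.append(state)
    return "".join(elements)


topology = primary_topology_string("CCHHHHCCEEEECCHHHCC")
```

Two proteins encoded this way could then be compared with any string-matching method; the richer alphabets described in the abstract would add direction, proximity, accessibility and length to each letter.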
Affiliation(s)
- A C Martin
- School of Animal and Microbial Sciences, University of Reading, Whiteknights, P.O. Box 228, Reading RG6 6AJ, UK.