51
|
Andersen JL, Flamm C, Merkle D, Stadler PF. Inferring chemical reaction patterns using rule composition in graph grammars. J Syst Chem 2013. [DOI: 10.1186/1759-2208-4-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
Background
Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations is a natural and convenient approach to modeling chemistry. Graph grammar rules are most naturally employed to model elementary reactions like merging, splitting, and isomerisation of molecules. It is often convenient, in particular in the analysis of larger systems, to summarize several subsequent reactions into a single composite chemical reaction.
Results
We introduce a generic approach for composing graph grammar rules to define chemically useful rule compositions. We iteratively apply these rule compositions to elementary transformations in order to automatically infer complex transformation patterns. As an application we automatically derive the overall reaction pattern of the Formose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to give a second glycolaldehyde. Rule composition can also be used to study polymerization reactions as well as more complicated iterative reaction schemes. Terpenes and the polyketides, for instance, form two naturally occurring classes of compounds of utmost pharmaceutical interest that can be understood as “generalized polymers” consisting of five-carbon (isoprene) and two-carbon units, respectively.
Conclusion
The framework of graph transformations provides a valuable set of tools to generate and investigate large chemical networks. Within this formalism, rule composition is a canonical technique to obtain coarse-grained representations that reflect, in a natural way, “effective” reactions that are obtained by lumping together specific combinations of elementary reactions.
|
52
|
de Matos P, Cham JA, Cao H, Alcántara R, Rowland F, Lopez R, Steinbeck C. The Enzyme Portal: a case study in applying user-centred design methods in bioinformatics. BMC Bioinformatics 2013; 14:103. [PMID: 23514033 PMCID: PMC3623738 DOI: 10.1186/1471-2105-14-103] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 06/22/2012] [Accepted: 02/08/2013] [Indexed: 11/29/2022] Open
Abstract
User-centred design (UCD) is a type of user interface design in which the needs and desires of users are taken into account at each stage of the design process for a service or product; often for software applications and websites. Its goal is to facilitate the design of software that is both useful and easy to use. To achieve this, you must characterise users' requirements, design suitable interactions to meet their needs, and test your designs using prototypes and real life scenarios.
For bioinformatics, there is little practical information available regarding how to carry out UCD in practice. To address this we describe a complete, multi-stage UCD process used for creating a new bioinformatics resource for integrating enzyme information, called the Enzyme Portal (http://www.ebi.ac.uk/enzymeportal). This freely-available service mines and displays data about proteins with enzymatic activity from public repositories via a single search, and includes biochemical reactions, biological pathways, small molecule chemistry, disease information, 3D protein structures and relevant scientific literature.
We employed several UCD techniques, including: persona development, interviews, 'canvas sort' card sorting, user workflows, usability testing and others. Our hope is that this case study will motivate the reader to apply similar UCD approaches to their own software design for bioinformatics. Indeed, we found the benefits included more effective decision-making for design ideas and technologies; enhanced team-working and communication; cost effectiveness; and ultimately a service that more closely meets the needs of our target audience.
Affiliation(s)
- Paula de Matos
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
53
|
Dingerdissen H, Motwani M, Karagiannis K, Simonyan V, Mazumder R. Proteome-wide analysis of nonsynonymous single-nucleotide variations in active sites of human proteins. FEBS J 2013; 280:1542-62. [PMID: 23350563 DOI: 10.1111/febs.12155] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/30/2012] [Revised: 12/13/2012] [Accepted: 01/17/2013] [Indexed: 12/30/2022]
Abstract
An enzyme's active site is essential to normal protein activity such that any disruptions at this site may lead to dysfunction and disease. Nonsynonymous single-nucleotide variations (nsSNVs), which alter the amino acid sequence, are one type of disruption that can alter the active site. When this occurs, it is assumed that enzyme activity will vary because of the criticality of the site to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active-site-impacting nsSNVs in the human genome and search for all pathways observed to be associated with this data set to assess the likely consequences. We find that there are 934 unique nsSNVs that occur at the active sites of 559 proteins. Analysis of the nsSNV data shows an over-representation of arginine and an under-representation of cysteine, phenylalanine and tyrosine when comparing the list of nsSNV-impacted active site residues with the list of all possible proteomic active site residues, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over- or under-represented in the data set, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation-substrate/product pairs that can be used in targeted metabolomics experiments to assay the effects of specific variations. In addition, we report the significant prevalence of aspartic acid to histidine variation in eight proteins associated with nine diseases including glycogen storage diseases, lacrimo-auriculo-dento-digital syndrome, Parkinson's disease and several cancers.
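The over- and under-representation comparison described in this abstract can be illustrated with a hypergeometric tail probability. This is only a minimal sketch: the abstract does not say which statistical test the authors used, the function name `hypergeom_upper_tail` is hypothetical, and every count except the 934 nsSNVs is invented for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when N items are drawn without replacement from a
    population of M items, n of which belong to the category of interest."""
    total = comb(M, N)
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts (NOT taken from the paper, apart from N = 934):
M = 10000  # all annotated active-site residues in the proteome
n = 1000   # of which arginine
N = 934    # active-site residues affected by an nsSNV
k = 150    # of which arginine

# A small value suggests arginine is over-represented among affected residues.
p_over = hypergeom_upper_tail(k, M, n, N)
print(f"P(X >= {k}) = {p_over:.3g}")
```

Under-representation could be assessed the same way with the lower tail, P(X <= k), which equals 1 minus the upper tail evaluated at k + 1.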
Affiliation(s)
- Hayley Dingerdissen
- Department of Biochemistry and Molecular Biology, George Washington University Medical Center, Washington, DC 20037, USA
|
54
|
|
55
|
Matsuta Y, Ito M, Tohsato Y. ECOH: an enzyme commission number predictor using mutual information and a support vector machine. Bioinformatics 2012; 29:365-72. [PMID: 23220570 DOI: 10.1093/bioinformatics/bts700] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
Motivation
The enzyme nomenclature system, commonly known as the enzyme commission (EC) number, plays a key role in classifying and predicting enzymatic reactions. However, numerous reactions have been described in various pathways that do not have an official EC number, and the reactions are not expected to have an EC number assigned because of a lack of articles published on enzyme assays. To predict the EC number of a non-classified enzymatic reaction, we focus on the structural similarity of its substrate and product to the substrate and product of reactions that have been classified.
Results
We propose a new method to assign EC numbers using a maximum common substructure algorithm, mutual information and a support vector machine, termed the Enzyme COmmission numbers Handler (ECOH). A jack-knife test shows that the sensitivity, precision and accuracy of the method in predicting the first three digits of the official EC number (i.e. the EC sub-subclass) are 86.1%, 87.4% and 99.8%, respectively. We furthermore demonstrate that, by examining the ranking in the candidate lists of EC sub-subclasses generated by the algorithm, the method can successfully predict the classification of 85 enzymatic reactions that fall into multiple EC sub-subclasses. The better performance of the ECOH as compared with existing methods and its flexibility in predicting EC numbers make it useful for predicting enzyme function.
Availability
ECOH is freely available via the Internet at http://www.bioinfo.sk.ritsumei.ac.jp/apps/ecoh/. This program only works on 32-bit Windows.
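The sensitivity, precision and accuracy figures in this abstract can be related to one another with the standard confusion-matrix definitions. The sketch below assumes those standard definitions (the abstract does not spell them out) and uses invented one-vs-rest counts, not the paper's data; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard definitions computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # fraction of true members recovered
    precision = tp / (tp + fp)                  # fraction of predictions that are correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of all decisions that are correct
    return sensitivity, precision, accuracy


# Hypothetical counts for one EC sub-subclass scored against all others.
sens, prec, acc = classification_metrics(tp=86, fp=12, fn=14, tn=9888)
print(f"sensitivity={sens:.3f} precision={prec:.3f} accuracy={acc:.4f}")
```

With many candidate sub-subclasses, true negatives dominate the total, which is one plausible reason an accuracy near 100% can coexist with sensitivity and precision in the high 80s.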
Affiliation(s)
- Yoshihiko Matsuta
- Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Shiga, Kusatsu 525-8577, Japan
|
56
|
Sillitoe I, Cuff AL, Dessailly BH, Dawson NL, Furnham N, Lee D, Lees JG, Lewis TE, Studer RA, Rentzsch R, Yeats C, Thornton JM, Orengo CA. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res 2012. [PMID: 23203873 PMCID: PMC3531114 DOI: 10.1093/nar/gks1211] [Citation(s) in RCA: 175] [Impact Index Per Article: 14.6] [Indexed: 11/14/2022] Open
Abstract
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
|
57
|
Alcántara R, Onwubiko J, Cao H, Matos PD, Cham JA, Jacobsen J, Holliday GL, Fischer JD, Rahman SA, Jassal B, Goujon M, Rowland F, Velankar S, López R, Overington JP, Kleywegt GJ, Hermjakob H, O'Donovan C, Martín MJ, Thornton JM, Steinbeck C. The EBI enzyme portal. Nucleic Acids Res 2012; 41:D773-80. [PMID: 23175605 PMCID: PMC3531056 DOI: 10.1093/nar/gks1112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022] Open
Abstract
The availability of comprehensive information about enzymes plays an important role in answering questions relevant to interdisciplinary fields such as biochemistry, enzymology, biofuels, bioengineering and drug discovery. At the EMBL European Bioinformatics Institute, we have developed an enzyme portal (http://www.ebi.ac.uk/enzymeportal) to provide this wealth of information on enzymes from multiple in-house resources addressing particular data classes: protein sequence and structure, reactions, pathways and small molecules. The fact that these data reside in separate databases makes information discovery cumbersome. The main goal of the portal is to simplify this process for end users.
Affiliation(s)
- Rafael Alcántara
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
|
58
|
Current challenges in genome annotation through structural biology and bioinformatics. Curr Opin Struct Biol 2012; 22:594-601. [PMID: 22884875 DOI: 10.1016/j.sbi.2012.07.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/11/2012] [Revised: 06/29/2012] [Accepted: 07/09/2012] [Indexed: 01/25/2023]
Abstract
With the huge volume of genomic sequences being generated from high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component of this process is to use experimentally determined structures, which provide a means to detect homology that is not discernible from the sequence alone and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry.
|
59
|
Bar-Even A, Flamholz A, Noor E, Milo R. Rethinking glycolysis: on the biochemical logic of metabolic pathways. Nat Chem Biol 2012; 8:509-17. [PMID: 22596202 DOI: 10.1038/nchembio.971] [Citation(s) in RCA: 158] [Impact Index Per Article: 13.2] [Indexed: 01/10/2023]
Abstract
Metabolic pathways may seem arbitrary and unnecessarily complex. In many cases, a chemist might devise a simpler route for the biochemical transformation, so why has nature chosen such complex solutions? In this review, we distill lessons from a century of metabolic research and introduce new observations suggesting that the intricate structure of metabolic pathways can be explained by a small set of biochemical principles. Using glycolysis as an example, we demonstrate how three key biochemical constraints--thermodynamic favorability, availability of enzymatic mechanisms and the physicochemical properties of pathway intermediates--eliminate otherwise plausible metabolic strategies. Considering these constraints, glycolysis contains no unnecessary steps and represents one of the very few pathway structures that meet cellular demands. The analysis presented here can be applied to metabolic engineering efforts for the rational design of pathways that produce a desired product while satisfying biochemical constraints.
Affiliation(s)
- Arren Bar-Even
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
|
60
|
Nath N, Mitchell JBO. Is EC class predictable from reaction mechanism? BMC Bioinformatics 2012; 13:60. [PMID: 22530800 PMCID: PMC3368749 DOI: 10.1186/1471-2105-13-60] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 02/22/2012] [Accepted: 04/24/2012] [Indexed: 11/10/2022] Open
Abstract
Background
We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used.
Results
The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones.
Conclusions
Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways.
The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.
Affiliation(s)
- Neetika Nath
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK
|
61
|
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Laskowski RA, Orengo CA, Thornton JM. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol 2012; 8:e1002403. [PMID: 22396634 PMCID: PMC3291543 DOI: 10.1371/journal.pcbi.1002403] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 09/30/2011] [Accepted: 01/09/2012] [Indexed: 11/18/2022] Open
Abstract
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life. Enzymes, as biological catalysts, are crucial to life. Understanding how enzymes have evolved to perform the wide variety of reactions found across all kingdoms of life is fundamental to a broad range of biological studies, especially those leading to new therapeutics.
Unravelling the evolution of novel enzyme function requires combining information on protein structure, sequence, phylogeny and chemistry (in terms of interacting small molecules and reaction mechanisms). We have developed a protocol for integrating this wide range of data, which we have applied to a relatively large number of families comprising some very diverse relatives. This has permitted us to present an initial overview of the evolution of novel enzyme functions, in which we observe that some changes in function between relatives are more common than others, with most of the functionality observed in nature confined to relatively few families. Moreover, we are able to identify the evolutionary route taken within a superfamily to change the enzyme function from one reaction to another. This information may help in predicting the function of an enzyme that has yet to be experimentally characterised as well as in designing new enzymes for industrial and medical purposes.
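The "matrix of observed exchanges" mentioned in this abstract can be pictured as a tally of function changes along tree branches. The sketch below is an illustration only, not the authors' protocol: it assumes each branch is already reduced to a hypothetical (ancestor EC, descendant EC) pair, counts changes at the primary E.C. level, and assumes the six primary classes in use when the paper was published. The example pairs are invented.

```python
from collections import Counter


def ec_class(ec_number):
    """Primary E.C. class, i.e. the first field of a number such as '3.4.21.4'."""
    return int(ec_number.split(".")[0])


def exchange_matrix(branches, n_classes=6):
    """Count ancestor -> descendant transitions between primary E.C. classes.

    Row i, column j (zero-based) holds the number of branches going from
    class i + 1 to class j + 1; the diagonal holds within-class changes.
    """
    counts = Counter((ec_class(a), ec_class(b)) for a, b in branches)
    return [
        [counts[(i, j)] for j in range(1, n_classes + 1)]
        for i in range(1, n_classes + 1)
    ]


# Invented (ancestor, descendant) pairs for illustration.
branches = [
    ("1.1.1.1", "1.1.1.2"),
    ("3.4.21.4", "2.7.11.1"),
    ("3.1.3.1", "3.1.4.1"),
    ("5.3.1.1", "4.2.1.11"),
]
matrix = exchange_matrix(branches)
for row in matrix:
    print(row)
```

Off-diagonal entries correspond to changes in overall chemistry at the primary level, which the abstract reports as less common than changes in substrate.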
Affiliation(s)
- Nicholas Furnham
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
|