1
Malvagia S, Ferri L, Della Bona M, Borsini W, Cirami CL, Dervishi E, Feriozzi S, Gasperini S, Motta S, Mignani R, Trezzi B, Pieruzzi F, Morrone A, Daniotti M, Donati MA, la Marca G. Multicenter evaluation of use of dried blood spot compared to conventional plasma in measurements of globotriaosylsphingosine (LysoGb3) concentration in 104 Fabry patients. Clin Chem Lab Med 2021; 59:1516-1526. [PMID: 33915609 DOI: 10.1515/cclm-2021-0316] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/16/2021] [Accepted: 04/20/2021] [Indexed: 12/28/2022]
Abstract
OBJECTIVES Fabry disease (FD) is an X-linked lysosomal storage disorder, resulting from a deficiency of the enzyme α-galactosidase A, responsible for breaking down glycolipids such as globotriaosylceramide and its deacylated derivative, globotriaosylsphingosine (LysoGb3). Here, we compare the levels of LysoGb3 in dried blood spots (DBS) and plasma in patients with classic and late-onset phenotypes. METHODS LysoGb3 measurements were performed in 104 FD patients, 39 males and 65 females. Venous blood was collected. A portion was spotted onto filter paper and another portion separated to obtain plasma. The LysoGb3 concentrations in DBS and plasma were determined by highly sensitive electrospray ionization liquid chromatography tandem mass spectrometry. Agreement between different matrices was assessed using linear regression and Bland Altman analysis. RESULTS The method on DBS was validated by evaluating its precision, accuracy, matrix effect, recovery, and stability. The analytical performances were verified by comparison of a total of 104 paired DBS and plasma samples from as many FD patients (representing 46 GLA variants). There was a strong correlation between plasma and the corresponding DBS LysoGb3 concentrations, with few exceptions. Discrepancies were observed in anemic patients with typically low hematocrit levels compared to the normal range. CONCLUSIONS The method proved to be efficient for the rapid analysis of LysoGb3. DBS provides a convenient, sensitive, and reproducible method for measuring LysoGb3 levels for diagnosis, initial phenotypic assignment, and therapeutic monitoring in patients with FD.
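The Bland-Altman agreement analysis named in the METHODS can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `bland_altman` and the sample values are hypothetical, and the 1.96 multiplier assumes the conventional 95% limits of agreement.

```python
from statistics import mean, stdev


def bland_altman(dbs, plasma):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(dbs) != len(plasma) or len(dbs) < 2:
        raise ValueError("need at least two paired measurements")
    # Per-sample difference between the two matrices.
    diffs = [d - p for d, p in zip(dbs, plasma)]
    bias = mean(diffs)
    # Conventional 95% limits of agreement: bias +/- 1.96 sample SDs.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired LysoGb3 values; these are NOT data from the study.
dbs_values = [2.0, 4.0, 6.0, 8.0]
plasma_values = [1.0, 5.0, 5.0, 9.0]
bias, lower, upper = bland_altman(dbs_values, plasma_values)
```

A bias near zero with narrow limits would indicate that the two matrices can be used interchangeably. Samples falling outside the limits are the kind of discrepancy the abstract reports for anemic patients.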
Affiliation(s)
- Sabrina Malvagia
  - Newborn Screening, Clinical Chemistry and Pharmacology Lab, Meyer Children's University Hospital, Florence, Italy
- Lorenzo Ferri
  - Molecular and Cell Biology Laboratory of Neurometabolic Diseases, Neuroscience Department, Meyer Children's Hospital, Florence, Italy
- Maria Della Bona
  - Newborn Screening, Clinical Chemistry and Pharmacology Lab, Meyer Children's University Hospital, Florence, Italy
- Egrina Dervishi
  - Nephrology Dialysis Transplant Unit, Careggi Hospital, Florence, Italy
- Sandro Feriozzi
  - Nephrology and Dialysis Unit, Belcolle Hospital, Viterbo, Italy
- Serena Gasperini
  - Pediatric Rare Diseases Unit, Department of Pediatrics, MBBM Foundation, San Gerardo Hospital, Monza, Italy
- Serena Motta
  - Pediatric Rare Diseases Unit, Department of Pediatrics, MBBM Foundation, San Gerardo Hospital, Monza, Italy
- Renzo Mignani
  - Department of Nephrology, Infermi Hospital, Rimini, Italy
- Barbara Trezzi
  - Clinical Nephrology, School of Medicine and Surgery, University of Milano, Milan, Italy
- Federico Pieruzzi
  - Clinical Nephrology, School of Medicine and Surgery, University of Milano-Bicocca and Nephrology and Dialysis Unit, ASST-Monza San Gerardo Hospital, Monza, Italy
- Amelia Morrone
  - Molecular and Cell Biology Laboratory of Neurometabolic Diseases, Neuroscience Department, Meyer Children's Hospital, Florence, Italy
  - Department of Neurofarba, University of Florence, Florence, Italy
- Marta Daniotti
  - Metabolic Disease Unit, Meyer Children's University Hospital, Florence, Italy
- Maria Alice Donati
  - Metabolic Disease Unit, Meyer Children's University Hospital, Florence, Italy
- Giancarlo la Marca
  - Newborn Screening, Clinical Chemistry and Pharmacology Lab, Meyer Children's University Hospital, Florence, Italy
  - Department of Experimental and Clinical Biomedical Sciences, University of Florence, Florence, Italy
2
van den Bergh T, Tamo G, Nobili A, Tao Y, Tan T, Bornscheuer UT, Kuipers RKP, Vroling B, de Jong RM, Subramanian K, Schaap PJ, Desmet T, Nidetzky B, Vriend G, Joosten HJ. CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 2017; 12:e0176427. [PMID: 28545124 PMCID: PMC5436653 DOI: 10.1371/journal.pone.0176427] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/15/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022]
Abstract
CorNet is a web-based tool for the analysis of co-evolving residue positions in protein super-family sequence alignments. CorNet projects external information, such as mutation data extracted from the literature, onto interactively displayed groups of co-evolving residue positions to shed light on the functions associated with these groups and the residues in them. We used CorNet to analyse six enzyme super-families and found that groups of strongly co-evolving residues tend to consist of residues involved in the same function, such as activity, specificity, co-factor binding, or enantioselectivity. This finding allows a function to be assigned to residues for which no data are available yet in the literature. A mutant library was designed to mutate residues observed in a group of co-evolving residues predicted to be involved in enantioselectivity, but for which no literature data are available yet. The resulting set of mutations indeed showed many instances of increased enantioselectivity.
Affiliation(s)
- Tom van den Bergh
  - Bio-Prodict, Nijmegen, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
- Alberto Nobili
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
- Yifeng Tao
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
  - Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
- Tianwei Tan
  - Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
- Uwe T. Bornscheuer
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
- Peter J. Schaap
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
- Tom Desmet
  - Centre for Industrial Biotechnology and Biocatalysis, Ghent University, Ghent, Belgium
- Bernd Nidetzky
  - Institute of Biotechnology and Biochemical Engineering, Graz University of Technology, Graz, Austria
- Henk-Jan Joosten
  - Bio-Prodict, Nijmegen, The Netherlands
  - CMBI, Radboudumc, Nijmegen, The Netherlands
3
Singhal A, Simmons M, Lu Z. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine. PLoS Comput Biol 2016; 12:e1005017. [PMID: 27902695 PMCID: PMC5130168 DOI: 10.1371/journal.pcbi.1005017] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 02/09/2016] [Accepted: 06/04/2016] [Indexed: 11/23/2022]
Abstract
The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient’s genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer’s disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. 
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships. To provide personalized health care it is important to understand patients’ genomic variations and the effect these variants have in protecting or predisposing patients to disease. Several projects aim at providing this information by manually curating such genotype-phenotype relationships in organized databases using data from clinical trials and biomedical literature. However, the exponentially increasing size of biomedical literature and the limited ability of manual curators to discover the genotype-phenotype relationships “hidden” in text has led to delays in keeping such databases updated with the current findings. The result is a bottleneck in leveraging valuable information that is currently available to develop personalized health care solutions. In the past, a few computational techniques have attempted to speed up the curation efforts by using text mining techniques to automatically mine genotype-phenotype information from biomedical literature. However, such computational approaches have not been able to achieve accuracy levels sufficient to make them appealing for practical use. In this work, we present a highly accurate machine-learning-based text mining approach for mining complete genotype-phenotype relationships from biomedical literature. We test the performance of this approach on ten well-known diseases and demonstrate the validity of our approach and its potential utility for practical purposes. 
We are currently working towards generating genotype-phenotype relationships for all PubMed data with the goal of developing an exhaustive database of all the known diseases in life science. We believe that this work will provide very important and needed support for implementation of personalized health care using genomic data.
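The F1-measure quoted in the abstract is the harmonic mean of precision and recall, and the reported gain is a relative improvement over the baseline. The sketch below is only an illustration of that arithmetic; the function names are hypothetical. Note that the rounded scores 0.62 and 0.79 give a relative gain of roughly 27%, so the 28% in the abstract presumably reflects unrounded values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline, new):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Rounded F1 values as quoted in the abstract.
gain = relative_improvement(0.62, 0.79)
```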
Affiliation(s)
- Ayush Singhal
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Michael Simmons
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
4
Knight AM, Nobili A, van den Bergh T, Genz M, Joosten HJ, Albrecht D, Riedel K, Pavlidis IV, Bornscheuer UT. Bioinformatic analysis of fold-type III PLP-dependent enzymes discovers multimeric racemases. Appl Microbiol Biotechnol 2016; 101:1499-1507. [DOI: 10.1007/s00253-016-7940-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/10/2016] [Revised: 10/09/2016] [Accepted: 10/12/2016] [Indexed: 10/20/2022]
5
Buchholz PCF, Vogel C, Reusch W, Pohl M, Rother D, Spieß AC, Pleiss J. BioCatNet: A Database System for the Integration of Enzyme Sequences and Biocatalytic Experiments. Chembiochem 2016; 17:2093-2098. [DOI: 10.1002/cbic.201600462] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/22/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Patrick C. F. Buchholz
  - Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
- Constantin Vogel
  - Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
- Waldemar Reusch
  - Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
- Martina Pohl
  - IBG-1: Biotechnology; Forschungszentrum Jülich GmbH; 52425 Jülich Germany
- Dörte Rother
  - IBG-1: Biotechnology; Forschungszentrum Jülich GmbH; 52425 Jülich Germany
- Antje C. Spieß
  - Institute of Biochemical Engineering; Technical University of Braunschweig; Rebenring 56 38106 Braunschweig Germany
  - RWTH Aachen University; AVT.EPT; Worringerweg 1 52074 Aachen Germany
- Jürgen Pleiss
  - Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
6
Singhal A, Simmons M, Lu Z. Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature. J Am Med Inform Assoc 2016; 23:766-72. [PMID: 27121612 DOI: 10.1093/jamia/ocw041] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 09/15/2015] [Accepted: 02/19/2016] [Indexed: 11/14/2022]
Abstract
OBJECTIVE Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine. MATERIALS AND METHODS We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora. RESULTS The performance of the proposed approach was assessed using a benchmarking dataset. Results show that our proposed approach gains significant improvement over the previous state of the art and obtains F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively. DISCUSSION To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature. CONCLUSIONS The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale.
Affiliation(s)
- Ayush Singhal
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
- Michael Simmons
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
7
Gricman Ł, Vogel C, Pleiss J. Identification of universal selectivity-determining positions in cytochrome P450 monooxygenases by systematic sequence-based literature mining. Proteins 2015; 83:1593-603. [DOI: 10.1002/prot.24840] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 04/23/2015] [Revised: 05/22/2015] [Accepted: 05/26/2015] [Indexed: 12/21/2022]
Affiliation(s)
- Łukasz Gricman
  - Institute of Technical Biochemistry, University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
- Constantin Vogel
  - Institute of Technical Biochemistry, University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
- Jürgen Pleiss
  - Institute of Technical Biochemistry, University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
8
Steffen-Munsberg F, Vickers C, Kohls H, Land H, Mallin H, Nobili A, Skalden L, van den Bergh T, Joosten HJ, Berglund P, Höhne M, Bornscheuer UT. Bioinformatic analysis of a PLP-dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol Adv 2015; 33:566-604. [PMID: 25575689 DOI: 10.1016/j.biotechadv.2014.12.012] [Citation(s) in RCA: 159] [Impact Index Per Article: 17.7] [Received: 11/17/2014] [Revised: 12/16/2014] [Accepted: 12/17/2014] [Indexed: 01/25/2023]
Abstract
In this review we analyse structure/sequence-function relationships for the superfamily of PLP-dependent enzymes with special emphasis on class III transaminases. Amine transaminases are highly important for applications in biocatalysis in the synthesis of chiral amines. In addition, other enzyme activities such as racemases or decarboxylases are also discussed. The substrate scope and the ability to accept chemically different types of substrates are shown to be reflected in conserved patterns of amino acids around the active site. These findings are condensed in a sequence-function matrix, which facilitates annotation and identification of biocatalytically relevant enzymes and protein engineering thereof.
Affiliation(s)
- Fabian Steffen-Munsberg
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
  - KTH Royal Institute of Technology, School of Biotechnology, Division of Industrial Biotechnology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
- Clare Vickers
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
- Hannes Kohls
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
  - Protein Biochemistry, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
- Henrik Land
  - KTH Royal Institute of Technology, School of Biotechnology, Division of Industrial Biotechnology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
- Hendrik Mallin
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
- Alberto Nobili
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
- Lilly Skalden
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
- Tom van den Bergh
  - Bio-Prodict, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands
- Henk-Jan Joosten
  - Bio-Prodict, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands
- Per Berglund
  - KTH Royal Institute of Technology, School of Biotechnology, Division of Industrial Biotechnology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
- Matthias Höhne
  - Protein Biochemistry, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
- Uwe T Bornscheuer
  - Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
9
Sebestova E, Bendl J, Brezovsky J, Damborsky J. Computational tools for designing smart libraries. Methods Mol Biol 2014; 1179:291-314. [PMID: 25055786 DOI: 10.1007/978-1-4939-1053-3_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/03/2023]
Abstract
Traditional directed evolution experiments are often time-, labor- and cost-intensive because they involve repeated rounds of random mutagenesis and the selection or screening of large mutant libraries. The efficiency of directed evolution experiments can be significantly improved by targeting mutagenesis to a limited number of hot-spot positions and/or selecting a limited set of substitutions. The design of such "smart" libraries can be greatly facilitated by in silico analyses and predictions. Here we provide an overview of computational tools applicable for (a) the identification of hot-spots for engineering enzyme properties, and (b) the evaluation of predicted hot-spots and selection of suitable amino acids for substitutions. The selected tools do not require any specific expertise and can easily be implemented by the wider scientific community.
Affiliation(s)
- Eva Sebestova
  - Loschmidt Laboratories, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
10
Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol 2013; 425:4047-63. [PMID: 23962656 PMCID: PMC3807015 DOI: 10.1016/j.jmb.2013.08.008] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Received: 06/04/2013] [Revised: 08/07/2013] [Accepted: 08/08/2013] [Indexed: 12/26/2022]
Abstract
Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations producing clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinning of disease can lead to development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, accessible only by manual curation or by sophisticated text-mining technology that extracts the relevant information. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.
Affiliation(s)
- Thomas A Peterson
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Emily Doughty
  - Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
- Maricel G Kann
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
11
Thomas AS, Mehta AB. Difficulties and barriers in diagnosing Fabry disease: what can be learnt from the literature? Expert Opin Med Diagn 2013; 7:589-99. [PMID: 24128193 DOI: 10.1517/17530059.2013.846322] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 02/01/2023]
Abstract
INTRODUCTION Fabry disease (FD) is an X-linked disorder of glycosphingolipid metabolism caused by deficiency of the lysosomal enzyme alpha galactosidase A. Clinical features include neuropathic pain, rash, proteinuria, renal failure, stroke and cardiomyopathy, accompanied by a reduced life expectancy. Patients report an average delay of > 10 years between symptom onset and diagnosis. Newborn screening studies suggest a much higher prevalence than that found in population studies, supporting the notion that FD is under-diagnosed. AREAS COVERED Four key challenges in the diagnosis of FD and strategies to overcome them are discussed. The clinical features of FD are highly heterogeneous, resulting in patients presenting to many different specialists, often with non-specific symptoms with a wide differential diagnosis. The pathophysiological mechanisms underlying this are poorly understood and the prediction of pathogenicity on the basis of gene mutation analysis can be problematic. While the availability of treatment adds an impetus to make the correct diagnosis, our understanding of when and if treatment may be required in a specific individual is incomplete. EXPERT OPINION Improving diagnostic rates of FD requires a greater awareness of the disorder among physicians to whom patients may present, new strategies to determine the pathogenicity of novel mutations and a greater understanding of the natural history of FD across the phenotypic spectrum. Collaborative clinical and laboratory research is vital in furthering knowledge of the underlying mechanisms of this disorder and how they may be impacted by current or future therapies.
Affiliation(s)
- Alison S Thomas
  - Royal Free Hospital and University College London Medical School, Lysosomal Storage Disorders Unit, London NW3 2QG, UK
12
Vohra S, Biggin PC. MutationMapper: a tool to aid the mapping of protein mutation data. PLoS One 2013; 8:e71711. [PMID: 23951226 PMCID: PMC3739722 DOI: 10.1371/journal.pone.0071711] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 03/17/2013] [Accepted: 07/01/2013] [Indexed: 12/25/2022]
Abstract
There has been a rapid increase in the amount of mutational data due to, amongst other things, an increase in single nucleotide polymorphism (SNP) data and the use of site-directed mutagenesis as a tool to help dissect out functional properties of proteins. Many manually curated databases have been developed to index point mutations but they are not sustainable with the ever-increasing volume of scientific literature. There have been considerable efforts in the automatic extraction of mutation-specific information from raw text using various text-mining approaches. However, one of the key problems is to link each mutation with its associated protein and to present these data in such a way that researchers can immediately contextualize them within a structurally related family of proteins. To aid this process, we have developed an application called MutationMapper. Point mutations are extracted from abstracts and are validated against protein sequences in UniProt as far as possible. Our methodology differs in a fundamental way from the usual text-mining approach. Rather than start with abstracts, we start with protein sequences, which greatly facilitates the process of validating a potential point mutation identified in an abstract. The results are displayed as mutations mapped on to the protein sequence or a multiple sequence alignment. The latter enables one to readily pick up mutations performed at equivalent positions in related proteins. We demonstrate the use of MutationMapper against several examples including a single sequence and multiple sequence alignments. The application is available as a web-service at http://mutationmapper.bioch.ox.ac.uk.
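The validation step described in this abstract, checking that a mutation found in text is consistent with a known protein sequence, can be sketched as follows. This is an assumption-laden illustration and not MutationMapper's implementation: the regular expression covers only simple one-letter point mutations such as "A123T", and the sequence, abstract text, and function names are hypothetical.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter substituted residue.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples mentioned in the text."""
    return [(m.group(1), int(m.group(2)), m.group(3))
            for m in POINT_MUTATION.finditer(text)]


def is_consistent(mutation, sequence):
    """True if the wild-type residue matches the sequence at that position."""
    wild_type, position, _ = mutation
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type


# Hypothetical protein sequence and abstract text, for illustration only.
sequence = "MKTAYIAKQR"
abstract = "The K2A and Y5F variants lost activity, whereas W3G was inactive."

mutations = find_point_mutations(abstract)
validated = [m for m in mutations if is_consistent(m, sequence)]
```

Here "W3G" is rejected because position 3 of the hypothetical sequence is T, not W. A mismatch of this kind suggests the mention belongs to a different protein or uses a different numbering scheme.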
Affiliation(s)
- Shabana Vohra
  - Structural Bioinformatics and Computational Biochemistry, University of Oxford, Oxford, United Kingdom
  - Oxford Centre for Integrative Systems Biology, Department of Biochemistry, Oxford, United Kingdom
- Philip C. Biggin
  - Structural Bioinformatics and Computational Biochemistry, University of Oxford, Oxford, United Kingdom
  - Oxford Centre for Integrative Systems Biology, Department of Biochemistry, Oxford, United Kingdom
13
Verspoor K, Jimeno Yepes A, Cavedon L, McIntosh T, Herten-Crabb A, Thomas Z, Plazzer JP. Annotating the biomedical literature for the human variome. Database (Oxford) 2013; 2013:bat019. [PMID: 23584833 PMCID: PMC3676157 DOI: 10.1093/database/bat019] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
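The difference between exact and relaxed boundary matching when scoring inter-annotator agreement can be sketched as follows. This is an illustration only: the abstract does not define the relaxed criterion, so treating any character overlap between two same-type spans as a match is an assumption, as are the function name and the `(start, end, type)` span layout.

```python
def f_score(annotator_a, annotator_b, relaxed=False):
    """F-score between two annotators' span sets.

    Each span is a (start, end, entity_type) tuple with an exclusive end.
    Annotator A is treated as the reference (an arbitrary choice; the
    F-score is symmetric under swapping the two annotators).
    """

    def matches(x, y):
        if x[2] != y[2]:
            return False
        if relaxed:
            # Assumed relaxed criterion: the two spans overlap at all.
            return x[0] < y[1] and y[0] < x[1]
        return x[0] == y[0] and x[1] == y[1]

    if not annotator_a or not annotator_b:
        return 0.0
    matched_a = sum(any(matches(a, b) for b in annotator_b) for a in annotator_a)
    matched_b = sum(any(matches(b, a) for a in annotator_a) for b in annotator_b)
    precision = matched_b / len(annotator_b)
    recall = matched_a / len(annotator_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the annotators agree on the gene span but disagree by one
# character on where the mutation span starts.
spans_a = [(0, 5, "gene"), (10, 18, "mutation")]
spans_b = [(0, 5, "gene"), (11, 18, "mutation")]
exact = f_score(spans_a, spans_b)
overlap = f_score(spans_a, spans_b, relaxed=True)
```

In the toy example the off-by-one boundary counts as a disagreement under exact matching and as agreement under the relaxed criterion, which is why relaxed scores can only be equal to or higher than exact ones.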
Affiliation(s)
- Karin Verspoor
- National ICT Australia (NICTA), Victoria Research Laboratory, Level 2, Building 193, The University of Melbourne, Parkville VIC 3010, Australia.
|
14
|
Wei CH, Harris BR, Kao HY, Lu Z. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics 2013; 29:1433-9. [PMID: 23564842 DOI: 10.1093/bioinformatics/btt156] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION Text-mining mutation information from the literature has become a critical part of the bioinformatics approach for the analysis and interpretation of sequence variations in complex diseases in the post-genomic era. It has also been used for assisting the creation of disease-related mutation databases. Most existing approaches are rule-based and focus on limited types of sequence variations, such as protein point mutations. Thus, extending their extraction scope requires significant manual effort in examining new instances and developing corresponding rules. As such, new automatic approaches are greatly needed for extracting different kinds of mutations with high accuracy. RESULTS Here, we report tmVar, a text-mining approach based on conditional random field (CRF) for extracting a wide range of sequence variants described at protein, DNA and RNA levels according to a standard nomenclature developed by the Human Genome Variation Society. By doing so, we cover several important types of mutations that were not considered in past studies. Using a novel CRF label model and feature set, our method achieves higher performance than a state-of-the-art method on both our corpus (91.4 versus 78.1% in F-measure) and their own gold standard (93.9 versus 89.4% in F-measure). These results suggest that tmVar is a high-performance method for mutation extraction from biomedical literature. AVAILABILITY tmVar software and its corpus of 500 manually curated abstracts are available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/pub/tmVar
Affiliation(s)
- Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), 8600 Rockville Pike, Bethesda, MD 20894, USA
|
15
|
Gyimesi G, Borsodi D, Sarankó H, Tordai H, Sarkadi B, Hegedűs T. ABCMdb: a database for the comparative analysis of protein mutations in ABC transporters, and a potential framework for a general application. Hum Mutat 2012; 33:1547-56. [PMID: 22693078 DOI: 10.1002/humu.22138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 03/12/2012] [Accepted: 05/29/2012] [Indexed: 11/08/2022]
Abstract
To overcome the pathological phenomena caused by altered function of ABC (ATP Binding Cassette) proteins, their mechanisms of action are extensively investigated, often involving the design of mutant constructs for experiments. Designing mutagenetic constructs, interpreting the result of mutagenetic experiments, and finding individual genetic variants require an extensive knowledge of previously published mutations. To aid the recapitulation of mutations described in the literature, we set up a database of ABC protein mutations (ABCMdb) extracted from full-text papers using an automatic mining approach. We have also developed a Web application interface to compare mutations in different ABC proteins using sequence alignments and to interactively map the mutations to 3D structural models. Currently our database contains protein mutations published for ABCB1, ABCB11, ABCC1, ABCC6, ABCC7, and the proteins of the ABCG subfamily. The database will be extended to include other members and subfamilies, and to provide information on whether or not a mutation is disease causing, represents a high-incidence polymorphism, or was generated only in vitro. The ABCMdb database should already help to compare the effects of mutations at homologous positions in different ABC proteins, and its interactive tools aim to advance the design of experiments for a wider range of proteins.
Affiliation(s)
- Gergely Gyimesi
- Membrane Research Group, Hungarian Academy of Sciences, Budapest, Hungary
|
16
|
Ebrahim HY, Baker RJ, Mehta AB, Hughes DA. Functional analysis of variant lysosomal acid glycosidases of Anderson-Fabry and Pompe disease in a human embryonic kidney epithelial cell line (HEK 293 T). J Inherit Metab Dis 2012; 35:325-34. [PMID: 21972175 DOI: 10.1007/s10545-011-9395-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/02/2011] [Revised: 09/05/2011] [Accepted: 09/08/2011] [Indexed: 11/30/2022]
Abstract
The functional significance of missense mutations in genes encoding acid glycosidases of lysosomal storage disorders (LSDs) is not always clear. Here we describe a method of investigating functional properties of variant enzymes in vitro using a human embryonic kidney epithelial cell line. Site-directed mutagenesis was performed on the parental plasmids containing cDNA encoding alpha-galactosidase A (α-Gal A) and acid maltase (α-Glu) to prepare plasmids encoding relevant point mutations. Mutant plasmids were transfected into HEK 293 T cells, and transient over-expression of variant enzymes was measured after 3 days. We have illustrated the method by examining enzymatic activities of four unknown α-Gal A variants and one α-Glu variant identified in our patients with Anderson-Fabry and Pompe disease, respectively. Comparison with control variants known to be either pathogenic or non-pathogenic, together with over-expression of wild-type enzyme, allowed determination of the pathogenicity of the mutation. One novel leader sequence variant of α-Gal A (p.A15T) was shown not to significantly reduce enzyme activity, whereas three other novel α-Gal A variants (p.D93Y, p.L372P and p.T410I) were shown to be pathogenic as they resulted in significant reduction of enzyme activity. A novel α-Glu variant (p.L72R) was shown to be pathogenic as it significantly reduced enzyme activity. Certain acid glycosidase variants that have been described in association with late-onset LSDs and which are known to have variable residual plasma and leukocyte enzyme activity in patients appear to show intermediate to low enzyme activity (p.N215S and p.Q279E α-Gal A respectively) in the over-expression system.
Affiliation(s)
- Hatim Y Ebrahim
- Department of Haematology, Royal Free Campus, University College London Medical School, Rowland Hill Street, London, NW3 2PF, UK
|
17
|
Seddon G, Lounnas V, McGuire R, van den Bergh T, Bywater RP, Oliveira L, Vriend G. Drug design for ever, from hype to hope. J Comput Aided Mol Des 2012; 26:137-50. [PMID: 22252446 PMCID: PMC3268973 DOI: 10.1007/s10822-011-9519-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/22/2011] [Accepted: 12/05/2011] [Indexed: 01/28/2023]
Abstract
In its first 25 years JCAMD has been disseminating a large number of techniques aimed at finding better medicines faster. These include genetic algorithms, COMFA, QSAR, structure based techniques, homology modelling, high throughput screening, combichem, and dozens more that were hyped in their time and that now are just a useful addition to the drug designer's toolbox. Despite massive efforts throughout academic and industrial drug design research departments, the number of FDA-approved new molecular entities per year stagnates, and the pharmaceutical industry is reorganising accordingly. The recent spate of industrial consolidations and the concomitant move towards outsourcing of research activities require better integration of all activities along the chain from bench to bedside. The next 25 years will undoubtedly show a series of translational science activities that are aimed at better communication between all parties involved, from quantum chemistry to bedside and from academia to industry. This will above all include understanding the underlying biological problem and optimal use of all available data.
Affiliation(s)
- V. Lounnas
- CMBI, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26–28, 6525 GA Nijmegen, The Netherlands
- R. McGuire
- BioAxis Research, Bergse Heihoek 56, Berghem, 5351 SL The Netherlands
- T. van den Bergh
- Bio-Prodict, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- L. Oliveira
- Sao Paulo Federal University (UNIFESP), Sao Paulo, Brazil
- G. Vriend
- CMBI, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26–28, 6525 GA Nijmegen, The Netherlands
|
18
|
Celli J, Dalgleish R, Vihinen M, Taschner PEM, den Dunnen JT. Curating gene variant databases (LSDBs): toward a universal standard. Hum Mutat 2011; 33:291-7. [PMID: 21990126 DOI: 10.1002/humu.21626] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 06/22/2011] [Accepted: 09/21/2011] [Indexed: 01/27/2023]
Abstract
Gene variant databases or Locus-Specific DataBases (LSDBs) are used to collect and display information on sequence variants on a gene-by-gene basis. Their most frequent use is in relation to DNA-based diagnostics, giving clinicians and scientists easy access to an up-to-date overview of all gene variants identified worldwide and whether they influence the function of the gene ("pathogenic or not"). While literature on gene variant databases is extensive, little has been published on the process of database curation itself. Based on our extensive experience as LSDB curators and our contributions to database curation courses, we discuss the subject of database curation. We describe the tasks involved, the steps to take, and the issues that might occur. Our overview is a first step toward establishing overall guidelines for database curation and ultimately covers one aspect of establishing quality-assured gene variant databases.
Affiliation(s)
- Jacopo Celli
- Human and Clinical Genetics, Leiden University Medical Center, Leiden, Netherlands
|
19
|
Stenson PD, Cooper DN. Prospects for the automated extraction of mutation data from the scientific literature. Hum Genomics 2011; 5:1-4. [PMID: 21106485 PMCID: PMC3500153 DOI: 10.1186/1479-7364-5-1-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
|
20
|
Tong MY, Cassa CA, Kohane IS. Automated validation of genetic variants from large databases: ensuring that variant references refer to the same genomic locations. Bioinformatics 2011; 27:891-3. [PMID: 21258063 PMCID: PMC3051330 DOI: 10.1093/bioinformatics/btr029] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
SUMMARY Accurate annotations of genomic variants are necessary to achieve full-genome clinical interpretations that are scientifically sound and medically relevant. Many disease associations, especially those reported before the completion of the HGP, are limited in applicability because of potential inconsistencies with our current standards for genomic coordinates, nomenclature and gene structure. In an effort to validate and link variants from the medical genetics literature to an unambiguous reference for each variant, we developed a software pipeline and reviewed 68 641 single amino acid mutations from Online Mendelian Inheritance in Man (OMIM), Human Gene Mutation Database (HGMD) and dbSNP. The frequency of unresolved mutation annotations varied widely among the databases, ranging from 4 to 23%. A taxonomy of primary causes for unresolved mutations was produced. AVAILABILITY This program is freely available from the web site (http://safegene.hms.harvard.edu/aa2nt/).
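One way to picture how a single amino acid mutation might be linked to an unambiguous nucleotide location is sketched below. This is not the aa2nt pipeline described in the abstract: the function name, the use of a plain coding sequence as the reference, and the restriction to single-nucleotide changes are assumptions made for illustration. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def resolve_mutation(cds, ref_aa, position, alt_aa):
    """Map an amino acid mutation onto single-nucleotide changes in a CDS.

    Returns a list of (cds_position, ref_base, alt_base) tuples with 1-based
    positions. An empty list means the mutation is unresolved: the stated
    reference residue does not match the sequence, or no single-nucleotide
    change produces the stated mutant residue.
    """
    if position < 1:
        return []
    start = 3 * (position - 1)
    codon = cds[start:start + 3].upper()
    if len(codon) != 3 or CODON_TABLE.get(codon) != ref_aa:
        return []
    changes = []
    for offset in range(3):
        for base in BASES:
            if base == codon[offset]:
                continue
            mutated = codon[:offset] + base + codon[offset + 1:]
            if CODON_TABLE[mutated] == alt_aa:
                changes.append((start + offset + 1, codon[offset], base))
    return changes


# Toy coding sequence ATG GCT AAA (Met-Ala-Lys); values are made up.
cds = "ATGGCTAAA"
resolved = resolve_mutation(cds, "A", 2, "T")
unresolved = resolve_mutation(cds, "G", 2, "T")
```

A reference-residue mismatch, as in the second call, is one plausible reason a database annotation would end up unresolved; the abstract reports a taxonomy of such causes without listing them.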
Affiliation(s)
- Mark Y Tong
- Harvard Medical School, Boston, MA 02115, USA.
|
21
|
Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics 2010; 27:408-15. [PMID: 21138947 DOI: 10.1093/bioinformatics/btq667] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations. RESULTS We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool to extract point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases. DISCUSSION Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles. AVAILABILITY Freely available at: http://bioinf.umbc.edu/EMU/ftp.
Affiliation(s)
- Emily Doughty
- University of Maryland, Baltimore County, Baltimore, MD 21250, USA
|