Reference Citation Analysis

For: Kuipers R, van den Bergh T, Joosten HJ, Lekanne dit Deprez RH, Mannens MM, Schaap PJ. Novel tools for extraction and validation of disease-related mutations applied to Fabry disease. Hum Mutat 2010;31:1026-32. [PMID: 20629180 DOI: 10.1002/humu.21317] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]

Cited by Other Article(s)

Malvagia S, Ferri L, Della Bona M, Borsini W, Cirami CL, Dervishi E, Feriozzi S, Gasperini S, Motta S, Mignani R, Trezzi B, Pieruzzi F, Morrone A, Daniotti M, Donati MA, la Marca G. Multicenter evaluation of use of dried blood spot compared to conventional plasma in measurements of globotriaosylsphingosine (LysoGb3) concentration in 104 Fabry patients. Clin Chem Lab Med 2021;59:1516-1526. [PMID: 33915609 DOI: 10.1515/cclm-2021-0316] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/16/2021] [Accepted: 04/20/2021] [Indexed: 12/28/2022]

Affiliation(s)

Sabrina Malvagia: Newborn Screening, Clinical Chemistry and Pharmacology Lab, Meyer Children's University Hospital, Florence, Italy
Lorenzo Ferri: Molecular and Cell Biology Laboratory of Neurometabolic Diseases, Neuroscience Department, Meyer Children's Hospital, Florence, Italy
Maria Della Bona: Newborn Screening, Clinical Chemistry and Pharmacology Lab, Meyer Children's University Hospital, Florence, Italy
Walter Borsini: Casa di Cura Villa Ulivella e Glicini, Florence, Italy
Calogero Lino Cirami: Nephrology Dialysis Transplant Unit, Careggi Hospital, Florence, Italy
Egrina Dervishi: Nephrology Dialysis Transplant Unit, Careggi Hospital, Florence, Italy
Sandro Feriozzi: Nephrology and Dialysis Unit, Belcolle Hospital, Viterbo, Italy
Serena Gasperini: Pediatric Rare Diseases Unit, Department of Pediatrics, MBBM Foundation, San Gerardo Hospital, Monza, Italy
Serena Motta: Pediatric Rare Diseases Unit, Department of Pediatrics, MBBM Foundation, San Gerardo Hospital, Monza, Italy
Renzo Mignani: Department of Nephrology, Infermi Hospital, Rimini, Italy
Barbara Trezzi: Clinical Nephrology, School of Medicine and Surgery, University of Milano, Milan, Italy
Federico Pieruzzi: Clinical Nephrology, School of Medicine and Surgery, University of Milano-Bicocca and Nephrology and Dialysis Unit, ASST-Monza San Gerardo Hospital, Monza, Italy
Amelia Morrone: Molecular and Cell Biology Laboratory of Neurometabolic Diseases, Neuroscience Department, Meyer Children's Hospital, Florence, Italy; Department of Neurofarba, University of Florence, Florence, Italy
Marta Daniotti: Metabolic Disease Unit, Meyer Children's University Hospital, Florence, Italy
Maria Alice Donati: Metabolic Disease Unit, Meyer Children's University Hospital, Florence, Italy
Giancarlo la Marca: Newborn Screening, Clinical Chemistry and Pharmacology Lab, Meyer Children's University Hospital, Florence, Italy; Department of Experimental and Clinical Biomedical Sciences, University of Florence, Florence, Italy

van den Bergh T, Tamo G, Nobili A, Tao Y, Tan T, Bornscheuer UT, Kuipers RKP, Vroling B, de Jong RM, Subramanian K, Schaap PJ, Desmet T, Nidetzky B, Vriend G, Joosten HJ. CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 2017;12:e0176427. [PMID: 28545124 PMCID: PMC5436653 DOI: 10.1371/journal.pone.0176427] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/15/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022]

Singhal A, Simmons M, Lu Z. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine. PLoS Comput Biol 2016;12:e1005017. [PMID: 27902695 PMCID: PMC5130168 DOI: 10.1371/journal.pcbi.1005017] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 02/09/2016] [Accepted: 06/04/2016] [Indexed: 11/23/2022]

Abstract

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient’s genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer’s disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F₁-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.

To provide personalized health care it is important to understand patients’ genomic variations and the effect these variants have in protecting or predisposing patients to disease. Several projects aim at providing this information by manually curating such genotype-phenotype relationships in organized databases using data from clinical trials and biomedical literature. However, the exponentially increasing size of biomedical literature and the limited ability of manual curators to discover the genotype-phenotype relationships “hidden” in text has led to delays in keeping such databases updated with the current findings. The result is a bottleneck in leveraging valuable information that is currently available to develop personalized health care solutions. In the past, a few computational techniques have attempted to speed up the curation efforts by using text mining techniques to automatically mine genotype-phenotype information from biomedical literature. However, such computational approaches have not been able to achieve accuracy levels sufficient to make them appealing for practical use. In this work, we present a highly accurate machine-learning-based text mining approach for mining complete genotype-phenotype relationships from biomedical literature. We test the performance of this approach on ten well-known diseases and demonstrate the validity of our approach and its potential utility for practical purposes. We are currently working towards generating genotype-phenotype relationships for all PubMed data with the goal of developing an exhaustive database of all the known diseases in life science. We believe that this work will provide very important and needed support for implementation of personalized health care using genomic data.

Knight AM, Nobili A, van den Bergh T, Genz M, Joosten HJ, Albrecht D, Riedel K, Pavlidis IV, Bornscheuer UT. Bioinformatic analysis of fold-type III PLP-dependent enzymes discovers multimeric racemases. Appl Microbiol Biotechnol 2016;101:1499-1507. [DOI: 10.1007/s00253-016-7940-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/10/2016] [Revised: 10/09/2016] [Accepted: 10/12/2016] [Indexed: 10/20/2022]

Buchholz PCF, Vogel C, Reusch W, Pohl M, Rother D, Spieß AC, Pleiss J. BioCatNet: A Database System for the Integration of Enzyme Sequences and Biocatalytic Experiments. Chembiochem 2016;17:2093-2098. [DOI: 10.1002/cbic.201600462] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/22/2016] [Indexed: 12/12/2022]

Singhal A, Simmons M, Lu Z. Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature. J Am Med Inform Assoc 2016;23:766-72. [PMID: 27121612 DOI: 10.1093/jamia/ocw041] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 09/15/2015] [Accepted: 02/19/2016] [Indexed: 11/14/2022]

Gricman Ł, Vogel C, Pleiss J. Identification of universal selectivity-determining positions in cytochrome P450 monooxygenases by systematic sequence-based literature mining. Proteins 2015;83:1593-603. [DOI: 10.1002/prot.24840] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 04/23/2015] [Revised: 05/22/2015] [Accepted: 05/26/2015] [Indexed: 12/21/2022]

Steffen-Munsberg F, Vickers C, Kohls H, Land H, Mallin H, Nobili A, Skalden L, van den Bergh T, Joosten HJ, Berglund P, Höhne M, Bornscheuer UT. Bioinformatic analysis of a PLP-dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol Adv 2015;33:566-604. [PMID: 25575689 DOI: 10.1016/j.biotechadv.2014.12.012] [Citation(s) in RCA: 159] [Impact Index Per Article: 17.7] [Received: 11/17/2014] [Revised: 12/16/2014] [Accepted: 12/17/2014] [Indexed: 01/25/2023]

Affiliation(s)

Fabian Steffen-Munsberg: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany; KTH Royal Institute of Technology, School of Biotechnology, Division of Industrial Biotechnology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
Clare Vickers: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
Hannes Kohls: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany; Protein Biochemistry, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
Henrik Land: KTH Royal Institute of Technology, School of Biotechnology, Division of Industrial Biotechnology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
Hendrik Mallin: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
Alberto Nobili: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
Lilly Skalden: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
Tom van den Bergh: Bio-Prodict, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands
Henk-Jan Joosten: Bio-Prodict, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands
Per Berglund: KTH Royal Institute of Technology, School of Biotechnology, Division of Industrial Biotechnology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
Matthias Höhne: Protein Biochemistry, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
Uwe T Bornscheuer: Dept. of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany

Sebestova E, Bendl J, Brezovsky J, Damborsky J. Computational tools for designing smart libraries. Methods Mol Biol 2014;1179:291-314. [PMID: 25055786 DOI: 10.1007/978-1-4939-1053-3_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/03/2023]

Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol 2013;425:4047-63. [PMID: 23962656 PMCID: PMC3807015 DOI: 10.1016/j.jmb.2013.08.008] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Received: 06/04/2013] [Revised: 08/07/2013] [Accepted: 08/08/2013] [Indexed: 12/26/2022]

Thomas AS, Mehta AB. Difficulties and barriers in diagnosing Fabry disease: what can be learnt from the literature? Expert Opin Med Diagn 2013;7:589-99. [PMID: 24128193 DOI: 10.1517/17530059.2013.846322] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 02/01/2023]

Vohra S, Biggin PC. Mutationmapper: a tool to aid the mapping of protein mutation data. PLoS One 2013;8:e71711. [PMID: 23951226 PMCID: PMC3739722 DOI: 10.1371/journal.pone.0071711] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 03/17/2013] [Accepted: 07/01/2013] [Indexed: 12/25/2022]

Verspoor K, Jimeno Yepes A, Cavedon L, McIntosh T, Herten-Crabb A, Thomas Z, Plazzer JP. Annotating the biomedical literature for the human variome. Database (Oxford) 2013;2013:bat019. [PMID: 23584833 PMCID: PMC3676157 DOI: 10.1093/database/bat019] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]

Wei CH, Harris BR, Kao HY, Lu Z. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics 2013;29:1433-9. [PMID: 23564842 DOI: 10.1093/bioinformatics/btt156] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Indexed: 11/12/2022]

Gyimesi G, Borsodi D, Sarankó H, Tordai H, Sarkadi B, Hegedűs T. ABCMdb: a database for the comparative analysis of protein mutations in ABC transporters, and a potential framework for a general application. Hum Mutat 2012;33:1547-56. [PMID: 22693078 DOI: 10.1002/humu.22138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 03/12/2012] [Accepted: 05/29/2012] [Indexed: 11/08/2022]

Ebrahim HY, Baker RJ, Mehta AB, Hughes DA. Functional analysis of variant lysosomal acid glycosidases of Anderson-Fabry and Pompe disease in a human embryonic kidney epithelial cell line (HEK 293 T). J Inherit Metab Dis 2012;35:325-34. [PMID: 21972175 DOI: 10.1007/s10545-011-9395-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/02/2011] [Revised: 09/05/2011] [Accepted: 09/08/2011] [Indexed: 11/30/2022]

Seddon G, Lounnas V, McGuire R, van den Bergh T, Bywater RP, Oliveira L, Vriend G. Drug design for ever, from hype to hope. J Comput Aided Mol Des 2012;26:137-50. [PMID: 22252446 PMCID: PMC3268973 DOI: 10.1007/s10822-011-9519-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/22/2011] [Accepted: 12/05/2011] [Indexed: 01/28/2023]

Celli J, Dalgleish R, Vihinen M, Taschner PEM, den Dunnen JT. Curating gene variant databases (LSDBs): toward a universal standard. Hum Mutat 2011;33:291-7. [PMID: 21990126 DOI: 10.1002/humu.21626] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 06/22/2011] [Accepted: 09/21/2011] [Indexed: 01/27/2023]

Stenson PD, Cooper DN. Prospects for the automated extraction of mutation data from the scientific literature. Hum Genomics 2011;5:1-4. [PMID: 21106485 PMCID: PMC3500153 DOI: 10.1186/1479-7364-5-1-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]

Tong MY, Cassa CA, Kohane IS. Automated validation of genetic variants from large databases: ensuring that variant references refer to the same genomic locations. Bioinformatics 2011;27:891-3. [PMID: 21258063 PMCID: PMC3051330 DOI: 10.1093/bioinformatics/btr029] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]

Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics 2010;27:408-15. [PMID: 21138947 DOI: 10.1093/bioinformatics/btq667] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022]

Abstract

MOTIVATION

A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations.

RESULTS

We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool to extract point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases.

DISCUSSION

Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles.

AVAILABILITY

Freely available at: http://bioinf.umbc.edu/EMU/ftp.
