1
Claustres M, Thèze C, des Georges M, Baux D, Girodon E, Bienvenu T, Audrezet MP, Dugueperoux I, Férec C, Lalau G, Pagin A, Kitzis A, Thoreau V, Gaston V, Bieth E, Malinge MC, Reboul MP, Fergelot P, Lemonnier L, Mekki C, Fanen P, Bergougnoux A, Sasorith S, Raynal C, Bareil C. CFTR-France, a national relational patient database for sharing genetic and phenotypic data associated with rare CFTR variants. Hum Mutat 2017; 38:1297-1315. [PMID: 28603918 DOI: 10.1002/humu.23276] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Received: 04/04/2017] [Revised: 05/31/2017] [Accepted: 06/04/2017] [Indexed: 11/09/2022]
Abstract
Most of the 2,000 variants identified in the CFTR (cystic fibrosis transmembrane conductance regulator) gene are rare or private. Their interpretation is hampered by the lack of available data and resources, making patient care and genetic counseling challenging. We developed a patient-based database dedicated to the annotation of rare CFTR variants in the context of their cis- and trans-allelic combinations. Based on almost 30 years of experience of CFTR testing, CFTR-France (https://cftr.iurc.montp.inserm.fr/cftr) currently compiles 16,819 variant records from 4,615 individuals with cystic fibrosis (CF) or CFTR-RD (related disorders), fetuses with ultrasound bowel anomalies, newborns awaiting clinical diagnosis, and asymptomatic compound heterozygotes. For each of the 736 different variants reported in the database, patient characteristics and genetic information (other variations in cis or in trans) have been thoroughly checked by a dedicated curator. Combining updated clinical, epidemiological, in silico, or in vitro functional data helps with the interpretation of unclassified variants and the reassessment of misclassified variants. This comprehensive CFTR database is now an invaluable tool for diagnostic laboratories gathering information on rare variants, especially in the context of genetic counseling and prenatal and preimplantation genetic diagnosis. CFTR-France is thus highly complementary to the international database CFTR2, which has so far focused on the most common CF-causing alleles.
Affiliation(s)
- Mireille Claustres
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- Corinne Thèze
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- Marie des Georges
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- David Baux
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- Emmanuelle Girodon
  - Service de Génétique et Biologie Moléculaires, Groupe Hospitalier Cochin-Broca-Hotel Dieu, Paris, France
- Thierry Bienvenu
  - Service de Génétique et Biologie Moléculaires, Groupe Hospitalier Cochin-Broca-Hotel Dieu, Paris, France
- Marie-Pierre Audrezet
  - Laboratoire de Génétique Moléculaire et d'Histocompatibilité, Centre Hospitalier Régional Universitaire, Brest, France
- Ingrid Dugueperoux
  - Laboratoire de Génétique Moléculaire et d'Histocompatibilité, Centre Hospitalier Régional Universitaire, Brest, France
- Claude Férec
  - Laboratoire de Génétique Moléculaire et d'Histocompatibilité, Centre Hospitalier Régional Universitaire, Brest, France
- Guy Lalau
  - Centre de Biologie Pathologie Génétique, Centre Hospitalier Régional Universitaire, Lille, France
- Adrien Pagin
  - Centre de Biologie Pathologie Génétique, Centre Hospitalier Régional Universitaire, Lille, France
- Alain Kitzis
  - Département de Génétique, Centre Hospitalier Universitaire, Poitiers, France
- Vincent Thoreau
  - Département de Génétique, Centre Hospitalier Universitaire, Poitiers, France
- Véronique Gaston
  - Service de Génétique Médicale, Centre Hospitalier Universitaire, Toulouse, France
- Eric Bieth
  - Service de Génétique Médicale, Centre Hospitalier Universitaire, Toulouse, France
- Marie-Claire Malinge
  - Département de Biochimie Génétique, Institut de Biologie en Santé, Centre Hospitalier Universitaire, Angers, France
- Marie-Pierre Reboul
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Régional Universitaire, Bordeaux, France
- Patricia Fergelot
  - Laboratoire Maladies Rares, Génétique et Métabolisme, Bordeaux, France
- Lydie Lemonnier
  - Registre français de la mucoviscidose, Vaincre la Mucoviscidose, Paris, France
- Chadia Mekki
  - Laboratoire de Génétique, Hôpital Henri Mondor, Créteil, France
- Pascale Fanen
  - Laboratoire de Génétique, Hôpital Henri Mondor, Créteil, France
- Anne Bergougnoux
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- Souphatta Sasorith
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- Caroline Raynal
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
- Corinne Bareil
  - Laboratoire de Génétique Moléculaire, Centre Hospitalier Universitaire et Université de Montpellier, Montpellier, France
2
Belhassan K, Ouldim K, Sefiani AA. Genetics and genomic medicine in Morocco: the present hope can make the future bright. Mol Genet Genomic Med 2016; 4:588-598. [PMID: 27896281 PMCID: PMC5118203 DOI: 10.1002/mgg3.255] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Affiliation(s)
- Khadija Belhassan
  - Medical Genetics and Onco-Genetics Department, Hassan II University Medical Center, Fez, Morocco
- Karim Ouldim
  - Medical Genetics and Onco-Genetics Department, Hassan II University Medical Center, Fez, Morocco
3
Béroud C, Letovsky SI, Braastad CD, Caputo SM, Beaudoux O, Bignon YJ, Bressac-De Paillerets B, Bronner M, Buell CM, Collod-Béroud G, Coulet F, Derive N, Divincenzo C, Elzinga CD, Garrec C, Houdayer C, Karbassi I, Lizard S, Love A, Muller D, Nagan N, Nery CR, Rai G, Revillion F, Salgado D, Sévenet N, Sinilnikova O, Sobol H, Stoppa-Lyonnet D, Toulas C, Trautman E, Vaur D, Vilquin P, Weymouth KS, Willis A, Eisenberg M, Strom CM. BRCA Share: A Collection of Clinical BRCA Gene Variants. Hum Mutat 2016; 37:1318-1328. [DOI: 10.1002/humu.23113] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 07/07/2016] [Accepted: 09/02/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Christophe Béroud
  - Aix Marseille Univ; INSERM, GMGF Marseille France
  - APHM; Hôpital TIMONE Enfants; Laboratoire de Génétique Moléculaire; Marseille France
- Sandrine M. Caputo
  - Service de Génétique; Department de Biologie des Tumeurs; Institut Curie; Paris France
- Florence Coulet
  - Groupe hospitalier Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Laboratoire d'Oncogénétique et Angiogénétique moléculaire; Université Pierre et Marie Curie; Paris France
- Nicolas Derive
  - Service de Génétique; Department de Biologie des Tumeurs; Institut Curie; Paris France
- Claude Houdayer
  - Service de Génétique; Department de Biologie des Tumeurs; Institut Curie; Paris France
  - Université Paris Descartes; Paris France
- Sarab Lizard
  - CHU de Dijon; Hôpital d'Enfants; Service de Génétique Médicale Dijon France
- Angela Love
  - Quest Diagnostics; Marlborough Massachusetts
- Ghadi Rai
  - Aix Marseille Univ; INSERM, GMGF Marseille France
- Dominique Stoppa-Lyonnet
  - Service de Génétique; Department de Biologie des Tumeurs; Institut Curie; Paris France
  - Université Paris Descartes; Paris France
- Edwin Trautman
  - Laboratory Corporation of America; Westborough Massachusetts
- Dominique Vaur
  - Laboratoire de biologie et de génétique du cancer; CLCC François Baclesse; INSERM 1079 Centre Normand de Génomique et de Médecine Personnalisée; Caen France
- Paul Vilquin
  - Laboratoire de Biologie Cellulaire et Hormonale (CHU Arnaud de Villeneuve); Montpellier France
- Alecia Willis
  - Laboratory Corporation of America; Research Triangle Park North Carolina
- Marcia Eisenberg
  - Laboratory Corporation of America; Research Triangle Park North Carolina
4
Verspoor KM, Heo GE, Kang KY, Song M. Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts. BMC Med Inform Decis Mak 2016; 16 Suppl 1:68. [PMID: 27454860 PMCID: PMC4959367 DOI: 10.1186/s12911-016-0294-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good testbed for evaluation of biomedical literature information extraction systems. METHODS In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extraction of relations important for extraction of genetic variant information from the literature. We test the application of the Public Knowledge Discovery Engine for Java (PKDE4J) system, a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus. RESULTS For the relations which are attested at least 100 times in the Variome corpus, we realise a performance ranging from 0.78 to 0.84 Precision-weighted F-score, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that the Recall of a co-occurrence baseline outweighs the benefit of improved Precision for many relations, indicating the value of simple semantic constraints on relations. CONCLUSIONS This work represents the first attempt to apply relation extraction methods to the Variome corpus. The results demonstrate that automated methods have good potential to structure the information expressed in the published literature related to genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.
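The co-occurrence baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the PKDE4J system or the authors' evaluation code: the sentences, variant names, and gold pairs below are toy data invented for illustration, and the score computed is the standard balanced F1 rather than the Precision-weighted F-score reported in the paper. The sketch predicts a relation for every mutation and disease that share a sentence, which tends to give high recall at the cost of precision.

```python
from itertools import product


def cooccurrence_pairs(sentences):
    """Predict a relation for every (mutation, disease) pair sharing a sentence."""
    predicted = set()
    for sent in sentences:
        predicted.update(product(sent["mutations"], sent["diseases"]))
    return predicted


def precision_recall_f1(predicted, gold):
    """Standard precision, recall and balanced F1 over sets of pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy annotated sentences (hypothetical, not taken from the Variome corpus).
sentences = [
    {"mutations": ["c.1A>G"], "diseases": ["Lynch syndrome"]},
    {"mutations": ["c.2T>C", "c.3G>A"], "diseases": ["colorectal cancer"]},
]
# Toy gold standard: only two of the three co-occurring pairs are true relations.
gold = {("c.1A>G", "Lynch syndrome"), ("c.2T>C", "colorectal cancer")}

predicted = cooccurrence_pairs(sentences)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy input the baseline recovers every gold pair but also emits one spurious pair, which is the recall-versus-precision trade-off the abstract describes.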
Affiliation(s)
- Karin M Verspoor
  - Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Go Eun Heo
  - Department of Library and Information Science, Yonsei University, Seoul, Korea
- Keun Young Kang
  - Department of Library and Information Science, Yonsei University, Seoul, Korea
- Min Song
  - Department of Library and Information Science, Yonsei University, Seoul, Korea
5
Singhal A, Simmons M, Lu Z. Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature. J Am Med Inform Assoc 2016; 23:766-72. [PMID: 27121612 DOI: 10.1093/jamia/ocw041] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 09/15/2015] [Accepted: 02/19/2016] [Indexed: 11/14/2022]
Abstract
OBJECTIVE Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine. MATERIALS AND METHODS We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora. RESULTS The performance of the proposed approach was assessed using a benchmarking dataset. Results show that our proposed approach gains significant improvement over the previous state of the art and obtains F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively. DISCUSSION To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature. CONCLUSIONS The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale.
Affiliation(s)
- Ayush Singhal
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
- Michael Simmons
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
6
Dalgleish R. LSDBs and How They Have Evolved. Hum Mutat 2016; 37:532-9. [DOI: 10.1002/humu.22979] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/23/2015] [Accepted: 02/18/2016] [Indexed: 01/10/2023]
Affiliation(s)
- Raymond Dalgleish
  - Department of Genetics; University of Leicester; Leicester United Kingdom
7
Savige J, Dalgleish R, Cotton RG, den Dunnen JT, Macrae F, Povey S. The Human Variome Project: ensuring the quality of DNA variant databases in inherited renal disease. Pediatr Nephrol 2015; 30:1893-901. [PMID: 25384529 DOI: 10.1007/s00467-014-2994-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/17/2014] [Revised: 10/09/2014] [Accepted: 10/15/2014] [Indexed: 02/02/2023]
Abstract
A recent review identified 60 common inherited renal diseases caused by DNA variants in 132 different genes. These diseases can be diagnosed with DNA sequencing, but each gene probably also has a thousand normal variants. Many more normal variants have been characterised by individual laboratories than are reported in the literature or found in publicly accessible collections. At present, testing laboratories must assess each novel change they identify for pathogenicity, even when this has been done elsewhere previously, and the distinction between normal and disease-associated variants is particularly an issue with the recent surge in exomic sequencing and gene discovery projects. The Human Variome Project recommends the establishment of gene-specific DNA variant databases to facilitate the sharing of DNA variants and decisions about likely disease causation. Databases improve diagnostic accuracy and testing efficiency, and reduce costs. They also help with genotype-phenotype correlations and predictive algorithms. The Human Variome Project advocates databases that use standardised descriptions, are up-to-date, include clinical information and are freely available. Currently, the genes affected in the most common inherited renal diseases correspond to 350 different variant databases, many of which are incomplete or have insufficient clinical details for genotype-phenotype correlations. Assistance is needed from nephrologists to maximise the usefulness of these databases for the diagnosis and management of inherited renal disease.
Affiliation(s)
- Judy Savige
  - The University of Melbourne, Melbourne Health, Melbourne, Australia
  - Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC, 3050, Australia
- Richard GH Cotton
  - Human Variome Project, The University of Melbourne, Melbourne, Australia
- Johan T den Dunnen
  - Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Finlay Macrae
  - The University of Melbourne, Melbourne Health, Melbourne, Australia
  - Colorectal Medicine and Genetics, The Royal Melbourne Hospital, Parkville, Australia
- Sue Povey
  - Research Department of Genetics, Evolution and Environment, University College London, London, UK
8
Aziz N, Zhao Q, Bry L, Driscoll DK, Funke B, Gibson JS, Grody WW, Hegde MR, Hoeltge GA, Leonard DGB, Merker JD, Nagarajan R, Palicki LA, Robetorye RS, Schrijver I, Weck KE, Voelkerding KV. College of American Pathologists' Laboratory Standards for Next-Generation Sequencing Clinical Tests. Arch Pathol Lab Med 2015; 139:481-93. [DOI: 10.5858/arpa.2014-0250-cp] [Citation(s) in RCA: 265] [Impact Index Per Article: 29.4] [Indexed: 11/06/2022]
9
New functional and structural insights from updated mutational databases for complement factor H, Factor I, membrane cofactor protein and C3. Biosci Rep 2014; 34:BSR20140117. [PMID: 25188723 PMCID: PMC4206863 DOI: 10.1042/bsr20140117] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 02/03/2023]
Abstract
aHUS (atypical haemolytic uraemic syndrome), AMD (age-related macular degeneration) and other diseases are associated with defective AP (alternative pathway) regulation. CFH (complement factor H), CFI (complement factor I), MCP (membrane cofactor protein) and C3 exhibited the most disease-associated genetic alterations in the AP. Our interactive structural database for these was updated with a total of 324 genetic alterations. A consensus structure for the SCR (short complement regulator) domain showed that the majority (37%) of SCR mutations occurred at its hypervariable loop and its four conserved Cys residues. Mapping 113 missense mutations onto the CFH structure showed that over half occurred in the C-terminal domains SCR-15 to -20. In particular, SCR-20 with the highest total of affected residues is associated with binding to C3d and heparin-like oligosaccharides. No clustering of 49 missense mutations in CFI was seen. In MCP, SCR-3 was the most affected by 23 missense mutations. In C3, the neighbouring thioester and MG (macroglobulin) domains exhibited most of 47 missense mutations. The mutations in the regulators CFH, CFI and MCP involve loss-of-function, whereas those for C3 involve gain-of-function. This combined update emphasizes the importance of the complement AP in inflammatory disease, clarifies the functionally important regions in these proteins, and will facilitate diagnosis and therapy. A new compilation of 324 mutations in four major proteins from the complement alternative pathway reveals mutational hotspots in factor H and complement C3, and less so in factor I and membrane cofactor protein. Their associations with function are discussed.
10
Kountouris P, Lederer CW, Fanis P, Feleki X, Old J, Kleanthous M. IthaGenes: an interactive database for haemoglobin variations and epidemiology. PLoS One 2014; 9:e103020. [PMID: 25058394 PMCID: PMC4109966 DOI: 10.1371/journal.pone.0103020] [Citation(s) in RCA: 168] [Impact Index Per Article: 16.8] [Received: 12/03/2013] [Accepted: 06/27/2014] [Indexed: 02/07/2023]
Abstract
Inherited haemoglobinopathies are the most common monogenic diseases, with millions of carriers and patients worldwide. At present, we know several hundred disease-causing mutations on the globin gene clusters, in addition to numerous clinically important trans-acting disease modifiers encoded elsewhere and a multitude of polymorphisms with relevance for advanced diagnostic approaches. Moreover, new disease-linked variations are discovered every year that are not included in traditional and often functionally limited locus-specific databases. This paper presents IthaGenes, a new interactive database of haemoglobin variations, which stores information about genes and variations affecting haemoglobin disorders. In addition, IthaGenes organises phenotype, relevant publications and external links, while embedding the NCBI Sequence Viewer for graphical representation of each variation. Finally, IthaGenes is integrated with the companion tool IthaMaps for the display of corresponding epidemiological data on distribution maps. IthaGenes is incorporated in the ITHANET community portal and is free and publicly available at http://www.ithanet.eu/db/ithagenes.
Affiliation(s)
- Petros Kountouris
  - Molecular Genetics Thalassaemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Carsten W. Lederer
  - Molecular Genetics Thalassaemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Pavlos Fanis
  - Molecular Genetics Thalassaemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Xenia Feleki
  - Molecular Genetics Thalassaemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- John Old
  - Oxford Radcliffe Hospitals NHS Trust, Oxford, United Kingdom
- Marina Kleanthous
  - Molecular Genetics Thalassaemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
11
Savige J, Dagher H, Povey S. Mutation databases for inherited renal disease: are they complete, accurate, clinically relevant, and freely available? Hum Mutat 2014; 35:791-3. [PMID: 24826923 DOI: 10.1002/humu.22588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/20/2013] [Accepted: 04/09/2014] [Indexed: 12/22/2022]
Abstract
This study examined whether gene-specific DNA variant databases for inherited diseases of the kidney fulfilled the Human Variome Project recommendations of being complete, accurate, clinically relevant and freely available. A recent review identified 60 inherited renal diseases caused by mutations in 132 genes. The disease name, MIM number, gene name, together with "mutation" or "database," were used to identify web-based databases. Fifty-nine diseases (98%) due to mutations in 128 genes had a variant database. Altogether there were 349 databases (a median of 3 per gene, range 0-6), but no gene had two databases with the same number of variants, and 165 (50%) databases included fewer than 10 variants. About half the databases (180, 54%) had been updated in the previous year. Few (77, 23%) were curated by "experts" but these included nine of the 11 with the most variants. Even fewer databases (41, 12%) included clinical features apart from the name of the associated disease. Most (223, 67%) could be accessed without charge, including those for 50 genes (40%) with the maximum number of variants. Future efforts should focus on encouraging experts to collaborate on a single database for each gene affected in inherited renal disease, including both unpublished variants, and clinical phenotypes.
Affiliation(s)
- Judy Savige
  - Department of Medicine, The University of Melbourne (Northern Health Melbourne Health), Melbourne, Australia
12
Soussi T, Leroy B, Taschner PEM. Recommendations for analyzing and reporting TP53 gene variants in the high-throughput sequencing era. Hum Mutat 2014; 35:766-78. [PMID: 24729566 DOI: 10.1002/humu.22561] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 12/10/2013] [Accepted: 04/02/2014] [Indexed: 12/27/2022]
Abstract
The architecture of TP53, the most frequently mutated gene in human cancer, is more complex than previously thought. Using TP53 variants as clinical biomarkers to predict response to treatment or patient outcome requires an unequivocal and standardized procedure toward a definitive strategy for the clinical evaluation of variants to provide maximum diagnostic sensitivity and specificity. An intronic promoter and two novel exons have been identified resulting in the expression of multiple transcripts and protein isoforms. These regions are additional targets for mutation events impairing the tumor suppressive activity of TP53. Reassessment of variants located in these regions is needed to refine their prognostic value in many malignancies. We recommend using the stable Locus Reference Genomic reference sequence for detailed and unequivocal reports and annotations of germ line and somatic alterations on all TP53 transcripts and protein isoforms according to the recommendations of the Human Genome Variation Society. This novel and comprehensive description framework will generate standardized data that are easy to understand, analyze, and exchange across various cancer variant databases. Based on the statistical analysis of more than 45,000 variants in the latest version of the UMD TP53 database, we also provide a classification of their functional effects ("pathogenicity").
Affiliation(s)
- Thierry Soussi
  - Department of Oncology-Pathology, Cancer Center Karolinska (CCK), Karolinska Institute, Stockholm, Sweden
  - Université Pierre et Marie Curie-Paris 6, Paris, 75005, France
13
Affiliation(s)
- Ourania Horaitis
  - Genomic Disorders Research Centre, St. Vincent's Hospital Melbourne, Fitzroy, Australia
14
Soussi T. Locus-Specific Databases in Cancer: What Future in a Post-Genomic Era? The TP53 LSDB paradigm. Hum Mutat 2014; 35:643-53. [DOI: 10.1002/humu.22518] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/13/2013] [Accepted: 01/16/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Thierry Soussi
  - Department of Oncology-Pathology Cancer Center Karolinska (CCK); Karolinska Institute; Stockholm Sweden
  - Université Pierre et Marie Curie Paris 6; Paris France
15
Jimeno Yepes A, Verspoor K. Literature mining of genetic variants for curation: quantifying the importance of supplementary material. Database: The Journal of Biological Databases and Curation 2014; 2014:bau003. [PMID: 24520105 PMCID: PMC3920087 DOI: 10.1093/database/bau003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains 'all of the information', and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication.
Affiliation(s)
- Antonio Jimeno Yepes
  - National ICT Australia, Victoria Research Laboratory, Melbourne, Australia
  - Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
16
Jimeno Yepes A, Verspoor K. Mutation extraction tools can be combined for robust recognition of genetic variants in the literature. F1000Res 2014; 3:18. [PMID: 25285203 PMCID: PMC4176422 DOI: 10.12688/f1000research.3-18.v2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Accepted: 05/27/2014] [Indexed: 11/20/2022]
Abstract
As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature.
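The idea of combining tools with complementary coverage can be sketched as a set union over each tool's output. The two "tools" below are hypothetical regular expressions standing in for real mutation extractors, and the sentence and gold set are toy data; none of this reproduces the tools or corpora evaluated in the paper.

```python
import re

# Two toy "tools", each covering one notation style (hypothetical stand-ins).
# Short protein-level substitutions such as V600E.
PROTEIN_SHORT = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b")
# cDNA-level substitutions such as c.1799T>A.
CDNA_SUBST = re.compile(r"c\.\d+[ACGT]>[ACGT]")


def run_tool(pattern, text):
    """Return the set of variant mentions one pattern finds in the text."""
    return set(pattern.findall(text))


def recall(found, gold):
    """Fraction of gold mentions that were found."""
    return len(found & gold) / len(gold) if gold else 0.0


# Toy sentence and gold annotations, invented for illustration.
text = "The V600E substitution (c.1799T>A) was found; G12D was absent."
gold = {"V600E", "c.1799T>A", "G12D"}

tool_a = run_tool(PROTEIN_SHORT, text)
tool_b = run_tool(CDNA_SUBST, text)
combined = tool_a | tool_b  # union of the two tools' mentions

for name, found in [("tool_a", tool_a), ("tool_b", tool_b), ("combined", combined)]:
    print(f"{name}: recall={recall(found, gold):.2f}")
```

Neither pattern alone covers all three mentions, but their union does, which mirrors the complementary-coverage finding on a very small scale.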
Affiliation(s)
- Antonio Jimeno Yepes
- National ICT Australia, Victoria Research Laboratory, Melbourne, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Karin Verspoor
- National ICT Australia, Victoria Research Laboratory, Melbourne, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
17
Abstract
In this chapter we aim to provide an overview of DNA variant databases, commonly known as Locus-Specific Databases (LSDBs), or Gene-Disease Specific Databases (GDSDBs), but the term variant database will be used for simplicity. We restrict this overview to germ-line variants, particularly as related to Mendelian diseases, which are diseases caused by a variant in a single gene. Common difficulties associated with variant databases and some proposed solutions are reviewed. Finally, systems where technical solutions have been implemented are discussed. This work will be useful for anyone wishing to establish their own variant database, or to learn about the global picture of variant databases, and the technical challenges to be overcome.
Affiliation(s)
- John-Paul Plazzer
- Department of Colorectal Medicine and Genetics, Royal Melbourne Hospital, RMH, Grattan Street, Parkville, VIC, 3050, Australia
18
Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol 2013; 425:4047-63. [PMID: 23962656 PMCID: PMC3807015 DOI: 10.1016/j.jmb.2013.08.008] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 06/04/2013] [Revised: 08/07/2013] [Accepted: 08/08/2013] [Indexed: 12/26/2022]
Abstract
Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations producing clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinning of disease can lead to development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, accessible only by manual curation or by sophisticated text-mining technology that extracts the relevant information. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.
Affiliation(s)
- Thomas A Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Emily Doughty
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
- Maricel G Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
19
The Moroccan Genetic Disease Database (MGDD): a database for DNA variations related to inherited disorders and disease susceptibility. Eur J Hum Genet 2013; 22:322-6. [PMID: 23860041 DOI: 10.1038/ejhg.2013.151] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/15/2013] [Revised: 05/28/2013] [Accepted: 06/11/2013] [Indexed: 11/09/2022] Open
Abstract
National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The database is designed and implemented on a three-tier model using a MySQL relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow an autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful for improving research in human molecular genetics, disease diagnosis and the design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma.
20
Rallapalli PM, Kemball-Cook G, Tuddenham EG, Gomez K, Perkins SJ. An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B. J Thromb Haemost 2013; 11:1329-40. [PMID: 23617593 DOI: 10.1111/jth.12276] [Citation(s) in RCA: 112] [Impact Index Per Article: 10.2] [Received: 10/29/2012] [Accepted: 04/18/2013] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Factor IX (FIX) is important in the coagulation cascade, being activated to FIXa on cleavage. Defects in the human F9 gene frequently lead to hemophilia B. OBJECTIVE: To assess 1113 unique F9 mutations corresponding to 3721 patient entries in a new and up-to-date interactive web database alongside the FIXa protein structure. METHODS: The mutations database was built using MySQL and structural analyses were based on a homology model for the human FIXa structure based on closely-related crystal structures. RESULTS: Mutations have been found in 336 (73%) out of 461 residues in FIX. There were 812 unique point mutations, 182 deletions, 54 polymorphisms, 39 insertions and 26 others that together comprise a total of 1113 unique variants. The 64 unique mild severity mutations in the mature protein with known circulating protein phenotypes include 15 (23%) quantitative type I mutations and 41 (64%) predominantly qualitative type II mutations. Inhibitors were described in 59 reports (1.6%) corresponding to 25 unique mutations. CONCLUSION: The interactive database provides insights into mechanisms of hemophilia B. Type II mutations are deduced to disrupt predominantly those structural regions involved with functional interactions. The interactive features of the database will assist in making judgments about patient management.
Affiliation(s)
- P M Rallapalli
- Division of Biosciences, Research Department of Structural and Molecular Biology, University College London, London, UK
21
Al-Numair NS, Martin ACR. The SAAP pipeline and database: tools to analyze the impact and predict the pathogenicity of mutations. BMC Genomics 2013; 14 Suppl 3:S4. [PMID: 23819919 PMCID: PMC3665582 DOI: 10.1186/1471-2164-14-s3-s4] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022] Open
Abstract
Background: Understanding and predicting the effects of mutations on protein structure and phenotype is an increasingly important area. Genes for many genetically linked diseases are now routinely sequenced in the clinic. Previously we focused on understanding the structural effects of mutations, creating the SAAPdb resource. Results: We have updated SAAPdb to include 41% more SNPs and 36% more PDs. Introducing a hydrophobic residue on the surface, or a hydrophilic residue in the core, no longer shows significant differences between SNPs and PDs. We have improved some of the analyses, significantly enhancing the analysis of clashes and of mutations to-proline and from-glycine. A new web interface has been developed allowing users to analyze their own mutations. Finally, we have developed a machine learning method which gives a cross-validated accuracy of 0.846, considerably out-performing well-known methods including SIFT and PolyPhen2, which give accuracies between 0.690 and 0.785. Conclusions: We have updated SAAPdb and improved its analyses, but with the increasing rate with which mutation data are generated, we have created a new analysis pipeline and web interface. Results of machine learning using the structural analysis results to predict pathogenicity considerably outperform other methods.
Affiliation(s)
- Nouf S Al-Numair
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
22
Verspoor K, Jimeno Yepes A, Cavedon L, McIntosh T, Herten-Crabb A, Thomas Z, Plazzer JP. Annotating the biomedical literature for the human variome. Database (Oxford) 2013; 2013:bat019. [PMID: 23584833 PMCID: PMC3676157 DOI: 10.1093/database/bat019] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
Affiliation(s)
- Karin Verspoor
- National ICT Australia (NICTA), Victoria Research Laboratory, Level 2, Building 193, The University of Melbourne, Parkville VIC 3010, Australia.
23
Georgitsi M, Patrinos GP. Genetic databases in pharmacogenomics: the Frequency of Inherited Disorders Database (FINDbase). Methods Mol Biol 2013; 1015:321-336. [PMID: 23824866 DOI: 10.1007/978-1-62703-435-7_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
Pharmacogenomics studies how variations in an individual's genetic makeup are correlated with that person's response to certain drugs in relation to therapeutic efficiency, clinical outcome, or even survival, and how they affect drug metabolism, transport, or clearance. Yet, since the incidence of these polymorphisms, being either single-point variations or small insertions/deletions, varies among different populations, a systematic collection and documentation of these variations is warranted, in order to facilitate implementation of pharmacogenomics in different populations. Here we review the existing electronic databases related to pharmacogenomics and pay particular attention to the description of the pharmacogenomics module of the Frequency of Inherited Disorders database (FINDbase), which documents curated allelic frequency data pertaining to 144 pharmacogenomics markers across 14 genes, representing approximately 87,000 individuals from 150 populations and ethnic groups worldwide. Long-term sustainability of these resources aims to contribute to the design, development, and implementation of pharmacogenomics testing towards the application of personalized approaches in medical treatment.
Affiliation(s)
- Marianthi Georgitsi
- Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece
24
Abel O, Powell JF, Andersen PM, Al-Chalabi A. ALSoD: A user-friendly online bioinformatics tool for amyotrophic lateral sclerosis genetics. Hum Mutat 2012; 33:1345-51. [PMID: 22753137 DOI: 10.1002/humu.22157] [Citation(s) in RCA: 216] [Impact Index Per Article: 18.0] [Received: 01/31/2012] [Accepted: 06/19/2012] [Indexed: 12/11/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is the commonest adult onset motor neuron disease, with a peak age of onset in the seventh decade. With advances in genetic technology, there is an enormous increase in the volume of genetic data produced, and a corresponding need for storage, analysis, and interpretation, particularly as our understanding of the relationships between genotype and phenotype matures. Here, we present a system to enable this in the form of the ALS Online Database (ALSoD at http://alsod.iop.kcl.ac.uk), a freely available database that has been transformed from a single gene storage facility recording mutations in the SOD1 gene to a multigene ALS bioinformatics repository and analytical instrument combining genotype, phenotype, and geographical information with associated analysis tools. These include a comparison tool to evaluate genes side by side or jointly with user configurable features, a pathogenicity prediction tool using a combination of computational approaches to distinguish variants with nonfunctional characteristics from disease-associated mutations with more dangerous consequences, and a credibility tool to enable ALS researchers to objectively assess the evidence for gene causation in ALS. Furthermore, integration of external tools, systems for feedback, annotation by users, and two-way links to collaborators hosting complementary databases further enhance the functionality of ALSoD.
Affiliation(s)
- Olubunmi Abel
- Department of Clinical Neuroscience, King's College London, Institute of Psychiatry, London, UK
25
Humbertclaude V, Hamroun D, Bezzou K, Bérard C, Boespflug-Tanguy O, Bommelaer C, Campana-Salort E, Cances C, Chabrol B, Commare MC, Cuisset JM, de Lattre C, Desnuelle C, Echenne B, Halbert C, Jonquet O, Labarre-Vila A, N'Guyen-Morel MA, Pages M, Pepin JL, Petitjean T, Pouget J, Ollagnon-Roman E, Richelme C, Rivier F, Sacconi S, Tiffreau V, Vuillerot C, Picot MC, Claustres M, Béroud C, Tuffery-Giraud S. Motor and respiratory heterogeneity in Duchenne patients: implication for clinical trials. Eur J Paediatr Neurol 2012; 16:149-60. [PMID: 21920787 DOI: 10.1016/j.ejpn.2011.07.001] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Received: 02/09/2011] [Revised: 07/13/2011] [Accepted: 07/17/2011] [Indexed: 01/06/2023]
Abstract
AIMS: Our objective was to clarify the clinical heterogeneity in Duchenne muscular dystrophy (DMD). METHODS: The French dystrophinopathy database provided clinical, histochemical and molecular data of 278 DMD patients (mean longitudinal follow-up: 14.2 years). Diagnosis was based on mutation identification in the DMD gene. Three groups were defined according to the age at ambulation loss: before 8 years (group A); between 8 and 11 years (group B); between 11 and 16 years (group C). RESULTS: Motor and respiratory declines were statistically different between the three groups, as opposed to heart involvement. When acquired, running ability was lost at the mean age of 5.41 (group A), 7.11 (group B), 9.19 (group C) years; climbing stairs ability at 6.24 (group A), 7.99 (group B), 10.42 (group C) years, and ambulation at 7.10 (group A), 9.25 (group B), 12.01 (group C) years. Pulmonary growth stopped at 10.26 (group A), 12.45 (group B), 14.58 (group C) years. Then, forced vital capacity decreased at the rate of 8.83 (group A), 7.52 (group B), 6.03 (group C) percent per year. Phenotypic variability did not rely on specific mutational spectrum. CONCLUSION: Besides the most common form of DMD (group B), we provide a detailed description of two extreme clinical subgroups: a severe one (group A) characterized by early severe motor and respiratory decline and a milder subgroup (group C). Compared to group B or C, four to six times fewer patients from group A are needed to detect the same decrease in disease progression in a clinical trial.
26
Li Z, Liu X, Wen J, Xu Y, Zhao X, Li X, Liu L, Zhang X. DRUMS: a human disease related unique gene mutation search engine. Hum Mutat 2012; 32:E2259-65. [PMID: 21913285 DOI: 10.1002/humu.21556] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023]
Abstract
With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate one or more specific genes' mutation data along with high-quality phenotypes. Although some genotype-phenotype data from LSDBs have been integrated into central databases, little effort has been made to integrate all these data by a search engine approach. In this work, we have developed the disease related unique gene mutation search engine (DRUMS), a search engine for human disease related unique gene mutations, as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from an LSDB or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS could be treated as a domain-specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html.
Affiliation(s)
- Zuofeng Li
- School of Life Sciences and Technology, Tongji University, Shanghai, China
27
Vihinen M, den Dunnen JT, Dalgleish R, Cotton RGH. Guidelines for establishing locus specific databases. Hum Mutat 2011; 33:298-305. [PMID: 22052659 DOI: 10.1002/humu.21646] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 07/25/2011] [Accepted: 10/25/2011] [Indexed: 11/06/2022]
Abstract
Information about genetic variation has been collected for some 20 years into registries, known as locus specific databases (LSDBs), which nowadays often contain information in addition to the actual genetic variation. Several issues have to be taken into account when considering establishing and maintaining LSDBs and these have been discussed previously in a number of articles describing guidelines and recommendations. This information is widely scattered and, for a newcomer, it would be difficult to obtain the latest information and guidance. Here, a sequence of steps essential for establishing an LSDB is discussed together with guidelines for each step. Curators need to collect information from various sources, code it in a systematic way, and distribute it to the research and clinical communities. In doing this, ethical issues have to be taken into account. To facilitate integration of information to, for example, analyze genotype-phenotype correlations, systematic data representation using established nomenclatures, data models, and ontologies is essential. LSDB curation and maintenance comprise a number of tasks that can be managed by following logical steps. These resources are becoming ever more important and new curators are essential to ensure that we will have expertly curated databases for all disease-related genes in the near future.
Affiliation(s)
- Mauno Vihinen
- Institute of Biomedical Technology, University of Tampere, Finland.
28
Patnaik SK, Helmberg W, Blumenfeld OO. BGMUT: NCBI dbRBC database of allelic variations of genes encoding antigens of blood group systems. Nucleic Acids Res 2011; 40:D1023-9. [PMID: 22084196 PMCID: PMC3245102 DOI: 10.1093/nar/gkr958] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 02/07/2023] Open
Abstract
Analogous to human leukocyte antigens, blood group antigens are surface markers on the erythrocyte cell membrane whose structures differ among individuals and which can be serologically identified. The Blood Group Antigen Gene Mutation Database (BGMUT) is an online repository of allelic variations in genes that determine the antigens of various human blood group systems. The database is manually curated with allelic information collated from scientific literature and from direct submissions from research laboratories. Currently, the database documents sequence variations of a total of 1251 alleles of all 40 gene loci that together are known to affect antigens of 30 human blood group systems. When available, information on the geographic or ethnic prevalence of an allele is also provided. The BGMUT website also has general information on the human blood group systems and the genes responsible for them. BGMUT is a part of the dbRBC resource of the National Center for Biotechnology Information, USA, and is available online at http://www.ncbi.nlm.nih.gov/projects/gv/rbc/xslcgi.fcgi?cmd=bgmut. The database should be of use to members of the transfusion medicine community, those interested in studies of genetic variation and related topics such as human migrations, and students as well as members of the general public.
Affiliation(s)
- Santosh Kumar Patnaik
- Department of Thoracic Surgery, Roswell Park Cancer Institute, Buffalo, NY 14203, USA
29
Celli J, Dalgleish R, Vihinen M, Taschner PEM, den Dunnen JT. Curating gene variant databases (LSDBs): toward a universal standard. Hum Mutat 2011; 33:291-7. [PMID: 21990126 DOI: 10.1002/humu.21626] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 06/22/2011] [Accepted: 09/21/2011] [Indexed: 01/27/2023]
Abstract
Gene variant databases or Locus-Specific DataBases (LSDBs) are used to collect and display information on sequence variants on a gene-by-gene basis. Their most frequent use is in relation to DNA-based diagnostics, giving clinicians and scientists easy access to an up-to-date overview of all gene variants identified worldwide and whether they influence the function of the gene ("pathogenic or not"). While literature on gene variant databases is extensive, little has been published on the process of database curation itself. Based on our extensive experience as LSDB curators and our contributions to database curation courses, we discuss the subject of database curation. We describe the tasks involved, the steps to take, and the issues that might occur. Our overview is a first step toward establishing overall guidelines for database curation and ultimately covers one aspect of establishing quality-assured gene variant databases.
Affiliation(s)
- Jacopo Celli
- Human and Clinical Genetics, Leiden University Medical Center, Leiden, Netherlands
30
Zatkova A, Sedlackova T, Radvansky J, Polakova H, Nemethova M, Aquaron R, Dursun I, Usher JL, Kadasi L. Identification of 11 Novel Homogentisate 1,2 Dioxygenase Variants in Alkaptonuria Patients and Establishment of a Novel LOVD-Based HGD Mutation Database. JIMD Rep 2011; 4:55-65. [PMID: 23430897 DOI: 10.1007/8904_2011_68] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 03/22/2011] [Revised: 06/01/2011] [Accepted: 06/07/2011] [Indexed: 12/05/2022] Open
Abstract
Enzymatic loss in alkaptonuria (AKU), an autosomal recessive disorder, is caused by mutations in the homogentisate 1,2 dioxygenase (HGD) gene, which decrease or completely inactivate the function of the HGD protein to metabolize homogentisic acid (HGA). AKU shows a very low prevalence (1:100,000-250,000) in most ethnic groups, but there are countries with much higher incidence, such as Slovakia and the Dominican Republic. In this work, we report 11 novel HGD mutations identified during analysis of 36 AKU patients and 41 family members from 27 families originating from 9 different countries, mainly from Slovakia and France. In Slovak patients, we identified two additional mutations, bringing the total number of HGD mutations identified in this small country to 12. In order to record AKU-causing mutations and variants of the HGD gene, we have created an HGD mutation database that is open for future submissions and is available online (http://hgddatabase.cvtisr.sk/). It is founded on the Leiden Open (source) Variation Database (LOVD) system and includes data from the original AKU database (http://www.alkaptonuria.cib.csic.es) as well as all variants and AKU patients reported so far. Where available, HGD haplotypes associated with the mutations are also presented. Currently, this database contains 148 unique variants, of which 115 are reported pathogenic mutations. It provides a valuable tool for information exchange in AKU research and care fields and certainly presents a useful data source for genotype-phenotype correlations and also for future clinical trials.
Affiliation(s)
- Andrea Zatkova
- Laboratory of Genetics, Institute of Molecular Physiology and Genetics, Slovak Academy of Sciences, Vlarska 5, 833 34, Bratislava, Slovakia
31
Lill CM, Abel O, Bertram L, Al-Chalabi A. Keeping up with genetic discoveries in amyotrophic lateral sclerosis: the ALSoD and ALSGene databases. Amyotroph Lateral Scler 2011; 12:238-49. [PMID: 21702733 DOI: 10.3109/17482968.2011.584629] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Indexed: 12/13/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a genetically heterogeneous disorder that shows a characteristic dichotomy of familial forms typically displaying Mendelian inheritance patterns, and sporadic ALS showing no or less obvious familial aggregation. While the former is caused by rare, highly penetrant, and pathogenic mutations, risk for sporadic ALS is probably the result of the combined effects of common polymorphisms with minor to moderate effect sizes. Owing to recent advances in high-throughput genotyping and sequencing technologies, genetic research in both fields is evolving at a rapidly increasing pace, making it more and more difficult to follow and evaluate the most significant progress in the field. To alleviate this problem, our groups have created dedicated and freely available online databases, ALSoD (http://alsod.iop.kcl.ac.uk/) and ALSGene (http://www.alsgene.org), which provide systematic and in-depth qualitative and quantitative overviews of genetic research in both familial and sporadic ALS. This review briefly introduces the background and main features of both databases and provides an overview of the currently most compelling genetic findings in ALS derived from analyses using these resources.
Affiliation(s)
- Christina M Lill
- Neuropsychiatric Genetics Group, Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
32
Abstract
TP53 mutations are the most frequent genetic alterations found in human cancer. For more than 20 years, TP53 mutation databases have collected over 30,000 somatic mutations from various types of cancer. Analyses of these mutations have led to many types of studies and have improved our knowledge about the TP53 protein and its function. The recent advances in sequencing methodologies and the various cancer genome sequencing projects will lead to a profound shift in database curation and data management. In this paper, we will review the current status of the TP53 mutation database, its application to various fields of research, and how data quality and curation can be improved. We will also discuss how the genetic data will be stored and handled in the future and the consequences for database management.
33
Cotton RGH. Rare disease registries and mutation/variation databases. Hum Mutat 2011; 32:1073-4. [DOI: 10.1002/humu.21596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
34
Webb EA, Smith TD, Cotton RGH. Difficulties in finding DNA mutations and associated phenotypic data in web resources using simple, uncomplicated search terms, and a suggested solution. Hum Genomics 2011; 5:141-55. [PMID: 21504866 PMCID: PMC3500169 DOI: 10.1186/1479-7364-5-3-141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/29/2022] Open
Abstract
DNA mutation data currently reside in many online databases, which differ markedly in the terminology used to describe or define the mutation and also in completeness of content, potentially making it difficult both to locate a mutation of interest and to find sought-after data (e.g., phenotypic effect). To highlight the current deficiencies in the accessibility of web-based genetic variation information, we examined the ease with which various resources could be interrogated for five model mutations, using a set of simple search terms relating to the change in amino acid or nucleotide. Fifteen databases were investigated for the time and/or number of mouse clicks required to find the mutations; availability of phenotype data; the procedure for finding information; and site layout. Google and PubMed were also examined. The three locus-specific databases (LSDBs) generally yielded positive outcomes, but the 12 genome-wide databases gave poorer results, with most proving not to be searchable and only three yielding successful outcomes. Google and PubMed searches found some mutations and provided patchy information on phenotype. The results show that many web-based resources are not currently configured for fast and easy access to comprehensive mutation data, with only the isolated LSDBs providing optimal outcomes. Centralising this information within a common repository, coupled with a simple, all-inclusive interrogation process, would improve searching for all gene variation data.
Affiliation(s)
- Elizabeth A Webb
- Genomic Disorders Research Centre, Melbourne, Vic 3053, Australia.
|
35
|
van Baal S, Zlotogora J, Lagoumintzis G, Gkantouna V, Tzimas I, Poulas K, Tsakalidis A, Romeo G, Patrinos GP. ETHNOS: A versatile electronic tool for the development and curation of national genetic databases. Hum Genomics 2011; 4:361-8. [PMID: 20650823 PMCID: PMC3500166 DOI: 10.1186/1479-7364-4-5-361] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023] Open
Abstract
National and ethnic mutation databases (NEMDBs) are emerging online repositories, recording extensive information about the described genetic heterogeneity of an ethnic group or population. These resources facilitate the provision of genetic services and provide a comprehensive list of genomic variations among different populations. As such, they enhance awareness of the various genetic disorders. Here, we describe the features of the ETHNOS software, a simple but versatile tool based on a flat-file database that is specifically designed for the development and curation of NEMDBs. ETHNOS is freely available software that runs more than half of the NEMDBs currently available. Given the emerging need for NEMDBs in genetic testing services and the fact that ETHNOS is the only off-the-shelf software available for NEMDB development and curation, its adoption in subsequent NEMDB development would contribute towards data content uniformity, unlike the diverse contents and quality of the available gene (locus)-specific databases. Finally, we allude to the potential applications of NEMDBs, not only as worldwide central allele frequency repositories, but also, and most importantly, as data warehouses of individual-level genomic data, hence allowing for a comprehensive ethnicity-specific documentation of genomic variation.
Affiliation(s)
- Sjozef van Baal
- Erasmus MC, MGC-Department of Cell Biology and Genetics, Rotterdam, the Netherlands
|
36
|
Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli J, Laros JFJ, den Dunnen JT. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 2011; 32:557-63. [PMID: 21520333 DOI: 10.1002/humu.21438] [Citation(s) in RCA: 733] [Impact Index Per Article: 56.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2010] [Accepted: 12/14/2010] [Indexed: 01/14/2023]
Abstract
Locus-Specific DataBases (LSDBs) store information on gene sequence variation associated with human phenotypes and are frequently used as a reference by researchers and clinicians. We developed the Leiden Open-source Variation Database (LOVD) as a platform-independent Web-based LSDB-in-a-Box package. LOVD was designed to be easy to set up and maintain and follows the Human Genome Variation Society (HGVS) recommendations. Here we describe LOVD v.2.0, which adds enhanced flexibility and functionality and has the capacity to store sequence variants in multiple genes per patient. To reduce redundancy, patient and sequence variant data are stored in separate tables. Tables are linked to generate connections between sequence variant data for each gene and every patient. The dynamic structure allows database managers to add custom columns. The database structure supports fast queries and allows storage of sequence variants from high-throughput sequence analysis, as demonstrated by the X-chromosomal Mental Retardation LOVD installation. LOVD contains measures to ensure database security from unauthorized access. Currently, the LOVD Website (http://www.LOVD.nl/) lists 71 public LOVD installations hosting 3,294 gene variant databases with 199,000 variants in 84,000 patients. To promote LSDB standardization and thereby database interoperability, we offer free server space and help to establish an LSDB on our Leiden server.
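The separation of patient and variant data into linked tables described in this abstract can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical and chosen only for illustration; they are not LOVD's actual schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with separate patient and variant tables.

    All table/column names and rows are illustrative assumptions,
    not the real LOVD data model.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (
            patient_id INTEGER PRIMARY KEY,
            phenotype  TEXT
        );
        CREATE TABLE variants (
            variant_id INTEGER PRIMARY KEY,
            gene       TEXT NOT NULL,
            hgvs_dna   TEXT NOT NULL,
            UNIQUE (gene, hgvs_dna)   -- each variant is stored only once
        );
        -- Link table: connects any patient to variants in any gene.
        CREATE TABLE patient_variants (
            patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
            variant_id INTEGER NOT NULL REFERENCES variants(variant_id),
            PRIMARY KEY (patient_id, variant_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [(1, "phenotype A"), (2, "phenotype B")],
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?)",
        [(1, "GENE1", "c.100A>G"), (2, "GENE2", "c.200del")],
    )
    conn.executemany(
        "INSERT INTO patient_variants VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1)],
    )
    conn.commit()
    return conn


def variants_for_patient(conn: sqlite3.Connection, patient_id: int) -> list:
    """Return (gene, hgvs_dna) pairs for one patient, across all genes."""
    return conn.execute(
        """
        SELECT v.gene, v.hgvs_dna
        FROM patient_variants AS pv
        JOIN variants AS v ON v.variant_id = pv.variant_id
        WHERE pv.patient_id = ?
        ORDER BY v.gene, v.hgvs_dna
        """,
        (patient_id,),
    ).fetchall()
```

In this sketch a variant shared by two patients is stored once and referenced twice through the link table, which is the kind of redundancy reduction the abstract attributes to keeping the two kinds of data apart.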
Affiliation(s)
- Ivo F A C Fokkema
- Center of Human and Clinical Genetics, Department of Human Genetics, Leiden University Medical Center, Leiden, Nederland
|
37
|
Mitropoulou C, Webb AJ, Mitropoulos K, Brookes AJ, Patrinos GP. Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use. Hum Mutat 2011; 31:1109-16. [PMID: 20672379 DOI: 10.1002/humu.21332] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Genetic variation databases have become indispensable in many areas of health care. In addition, more and more experts are depositing published and unpublished disease-causing variants of particular genes into locus-specific databases (LSDBs). Some of these databases contain such extensive information that they have become known as knowledge bases. Here, we analyzed 1,188 LSDBs and their content for the presence or absence of 44 content criteria related to database features (general presentation, locus-specific information, database structure) and data content (data collection, summary table of variants, database querying). Our analyses revealed that several elements have helped to advance the field and reduce data heterogeneity, such as the development of specialized database management systems and the creation of data querying tools. We also identified a number of deficiencies, namely, the lack of detailed disease and phenotypic descriptions for each genetic variant and links to relevant patient organizations, which, if addressed, would allow LSDBs to better serve the clinical genetics community. We propose a structure, based on LSDBs and closely related repositories (namely, clinical genetics databases), which would contribute to a federated genetic variation browser and also allow the maintenance of variation data.
Affiliation(s)
- Christina Mitropoulou
- Erasmus MC, Faculty of Medicine and Health Sciences, MGC-Department of Cell Biology and Genetics, Rotterdam, The Netherlands
|
38
|
Doughty E, Kertesz-Farkas A, Bodenreider O, Thompson G, Adadey A, Peterson T, Kann MG. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature. Bioinformatics 2010; 27:408-15. [PMID: 21138947 DOI: 10.1093/bioinformatics/btq667] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION: A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations. RESULTS: We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool to extract point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases. DISCUSSION: Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles. AVAILABILITY: Freely available at: http://bioinf.umbc.edu/EMU/ftp.
Affiliation(s)
- Emily Doughty
- University of Maryland, Baltimore County, Baltimore, MD 21250, USA
|
39
|
Corrales I, Ramírez L, Ayats J, Altisent C, Parra R, Vidal F. Integration of molecular and clinical data of 40 unrelated von Willebrand Disease families in a Spanish locus-specific mutation database: first release including 58 mutations. Haematologica 2010; 95:1982-4. [PMID: 20801902 DOI: 10.3324/haematol.2010.028977] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
|
40
|
Küntzer J, Eggle D, Klostermann S, Burtscher H. Human variation databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2010; 2010:baq015. [PMID: 20639550 PMCID: PMC2911800 DOI: 10.1093/database/baq015] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
More than 100 000 human genetic variations have been described in various genes that are associated with a wide variety of diseases. Such data provides invaluable information for both clinical medicine and basic science. A number of locus-specific databases have been developed to exploit this huge amount of data. However, the scope, format and content of these databases differ strongly and as no standard for variation databases has yet been adopted, the way data is presented varies enormously. This review aims to give an overview of current resources for human variation data in public and commercial resources.
Affiliation(s)
- Jan Küntzer
- Pharma Research and Early Development, pRED Informatics, Roche Diagnostics GmbH, Penzberg, Germany.
|
41
|
Bareil C, Thèze C, Béroud C, Hamroun D, Guittard C, René C, Paulet D, Georges MD, Claustres M. UMD-CFTR: A database dedicated to CF and CFTR-related disorders. Hum Mutat 2010; 31:1011-9. [DOI: 10.1002/humu.21316] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
|
42
|
Affiliation(s)
- Richard G H Cotton
- Genomic Disorders Research Centre, Florey Neuroscience Institutes, and Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC
- Finlay A Macrae
- Colorectal Medicine and Genetics, Royal Melbourne Hospital, and University of Melbourne Department of Medicine, Melbourne, VIC
|
43
|
Abstract
A standardized, controlled vocabulary allows phenotypic information to be described in an unambiguous fashion in medical publications and databases. The Human Phenotype Ontology (HPO) is being developed in an effort to provide such a vocabulary. The use of an ontology to capture phenotypic information allows the use of computational algorithms that exploit semantic similarity between related phenotypic abnormalities to define phenotypic similarity metrics, which can be used to perform database searches for clinical diagnostics or as a basis for incorporating the human phenome into large-scale computational analysis of gene expression patterns and other cellular phenomena associated with human disease. The HPO is freely available at http://www.human-phenotype-ontology.org.
Affiliation(s)
- P N Robinson
- Institute for Medical Genetics, Augustenburger Platz 1, 13353 Berlin, Germany.
|
44
|
[Genetic mutation databases: stakes and perspectives for orphan genetic diseases]. PATHOLOGIE-BIOLOGIE 2009; 58:387-95. [PMID: 19954899 DOI: 10.1016/j.patbio.2009.09.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/07/2009] [Accepted: 09/14/2009] [Indexed: 12/30/2022]
Abstract
New technologies, which constantly become available for mutation detection and gene analysis, have contributed to an exponential rate of discovery of disease genes and variation in the human genome. The task of collecting and documenting this enormous amount of data in genetic databases represents a major challenge for the future of biological and medical science. The Locus Specific Databases (LSDBs) are so far the most efficient mutation databases. This review presents the main types of databases available for the analysis of mutations responsible for genetic disorders, as well as open perspectives for new therapeutic research or challenges for future medicine. Accurate and exhaustive collection of variations in human genomes will be crucial for research and personalized delivery of healthcare.
|
45
|
Wei MH, Blake PW, Shevchenko J, Toro JR. The folliculin mutation database: an online database of mutations associated with Birt-Hogg-Dubé syndrome. Hum Mutat 2009; 30:E880-90. [PMID: 19562744 DOI: 10.1002/humu.21075] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
The folliculin gene (FLCN), also known as BHD, is the only known susceptibility gene for Birt-Hogg-Dubé syndrome. BHDS is the autosomal dominant predisposition to the development of follicular hamartomas, lung cysts, spontaneous pneumothorax, and/or kidney neoplasms. To date, 53 unique germline mutations have been reported. FLCN mutation detection rate is 88%. FLCN encodes a predicted 579-amino acid protein, designated folliculin, that is highly conserved between humans and homologs in mice, Drosophila, and C. elegans. We developed the first online database detailing all FLCN variants identified in our laboratory and reported in the literature. The FLCN database applies, and assists researchers in applying, HGVS nomenclature guidelines. To date, the FLCN database includes 84 variants: 53 unique germline mutations and 31 SNPs. The majority of FLCN germline mutations are predicted to produce a truncated folliculin, resulting in loss of function. The FLCN mutations consist of: 45% (24/53) deletions, 32% (17/53) substitutions (10 putative-splice site, 5 nonsense, and 2 missense), 15% (8/53) duplications, 6% (3/53) insertion/deletions and 2% (1/53) insertions. The database strives to systematically unify current knowledge of FLCN variants and will be useful to geneticists and genetic counselors while also providing a rapid and systematic resource for investigators.
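The counts quoted in this abstract can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract; the dictionary keys and function name are illustrative labels, not identifiers from the FLCN database.

```python
# Germline mutation counts by type, as stated in the abstract.
mutation_counts = {
    "deletions": 24,
    "substitutions": 17,
    "duplications": 8,
    "insertion/deletions": 3,
    "insertions": 1,
}

# Substitution subtypes and SNP count, also taken from the abstract.
substitution_subtypes = {"putative splice site": 10, "nonsense": 5, "missense": 2}
snp_count = 31


def rounded_percentages(counts: dict) -> dict:
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total_mutations = sum(mutation_counts.values())   # unique germline mutations
total_variants = total_mutations + snp_count      # mutations plus SNPs

for name, pct in rounded_percentages(mutation_counts).items():
    print(f"{name}: {mutation_counts[name]}/{total_mutations} = {pct}%")
```

Running the check reproduces the reported shares (45%, 32%, 15%, 6% and 2%), and the category counts sum to the 53 germline mutations that, together with the 31 SNPs, make up the 84 variants.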
Affiliation(s)
- Ming-Hui Wei
- Genetic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD 20892-4562, USA
|
46
|
Izarzugaza JMG, Baresic A, McMillan LEM, Yeats C, Clegg AB, Orengo CA, Martin ACR, Valencia A. An integrated approach to the interpretation of single amino acid polymorphisms within the framework of CATH and Gene3D. BMC Bioinformatics 2009; 10 Suppl 8:S5. [PMID: 19758469 PMCID: PMC2745587 DOI: 10.1186/1471-2105-10-s8-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. In order better to understand the mechanisms behind known mutant phenotypes, and predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized. RESULTS Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space. CONCLUSION The server is publicly available at http://3DSim.bioinfo.cnio.es/. 
In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request and most of the functionality is available through programmatic web service access.
Affiliation(s)
- Jose M G Izarzugaza
- Institute of Structural and Molecular Biology, University College London, UK.
|
47
|
Zaimidou S, van Baal S, Smith TD, Mitropoulos K, Ljujic M, Radojkovic D, Cotton RG, Patrinos GP. A1ATVar: a relational database of human SERPINA1 gene variants leading to alpha1-antitrypsin deficiency and application of the VariVis software. Hum Mutat 2009; 30:308-13. [PMID: 19021233 DOI: 10.1002/humu.20857] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
We have developed a relational database of human SERPINA1 gene mutations, leading to alpha1-antitrypsin (AAT) deficiency, called A1ATVar, which can be accessed over the World Wide Web at www.goldenhelix.org/A1ATVar. Extensive information has been extracted from the literature and converted into a searchable database, including genotype information, clinical phenotype, allelic frequencies for the commonest AAT variant alleles, methods of detection, and references. Mutation summaries are automatically displayed and user-generated queries can be formulated based on fields in the database. A separate module, linked to the FINDbase database for frequencies of inherited disorders, allows the user to access allele frequency information for the three most frequent AAT alleles, namely PiM, PiS, and PiZ. The available experimental protocols to detect AAT variant alleles at the protein and DNA levels have been archived in a searchable format. A visualization tool, called VariVis, has been implemented to combine A1ATVar variant information with SERPINA1 sequence and annotation data. A direct data submission tool allows registered users to submit data on novel AAT variant alleles as well as experimental protocols to explore SERPINA1 genetic heterogeneity, via a password-protected interface. Database access is free of charge and there are no registration requirements for querying the data. The A1ATVar database is the only integrated database on the Internet offering summarized information on AAT allelic variants and could be useful not only for clinical diagnosis and research on AAT deficiency and the SERPINA1 gene, but could also serve as an example for an all-in-one solution for locus-specific database (LSDB) development and curation.
Affiliation(s)
- Sophia Zaimidou
- Medical Genetics Centre-Department of Cell Biology and Genetics, Faculty of Medicine and Health Sciences, Erasmus University Medical Center, Rotterdam, The Netherlands
|
48
|
Hurst JM, McMillan LE, Porter CT, Allen J, Fakorede A, Martin AC. The SAAPdb web resource: A large-scale structural analysis of mutant proteins. Hum Mutat 2009; 30:616-24. [DOI: 10.1002/humu.20898] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
|
49
|
Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009; 6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it places special emphasis on novel annotation fields, such as that of phenotypes, and highlights recent efforts focused on integrating annotations.
Affiliation(s)
- Gabrielle A Reeves
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
50
|
Greenblatt MS, Brody LC, Foulkes WD, Genuardi M, Hofstra RMW, Olivier M, Plon SE, Sijmons RH, Sinilnikova O, Spurdle AB. Locus-specific databases and recommendations to strengthen their contribution to the classification of variants in cancer susceptibility genes. Hum Mutat 2008; 29:1273-81. [PMID: 18951438 PMCID: PMC3446852 DOI: 10.1002/humu.20889] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Locus-specific databases (LSDBs) are curated collections of sequence variants in genes associated with disease. LSDBs of cancer-related genes often serve as a critical resource to researchers, diagnostic laboratories, clinicians, and others in the cancer genetics community. LSDBs are poised to play an important role in disseminating clinical classification of variants. The IARC Working Group on Unclassified Genetic Variants has proposed a new system of five classes of variants in cancer susceptibility genes. However, standards are lacking for reporting and analyzing the multiple data types that assist in classifying variants. By adhering to standards of transparency and consistency in the curation and annotation of data, LSDBs can be critical for organizing our understanding of how genetic variation relates to disease. In this article we discuss how LSDBs can accomplish these goals, using existing databases for BRCA1, BRCA2, MSH2, MLH1, TP53, and CDKN2A to illustrate the progress and remaining challenges in this field. We recommend that: 1) LSDBs should only report a conclusion related to pathogenicity if a consensus has been reached by an expert panel. 2) The system used to classify variants should be standardized. The Working Group encourages use of the five class system described in this issue by Plon and colleagues. 3) Evidence that supports a conclusion should be reported in the database, including sources and criteria used for assignment. 4) Variants should only be classified as pathogenic if more than one type of evidence has been considered. 5) All instances of all variants should be recorded.
Affiliation(s)
- Marc S Greenblatt
- Vermont Cancer Center and Department of Medicine, University of Vermont, Burlington, Vermont 05405, USA.
|