1
Bermejo-Das-Neves C, Nguyen HN, Poch O, Thompson JD. A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics 2014; 15:111. [PMID: 24742296 PMCID: PMC4021375 DOI: 10.1186/1471-2105-15-111] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 12/20/2013] [Accepted: 04/09/2014] [Indexed: 11/10/2022]
Abstract
Background: Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids.
Results: In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels.
Conclusions: We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease.
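The FS/NFS distinction described in the abstract comes down to whether the number of inserted or deleted nucleotides is a multiple of three. A minimal sketch of that check follows; the function name `classify_coding_indel` and the reference/alternate allele representation are illustrative assumptions, not part of KD4i, and the sketch ignores complications such as indels that span splice sites or introduce a stop codon.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Classify a coding-region indel as frameshifting (FS) or non-frameshifting (NFS).

    `ref` and `alt` are the reference and alternate alleles as nucleotide
    strings (hypothetical representation chosen for this sketch).
    """
    # Net number of nucleotides inserted or deleted.
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        # Same length: a substitution, not an insertion or deletion.
        return "not an indel"
    # A multiple of three preserves the reading frame (NFS);
    # anything else shifts it (FS).
    return "NFS" if length_change % 3 == 0 else "FS"


if __name__ == "__main__":
    print(classify_coding_indel("ACGT", "A"))   # three nucleotides deleted
    print(classify_coding_indel("AC", "A"))     # one nucleotide deleted
```

A predictor such as the one described would only be applied to the variants this check labels NFS; FS variants alter the whole downstream sequence and are handled as a separate class.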
Affiliation(s)
- Julie D Thompson
- ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), University of Strasbourg and CNRS, Strasbourg, France.
2
Nguyen H, Luu TD, Poch O, Thompson JD. Knowledge discovery in variant databases using inductive logic programming. Bioinform Biol Insights 2013; 7:119-31. [PMID: 23589683 PMCID: PMC3615990 DOI: 10.4137/bbi.s11184] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
Abstract
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from databases (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.
Affiliation(s)
- Hoan Nguyen
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire Illkirch, France
3
Luu TD, Rusu A, Walter V, Linard B, Poidevin L, Ripp R, Moulinier L, Muller J, Raffelsberger W, Wicker N, Lecompte O, Thompson JD, Poch O, Nguyen H. KD4v: Comprehensible Knowledge Discovery System for Missense Variant. Nucleic Acids Res 2012; 40:W71-5. [PMID: 22641855 PMCID: PMC3394327 DOI: 10.1093/nar/gks474] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 12/02/2022]
Abstract
A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server allows users to characterize and predict the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.
Affiliation(s)
- Tien-Dao Luu
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67404 Illkirch, France
4
Luu TD, Rusu AM, Walter V, Ripp R, Moulinier L, Muller J, Toursel T, Thompson JD, Poch O, Nguyen H. MSV3d: database of human MisSense Variants mapped to 3D protein structure. Database: The Journal of Biological Databases and Curation 2012; 2012:bas018. [PMID: 22491796 PMCID: PMC3317913 DOI: 10.1093/database/bas018] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/03/2023]
Abstract
The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by amino acid modification, as well as information related to the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are automatically generated and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphism) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in XML or flat file formats. Database URL: http://decrypthon.igbmc.fr/msv3d
Affiliation(s)
- Tien-Dao Luu
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire (UMR7104), 67404 Illkirch
5
Audo I, Bujakowska K, Orhan E, Poloschek C, Defoort-Dhellemmes S, Drumare I, Kohl S, Luu T, Lecompte O, Zrenner E, Lancelot ME, Antonio A, Germain A, Michiels C, Audier C, Letexier M, Saraiva JP, Leroy B, Munier F, Mohand-Saïd S, Lorenz B, Friedburg C, Preising M, Kellner U, Renner A, Moskova-Doumanova V, Berger W, Wissinger B, Hamel C, Schorderet D, De Baere E, Sharon D, Banin E, Jacobson S, Bonneau D, Zanlonghi X, Le Meur G, Casteels I, Koenekoop R, Long V, Meire F, Prescott K, de Ravel T, Simmons I, Nguyen H, Dollfus H, Poch O, Léveillard T, Nguyen-Ba-Charvet K, Sahel JA, Bhattacharya S, Zeitz C. Whole-exome sequencing identifies mutations in GPR179 leading to autosomal-recessive complete congenital stationary night blindness. Am J Hum Genet 2012; 90:321-30. [PMID: 22325361 DOI: 10.1016/j.ajhg.2011.12.007] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 10/20/2011] [Revised: 11/18/2011] [Accepted: 12/08/2011] [Indexed: 10/14/2022]
Abstract
Congenital stationary night blindness (CSNB) is a heterogeneous retinal disorder characterized by visual impairment under low light conditions. This disorder is due to a signal transmission defect from rod photoreceptors to adjacent bipolar cells in the retina. Two forms can be distinguished clinically, complete CSNB (cCSNB) or incomplete CSNB; the two forms are distinguished on the basis of the affected signaling pathway. Mutations in NYX, GRM6, and TRPM1, expressed in the outer plexiform layer (OPL), lead to disruption of the ON-bipolar cell response and have been seen in patients with cCSNB. Whole-exome sequencing in cCSNB patients lacking mutations in the known genes led to the identification of a homozygous missense mutation (c.1807C>T [p.His603Tyr]) in one consanguineous autosomal-recessive cCSNB family and a homozygous frameshift mutation in GPR179 (c.278delC [p.Pro93Glnfs*57]) in a simplex male cCSNB patient. Additional screening with Sanger sequencing of 40 patients identified three other cCSNB patients harboring additional allelic mutations in GPR179. Although immunohistological studies revealed Gpr179 in the OPL in wild-type mouse retina, Gpr179 did not colocalize with specific ON-bipolar markers. Interestingly, Gpr179 was highly concentrated in horizontal cells and Müller cell endfeet. The involvement of these cells in cCSNB and the specific function of GPR179 remain to be elucidated.