Folkman L, Yang Y, Li Z, Stantic B, Sattar A, Mort M, Cooper DN, Liu Y, Zhou Y. DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 2015; 31:1599-606. [PMID: 25573915 DOI: 10.1093/bioinformatics/btu862] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 10/03/2014] [Accepted: 12/23/2014] [Indexed: 12/15/2022]
Abstract
MOTIVATION: Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem.

RESULTS: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and performs robustly in 10-fold cross-validation and in independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to significantly improve on existing techniques.
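
For readers unfamiliar with the three metrics quoted in the abstract, the following minimal sketch shows how sensitivity, specificity, and MCC are computed from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper: the counts assume a balanced set of 100 disease-causing and 100 neutral variants, chosen only so that sensitivity and specificity match the reported 86% and 72%. The real dataset sizes and class balance are not given in this record.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)  # fraction of disease-causing variants detected
    specificity = tn / (tn + fp)  # fraction of neutral variants correctly passed
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical, balanced counts (assumption, not the paper's data).
sens, spec, mcc = binary_metrics(tp=86, fn=14, tn=72, fp=28)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Under this balanced-class assumption the three values are mutually consistent (MCC rounds to 0.59), but MCC depends on class balance, so the same sensitivity and specificity would give a different MCC on a skewed dataset.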
Affiliation(s)
- Lukas Folkman
- Yuedong Yang
- Zhixiu Li
- Bela Stantic
- Abdul Sattar
- Matthew Mort
- David N Cooper
- Yunlong Liu
- Yaoqi Zhou

The same combined affiliation list is given for every author:
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4222, Australia
- Institute for Integrated and Intelligent Systems, Griffith University, 170 Kessels Road, Brisbane, Queensland 4111, Australia
- Queensland Research Laboratory, NICTA - National ICT Australia, 70-72 Bowen Street, Spring Hill, Queensland 4000, Australia
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4222, Australia
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, 975 West Walnut Street, MRL Bldg IB130, Indianapolis, IN 46202, USA