1
Souza MG, Vallejo EE, Estrada K. Detecting Clustered Independent Rare Variant Associations Using Genetic Algorithms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:932-939. [PMID: 31403438 DOI: 10.1109/tcbb.2019.2930505] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
The availability of an increasing collection of sequencing data provides the opportunity to study genetic variation with an unprecedented level of detail. There is much interest in uncovering the role of rare variants and their contribution to disease. However, detecting associations of rare variants with small minor allele frequencies (MAF) and modest effects remains a challenge for rare variant association methods. Due to this low signal-to-noise ratio, most methods are underpowered to detect associations even when rare variant association tests are conducted at the gene level. We present a new method for detecting rare variant associations. The algorithm consists of two steps. In the first step, a genetic algorithm searches for a promising genomic region containing a collection of genes with causal rare variants. In the second step, a genetic algorithm aims to remove false positives from the located genomic region. We tested the proposed method with a collection of datasets obtained from real exome data. The proposed method possesses sufficient power for detecting associations of rare variants with complex phenotypes. This method can be used for studying the contribution of rare variants to complex disease, particularly in cases where single-variant or gene-based tests are underpowered.
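The two-step search summarized in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the per-sample, per-gene rare-allele count matrix, the Welch-style burden z statistic used as fitness, the parsimony penalty, the toy data and the helper names `burden_z`, `search_window` and `prune_window` are all hypothetical choices made for illustration. Step 1 evolves a contiguous window of genes; step 2 evolves a bitmask over that window to drop genes that do not contribute to the association.

```python
import math
import random


def burden_z(genotypes, phenotype, genes):
    """Welch-style z statistic comparing the collapsed rare-allele burden over
    `genes` in cases (phenotype 1) versus controls (phenotype 0).
    Assumes both classes are present in `phenotype`."""
    genes = list(genes)
    if not genes:
        return 0.0
    case, ctrl = [], []
    for row, y in zip(genotypes, phenotype):
        # Collapse the selected genes into one rare-allele count per sample.
        (case if y == 1 else ctrl).append(sum(row[g] for g in genes))

    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / max(len(xs) - 1, 1)

    (m1, v1), (m0, v0) = mean_var(case), mean_var(ctrl)
    se = math.sqrt(v1 / len(case) + v0 / len(ctrl)) + 1e-9  # epsilon avoids division by zero
    return (m1 - m0) / se


def search_window(genotypes, phenotype, max_len=5, pop_size=30, generations=40, seed=0):
    """Step 1 (sketch): evolve a contiguous gene window (start, length) whose
    collapsed burden best separates cases from controls. Needs max_len <= number of genes."""
    rng = random.Random(seed)
    n_genes = len(genotypes[0])

    def fitness(ind):
        start, length = ind
        return burden_z(genotypes, phenotype, range(start, start + length))

    def mutate(ind):
        # Shift and/or resize the window by one gene, staying in bounds.
        start, length = ind
        length = min(max(length + rng.choice((-1, 0, 1)), 1), max_len)
        start = min(max(start + rng.choice((-1, 0, 1)), 0), n_genes - length)
        return (start, length)

    pop = []
    for _ in range(pop_size):
        length = rng.randint(1, max_len)
        pop.append((rng.randint(0, n_genes - length), length))
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; the best window is always kept
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            length = b[1]  # crossover: start from parent a, length from parent b
            children.append(mutate((min(a[0], n_genes - length), length)))
        pop = elite + children
    return max(pop, key=fitness)


def prune_window(genotypes, phenotype, window, penalty=0.05, pop_size=30, generations=40, seed=1):
    """Step 2 (sketch): evolve a bitmask over the genes in `window`, dropping genes
    that do not help the association (putative false positives)."""
    rng = random.Random(seed)
    genes = list(window)

    def selected(mask):
        return [g for g, bit in zip(genes, mask) if bit]

    def fitness(mask):
        chosen = selected(mask)
        return burden_z(genotypes, phenotype, chosen) - penalty * len(chosen)

    pop = [tuple([1] * len(genes))]  # seed with the full window
    pop += [tuple(rng.randint(0, 1) for _ in genes) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(0, len(genes))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(len(genes))  # single bit-flip mutation
            child[i] = 1 - child[i]
            children.append(tuple(child))
        pop = elite + children
    return selected(max(pop, key=fitness))


if __name__ == "__main__":
    # Hypothetical toy data: 200 samples, 12 genes; cases carry extra rare alleles in genes 4 and 5.
    rng = random.Random(42)
    phenotype = [i % 2 for i in range(200)]
    genotypes = [[int(rng.random() < (0.3 if y == 1 and g in (4, 5) else 0.02))
                  for g in range(12)] for y in phenotype]
    start, length = search_window(genotypes, phenotype)
    kept = prune_window(genotypes, phenotype, range(start, start + length))
    print("window:", list(range(start, start + length)), "kept:", kept)
```

Because the second population is seeded with the full window and the top half survives each generation, the pruned gene set never scores worse than the window it started from under this toy fitness. A real analysis would substitute a proper rare-variant association test and account for multiple testing.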
3
Khan MW, Alam M. A survey of application: genomics and genetic programming, a new frontier. Genomics 2012; 100:65-71. [PMID: 22683715 DOI: 10.1016/j.ygeno.2012.05.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/24/2011] [Revised: 05/22/2012] [Accepted: 05/29/2012] [Indexed: 11/15/2022]
Abstract
The aim of this paper is to provide an introduction to the rapidly developing field of genetic programming (GP), with particular emphasis on its application to genomics. First, the basic methodology of GP is introduced. This is followed by a review of applications in gene network inference, gene expression data analysis, SNP analysis, epistasis analysis and gene annotation. Finally, the paper concludes by suggesting avenues for future research on genetic programming, opportunities to extend the technique, and areas for practical application.
Affiliation(s)
- Mohammad Wahab Khan
- Department of Computer Science, Jamia Millia Islamia, Maulana Mohammad Ali Jauhar Marg, New Delhi 110025, India.
4
Barros RC, Basgalupp MP, de Carvalho ACPLF, Freitas AA. A Survey of Evolutionary Algorithms for Decision-Tree Induction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2012. [DOI: 10.1109/tsmcc.2011.2157494] [Citation(s) in RCA: 194] [Impact Index Per Article: 16.2] [Indexed: 11/09/2022]
6
Wongseree W, Assawamakin A, Piroonratana T, Sinsomros S, Limwongse C, Chaiyaratana N. Detecting purely epistatic multi-locus interactions by an omnibus permutation test on ensembles of two-locus analyses. BMC Bioinformatics 2009; 10:294. [PMID: 19761607 PMCID: PMC2759961 DOI: 10.1186/1471-2105-10-294] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 03/05/2009] [Accepted: 09/17/2009] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Purely epistatic multi-locus interactions generally cannot be detected by single-locus analysis in case-control studies of complex diseases. Recently, many two-locus and multi-locus analysis techniques have shown promise for epistasis detection. However, exhaustive multi-locus analysis requires prohibitively large computational effort when problems involve large-scale or genome-wide data. Furthermore, there is no explicit proof that a combination of multiple two-locus analyses can lead to the correct identification of multi-locus interactions.

RESULTS: The proposed 2LOmb algorithm performs an omnibus permutation test on ensembles of two-locus analyses. The algorithm consists of four main steps: two-locus analysis, a permutation test, global p-value determination and a progressive search for the best ensemble. 2LOmb is benchmarked against an exhaustive two-locus analysis technique, a set association approach, a correlation-based feature selection (CFS) technique and a tuned ReliefF (TuRF) technique. The simulation results indicate that 2LOmb has a low false-positive rate. Moreover, 2LOmb performs best both at identifying all causative single nucleotide polymorphisms (SNPs) and at keeping the number of output SNPs low in purely epistatic two-, three- and four-locus interaction problems. Interaction models constructed from the 2LOmb outputs via the multifactor dimensionality reduction (MDR) method are also included to confirm the epistasis detection. 2LOmb is subsequently applied to a type 2 diabetes mellitus (T2D) data set obtained as part of the UK genome-wide genetic epidemiology study by the Wellcome Trust Case Control Consortium (WTCCC). After an initial screen for SNPs that are located within or near 372 candidate genes and exhibit no marginal single-locus effects, the T2D data set is reduced to 7,065 SNPs from 370 genes. The 2LOmb search in the reduced T2D data reveals that four intronic SNPs in PGM1 (phosphoglucomutase 1), two intronic SNPs in LMX1A (LIM homeobox transcription factor 1, alpha), two intronic SNPs in PARK2 (Parkinson disease (autosomal recessive, juvenile) 2, parkin) and three intronic SNPs in GYS2 (glycogen synthase 2 (liver)) are associated with the disease. The 2LOmb result suggests that no pair of the identified genes interacts in a way that can be described by purely epistatic two-locus interaction models. Moreover, no interactions among these four genes can be described by purely epistatic multi-locus interaction models with marginal two-locus effects. The findings provide an alternative explanation for the aetiology of T2D in a UK population.

CONCLUSION: An omnibus permutation test on ensembles of two-locus analyses can detect purely epistatic multi-locus interactions with marginal two-locus effects. The study also reveals that SNPs from large-scale or genome-wide case-control data that are discarded because single-locus analysis detects no association can still be useful for genetic epidemiology studies.
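The four steps named in this abstract can be illustrated with a simplified sketch, assuming SNP-major genotype lists coded 0/1/2 and a binary phenotype. It is not the published 2LOmb procedure: the Pearson chi-square on the joint-genotype by case/control table, the use of the sum of the top-k pair statistics as the omnibus statistic, the toy data and the helper names `two_locus_chi2`, `pair_stats` and `omnibus_search` are assumptions made for illustration. Comparing the observed sum with the sum of the top-k permuted statistics is one way to account for having selected the top pairs after looking at the data.

```python
import itertools
import random


def two_locus_chi2(snp_a, snp_b, phenotype):
    """Pearson chi-square on the table of joint genotypes (up to 9 rows) x {control, case}."""
    counts = {}
    for a, b, y in zip(snp_a, snp_b, phenotype):
        counts.setdefault((a, b), [0, 0])[y] += 1
    n, n_case = len(phenotype), sum(phenotype)
    col_totals = (n - n_case, n_case)
    chi2 = 0.0
    for row in counts.values():
        for observed, col_total in zip(row, col_totals):
            expected = sum(row) * col_total / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def pair_stats(genotypes, phenotype):
    """Step 1: two-locus analysis of every SNP pair, sorted by decreasing statistic."""
    return sorted(
        ((two_locus_chi2(genotypes[i], genotypes[j], phenotype), (i, j))
         for i, j in itertools.combinations(range(len(genotypes)), 2)),
        reverse=True,
    )


def omnibus_search(genotypes, phenotype, max_pairs=5, n_perm=99, seed=0):
    """Steps 2-4 (simplified): permutation null, global p-value per ensemble size k,
    and a progressive search over k. Returns (SNP indices, global p-value)."""
    rng = random.Random(seed)
    observed = pair_stats(genotypes, phenotype)
    # Step 2: recompute every pair statistic under permuted phenotypes.
    perm = list(phenotype)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        null_stats.append([s for s, _ in pair_stats(genotypes, perm)])
    best_p, best_snps = None, []
    for k in range(1, max_pairs + 1):
        # Step 3: omnibus statistic = sum of the k largest pair statistics, compared
        # with the k largest permuted statistics in each permutation.
        obs_stat = sum(s for s, _ in observed[:k])
        exceed = sum(1 for null in null_stats if sum(null[:k]) >= obs_stat)
        p = (1 + exceed) / (1 + n_perm)
        # Step 4: progressive search keeps the smallest ensemble with the lowest p-value.
        if best_p is None or p < best_p:
            best_p = p
            best_snps = sorted({snp for _, pair in observed[:k] for snp in pair})
    return best_snps, best_p


if __name__ == "__main__":
    # Hypothetical toy data: 8 SNPs; the phenotype is the parity of SNP 2 + SNP 5,
    # a purely epistatic pattern with no marginal single-locus effect in expectation.
    rng = random.Random(7)
    genotypes = [rng.choices((0, 1, 2), weights=(1, 2, 1), k=200) for _ in range(8)]
    phenotype = [(genotypes[2][i] + genotypes[5][i]) % 2 for i in range(200)]
    print(omnibus_search(genotypes, phenotype))
```

In this toy setting the phenotype is a deterministic function of SNPs 2 and 5, so that pair reaches the maximum possible chi-square of n (the sample size) and tops the ranking, while neither SNP shows an association on its own.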
Affiliation(s)
- Waranyu Wongseree
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Anunchai Assawamakin
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand
- Theera Piroonratana
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Saravudh Sinsomros
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Chanin Limwongse
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand
- Nachol Chaiyaratana
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand