1
Han L, Shen B, Wu X, Zhang J, Wen YJ. Compressed variance component mixed model reveals epistasis associated with flowering in Arabidopsis. Frontiers in Plant Science 2024; 14:1283642. [PMID: 38259933 PMCID: PMC10800901 DOI: 10.3389/fpls.2023.1283642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 12/15/2023] [Indexed: 01/24/2024]
Abstract
Introduction Epistasis is currently a topic of great interest in molecular and quantitative genetics. Arabidopsis thaliana, as a model organism, plays a crucial role in studying the fundamental biology of diverse plant species. However, few reports have identified epistasis related to flowering in genome-wide association studies (GWAS). Therefore, it is important to conduct epistasis analysis in Arabidopsis. Method In this study, we employed Levene's test and a compressed variance component mixed model in GWAS to detect quantitative trait nucleotides (QTNs) and QTN-by-QTN interactions (QQIs) for 11 flowering-related traits of 199 Arabidopsis accessions with 216,130 markers. Results Our analysis detected 89 QTNs and 130 pairs of QQIs. Around these loci, 34 known genes previously reported in Arabidopsis were confirmed to be associated with flowering-related traits, such as SPA4, which is involved in regulating photoperiodic flowering and interacts with PAP1 and PAP2, affecting growth of Arabidopsis under light conditions. Among the previously unreported genes, we then observed significant differential expression of 35 genes in response to variations in temperature, photoperiod, and vernalization treatments. Functional enrichment analysis revealed that 26 of these genes were associated with various biological processes. Finally, haplotype and phenotypic difference analysis revealed 20 candidate genes exhibiting significant phenotypic variation across gene haplotypes, of which the candidate genes AT1G12990 and AT1G09950 around QQIs might have an interaction effect on flowering time regulation in Arabidopsis. Discussion These findings may offer valuable insights for the identification and exploration of genes and gene-by-gene interactions associated with flowering-related traits in Arabidopsis, and may even provide a valuable reference for research on epistasis in other species.
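Levene's test, named in the Method above, checks whether the variance of a trait differs between groups. The abstract does not describe how the authors applied it, so the sketch below only shows the classical (mean-centered) W statistic, computed on phenotype values grouped by genotype at one marker. The grouping, the toy values, and the function name `levene_w` are illustrative assumptions, not the authors' implementation. Under the null hypothesis, W approximately follows an F distribution with (k - 1, N - k) degrees of freedom; the p-value step is left out to keep the example dependency-free.

```python
def levene_w(groups):
    """Classical (mean-centered) Levene W statistic for k groups of numbers."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each observation from its own group mean.
    z = []
    for g in groups:
        mean_g = sum(g) / len(g)
        z.append([abs(v - mean_g) for v in g])

    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("deviations are constant within every group; W is undefined")
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    # Hypothetical flowering-time values split by genotype at one marker.
    by_genotype = [[21.0, 22.0, 23.0], [18.0, 22.0, 26.0]]
    print(levene_w(by_genotype))
```

A median-centered variant (the Brown-Forsythe form) is often preferred for skewed traits; it differs only in replacing the group mean with the group median when computing the deviations.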
Affiliation(s)
- Le Han
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Bolin Shen
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Xinyi Wu
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Jin Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, China
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, China
- Yang-Jun Wen
  - College of Science, Nanjing Agricultural University, Nanjing, China
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, China
2
Esmaeili F, Narimani Z, Vasighi M. Discovering SNP-disease relationships in genome-wide SNP data using an improved harmony search based on SNP locus and genetic inheritance patterns. PLoS One 2023; 18:e0292266. [PMID: 37831690 PMCID: PMC10575495 DOI: 10.1371/journal.pone.0292266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 09/15/2023] [Indexed: 10/15/2023] Open
Abstract
Advances in high-throughput sequencing technologies have made it possible to access millions of measurements from thousands of people. Single nucleotide polymorphisms (SNPs), the most common type of mutation in the human genome, have been shown to play a significant role in the development of complex and multifactorial diseases. However, studying the synergistic interactions between different SNPs in explaining multifactorial diseases is challenging due to the high dimensionality of the data and methodological complexities. Existing solutions often use a multi-objective approach based on metaheuristic optimization algorithms such as harmony search. However, previous studies have shown that using a multi-objective approach is not sufficient to address complex disease models with no or low marginal effect. In this research, we introduce a locus-driven harmony search (LDHS), an improved harmony search algorithm that focuses on using SNP locus information and genetic inheritance patterns to initialize harmony memories. The proposed method integrates biological knowledge to improve harmony memory initialization by adding SNP combinations that are likely candidates for interaction and disease causation. Using a SNP grouping process, LDHS generates harmonies that include SNPs with a higher potential for interaction, resulting in greater power in detecting disease-causing SNP combinations. The performance of the proposed algorithm was evaluated on 200 synthesized datasets for disease models with and without marginal effect. The results show significant improvement in the power of the algorithm to find disease-related SNP sets while decreasing computational cost compared to state-of-the-art algorithms. The proposed algorithm also demonstrated notable performance on real breast cancer data, showing that integrating prior knowledge can significantly improve the process of detecting disease-related SNPs in both real and synthesized data.
Affiliation(s)
- Fariba Esmaeili
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Zahra Narimani
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Mahdi Vasighi
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
3
Tuo S, Li C, Liu F, Li A, He L, Geem ZW, Shang J, Liu H, Zhu Y, Feng Z, Chen T. MTHSA-DHEI: multitasking harmony search algorithm for detecting high-order SNP epistatic interactions. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00813-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies have succeeded in identifying genetic variants associated with complex diseases, but the findings have not been well interpreted biologically. Although it is widely accepted that epistatic interactions of high-order single nucleotide polymorphisms (SNPs) [(1) Single nucleotide polymorphisms (SNP) are mainly deoxyribonucleic acid (DNA) sequence polymorphisms caused by variants at a single nucleotide at the genome level. They are the most common type of heritable variation in humans.] are important causes of complex diseases, the combinatorial explosion of millions of SNPs and multiple tests impose a large computational burden. Moreover, it is extremely challenging to correctly distinguish high-order SNP epistatic interactions from other high-order SNP combinations due to small sample sizes. In this study, a multitasking harmony search algorithm (MTHSA-DHEI) is proposed for detecting high-order epistatic interactions [(2) In classical genetics, if genes X1 and X2 are mutated and each mutation by itself produces a unique disease status (phenotype) but the mutations together cause the same disease status as the gene X1 mutation, gene X1 is epistatic and gene X2 is hypostatic, and gene X1 has an epistatic effect (main effect) on disease status. In this work, a high-order epistatic interaction occurs when two or more SNP loci have a joint influence on disease status.], with the goal of simultaneously detecting multiple types of high-order (k1-order, k2-order, …, kn-order) SNP epistatic interactions. Unified coding is adopted for multiple tasks, and four complementary association evaluation functions are employed to improve the capability of discriminating the high-order SNP epistatic interactions.
We compare the proposed MTHSA-DHEI method with four excellent methods for detecting high-order SNP interactions on 8 high-order epistatic interaction models with no marginal effect (EINMEs) and 12 epistatic interaction models with marginal effects (EIMEs), and apply the MTHSA-DHEI algorithm to a real dataset: age-related macular degeneration (AMD). The experimental results indicate that MTHSA-DHEI has power and an F1-score exceeding 90% for all EIMEs and five EINMEs and reduces the computational time by more than 90%. It can efficiently perform multiple high-order detection tasks for high-order epistatic interactions and improve the discrimination ability for diverse epistasis models.
4
Yang T, Gao F. High-quality pan-genome of Escherichia coli generated by excluding confounding and highly similar strains reveals an association between unique gene clusters and genomic islands. Brief Bioinform 2022; 23:6638794. [PMID: 35809555 DOI: 10.1093/bib/bbac283] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/24/2022] [Revised: 06/17/2022] [Accepted: 06/20/2022] [Indexed: 01/24/2023] Open
Abstract
The pan-genome analysis of bacteria provides detailed insight into the diversity and evolution of a bacterial population. However, the genomes involved in the pan-genome analysis should be checked carefully, as the inclusion of confounding strains would have unfavorable effects on the identification of core genes, and the highly similar strains could bias the results of the pan-genome state (open versus closed). In this study, we found that the inclusion of highly similar strains also affects the results of unique genes in pan-genome analysis, which leads to a significant underestimation of the number of unique genes in the pan-genome. Therefore, these strains should be excluded from pan-genome analysis at the early stage of data processing. Currently, tens of thousands of genomes have been sequenced for Escherichia coli, which provides an unprecedented opportunity as well as a challenge for pan-genome analysis of this classical model organism. Using the proposed strategies, a high-quality E. coli pan-genome was obtained, and the unique genes were extracted and analyzed, revealing an association between the unique gene clusters and genomic islands from a pan-genome perspective, which may facilitate the identification of genomic islands.
Affiliation(s)
- Tong Yang
  - Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Feng Gao
  - Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
  - SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China
5
Brandes N, Weissbrod O, Linial M. Open problems in human trait genetics. Genome Biol 2022; 23:131. [PMID: 35725481 PMCID: PMC9208223 DOI: 10.1186/s13059-022-02697-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/26/2021] [Accepted: 05/30/2022] [Indexed: 12/21/2022] Open
Abstract
Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and yet, the genetics of most traits is still poorly understood. In this review, we highlight the major open problems that need to be solved, and by discussing these challenges provide a primer to the field. We cover general issues such as population structure, epistasis and gene-environment interactions, data-related issues such as ancestry diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies, and polygenic risk scores. We emphasize the interconnectedness of these problems and suggest promising avenues to address them.
Affiliation(s)
- Nadav Brandes
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Omer Weissbrod
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Michal Linial
  - Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
6
Wang X, Cao X, Feng Y, Guo M, Yu G, Wang J. ELSSI: parallel SNP-SNP interactions detection by ensemble multi-type detectors. Brief Bioinform 2022; 23:6607749. [PMID: 35696639 DOI: 10.1093/bib/bbac213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2022] [Revised: 04/18/2022] [Accepted: 05/07/2022] [Indexed: 12/11/2022] Open
Abstract
With the development of high-throughput genotyping technology, detection of single nucleotide polymorphism (SNP)-SNP interactions (SSIs) has become an essential way of understanding disease susceptibility. Various methods have been proposed to detect SSIs. However, given the disease complexity and bias of individual SSI detectors, these single-detector-based methods generally scale poorly to real genome-wide data and yield unfavorable results. We propose a novel ensemble learning-based approach (ELSSI) that can significantly reduce the bias of individual detectors and their computational load. ELSSI randomly divides SNPs into different subsets and evaluates them by multi-type detectors in parallel. In particular, ELSSI introduces a four-stage pipeline (generate, score, switch and filter) to iteratively generate new SNP combination subsets from SNP subsets, score the combination subsets by individual detectors, switch high-score combinations to other detectors for re-scoring, then filter out combinations with low scores. This pipeline enables ELSSI to detect high-order SSIs from large genome-wide datasets. Experimental results on various simulated and real genome-wide datasets show the superior efficacy of ELSSI over state-of-the-art methods in detecting SSIs, especially high-order ones. ELSSI is applicable with moderate PCs on the Internet and can flexibly incorporate new detectors. The code of ELSSI is available at https://www.sdu-idea.cn/codes.php?name=ELSSI.
Affiliation(s)
- Xin Wang
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Xia Cao
  - College of Computer and Information Sciences, Southwest University, Chongqing 400715, China
- Yuantao Feng
  - College of Computer and Information Sciences, Southwest University, Chongqing 400715, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan 250101, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
7
Ponte-Fernandez C, Gonzalez-Dominguez J, Carvajal-Rodriguez A, Martin MJ. Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
8
Woodward AA, Taylor DM, Goldmuntz E, Mitchell LE, Agopian A, Moore JH, Urbanowicz RJ. Gene-Interaction-Sensitive enrichment analysis in congenital heart disease. BioData Min 2022; 15:4. [PMID: 35151364 PMCID: PMC8841104 DOI: 10.1186/s13040-022-00287-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 01/17/2022] [Indexed: 11/24/2022] Open
Abstract
Background Gene set enrichment analysis (GSEA) uses gene-level univariate associations to identify gene set-phenotype associations for hypothesis generation and interpretation. We propose that GSEA can be adapted to incorporate SNP and gene-level interactions. To this end, gene scores are derived by Relief-based feature importance algorithms that efficiently detect both univariate and interaction effects (MultiSURF) or exclusively interaction effects (MultiSURF*). We compare these interaction-sensitive GSEA approaches to traditional χ² rankings in simulated genome-wide array data, and in a target and replication cohort of congenital heart disease patients with conotruncal defects (CTDs). Results In the simulation study and for both CTD datasets, both Relief-based approaches to GSEA captured more relevant and significant gene ontology terms compared to the univariate GSEA. Key terms and themes of interest include cell adhesion, migration, and signaling. A leading edge analysis highlighted semaphorins and their receptors, the Slit-Robo pathway, and other genes with roles in the secondary heart field and outflow tract development. Conclusions Our results indicate that interaction-sensitive approaches to enrichment analysis can improve upon traditional univariate GSEA. This approach replicated univariate findings and identified additional and more robust support for the role of the secondary heart field and cardiac neural crest cell migration in the development of CTDs. Supplementary Information The online version contains supplementary material available at (10.1186/s13040-022-00287-w).
9
Qin X, Ma S, Wu M. Gene-gene interaction analysis incorporating network information via a structured Bayesian approach. Stat Med 2021; 40:6619-6633. [PMID: 34542187 PMCID: PMC8595614 DOI: 10.1002/sim.9202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/16/2021] [Revised: 08/22/2021] [Accepted: 08/30/2021] [Indexed: 01/14/2023]
Abstract
Increasing evidence has shown that gene-gene interactions have important effects in biological processes of human diseases. Due to the high dimensionality of genetic measurements, interaction analysis usually suffers from a lack of sufficient information and has unsatisfactory results. Biological network information has been massively accumulated, allowing researchers to identify biomarkers while taking a system perspective, conducting network selection (of functionally related biomarkers), and accommodating network structures. In main-effect-only analysis, network information has been incorporated. However, effort has been limited in interaction analysis. Recently, link networks that describe the relationships between genetic interactions have been demonstrated as effective for revealing multiscale hierarchical organizations in networks and providing interesting findings beyond node networks. In this study, we develop a novel structured Bayesian interaction analysis approach to effectively incorporate network information. This study is among the first to identify gene-gene interactions with the assistance of network selection, while simultaneously accommodating the underlying network structures of both main effects and interactions. It innovatively respects multiple hierarchies among main effects, interactions, and networks. The Bayesian technique is adopted, which may be more informative for estimation and prediction over some other techniques. An efficient variational Bayesian expectation-maximization algorithm is developed to explore the posterior distribution. Extensive simulation studies demonstrate the practical superiority of the proposed approach. The analysis of TCGA data on melanoma and lung cancer leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.
Affiliation(s)
- Xing Qin
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, New Haven, CT, USA
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
10
El Hou A, Rocha D, Venot E, Blanquet V, Philippe R. Long-range linkage disequilibrium in French beef cattle breeds. Genet Sel Evol 2021; 53:63. [PMID: 34301193 PMCID: PMC8306006 DOI: 10.1186/s12711-021-00657-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/20/2020] [Accepted: 07/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Linkage disequilibrium (LD) is a key parameter to study the history of populations and to identify and fine map quantitative trait loci (QTL) and it has been studied for many years in animal populations. The advent of new genotyping technologies has allowed whole-genome LD studies in most cattle populations. However, to date, long-range LD (LRLD) between distant variants on the genome has not been investigated in detail in cattle. Here, we present the first comprehensive study of LRLD in French beef cattle by analysing data on 672 Charolais (CHA), 462 Limousine (LIM) and 326 Blonde d'Aquitaine (BLA) individuals that were genotyped on the Illumina BovineHD Beadchip. Furthermore, whole-genome LD and haplotype block structure were analysed in these three breeds. RESULTS We computed linkage disequilibrium (r²) values for 5.9, 5.6 and 6.0 billion pairs of SNPs on the 29 autosomes of CHA, LIM and BLA, respectively. Mean r² values drop to less than 0.1 for distances between SNPs greater than 120 kb. However, for the first time, we detected the existence of LRLD in the three main French beef breeds. In total, 598, 266, and 795 LRLD events (r² ≥ 0.6) were detected in CHA, LIM and BLA, respectively. Each breed had predominantly population-specific LRLD interactions, although shared LRLD events occurred in a number of regions (55 LRLD events were shared between two breeds and nine between the three breeds). Examples of possible functional gene interactions and QTL co-location were observed with some of these LRLD events, which suggests epistatic selection. CONCLUSIONS We identified long-range linkage disequilibrium for the first time in French beef cattle populations. Epistatic selection may be the main source of the observed LRLD events, but other forces may also be involved. LRLD information should be accounted for in genome-wide association studies.
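As a rough illustration of the pairwise LD statistic reported in the abstract above, the sketch below computes the squared Pearson correlation between two SNPs coded as allele dosages (0, 1 or 2). This genotype-based estimate is a common approximation and is an assumption here; the abstract does not say how the authors estimated their values. The function names are hypothetical, and the default cut-off of 0.6 only mirrors the threshold quoted for LRLD events.

```python
import math


def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Each vector holds one value per individual (0, 1 or 2 copies of an allele).
    Returns NaN if either SNP is monomorphic, because r2 is undefined there.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) < 2:
        raise ValueError("need two equally long vectors with at least 2 individuals")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    # Sums of squares and cross-products around the means.
    ss_a = sum((a - mean_a) ** 2 for a in snp_a)
    ss_b = sum((b - mean_b) ** 2 for b in snp_b)
    sp_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    if ss_a == 0 or ss_b == 0:
        return math.nan
    return sp_ab * sp_ab / (ss_a * ss_b)


def exceeds_threshold(r2, threshold=0.6):
    """Hypothetical helper: True when r2 reaches the chosen cut-off."""
    return not math.isnan(r2) and r2 >= threshold


if __name__ == "__main__":
    # Toy dosages for six individuals at two SNPs.
    snp_1 = [0, 1, 2, 0, 1, 2]
    snp_2 = [0, 1, 2, 0, 1, 1]
    r2 = genotype_r2(snp_1, snp_2)
    print(round(r2, 3), exceeds_threshold(r2))
```

Applying such a function to billions of SNP pairs, as in the study, would need a vectorised or compiled implementation; the loop-based version is kept only for readability.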
Affiliation(s)
- Abdelmajid El Hou
  - INRAE, PEIRENE EA7500, USC1061 GAMAA, Université de Limoges, 87060, Limoges, France
- Dominique Rocha
  - INRAE, AgroParisTech, GABI, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Eric Venot
  - INRAE, AgroParisTech, GABI, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Véronique Blanquet
  - INRAE, PEIRENE EA7500, USC1061 GAMAA, Université de Limoges, 87060, Limoges, France
- Romain Philippe
  - INRAE, PEIRENE EA7500, USC1061 GAMAA, Université de Limoges, 87060, Limoges, France
11
Banegas-Luna AJ, Peña-García J, Iftene A, Guadagni F, Ferroni P, Scarpato N, Zanzotto FM, Bueno-Crespo A, Pérez-Sánchez H. Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey. Int J Mol Sci 2021; 22:4394. [PMID: 33922356 PMCID: PMC8122817 DOI: 10.3390/ijms22094394] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022] Open
Abstract
Artificial Intelligence is providing astonishing results, with medicine being one of its favourite playgrounds. Machine Learning and, in particular, Deep Neural Networks are behind this revolution. Among the most challenging targets of interest in medicine are cancer diagnosis and therapies but, to start this revolution, software tools need to be adapted to cover the new requirements. In this sense, learning tools are becoming a commodity but, to be able to assist doctors on a daily basis, it is essential to fully understand how models can be interpreted. In this survey, we analyse current machine learning models and other in-silico tools as applied to medicine (specifically, to cancer research) and we discuss their interpretability, performance and the input data they are fed. Artificial neural networks (ANN), logistic regression (LR) and support vector machines (SVM) have been observed to be the preferred models. In addition, convolutional neural networks (CNNs), supported by the rapid development of graphics processing units (GPUs) and high-performance computing (HPC) infrastructures, are gaining importance when image processing is feasible. However, the interpretability of machine learning predictions, so that doctors can understand them, trust them and gain useful insights for clinical practice, is still rarely considered; this needs to improve in order to enhance doctors' predictive capacity and achieve individualised therapies in the near future.
Affiliation(s)
- Antonio Jesús Banegas-Luna
  - Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Jorge Peña-García
  - Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Adrian Iftene
  - Faculty of Computer Science, Universitatea Alexandru Ioan Cuza (UAIC), 700505 Jashi, Romania
- Fiorella Guadagni
  - Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy
  - Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Patrizia Ferroni
  - Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy
  - Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Noemi Scarpato
  - Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Fabio Massimo Zanzotto
  - Dipartimento di Ingegneria dell’Impresa “Mario Lucertini”, University of Rome Tor Vergata, 00133 Rome, Italy
- Andrés Bueno-Crespo
  - Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Horacio Pérez-Sánchez
  - Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
12
Wang D, Tang H, Liu JF, Xu S, Zhang Q, Ning C. Rapid epistatic mixed-model association studies by controlling multiple polygenic effects. Bioinformatics 2021; 36:4833-4837. [PMID: 32614415 DOI: 10.1093/bioinformatics/btaa610] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/05/2020] [Revised: 05/21/2020] [Accepted: 06/24/2020] [Indexed: 12/19/2022] Open
Abstract
SUMMARY We have developed a rapid mixed model algorithm for exhaustive genome-wide epistatic association analysis by controlling multiple polygenic effects. Our model can simultaneously handle additive by additive epistasis, dominance by dominance epistasis and additive by dominance epistasis, and account for intrasubject fluctuations due to individuals with repeated records. Furthermore, we suggest a simple but efficient approximate algorithm, which allows all pairwise interactions to be examined remarkably quickly, in time linear in population size. Simulation studies are performed to investigate the properties of REMMAX. Application to publicly available yeast and human data has shown that our mixed model-based method performs similarly to a simple linear model in computational efficiency. It took less than 40 h for the pairwise analysis of 5000 individuals genotyped with roughly 350 000 SNPs with five threads on an Intel Xeon E5 2.6 GHz CPU. AVAILABILITY AND IMPLEMENTATION Source codes are freely available at https://github.com/chaoning/GMAT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dan Wang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
- Hui Tang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
- Jian-Feng Liu
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Shizhong Xu
  - Department of Botany and Plant Science, University of California, Riverside, CA 92521, USA
- Qin Zhang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
- Chao Ning
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
13
|
Pizarro M, Landi V, Navas F, León J, Martínez A, Fernández J, Delgado J. Nonparametric analysis of casein complex genes' epistasis and their effects on phenotypic expression of milk yield and composition in Murciano-Granadina goats. J Dairy Sci 2020; 103:8274-8291. [DOI: 10.3168/jds.2019-17833] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/01/2019] [Accepted: 04/07/2020] [Indexed: 01/17/2023]
|
14
|
Ni X, Zhou M, Wang H, He KY, Broeckel U, Hanis C, Kardia S, Redline S, Cooper RS, Tang H, Zhu X. Detecting fitness epistasis in recently admixed populations with genome-wide data. BMC Genomics 2020; 21:476. [PMID: 32652930 PMCID: PMC7353720 DOI: 10.1186/s12864-020-06874-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/12/2020] [Accepted: 06/30/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Fitness epistasis, the interaction effect of genes at different loci on fitness, makes an important contribution to adaptive evolution. Although fitness interaction evidence has been observed in model organisms, it is more difficult to detect and remains poorly understood in human populations as a result of limited statistical power and experimental constraints. Fitness epistasis is inferred from non-independence between unlinked loci. We previously observed ancestral block correlation between chromosomes 4 and 6 in African Americans. The same approach fails when examining ancestral blocks on the same chromosome due to the strong confounding effect observed in a recently admixed population. RESULTS We developed a novel approach to eliminate the bias caused by admixture linkage disequilibrium when searching for fitness epistasis on the same chromosome. We applied this approach in 16,252 unrelated African Americans and identified significant ancestral correlations in two pairs of genomic regions (P-value < 8.11 × 10⁻⁷) on chromosomes 1 and 10. The ancestral correlations were not explained by population admixture. Historical African-European crossover events are reduced between pairs of epistatic regions. We observed multiple pairs of co-expressed genes shared by the two regions on each chromosome, including ADAR being co-expressed with IFI44 in almost all tissues and DARC being co-expressed with VCAM1, S1PR1 and ELTD1 in multiple tissues in the Genotype-Tissue Expression (GTEx) data. Moreover, the co-expressed gene pairs are associated with the same diseases/traits in the GWAS Catalog, such as white blood cell count, blood pressure, lung function, inflammatory bowel disease and educational attainment. CONCLUSIONS Our analyses revealed two instances of fitness epistasis on chromosomes 1 and 10, and the findings suggest a potential approach to improving our understanding of adaptive evolution.
Affiliation(s)
- Xumin Ni
  - Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing, 100044, China
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
- Mengshi Zhou
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
- Heming Wang
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Karen Y He
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
- Uli Broeckel
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Craig Hanis
  - Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center at Houston, Houston, TX, USA
- Sharon Kardia
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Richard S Cooper
  - Department of Public Health Science, Loyola University Medical Center, Maywood, IL, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Xiaofeng Zhu
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
|
15
|
Yanes T, McInerney-Leo AM, Law MH, Cummings S. The emerging field of polygenic risk scores and perspective for use in clinical care. Hum Mol Genet 2020; 29:R165-R176. [DOI: 10.1093/hmg/ddaa136] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/02/2020] [Revised: 06/30/2020] [Accepted: 07/01/2020] [Indexed: 02/06/2023] Open
Abstract
Genetic testing is used widely for diagnostic, carrier and predictive testing in monogenic diseases. Until recently, there were no genetic testing options available for multifactorial complex diseases like heart disease, diabetes and cancer. Genome-wide association studies (GWAS) have been invaluable in identifying single-nucleotide polymorphisms (SNPs) associated with increased or decreased risk for hundreds of complex disorders. For a given disease, SNPs can be combined to generate a cumulative estimation of risk known as a polygenic risk score (PRS). After years of research, PRSs are increasingly used in clinical settings. In this article, we will review the literature on how both genome-wide and restricted PRSs are developed and the relative merit of each. The validation and evaluation of PRSs will also be discussed, including the recognition that PRS validity is intrinsically linked to the methodological and analytical approach of the foundation GWAS together with the ethnic characteristics of that cohort. Specifically, population differences may affect imputation accuracy, risk magnitude and direction. Even as PRSs are being introduced into clinical practice, there is a push to combine them with clinical and demographic risk factors to develop a holistic disease risk. The existing evidence regarding the clinical utility of PRSs is considered across four different domains: informing population screening programs, guiding therapeutic interventions, refining risk for families at high risk, and facilitating diagnosis and predicting prognostic outcomes. The evidence for clinical utility in relation to five well-studied disorders is summarized. The potential ethical, legal and social implications are also highlighted.
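The cumulative estimate described above is commonly computed as a weighted sum of effect-allele dosages across SNPs. The following is a minimal sketch under that assumption; the SNP identifiers, the weights (imagined here as per-allele log odds ratios from a GWAS) and the function name `polygenic_risk_score` are hypothetical and are not taken from the review.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages.

    Only SNPs present in both mappings contribute; SNPs missing from the
    individual's genotype data are skipped rather than imputed.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical per-SNP weights and one individual's effect-allele counts.
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score = polygenic_risk_score(dosages, weights)
```

A raw score of this kind is usually interpreted relative to the score distribution in a reference cohort, which is one reason the ancestry of the founding GWAS matters.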
Affiliation(s)
- Tatiane Yanes
  - Dermatology Research Centre, The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, QLD 4102, Australia
- Aideen M McInerney-Leo
  - Dermatology Research Centre, The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, QLD 4102, Australia
- Matthew H Law
  - Statistical Genetics Lab, QIMR Berghofer Medical Research Institute, Herston QLD 4006, Australia
  - Faculty of Health, School of Biomedical Sciences, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove QLD 4059, Australia
|
16
|
Rojano E, Seoane P, Ranea JAG, Perkins JR. Regulatory variants: from detection to predicting impact. Brief Bioinform 2019; 20:1639-1654. [PMID: 29893792 PMCID: PMC6917219 DOI: 10.1093/bib/bby039] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/13/2018] [Revised: 04/18/2018] [Indexed: 02/01/2023] Open
Abstract
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin-chromatin and chromatin-protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they impact the underlying mechanisms.
Affiliation(s)
- Elena Rojano
  - Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Pedro Seoane
  - Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Juan A G Ranea
  - CIBER de Enfermedades Raras, ISCIII, Madrid, Spain and Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- James R Perkins
  - Research laboratory, IBIMA-Regional University Hospital of Malaga, UMA, Malaga 29009, Spain
|
17
|
Discovering genetic interactions bridging pathways in genome-wide association studies. Nat Commun 2019; 10:4274. [PMID: 31537791 PMCID: PMC6753138 DOI: 10.1038/s41467-019-12131-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/05/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022] Open
Abstract
Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, a global genetic network mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner. Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. Applying BridGE broadly, we discover significant interactions in Parkinson's disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data.
|
18
|
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2019; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Citation(s) in RCA: 106] [Impact Index Per Article: 21.2] [Received: 02/23/2017] [Indexed: 11/14/2022] Open
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on a single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present us with new opportunities for studying complex diseases in a more comprehensive fashion, yet they simultaneously create new challenges given the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
- Shannon L Risacher
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Li Shen
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
|
19
|
|
20
|
FDHE-IW: A Fast Approach for Detecting High-Order Epistasis in Genome-Wide Case-Control Studies. Genes (Basel) 2018; 9:genes9090435. [PMID: 30158504 PMCID: PMC6162554 DOI: 10.3390/genes9090435] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/11/2018] [Revised: 08/16/2018] [Accepted: 08/16/2018] [Indexed: 12/13/2022] Open
Abstract
Detecting high-order epistasis in genome-wide association studies (GWASs) is of importance when characterizing complex human diseases. However, the enormous numbers of possible single-nucleotide polymorphism (SNP) combinations and the diversity among diseases present a significant computational challenge. Herein, a fast method for detecting high-order epistasis based on an interaction weight (FDHE-IW) is evaluated in the detection of SNP combinations associated with disease. First, the symmetrical uncertainty (SU) value for each SNP is calculated. Then, the top-k SNPs are isolated as guiders to identify 2-way SNP combinations with significant interaction weight values. Next, a forward search is employed to detect high-order SNP combinations with significant interaction weight values as candidates. Finally, the findings are statistically evaluated using a G-test to isolate true positives. The developed algorithm was used to evaluate 12 simulated datasets and an age-related macular degeneration (AMD) dataset and was shown to perform robustly in the detection of some high-order disease-causing models.
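The first step above scores each SNP by symmetrical uncertainty. A minimal sketch of that quantity follows, using the standard definition SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)) with empirical entropies; whether FDHE-IW computes it in exactly this way is an assumption, and the function names are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(values) -> float:
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


# Toy data: a SNP genotype vector and a case/control label vector.
snp = [0, 0, 1, 1]
status = [0, 0, 1, 1]
su = symmetrical_uncertainty(snp, status)
```

Ranking all SNPs by this value against disease status and keeping the k largest would give the "top-k" guiders mentioned in the abstract; the choice of k is not specified here.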
|
21
|
Gosik K, Sun L, Chinchilli VM, Wu R. An Ultrahigh-Dimensional Mapping Model of High-order Epistatic Networks for Complex Traits. Curr Genomics 2018; 19:384-394. [PMID: 30065614 PMCID: PMC6030858 DOI: 10.2174/1389202919666171218162210] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/02/2017] [Revised: 03/28/2017] [Accepted: 05/04/2017] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Genetic interactions involving more than two loci have been thought to affect quantitatively inherited traits and diseases more pervasively than previously appreciated. However, the detection of such high-order interactions to chart a complete portrait of genetic architecture has not been well explored. METHODS We present an ultrahigh-dimensional model to systematically characterize genetic main effects and interaction effects of various orders among all possible markers in a genetic mapping or association study. The model was built by extending a variable selection procedure, called iFORM, derived from forward selection. The model can estimate the magnitudes and signs of high-order epistatic effects, in addition to those of main effects and pairwise epistatic effects. RESULTS The statistical properties of the model were tested and validated through simulation studies. By analyzing real data on shoot growth in a mapping population of a woody plant, mei (Prunus mume), we demonstrated the utility of the model in practical genetic studies. The model identified important high-order interactions that contribute to shoot growth in mei. CONCLUSION The model provides a tool to precisely construct genotype-phenotype maps for quantitative traits by identifying high-order epistasis, which is often ignored in the current genetic literature.
Affiliation(s)
- Kirk Gosik
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
- Lidan Sun
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
- Vernon M. Chinchilli
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
- Rongling Wu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
|
22
|
TrioMDR: Detecting SNP interactions in trio families with model-based multifactor dimensionality reduction. Genomics 2018; 111:1176-1182. [PMID: 30055230 DOI: 10.1016/j.ygeno.2018.07.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/26/2018] [Revised: 07/11/2018] [Accepted: 07/15/2018] [Indexed: 12/18/2022]
Abstract
Single nucleotide polymorphism (SNP) interactions can explain the missing heritability of common complex diseases. Many interaction detection methods have been proposed in genome-wide association studies, and they can be divided into two types: population-based and family-based. Compared with population-based methods, family-based methods are robust to population stratification. Several family-based methods have been proposed, among which Multifactor Dimensionality Reduction (MDR)-based methods are popular and powerful. However, current MDR-based methods suffer from a heavy computational burden. Furthermore, they do not allow for main effect adjustment. In this work we develop a two-stage model-based MDR approach (TrioMDR) to detect multi-locus interactions in trio families (i.e., two parents and one affected child). TrioMDR combines the MDR framework with logistic regression models to check interactions, so TrioMDR can adjust for main effects. In addition, unlike the time-consuming permutation procedures used in traditional MDR-based methods, TrioMDR uses a simple semi-parametric P-value correction procedure to control the type I error rate; this procedure needs only a few permutations to assess the significance of a multi-locus model and significantly speeds up TrioMDR. We performed extensive experiments on simulated data to compare the type I error and power of TrioMDR under different scenarios. The results demonstrate that TrioMDR is fast and in general more powerful than some recently proposed methods for interaction detection in trios. The R code of TrioMDR is available at: https://github.com/TrioMDR/TrioMDR.
|
23
|
Ning C, Wang D, Kang H, Mrode R, Zhou L, Xu S, Liu JF. A rapid epistatic mixed-model association analysis by linear retransformations of genomic estimated values. Bioinformatics 2018; 34:1817-1825. [PMID: 29342229 PMCID: PMC5972602 DOI: 10.1093/bioinformatics/bty017] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/11/2017] [Revised: 01/07/2018] [Accepted: 01/10/2018] [Indexed: 12/16/2022] Open
Abstract
Motivation Epistasis provides a feasible way of probing the potential genetic mechanisms of complex traits. However, time-consuming computation hinders the successful detection of interactions in practice, especially when a linear mixed model (LMM) is used to control type I error in the presence of population structure and cryptic relatedness. Results A rapid epistatic mixed-model association analysis (REMMA) method was developed to overcome this computational limitation. The method first estimates individuals' epistatic effects by an extended genomic best linear unbiased prediction (EG-BLUP) model with additive and epistatic kinship matrices; pairwise interaction effects are then obtained by linear retransformations of individuals' epistatic effects. Simulation studies showed that REMMA could control type I error and increase statistical power in detecting epistatic QTNs in comparison with the existing LMM-based FaST-LMM. We applied REMMA to two real datasets, a mouse dataset and the Wellcome Trust Case Control Consortium (WTCCC) data. Application to the mouse data further confirmed the performance of REMMA in controlling type I error. For the WTCCC data, we found that most epistatic QTNs for type 1 diabetes (T1D) were located in a major histocompatibility complex (MHC) region, from which a large interacting network with 12 hub genes (interacting with ten or more genes) was established. Availability and implementation Our REMMA method can be freely accessed at https://github.com/chaoning/REMMA. Contact liujf@cau.edu.cn. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chao Ning
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Dan Wang
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Huimin Kang
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Raphael Mrode
  - Animal Biosciences, International Livestock Institute, Nairobi, Kenya
- Lei Zhou
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shizhong Xu
  - Department of Botany and Plant Science, University of California, Riverside, CA, USA
- Jian-Feng Liu
  - National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
24
|
Niche harmony search algorithm for detecting complex disease associated high-order SNP combinations. Sci Rep 2017; 7:11529. [PMID: 28912584 PMCID: PMC5599559 DOI: 10.1038/s41598-017-11064-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/04/2016] [Accepted: 08/17/2017] [Indexed: 02/01/2023] Open
Abstract
Detecting high-order disease-causing models in genome-wide association studies is especially challenging because of model diversity, the possibly low or even absent marginal effects of the model, and the extraordinary search and computation required. In this paper, we propose a niche harmony search algorithm in which joint entropy is used as a heuristic factor to guide the search for models with low or no marginal effect, and two computationally lightweight scores are selected to evaluate and adapt to a diversity of disease models. In order to obtain all possible suspected pathogenic models, the niche technique is merged with harmony search (HS), serving as a taboo region that keeps HS from being trapped in local search. From the resultant set of candidate SNP combinations, we use the G-test statistic to test for true positives. Experiments were performed on twenty typical simulation datasets, of which 12 models have marginal effects and eight have none. Our results indicate that the proposed algorithm has very high detection power for searching suspected disease models in the first stage and is superior to some typical existing approaches in both detection power and CPU runtime on all these datasets. Application to age-related macular degeneration (AMD) demonstrates that our method is promising in detecting high-order disease-causing models.
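The final filtering step above relies on a G-test. As a minimal sketch of the textbook statistic, G = 2 Σ O·ln(O/E) over the contingency table of genotype combination by case/control status, the code below computes G for a candidate SNP combination. The function name, the string encoding of genotype combinations and the toy data are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter
from math import log


def g_statistic(genotype_combos, labels) -> float:
    """G = 2 * sum(O * ln(O / E)) over observed (combination, label) cells.

    Expected counts E come from the row and column totals under
    independence. Cells with zero observations contribute nothing
    and are skipped implicitly.
    """
    if len(genotype_combos) != len(labels):
        raise ValueError("inputs must have the same length")
    n = len(labels)
    joint = Counter(zip(genotype_combos, labels))
    row_totals = Counter(genotype_combos)
    col_totals = Counter(labels)
    g = 0.0
    for (combo, label), observed in joint.items():
        expected = row_totals[combo] * col_totals[label] / n
        g += observed * log(observed / expected)
    return 2.0 * g


# Toy data: one genotype string per individual for a 2-SNP combination,
# paired with case (1) / control (0) status.
combos = ["AA/BB", "AA/BB", "Aa/Bb", "Aa/Bb"]
status = [1, 1, 0, 0]
g = g_statistic(combos, status)
```

To obtain a P-value, G would be compared against a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom; that step is left out to keep the sketch dependency-free.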
|
25
|
Moyer E, Hagenauer M, Lesko M, Francis F, Rodriguez O, Nagarajan V, Huser V, Busby B. MetaNetVar: Pipeline for applying network analysis tools for genomic variants analysis. F1000Res 2016; 5:674. [PMID: 27158457 PMCID: PMC4857755 DOI: 10.12688/f1000research.8288.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 04/12/2016] [Indexed: 01/02/2023] Open
Abstract
Network analysis can make variant analysis better. There are existing tools like HotNet2 and dmGWAS that can provide various analytical methods. We developed a prototype of a pipeline called MetaNetVar that allows execution of multiple tools. The code is published at https://github.com/NCBI-Hackathons/Network_SNPs. A working prototype is published as an Amazon Machine Image: ami-4510312f.
Affiliation(s)
- Eric Moyer
  - National Center for Biotechnology Information, Bethesda, USA
- Megan Hagenauer
  - Molecular, Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, USA
- Matthew Lesko
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, USA
- Felix Francis
  - Bioinformatics and Systems Biology program, University of Delaware, Newark, USA
- Oscar Rodriguez
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Vijayaraj Nagarajan
  - Bioinformatics and Computational Biosciences Branch, National Institute of Allergy and Infectious Diseases, National Institute of Mental Health, Bethesda, USA
- Vojtech Huser
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institute of Mental Health, Bethesda, USA
- Ben Busby
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, USA
|