1
Žukauskaitė G, Domarkienė I, Rančelis T, Kavaliauskienė I, Baronas K, Kučinskas V, Ambrozaitytė L. Putative protective genomic variation in the Lithuanian population. Genet Mol Biol 2024; 47:e20230030. [PMID: 38626572 PMCID: PMC11021042 DOI: 10.1590/1678-4685-gmb-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/01/2024] [Indexed: 04/18/2024] Open
Abstract
Genomic effect variants associated with survival and protection against complex diseases vary between populations due to microevolutionary processes. The aim of this study was to analyse diversity and distribution of effect variants in a context of potential positive selection. In total, 475 individuals of Lithuanian origin were genotyped using high-throughput scanning and/or sequencing technologies. Allele frequency analysis for the pre-selected effect variants was performed using the catalogue of single nucleotide polymorphisms. Comparison of the pre-selected effect variants with variants in primate species was carried out to ascertain which allele was derived and potentially of protective nature. Recent positive selection analysis was performed to verify this protective effect. Four variants having significantly different frequencies compared to European populations were identified while two other variants reached borderline significance. Effect variant in SLC30A8 gene may potentially protect against type 2 diabetes. The existing paradox of high rates of type 2 diabetes in the Lithuanian population and the relatively high frequencies of potentially protective genome variants against it indicate a lack of knowledge about the interactions between environmental factors, regulatory regions, and other genome variation. Identification of effect variants is a step towards better understanding of the microevolutionary processes, etiopathogenetic mechanisms, and personalised medicine.
Affiliation(s)
- Gabrielė Žukauskaitė
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
- Ingrida Domarkienė
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
- Tautvydas Rančelis
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
- Ingrida Kavaliauskienė
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
- Karolis Baronas
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
- Vaidutis Kučinskas
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
- Laima Ambrozaitytė
- Vilnius University, Faculty of Medicine, Institute of Biomedical Sciences, Department of Human and Medical Genetics, Vilnius, Lithuania
2
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Affiliation(s)
- Leann Lac
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
- Carson K Leung
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Pingzhao Hu
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.
3
Lin WY. Searching for gene-gene interactions through variance quantitative trait loci of 29 continuous Taiwan Biobank phenotypes. Front Genet 2024; 15:1357238. [PMID: 38516378 PMCID: PMC10956579 DOI: 10.3389/fgene.2024.1357238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Accepted: 02/27/2024] [Indexed: 03/23/2024] Open
Abstract
Introduction: After the era of genome-wide association studies (GWAS), thousands of genetic variants have been identified to exhibit main effects on human phenotypes. The next critical issue would be to explore the interplay between genes, the so-called "gene-gene interactions" (GxG) or epistasis. An exhaustive search for all single-nucleotide polymorphism (SNP) pairs is not recommended because this will induce a harsh penalty of multiple testing. Limiting the search of epistasis on SNPs reported by previous GWAS may miss essential interactions between SNPs without significant marginal effects. Moreover, most methods are computationally intensive and can be challenging to implement genome-wide. Methods: I here searched for GxG through variance quantitative trait loci (vQTLs) of 29 continuous Taiwan Biobank (TWB) phenotypes. A discovery cohort of 86,536 and a replication cohort of 25,460 TWB individuals were analyzed, respectively. Results: A total of 18 nearly independent vQTLs with linkage disequilibrium measure r² < 0.01 were identified and replicated from nine phenotypes. Fifteen significant GxG were found with p-values <1.1E-5 (in the discovery cohort) and false discovery rates <2% (in the replication cohort). Among these 15 GxG, 11 were detected for blood traits including red blood cells, hemoglobin, and hematocrit; 2 for total bilirubin; 1 for fasting glucose; and 1 for total cholesterol (TCHO). All GxG were observed for gene pairs on the same chromosome, except for the APOA5 (chromosome 11)-TOMM40 (chromosome 19) interaction for TCHO. Discussion: This study provided a computationally feasible way to search for GxG genome-wide and applied this approach to 29 phenotypes.
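Editorial illustration of the vQTL idea in the abstract above: a vQTL is a variant whose genotype groups differ in phenotype variance rather than in mean, which can hint at an underlying interaction. The abstract does not say which variance test was used, so the sketch below assumes a Brown-Forsythe (median-centred Levene) statistic. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean, median


def group_by_genotype(genotypes, phenotypes):
    """Split phenotype values by genotype code (e.g. 0, 1, 2 minor alleles)."""
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    return [groups[g] for g in sorted(groups)]


def brown_forsythe_statistic(groups):
    """Brown-Forsythe W: one-way ANOVA F on absolute deviations from group medians."""
    deviations = []
    for values in groups:
        centre = median(values)
        deviations.append([abs(y - centre) for y in values])
    n_total = sum(len(z) for z in deviations)
    k = len(deviations)
    grand_mean = sum(sum(z) for z in deviations) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (mean(z) - grand_mean) ** 2 for z in deviations)
    within = sum((v - mean(z)) ** 2 for z in deviations for v in z)
    # Assumption: `within` is non-zero; degenerate inputs are not handled here.
    return (n_total - k) / (k - 1) * between / within


# Hypothetical toy data: carriers of genotype 1 show a wider phenotype spread.
genotypes = [0, 0, 0, 1, 1, 1]
phenotypes = [1.0, 2.0, 3.0, 0.0, 10.0, 20.0]
w = brown_forsythe_statistic(group_by_genotype(genotypes, phenotypes))
print(round(w, 2))
```

Under the null hypothesis of equal variances, W is conventionally compared against an F distribution with (k - 1, N - k) degrees of freedom. A genome-wide scan would repeat this per SNP and apply a multiple-testing threshold, which this sketch omits.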
Affiliation(s)
- Wan-Yu Lin
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan
- Master of Public Health Degree Program, College of Public Health, National Taiwan University, Taipei, Taiwan
4
Tong L, Shi W, Isgut M, Zhong Y, Lais P, Gloster L, Sun J, Swain A, Giuste F, Wang MD. Integrating Multi-Omics Data With EHR for Precision Medicine Using Advanced Artificial Intelligence. IEEE Rev Biomed Eng 2024; 17:80-97. [PMID: 37824325 DOI: 10.1109/rbme.2023.3324264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
With the recent advancement of novel biomedical technologies such as high-throughput sequencing and wearable devices, multi-modal biomedical data ranging from multi-omics molecular data to real-time continuous bio-signals are generated at an unprecedented speed and scale every day. For the first time, these multi-modal biomedical data are able to make precision medicine close to a reality. However, due to the data volume and complexity, making good use of these multi-modal biomedical data requires major effort. Researchers and clinicians are actively developing artificial intelligence (AI) approaches for data-driven knowledge discovery and causal inference using a variety of biomedical data modalities. These AI-based approaches have demonstrated promising results in various biomedical and healthcare applications. In this review paper, we summarize the state-of-the-art AI models for integrating multi-omics data and electronic health records (EHRs) for precision medicine. We discuss the challenges and opportunities in integrating multi-omics data with EHRs and future directions. We hope this review can inspire future research and development in integrating multi-omics data with EHRs for precision medicine.
5
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Affiliation(s)
- Alexander H Williams
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- Chang-Guo Zhan
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
6
Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023; 18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open
Abstract
Graph analytical approaches permit identifying novel genes involved in complex diseases, but are limited by (i) inferring structural network similarity of connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, missing gene-disease associations' complexity; (iii) relying on disease/gene-phenotype associations' similarities, involving highly incomplete data; (iv) using binary classification, with gene-disease edges as positive training samples, and non-associated gene and disease nodes as negative samples that may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. Addressing these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model that includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional representation of nodes accounting for network structure, and extending beyond network structure using the developed Gene-Disease Prioritization Score (GDPS) reflecting the degree of gene-disease association via gene co-expression data. For negative training samples, we select non-associated gene and disease nodes with lower GDPS that are less likely to be affiliated. We evaluate the developed model's success in predicting novel disease genes by analyzing the prediction probabilities of gene-disease associations. HetIG-PreDiG successfully predicts (Micro-F1 = 0.95) gene-disease associations, outperforming baseline models, and is validated using published literature, thus advancing our understanding of complex genetic diseases.
Affiliation(s)
- Kathleen M. Jagodnik
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
- Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America
- Yael Shvili
- Department of Surgery A, Meir Medical Center, Kfar Sava, Israel
- Alon Bartal
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
7
A batch process for high dimensional imputation. Comput Stat 2023. [DOI: 10.1007/s00180-023-01325-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
8
Penetrating Exploration of Prognostic Correlations of the FKBP Gene Family with Lung Adenocarcinoma. J Pers Med 2022; 13:jpm13010049. [PMID: 36675710 PMCID: PMC9862762 DOI: 10.3390/jpm13010049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 12/28/2022] Open
Abstract
The complexity of lung adenocarcinoma (LUAD), the development of which involves many interacting biological processes, makes it difficult to find therapeutic biomarkers for treatment. FK506-binding proteins (FKBPs) are composed of 12 members classified as conservative intracellular immunophilin family proteins, which are often connected to cyclophilin structures by tetratricopeptide repeat domains and have peptidyl prolyl isomerase activity that catalyzes proline from residues and turns the trans form into the cis form. Since FKBPs belong to chaperone molecules and promote protein folding, previous studies demonstrated that FKBP family members significantly contribute to the degradation of damaged, misfolded, abnormal, and foreign proteins. However, transcript expressions of this gene family in LUAD still need to be more fully investigated. In this research, we adopted high-throughput bioinformatics technology to analyze FKBP family genes in LUAD to provide credible information to clinicians and promote the development of novel cancer target drugs in the future. The current data revealed that the messenger (m)RNA levels of FKBP2, FKBP3, FKBP4, FKBP10, FKBP11, and FKBP14 were overexpressed in LUAD, and FKBP10 had connections to poor prognoses among LUAD patients in an overall survival (OS) analysis. Based on the above results, we selected FKBP10 to further conduct a comprehensive analysis of the downstream pathway and network. Through a DAVID analysis, we found that FKBP10 was involved in mitochondrial electron transport, NADH to ubiquinone transport, mitochondrial respiratory chain complex I assembly, etc. The MetaCore pathway analysis also indicated that FKBP10 was involved in "Ubiquinone metabolism", "Translation_(L)-selenoaminoacid incorporation in proteins during translation", and "Transcription_Negative regulation of HIF1A function". 
Collectively, this study revealed that FKBP family members are both significant prognostic biomarkers for lung cancer progression and promising clinical therapeutic targets, thus providing new targets for treating LUAD patients.
9
Zhang C, Qin Q, Li Y, Zheng X, Chen W, Zhen Q, Li B, Wang W, Sun L. Multifactor dimensionality reduction reveals the effect of interaction between ERAP1 and IFIH1 polymorphisms in psoriasis susceptibility genes. Front Genet 2022; 13:1009589. [PMID: 36425068 PMCID: PMC9679141 DOI: 10.3389/fgene.2022.1009589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2022] [Accepted: 10/18/2022] [Indexed: 09/18/2023] Open
Abstract
Background: Psoriasis is a common immune-mediated hyperproliferative skin dysfunction with known genetic predisposition. Gene-gene interaction (e.g., between HLA-C and ERAP1) in the psoriasis context has been reported in various populations. As ERAP1 has been recognized as a psoriasis susceptibility gene and plays a critical role in antigen presentation, we performed this study to identify interactions between ERAP1 and other psoriasis susceptibility gene variants. Methods: We validated psoriasis susceptibility gene variants in an independent cohort of 5,414 patients with psoriasis and 5,556 controls. Multifactor dimensionality reduction (MDR) analysis was performed to identify the interaction between variants significantly associated with psoriasis in the validation cohort and ERAP1 variants. We then conducted a meta-analysis of those variants with datasets from exome sequencing, target sequencing, and validation analyses and used MDR to identify the best gene-gene interaction model, including variants that were significant in the meta-analysis and ERAP1 variants. Results: We found that 19 of the replicated variants were identified with p < 0.05 and detected six single-nucleotide polymorphisms of psoriasis susceptibility genes in the meta-analysis. MDR analysis revealed that the best predictive model was that between the rs27044 polymorphism of ERAP1 and the rs7590692 polymorphism of IFIH1 (cross-validation consistency = 9/10, test accuracy = 0.53, odds ratio = 1.32 (95% CI, 1.09-1.59), p < 0.01). Conclusion: Our findings suggest that the interaction between ERAP1 and IFIH1 affects the development of psoriasis. This hypothesis needs to be tested in basic biological studies.
Affiliation(s)
- Chang Zhang
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Qin Qin
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Yuanyuan Li
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Xiaodong Zheng
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Weiwei Chen
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Qi Zhen
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Bao Li
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Wenjun Wang
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
- Liangdan Sun
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Institute of Dermatology, Anhui Medical University, Hefei, China
- Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
- Anhui Provincial Institute of Translational Medicine, Hefei, China
10
Saha S, Perrin L, Röder L, Brun C, Spinelli L. Epi-MEIF: detecting higher order epistatic interactions for complex traits using mixed effect conditional inference forests. Nucleic Acids Res 2022; 50:e114. [PMID: 36107776 PMCID: PMC9639209 DOI: 10.1093/nar/gkac715] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/06/2022] [Revised: 07/29/2022] [Accepted: 09/12/2022] [Indexed: 12/04/2022] Open
Abstract
Understanding the relationship between genetic variations and variations in complex and quantitative phenotypes remains an ongoing challenge. While Genome-wide association studies (GWAS) have become a vital tool for identifying single-locus associations, we lack methods for identifying epistatic interactions. In this article, we propose a novel method for higher-order epistasis detection using mixed effect conditional inference forest (epiMEIF). The proposed method is fitted on a group of single nucleotide polymorphisms (SNPs) potentially associated with the phenotype and the tree structure in the forest facilitates the identification of n-way interactions between the SNPs. Additional testing strategies further improve the robustness of the method. We demonstrate its ability to detect true n-way interactions via extensive simulations in both cross-sectional and longitudinal synthetic datasets. This is further illustrated in an application to reveal epistatic interactions from natural variations of cardiac traits in flies (Drosophila). Overall, the method provides a generalized way to identify higher-order interactions from any GWAS data, thereby greatly improving the detection of the genetic architecture underlying complex phenotypes.
Affiliation(s)
- Saswati Saha
- Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
- Laurent Perrin
- Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
- CNRS, Marseille, France
- Laurence Röder
- Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
- Christine Brun
- Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
- CNRS, Marseille, France
- Lionel Spinelli
- Aix Marseille Univ, INSERM, TAGC (UMR1090), Turing Centre for Living systems, Marseille, France
11
Yu L, Liu W, Wang X, Ye Z, Tan Q, Qiu W, Nie X, Li M, Wang B, Chen W. A review of practical statistical methods used in epidemiological studies to estimate the health effects of multi-pollutant mixture. Environ Pollut 2022; 306:119356. [PMID: 35487468 DOI: 10.1016/j.envpol.2022.119356] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 01/07/2022] [Revised: 04/11/2022] [Accepted: 04/21/2022] [Indexed: 05/27/2023]
Abstract
Environmental risk factors have been implicated in adverse health effects. Previous epidemiological studies on environmental risk factors mainly analyzed the impact of single pollutant exposure on health, while in fact, humans are constantly exposed to a complex mixture consisting of multiple pollutants/chemicals. In recent years, environmental epidemiologists have sought to assess adverse health effects of exposure to multi-pollutant mixtures based on the diversity of real-world environmental pollutants. However, the statistical challenges are considerable; for instance, multicollinearity and interaction among components of the mixture complicate the statistical analysis. There is currently no consensus on appropriate statistical methods. Here we summarized the practical statistical methods used in environmental epidemiology to estimate health effects of exposure to multi-pollutant mixture, such as Bayesian kernel machine regression (BKMR), weighted quantile sum (WQS) regressions, shrinkage methods (least absolute shrinkage and selection operator, elastic network model, adaptive elastic-net model, and principal component analysis), environment-wide association study (EWAS), etc. We sought to review these statistical methods and determine the application conditions, strengths, weaknesses, and result interpretability of each method, providing crucial insight and assistance for addressing epidemiological statistical issues regarding health effects from multi-pollutant mixture.
Affiliation(s)
- Linling Yu
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Wei Liu
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Xing Wang
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Zi Ye
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Qiyou Tan
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Weihong Qiu
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
| | - Xiuquan Nie
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
| | - Minjing Li
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
| | - Bin Wang
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
| | - Weihong Chen
- Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, and State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China.
|
12
|
Chu X, Jiang M, Liu ZJ. Biomarker interaction selection and disease detection based on multivariate gain ratio. BMC Bioinformatics 2022; 23:176. [PMID: 35550010 PMCID: PMC9103137 DOI: 10.1186/s12859-022-04699-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 04/14/2022] [Indexed: 11/30/2022] Open
Abstract
Background Disease detection is an important aspect of biotherapy. With the development of biotechnology and computer technology, there are many methods to detect disease based on a single biomarker. However, in some cases a biomarker does not influence disease alone; it is the interaction between biomarkers that determines disease status. The existing influence measure I-score is used to evaluate the importance of an interaction in determining disease status, but I-score shows a deviation related to the number of variables in the interaction. To solve this problem, we propose a new influence measure, Multivariate Gain Ratio (MGR), based on the single-variable Gain Ratio (GR), which extends it to multivariate combinations called interactions. Results We propose a preprocessing verification algorithm based on partial predictor variables to select an appropriate preprocessing method. In this paper, an algorithm for selecting key interactions of biomarkers and applying key interactions to construct a disease detection model is provided. MGR is more credible than I-score when the interaction contains a small number of variables. Our method performs better, with an average accuracy of 93.13% compared with 91.73% for I-score on the Breast Cancer Wisconsin (Diagnostic) Dataset. Compared to the classification result of 89.80% based on all predictor variables, MGR identifies the true main biomarkers and achieves dimension reduction. On the Leukemia Dataset, the experimental results show the effectiveness of MGR, with an accuracy of 97.32% compared to 89.11% for I-score. The results can be explained by the nature of MGR and I-score mentioned above, because every key interaction contains a small number of variables in the Leukemia Dataset. Conclusions MGR is effective for selecting important biomarkers and biomarker interactions even in a high-dimensional feature space in which an interaction could contain more than two biomarkers. The prediction ability of interactions selected by MGR is better than that of I-score when the interaction contains a small number of variables. MGR is generally applicable to various types of biomarker datasets, including cell nuclei, gene, SNP and protein datasets.
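The abstract above does not give the formula for MGR, so the following is only a minimal sketch of the standard single-variable gain ratio it builds on, plus one plausible way to score a combination of variables by treating the joint value as a single categorical variable. The function names `entropy`, `gain_ratio` and `joint_gain_ratio` are hypothetical and are not taken from the paper; the joint scoring is an illustrative assumption, not the authors' MGR definition.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Standard gain ratio: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy of the labels given the feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def joint_gain_ratio(features, labels):
    """Assumption: score an interaction by the gain ratio of the joint value."""
    joint = list(zip(*features))
    return gain_ratio(joint, labels)


# Toy XOR example: neither variable is informative alone, the pair is.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
y = [0, 1, 1, 0]
single_score = gain_ratio(a, y)
pair_score = joint_gain_ratio([a, b], y)
```

In this toy case the single-variable score is zero while the pair receives a positive score, which is the kind of pattern an interaction measure is meant to surface.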
Affiliation(s)
- Xiao Chu
- Academy of Mathematics and Systems Science Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China.
| | - Mao Jiang
- Academy of Mathematics and Systems Science Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Zhuo-Jun Liu
- Academy of Mathematics and Systems Science Chinese Academy of Sciences, Beijing, China
|
13
|
Lu TP, Kamatani Y, Belbin G, Park T, Hsiao CK. Editorial: Current Status and Future Challenges of Biobank Data Analysis. Front Genet 2022; 13:882611. [PMID: 35495141 PMCID: PMC9047950 DOI: 10.3389/fgene.2022.882611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 03/24/2022] [Indexed: 11/23/2022] Open
Affiliation(s)
- Tzu-Pin Lu
- Department of Public Health, College of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
| | - Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Gillian Belbin
- Institute of Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul, South Korea
| | - Chuhsing Kate Hsiao
- Department of Public Health, College of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- *Correspondence: Chuhsing Kate Hsiao,
|
14
|
Lim AJW, Lim LJ, Ooi BNS, Koh ET, Tan JWL, Chong SS, Khor CC, Tucker-Kellogg L, Leong KP, Lee CG. Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients. EBioMedicine 2022; 75:103800. [PMID: 35022146 PMCID: PMC8808170 DOI: 10.1016/j.ebiom.2021.103800] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/26/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Major challenges in large scale genetic association studies include not only the identification of causative single nucleotide polymorphisms (SNPs), but also accounting for SNP-SNP interactions. This study thus proposes a novel feature engineering approach integrating potentially functional coding haplotypes (pfcHap) with machine-learning (ML) feature selection to identify biologically meaningful, possibly causative genetic factors that take into consideration potential SNP-SNP interactions within the pfcHap, to best predict methotrexate (MTX) response in rheumatoid arthritis (RA) patients. METHODS Exome sequencing data from 349 RA patients were analysed and split into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features to undergo ML recursive feature elimination with cross-validation using the training set. Predictive capacity and robustness of the selected features were assessed using six popular machine learning models through training-set cross-validation and evaluated in an unseen test set. FINDINGS Significantly, 100 features (95 pfcHaps, 5 non-genetic factors) were identified to have good predictive performance (AUC: 0.776-0.828; Sensitivity: 0.656-0.813; Specificity: 0.684-0.868) across all six ML models in an unseen test dataset for the prediction of MTX response in RA patients. INTERPRETATION The majority of the predictive pfcHap SNPs were predicted to be potentially functional, and some of the genes in which the pfcHaps reside were identified to be associated with previously reported MTX/RA pathways. FUNDING Singapore Ministry of Health's National Medical Research Council (NMRC) [NMRC/CBRG/0095/2015; CG12Aug17; CGAug16M012; NMRC/CG/017/2013]; National Cancer Center Research Fund and block funding Duke-NUS Medical School; Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2019-T2-1-138.
Affiliation(s)
- Ashley J W Lim
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Lee Jin Lim
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Brandon N S Ooi
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Ee Tzun Koh
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
| | - Justina Wei Lynn Tan
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
| | - Samuel S Chong
- Dept of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Chiea Chuen Khor
- Division of Human Genetics, Genome Institute of Singapore, Singapore
| | - Lisa Tucker-Kellogg
- Centre for Computational Biology, and Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
| | - Khai Pang Leong
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore; Clinical Research & Innovation Office, Tan Tock Seng Hospital, Singapore.
| | - Caroline G Lee
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Div of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore; NUS Graduate School, National University of Singapore, Singapore.
|
15
|
OUP accepted manuscript. Rheumatology (Oxford) 2022; 61:4175-4186. [DOI: 10.1093/rheumatology/keac032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/26/2021] [Revised: 01/11/2022] [Indexed: 11/12/2022] Open
|
16
|
Zhu H, Wang J, Gao T, Tian M, Xia L, Cai Q, Zhang C, Xu Y, Zheng X. Contribution of revision amputation vs replantation for certain digits to functional outcomes after traumatic digit amputations: A comparative study based on multicenter prospective cohort. Int J Surg 2021; 96:106164. [PMID: 34774728 DOI: 10.1016/j.ijsu.2021.106164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/10/2021] [Revised: 10/14/2021] [Accepted: 11/03/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Traumatic digit amputations can result in significant impairment. Optimal surgical treatment is unclear for certain digits in various amputation patterns. Our aim was to compare the contribution of revision amputation vs replantation for each particular digit to functional outcomes. MATERIALS AND METHODS A prospective cohort study was conducted at three tertiary hospitals in China. Eligible participants were 3192 patients with traumatic digit amputations enrolled from January 1, 2014, to January 1, 2018. The primary outcome was the Michigan Hand Outcomes Questionnaire (MHQ) score 2 years after initial surgery. The secondary outcome was the score on the Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire. RESULTS Of 3192 enrolled patients, 2890 completed the study. Main-effect linear regression showed that participants with replantation of thumb, index, long, and ring (proximal to the proximal interphalangeal [PIP] joint) fingers had significantly better MHQ scores compared to participants with the corresponding finger revision amputation. DASH results were comparable. Finger-finger interaction analyses conducted with multifactor dimensionality reduction (MDR) revealed that the small finger and ring finger had the smallest and greatest interactions with other fingers, respectively. After stratification by amputation level of thumb, index finger, or long finger, linear regression showed that replantation of the ring finger distal to the PIP joint resulted in better MHQ and DASH scores when the thumb or long finger was also traumatically amputated proximal to the IP/PIP joint. CONCLUSIONS Replantation of the thumb, index, long, and ring (proximal to PIP joint) fingers is preferable to revision amputation, regardless of amputation pattern. Replantation of the ring finger amputated distal to the PIP joint was beneficial only when the thumb or long finger was amputated proximal to the IP/PIP joint. Replantation or revision amputation of the small finger was indistinguishable in terms of functional outcome. Future investigations and clinical decisions should take into account the role of finger-finger interactions.
Affiliation(s)
- Hongyi Zhu
- Department of Orthopaedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, No. 600 Yishan Road, Xuhui District, Shanghai, China; Department of Orthopaedic Surgery, 80 PLA Hospital, No. 256, Beigong West Street, Weifang City, Shandong, China; Department of Hand Surgery, Xi'an Honghui Hospital, No. 76, Nanguo Road, Nanshaomen, Xi'an, Shaanxi, China
|
17
|
Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values. Genes (Basel) 2021; 12:genes12111754. [PMID: 34828360 PMCID: PMC8626003 DOI: 10.3390/genes12111754] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/27/2021] [Revised: 10/23/2021] [Accepted: 10/23/2021] [Indexed: 11/17/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a prototypical neurodegenerative disease characterized by progressive degeneration of motor neurons that severely affects the ability to control voluntary muscle movement. Most of the genetic aberrations responsible for ALS are non-additive, which makes its molecular classification very challenging, along with limited sample size, the curse of dimensionality, class imbalance and noise in the data. Deep learning methods have been successful in many other related areas but have low minority class accuracy and suffer from a lack of explainability when used directly with RNA expression features for ALS molecular classification. In this paper, we propose a deep-learning-based molecular ALS classification and interpretation framework. Our framework is based on training a convolutional neural network (CNN) on images obtained by converting RNA expression values into pixels using the DeepInsight similarity technique. Then, we employed Shapley additive explanations (SHAP) to extract pixels with higher relevance to ALS classifications. These pixels were mapped back to the genes which made them up. This enabled us to classify ALS samples with high accuracy for a minority class, along with identifying genes that might be playing an important role in ALS molecular classifications. Taken together with RNA expression images classified with CNN, our preliminary analysis of the genes identified by SHAP interpretation demonstrates the value of utilizing machine learning to perform molecular classification of ALS and uncover disease-associated genes.
|
18
|
Blumenthal DB, Baumbach J, Hoffmann M, Kacprowski T, List M. A framework for modeling epistatic interaction. Bioinformatics 2021; 37:1708-1716. [PMID: 33252645 DOI: 10.1093/bioinformatics/btaa990] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/07/2020] [Revised: 10/21/2020] [Accepted: 11/16/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models, such as the χ2-test or quadratic regression, independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, because it allows one to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. RESULTS We present a protocol for evaluating epistasis models independently of the tools they are used in and generalize existing models designed for dichotomous phenotypes to the categorical and quantitative case. In addition, we propose a new model which scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes. AVAILABILITY AND IMPLEMENTATION The evaluation protocol and all compared models are implemented in C++ and are supported under Linux and macOS. They are available at https://github.com/baumbachlab/genepiseeker/, along with test datasets and scripts to reproduce the experiments. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
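As an illustration of the χ2-test epistasis model mentioned in this abstract, the sketch below computes a Pearson chi-squared statistic on the table of joint genotype combinations versus a dichotomous phenotype for one SNP pair. It is a minimal pure-Python example written for this listing; the function name `chi2_statistic` and the toy data are hypothetical, and this is not the implementation from the linked repository.

```python
from collections import Counter


def chi2_statistic(genotypes_a, genotypes_b, phenotypes):
    """Pearson chi-squared statistic for (genotype pair) x phenotype counts.

    Returns the statistic and its degrees of freedom. Rows are the observed
    joint genotypes of the two SNPs; columns are the observed phenotypes.
    """
    cells = Counter(zip(zip(genotypes_a, genotypes_b), phenotypes))
    row_totals = Counter()
    col_totals = Counter()
    for (combo, pheno), count in cells.items():
        row_totals[combo] += count
        col_totals[pheno] += count
    n = sum(row_totals.values())

    stat = 0.0
    for combo, r in row_totals.items():
        for pheno, c in col_totals.items():
            expected = r * c / n  # count expected under independence
            observed = cells.get((combo, pheno), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Toy data (assumed): phenotype is the XOR of two binary genotypes.
snp_a = [0, 0, 1, 1] * 5
snp_b = [0, 1, 0, 1] * 5
pheno = [0, 1, 1, 0] * 5
stat, dof = chi2_statistic(snp_a, snp_b, pheno)
```

A large statistic relative to the chi-squared distribution with `dof` degrees of freedom suggests that the genotype combination and the phenotype are not independent; a real analysis would also need to handle sparse cells and multiple testing.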
Affiliation(s)
- David B Blumenthal
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Markus Hoffmann
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Tim Kacprowski
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
|
19
|
What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis? J Pers Med 2020; 10:jpm10040247. [PMID: 33256133 PMCID: PMC7712791 DOI: 10.3390/jpm10040247] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/30/2020] [Revised: 11/21/2020] [Accepted: 11/23/2020] [Indexed: 02/07/2023] Open
Abstract
Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but the molecular mechanisms and pathways underlying this disease remain elusive. This review (1) systematically identifies machine learning studies aimed at understanding the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performance of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies is highlighted. Lastly, it is suggested that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.
|
20
|
Blumenthal DB, Viola L, List M, Baumbach J, Tieri P, Kacprowski T. EpiGEN: an epistasis simulation pipeline. Bioinformatics 2020; 36:4957-4959. [DOI: 10.1093/bioinformatics/btaa245] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/14/2019] [Revised: 04/03/2020] [Accepted: 04/08/2020] [Indexed: 02/06/2023] Open
Abstract
Summary
Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited, as they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes.
Availability and implementation
EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David B Blumenthal
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Lorenzo Viola
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Markus List
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Jan Baumbach
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
| | - Paolo Tieri
- CNR National Research Council, IAC Institute for Applied Computing, 00185 Rome, Italy
| | - Tim Kacprowski
- Technical University of Munich, School of Life Sciences Weihenstephan, Chair of Experimental Bioinformatics, 85354 Freising, Germany
|
21
|
Chirita-Emandi A, Serban CL, Paul C, Andreescu N, Velea I, Mihailescu A, Serafim V, Tiugan DA, Tutac P, Zimbru C, Puiu M, Niculescu MD. CHDH-PNPLA3 Gene-Gene Interactions Predict Insulin Resistance in Children with Obesity. Diabetes Metab Syndr Obes 2020; 13:4483-4494. [PMID: 33239899 PMCID: PMC7682614 DOI: 10.2147/dmso.s277268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2020] [Accepted: 09/26/2020] [Indexed: 11/26/2022] Open
Abstract
INTRODUCTION Insulin resistance plays a major role in metabolic syndrome and is recognized as the most common risk factor for non-alcoholic fatty liver disease (NAFLD). Identifying predictors for insulin resistance could optimize screening and prevention. PURPOSE To evaluate the contribution of multiple single nucleotide polymorphisms across genes related to NAFLD and choline metabolism, in predicting insulin resistance in children with obesity. METHODS One hundred fifty-three children with obesity (73 girls), aged 7-18 years, were evaluated within the NutriGen Study (ClinicalTrials.gov-NCT02837367). Insulin resistance was defined by Homeostatic Model Assessment for insulin-resistance cut-offs that accommodated pubertal and gender differences. Anthropometric, metabolic, intake-related variables, and 55 single nucleotide polymorphisms related to NAFLD and choline metabolism were evaluated. Gene-gene interaction effects were assessed using Multiple Data Reduction Software. RESULTS Sixty percent (93/153) of participants showed insulin resistance (58.7% of boys, 63% of girls). Children with insulin resistance presented significantly higher values for standardized body mass index, triglycerides, transaminases and plasma choline when compared to those without insulin resistance. Out of 52 single nucleotide polymorphisms analysed, the interaction between genotypes CHDH(rs12676) and PNPLA3(rs738409) predicted insulin resistance. The model presented a 6/10 cross-validation consistency and 0.58 testing accuracy. Plasma choline levels and alanine aminotransferase modulated the gene interaction effect, significantly improving the model. CONCLUSION The interaction between genotypes in CHDH and PNPLA3 genes, modulated by choline and alanine aminotransferase levels, predicted insulin-resistance status in children with obesity. If replicated in larger cohorts, these findings could help identify metabolic risk in children with obesity.
Affiliation(s)
- Adela Chirita-Emandi
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Regional Center of Medical Genetics Timis, Clinical Emergency Hospital for Children “Louis Turcanu”, Timisoara, Romania
| | - Costela Lacrimioara Serban
- Regional Center of Medical Genetics Timis, Clinical Emergency Hospital for Children “Louis Turcanu”, Timisoara, Romania
- Department of Functional Sciences, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
| | - Corina Paul
- Pediatrics Department – Pediatrics Discipline II, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Pediatrics, Endocrinology and Diabetes Department, Clinic II Pediatrics, “Pius Branzeu” Clinical Emergency County Hospital, Timisoara, Romania
| | - Nicoleta Andreescu
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Regional Center of Medical Genetics Timis, Clinical Emergency Hospital for Children “Louis Turcanu”, Timisoara, Romania
- Correspondence: Nicoleta Andreescu Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania Email
| | - Iulian Velea
- Pediatrics Department – Pediatrics Discipline II, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Pediatrics, Endocrinology and Diabetes Department, Clinic II Pediatrics, “Pius Branzeu” Clinical Emergency County Hospital, Timisoara, Romania
| | - Alexandra Mihailescu
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
| | - Vlad Serafim
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- The National Institute of Research and Development for Biological Sciences, Bucharest, Romania
| | - Diana-Andreea Tiugan
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
| | - Paul Tutac
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
| | - Cristian Zimbru
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Department of Automation and Applied Informatics, Politehnica University of Timisoara, Timisoara, Romania
| | - Maria Puiu
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Regional Center of Medical Genetics Timis, Clinical Emergency Hospital for Children “Louis Turcanu”, Timisoara, Romania
| | - Mihai Dinu Niculescu
- Department of Microscopic Morphology - Genetics, Center of Genomic Medicine, University of Medicine and Pharmacy “Victor Babes”, Timisoara, Romania
- Advanced Nutrigenomics, Cary, NC27511, USA
|