1
Butt AH, Alkhalifah T, Alturise F, Khan YD. A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns. Sci Rep 2022; 12:15183. [PMID: 36071071 PMCID: PMC9452539 DOI: 10.1038/s41598-022-19099-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/28/2022] [Accepted: 08/24/2022] [Indexed: 11/26/2022]
Abstract
Enhancers regulate gene expression by playing a crucial role in the synthesis of RNAs and proteins, although they do not directly encode proteins or RNA molecules. To control gene expression, it is important to predict enhancers and their potency. Given their distance from the target gene, lack of common motifs, and tissue/cell specificity, enhancer regions are thought to be difficult to predict in DNA sequences. Recently, a number of bioinformatics tools have been created to distinguish enhancers from other regulatory components and to pinpoint their advantages. However, the prediction quality of these tools, and therefore their practical application value, still needs to be improved. Based on nucleotide composition and statistical moment-based features, the current study proposes a novel method for distinguishing enhancers from non-enhancers and evaluating their strength. The proposed method outperformed state-of-the-art techniques in accuracy under fivefold and tenfold cross-validation, achieving accuracies of 86.5% for enhancer site prediction and 72.3% for enhancer strength prediction. The results of the suggested methodology point to the potential for more efficient and successful outcomes when statistical moment-based features are used. The current study's source code is available to the research community at https://github.com/csbioinfopk/enpred.
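As a rough illustration of what nucleotide composition and statistical moment-based features can look like, the sketch below computes base frequencies plus raw and central moments of a numerically encoded DNA sequence. The integer encoding (`ENCODING`), the function names, and the choice of moment orders are assumptions made for this example; they are not taken from the study or its repository.

```python
from collections import Counter

# Hypothetical integer encoding of bases (an assumption for illustration only).
ENCODING = {"A": 1, "C": 2, "G": 3, "T": 4}


def nucleotide_composition(seq):
    """Return the fraction of each base (A, C, G, T) in the sequence."""
    counts = Counter(seq)
    n = len(seq)
    return {base: counts.get(base, 0) / n for base in "ACGT"}


def raw_moment(values, k):
    """k-th raw moment: the mean of values raised to the power k."""
    return sum(v ** k for v in values) / len(values)


def central_moment(values, k):
    """k-th central moment: the mean of (value - mean) raised to the power k."""
    mean = raw_moment(values, 1)
    return sum((v - mean) ** k for v in values) / len(values)


def feature_vector(seq, max_order=3):
    """Concatenate composition, raw moments (1..max_order) and central moments (2..max_order).

    Assumes seq is non-empty and contains only the uppercase letters A, C, G, T.
    """
    values = [ENCODING[base] for base in seq]
    composition = nucleotide_composition(seq)
    features = [composition[base] for base in "ACGT"]
    features += [raw_moment(values, k) for k in range(1, max_order + 1)]
    features += [central_moment(values, k) for k in range(2, max_order + 1)]
    return features
```

A vector like this could then be passed to any standard classifier; the encoding and the set of moments actually used by the authors may differ.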
Affiliation(s)
- Ahmad Hassan Butt
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
2
DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features. Appl Bionics Biomech 2022; 2022:5483115. [PMID: 35465187 PMCID: PMC9020926 DOI: 10.1155/2022/5483115] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/15/2021] [Revised: 12/25/2021] [Accepted: 02/05/2022] [Indexed: 12/29/2022]
Abstract
In the domain of genome annotation, the identification of DNA-binding proteins is one of the crucial challenges. DNA is considered a blueprint for the cell: it contains all the information necessary for building and maintaining the traits of an organism. It is DNA that makes a living thing a living thing. Protein interaction with DNA plays an essential role in regulating DNA functions such as DNA repair, transcription, and regulation. Identification of these proteins is a crucial task for understanding the regulation of genes. Several methods have been developed to identify the binding sites of DNA and protein based on structures and sequences, but they are costly and time-consuming. Therefore, we propose a methodology named “DNAPred_Prot”, which uses various position- and frequency-dependent features from protein sequences for efficient and effective prediction of DNA-binding proteins. Under 10-fold cross-validation and jackknife testing, the method yielded accuracies of 94.95% and 95.11%, respectively. The results of SVM and ANN were also compared with those of a random forest classifier. The robustness of the proposed model was evaluated on the independent dataset PDB186, on which it achieved an accuracy of 91.47%. From these results, it can be inferred that the suggested methodology performs better than other extant methods for the identification of DNA-binding proteins.
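The two testing schemes named in the abstract, k-fold cross-validation and jackknife (leave-one-out) testing, differ only in how the samples are split. The sketch below shows one minimal way to generate both kinds of split and to compute accuracy over them. All names here (`k_fold_indices`, `jackknife_indices`, `cross_validated_accuracy`, and the `train_fn` callback) are hypothetical, and the folds are contiguous and unshuffled, which a real evaluation would normally avoid.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples):
    """Jackknife testing is k-fold with k equal to the number of samples."""
    return k_fold_indices(n_samples, n_samples)


def cross_validated_accuracy(features, labels, train_fn, splits):
    """Pool correct predictions over all held-out samples.

    train_fn(train_features, train_labels) must return a callable that maps
    one feature item to a predicted label.
    """
    correct = 0
    total = 0
    for train_idx, test_idx in splits:
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in test_idx:
            correct += int(model(features[i]) == labels[i])
            total += 1
    return correct / total
```

Any classifier can be plugged in through `train_fn`; the classifiers and features used in the study itself are not reproduced here.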
3
Nakayama T, Wang Q, Okadera T. Evaluation of spatio-temporal variations in water availability using a process-based eco-hydrology model in arid and semi-arid regions of Mongolia. Ecol Modell 2021. [DOI: 10.1016/j.ecolmodel.2020.109404] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
4
Guimerà R, Reichardt I, Aguilar-Mogas A, Massucci FA, Miranda M, Pallarès J, Sales-Pardo M. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 2020; 6:eaav6971. [PMID: 32064326 PMCID: PMC6994216 DOI: 10.1126/sciadv.aav6971] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/11/2018] [Accepted: 11/20/2019] [Indexed: 05/06/2023]
Abstract
Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
Affiliation(s)
- Roger Guimerà
  - ICREA, Barcelona 08010, Catalonia, Spain
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
  - Corresponding author.
- Ignasi Reichardt
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Antoni Aguilar-Mogas
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
  - Division of Research, Economic Development and Engagement, East Carolina University, Greenville, NC 27858, USA
- Francesco A. Massucci
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
  - SIRIS Lab, Research Division of SIRIS Academic, Barcelona 08003, Catalonia, Spain
- Manuel Miranda
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Jordi Pallarès
  - Department of Mechanical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Marta Sales-Pardo
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
5
Tokuda IT, Levnajic Z, Ishimura K. A practical method for estimating coupling functions in complex dynamical systems. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2019; 377:20190015. [PMID: 31656141 PMCID: PMC6833996 DOI: 10.1098/rsta.2019.0015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 09/02/2019] [Indexed: 06/10/2023]
Abstract
A foremost challenge in modern network science is the inverse problem of reconstruction (inference) of coupling equations and network topology from the measurements of the network dynamics. Of particular interest are the methods that can operate on real (empirical) data without interfering with the system. One such earlier attempt (Tokuda et al. 2007 Phys. Rev. Lett. 99, 064101. (doi:10.1103/PhysRevLett.99.064101)) was a method suited for general limit-cycle oscillators, yielding both oscillators' natural frequencies and coupling functions between them (phase equations) from empirically measured time series. The present paper reviews the above method in a way comprehensible to domain scientists outside physics. It also presents applications of the method to (i) detection of the network connectivity, (ii) inference of the phase sensitivity function, (iii) approximation of the interaction among phase-coherent chaotic oscillators, and (iv) experimental data from a forced Van der Pol electric circuit. This reaffirms the range of applicability of the method for reconstructing coupling functions and makes it accessible to a much wider scientific community. This article is part of the theme issue 'Coupling functions: dynamical interaction mechanisms in the physical, biological and social sciences'.
Affiliation(s)
- Isao T. Tokuda
  - Department of Mechanical Engineering, Ritsumeikan University, Kusatsu, Japan
- Zoran Levnajic
  - Complex Systems and Data Science Lab, Faculty of Information Studies in Novo Mesto, Novo Mesto, Slovenia
- Kazuyoshi Ishimura
  - Department of Mechanical Engineering, Ritsumeikan University, Kusatsu, Japan
6
Liang Y, Kelemen A. Dynamic modeling and network approaches for omics time course data: overview of computational approaches and applications. Brief Bioinform 2019; 19:1051-1068. [PMID: 28430854 DOI: 10.1093/bib/bbx036] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/21/2016] [Indexed: 12/23/2022]
Abstract
Inferring networks and dynamics of genes, proteins, cells and other biological entities from high-throughput biological omics data is a central and challenging issue in computational and systems biology. This is essential for understanding the complexity of human health, disease susceptibility and pathogenesis for Predictive, Preventive, Personalized and Participatory (P4) system and precision medicine. The delineation of the possible interactions of all genes/proteins in a genome/proteome is a task for which conventional experimental techniques are ill suited. Urgently needed are rapid and inexpensive computational and statistical methods that can identify interacting candidate disease genes or drug targets out of thousands that can be further investigated or validated by experimentations. Moreover, identifying biological dynamic systems, and simultaneously estimating the important kinetic structural and functional parameters, which may not be experimentally accessible could be important directions for drug-disease-gene network studies. In this article, we present an overview and comparison of recent developments of dynamic modeling and network approaches for time-course omics data, and their applications to various biological systems, health conditions and disease statuses. Moreover, various data reduction and analytical schemes ranging from mathematical to computational to statistical methods are compared including their merits, drawbacks and limitations. The most recent software, associated web resources and other potentials for the compared methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
  - Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
- Arpad Kelemen
  - Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
7
Shiao SPK, Grayson J, Yu CH. Gene-Metabolite Interaction in the One Carbon Metabolism Pathway: Predictors of Colorectal Cancer in Multi-Ethnic Families. J Pers Med 2018; 8:E26. [PMID: 30082654 PMCID: PMC6164460 DOI: 10.3390/jpm8030026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/04/2018] [Revised: 07/14/2018] [Accepted: 08/01/2018] [Indexed: 02/07/2023]
Abstract
For personalized healthcare, the purpose of this study was to examine the key genes and metabolites in the one-carbon metabolism (OCM) pathway and their interactions as predictors of colorectal cancer (CRC) in multi-ethnic families. In this proof-of-concept study, we included a total of 30 participants, 15 CRC cases and 15 matched family/friends representing major ethnic groups in southern California. Analytics based on supervised machine learning were applied, with the target variable being specified as cancer, including the ensemble method and generalized regression (GR) prediction. Elastic Net with Akaike's Information Criterion with correction (AICc) and Leave-One-Out cross validation GR methods were used to validate the results for enhanced optimality, prediction, and reproducibility. The results revealed that despite some family members sharing genetic heritage, the CRC group had greater combined gene polymorphism-mutations than the family controls (p < 0.1) for five genes including MTHFR C677T, MTHFR A1298C, MTR A2756G, MTRR A66G, and DHFR 19bp. Blood metabolites including homocysteine (7 µmol/L), methyl-folate (40 nmol/L) with total gene mutations (≥4); age (51 years) and vegetable intake (2 cups), and interactions of gene mutations and methylmalonic acid (MMA) (400 nmol/L) were significant predictors (all p < 0.0001) using the AICc. The results were validated by a 3% misclassification rate, AICc of 26, and >99% area under the receiver operating characteristic curve. These results point to the important roles of blood metabolites as potential markers in the prevention of CRC. Future intervention studies can be designed to target the ways to mitigate the enzyme-metabolite deficiencies in the OCM pathway to prevent cancer.
Affiliation(s)
- S Pamela K Shiao
  - Medical College of Georgia, Augusta University, Augusta, GA 30912, USA.
- James Grayson
  - Hull College of Business, Augusta University, Augusta, GA 30912, USA.
- Chong Ho Yu
  - Department of Psychology, Azusa Pacific University, Azusa, CA 91702, USA.
8
Gonzales MC, Grayson J, Lie A, Yu CH, Shiao SYPK. Gene-environment interactions and predictors of breast cancer in family-based multi-ethnic groups. Oncotarget 2018; 9:29019-29035. [PMID: 30018733 PMCID: PMC6044380 DOI: 10.18632/oncotarget.25520] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/11/2018] [Accepted: 05/08/2018] [Indexed: 12/30/2022]
Abstract
Breast cancer (BC) is the most common cancer in women worldwide and second leading cause of cancer-related death. Understanding gene-environment interactions could play a critical role for next stage of BC prevention efforts. Hence, the purpose of this study was to examine the key gene-environmental factors affecting the risks of BC in a diverse sample. Five genes in one-carbon metabolism pathway including MTHFR 677, MTHFR 1298, MTR 2756, MTRR 66, and DHFR 19bp together with demographics, lifestyle, and dietary intake factors were examined in association with BC risks. A total of 80 participants (40 BC cases and 40 family/friend controls) in southern California were interviewed and provided salivary samples for genotyping. We presented the first study utilizing both conventional and new analytics including ensemble method and predictive modeling based on smallest errors to predict BC risks. Predictive modeling of Generalized Regression Elastic Net Leave-One-Out demonstrated alcohol use (p = 0.0126) and age (p < 0.0001) as significant predictors; and significant interactions were noted between body mass index (BMI) and alcohol use (p = 0.0027), and between BMI and MTR 2756 polymorphisms (p = 0.0090). Our findings identified the modifiable lifestyle factors in gene-environment interactions that are valuable for BC prevention.
Affiliation(s)
- Mildred C Gonzales
  - Los Angeles County College of Nursing and Allied Health, Los Angeles, CA, USA
- James Grayson
  - Hull College of Business, Augusta University, Augusta, GA, USA
- Amanda Lie
  - Citrus Valley Health Partners, Foothill Presbyterian Hospital, Glendora, CA, USA
9
Shiao SPK, Grayson J, Lie A, Yu CH. Personalized Nutrition-Genes, Diet, and Related Interactive Parameters as Predictors of Cancer in Multiethnic Colorectal Cancer Families. Nutrients 2018; 10:nu10060795. [PMID: 29925788 PMCID: PMC6024706 DOI: 10.3390/nu10060795] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/26/2018] [Revised: 06/13/2018] [Accepted: 06/19/2018] [Indexed: 01/04/2023]
Abstract
To personalize nutrition, the purpose of this study was to examine five key genes in the folate metabolism pathway, and dietary parameters and related interactive parameters as predictors of colorectal cancer (CRC) by measuring the healthy eating index (HEI) in multiethnic families. The five genes included methylenetetrahydrofolate reductase (MTHFR) 677 and 1298, methionine synthase (MTR) 2756, methionine synthase reductase (MTRR 66), and dihydrofolate reductase (DHFR) 19bp, and they were used to compute a total gene mutation score. We included 53 families, 53 CRC patients and 53 paired family friend members of diverse population groups in Southern California. We measured multidimensional data using the ensemble bootstrap forest method to identify variables of importance within domains of genetic, demographic, and dietary parameters to achieve dimension reduction. We then constructed predictive generalized regression (GR) modeling with a supervised machine learning validation procedure with the target variable (cancer status) being specified to validate the results to allow enhanced prediction and reproducibility. The results showed that the CRC group had increased total gene mutation scores compared to the family members (p < 0.05). Using the Akaike’s information criterion and Leave-One-Out cross validation GR methods, the HEI was interactive with thiamine (vitamin B1), which is a new finding for the literature. The natural food sources for thiamine include whole grains, legumes, and some meats and fish which HEI scoring included as part of healthy portions (versus limiting portions on salt, saturated fat and empty calories). Additional predictors included age, as well as gender and the interaction of MTHFR 677 with overweight status (measured by body mass index) in predicting CRC, with the cancer group having more men and overweight cases. 
The HEI score was significant when split at the median score of 77 into greater or less scores, confirmed through the machine-learning recursive tree method and predictive modeling, although an HEI score of greater than 80 is the US national standard set value for a good diet. The HEI and healthy eating are modifiable factors for healthy living in relation to dietary parameters and cancer prevention, and they can be used for personalized nutrition in the precision-based healthcare era.
Affiliation(s)
- S Pamela K Shiao
  - College of Nursing and Medical College of Georgia, Augusta University, Augusta, GA 30912, USA.
- James Grayson
  - Hull College of Business, Augusta University, Augusta, GA 30912, USA.
- Amanda Lie
  - Citrus Valley Health Partners, Foothill Presbyterian Hospital, Glendora, CA 91741, USA.
- Chong Ho Yu
  - School of Business, University of Phoenix, Pasadena, CA 91101, USA.
10
Predictors of the Healthy Eating Index and Glycemic Index in Multi-Ethnic Colorectal Cancer Families. Nutrients 2018; 10:nu10060674. [PMID: 29861441 PMCID: PMC6024360 DOI: 10.3390/nu10060674] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/03/2018] [Revised: 05/22/2018] [Accepted: 05/24/2018] [Indexed: 12/13/2022]
Abstract
For personalized nutrition in preparation for precision healthcare, we examined the predictors of healthy eating, using the healthy eating index (HEI) and glycemic index (GI), in family-based multi-ethnic colorectal cancer (CRC) families. A total of 106 participants, 53 CRC cases and 53 family members from multi-ethnic families participated in the study. Machine learning validation procedures, including the ensemble method and generalized regression prediction, Elastic Net with Akaike’s Information Criterion with correction and Leave-One-Out cross validation methods, were applied to validate the results for enhanced prediction and reproducibility. Models were compared based on HEI scales for the scores of 77 versus 80 as the status of healthy eating, predicted from individual dietary parameters and health outcomes. Gender and CRC status were interactive as additional predictors of HEI based on the HEI score of 77. Predictors of HEI 80 as the criterion score of a good diet included five significant dietary parameters (with intake amount): whole fruit (1 cup), milk or milk alternative such as soy drinks (6 oz), whole grain (1 oz), saturated fat (15 g), and oil and nuts (1 oz). Compared to the GI models, HEI models presented more accurate and fitted models. Milk or a milk alternative such as soy drink (6 oz) is the common significant parameter across HEI and GI predictive models. These results point to the importance of healthy eating, with the appropriate amount of healthy foods, as modifiable factors for cancer prevention.
11
Kamble PS, Collins J, Harvey RA, Prewitt T, Kimball E, Deluzio T, Allen E, Bouchard JR. Understanding Prediabetes in a Medicare Advantage Population Using Data Adaptive Techniques. Popul Health Manag 2018; 21:477-485. [PMID: 29648934 DOI: 10.1089/pop.2017.0165] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
Abstract
The objective was to identify individuals with undiagnosed prediabetes from administrative data using adaptive techniques. The data source was a national Medicare Advantage Prescription Drug (MAPD) plan administrative data set. A retrospective, cross-sectional study developed and evaluated data adaptive logistic regression, decision tree, neural network, and ensemble predictive models for metabolic syndrome and prediabetes using 3 mutually exclusive cohorts (N = 279,903). The misclassification rate (MCR), average squared error (ASE), c-statistics, sensitivity (SN), and false positive (FP) rates were compared to select the final predictive models. MAPD individuals with continuous enrollment from 2013 to 2014 were included. Metabolic syndrome and prediabetes were defined using clinical guidelines, diagnosis, and laboratory data. A total of 512 variables identified through subject matter expertise in addition to utilizing all data available were evaluated for the modeling. The ensemble model demonstrated better discrimination (c-statistics, MCR, and ASE of 0.83, 0.24, and 0.16, respectively), high SN, and low FP rate in predicting metabolic syndrome than the individual data adaptive modeling techniques. Logistic regression demonstrated better discrimination (c-statistics, MCR, and ASE of 0.67, 0.13, and 0.11 respectively), high SN, and low FP rate in predicting prediabetes than the other adaptive modeling techniques or ensemble methods. The scored data predicted prediabetes in 44% of the MAPD population, which is comparable to 2005-2006 National Health and Nutrition Examination Survey prediabetes rates of 41%. The logistic regression model demonstrated good performance in predicting undiagnosed prediabetes in MAPD individuals.
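The model-selection metrics listed in this abstract can be computed from true labels and predicted probabilities. The sketch below is one minimal way to do so, assuming binary 0/1 labels, a 0.5 decision threshold, and average squared error taken as the mean squared difference between the label and the predicted probability. The function name and these conventions are assumptions for illustration, not the study's implementation.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Return misclassification rate, average squared error, sensitivity and FP rate.

    Assumes y_true holds 0/1 labels with at least one positive and one
    negative case, and y_prob holds predicted probabilities of class 1.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(y_true)
    return {
        "mcr": (fp + fn) / n,  # share of cases classified incorrectly
        "ase": sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "fp_rate": fp / (fp + tn),  # false positive rate
    }
```

The c-statistic (area under the ROC curve) is omitted here because it depends on ranking all positive and negative pairs rather than on a single threshold.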
Affiliation(s)
- Pravin S Kamble
  - Comprehensive Health Insights, Inc., Louisville, Kentucky
- Jenna Collins
  - Comprehensive Health Insights, Inc., Louisville, Kentucky
- Ed Kimball
  - Novo Nordisk, Inc., Plainsboro, New Jersey
- Elsie Allen
  - Novo Nordisk, Inc., Plainsboro, New Jersey
12
Shiao SPK, Grayson J, Yu CH, Wasek B, Bottiglieri T. Gene Environment Interactions and Predictors of Colorectal Cancer in Family-Based, Multi-Ethnic Groups. J Pers Med 2018; 8:E10. [PMID: 29462916 PMCID: PMC5872084 DOI: 10.3390/jpm8010010] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 12/27/2017] [Revised: 02/14/2018] [Accepted: 02/14/2018] [Indexed: 12/11/2022]
Abstract
For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene-environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanics, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls (p < 0.05), on MTHFR C677T, MTR A2756G, MTRR A66G, and DHFR 19 bp except MTHFR A1298C. Four racial groups presented different polymorphism rates for four genes (all p < 0.05) except MTHFR A1298C. Following the ensemble method, the most influential factors were identified, and the best predictive models were generated by using the generalized regression models, with Akaike's information criterion and leave-one-out cross validation methods. Body mass index (BMI) and gender were consistent predictors of CRC for both models when individual genes versus total polymorphism counts were used, and alcohol use was interactive with BMI status. Body mass index status was also interactive with both gender and MTHFR C677T gene polymorphism, and the exposure to environmental pollutants was an additional predictor. These results point to the important roles of environmental and modifiable factors in relation to gene-environment interactions in the prevention of CRC.
Affiliation(s)
- S Pamela K Shiao
  - College of Nursing and Medical College of Georgia, Augusta University, Augusta, GA 30912, USA.
- James Grayson
  - College of Business, Augusta University, Augusta, GA 30912, USA.
- Chong Ho Yu
  - University of Phoenix, Pasadena, CA 91101, USA.
- Brandi Wasek
  - Center of Metabolomics, Institute of Metabolic Disease, Baylor Scott & White Research Institute, Dallas, TX 75226, USA.
- Teodoro Bottiglieri
  - Center of Metabolomics, Institute of Metabolic Disease, Baylor Scott & White Research Institute, Dallas, TX 75226, USA.
13
Michael E, Singh BK, Mayala BK, Smith ME, Hampton S, Nabrzyski J. Continental-scale, data-driven predictive assessment of eliminating the vector-borne disease, lymphatic filariasis, in sub-Saharan Africa by 2020. BMC Med 2017; 15:176. [PMID: 28950862 PMCID: PMC5615442 DOI: 10.1186/s12916-017-0933-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/06/2017] [Accepted: 08/16/2017] [Indexed: 12/31/2022]
Abstract
BACKGROUND: There are growing demands for predicting the prospects of achieving the global elimination of neglected tropical diseases as a result of the institution of large-scale nation-wide intervention programs by the WHO-set target year of 2020. Such predictions will be uncertain due to the impacts that spatial heterogeneity and scaling effects will have on parasite transmission processes, which will introduce significant aggregation errors into any attempt aiming to predict the outcomes of interventions at the broader spatial levels relevant to policy making. We describe a modeling platform that addresses this problem of upscaling from local settings to facilitate predictions at regional levels by the discovery and use of locality-specific transmission models, and we illustrate the utility of using this approach to evaluate the prospects for eliminating the vector-borne disease, lymphatic filariasis (LF), in sub-Saharan Africa by the WHO target year of 2020 using currently applied or newly proposed intervention strategies.
METHODS AND RESULTS: We show how a computational platform that couples site-specific data discovery with model fitting and calibration can allow both learning of local LF transmission models and simulations of the impact of interventions that take a fuller account of the fine-scale heterogeneous transmission of this parasitic disease within endemic countries. We highlight how such a spatially hierarchical modeling tool that incorporates actual data regarding the roll-out of national drug treatment programs and spatial variability in infection patterns into the modeling process can produce more realistic predictions of timelines to LF elimination at coarse spatial scales, ranging from district to country to continental levels. Our results show that when locally applicable extinction thresholds are used, only three countries are likely to meet the goal of LF elimination by 2020 using currently applied mass drug treatments, and that switching to more intensive drug regimens, increasing the frequency of treatments, or switching to new triple drug regimens will be required if LF elimination is to be accelerated in Africa. The proportion of countries that would meet the goal of eliminating LF by 2020 may, however, reach up to 24/36 if the WHO 1% microfilaremia prevalence threshold is used and sequential mass drug deliveries are applied in countries.
CONCLUSIONS: We have developed and applied a data-driven spatially hierarchical computational platform that uses the discovery of locally applicable transmission models in order to predict the prospects for eliminating the macroparasitic disease, LF, at the coarser country level in sub-Saharan Africa. We show that fine-scale spatial heterogeneity in local parasite transmission and extinction dynamics, as well as the exact nature of intervention roll-outs in countries, will impact the timelines to achieving national LF elimination on this continent.
Affiliation(s)
- Edwin Michael
  - Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Brajendra K Singh
  - Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Benjamin K Mayala
  - Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Morgan E Smith
  - Department of Biological Sciences, University of Notre Dame, Galvin Life Science Center, Notre Dame, IN, 46556, USA
- Scott Hampton
  - Center for Research Computing, University of Notre Dame, Notre Dame, IN, 46556, USA
- Jaroslaw Nabrzyski
  - Center for Research Computing, University of Notre Dame, Notre Dame, IN, 46556, USA
14
Liang Y, Kelemen A. Computational dynamic approaches for temporal omics data with applications to systems medicine. BioData Min 2017. [PMID: 28638442 PMCID: PMC5473988 DOI: 10.1186/s13040-017-0140-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Abstract
Modeling and predicting biological dynamic systems and simultaneously estimating the kinetic structural and functional parameters are extremely important in systems and computational biology. This is key for understanding the complexity of the human health, drug response, disease susceptibility and pathogenesis for systems medicine. Temporal omics data used to measure the dynamic biological systems are essentials to discover complex biological interactions and clinical mechanism and causations. However, the delineation of the possible associations and causalities of genes, proteins, metabolites, cells and other biological entities from high throughput time course omics data is challenging for which conventional experimental techniques are not suited in the big omics era. In this paper, we present various recently developed dynamic trajectory and causal network approaches for temporal omics data, which are extremely useful for those researchers who want to start working in this challenging research area. Moreover, applications to various biological systems, health conditions and disease status, and examples that summarize the state-of-the art performances depending on different specific mining tasks are presented. We critically discuss the merits, drawbacks and limitations of the approaches, and the associated main challenges for the years ahead. The most recent computing tools and software to analyze specific problem type, associated platform resources, and other potentials for the dynamic trajectory and interaction methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
  - Department of Family and Community Health, University of Maryland, Baltimore, MD 21201 USA
- Arpad Kelemen
  - Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201 USA