1
|
Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, Truu J. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 2021; 12:634511. [PMID: 33737920 PMCID: PMC7962872 DOI: 10.3389/fmicb.2021.634511] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 11/27/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022]
Abstract
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.
|
Review |
4 |
160 |
2
|
Choquet H, Meyre D. Genetics of Obesity: What have we Learned? Curr Genomics 2011; 12:169-79. [PMID: 22043165 PMCID: PMC3137002 DOI: 10.2174/138920211795677895] [Citation(s) in RCA: 142] [Impact Index Per Article: 10.1] [Received: 03/25/2011] [Revised: 03/31/2011] [Accepted: 03/31/2011] [Indexed: 12/14/2022]
Abstract
Candidate gene and genome-wide association studies have led to the discovery of nine loci involved in Mendelian forms of obesity and 58 loci contributing to polygenic obesity. These loci explain a small fraction of the heritability for obesity and many genes remain to be discovered. However, efforts in obesity gene identification greatly modified our understanding of this disorder. In this review, we propose an overview of major lessons learned from 15 years of research in the field of genetics and obesity. We comment on the existence of the genetic continuum between monogenic and polygenic forms of obesity that pinpoints the role of genes involved in the central regulation of food intake and genetic predisposition to obesity. We explain how the identification of novel obesity predisposing genes has clarified unsuspected biological pathways involved in the control of energy balance that have helped to understand past human history and to explore causality in epidemiology. We provide evidence that obesity predisposing genes interact with the environment and influence the response to treatment relevant to disease prediction.
|
Journal Article |
14 |
142 |
3
|
Razavian N, Blecker S, Schmidt AM, Smith-McLallen A, Nigam S, Sontag D. Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors. BIG DATA 2015; 3:277-287. [PMID: 27441408 DOI: 10.1089/big.2015.0020] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Indexed: 06/06/2023]
Abstract
We present a new approach to population health, in which data-driven predictive models are learned for outcomes such as type 2 diabetes. Our approach enables risk assessment from readily available electronic claims data on large populations, without additional screening cost. The proposed model uncovers early and late-stage risk factors. Using administrative claims, pharmacy records, healthcare utilization, and laboratory results of 4.1 million individuals between 2005 and 2009, an initial set of 42,000 variables was derived that together describe the full health status and history of every individual. Machine learning was then used to methodically enhance the predictive variable set and fit models predicting onset of type 2 diabetes in 2009-2011, 2010-2012, and 2011-2013. We compared the enhanced model with a parsimonious model consisting of known diabetes risk factors in a real-world environment, where missing values are common and prevalent. Furthermore, we analyzed novel and known risk factors emerging from the model at different age groups at different stages before the onset. A parsimonious model using 21 classic diabetes risk factors resulted in area under ROC curve (AUC) of 0.75 for diabetes prediction within a 2-year window following the baseline. The enhanced model increased the AUC to 0.80, with about 900 variables selected as predictive (p < 0.0001 for differences between AUCs). Similar improvements were observed for models predicting diabetes onset 1-3 years and 2-4 years after baseline. The enhanced model improved positive predictive value by at least 50% and identified novel surrogate risk factors for type 2 diabetes, such as chronic liver disease (odds ratio [OR] 3.71), high alanine aminotransferase (OR 2.26), esophageal reflux (OR 1.85), and history of acute bronchitis (OR 1.45). Liver risk factors emerge later in the process of diabetes development compared with obesity-related factors such as hypertension and high hemoglobin A1c. 
In conclusion, population-level risk prediction for type 2 diabetes using readily available administrative data is feasible and has better prediction performance than classical diabetes risk prediction algorithms on very large populations with missing data. The new model enables intervention allocation at national scale quickly and accurately and recovers potentially novel risk factors at different stages before the disease onset.
|
|
10 |
88 |
4
|
Abstract
Genome-wide variation data with millions of genetic markers have become commonplace. However, the potential for interpretation and application of these data for clinical assessment of outcomes of interest, and prediction of disease risk, is currently not fully realized. Many common complex diseases now have numerous, well-established risk loci and likely harbor many genetic determinants with effects too small to be detected at genome-wide levels of statistical significance. A simple and intuitive approach for converting genetic data to a predictive measure of disease susceptibility is to aggregate the effects of these loci into a single measure, the genetic risk score. Here, we describe some common methods and software packages for calculating genetic risk scores and polygenic risk scores, with focus on studies of common complex diseases. We review the basic information needed, as well as important considerations for constructing genetic risk scores, including specific requirements for phenotypic and genetic data, and limitations in their application. © 2019 by John Wiley & Sons, Inc.
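The aggregation step this abstract describes can be sketched as a weighted sum of risk-allele counts. This is a minimal illustration only, not the method of any specific package covered by the reviewed article; the function name `genetic_risk_score`, the three loci, and the weights are hypothetical, with weights standing in for per-locus effect sizes such as log odds ratios.

```python
from typing import Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Aggregate per-locus effects into one score.

    dosages: risk-allele counts per locus (0, 1, or 2) for one individual.
    weights: per-locus effect sizes (assumed here to be log odds ratios);
             passing all ones gives a simple unweighted allele count.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three loci with made-up effect sizes.
example_weights = [0.10, 0.25, 0.05]
example_dosages = [2, 0, 1]

weighted_score = genetic_risk_score(example_dosages, example_weights)
unweighted_score = genetic_risk_score(example_dosages, [1.0, 1.0, 1.0])
```

With all weights set to one, the score reduces to a plain count of risk alleles; effect-size weights let loci with larger estimated effects contribute more.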
|
Research Support, N.I.H., Extramural |
6 |
62 |
5
|
Klareskog L, Rönnelid J, Saevarsdottir S, Padyukov L, Alfredsson L. The importance of differences; On environment and its interactions with genes and immunity in the causation of rheumatoid arthritis. J Intern Med 2020; 287:514-533. [PMID: 32176395 DOI: 10.1111/joim.13058] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 11/17/2019] [Revised: 02/03/2020] [Accepted: 02/25/2020] [Indexed: 12/19/2022]
Abstract
The current review uses rheumatoid arthritis (RA) as a prominent example for how studies on the interplay between environmental and genetic factors in defined subsets of a disease can be used to formulate aetiological hypotheses that subsequently can be tested for causality using molecular and functional studies. Major discussed findings are that exposures to airways from many different noxious agents including cigarette smoke, silica dust and more interact with major susceptibility genes, mainly HLA-DR genetic variants in triggering antigen-specific immune reactions specific for RA. We also discuss how several other environmental and lifestyle factors, including microbial, neural and metabolic factors, can influence risk for RA in ways that are different in different subsets of RA. The description of these processes in RA provides the best example so far in any immune-mediated disease of how triggering of immunity at one anatomical site in the context of known environmental and genetic factors subsequently can lead to symptoms that precede the classical inflammatory disease symptoms and later contribute also to the classical RA joint inflammation. The findings referred to in the review have led to a change of paradigms for very early therapy and prevention of RA and to efforts towards what we have named 'personalized prevention'. We believe that the progress described here for RA will be of relevance for research and practice also in other immune-mediated diseases.
|
Review |
5 |
49 |
6
|
Mahajan P, Uddin S, Hajati F, Moni MA. Ensemble Learning for Disease Prediction: A Review. Healthcare (Basel) 2023; 11:1808. [PMID: 37372925 DOI: 10.3390/healthcare11121808] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 05/06/2023] [Revised: 06/19/2023] [Accepted: 06/19/2023] [Indexed: 06/29/2023]
Abstract
Machine learning models are used to create and enhance various disease prediction frameworks. Ensemble learning is a machine learning technique that combines multiple classifiers to improve performance by making more accurate predictions than a single classifier. Although numerous studies have employed ensemble approaches for disease prediction, there is a lack of thorough assessment of commonly used ensemble approaches against highly researched diseases. Consequently, this study aims to identify significant trends in the performance accuracies of ensemble techniques (i.e., bagging, boosting, stacking, and voting) against five hugely researched diseases (i.e., diabetes, skin disease, kidney disease, liver disease, and heart conditions). Using a well-defined search strategy, we first identified 45 articles from the current literature that applied two or more of the four ensemble approaches to any of these five diseases and were published in 2016-2023. Although stacking has been used the fewest number of times (23) compared with bagging (41) and boosting (37), it showed the most accurate performance the most times (19 out of 23). The voting approach is the second-best ensemble approach, as revealed in this review. Stacking always revealed the most accurate performance in the reviewed articles for skin disease and diabetes. Bagging demonstrated the best performance for kidney disease (five out of six times) and boosting for liver and diabetes (four out of six times). The results show that stacking has demonstrated greater accuracy in disease prediction than the other three candidate algorithms. Our study also demonstrates variability in the perceived performance of different ensemble approaches against frequently used disease datasets. 
The findings of this work will assist researchers in better understanding current trends and hotspots in disease prediction models that employ ensemble learning, as well as in determining a more suitable ensemble model for predictive disease analytics. This article also discusses variability in the perceived performance of different ensemble approaches against frequently used disease datasets.
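Of the four ensemble approaches compared in this abstract, voting is the simplest to illustrate. The sketch below is a minimal hard-voting example under stated assumptions: the function name `majority_vote` and the three classifiers' predictions are hypothetical and are not taken from any of the reviewed articles.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine class labels from several classifiers by hard voting.

    predictions: one sequence of predicted labels per classifier, all of
                 equal length (one label per sample).
    Returns the most frequent label for each sample. Using an odd number
    of classifiers with binary labels avoids ties.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")
    # zip(*predictions) groups the labels that each classifier gave to one sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical predictions from three classifiers on four patients
# (1 = disease, 0 = no disease).
clf_a = [1, 0, 1, 1]
clf_b = [1, 1, 0, 1]
clf_c = [0, 0, 1, 1]

ensemble_labels = majority_vote([clf_a, clf_b, clf_c])
```

Bagging and boosting differ in how the individual classifiers are trained, and stacking replaces the fixed voting rule with a second model trained on the base classifiers' outputs.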
|
Review |
2 |
46 |
7
|
Abstract
Next-generation sequencing (NGS) is commonly used for researching the causes of genetic disorders. However, its usefulness in clinical practice for medical diagnosis is in early development. In this report, we demonstrate the value of NGS for genetic risk assessment and evaluate the limitations and barriers for the adoption of this technology into medical practice. We performed whole exome sequencing (WES) on 81 volunteers, and for each volunteer, we requested personal medical histories, constructed a three-generation pedigree, and required their participation in a comprehensive educational program. We limited our clinical reporting to disease risks based on only rare damaging mutations and known pathogenic variations in genes previously reported to be associated with human disorders. We identified 271 recessive risk alleles (214 genes), 126 dominant risk alleles (101 genes), and 3 X-recessive risk alleles (3 genes). We linked personal disease histories with causative disease genes in 18 volunteers. Furthermore, by incorporating family histories into our genetic analyses, we identified an additional five heritable diseases. Traditional genetic counseling and disease education were provided in verbal and written reports to all volunteers. Our report demonstrates that when genome results are carefully interpreted and integrated with an individual's medical records and pedigree data, NGS is a valuable diagnostic tool for genetic disease risk.
|
Research Support, Non-U.S. Gov't |
12 |
39 |
8
|
Tsoukalas D, Fragoulakis V, Papakonstantinou E, Antonaki M, Vozikis A, Tsatsakis A, Buga AM, Mitroi M, Calina D. Prediction of Autoimmune Diseases by Targeted Metabolomic Assay of Urinary Organic Acids. Metabolites 2020; 10:E502. [PMID: 33302528 PMCID: PMC7764183 DOI: 10.3390/metabo10120502] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/11/2020] [Revised: 11/30/2020] [Accepted: 12/04/2020] [Indexed: 12/12/2022]
Abstract
Autoimmune diseases (ADs) are chronic disorders characterized by the loss of self-tolerance, and although being heterogeneous, they share common pathogenic mechanisms. Self-antigens and inflammation markers are established diagnostic tools; however, the metabolic imbalances that underlie ADs are poorly described. The study aimed to employ metabolomics for the detection of disease-related changes in autoimmune diseases that could have predictive value. Quantitative analysis of 28 urine organic acids was performed using Gas Chromatography-Mass Spectrometry in a group of 392 participants. Autoimmune thyroiditis, inflammatory bowel disease, psoriasis and rheumatoid arthritis were the most prevalent autoimmune diseases of the study. Statistically significant differences were observed in the tricarboxylate cycle metabolites, succinate, methylcitrate and malate, the pyroglutamate and 2-hydroxybutyrate from the glutathione cycle and the metabolites methylmalonate, 4-hydroxyphenylpyruvate, 2-hydroxyglutarate and 2-hydroxyisobutyrate between the AD group and the control. Artificial neural networks and Binary logistic regression resulted in the highest predictive accuracy scores (66.7% and 74.9%, respectively), while Methylmalonate, 2-Hydroxyglutarate and 2-hydroxybutyrate were proposed as potential biomarkers for autoimmune diseases. Urine organic acid levels related to the mechanisms of energy production and detoxification were associated with the presence of autoimmune diseases and could be an adjunct tool for early diagnosis and prediction.
|
research-article |
5 |
32 |
9
|
Li Y, Xu J, Wang Y, Zhang Y, Jiang W, Shen B, Ding X. A novel machine learning algorithm, Bayesian networks model, to predict the high-risk patients with cardiac surgery-associated acute kidney injury. Clin Cardiol 2020; 43:752-761. [PMID: 32400109 PMCID: PMC7368305 DOI: 10.1002/clc.23377] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/13/2020] [Revised: 04/08/2020] [Accepted: 04/13/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Cardiac surgery-associated acute kidney injury (CSA-AKI) is a well-recognized complication with an ominous outcome. HYPOTHESIS Bayesian networks (BNs) not only can reveal the complex interrelationships between predictors and CSA-AKI, but can also predict the individual risk of CSA-AKI occurrence. METHODS Between 2013 and 2015, we recruited 5533 eligible participants who underwent cardiac surgery at a tertiary hospital in eastern China. Demographic, clinical, and laboratory data were prospectively recorded in the electronic medical system and analyzed by gLASSO-logistic regression and BNs. RESULTS The incidences of CSA-AKI and severe CSA-AKI were 37.5% and 11.1%. The BNs model revealed that gender, left ventricular ejection fraction (LVEF), serum creatinine (SCr), serum uric acid (SUA), platelet count, and aortic cross-clamp time (ACCT) were the parent nodes of CSA-AKI, while ultrafiltration volume and postoperative central venous pressure (CVP) were connected with CSA-AKI as child nodes. In the severe CSA-AKI model, age, proteinuria, and SUA were directly linked to severe AKI; the new nodes of NYHA grade and direct bilirubin were related to severe AKI through their relation to LVEF, surgery types, and SCr level. The internal AUCs for predicting CSA-AKI and severe AKI were 0.755 and 0.845, which remained 0.736 and 0.816 in the external validation. Given the known variables, the risk for CSA-AKI can be inferred at individual levels based on the established BNs model and prior information. CONCLUSION The BNs model has high accuracy, good interpretability, and strong generalizability in predicting CSA-AKI. It helps physicians identify high-risk patients and implement protective strategies to improve the prognosis.
|
Journal Article |
5 |
30 |
10
|
Lowe FJ, Luettich K, Gregg EO. Lung cancer biomarkers for the assessment of modified risk tobacco products: an oxidative stress perspective. Biomarkers 2013; 18:183-95. [PMID: 23530763 PMCID: PMC3667677 DOI: 10.3109/1354750x.2013.777116] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 02/13/2013] [Accepted: 02/13/2013] [Indexed: 11/24/2022]
Abstract
Manufacturers have developed prototype cigarettes yielding reduced levels of some tobacco smoke toxicants, when tested using laboratory machine smoking under standardised conditions. For the scientific assessment of modified risk tobacco products, tests that offer objective, reproducible data, which can be obtained in a much shorter time than the requirements of conventional epidemiology are needed. In this review, we consider whether biomarkers of biological effect related to oxidative stress can be used in this role. Based on published data, urinary 8-oxo-7,8-dihydro-2-deoxyguanosine, thymidine glycol, F2-isoprostanes, serum dehydroascorbic acid to ascorbic acid ratio and carotenoid concentrations show promise, while 4-hydroxynonenal requires further qualification.
|
Review |
12 |
29 |
11
|
Sun P, Yang XB. Light, Temperature, and Moisture Effects on Apothecium Production of Sclerotinia sclerotiorum. PLANT DISEASE 2000; 84:1287-1293. [PMID: 30831869 DOI: 10.1094/pdis.2000.84.12.1287] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
The purpose of this study was to quantify the effects of light, moisture, and temperature on apothecium production of Sclerotinia sclerotiorum. Sclerotia were placed in sand beds in crispers and exposed to two light intensities. For each light intensity, sclerotia were subjected to five temperature levels and three moisture levels. The results showed that the optimal temperature and temperature range for germination of sclerotia were affected by both light intensity and the moisture level of the sand. At light intensity of 80 to 90 µmol m⁻² s⁻¹ (low light intensity treatment), the optimal temperatures were in the range of 12 to 18°C regardless of moisture level. At light intensity of 120 to 130 µmol m⁻² s⁻¹ (high light intensity treatment), the optimal temperature was shifted to 20°C when the soil moisture level was high. Under high light intensity, only a few days were needed for initials to develop into apothecia. Under low light intensity, several weeks were needed for initials to develop into apothecia. The frequency with which initials developed into apothecia was high under high light intensity (80%) but low under low light intensity. The initials produced at low light intensity and high temperature (25 to 30°C) were thinner and longer. The apothecia also were smaller at low light intensity than those produced at high light intensity at any temperature. The periods for apothecium production were longer under lower temperature treatments. The relationship between apothecium production and degree days was analyzed. Apothecium production began at about 160 degree days and ceased at about 900 degree days at high light intensity. However, production began at about 760 degree days and ceased at 1,720 degree days at low light intensity. Nonlinear regression equations which describe the relationship between cumulative formation of apothecia and degree days were highly significant. 
The deviation between the observed value and the predicted value increased as degree days increased.
|
|
25 |
25 |
12
|
Rockne RC, Branciamore S, Qi J, Frankhouser DE, O'Meally D, Hua WK, Cook G, Carnahan E, Zhang L, Marom A, Wu H, Maestrini D, Wu X, Yuan YC, Liu Z, Wang LD, Forman S, Carlesso N, Kuo YH, Marcucci G. State-Transition Analysis of Time-Sequential Gene Expression Identifies Critical Points That Predict Development of Acute Myeloid Leukemia. Cancer Res 2020; 80:3157-3169. [PMID: 32414754 PMCID: PMC7416495 DOI: 10.1158/0008-5472.can-20-0354] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/19/2020] [Revised: 04/06/2020] [Accepted: 05/12/2020] [Indexed: 12/13/2022]
Abstract
Temporal dynamics of gene expression inform cellular and molecular perturbations associated with disease development and evolution. Given the complexity of high-dimensional temporal genomic data, an analytic framework guided by a robust theory is needed to interpret time-sequential changes and to predict system dynamics. Here we model temporal dynamics of the transcriptome of peripheral blood mononuclear cells in a two-dimensional state-space representing states of health and leukemia using time-sequential bulk RNA-seq data from a murine model of acute myeloid leukemia (AML). The state-transition model identified critical points that accurately predict AML development and identifies stepwise transcriptomic perturbations that drive leukemia progression. The geometry of the transcriptome state-space provided a biological interpretation of gene dynamics, aligned gene signals that are not synchronized in time across mice, and allowed quantification of gene and pathway contributions to leukemia development. Our state-transition model synthesizes information from multiple cell types in the peripheral blood and identifies critical points in the transition from health to leukemia to guide interpretation of changes in the transcriptome as a whole to predict disease progression. SIGNIFICANCE: These findings apply the theory of state transitions to model the initiation and development of acute myeloid leukemia, identifying transcriptomic perturbations that accurately predict time to disease development. See related commentary by Kuijjer, p. 3072. GRAPHICAL ABSTRACT: http://cancerres.aacrjournals.org/content/canres/80/15/3157/F1.large.jpg.
|
Research Support, N.I.H., Extramural |
5 |
20 |
13
|
Arehart CH, Daya M, Campbell M, Boorgula MP, Rafaels N, Chavan S, David G, Hanifin J, Slifka MK, Gallo RL, Hata T, Schneider LC, Paller AS, Ong PY, Spergel JM, Guttman-Yassky E, Leung DYM, Beck LA, Gignoux CR, Mathias RA, Barnes KC. Polygenic prediction of atopic dermatitis improves with atopic training and filaggrin factors. J Allergy Clin Immunol 2022; 149:145-155. [PMID: 34111454 PMCID: PMC8973457 DOI: 10.1016/j.jaci.2021.05.034] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/12/2020] [Revised: 04/26/2021] [Accepted: 05/20/2021] [Indexed: 11/24/2022]
Abstract
BACKGROUND While numerous genetic loci associated with atopic dermatitis (AD) have been discovered, to date, work leveraging the combined burden of AD risk variants across the genome to predict disease risk has been limited. OBJECTIVES This study aims to determine whether polygenic risk scores (PRSs) relying on genetic determinants for AD provide useful predictions for disease occurrence and severity. It also explicitly tests the value of including genome-wide association studies of related allergic phenotypes and known FLG loss-of-function (LOF) variants. METHODS AD PRSs were constructed for 1619 European American individuals from the Atopic Dermatitis Research Network using an AD training dataset and an atopic training dataset including AD, childhood onset asthma, and general allergy. Additionally, whole genome sequencing data were used to explore genetic scoring specific to FLG LOF mutations. RESULTS Genetic scores derived from the AD-only genome-wide association studies were predictive of AD cases (PRSAD: odds ratio [OR], 1.70; 95% CI, 1.49-1.93). Accuracy was first improved when PRSs were built off the larger atopy genome-wide association studies (PRSAD+: OR, 2.16; 95% CI, 1.89-2.47) and further improved when including FLG LOF mutations (PRSAD++: OR, 3.23; 95% CI, 2.57-4.07). Importantly, while all 3 PRSs correlated with AD severity, the best prediction was from PRSAD++, which distinguished individuals with severe AD from control subjects with OR of 3.86 (95% CI, 2.77-5.36). CONCLUSIONS This study demonstrates how PRSs for AD that include genetic determinants across atopic phenotypes and FLG LOF variants may be a promising tool for identifying individuals at high risk for developing disease and specifically severe disease.
|
Research Support, N.I.H., Extramural |
3 |
19 |
14
|
Hu X, Han Z, Zhou R, Su W, Gong L, Yang Z, Song X, Zhang S, Shu H, Wu D. Altered gut microbiota in the early stage of acute pancreatitis were related to the occurrence of acute respiratory distress syndrome. Front Cell Infect Microbiol 2023; 13:1127369. [PMID: 36949815 PMCID: PMC10025409 DOI: 10.3389/fcimb.2023.1127369] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/19/2022] [Accepted: 02/20/2023] [Indexed: 03/08/2023]
Abstract
Background Acute respiratory distress syndrome (ARDS) is the most common cause of organ failure in acute pancreatitis (AP) patients and is associated with high mortality. Specific changes in the gut microbiota have been shown to influence progression of acute pancreatitis. We aimed to determine whether early alterations in the gut microbiota are related to and could predict ARDS occurrence in AP patients. Methods In this study, we performed 16S rRNA sequencing analysis in 65 AP patients and 20 healthy volunteers. The AP patients were further divided into two groups: 26 AP-ARDS patients and 39 AP-nonARDS patients based on ARDS occurrence during hospitalization. Results Our results showed that the AP-ARDS patients exhibited specific changes in gut microbiota composition and function as compared to subjects of the AP-nonARDS group. Higher abundances of Proteobacteria phylum, Enterobacteriaceae family, Escherichia-Shigella genus, and Klebsiella pneumoniae, but lower abundances of Bifidobacterium genus were found in the AP-ARDS group compared with the AP-nonARDS group. Random forest modelling analysis revealed that the Escherichia-Shigella genus was effective in distinguishing AP-ARDS from AP-nonARDS and could predict ARDS occurrence in AP patients. Conclusions Our study revealed that alterations of gut microbiota in AP patients on admission were associated with ARDS occurrence after hospitalization, indicating a potential predictive and pathogenic role of gut microbiota in the development of ARDS in AP patients.
|
research-article |
2 |
18 |
15
|
Kokot H, Kokot B, Sebastijanović A, Voss C, Podlipec R, Zawilska P, Berthing T, Ballester-López C, Danielsen PH, Contini C, Ivanov M, Krišelj A, Čotar P, Zhou Q, Ponti J, Zhernovkov V, Schneemilch M, Doumandji Z, Pušnik M, Umek P, Pajk S, Joubert O, Schmid O, Urbančič I, Irmler M, Beckers J, Lobaskin V, Halappanavar S, Quirke N, Lyubartsev AP, Vogel U, Koklič T, Stoeger T, Štrancar J. Prediction of Chronic Inflammation for Inhaled Particles: the Impact of Material Cycling and Quarantining in the Lung Epithelium. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2020; 32:e2003913. [PMID: 33073368 DOI: 10.1002/adma.202003913] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/08/2020] [Revised: 08/22/2020] [Indexed: 06/11/2023]
Abstract
On a daily basis, people are exposed to a multitude of health-hazardous airborne particulate matter with notable deposition in the fragile alveolar region of the lungs. Hence, there is a great need for identification and prediction of material-associated diseases, currently hindered due to the lack of in-depth understanding of causal relationships, in particular between acute exposures and chronic symptoms. By applying advanced microscopies and omics to in vitro and in vivo systems, together with in silico molecular modeling, it is determined herein that the long-lasting response to a single exposure can originate from the interplay between the newly discovered nanomaterial quarantining and nanomaterial cycling between different lung cell types. This new insight finally allows prediction of the spectrum of lung inflammation associated with materials of interest using only in vitro measurements and in silico modeling, potentially relating outcomes to material properties for a large number of materials, and thus boosting safe-by-design-based material development. Because of its profound implications for animal-free predictive toxicology, this work paves the way to a more efficient and hazard-free introduction of numerous new advanced materials into our lives.
5 | 16
16
|
Abstract
The generation of genome-wide variation data has become commonplace. However, the potential for interpretation and application of these data for clinical assessment of outcomes of interest, and prediction of disease risk, is currently not fully realized. Many common, complex diseases now have numerous, well-established "risk" loci, and likely harbor many genetic determinants with effects too small to be detected at genome-wide levels of statistical significance. A simple and intuitive approach for converting genetic data to a predictive measure of disease susceptibility is to aggregate the risk effects of these loci into a single genetic risk score. Here, some common methods and software packages for calculating genetic risk scores, with focus on studies of common, complex diseases, are described. The basic information needed as well as important considerations for constructing genetic risk scores, including specific requirements for phenotypic and genetic data, and limitations in their application is reviewed. © 2016 by John Wiley & Sons, Inc.
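The aggregation step described in this abstract can be illustrated with a minimal sketch, assuming each locus contributes its risk-allele count (0, 1 or 2) multiplied by a per-allele weight such as a log odds ratio. The locus names, weights and genotypes below are hypothetical and are not taken from the article or from any software package it describes.

```python
from typing import Mapping


def genetic_risk_score(dosages: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages over the loci listed in `weights`."""
    missing = set(weights) - set(dosages)
    if missing:
        # Refuse to guess a genotype that was not supplied.
        raise KeyError(f"no genotype for loci: {sorted(missing)}")
    return sum(weights[locus] * dosages[locus] for locus in weights)


# Hypothetical loci and per-allele weights, for illustration only.
weights = {"locus_a": 0.10, "locus_b": 0.25, "locus_c": 0.05}
person = {"locus_a": 2, "locus_b": 0, "locus_c": 1}

weighted_score = genetic_risk_score(person, weights)
# An unweighted score simply counts risk alleles across the same loci.
unweighted_score = sum(person[locus] for locus in weights)
```

In this toy example the weighted score is 2 × 0.10 + 0 × 0.25 + 1 × 0.05 = 0.25 and the unweighted allele count is 3.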
Journal Article | 9 | 16
17
|
Bowen JM, Haskell MJ, Miller GA, Mason CS, Bell DJ, Duthie CA. Early prediction of respiratory disease in preweaning dairy calves using feeding and activity behaviors. J Dairy Sci 2021; 104:12009-12018. [PMID: 34454762 DOI: 10.3168/jds.2021-20373] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/01/2021] [Accepted: 07/16/2021] [Indexed: 11/19/2022]
Abstract
Bovine respiratory disease (BRD) represents one of the major disease challenges affecting preweaning dairy-bred calves. Previous studies have shown that differences in feeding and activity behaviors exist between healthy and diseased calves affected by BRD. The aim of this study was to develop and assess the accuracy of models designed to predict BRD from feeding and activity behaviors. Feeding and activity behaviors were recorded for 100 male preweaning calves from ~8 to 42 d of age. Calves were group housed with ad libitum access to milk via automatic milk feeders, water, starter diet, and straw. Activity was monitored via a leg-mounted accelerometer. Health status of individual calves was monitored daily using an adapted version of the Wisconsin Scoring System to identify BRD. Three models were created to predict disease: (1) deviation from normal lying time based on moving averages (MA); (2) random forest (RF), a machine learning technique based on feeding and activity variables; and (3) a combination of RF and MA output. For the MA model, lying time was predicted based on behavior over previous days (3- and 7-d MA) and the expected value for the current day (based on calf age; measured using accelerometers). Data were not split into training and test data sets. Occasions when the actual lying time was >9% above the predicted lying time were classified as a deviation from normal and a disease alert was provided. Both feeding and activity behaviors were included within the RF model. Data were split into training (70%) and test (30%) data sets based on disease events. Events were classified as 2 d before, the day(s) of the disease event, and 2 d after the event. Accuracy of models was assessed using sensitivity, specificity, balanced accuracy, and Matthews correlation coefficient (MCC). If a positive disease prediction agreed with an actual disease event within a 3-d rolling window, it was classified as a true positive.
Stand-alone models (RF; MA) showed high specificity (0.95; 0.97), moderate sensitivity (0.35; 0.43), balanced accuracy (0.65; 0.64), and MCC (0.25; 0.29). Combining outputs increased accuracy (specificity = 0.95, sensitivity = 0.54, balanced accuracy = 0.75, MCC = 0.36). The work presented is the first to demonstrate the use of modeling data derived from precision livestock farming techniques that monitor feeding and activity behaviors for early detection of BRD in preweaning calves, offering a significant advance in health management of youngstock.
4 | 15
18
|
Grydeland H, Westlye LT, Walhovd KB, Fjell AM. Improved prediction of Alzheimer's disease with longitudinal white matter/gray matter contrast changes. Hum Brain Mapp 2012; 34:2775-85. [PMID: 22674625 DOI: 10.1002/hbm.22103] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 11/03/2011] [Revised: 02/04/2012] [Accepted: 03/19/2012] [Indexed: 11/05/2022] Open
Abstract
Brain morphometry measures derived from magnetic resonance imaging (MRI) are important biomarkers for Alzheimer's disease (AD). The objective of the present study was to test whether we could improve morphometry-based detection and prediction of disease state by use of white matter/gray matter (WM/GM) signal intensity contrast obtained from conventional MRI scans. We hypothesized that including WM/GM contrast change along with measures of atrophy in the entorhinal cortex and the hippocampi would yield better classification of AD patients, and more accurate prediction of early disease progression. T1-weighted MRI scans from two sessions approximately 2 years apart from 78 participants with AD (Clinical Dementia Rating (CDR) = 0.5-2) and 71 age-matched controls were used to calculate annual change rates. Results showed that WM/GM contrast decay was larger in AD compared with controls in the medial temporal lobes. For the discrimination between AD and controls, entorhinal WM/GM contrast decay contributed significantly when included together with decrease in entorhinal cortical thickness and hippocampal volume, and increased the area under the curve to 0.79 compared with 0.75 when using the two morphometric variables only. Independent effects of WM/GM contrast decay and improved classification were also observed for the CDR-based subgroups, including participants converting from either a non-AD status to very mild AD, or from very mild to mild AD. Thus, WM/GM contrast decay increased diagnostic accuracy beyond what was obtained by well-validated morphometric measures alone. The findings suggest that signal intensity properties constitute a sensitive biomarker for cerebral degeneration in AD.
Research Support, Non-U.S. Gov't | 13 | 15
19
|
Yang L, Bi ZW, Kou ZQ, Li XJ, Zhang M, Wang M, Zhang LY, Zhao ZT. Time-series analysis on human brucellosis during 2004-2013 in Shandong Province, China. Zoonoses Public Health 2014; 62:228-35. [PMID: 25043064 DOI: 10.1111/zph.12145] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 02/24/2014] [Indexed: 11/29/2022]
Abstract
Human brucellosis is a re-emerging bacterial anthropozoonotic disease that remains a public health concern in China, with a growing number of cases and more widespread natural foci. The purpose of this study was to produce a short-term forecast of the incidence of human brucellosis with a prediction model. We collected annual and monthly laboratory data on confirmed cases from January 2004 to December 2013 from the Shandong Diseases Reporting Information System (SDRIS). An autoregressive integrated moving average (ARIMA) model was fitted to the monthly human brucellosis incidence from 2004 to 2013, and monthly brucellosis incidences in 2014 were then forecast with the fitted model. The incidence of brucellosis increased from 2004 to 2013. For the ARIMA (0, 2, 1) model, selected by the goodness-of-fit test, the white-noise diagnostic check of the residuals gave χ² = 5.58, P = 0.35. The monthly incidences fitted by the ARIMA (0, 2, 1) model were closely consistent with the observed incidence from 2004 to 2013. The forecast incidences from January 2014 to December 2014 were, respectively, 0.101, 0.118, 0.143, 0.166, 0.160, 0.172, 0.169, 0.133, 0.122, 0.105, 0.103 and 0.079 per 100 000 population, with standard errors of 0.011-0.019 and a mean absolute percentage error (MAPE) of 58.79%.
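Two quantities in this abstract lend themselves to a short worked sketch. In ARIMA (p, d, q) notation, the order (0, 2, 1) means no autoregressive terms, two rounds of differencing, and one moving-average term; and MAPE is the mean of the absolute forecast errors expressed as a percentage of the observed values. The code below only illustrates the differencing step and the MAPE formula on made-up numbers; it does not fit an ARIMA model and does not use the study's data.

```python
def second_difference(series):
    """Difference a series twice, as implied by d = 2 in ARIMA(p, d, q)."""
    first = [b - a for a, b in zip(series, series[1:])]
    return [b - a for a, b in zip(first, first[1:])]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


# Made-up values, for illustration only.
differenced = second_difference([1, 3, 6, 10])   # first differences: 2, 3, 4
error_pct = mape([100, 200], [110, 180])         # each forecast is off by 10%
```

Here the twice-differenced series is [1, 1] and the MAPE is 10%. Because each error is divided by the observed value, MAPE can become large when observed incidences are small.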
Journal Article | 11 | 14
20
|
Time Series Analysis and Forecasting with Automated Machine Learning on a National ICD-10 Database. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17144979. [PMID: 32664331 PMCID: PMC7400312 DOI: 10.3390/ijerph17144979] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/31/2020] [Revised: 06/29/2020] [Accepted: 07/07/2020] [Indexed: 12/22/2022]
Abstract
The application of machine learning (ML) for generating insights and making predictions on new records continues to expand within the medical community. Despite this progress, the application of time series analysis has remained underexplored due to the complexity of the underlying techniques. In this study, we have deployed a novel ML approach, called automated time series (AutoTS) machine learning, to automate data processing and the application of a multitude of models to assess which best forecasts future values. This rapid experimentation enables the selection of the most accurate model for time series predictions. By using the nation-wide ICD-10 (International Classification of Diseases, Tenth Revision) dataset of hospitalized patients of Romania, we have generated time series datasets over the period of 2008–2018 and performed highly accurate AutoTS predictions for the ten deadliest diseases. Forecast results for the years 2019 and 2020 were generated on a NUTS 2 (Nomenclature of Territorial Units for Statistics) regional level. This is the first study to our knowledge to perform time series forecasting of multiple diseases at a regional level using automated time series machine learning on a national ICD-10 dataset. The deployment of AutoTS technology can help decision makers in implementing targeted national health policies more efficiently.
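The general idea of trying several forecasting models and keeping the most accurate one can be sketched in a few lines. This is a generic illustration under stated assumptions, not the AutoTS software used in the study: the three candidate forecasters, the holdout split and the error metric (mean absolute error) are all choices made here for the example, and the series is made up.

```python
def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Repeat the historical mean."""
    return [sum(train) / len(train)] * horizon


def drift(train, horizon):
    """Extend the straight line through the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (step + 1) for step in range(horizon)]


def mean_absolute_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def select_best(series, candidates, holdout):
    """Score every candidate on the last `holdout` points; lowest error wins."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mean_absolute_error(test, model(train, holdout))
              for name, model in candidates.items()}
    return min(scores, key=scores.get), scores


candidates = {"naive": naive, "mean": mean_forecast, "drift": drift}
# Made-up yearly counts with a steady upward trend.
best, scores = select_best([10, 12, 14, 16, 18, 20, 22, 24], candidates, holdout=2)
```

On this steadily rising toy series the drift model reproduces the held-out values exactly, so it is selected. An automated system would apply the same select-by-error logic across many more model families and preprocessing options.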
Journal Article | 5 | 14
21
|
Liu K, Huang S, Miao ZP, Chen B, Jiang T, Cai G, Jiang Z, Chen Y, Wang Z, Gu H, Chai C, Jiang J. Identifying Potential Norovirus Epidemics in China via Internet Surveillance. J Med Internet Res 2017; 19:e282. [PMID: 28790023 PMCID: PMC5566627 DOI: 10.2196/jmir.7855] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/12/2017] [Revised: 07/05/2017] [Accepted: 07/10/2017] [Indexed: 12/23/2022] Open
Abstract
Background: Norovirus is a common virus that causes acute gastroenteritis worldwide, but a monitoring system for norovirus is unavailable in China. Objective: We aimed to identify norovirus epidemics through Internet surveillance and construct an appropriate model to predict potential norovirus infections. Methods: The norovirus-related data of a selected outbreak in Jiaxing Municipality, Zhejiang Province of China, in 2014 were collected from immediate epidemiological investigation, and the Internet search volume, as indicated by the Baidu Index, was acquired from the Baidu search engine. All correlated search keywords in relation to norovirus were captured, screened, and composited to establish the composite Baidu Index at different time lags by Spearman rank correlation. The optimal model was chosen, and maps of predicted potential epidemics in Zhejiang Province were produced with ArcGIS software. Results: The combination of two vital keywords at a time lag of 1 day was ultimately identified as optimal (ρ=.924, P<.001). The exponential curve model was constructed to fit the trend of this epidemic, suggesting that a one-unit increase in the mean composite Baidu Index contributed to an increase of norovirus infections by 2.15 times during the outbreak. In addition to Jiaxing Municipality, the predicted model suggested that Hangzhou Municipality might have had some potential epidemics during the study period. Conclusions: Although there are limitations with early warning and unavoidable biases, Internet surveillance may still be useful for the monitoring of norovirus epidemics when a monitoring system is unavailable.
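Choosing a time lag by Spearman rank correlation, as described in the Methods, can be sketched as follows. The example assumes that the search index on day t is compared with case counts on day t + lag; the abstract does not state the direction of the lag, so that is an assumption. Both series are made up.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(index, cases, lag):
    """Correlate the index on day t with cases on day t + lag (assumed direction)."""
    if lag == 0:
        return spearman(index, cases)
    return spearman(index[:-lag], cases[lag:])


# Made-up series in which cases follow the search index by one day.
search_index = [1, 5, 2, 8, 3, 9]
cases = [0, 2, 10, 4, 16, 6]
rho_by_lag = {lag: lagged_spearman(search_index, cases, lag) for lag in (0, 1, 2)}
best_lag = max(rho_by_lag, key=rho_by_lag.get)
```

In this toy data the one-day lag gives a perfect rank correlation and is therefore selected, which mirrors the kind of comparison reported in the Results.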
Journal Article | 8 | 14
22
|
Nezu N, Usui Y, Saito A, Shimizu H, Asakage M, Yamakawa N, Tsubota K, Wakabayashi Y, Narimatsu A, Umazume K, Maruyama K, Sugimoto M, Kuroda M, Goto H. Machine Learning Approach for Intraocular Disease Prediction Based on Aqueous Humor Immune Mediator Profiles. Ophthalmology 2021; 128:1197-1208. [PMID: 33484732 DOI: 10.1016/j.ophtha.2021.01.019] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/02/2020] [Revised: 12/20/2020] [Accepted: 01/14/2021] [Indexed: 01/02/2023] Open
Abstract
PURPOSE Various immune mediators have crucial roles in the pathogenesis of intraocular diseases. Machine learning can be used to automatically select and weigh various predictors to develop models maximizing predictive power. However, these techniques have not yet been applied extensively in studies focused on intraocular diseases. We evaluated whether 5 machine learning algorithms applied to the data of immune-mediator levels in aqueous humor can predict the actual diagnoses of 17 selected intraocular diseases and identified which immune mediators drive the predictive power of a machine learning model. DESIGN Cross-sectional study. PARTICIPANTS Five hundred twelve eyes with diagnoses from among 17 intraocular diseases. METHODS Aqueous humor samples were collected, and the concentrations of 28 immune mediators were determined using a cytometric bead array. Each immune mediator was ranked according to its importance using 5 machine learning algorithms. Stratified k-fold cross-validation was used in evaluation of algorithms with the dataset divided into training and test datasets. MAIN OUTCOME MEASURES The algorithms were evaluated in terms of precision, recall, accuracy, F-score, area under the receiver operating characteristic curve, area under the precision-recall curve, and mean decrease in Gini index. RESULTS Among the 5 machine learning models, random forest (RF) yielded the highest classification accuracy in multiclass differentiation of 17 intraocular diseases. The RF prediction models for vitreoretinal lymphoma, acute retinal necrosis, endophthalmitis, rhegmatogenous retinal detachment, and primary open-angle glaucoma achieved the highest classification accuracy, precision, and recall. Random forest recognized vitreoretinal lymphoma, acute retinal necrosis, endophthalmitis, rhegmatogenous retinal detachment, and primary open-angle glaucoma with the top 5 F-scores. 
The 3 highest-ranking relevant immune mediators were interleukin (IL)-10, interferon-γ-inducible protein (IP)-10, and angiogenin for prediction of vitreoretinal lymphoma; monokine induced by interferon γ, interferon γ, and IP-10 for acute retinal necrosis; and IL-6, granulocyte colony-stimulating factor, and IL-8 for endophthalmitis. CONCLUSIONS Random forest algorithms based on 28 immune mediators in aqueous humor successfully predicted the diagnosis of vitreoretinal lymphoma, acute retinal necrosis, and endophthalmitis. Overall, the findings of the present study contribute to increased knowledge on new biomarkers that potentially can facilitate diagnosis of intraocular diseases in the future.
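The stratified k-fold cross-validation mentioned in the Methods keeps the proportion of each diagnosis roughly equal across folds, which matters when some of the 17 classes are rare. The sketch below shows only the fold assignment, under simplifying assumptions: samples are dealt to folds in their original order without shuffling, and the class labels and counts are placeholders rather than the study's data.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the members of each class to the folds in round-robin order.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Placeholder labels: three classes of unequal size.
labels = ["class_a"] * 6 + ["class_b"] * 3 + ["class_c"] * 9
folds = stratified_folds(labels, k=3)

splits = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    splits.append((train_idx, test_idx))

fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

Each of the three folds ends up with two samples of class_a, one of class_b and three of class_c, so every test fold has the same class mix as the full data set. In practice the samples would usually be shuffled before assignment.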
Journal Article | 4 | 14
23
|
Oudemans PV. Phytophthora Species Associated with Cranberry Root Rot and Surface Irrigation Water in New Jersey. PLANT DISEASE 1999; 83:251-258. [PMID: 30845503 DOI: 10.1094/pdis.1999.83.3.251] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
A lupine baiting technique was used to detect the presence of Phytophthora spp. in several streams, irrigation reservoirs, and drainage canals used in cranberry cultivation. P. cinnamomi was found to be widely distributed throughout the study area in the southern New Jersey Pinelands, and was present both upstream and downstream of agricultural activities. A second species, identified as P. megasperma, was more restricted in its distribution and was never isolated from a water system that did not also contain P. cinnamomi. In a survey of commercial cranberry production, 80% of the acreage represented (approximately 37% of total New Jersey production area) was exposed to one or both Phytophthora spp. through application of infested water from irrigation reservoirs. Based on the widespread distribution of P. cinnamomi, it is likely that this pathogen was introduced many years prior to its discovery on cranberry in the 1980s, which corresponded to the adoption of overhead irrigation in the crop. There were slight differences between the two species in seasonal occurrence. The highest levels of P. cinnamomi were found during the summer months (July to August) whereas P. megasperma was highest during the spring (April to May) and fall (September to October) months.
26 | 13
24
|
Abstract
As genome-wide association studies have continued to identify loci associated with complex traits, the implications of and necessity for proper use of these findings, including prediction of disease risk, have become apparent. Many complex diseases have numerous associated loci with detectable effects implicating risk for or protection from disease. A common contemporary approach to using this information for disease prediction is through the application of genetic risk scores. These scores estimate an individual's liability for a specific outcome by aggregating the effects of associated loci into a single measure as described in the previous version of this article. Although genetic risk scores have traditionally included variants that meet criteria for genome-wide significance, an extension known as the polygenic risk score has been developed to include the effects of more variants across the entire genome. Here, we describe common methods and software packages for calculating and interpreting polygenic risk scores. In this revised version of the article, we detail information that is needed to perform a polygenic risk score analysis, considerations for planning the analysis and interpreting results, as well as discussion of the limitations based on the choices made. We also provide simulated sample data and a walkthrough for four different polygenic risk score software. © 2021 Wiley Periodicals LLC.
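The extension from genome-wide significant loci to a broader set of variants can be illustrated by adding a p-value threshold to the weighted sum. This is a minimal sketch under stated assumptions, not one of the software packages the article walks through: the variant names, effect sizes, p-values and genotypes are hypothetical, variants without a genotype are skipped, and steps that dedicated tools add, such as handling correlated variants, are left out.

```python
def polygenic_risk_score(dosages, summary_stats, p_threshold):
    """Sum effect size x allele dosage over variants with p <= p_threshold.

    `summary_stats` maps variant name -> (effect size, p-value).
    Returns the score and the number of variants that contributed.
    """
    selected = [variant for variant, (_, p_value) in summary_stats.items()
                if p_value <= p_threshold and variant in dosages]
    score = sum(summary_stats[v][0] * dosages[v] for v in selected)
    return score, len(selected)


# Hypothetical association results and one person's genotypes.
summary_stats = {
    "variant_1": (0.20, 1e-9),   # genome-wide significant
    "variant_2": (0.05, 1e-3),   # suggestive only
    "variant_3": (-0.10, 0.2),   # protective direction, weak evidence
}
dosages = {"variant_1": 1, "variant_2": 2, "variant_3": 2}

strict_score, strict_n = polygenic_risk_score(dosages, summary_stats, 5e-8)
relaxed_score, relaxed_n = polygenic_risk_score(dosages, summary_stats, 0.01)
all_score, all_n = polygenic_risk_score(dosages, summary_stats, 1.0)
```

With the strict threshold only variant_1 contributes (score 0.20). Relaxing the threshold to 0.01 adds variant_2 (score 0.30), and including every variant brings in the negative effect of variant_3 (score 0.10). The choice of threshold is one of the analysis decisions that changes the result.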
Journal Article | 4 | 12
25
|
Soto M, Iranzo A, Lahoz S, Fernández M, Serradell M, Gaig C, Melón P, Martí M, Santamaría J, Camps J, Fernández‐Santiago R, Ezquerra M. Serum MicroRNAs Predict Isolated Rapid Eye Movement Sleep Behavior Disorder and Lewy Body Diseases. Mov Disord 2022; 37:2086-2098. [PMID: 35962561 PMCID: PMC9804841 DOI: 10.1002/mds.29171] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/24/2022] [Revised: 06/09/2022] [Accepted: 07/10/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Isolated rapid eye movement sleep behavior disorder (IRBD) is a well-established clinical risk factor for Lewy body diseases (LBDs), such as Parkinson's disease (PD) and dementia with Lewy bodies (DLB). OBJECTIVE To elucidate whether serum microRNA (miRNA) deregulation in IRBD can antedate the diagnosis of LBD by performing a longitudinal study in different progression stages of IRBD before and after LBD diagnosis and assessing the predictive performance of differentially expressed miRNAs by machine learning-based modeling. METHODS Using genome-wide miRNA analysis and real-time quantitative polymerase chain reaction validation, we assessed serum miRNA profiles from patients with IRBD stratified by dopamine transporter (DaT) single-photon emission computed tomography into DaT-negative IRBD (n = 17) and DaT-positive IRBD (n = 21), IRBD phenoconverted into LBD (n = 13), and controls (n = 20). Longitudinally, we followed up the IRBD cohort by studying three time point serum samples over 26 months. RESULTS We found sustained cross-sectional and longitudinal deregulation of 12 miRNAs across the RBD continuum, including DaT-negative IRBD, DaT-positive IRBD, and LBD phenoconverted IRBD (let-7c-5p, miR-19b-3p, miR-140, miR-22-3p, miR-221-3p, miR-24-3p, miR-25-3p, miR-29c-3p, miR-361-5p, miR-425-5p, miR-4505, and miR-451a) (false discovery rate P < 0.05). Age- and sex-adjusted predictive modeling based on the 12 differentially expressed miRNA biosignatures discriminated IRBD and PD or DLB from controls with an area under the curve of 98% (95% confidence interval: 89-99%). CONCLUSIONS Besides clinical diagnosis of IRBD or imaging markers such as DaT single-photon emission computed tomography, specific miRNA biosignatures alone hold promise as progression biomarkers for patients with IRBD for predicting PD and DLB clinical outcomes. 
Further miRNA studies in other PD at-risk populations, such as LRRK2 mutation asymptomatic carriers or hyposmic subjects, are warranted. © 2022 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
research-article | 3 | 11