1. Cai YQ, Gong DX, Tang LY, Cai Y, Li HJ, Jing TC, Gong M, Hu W, Zhang ZW, Zhang X, Zhang GW. Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions. J Med Internet Res 2024; 26:e47645. [PMID: 38869157 PMCID: PMC11316160 DOI: 10.2196/47645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 10/30/2023] [Accepted: 06/12/2024] [Indexed: 06/14/2024]
Abstract
In recent years, there has been explosive development in artificial intelligence (AI), which has been widely applied in the health care field. As a typical AI technology, machine learning models have emerged with great potential in predicting cardiovascular diseases by leveraging large amounts of medical data for training and optimization, which are expected to play a crucial role in reducing the incidence and mortality rates of cardiovascular diseases. Although the field has become a research hot spot, there are still many pitfalls that researchers need to pay close attention to. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the studied models, ultimately reducing the value of the research and affecting the prospects for clinical application. Therefore, identifying and avoiding these pitfalls is a crucial task before implementing the research. However, there is currently a lack of a comprehensive summary on this topic. This viewpoint aims to analyze the existing problems in terms of data quality, data set characteristics, model design, and statistical methods, as well as clinical implications, and provide possible solutions to these problems, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting using statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability, with the goal of offering reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.
Affiliation(s)
- Yu-Qing Cai
  - The First Hospital of China Medical University, Shenyang, China
- Da-Xin Gong
  - Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Li-Ying Tang
  - The First Hospital of China Medical University, Shenyang, China
- Yue Cai
  - The First Hospital of China Medical University, Shenyang, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co, Ltd, Shenyang, China
- Tian-Ci Jing
  - Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, China
- Zhen-Wei Zhang
  - China Rongtong Medical & Healthcare Co, Ltd, Chengdu, China
- Xingang Zhang
  - Department of Cardiology, The First Hospital of China Medical University, Shenyang, China
- Guang-Wei Zhang
  - Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China

2. Alfayyadh MM, Maksemous N, Sutherland HG, Lea RA, Griffiths LR. Unravelling the Genetic Landscape of Hemiplegic Migraine: Exploring Innovative Strategies and Emerging Approaches. Genes (Basel) 2024; 15:443. [PMID: 38674378 PMCID: PMC11049430 DOI: 10.3390/genes15040443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
Migraine is a severe, debilitating neurovascular disorder. Hemiplegic migraine (HM) is a rare and debilitating neurological condition with a strong genetic basis. Sequencing technologies have improved the diagnosis and our understanding of the molecular pathophysiology of HM. Linkage analysis and sequencing studies in HM families have identified pathogenic variants in ion channels and related genes, including CACNA1A, ATP1A2, and SCN1A, that cause HM. However, approximately 75% of HM patients are negative for these mutations, indicating there are other genes involved in disease causation. In this review, we explored our current understanding of the genetics of HM. The evidence presented herein summarises the current knowledge of the genetics of HM, which can be expanded further to explain the remaining heritability of this debilitating condition. Innovative bioinformatics and computational strategies to cover the entire genetic spectrum of HM are also discussed in this review.
Affiliation(s)
- Lyn R. Griffiths
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD 4059, Australia; (M.M.A.); (N.M.); (H.G.S.); (R.A.L.)

3. Cai Y, Cai YQ, Tang LY, Wang YH, Gong M, Jing TC, Li HJ, Li-Ling J, Hu W, Yin Z, Gong DX, Zhang GW. Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review. BMC Med 2024; 22:56. [PMID: 38317226 PMCID: PMC10845808 DOI: 10.1186/s12916-024-03273-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Accepted: 01/23/2024] [Indexed: 02/07/2024]
Abstract
BACKGROUND A comprehensive overview of artificial intelligence (AI) for cardiovascular disease (CVD) prediction and a screening tool of AI models (AI-Ms) for independent external validation are lacking. This systematic review aims to identify, describe, and appraise AI-Ms of CVD prediction in the general and special populations and develop a new independent validation score (IVS) for AI-Ms replicability evaluation. METHODS PubMed, Web of Science, Embase, and IEEE library were searched up to July 2021. Data extraction and analysis were performed for the populations, distribution, predictors, algorithms, etc. The risk of bias was evaluated with the prediction risk of bias assessment tool (PROBAST). Subsequently, we designed IVS for model replicability evaluation with five steps in five items, including transparency of algorithms, performance of models, feasibility of reproduction, risk of reproduction, and clinical implication, respectively. The review is registered in PROSPERO (No. CRD42021271789). RESULTS In 20,887 screened references, 79 articles (82.5% in 2017-2021) were included, which contained 114 datasets (67 in Europe and North America, but 0 in Africa). We identified 486 AI-Ms, of which the majority were in development (n = 380), but none of them had undergone independent external validation. A total of 66 idiographic algorithms were found; however, 36.4% were used only once and only 39.4% over three times. A large number of different predictors (range 5-52,000, median 21) and large-span sample size (range 80-3,660,000, median 4466) were observed. All models were at high risk of bias according to PROBAST, primarily due to the incorrect use of statistical methods. IVS analysis confirmed only 10 models as "recommended"; however, 281 and 187 were "not recommended" and "warning," respectively. 
CONCLUSION AI has led the digital revolution in the field of CVD prediction, but it is still at an early stage of development owing to defects in research design, reporting, and evaluation systems. The IVS we developed may contribute to independent external validation and the development of this field.
Affiliation(s)
- Yue Cai
  - China Medical University, Shenyang, 110122, China
- Yu-Qing Cai
  - China Medical University, Shenyang, 110122, China
- Li-Ying Tang
  - China Medical University, Shenyang, 110122, China
- Yi-Han Wang
  - China Medical University, Shenyang, 110122, China
- Mengchun Gong
  - Digital Health China Co. Ltd, Beijing, 100089, China
- Tian-Ci Jing
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co. Ltd., Shenyang, 110001, China
  - Enduring Medicine Smart Innovation Research Institute, Shenyang, 110001, China
- Jesse Li-Ling
  - Institute of Genetic Medicine, School of Life Science, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610065, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, 610017, China
- Zhihua Yin
  - Department of Epidemiology, School of Public Health, China Medical University, Shenyang, 110122, China
- Da-Xin Gong
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China
- Guang-Wei Zhang
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China

4. Li B, Feridooni T, Cuen-Ojeda C, Kishibe T, de Mestral C, Mamdani M, Al-Omran M. Machine learning in vascular surgery: a systematic review and critical appraisal. NPJ Digit Med 2022; 5:7. [PMID: 35046493 PMCID: PMC8770468 DOI: 10.1038/s41746-021-00552-y] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 06/20/2021] [Accepted: 12/13/2021] [Indexed: 12/18/2022]
Abstract
Machine learning (ML) is a rapidly advancing field with increasing utility in health care. We conducted a systematic review and critical appraisal of ML applications in vascular surgery. MEDLINE, Embase, and Cochrane CENTRAL were searched from inception to March 1, 2021. Study screening, data extraction, and quality assessment were performed by two independent reviewers, with a third author resolving discrepancies. All original studies reporting ML applications in vascular surgery were included. Publication trends, disease conditions, methodologies, and outcomes were summarized. Critical appraisal was conducted using the PROBAST risk-of-bias and TRIPOD reporting adherence tools. We included 212 studies from a pool of 2235 unique articles. ML techniques were used for diagnosis, prognosis, and image segmentation in carotid stenosis, aortic aneurysm/dissection, peripheral artery disease, diabetic foot ulcer, venous disease, and renal artery stenosis. The number of publications on ML in vascular surgery increased from 1 (1991-1996) to 118 (2016-2021). Most studies were retrospective and single center, with no randomized controlled trials. The median area under the receiver operating characteristic curve (AUROC) was 0.88 (range 0.61-1.00), with 79.5% [62/78] studies reporting AUROC ≥ 0.80. Out of 22 studies comparing ML techniques to existing prediction tools, clinicians, or traditional regression models, 20 performed better and 2 performed similarly. Overall, 94.8% (201/212) studies had high risk-of-bias and adherence to reporting standards was poor with a rate of 41.4%. Despite improvements over time, study quality and reporting remain inadequate. Future studies should consider standardized tools such as PROBAST and TRIPOD to improve study quality and clinical applicability.
Affiliation(s)
- Ben Li
  - Department of Surgery, University of Toronto, 149 College St, Toronto, ON, M5T 1P5, Canada
  - Division of Vascular Surgery, St. Michael's Hospital, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
  - Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM), University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Tiam Feridooni
  - Department of Surgery, University of Toronto, 149 College St, Toronto, ON, M5T 1P5, Canada
  - Division of Vascular Surgery, St. Michael's Hospital, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Cesar Cuen-Ojeda
  - Department of Surgery, University of Toronto, 149 College St, Toronto, ON, M5T 1P5, Canada
  - Division of Vascular Surgery, St. Michael's Hospital, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
- Teruko Kishibe
  - Health Sciences Library, St. Michael's Hospital, Unity Health Toronto, 209 Victoria St, Toronto, ON, M5B 1T8, Canada
  - Li Ka Shing Knowledge Institute, St. Michael's Hospital, Unity Health Toronto, 209 Victoria St, Toronto, ON, M5B 1T8, Canada
- Charles de Mestral
  - Department of Surgery, University of Toronto, 149 College St, Toronto, ON, M5T 1P5, Canada
  - Division of Vascular Surgery, St. Michael's Hospital, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
  - Li Ka Shing Knowledge Institute, St. Michael's Hospital, Unity Health Toronto, 209 Victoria St, Toronto, ON, M5B 1T8, Canada
  - Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, 155 College St, Toronto, ON, M5T 3M7, Canada
- Muhammad Mamdani
  - Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM), University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
  - Li Ka Shing Knowledge Institute, St. Michael's Hospital, Unity Health Toronto, 209 Victoria St, Toronto, ON, M5B 1T8, Canada
  - Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, 155 College St, Toronto, ON, M5T 3M7, Canada
  - Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St, Toronto, ON, M5S 3M2, Canada
- Mohammed Al-Omran
  - Department of Surgery, University of Toronto, 149 College St, Toronto, ON, M5T 1P5, Canada
  - Division of Vascular Surgery, St. Michael's Hospital, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
  - Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM), University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
  - Li Ka Shing Knowledge Institute, St. Michael's Hospital, Unity Health Toronto, 209 Victoria St, Toronto, ON, M5B 1T8, Canada
  - Institute of Medical Science, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
  - Department of Surgery, King Saud University, ZIP 4545, Riyadh, 11451, Kingdom of Saudi Arabia

5. Fan J, Chen M, Luo J, Yang S, Shi J, Yao Q, Zhang X, Du S, Qu H, Cheng Y, Ma S, Zhang M, Xu X, Wang Q, Zhan S. The prediction of asymptomatic carotid atherosclerosis with electronic health records: a comparative study of six machine learning models. BMC Med Inform Decis Mak 2021; 21:115. [PMID: 33820531 PMCID: PMC8020544 DOI: 10.1186/s12911-021-01480-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/14/2020] [Accepted: 03/26/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Screening carotid B-mode ultrasonography is a frequently used method to detect subjects with carotid atherosclerosis (CAS). Due to the asymptomatic progression of most CAS patients, early identification is challenging for clinicians, and it may trigger ischemic stroke. Recently, machine learning has shown a strong ability to classify data and a potential for prediction in the medical field. The combined use of machine learning and the electronic health records of patients could provide clinicians with a more convenient and precise method to identify asymptomatic CAS. METHODS Retrospective cohort study using routine clinical data of medical check-up subjects from April 19, 2010 to November 15, 2019. Six machine learning models (logistic regression [LR], random forest [RF], decision tree [DT], eXtreme Gradient Boosting [XGB], Gaussian Naïve Bayes [GNB], and K-Nearest Neighbour [KNN]) were used to predict asymptomatic CAS and compared their predictability in terms of the area under the receiver operating characteristic curve (AUCROC), accuracy (ACC), and F1 score (F1). RESULTS Of the 18,441 subjects, 6553 were diagnosed with asymptomatic CAS. Compared to DT (AUCROC 0.628, ACC 65.4%, and F1 52.5%), the other five models improved prediction: KNN + 7.6% (0.704, 68.8%, and 50.9%, respectively), GNB + 12.5% (0.753, 67.0%, and 46.8%, respectively), XGB + 16.0% (0.788, 73.4%, and 55.7%, respectively), RF + 16.6% (0.794, 74.5%, and 56.8%, respectively) and LR + 18.1% (0.809, 74.7%, and 59.9%, respectively). The highest achieving model, LR predicted 1045/1966 cases (sensitivity 53.2%) and 3088/3566 non-cases (specificity 86.6%). A tenfold cross-validation scheme further verified the predictive ability of the LR. CONCLUSIONS Among machine learning models, LR showed optimal performance in predicting asymptomatic CAS. 
Our findings set the stage for an early automatic alarming system, allowing a more precise allocation of CAS prevention measures to the individuals most likely to benefit.
Affiliation(s)
- Jiaxin Fan
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Mengying Chen
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Jian Luo
  - Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Shusen Yang
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Jinming Shi
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Qingling Yao
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Xiaodong Zhang
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Shuang Du
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Huiyang Qu
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Yuxuan Cheng
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Shuyin Ma
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Meijuan Zhang
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Xi Xu
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Qian Wang
  - Department of Health Management, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Shuqin Zhan
  - Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China

6. Akadam-Teker AB, Teker E, Daglar-Aday A, Pekkoc-Uyanik KC, Aslan EI, Kucukhuseyin Ö, Ozkara G, Yılmaz-Aydoğan H. Interactive effects of interferon-gamma functional single nucleotid polymorphism (+874 T/A) with cardiovascular risk factors in coronary heart disease and early myocardial infarction risk. Mol Biol Rep 2020; 47:8397-8405. [PMID: 33104992 DOI: 10.1007/s11033-020-05877-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2020] [Accepted: 09/29/2020] [Indexed: 12/12/2022]
Abstract
Atherosclerosis is an inflammatory disease characterized by extensive lipid accumulation in the artery wall. Throughout the atherosclerotic process, interferon-gamma (IFN-γ), which is an important pro-inflammatory cytokine, plays a central role in atherosclerotic plaque instability and the occurrence of myocardial infarction (MI). In this study, we aimed to investigate the relationship between IFN-γ +874 T/A (rs2430561) polymorphism and coronary heart disease (CHD) as well as its effects on MI and CHD. Three hundred and ninety patients with CHD (229 with MI, 161 without MI) and 233 healthy controls were screened by the amplification refractory mutation system (ARMS) PCR method for IFN-γ +874 T/A polymorphism. For MI risk, early adult age was an important risk factor, and the risk was increased with IFN-γ +874 T/A polymorphism. The IFN-γ T allele was significantly increased in the CHD patients with age ≤ 45 (p = 0.048) and in patients with a history of MI (p = 0.007). As IFN-γ is an inflammatory cytokine with an emerging role in the atherosclerotic process, it was suggested that inhibition of IFN-γ activity could be a therapeutic strategy to stabilize human atherosclerotic plaque. Our findings support the association between MI risk and IFN-γ +874 T/A polymorphism in the Turkish population, particularly by increasing the level of IFN-γ in young patients, thereby causing rupture of vulnerable plaques in atherosclerotic lesions. Identification of the IFN-γ +874 T/A gene variants as risk factors for early CHD and MI development may be a practical biomarker to guide the MI risk process and determine the ideal therapeutic approach.
Affiliation(s)
- A Basak Akadam-Teker
  - Department of Medical Genetic, Giresun University Medical Faculty, Giresun, Turkey
- Erhan Teker
  - Department of Cardiology, Giresun A. İlhan Özdemir Education Research Hospital, Giresun, Turkey
- Aynur Daglar-Aday
  - Department of Molecular Medicine, Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Kubra Cigdem Pekkoc-Uyanik
  - Department of Molecular Medicine, Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
  - Department of Medical Biology, Faculty of Medicine, Haliç University, Istanbul, Turkey
- Ezgi Irmak Aslan
  - Department of Molecular Medicine, Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Özlem Kucukhuseyin
  - Department of Molecular Medicine, Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Gulcin Ozkara
  - Department of Molecular Medicine, Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey
- Hulya Yılmaz-Aydoğan
  - Department of Molecular Medicine, Aziz Sancar Institute of Experimental Medicine, Istanbul University, Istanbul, Turkey

7. Sevakula RK, Au-Yeung WTM, Singh JP, Heist EK, Isselbacher EM, Armoundas AA. State-of-the-Art Machine Learning Techniques Aiming to Improve Patient Outcomes Pertaining to the Cardiovascular System. J Am Heart Assoc 2020; 9:e013924. [PMID: 32067584 PMCID: PMC7070211 DOI: 10.1161/jaha.119.013924] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Indexed: 02/07/2023]
Affiliation(s)
- Jagmeet P Singh
  - The Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA
- E Kevin Heist
  - The Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA
- Antonis A Armoundas
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA

8. de Marvao A, Dawes TJW, O'Regan DP. Artificial Intelligence for Cardiac Imaging-Genetics Research. Front Cardiovasc Med 2020; 6:195. [PMID: 32039240 PMCID: PMC6985036 DOI: 10.3389/fcvm.2019.00195] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/30/2019] [Accepted: 12/27/2019] [Indexed: 12/18/2022]
Abstract
Cardiovascular conditions remain the leading cause of mortality and morbidity worldwide, with genotype being a significant influence on disease risk. Cardiac imaging-genetics aims to identify and characterize the genetic variants that influence functional, physiological, and anatomical phenotypes derived from cardiovascular imaging. High-throughput DNA sequencing and genotyping have greatly accelerated genetic discovery, making variant interpretation one of the key challenges in contemporary clinical genetics. Heterogeneous, low-fidelity phenotyping and difficulties integrating and then analyzing large-scale genetic, imaging and clinical datasets using traditional statistical approaches have impeded progress. Artificial intelligence (AI) methods, such as deep learning, are particularly suited to tackle the challenges of scalability and high dimensionality of data and show promise in the field of cardiac imaging-genetics. Here we review the current state of AI as applied to imaging-genetics research and discuss outstanding methodological challenges, as the field moves from pilot studies to mainstream applications, from one dimensional global descriptors to high-resolution models of whole-organ shape and function, from univariate to multivariate analysis and from candidate gene to genome-wide approaches. Finally, we consider the future directions and prospects of AI imaging-genetics for ultimately helping understand the genetic and environmental underpinnings of cardiovascular health and disease.
Affiliation(s)
- Declan P. O'Regan
  - MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom

9. Ho DSW, Schierding W, Wake M, Saffery R, O’Sullivan J. Machine Learning SNP Based Prediction for Precision Medicine. Front Genet 2019; 10:267. [PMID: 30972108 PMCID: PMC6445847 DOI: 10.3389/fgene.2019.00267] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Received: 10/15/2018] [Accepted: 03/11/2019] [Indexed: 12/17/2022]
Abstract
In the past decade, precision genomics based medicine has emerged to provide tailored and effective healthcare for patients depending upon their genetic features. Genome Wide Association Studies have also identified population based risk genetic variants for common and complex diseases. In order to meet the full promise of precision medicine, research is attempting to leverage our increasing genomic understanding and further develop personalized medical healthcare through ever more accurate disease risk prediction models. Polygenic risk scoring and machine learning are two primary approaches for disease risk prediction. Despite recent improvements, the results of polygenic risk scoring remain limited due to the approaches that are currently used. By contrast, machine learning algorithms have increased predictive abilities for complex disease risk. This increase in predictive abilities results from the ability of machine learning algorithms to handle multi-dimensional data. Here, we provide an overview of polygenic risk scoring and machine learning in complex disease risk prediction. We highlight recent machine learning application developments and describe how machine learning approaches can lead to improved complex disease prediction, which will help to incorporate genetic features into future personalized healthcare. Finally, we discuss how the future application of machine learning prediction models might help manage complex disease by providing tissue-specific targets for customized, preventive interventions.
Affiliation(s)
- Melissa Wake
  - Murdoch Children Research Institute, Melbourne, VIC, Australia
- Richard Saffery
  - Murdoch Children Research Institute, Melbourne, VIC, Australia

10. Nejati P, Naeimipour S, Salehi A, Shahbazi M. Association of tumor necrosis factor-alpha gene promoter polymorphism and its mRNA expression level in coronary artery disease. Meta Gene 2018. [DOI: 10.1016/j.mgene.2018.08.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/06/2023]

11. Genomic prediction of relapse in recipients of allogeneic haematopoietic stem cell transplantation. Leukemia 2018; 33:240-248. [PMID: 30089915 PMCID: PMC6326954 DOI: 10.1038/s41375-018-0229-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/15/2017] [Revised: 06/21/2018] [Accepted: 07/17/2018] [Indexed: 02/06/2023]
Abstract
Allogeneic haematopoietic stem cell transplantation currently represents the primary potentially curative treatment for cancers of the blood and bone marrow. While relapse occurs in approximately 30% of patients, few risk-modifying genetic variants have been identified. The present study evaluates the predictive potential of patient genetics on relapse risk in a genome-wide manner. We studied 151 graft recipients with HLA-matched sibling donors by sequencing the whole-exome, active immunoregulatory regions, and the full MHC region. To assess the predictive capability and contributions of SNPs and INDELs, we employed machine learning and a feature selection approach in a cross-validation framework to discover the most informative variants while controlling against overfitting. Our results show that germline genetic polymorphisms in patients entail a significant contribution to relapse risk, as judged by the predictive performance of the model (AUC = 0.72 [95% CI: 0.63-0.81]). Furthermore, the top contributing variants were predictive in two independent replication cohorts (n = 258 and n = 125) from the same population. The results can help elucidate relapse mechanisms and suggest novel therapeutic targets. A computational genomic model could provide a step toward individualized prognostic risk assessment, particularly when accompanied by other data modalities.

12. Classical rather than genetic risk factors account for high cardiovascular disease prevalence in Lithuania: A cross-sectional population study. Adv Med Sci 2017; 62:121-128. [PMID: 28242483 DOI: 10.1016/j.advms.2016.08.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/27/2015] [Revised: 08/24/2016] [Accepted: 08/25/2016] [Indexed: 10/20/2022]
Abstract
PURPOSE Cardiovascular disease (CVD) mortality accounts for 54% of all deaths in Lithuania, making it the highest among all of the European Union countries. We evaluated the prevalence of several CVD risk factors, including lifestyle, blood biochemistry and genetic predisposition to determine the reasons behind significantly increased CVD prevalence in Lithuania. MATERIALS AND METHODS In total, 435 volunteers of Lithuanian ethnicity and stable geographic settlement for 3 generations had their anthropometric, biochemical and behavioural risk factors measured. A randomly selected sample of 166 volunteers was genotyped for 60 CVD risk alleles. The prevalence of risk alleles and cumulative CVD genetic risk score were compared with a population of North-West European origin (CEU) using data from the phase 3 HapMap project. RESULTS CVD was present in 33.8% of study volunteers, 84% of participants consumed alcohol, 21% were current smokers and only 30% of participants engaged in higher levels of physical activity. Also, the average BMI (males 28.3±4.3 kg/m², females 27.3±5.0 kg/m²), total cholesterol (males 6.1±1.2 mmol/L, females 6.2±1.0 mmol/L) and LDL-cholesterol (males 4.1±1.1 mmol/L, females 4.1±1.0 mmol/L) were above the normal values. The cumulative genetic susceptibility to develop CVD in Lithuanians was only 1.4% higher than in the CEU population. CONCLUSIONS High BMI and poor population plasma lipid profile are the major contributing factors to high CVD mortality and morbidity in Lithuania. Smoking, alcohol consumption and preliminary genetic predisposition results do not explain the difference in CVD mortality between the Lithuanian and wider European populations. CVD prevention programmes in Lithuania should primarily focus on weight loss and improving blood lipid control.
|
13
|
Esperança JCP, Miranda WRR, Netto JB, Lima FS, Baumworcel L, Chimelli L, Silva R, Ürményi TP, Cabello PH, Rondinelli E, Faffe DS. Polymorphisms in IL-10 and INF-γ genes are associated with early atherosclerosis in coronary but not in carotid arteries: A study of 122 autopsy cases of young adults. BBA CLINICAL 2015; 3:214-20. [PMID: 26674973 PMCID: PMC4661558 DOI: 10.1016/j.bbacli.2015.02.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/24/2014] [Revised: 02/10/2015] [Accepted: 02/24/2015] [Indexed: 12/19/2022]
Abstract
Atherosclerosis is a complex disease, involving both genetic and environmental factors. However, the influence of genetic variations on its early development remains unclear. This study examined the association of 12 different polymorphisms with atherosclerosis severity in anterior descending coronary (DA, n = 103) and carotid arteries (CA, n = 66) of autopsied young adults (< 30 years old). Histological sections (H-E) were classified according to the American Heart Association. Polymorphisms in ACE, TNF-α (− 308G/A and − 238 G/A), IFN-γ (+ 874 A/T), MMP-9 (− 1562 C/T), IL-10 (− 1082 A/G and − 819 C/T), NOS3 (894 G/T), ApoA1 (rs964184), ApoE (E2E3E4 isoforms), and TGF-β (codons 25 and 10) genes were genotyped by gel electrophoresis or automatic DNA sequencing. Firearm projectile or car accident was the main cause of death, and no information about classical risk factors was available. Histological analysis showed high prevalence of type III atherosclerotic lesions in both DA (69%) and CA (39%) arteries, while severe type IV and V lesions were observed in 14% (DA) and 33% (CA). Allele frequencies and genotype distributions were determined. Among the polymorphisms studied, IFN-γ and IL-10 (− 1082 A/G) were related to atherosclerosis severity in DA artery. No association between genotypes and lesion severity was found in CA. In conclusion, we observed that the high prevalence of early atherosclerosis in young adults is associated with IFN-γ (p < 0.001) and IL-10 (p = 0.013) genotypes. This association is blood vessel dependent. Our findings suggest that the vascular system presents site specialization, and specific genetic variations may provide future biomarkers for early disease identification. Twelve SNPs were associated with atherosclerosis severity in autopsied young adults. We found high prevalence of type III lesions in coronary and carotid arteries. Even severe lesions (types IV and V) were found in DA (14%) and CA (33%) arteries. 
Lesion severity was associated with IL-10 and IFN-γ genotype. The association was observed only in coronary, but not in carotid artery.
Affiliation(s)
- José Carlos P Esperança
- Departamento de Patologia, Hospital Universitário Clementino Fraga Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - William R R Miranda
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - José B Netto
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Fabiane S Lima
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Leonardo Baumworcel
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Leila Chimelli
- Departamento de Patologia, Hospital Universitário Clementino Fraga Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Rosane Silva
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Turán P Ürményi
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Pedro H Cabello
- Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil ; Laboratório de Genética, Escola de Ciências da Saúde, Universidade do Grande Rio, Rio de Janeiro, Brazil
| | - Edson Rondinelli
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil ; Departamento de Clínica Médica, Faculdade de Medicina, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Débora S Faffe
- Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
|
14
|
Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014; 10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023] Open
Affiliation(s)
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Samuli Ripatti
- Hjelt Institute, University of Helsinki, Helsinki, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
| | - Tero Aittokallio
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
15
|
Scientific reporting is suboptimal for aspects that characterize genetic risk prediction studies: a review of published articles based on the Genetic RIsk Prediction Studies statement. J Clin Epidemiol 2014; 67:487-99. [DOI: 10.1016/j.jclinepi.2013.10.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/13/2013] [Revised: 10/03/2013] [Accepted: 10/09/2013] [Indexed: 12/29/2022]
|
16
|
Bruzzese V, Marrese C, Zullo A, Hassan C, Ridola L, Izzo A, Riccioni C. Carotid artery intima-media thickness in patients with autoimmune connective tissue diseases: a case-control study. Intern Emerg Med 2013; 8:713-6. [PMID: 22033794 DOI: 10.1007/s11739-011-0713-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/14/2011] [Accepted: 10/13/2011] [Indexed: 11/30/2022]
Abstract
Patients with autoimmune rheumatic disorders have an increased incidence of cardiovascular (CV) events and mortality. Despite this being related to a high prevalence of the traditional CV risk factors, systemic inflammation has been postulated to be an independent CV risk factor, particularly in patients with rheumatoid arthritis (RA). However, data are still controversial. We designed a case-control study, in which patients with autoimmune rheumatic disorders were matched with age-, sex-matched controls. Prevalence of early atherosclerosis was assessed by carotid artery intima-media thickness (IMT) measurement. IMT values were considered normal (IMT ≤ 0.9 mm) or abnormal (IMT > 0.9). Multivariate analysis was performed to identify predictors of pathological IMT. Overall, 152 patients and 140 matched controls were enrolled. Prevalence of >0.9 mm IMT values did not significantly differ between patients with autoimmune rheumatic disorders and controls (61 vs. 69%, p = 0.1). In detail, a similar IMT distribution between the 69 RA patients and controls was observed. Cases with a CV risk factor showed a higher prevalence of pathological IMT as compared to those without any risk factor, both in patients (77.1 vs. 38.6%; p < 0.0001) and controls (84.6 vs. 25%; p < 0.0001). At multivariate analysis, age and presence of CV risk factors were found to be independent predictors of >0.9 mm IMT, while RA as well as any other considered rheumatic disease were not. Our data found a similar prevalence of preclinical arterial wall atherosclerotic damage in patients with autoimmune rheumatic diseases and matched controls. Presence of traditional CV risk factors and patient age remain the main factors involved in preclinical atherosclerosis in patients with autoimmune rheumatic disorders, including RA.
Affiliation(s)
- Vincenzo Bruzzese
- Internal Medicine and Rheumatology, Ospedale Nuovo Regina Margherita, Via E. Morosini, 30, 00153, Rome, Italy
|
17
|
Kheradmand M, Niimura H, Kuwabara K, Nakahata N, Nakamura A, Ogawa S, Mantjoro EM, Shimatani K, Nerome Y, Owaki T, Kusano K, Takezaki T. Association of inflammatory gene polymorphisms and conventional risk factors with arterial stiffness by age. J Epidemiol 2013; 23:457-65. [PMID: 24077340 PMCID: PMC3834284 DOI: 10.2188/jea.je20130054] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/11/2023] Open
Abstract
Background Inflammatory gene polymorphisms are potentially associated with atherosclerosis risk, but their age-related effects are unclear. To investigate the age-related effects of inflammatory gene polymorphisms on arterial stiffness, we conducted cross-sectional and 5-year follow-up studies using the cardio-ankle vascular index (CAVI) as a surrogate marker of arterial stiffness. Methods We recruited 1850 adults aged 34 to 69 years from the Japanese general population. Inflammatory gene polymorphisms were selected from NF-kB1, CD14, IL-6, IL-10, MCP-1, ICAM-1, and TNF-α. Associations of CAVI with genetic and conventional risk factors were estimated by sex and age group (34–49, 50–59, and 60–69 years) using a general linear model. The association with 5-year change in CAVI was examined longitudinally. Results Glucose intolerance was associated with high CAVI among women in all age groups, while hypertension was associated with high CAVI among participants in all age groups, except younger women. Mean CAVI for the CD14 CC genotype was lower than those for the TT and CT genotypes (P for trend = 0.005), while the CD14 polymorphism was associated with CAVI only among men aged 34 to 49 years (P = 0.006). No association of the other 6 polymorphisms with CAVI was observed. No association with 5-year change in CAVI was apparent. Conclusions Inflammatory gene polymorphisms were not associated with arterial stiffness. To confirm these results, further large-scale prospective studies are warranted.
Affiliation(s)
- Motahare Kheradmand
- Department of International Islands and Community Medicine, Kagoshima University Graduate School of Medical and Dental Sciences
|
18
|
Abstract
Background: Prediction of the optimal habitat conditions for a given bacterium, based on genome sequence alone would be of value for scientific as well as industrial purposes. One example of such a habitat adaptation is the requirement for oxygen. In spite of good genome data availability, there have been only a few prediction attempts of bacterial oxygen requirements, using genome sequences. Here, we describe a method for distinguishing aerobic, anaerobic and facultative anaerobic bacteria, based on genome sequence-derived input, using naive Bayesian inference. In contrast, other studies found in literature only demonstrate the ability to distinguish two classes at a time. Results: The results shown in the present study are as good as or better than comparable methods previously described in the scientific literature, with an arguably simpler method, when results are directly compared. This method further compares the performance of a single-step naive Bayesian prediction of the three included classifications, compared to a simple Bayesian network with two steps. A two-step network, distinguishing first respiring from non-respiring organisms, followed by the distinction of aerobe and facultative anaerobe organisms within the respiring group, is found to perform best. Conclusions: A simple naive Bayesian network based on the presence or absence of specific protein domains within a genome is an effective and easy way to predict bacterial habitat preferences, such as oxygen requirement.
Affiliation(s)
- Dan B Jensen
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
| | - David W Ussery
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark; Comparative Genomics Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
|
19
|
Multidimensional prognostic risk assessment identifies association between IL12B variation and surgery in Crohn's disease. Inflamm Bowel Dis 2013; 19:1662-70. [PMID: 23665963 PMCID: PMC3874388 DOI: 10.1097/mib.0b013e318281f275] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]
Abstract
BACKGROUND The ability to identify patients with Crohn's disease (CD) at highest risk of surgery would be invaluable in guiding therapy. Genome-wide association studies have identified multiple IBD loci with unknown phenotypic consequences. The aims of this study were to: (1) identify associations between known and novel CD loci with early resective CD surgery and (2) develop the best predictive model for time to surgery using a combination of phenotypic, serologic, and genetic variables. METHODS Genotyping was performed on 1,115 subjects using Illumina-based genome-wide technology. Univariate and multivariate analyses tested genetic associations with need for surgery within 5 years. Analyses were performed by testing known CD loci (n = 71) and by performing a genome-wide association study. Time to surgery was analyzed using Cox regression modeling. Clinical and serologic variables were included along with genotype to build predictive models for time to surgery. RESULTS Surgery occurred within 5 years in 239 subjects at a median time of 12 months. Three CD susceptibility loci were independently associated with surgery within 5 years (IL12B, IL23R, and C11orf30). Genome-wide association identified novel putative loci associated with early surgery: 7q21 (CACNA2D1) and 9q34 (RXRA, COL5A1). The most predictive models of time to surgery included genetic and clinical risk factors. More than a 20% difference in frequency of progression to surgery was seen between the lowest and highest risk groups. CONCLUSIONS Progression to surgery is faster in patients with CD with both genetic and clinical risk factors. IL12B is independently associated with need and time to early surgery in CD patients and justifies the investigation of novel and existing therapies that affect this pathway.
|
20
|
Wineinger NE, Harper A, Libiger O, Srinivasan SR, Chen W, Berenson GS, Schork NJ. Genomic risk models improve prediction of longitudinal lipid levels in children and young adults. Front Genet 2013; 4:86. [PMID: 23734161 PMCID: PMC3659298 DOI: 10.3389/fgene.2013.00086] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/16/2013] [Accepted: 04/25/2013] [Indexed: 12/31/2022] Open
Abstract
In clinical medicine, lipids are commonly measured biomarkers used to assess an individual's risk for cardiovascular disease, heart attack, and stroke. Accurately predicting longitudinal lipid levels based on genomic information can inform therapeutic practices and decrease cardiovascular risk by identifying high-risk patients prior to onset. Using genotyped and imputed genetic data from 523 unrelated Caucasian Americans from the Bogalusa Heart Study, surveyed on 4,026 occasions from 4 to 48 years of age, we generated various lipid genomic risk models based on previously reported markers. We observed a significant improvement in prediction over non-genetic risk models in high density lipoprotein cholesterol (increase in the squared correlation between observed and predicted values, ΔR² = 0.032), low density lipoprotein cholesterol (ΔR² = 0.053), total cholesterol (ΔR² = 0.043), and triglycerides (ΔR² = 0.031). Many of our approaches are based on an n-fold cross-validation procedure and are, by design, adaptable to a clinical environment.
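The improvement metric quoted in this abstract, the increase in squared correlation between observed and predicted values, can be illustrated with a minimal sketch. The function names and the toy numbers below are hypothetical and are not taken from the study; the sketch only shows how such a difference could be computed for one baseline model and one model with added genomic predictors.

```python
from statistics import mean


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def delta_r2(observed, pred_baseline, pred_genomic):
    """Gain in squared correlation of the genomic model over the baseline."""
    return (squared_correlation(observed, pred_genomic)
            - squared_correlation(observed, pred_baseline))


# Hypothetical toy values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
pred_baseline = [1.0, 3.0, 2.0, 4.0]   # e.g. a non-genetic model
pred_genomic = [1.0, 2.0, 3.0, 4.0]    # e.g. the same model plus genetic markers

gain = delta_r2(observed, pred_baseline, pred_genomic)
```

In practice the predictions would come from held-out folds of a cross-validation procedure, so that the gain is not inflated by fitting and evaluating on the same samples.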
Affiliation(s)
| | - Andrew Harper
- Newcastle University, Newcastle upon Tyne, Tyne and Wear, UK
| | - Ondrej Libiger
- Scripps Translational Science Institute, La Jolla, CA, USA
- Scripps Research Institute, La Jolla, CA, USA
| | | | - Wei Chen
- Center for Cardiovascular Health, Tulane University, New Orleans, LA, USA
| | | | - Nicholas J. Schork
- Scripps Translational Science Institute, La Jolla, CA, USA
- Scripps Research Institute, La Jolla, CA, USA
|
21
|
Okser S, Pahikkala T, Aittokallio T. Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 2013; 6:5. [PMID: 23448398 PMCID: PMC3606427 DOI: 10.1186/1756-0381-6-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 10/05/2012] [Accepted: 02/11/2013] [Indexed: 12/31/2022] Open
Abstract
A central challenge in systems biology and medical genetics is to understand how interactions among genetic loci contribute to complex phenotypic traits and human diseases. While most studies have so far relied on statistical modeling and association testing procedures, machine learning and predictive modeling approaches are increasingly being applied to mining genotype-phenotype relationships, also among those associations that do not necessarily meet statistical significance at the level of individual variants, yet still contributing to the combined predictive power at the level of variant panels. Network-based analysis of genetic variants and their interaction partners is another emerging trend by which to explore how sub-network level features contribute to complex disease processes and related phenotypes. In this review, we describe the basic concepts and algorithms behind machine learning-based genetic feature selection approaches, their potential benefits and limitations in genome-wide setting, and how physical or genetic interaction networks could be used as a priori information for providing improved predictive power and mechanistic insights into the disease networks. These developments are geared toward explaining a part of the missing heritability, and when combined with individual genomic profiling, such systems medicine approaches may also provide a principled means for tailoring personalized treatment strategies in the future.
|
22
|
Abstract
MOTIVATION Although several studies have used Bayesian classifiers for risk prediction using genome-wide single nucleotide polymorphism (SNP) datasets, no software can efficiently perform these analyses on massive genetic datasets and can accommodate multiple traits. RESULTS We describe the program PleioGRiP that performs a genome-wide Bayesian model search to identify SNPs associated with a discrete phenotype and uses SNPs ranked by Bayes factor to produce nested Bayesian classifiers. These classifiers can be used for genetic risk prediction, either selecting the classifier with optimal number of features or using an ensemble of classifiers. In addition, PleioGRiP implements an extension to the Bayesian search and classification and can search for pleiotropic relationships in which SNPs are simultaneously associated with two or more distinct phenotypes. These relationships can be used to generate connected Bayesian classifiers to predict the phenotype of interest either using genetic data alone or in combination with the secondary phenotype(s). AVAILABILITY PleioGRiP is implemented in Java, and it is available from http://hdl.handle.net/2144/4367. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stephen W Hartley
- National Institutes of Health/National Human Genome Research Institute, 5625 Fishers Lane, Rockville, MD 20850, USA.
|
23
|
Juonala M, Viikari JSA, Raitakari OT. Main findings from the prospective Cardiovascular Risk in Young Finns Study. Curr Opin Lipidol 2013; 24:57-64. [PMID: 23069987 DOI: 10.1097/mol.0b013e32835a7ed4] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Indexed: 12/23/2022]
Abstract
PURPOSE OF REVIEW To provide a comprehensive overview on the main findings from the Cardiovascular Risk in Young Finns Study. This prospective multicenter study initiated in 1980 (N = 3596, baseline age 3-18 years) has followed up study participants over 30 years to investigate childhood risk factors for cardiometabolic outcomes in adulthood. RECENT FINDINGS Childhood BMI, socioeconomic status, parental risk factor status, as well as genetic polymorphisms are independent predictors of adult obesity, hypertension, and dyslipidemia. Results from the Young Finns Study and other follow-up studies have shown that conventional childhood risk factors, such as dyslipidemia, obesity, elevated blood pressure and smoking, are predictive of subclinical atherosclerosis in young adults. Recent findings suggest that childhood lifestyle (diet, physical activity) is associated with subclinical atherosclerosis and its progression in adulthood. Concerning the timing of risk factor measurements, they seem to be predictive of adult atherosclerosis from the age of 9 onwards. From a clinical point of view, a recent observation suggesting that the adverse cardiometabolic effects of childhood overweight/obesity are reversed among those who become nonobese adults, provides optimism during the days of obesity epidemic. SUMMARY Current data suggest that childhood risk factors are associated with higher risk of subclinical atherosclerosis in adulthood. Future studies among aging cohorts followed since childhood will provide data on their influence on clinical cardiovascular outcomes.
Affiliation(s)
- Markus Juonala
- University of Turku and Turku University Hospital, Cardiovascular Research Center, Turku, Finland.
|
24
|
Jensen DB, Vesth TC, Hallin PF, Pedersen AG, Ussery DW. Bayesian prediction of bacterial growth temperature range based on genome sequences. BMC Genomics 2012; 13 Suppl 7:S3. [PMID: 23282160 PMCID: PMC3521210 DOI: 10.1186/1471-2164-13-s7-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022] Open
Abstract
Background The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based on a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles). The predictive performance of these protein families was compared to that of 87 basic sequence features (relative use of amino acids and codons, genomic and 16S rDNA AT content and genome size). When using naïve Bayesian inference, it was possible to correctly predict the optimal temperature range with a Matthews correlation coefficient of up to 0.68. The best predictive performance was always achieved by including protein families as well as structural features, compared to either of these alone. A dedicated computer program was created to perform these predictions. Conclusions This study shows that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naïve Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic and psychrophilic adapted bacterial genomes.
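The abstract reports performance as a Matthews correlation coefficient over three classes. It does not say which multiclass formulation was used, so the sketch below assumes the commonly used confusion-matrix generalization of the coefficient; the function name and the toy confusion matrix are hypothetical and do not reproduce the study's data.

```python
from math import sqrt


def multiclass_mcc(confusion):
    """Multiclass Matthews correlation coefficient.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(
        p * t for p, t in zip(pred_counts, true_counts))
    denominator = sqrt(
        (total * total - sum(p * p for p in pred_counts))
        * (total * total - sum(t * t for t in true_counts)))
    # By convention, return 0 when the coefficient is undefined
    # (for example, when every sample is assigned to one class).
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for three classes
# (rows: true thermophile, mesophile, psychrophile).
toy_confusion = [
    [4, 1, 0],
    [1, 3, 1],
    [0, 1, 4],
]
score = multiclass_mcc(toy_confusion)
```

Unlike plain accuracy, this coefficient stays near zero for a classifier that always predicts the majority class, which matters when the classes are unevenly represented.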
Affiliation(s)
- Dan B Jensen
- Technical University of Denmark, Center for Systems Biology, Denmark.
|
25
|
Kruppa J, Ziegler A, König IR. Risk estimation and risk prediction using machine-learning methods. Hum Genet 2012; 131:1639-54. [PMID: 22752090 PMCID: PMC3432206 DOI: 10.1007/s00439-012-1194-y] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 02/28/2012] [Accepted: 06/14/2012] [Indexed: 01/02/2023]
Abstract
After an association between genetic variants and a phenotype has been established, further study goals comprise the classification of patients according to disease risk or the estimation of disease probability. To accomplish this, different statistical methods are required, and specifically machine-learning approaches may offer advantages over classical techniques. In this paper, we describe methods for the construction and evaluation of classification and probability estimation rules. We review the use of machine-learning approaches in this context and explain some of the machine-learning algorithms in detail. Finally, we illustrate the methodology through application to a genome-wide association analysis on rheumatoid arthritis.
Affiliation(s)
- Jochen Kruppa
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Maria-Goeppert-Str. 1, 23562 Lübeck, Germany
|
26
|
Hartley SW, Monti S, Liu CT, Steinberg MH, Sebastiani P. Bayesian methods for multivariate modeling of pleiotropic SNP associations and genetic risk prediction. Front Genet 2012; 3:176. [PMID: 22973300 PMCID: PMC3438684 DOI: 10.3389/fgene.2012.00176] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 06/29/2012] [Accepted: 08/20/2012] [Indexed: 12/17/2022] Open
Abstract
Genome-wide association studies (GWAS) have identified numerous associations between genetic loci and individual phenotypes; however, relatively few GWAS have attempted to detect pleiotropic associations, in which loci are simultaneously associated with multiple distinct phenotypes. We show that pleiotropic associations can be directly modeled via the construction of simple Bayesian networks, and that these models can be applied to produce single or ensembles of Bayesian classifiers that leverage pleiotropy to improve genetic risk prediction. The proposed method includes two phases: (1) Bayesian model comparison, to identify Single-Nucleotide Polymorphisms (SNPs) associated with one or more traits; and (2) cross-validation feature selection, in which a final set of SNPs is selected to optimize prediction. To demonstrate the capabilities and limitations of the method, a total of 1600 case-control GWAS datasets with two dichotomous phenotypes were simulated under 16 scenarios, varying the association strengths of causal SNPs, the size of the discovery sets, the balance between cases and controls, and the number of pleiotropic causal SNPs. Across the 16 scenarios, prediction accuracy varied from 90 to 50%. In the 14 scenarios that included pleiotropically associated SNPs, the pleiotropic model search and prediction methods consistently outperformed the naive model search and prediction. In the two scenarios in which there were no true pleiotropic SNPs, the differences between the pleiotropic and naive model searches were minimal. To further evaluate the method on real data, a discovery set of 1071 sickle cell disease (SCD) patients was used to search for pleiotropic associations between cerebral vascular accidents and fetal hemoglobin level. Classification was performed on a smaller validation set of 352 SCD patients, and showed that the inclusion of pleiotropic SNPs may slightly improve prediction, although the difference was not statistically significant. 
The proposed method is robust, computationally efficient, and provides a powerful new approach for detecting and modeling pleiotropic disease loci.
Affiliation(s)
- Stephen W Hartley
- Department of Biostatistics, Boston University School of Public Health Boston, MA, USA
|
27
|
Inouye M, Ripatti S, Kettunen J, Lyytikäinen LP, Oksala N, Laurila PP, Kangas AJ, Soininen P, Savolainen MJ, Viikari J, Kähönen M, Perola M, Salomaa V, Raitakari O, Lehtimäki T, Taskinen MR, Järvelin MR, Ala-Korpela M, Palotie A, de Bakker PIW. Novel Loci for metabolic networks and multi-tissue expression studies reveal genes for atherosclerosis. PLoS Genet 2012; 8:e1002907. [PMID: 22916037 PMCID: PMC3420921 DOI: 10.1371/journal.pgen.1002907] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Received: 04/19/2012] [Accepted: 07/01/2012] [Indexed: 12/16/2022] Open
Abstract
Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis.
Affiliation(s)
- Michael Inouye
- Medical Systems Biology, Departments of Pathology and of Microbiology and Immunology, The University of Melbourne, Parkville, Victoria, Australia.
|
28
|
Hypothesis-based analysis of gene-gene interactions and risk of myocardial infarction. PLoS One 2012; 7:e41730. [PMID: 22876292 PMCID: PMC3410908 DOI: 10.1371/journal.pone.0041730] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 03/23/2012] [Accepted: 06/25/2012] [Indexed: 11/19/2022]
Abstract
The genetic loci that have been found by genome-wide association studies to modulate risk of coronary heart disease explain only a fraction of its total variance, and gene-gene interactions have been proposed as a potential source of the remaining heritability. Given the potentially large testing burden, we sought to enrich our search space with real interactions by analyzing variants that may be more likely to interact on the basis of two distinct hypotheses: a biological hypothesis, under which MI risk is modulated by interactions between variants that are known to be relevant for its risk factors; and a statistical hypothesis, under which interacting variants individually show weak marginal association with MI. In a discovery sample of 2,967 cases of early-onset myocardial infarction (MI) and 3,075 controls from the MIGen study, we performed pair-wise SNP interaction testing using a logistic regression framework. Despite having reasonable power to detect interaction effects of plausible magnitudes, we observed no statistically significant evidence of interaction under these hypotheses, and no clear consistency between the top results in our discovery sample and those in a large validation sample of 1,766 cases of coronary heart disease and 2,938 controls from the Wellcome Trust Case-Control Consortium. Our results do not support the existence of strong interaction effects as a common risk factor for MI. Within the scope of the hypotheses we have explored, this study places a modest upper limit on the magnitude that epistatic risk effects are likely to have at the population level (odds ratio for MI risk 1.3-2.0, depending on allele frequency and interaction model).
|
29
|
Pahikkala T, Okser S, Airola A, Salakoski T, Aittokallio T. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms Mol Biol 2012; 7:11. [PMID: 22551170 PMCID: PMC3606421 DOI: 10.1186/1748-7188-7-11] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/08/2011] [Accepted: 04/23/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. RESULTS We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. 
On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. CONCLUSIONS Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
Affiliation(s)
- Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Tero Aittokallio
- Turku Centre for Computer Science, Turku, Finland
- Department of Mathematics, University of Turku, Turku, Finland
- Data Mining and Modeling group, Turku Centre for Biotechnology, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
30
|
Sebastiani P, Solovieff N, Sun JX. Naïve Bayesian Classifier and Genetic Risk Score for Genetic Risk Prediction of a Categorical Trait: Not so Different after all! Front Genet 2012; 3:26. [PMID: 22393331 PMCID: PMC3289795 DOI: 10.3389/fgene.2012.00026] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 12/24/2011] [Accepted: 02/12/2012] [Indexed: 12/21/2022]
Abstract
One of the most popular modeling approaches to genetic risk prediction is to use a summary of risk alleles in the form of an unweighted or a weighted genetic risk score, with weights that relate to the odds for the phenotype in carriers of the individual alleles. Recent contributions have proposed the use of Bayesian classification rules using Naïve Bayes classifiers. We examine the relation between the two approaches for genetic risk prediction and show that the methods are mathematically related. In addition, we study the properties of the two approaches and describe how they can be generalized to include various models of inheritance.
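One way to see why the two approaches can coincide is the sketch below. It is an illustration under stated assumptions, not the derivation given in the paper: genotypes are coded as 0/1/2 risk-allele counts, a per-allele model with Hardy-Weinberg proportions is assumed in both cases and controls, and all function names and allele frequencies are hypothetical. Under those assumptions, the Naïve Bayes log posterior odds equal a GRS weighted by the log allelic odds ratios, plus a constant that does not depend on the genotypes.

```python
import math


def allelic_odds_ratio(case_freq, control_freq):
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (case_freq / (1 - case_freq)) / (control_freq / (1 - control_freq))


def weighted_grs(genotypes, odds_ratios):
    """Weighted genetic risk score: risk-allele counts times log allelic odds ratios."""
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


def genotype_prob(g, allele_freq):
    """Probability of carrying g risk alleles (0, 1 or 2), assuming Hardy-Weinberg proportions."""
    return math.comb(2, g) * allele_freq**g * (1 - allele_freq) ** (2 - g)


def naive_bayes_log_odds(genotypes, case_freqs, control_freqs, prior_case=0.5):
    """Naive Bayes log posterior odds of being a case, treating SNPs as independent."""
    log_odds = math.log(prior_case / (1 - prior_case))
    for g, p, q in zip(genotypes, case_freqs, control_freqs):
        log_odds += math.log(genotype_prob(g, p) / genotype_prob(g, q))
    return log_odds


# Hypothetical allele frequencies for three SNPs (illustrative values only).
case_freqs = [0.30, 0.55, 0.12]
control_freqs = [0.25, 0.50, 0.10]
odds_ratios = [allelic_odds_ratio(p, q) for p, q in zip(case_freqs, control_freqs)]

for genotypes in [(0, 0, 0), (1, 1, 1), (2, 1, 0), (2, 2, 2)]:
    nb = naive_bayes_log_odds(genotypes, case_freqs, control_freqs)
    grs = weighted_grs(genotypes, odds_ratios)
    # The difference is the same for every genotype vector.
    print(genotypes, round(nb - grs, 6))
```

Because the difference is constant, ranking individuals by either score gives the same ordering in this simplified setting; other inheritance models or departures from Hardy-Weinberg proportions would change the correspondence.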
Affiliation(s)
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health Boston, MA, USA
|
31
|
Genetic profiling using genome-wide significant coronary artery disease risk variants does not improve the prediction of subclinical atherosclerosis: the Cardiovascular Risk in Young Finns Study, the Bogalusa Heart Study and the Health 2000 Survey--a meta-analysis of three independent studies. PLoS One 2012; 7:e28931. [PMID: 22295058 PMCID: PMC3266236 DOI: 10.1371/journal.pone.0028931] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/26/2011] [Accepted: 11/17/2011] [Indexed: 11/19/2022]
Abstract
Background: Genome-wide association studies (GWASs) have identified a large number of variants (SNPs) associating with an increased risk of coronary artery disease (CAD). Recently, the CARDIoGRAM consortium published a GWAS based on the largest study population so far. They successfully replicated twelve already known associations and discovered thirteen new SNPs associating with CAD. We examined whether the genetic profiling of these variants improves prediction of subclinical atherosclerosis – i.e., carotid intima-media thickness (CIMT) and carotid artery elasticity (CAE) – beyond classical risk factors. Subjects and Methods: We genotyped 24 variants found in a population of European ancestry and measured CIMT and CAE in 2001 and 2007 from 2,081 and 2,015 subjects (aged 30–45 years in 2007), respectively, participating in the Cardiovascular Risk in Young Finns Study (YFS). The Bogalusa Heart Study (BHS; n = 1,179) was used as a replication cohort (mean age of 37.5). For additional replication, a sub-sample of 5 SNPs was genotyped for 1,291 individuals aged 46–76 years participating in the Health 2000 population survey. We tested the impact of a genetic risk score (GRS24SNP/CAD), calculated as a weighted (by allelic odds ratios for CAD) sum of CAD risk alleles from the studied 24 variants, on CIMT, CAE, the incidence of carotid atherosclerosis and the progression of CIMT and CAE during a 6-year follow-up. Results: CIMT or CAE did not significantly associate with GRS24SNP/CAD before or after adjusting for classical CAD risk factors (p>0.05 for all) in YFS or in the BHS. CIMT and CAE associated with only one SNP each in the YFS. The findings were not replicated in the replication cohorts. In the meta-analysis CIMT or CAE did not associate with any of the SNPs. Conclusion: Genetic profiling, by using known CAD risk variants, should not improve risk stratification for subclinical atherosclerosis beyond conventional risk factors among healthy young adults.
|
32
|
Abstract
Like most complex phenotypes, exceptional longevity is thought to reflect a combined influence of environmental (e.g., lifestyle choices, where we live) and genetic factors. To explore the genetic contribution, we undertook a genome-wide association study of exceptional longevity in 801 centenarians (median age at death 104 years) and 914 genetically matched healthy controls. Using these data, we built a genetic model that includes 281 single nucleotide polymorphisms (SNPs) and discriminated between cases and controls of the discovery set with 89% sensitivity and specificity, and with 58% specificity and 60% sensitivity in an independent cohort of 341 controls and 253 genetically matched nonagenarians and centenarians (median age 100 years). Consistent with the hypothesis that the genetic contribution is largest with the oldest ages, the sensitivity of the model increased in the independent cohort with older and older ages (71% to classify subjects with an age at death>102 and 85% to classify subjects with an age at death>105). For further validation, we applied the model to an additional, unmatched 60 centenarians (median age 107 years) resulting in 78% sensitivity, and 2863 unmatched controls with 61% specificity. The 281 SNPs include the SNP rs2075650 in TOMM40/APOE that reached irrefutable genome wide significance (posterior probability of association = 1) and replicated in the independent cohort. Removal of this SNP from the model reduced the accuracy by only 1%. Further in-silico analysis suggests that 90% of centenarians can be grouped into clusters characterized by different “genetic signatures” of varying predictive values for exceptional longevity. The correlation between 3 signatures and 3 different life spans was replicated in the combined replication sets. The different signatures may help dissect this complex phenotype into sub-phenotypes of exceptional longevity.
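For readers less familiar with the metrics quoted above, the following minimal sketch shows how sensitivity and specificity are computed from binary labels. The labels are hypothetical and unrelated to the study data, and the function name is an assumption made for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = case and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true cases classified as cases
    specificity = tn / (tn + fp)  # fraction of true controls classified as controls
    return sensitivity, specificity


# Hypothetical labels: 4 cases followed by 5 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.6)
```

Reporting both values separately for cases and controls, as the abstract does for its unmatched validation sets, avoids the distortion that overall accuracy suffers when the two groups differ greatly in size.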
|
33
|
Cosgun E, Limdi NA, Duarte CW. High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans. Bioinformatics 2011; 27:1384-9. [PMID: 21450715 DOI: 10.1093/bioinformatics/btr159] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 12/30/2022]
Abstract
MOTIVATION: With complex traits and diseases having potential genetic contributions of thousands of genetic factors, and with current genotyping arrays consisting of millions of single nucleotide polymorphisms (SNPs), powerful high-dimensional statistical techniques are needed to comprehensively model the genetic variance. Machine learning techniques have many advantages including lack of parametric assumptions, and high power and flexibility. RESULTS: We have applied three machine learning approaches: Random Forest Regression (RFR), Boosted Regression Tree (BRT) and Support Vector Regression (SVR) to the prediction of warfarin maintenance dose in a cohort of African Americans. We have developed a multi-step approach that selects SNPs, builds prediction models with different subsets of selected SNPs along with known associated genetic and environmental variables and tests the discovered models in a cross-validation framework. Preliminary results indicate that our modeling approach gives much higher accuracy than previous models for warfarin dose prediction. A model size of 200 SNPs (in addition to the known genetic and environmental variables) gives the best accuracy. The R² between the predicted and actual square root of warfarin dose in this model was on average 66.4% for RFR, 57.8% for SVR and 56.9% for BRT. Thus RFR had the best accuracy, but all three techniques achieved better performance than the current published R² of 43% in a sample of mixed ethnicity, and 27% in an African American sample. In summary, machine learning approaches for high-dimensional pharmacogenetic prediction, and for prediction of clinical continuous traits of interest, hold great promise and warrant further research.
Affiliation(s)
- Erdal Cosgun
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
|
34
|
Mohás M, Kisfali P, Járomi L, Maász A, Fehér E, Csöngei V, Polgár N, Sáfrány E, Cseh J, Sümegi K, Hetyésy K, Wittmann I, Melegh B. GCKR gene functional variants in type 2 diabetes and metabolic syndrome: do the rare variants associate with increased carotid intima-media thickness? Cardiovasc Diabetol 2010; 9:79. [PMID: 21114848 PMCID: PMC3009616 DOI: 10.1186/1475-2840-9-79] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 09/13/2010] [Accepted: 11/29/2010] [Indexed: 02/06/2023]
Abstract
Background: Recent studies revealed that glucokinase regulatory protein (GCKR) variants (rs780094 and rs1260326) are associated with serum triglycerides and plasma glucose levels. Here we primarily analyzed the association of these two variants with the lipid profile and plasma glucose levels in Hungarian subjects with type 2 diabetes mellitus and metabolic syndrome, and also correlated the genotypes with the carotid intima-media thickness records. Methods: A total of 321 type 2 diabetic patients, 455 metabolic syndrome patients, and 172 healthy controls were genotyped by PCR-RFLP. Results: Both GCKR variants were found to associate with serum triglycerides and with fasting plasma glucose. However, significant association with the development of type 2 diabetes mellitus and metabolic syndrome could not be observed. Analyzing the records of the patients, a positive association between the prevalence of the GCKR homozygous functional variants and carotid intima-media thickness was found in the metabolic syndrome patients. Conclusions: Our results support that the rs780094 and rs1260326 functional variants of the GCKR gene are inversely associated with serum triglycerides and fasting plasma glucose levels, as was already reported for diabetic and metabolic syndrome patients in some other populations. Besides this positive replication, as a novel feature, our preliminary findings also suggest a cardiovascular risk role of the GCKR minor allele carriage based on the carotid intima-media thickness association.
Affiliation(s)
- Márton Mohás
- 2nd Department of Medicine and Nephrological Center, University of Pécs, Pécs, Hungary.
|