1
|
Rehman A, Mujahid M, Saba T, Jeon G. Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry. Funct Integr Genomics 2024; 24:23. [PMID: 38305949 DOI: 10.1007/s10142-024-01289-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/22/2023] [Accepted: 01/02/2024] [Indexed: 02/03/2024]
Abstract
With recent advances in precision medicine and healthcare computing, there is an enormous demand for developing machine learning algorithms in genomics to enhance the rapid analysis of disease disorders. Technological advancement in genomics and imaging provides clinicians with enormous amounts of data, but prediction is still mostly subjective, resulting in problematic medical treatment. Machine learning is being employed in several domains of the healthcare sector, encompassing clinical research, early disease identification, and medicinal innovation with a historical perspective. The main objective of this study is to detect patients who, based on several medical standards, are more susceptible to having a genetic disorder. A genetic disease prediction algorithm was employed, leveraging the patient's health history to evaluate the probability of diagnosing a genetic disorder. We developed a computationally efficient machine learning approach to predict the overall lifespan of patients with a genomics disorder and to classify and predict patients with a genetic disease. The SVM, RF, and ETC are stacked using two-layer meta-estimators to develop the proposed model. The first layer comprises all the baseline models employed to predict the outcomes based on the dataset. The second layer comprises a component known as a meta-classifier. Results from the experiment indicate that the model achieved an accuracy of 90.45% and a recall score of 90.19%. The area under the curve (AUC) for mitochondrial diseases is 98.1%; for multifactorial diseases, it is 97.5%; and for single-gene inheritance, it is 98.8%. The proposed approach presents a novel method for predicting patient prognosis in a manner that is unbiased, accurate, and comprehensive. The proposed approach outperforms human professionals using the current clinical standard for genetic disease classification in terms of identification accuracy. 
Implementing the stacked model should benefit biomedical research by improving the prediction of genetic diseases.
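The per-class AUC values quoted in the abstract are the kind of number a one-vs-rest calculation produces. The following is a minimal sketch of that calculation in plain Python, not the authors' code: the function names, the class labels `mito`, `multi`, and `single`, and the toy scores are all hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(class_scores, true_classes):
    """class_scores maps a class name to one score per patient; returns {class: AUC}."""
    return {
        name: auc(scores, [1 if c == name else 0 for c in true_classes])
        for name, scores in class_scores.items()
    }


# Hypothetical toy scores for six patients (not data from the study).
true_classes = ["mito", "mito", "multi", "multi", "single", "single"]
class_scores = {
    "mito": [0.9, 0.6, 0.2, 0.7, 0.1, 0.1],
    "multi": [0.05, 0.3, 0.7, 0.2, 0.2, 0.1],
    "single": [0.05, 0.1, 0.1, 0.1, 0.7, 0.8],
}
per_class = one_vs_rest_auc(class_scores, true_classes)
print(per_class)
```

Each class is scored against all other classes combined, so three classes yield three separate AUC values, as in the abstract.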
Affiliation(s)
- Amjad Rehman
  - Artificial Intelligence & Data Analytics Lab, CCIS, Prince Sultan University, Riyadh, 11586, Saudi Arabia
- Muhammad Mujahid
  - Artificial Intelligence & Data Analytics Lab, CCIS, Prince Sultan University, Riyadh, 11586, Saudi Arabia
- Tanzila Saba
  - Artificial Intelligence & Data Analytics Lab, CCIS, Prince Sultan University, Riyadh, 11586, Saudi Arabia
- Gwanggil Jeon
  - Artificial Intelligence & Data Analytics Lab, CCIS, Prince Sultan University, Riyadh, 11586, Saudi Arabia
  - Department of Embedded Systems Engineering, Incheon National University, Incheon, 610101, Korea
|
2
|
Barnett EJ, Onete DG, Salekin A, Faraone SV. Genomic Machine Learning Meta-regression: Insights on Associations of Study Features With Reported Model Performance. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:169-177. [PMID: 38109236 DOI: 10.1109/tcbb.2023.3343808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Many studies have been conducted with the goal of correctly predicting diagnostic status of a disorder using the combination of genomic data and machine learning. It is often hard to judge which components of a study led to better results and whether better reported results represent a true improvement or an uncorrected bias inflating performance. We extracted information about the methods used and other differentiating features in genomic machine learning models. We used these features in linear regressions predicting model performance. We tested for univariate and multivariate associations as well as interactions between features. Of the models reviewed, 46% used feature selection methods that can lead to data leakage. Across our models, the number of hyperparameter optimizations reported, data leakage due to feature selection, model type, and modeling an autoimmune disorder were significantly associated with an increase in reported model performance. We found a significant, negative interaction between data leakage and training size. Our results suggest that methods susceptible to data leakage are prevalent among genomic machine learning research, resulting in inflated reported performance. Best practice guidelines that promote the avoidance and recognition of data leakage may help the field avoid biased results.
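The leakage mechanism described in this abstract can be reproduced on pure noise. The sketch below is an illustration under stated assumptions, not the authors' analysis: the data are random with labels unrelated to the features, the classifier is a simple nearest-centroid rule, and all names (`top_k_features`, `centroid_accuracy`, `cv_accuracy`) are hypothetical. It compares selecting features on all rows before cross-validation with selecting them inside each training fold.

```python
import random


def top_k_features(X, y, k):
    """Rank columns by absolute difference in class means and return the k best."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def centroid_accuracy(X_tr, y_tr, X_te, y_te, cols):
    """Nearest-centroid classifier restricted to the selected columns."""
    def centroid(label):
        rows = [r for r, l in zip(X_tr, y_tr) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]

    c0, c1 = centroid(0), centroid(1)
    hits = 0
    for row, label in zip(X_te, y_te):
        d0 = sum((row[j] - m) ** 2 for j, m in zip(cols, c0))
        d1 = sum((row[j] - m) ** 2 for j, m in zip(cols, c1))
        hits += int((1 if d1 < d0 else 0) == label)
    return hits / len(y_te)


def cv_accuracy(X, y, k, n_folds, leak):
    """Mean accuracy over folds; leak=True selects features on ALL rows first."""
    n = len(y)
    leaked_cols = top_k_features(X, y, k) if leak else None
    accs = []
    for f in range(n_folds):
        test = set(range(f, n, n_folds))
        X_tr = [X[i] for i in range(n) if i not in test]
        y_tr = [y[i] for i in range(n) if i not in test]
        X_te = [X[i] for i in sorted(test)]
        y_te = [y[i] for i in sorted(test)]
        # Leak-free variant: choose features from the training rows only.
        cols = leaked_cols if leak else top_k_features(X_tr, y_tr, k)
        accs.append(centroid_accuracy(X_tr, y_tr, X_te, y_te, cols))
    return sum(accs) / len(accs)


random.seed(0)
n_samples, n_features = 40, 1000
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]  # labels carry no signal

leaky = cv_accuracy(X, y, k=20, n_folds=5, leak=True)
clean = cv_accuracy(X, y, k=20, n_folds=5, leak=False)
print(f"selection on all data: {leaky:.2f}, selection inside folds: {clean:.2f}")
```

Because the labels are unrelated to the features, any accuracy well above 0.5 in the first figure is an artifact of selecting features on rows that are later used for testing.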
|
3
|
Sigala RE, Lagou V, Shmeliov A, Atito S, Kouchaki S, Awais M, Prokopenko I, Mahdi A, Demirkan A. Machine Learning to Advance Human Genome-Wide Association Studies. Genes (Basel) 2023; 15:34. [PMID: 38254924 PMCID: PMC10815885 DOI: 10.3390/genes15010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024] Open
Abstract
Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives when data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems more efficiently. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.
Affiliation(s)
- Rafaella E. Sigala
  - Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Vasiliki Lagou
  - Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Aleksey Shmeliov
  - Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Sara Atito
  - Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
  - Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Samaneh Kouchaki
  - Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
  - Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Muhammad Awais
  - Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
  - Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Inga Prokopenko
  - Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
  - Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Adam Mahdi
  - Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, Oxfordshire, UK
- Ayse Demirkan
  - Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
  - Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
|
4
|
Susmitha P, Kumar P, Yadav P, Sahoo S, Kaur G, Pandey MK, Singh V, Tseng TM, Gangurde SS. Genome-wide association study as a powerful tool for dissecting competitive traits in legumes. Front Plant Sci 2023; 14:1123631. [PMID: 37645459 PMCID: PMC10461012 DOI: 10.3389/fpls.2023.1123631] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Accepted: 06/08/2023] [Indexed: 08/31/2023]
Abstract
Legumes are extremely valuable because of their high protein content and several other nutritional components. The major challenge lies in maintaining the quantity and quality of protein and other nutritional compounds in view of climate change conditions. The global need for plant-based proteins has increased the demand for seeds with a high protein content that includes essential amino acids. Genome-wide association studies (GWAS) have evolved as a standard approach in agricultural genetics for examining such intricate characters. Recent developments in machine learning methods show promising applications for dimensionality reduction, which is a major challenge in GWAS. With the advancement in biotechnology, sequencing, and bioinformatics tools, estimation of linkage disequilibrium (LD) based associations between a genome-wide collection of single-nucleotide polymorphisms (SNPs) and desired phenotypic traits has become accessible. The markers from GWAS could be utilized for genomic selection (GS) to predict superior lines by calculating genomic estimated breeding values (GEBVs). For prediction accuracy, an assortment of statistical models could be utilized, such as ridge regression best linear unbiased prediction (rrBLUP), genomic best linear unbiased predictor (gBLUP), Bayesian, and random forest (RF). Both naturally diverse germplasm panels and family-based breeding populations can be used for association mapping based on the nature of the breeding system (inbred or outbred) in the plant species. MAGIC, MCILs, RIAILs, NAM, and ROAM are being used for association mapping in several crops. Several modifications of NAM, such as doubled haploid NAM (DH-NAM), backcross NAM (BC-NAM), and advanced backcross NAM (AB-NAM), have also been used in crops like rice, wheat, maize, barley, mustard, etc. For reliable marker-trait associations (MTAs), phenotyping accuracy is as important as genotyping accuracy. High-throughput genotyping, phenomics, and computational techniques have advanced during the past few years, making it possible to explore such enormous datasets. Each population has unique virtues and flaws at the genomics and phenomics levels, which will be covered in more detail in this review study. The current investigation includes utilizing elite breeding lines as association mapping populations, optimizing the choice of GWAS selection, population size, and hurdles in phenotyping, and statistical methods which will analyze competitive traits in legume breeding.
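The rrBLUP calculation of GEBVs named in the abstract can be written compactly. What follows is the standard textbook form, given here as an aid to the reader and not as a formula taken from the review. The symbols are introduced for illustration only: y is the vector of phenotypes, Z the marker genotype matrix with row z_i for line i, u the vector of marker effects, and λ the ratio of residual to marker-effect variance.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad \mathbf{u} \sim N(\mathbf{0}, \sigma_u^2 \mathbf{I}),
  \qquad \mathbf{e} \sim N(\mathbf{0}, \sigma_e^2 \mathbf{I}) \\
\hat{\mathbf{u}} &= \left(\mathbf{Z}^\top \mathbf{Z} + \lambda \mathbf{I}\right)^{-1}
  \mathbf{Z}^\top \left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
  \qquad \lambda = \sigma_e^2 / \sigma_u^2 \\
\mathrm{GEBV}_i &= \mathbf{z}_i^\top \hat{\mathbf{u}}
\end{aligned}
```

The ridge penalty λ shrinks all marker effects toward zero by the same amount, which is what keeps the estimate stable when markers far outnumber phenotyped lines.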
Affiliation(s)
- Pusarla Susmitha
  - Regional Agricultural Research Station, Acharya N.G. Ranga Agricultural University, Andhra Pradesh, India
- Pawan Kumar
  - Department of Genetics and Plant Breeding, College of Agriculture, Chaudhary Charan Singh (CCS) Haryana Agricultural University, Hisar, India
- Pankaj Yadav
  - Department of Bioscience and Bioengineering, Indian Institute of Technology, Rajasthan, India
- Smrutishree Sahoo
  - Department of Genetics and Plant Breeding, School of Agriculture, Gandhi Institute of Engineering and Technology (GIET) University, Odisha, India
- Gurleen Kaur
  - Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Manish K. Pandey
  - Department of Genomics, Prebreeding and Bioinformatics, International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India
- Varsha Singh
  - Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS, United States
- Te Ming Tseng
  - Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS, United States
- Sunil S. Gangurde
  - Department of Plant Pathology, University of Georgia, Tifton, GA, United States
|
5
|
Kim R, Lin T, Pang G, Liu Y, Tungate AS, Hendry PL, Kurz MC, Peak DA, Jones J, Rathlev NK, Swor RA, Domeier R, Velilla MA, Lewandowski C, Datner E, Pearson C, Lee D, Mitchell PM, McLean SA, Linnstaedt SD. Derivation and validation of risk prediction for posttraumatic stress symptoms following trauma exposure. Psychol Med 2023; 53:4952-4961. [PMID: 35775366 DOI: 10.1017/s003329172200191x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
BACKGROUND Posttraumatic stress symptoms (PTSS) are common following traumatic stress exposure (TSE). Identification of individuals with PTSS risk in the early aftermath of TSE is important to enable targeted administration of preventive interventions. In this study, we used baseline survey data from two prospective cohort studies to identify the most influential predictors of substantial PTSS. METHODS Self-identifying black and white American women and men (n = 1546) presenting to one of 16 emergency departments (EDs) within 24 h of motor vehicle collision (MVC) TSE were enrolled. Individuals with substantial PTSS (⩾33, Impact of Events Scale - Revised) 6 months after MVC were identified via follow-up questionnaire. Sociodemographic, pain, general health, event, and psychological/cognitive characteristics were collected in the ED and used in prediction modeling. Ensemble learning methods and Monte Carlo cross-validation were used for feature selection and to determine prediction accuracy. External validation was performed on a hold-out sample (30% of total sample). RESULTS Twenty-five percent (n = 394) of individuals reported PTSS 6 months following MVC. Regularized linear regression was the top performing learning method. The top 30 factors together showed good reliability in predicting PTSS in the external sample (Area under the curve = 0.79 ± 0.002). Top predictors included acute pain severity, recovery expectations, socioeconomic status, self-reported race, and psychological symptoms. CONCLUSIONS These analyses add to a growing literature indicating that influential predictors of PTSS can be identified and risk for future PTSS estimated from characteristics easily available/assessable at the time of ED presentation following TSE.
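The validation design in the METHODS (a 30% hold-out sample plus Monte Carlo cross-validation on the remainder) can be sketched with index bookkeeping alone. This is an illustrative sketch, not the study's pipeline: the sample size 1546 and the 30% hold-out come from the abstract, while the 20% inner test fraction, the 100 repeats, the seed, and the function names are assumptions made for the example.

```python
import random


def holdout_split(n, holdout_fraction, rng):
    """Shuffle indices 0..n-1 once and set aside a fixed external validation sample."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_holdout = round(n * holdout_fraction)
    return idx[n_holdout:], idx[:n_holdout]


def monte_carlo_cv(indices, test_fraction, n_repeats, rng):
    """Yield (train, test) index lists from repeated independent random splits."""
    for _ in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        yield shuffled[n_test:], shuffled[:n_test]


rng = random.Random(0)  # fixed seed so the splits are reproducible

# 1546 participants and a 30% hold-out, as stated in the abstract.
development, external = holdout_split(1546, 0.30, rng)

# Assumed settings: 100 repeats with 20% of the development sample held out each time.
splits = list(monte_carlo_cv(development, 0.20, 100, rng))
print(len(development), len(external), len(splits))
```

Unlike k-fold cross-validation, the repeated splits are drawn independently, so a given participant may appear in the test portion of several repeats or of none. The external sample is never touched during model selection.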
Affiliation(s)
- Raphael Kim
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
- Tina Lin
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
- Gehao Pang
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
- Yufeng Liu
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Genetics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
- Andrew S Tungate
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
- Phyllis L Hendry
  - Department of Emergency Medicine, University of Florida College of Medicine, Jacksonville, FL, USA
- Michael C Kurz
  - Department of Emergency Medicine, University of Alabama, Birmingham, AL, USA
- David A Peak
  - Department of Emergency Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jeffrey Jones
  - Department of Emergency Medicine, Spectrum Health Butterworth Campus, Grand Rapids, MI, USA
- Niels K Rathlev
  - Department of Emergency Medicine, Baystate State Health System, Springfield, MA, USA
- Robert A Swor
  - Department of Emergency Medicine, Beaumont Hospital, Royal Oak, MI, USA
- Robert Domeier
  - Department of Emergency Medicine, St Joseph Mercy Health System, Ann Arbor, MI, USA
- Elizabeth Datner
  - Department of Emergency Medicine, Albert Einstein Medical Center, Philadelphia, PA, USA
- Claire Pearson
  - Department of Emergency Medicine, Detroit Receiving, Detroit, MI, USA
- David Lee
  - Department of Emergency Medicine, North Shore University Hospital, Manhasset, NY, USA
- Patricia M Mitchell
  - Department of Emergency Medicine, Boston University School of Medicine, Boston, MA, USA
- Samuel A McLean
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
  - Department of Emergency Medicine, University of North Carolina, Chapel Hill, NC, USA
- Sarah D Linnstaedt
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
|
6
|
Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel) 2023; 23:4439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Genome-wide association studies have proven their ability to improve human health outcomes by identifying genotypes associated with phenotypes. Various works have attempted to predict the risk of diseases for individuals based on genotype data. This prediction can either be considered as an analysis model that can lead to a better understanding of gene functions that underlie human disease or as a black box in order to be used in decision support systems and in early disease detection. Deep learning techniques have gained more popularity recently. In this work, we propose a deep-learning framework for disease risk prediction. The proposed framework employs a multilayer perceptron (MLP) in order to predict individuals' disease status. The proposed framework was applied to the Wellcome Trust Case-Control Consortium (WTCCC), the UK National Blood Service (NBS) Control Group, and the 1958 British Birth Cohort (58C) datasets. The performance comparison of the proposed framework showed that the proposed approach outperformed the other methods in predicting disease risk, achieving an area under the curve (AUC) up to 0.94.
Affiliation(s)
- Hadeel Alzoubi
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Raid Alzubi
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Naeem Ramzan
  - School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley PA1 2BE, UK
|
7
|
Liu Y, Lin D, Li L, Chen Y, Wen J, Lin Y, He X. Using machine-learning algorithms to identify patients at high risk of upper gastrointestinal lesions for endoscopy. J Gastroenterol Hepatol 2021; 36:2735-2744. [PMID: 33929063 DOI: 10.1111/jgh.15530] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/11/2020] [Revised: 03/13/2021] [Accepted: 04/25/2021] [Indexed: 01/13/2023]
Abstract
BACKGROUND AND AIM Endoscopic screening for early detection of upper gastrointestinal (UGI) lesions is important. However, population-based endoscopic screening is difficult to implement in populous countries. By identifying high-risk individuals from the general population, the screening targets can be narrowed to individuals who are in most need of an endoscopy. This study was designed to develop an artificial intelligence (AI)-based model to predict patient risk of UGI lesions to identify high-risk individuals for endoscopy. METHODS A total of 620 patients (from 5300 participants) were equally allocated into 10 parts for 10-fold cross validation experiments. The machine-learning predictive models for UGI lesion risk were constructed using random forest, logistic regression, decision tree, and support vector machine (SVM) algorithms. A total of 48 variables covering lifestyles, socioeconomic status, clinical symptoms, serological results, and pathological data were used in the model construction. RESULTS The accuracies of the four models were between 79.3% and 93.4% in the training set and between 77.2% and 91.2% in the testing dataset (logistic regression: 77.2%; decision tree: 87.3%; random forest: 88.2%; SVM: 91.2%). The AUCs of the four models showed impressive predictive ability. Comparing the four models with the different algorithms, the SVM model featured the best sensitivity and specificity in all datasets tested. CONCLUSIONS Machine-learning algorithms can accurately and reliably predict the risk of UGI lesions based on readily available parameters. The predictive models have the potential to be used clinically for identifying patients with high risk of UGI lesions and stratifying patients for necessary endoscopic screening.
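The RESULTS compare models on accuracy, sensitivity, and specificity. As a reminder of how these three quantities relate to the confusion matrix, here is a minimal sketch in plain Python. The function names and the ten toy labels are hypothetical and are not data from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = lesion, 0 = no lesion."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical toy example: 4 patients with lesions, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

For a screening tool, sensitivity governs how many true lesions are missed, while specificity governs how many unnecessary endoscopies are triggered, so accuracy alone does not settle which model is preferable.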
Affiliation(s)
- Yongjia Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Da Lin
  - Department of Gastroenterology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Lan Li
  - Department of Gastroenterology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Yu Chen
  - Department of Gastroenterology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Jiayao Wen
  - Department of Gastroenterology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Yiguang Lin
  - School of Life Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
- Xingxiang He
  - Department of Gastroenterology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
|
8
|
Fadason T, Farrow S, Gokuladhas S, Golovina E, Nyaga D, O'Sullivan JM, Schierding W. Assigning function to SNPs: Considerations when interpreting genetic variation. Semin Cell Dev Biol 2021; 121:135-142. [PMID: 34446357 DOI: 10.1016/j.semcdb.2021.08.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/05/2021] [Accepted: 08/12/2021] [Indexed: 12/26/2022]
Abstract
Assigning function to single nucleotide polymorphisms (SNPs) to understand the mechanisms that link genetic and phenotypic variation and disease is an area of intensive research that is necessary to contribute to the continuing development of precision medicine. However, despite the apparent simplicity captured in the name SNP, 'single nucleotide' changes are not easy to functionally characterize. This complexity arises from multiple features of the genome, including the fact that function is development- and environment-specific. As such, we are often fooled by our terminology and underlying assumptions that there is a single function for a SNP. Here we discuss some of what is known about SNPs, their functions and how we can go about characterizing them.
Affiliation(s)
- Tayaza Fadason
  - Liggins Institute, The University of Auckland, Auckland, New Zealand; The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Sophie Farrow
  - Liggins Institute, The University of Auckland, Auckland, New Zealand; The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Evgeniia Golovina
  - Liggins Institute, The University of Auckland, Auckland, New Zealand
- Denis Nyaga
  - Liggins Institute, The University of Auckland, Auckland, New Zealand
- Justin M O'Sullivan
  - Liggins Institute, The University of Auckland, Auckland, New Zealand; The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand; Garvan Institute of Medical Research, Sydney, New South Wales, Australia; MRC Lifecourse Epidemiology Unit, University of Southampton, United Kingdom
- William Schierding
  - Liggins Institute, The University of Auckland, Auckland, New Zealand; The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
|
9
|
Bisht A, Kamble MP, Choudhary P, Chaturvedi K, Kohli G, Juneja VK, Sehgal S, Taneja NK. A surveillance of food borne disease outbreaks in India: 2009–2018. Food Control 2021. [DOI: 10.1016/j.foodcont.2020.107630] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/27/2022]
|
10
|
Comparison of regression imputation methods of baseline covariates that predict survival outcomes. J Clin Transl Sci 2020; 5:e40. [PMID: 33948262 PMCID: PMC8057424 DOI: 10.1017/cts.2020.533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023] Open
Abstract
Introduction: Missing data are inevitable in medical research and appropriate handling of missing data is critical for statistical estimation and making inferences. Imputation is often employed in order to maximize the amount of data available for statistical analysis and is preferred over the typically biased output of complete case analysis. This article examines several types of regression imputation of missing covariates in the prediction of time-to-event outcomes subject to right censoring. Methods: We evaluated the performance of five regression methods in the imputation of missing covariates for the proportional hazards model via summary statistics, including proportional bias and proportional mean squared error. The primary objective was to determine which among the parametric generalized linear models (GLMs) and least absolute shrinkage and selection operator (LASSO), and nonparametric multivariate adaptive regression splines (MARS), support vector machine (SVM), and random forest (RF), provides the “best” imputation model for baseline missing covariates in predicting a survival outcome. Results: On average, LASSO yielded the smallest bias, mean square error, mean square prediction error, and median absolute deviation (MAD) of the final analysis model’s parameters among all five methods considered. SVM performed second best, while GLM and MARS exhibited the lowest relative performances. Conclusion: LASSO and SVM outperform GLM, MARS, and RF in the context of regression imputation for prediction of a time-to-event outcome.
|
11
|
The basics of data, big data, and machine learning in clinical practice. Clin Rheumatol 2020; 40:11-23. [PMID: 32504192 DOI: 10.1007/s10067-020-05196-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/16/2019] [Revised: 05/05/2020] [Accepted: 05/20/2020] [Indexed: 12/29/2022]
Abstract
Health informatics and biomedical computing have introduced the use of computer methods to analyze clinical information and provide tools to assist clinicians during the diagnosis and treatment of diverse clinical conditions. With the amount of information that can be obtained in the healthcare setting, new methods to acquire, organize, and analyze the data are being developed each day, including new applications in the world of big data and machine learning. In this review, we first present the most basic concepts in data science, including the structural hierarchy of information and how it is managed. A section is dedicated to discussing topics relevant to the acquisition of data, notably the availability and use of online resources such as survey software and cloud computing services. Along with digital datasets, these tools make it possible to create more diverse models and facilitate collaboration. Next, we describe concepts and techniques in machine learning used to process and analyze health data, especially those most widely applied in rheumatology. Overall, the objective of this review is to aid in the comprehension of how data science is used in health, with a special emphasis on the relevance to the field of rheumatology. It provides clinicians with basic tools on how to approach and understand new trends in health informatics analysis currently being used in rheumatology practice. If clinicians understand the potential use and limitations of health informatics, this will facilitate interdisciplinary conversations and continued projects relating to data, big data, and machine learning.
|
12
|
Xu Y, Cao L, Zhao X, Yao Y, Liu Q, Zhang B, Wang Y, Mao Y, Ma Y, Ma JZ, Payne TJ, Li MD, Li L. Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches. Front Psychiatry 2020; 11:416. [PMID: 32477189 PMCID: PMC7241440 DOI: 10.3389/fpsyt.2020.00416] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/24/2019] [Accepted: 04/23/2020] [Indexed: 12/22/2022] Open
Abstract
Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, they have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with a 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved an area under the curve (AUC) of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used the combined logistic (p < 0.01) and LASSO (λ = 10⁻³) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are promising means to build predictive models for smoking.
Affiliation(s)
- Yi Xu
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Liyu Cao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xinyi Zhao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yinghao Yao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiang Liu
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Bin Zhang
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yan Wang
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Ying Mao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yunlong Ma
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jennie Z Ma
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
- Thomas J Payne
  - Department of Otolaryngology and Communicative Sciences, University of Mississippi Medical Center, Jackson, MS, United States
- Ming D Li
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- Lanjuan Li
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
|
13
|
Grinberg NF, Orhobor OI, King RD. An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat. Mach Learn 2019; 109:251-277. [PMID: 32174648 PMCID: PMC7048706 DOI: 10.1007/s10994-019-05848-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 10/28/2015] [Revised: 09/17/2019] [Accepted: 09/19/2019] [Indexed: 11/01/2022]
Abstract
In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods; elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM), with two state-of-the-art classical statistical genetics methods; genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present. 
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.
Affiliation(s)
- Nastasiya F. Grinberg
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL UK
  - Present Address: Department of Medicine, Cambridge Institute of Therapeutic Immunology & Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, CB2 0AW UK
- Ross D. King
  - Department of Biology and Biological Engineering, Division of Systems and Synthetic Biology, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
|
14
|
Tacconelli E, Górska A, De Angelis G, Lammens C, Restuccia G, Schrenzel J, Huson DH, Carević B, Preoţescu L, Carmeli Y, Kazma M, Spanu T, Carrara E, Malhotra-Kumar S, Gladstone BP. Estimating the association between antibiotic exposure and colonization with extended-spectrum β-lactamase-producing Gram-negative bacteria using machine learning methods: a multicentre, prospective cohort study. Clin Microbiol Infect 2019; 26:87-94. [PMID: 31128285 DOI: 10.1016/j.cmi.2019.05.013] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/06/2019] [Revised: 04/20/2019] [Accepted: 05/13/2019] [Indexed: 10/26/2022]
Abstract
OBJECTIVES: The aim of the study was to measure the impact of antibiotic exposure on the acquisition of colonization with extended-spectrum β-lactamase-producing Gram-negative bacteria (ESBL-GNB) accounting for individual- and group-level confounding using machine-learning methods. METHODS: Patients hospitalized between September 2010 and June 2013 at six medical and six surgical wards in Italy, Serbia and Romania were screened for ESBL-GNB at hospital admission, discharge, antibiotic start, and after 3, 7, 15 and 30 days. Primary outcomes were the incidence rate and predictive factors of new ESBL-GNB colonization. Random forest algorithm was used to rank antibiotics according to the risk of selection of ESBL-GNB colonization in patients not colonized before starting antibiotics. RESULTS: We screened 10 034 patients collecting 28 322 rectal swab samples. New ESBL-GNB colonization incidence with and without antibiotic treatment was 22/1000 and 9/1000 exposure-days, respectively. In the adjusted regression analyses, antibiotic exposure (hazard ratio (HR) 2.38; 95% CI 1.29-4.40), age 60-69 years (HR 1.19; 95% CI 1.05-1.34), and spring season (HR 1.25; 95% CI 1.14-1.38) were independently associated with new colonization. Monotherapy ranked higher than combination therapy in promoting ESBL-GNB colonization. Among monotherapy, cephalosporins ranked first followed by tetracycline (second), macrolide (fourth) and cotrimoxazole (seventh). Overall the ranking of cephalosporins was lower when used in combination. Among combinations not including cephalosporins, quinolones plus carbapenems ranked highest (eighth). Among sequential therapies, quinolones ranked highest (tenth) when prescribed within 30 days of therapy with cephalosporins. CONCLUSIONS: Impact of antibiotics on selecting ESBL-GNB at intestinal level varies if used in monotherapy or combination and according to previous antibiotic exposure. These findings should be explored in future clinical trials on antibiotic stewardship interventions. CLINICAL TRIAL REGISTRATION: NCT01208519.
Affiliation(s)
- E Tacconelli
  - Division of Infectious Disease, Department of Internal Medicine I, Tübingen University Hospital, Tübingen, Germany; Division of Infectious Diseases, Department of Diagnostic and Public Health, University of Verona, Italy
- A Górska
  - Algorithms in Bioinformatics, University of Tübingen and International Max Planck Research School, Tübingen, Germany
- G De Angelis
  - Institute of Microbiology, Fondazione Policlinico Universitario A. Gemelli IRCCS - Università Cattolica del Sacro Cuore, Rome, Italy
- C Lammens
  - Laboratory of Medical Microbiology, Vaccine & Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
- G Restuccia
  - Department of Anaesthesiology and Intensive Care Medicine, University of Catania, Catania, Italy
- J Schrenzel
  - Bacteriology Laboratory, Service of Infectious Diseases, University of Geneva Hospitals and Medical Faculty, Geneva, Switzerland
- D H Huson
  - Algorithms in Bioinformatics, University of Tübingen and International Max Planck Research School, Tübingen, Germany
- B Carević
  - Department for Hospital Epidemiology, Clinical Centre of Serbia, Belgrade, Serbia
- L Preoţescu
  - National Institute for Infectious Diseases, University of Medicine 'Carol Davila', Bucharest, Romania
- Y Carmeli
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; National Centre for Infection Control, Israel Ministry of Health, Tel Aviv, Israel
- M Kazma
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; National Centre for Infection Control, Israel Ministry of Health, Tel Aviv, Israel
- T Spanu
  - Institute of Microbiology, Fondazione Policlinico Universitario A. Gemelli IRCCS - Università Cattolica del Sacro Cuore, Rome, Italy
- E Carrara
  - Division of Infectious Diseases, Department of Diagnostic and Public Health, University of Verona, Italy
- S Malhotra-Kumar
  - Laboratory of Medical Microbiology, Vaccine & Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
- B P Gladstone
  - Division of Infectious Disease, Department of Internal Medicine I, Tübingen University Hospital, Tübingen, Germany
|
15
|
Kim M, Tagkopoulos I. Data integration and predictive modeling methods for multi-omics datasets. Mol Omics 2018; 14:8-25. [DOI: 10.1039/c7mo00051k] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 12/24/2022]
Abstract
We provide an overview of opportunities and challenges in multi-omics predictive analytics with particular emphasis on data integration and machine learning methods.
Affiliation(s)
- Minseung Kim
  - Department of Computer Science, University of California, Davis, USA
  - Genome Center
- Ilias Tagkopoulos
  - Department of Computer Science, University of California, Davis, USA
  - Genome Center
|
16
|
Gallego V, Luz Calle M, Oller R. Kernel-Based Measure of Variable Importance for Genetic Association Studies. Int J Biostat 2017. [PMID: 28628480 DOI: 10.1515/ijb-2016-0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The identification of genetic variants that are associated with disease risk is an important goal of genetic association studies. Standard approaches perform univariate analysis where each genetic variant, usually a Single Nucleotide Polymorphism (SNP), is tested for association with disease status. Though many genetic variants have been identified and validated so far using this univariate approach, for most complex diseases a large part of their genetic component is still unknown, the so-called missing heritability. We propose a Kernel-based measure of variable importance (KVI) that provides the contribution of a SNP, or a group of SNPs, to the joint genetic effect of a set of genetic variants. KVI can be used for ranking genetic markers individually, sets of markers that form blocks of linkage disequilibrium or sets of genetic variants that lie in a gene or a genetic pathway. We prove that, unlike the univariate analysis, KVI captures the relationship with other genetic variants in the analysis, even when measured at the individual level for each genetic variable separately. This is especially relevant and powerful for detecting genetic interactions. We illustrate the results with data from an Alzheimer's disease study and show through simulations that the rankings based on KVI improve on those based on two measures of importance provided by the Random Forest. We also show with a simulation study that KVI is very powerful for detecting genetic interactions.
|
17
|
Eriksson D, Bianchi M, Landegren N, Nordin J, Dalin F, Mathioudaki A, Eriksson GN, Hultin-Rosenberg L, Dahlqvist J, Zetterqvist H, Karlsson Å, Hallgren Å, Farias FHG, Murén E, Ahlgren KM, Lobell A, Andersson G, Tandre K, Dahlqvist SR, Söderkvist P, Rönnblom L, Hulting AL, Wahlberg J, Ekwall O, Dahlqvist P, Meadows JRS, Bensing S, Lindblad-Toh K, Kämpe O, Pielberg GR. Extended exome sequencing identifies BACH2 as a novel major risk locus for Addison's disease. J Intern Med 2016; 280:595-608. [PMID: 27807919 DOI: 10.1111/joim.12569] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 01/07/2023]
Abstract
BACKGROUND: Autoimmune disease is one of the leading causes of morbidity and mortality worldwide. In Addison's disease, the adrenal glands are targeted by destructive autoimmunity. Despite being the most common cause of primary adrenal failure, little is known about its aetiology. METHODS: To understand the genetic background of Addison's disease, we utilized the extensively characterized patients of the Swedish Addison Registry. We developed an extended exome capture array comprising a selected set of 1853 genes and their potential regulatory elements, for the purpose of sequencing 479 patients with Addison's disease and 1394 controls. RESULTS: We identified BACH2 (rs62408233-A, OR = 2.01 (1.71-2.37), P = 1.66 × 10^-15, MAF 0.46/0.29 in cases/controls) as a novel gene associated with Addison's disease development. We also confirmed the previously known associations with the HLA complex. CONCLUSION: Whilst BACH2 has been previously reported to associate with organ-specific autoimmune diseases co-inherited with Addison's disease, we have identified BACH2 as a major risk locus in Addison's disease, independent of concomitant autoimmune diseases. Our results may enable future research towards preventive disease treatment.
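The results above report an odds ratio of 2.01 together with minor allele frequencies (MAF) of 0.46 in cases and 0.29 in controls. As a rough plausibility check, a crude allelic odds ratio can be computed from those two frequencies alone. This is a sketch under assumptions: the function name `allelic_odds_ratio` is hypothetical, the inputs are the rounded frequencies quoted in the abstract, and the published estimate presumably comes from the study's own association model, so the two values are not expected to match exactly.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude allelic odds ratio from minor allele frequencies.

    The odds of carrying the minor allele are maf / (1 - maf) in each group;
    the odds ratio is the case odds divided by the control odds.
    """
    for maf in (maf_cases, maf_controls):
        if not 0.0 < maf < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Rounded frequencies as quoted in the abstract (cases / controls).
crude_or = allelic_odds_ratio(0.46, 0.29)
print(f"crude allelic OR = {crude_or:.2f}")
```

The crude value comes out close to the reported 2.01 and inside the reported interval of 1.71 to 2.37, which is what one would expect if the published estimate reflects a similar allelic comparison with additional adjustment.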
Affiliation(s)
- D Eriksson
  - Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Endocrinology, Metabolism and Diabetes Karolinska University Hospital, Stockholm, Sweden
- M Bianchi
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- N Landegren
  - Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- J Nordin
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- F Dalin
  - Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- A Mathioudaki
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- G N Eriksson
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- L Hultin-Rosenberg
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- J Dahlqvist
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- H Zetterqvist
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Å Karlsson
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Å Hallgren
  - Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- F H G Farias
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- E Murén
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- K M Ahlgren
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- A Lobell
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- G Andersson
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- K Tandre
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- S R Dahlqvist
  - Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- P Söderkvist
  - Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- L Rönnblom
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- A-L Hulting
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- J Wahlberg
  - Department of Endocrinology, Department of Medical and Health Sciences, Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- O Ekwall
  - Department of Pediatrics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
  - Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- P Dahlqvist
  - Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- J R S Meadows
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- S Bensing
  - Department of Endocrinology, Metabolism and Diabetes Karolinska University Hospital, Stockholm, Sweden
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- K Lindblad-Toh
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- O Kämpe
  - Department of Medicine (Solna), Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Endocrinology, Metabolism and Diabetes Karolinska University Hospital, Stockholm, Sweden
  - Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- G R Pielberg
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
|
18
|
Mieth B, Kloft M, Rodríguez JA, Sonnenburg S, Vobruba R, Morcillo-Suárez C, Farré X, Marigorta UM, Fehr E, Dickhaus T, Blanchard G, Schunk D, Navarro A, Müller KR. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. Sci Rep 2016; 6:36671. [PMID: 27892471 PMCID: PMC5125008 DOI: 10.1038/srep36671] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/31/2016] [Accepted: 10/06/2016] [Indexed: 12/21/2022] Open
Abstract
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
Affiliation(s)
- Bettina Mieth
  - Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
- Marius Kloft
  - Department of Computer Science, Humboldt University of Berlin, Berlin, 10099, Germany
- Juan Antonio Rodríguez
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Robin Vobruba
  - Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
- Carlos Morcillo-Suárez
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Xavier Farré
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Urko M. Marigorta
  - School of Biology, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Ernst Fehr
  - Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich, 8006, Switzerland
- Thorsten Dickhaus
  - Institute for Statistics (FB 3), University of Bremen, Bremen, 28359, Germany
- Gilles Blanchard
  - Department of Mathematics, University of Potsdam, Potsdam, 14476, Germany
- Daniel Schunk
  - Department of Economics, University of Mainz, Mainz, 55099, Germany
- Arcadi Navarro
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, 08010, Spain
  - Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, 08003, Spain
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
  - Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
|
19
|
Thottakkara P, Ozrazgat-Baslanti T, Hupf BB, Rashidi P, Pardalos P, Momcilovic P, Bihorac A. Application of Machine Learning Techniques to High-Dimensional Clinical Data to Forecast Postoperative Complications. PLoS One 2016; 11:e0155705. [PMID: 27232332 PMCID: PMC4883761 DOI: 10.1371/journal.pone.0155705] [Citation(s) in RCA: 111] [Impact Index Per Article: 13.9] [Received: 12/07/2015] [Accepted: 04/05/2016] [Indexed: 11/18/2022] Open
Abstract
Objective: To compare performance of risk prediction models for forecasting postoperative sepsis and acute kidney injury. Design: Retrospective single center cohort study of adult surgical patients admitted between 2000 and 2010. Patients: 50,318 adult patients undergoing major surgery. Measurements: We evaluated the performance of logistic regression, generalized additive models, naïve Bayes and support vector machines for forecasting postoperative sepsis and acute kidney injury. We assessed the impact of feature reduction techniques on predictive performance. Model performance was determined using the area under the receiver operating characteristic curve, accuracy, and positive predictive value. The results were reported based on a 70/30 cross validation procedure where the data were randomly split into 70% used for training the model and 30% for validation. Main Results: The areas under the receiver operating characteristic curve for different models ranged between 0.797 and 0.858 for acute kidney injury and between 0.757 and 0.909 for severe sepsis. Logistic regression, generalized additive models, and support vector machines had better performance compared to the naïve Bayes model. Generalized additive models additionally accounted for non-linearity of continuous clinical variables as depicted in their risk pattern plots. Reducing the input feature space with LASSO had minimal effect on prediction performance, while feature extraction using principal component analysis improved performance of the models. Conclusions: Generalized additive models and support vector machines had good performance as risk prediction models for postoperative sepsis and AKI. Feature extraction using principal component analysis improved the predictive performance of all models.
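The validation scheme described above, a random split into 70% for training and 30% for validation, can be sketched as a single seeded holdout split. This is an illustration only: the function name `holdout_split`, the seed, and the integer patient identifiers are hypothetical, and only the cohort size of 50,318 is taken from the abstract.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Randomly partition records into a training and a validation list.

    A copy of the input is shuffled with a seeded generator so the split is
    reproducible; the first train_fraction of the shuffled copy is returned
    as the training set and the remainder as the validation set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 50,318 patients in the abstract.
patients = list(range(50318))
train, valid = holdout_split(patients)
print(len(train), len(valid))
```

A single split like this yields one performance estimate per model. Repeating it with different seeds, or switching to k-fold cross-validation, would give some sense of how much the reported metrics vary with the particular partition.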
Affiliation(s)
- Paul Thottakkara
  - Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
  - Industrial and Systems Engineering, University of Florida, Gainesville, Florida, United States of America
- Tezcan Ozrazgat-Baslanti
  - Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Bradley B. Hupf
  - Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
- Parisa Rashidi
  - Biomedical Engineering Department, University of Florida, Gainesville, Florida, United States of America
- Panos Pardalos
  - Industrial and Systems Engineering, University of Florida, Gainesville, Florida, United States of America
- Petar Momcilovic
  - Industrial and Systems Engineering, University of Florida, Gainesville, Florida, United States of America
- Azra Bihorac
  - Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America
|
20
|
Genetic sharing and heritability of paediatric age of onset autoimmune diseases. Nat Commun 2015; 6:8442. [PMID: 26450413 PMCID: PMC4633631 DOI: 10.1038/ncomms9442] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 02/14/2015] [Accepted: 08/21/2015] [Indexed: 12/21/2022] Open
Abstract
Autoimmune diseases (AIDs) are polygenic diseases affecting 7-10% of the population in the Western Hemisphere with few effective therapies. Here, we quantify the heritability of paediatric AIDs (pAIDs), including JIA, SLE, CEL, T1D, UC, CD, PS, SPA and CVID, attributable to common genomic variations (SNP-h²). SNP-h² estimates are most significant for T1D (0.863±s.e. 0.07) and JIA (0.727±s.e. 0.037), more modest for UC (0.386±s.e. 0.04) and CD (0.454±0.025), largely consistent with population estimates and are generally greater than those previously reported by adult GWAS. On pairwise analysis, we observed that the diseases UC-CD (0.69±s.e. 0.07) and JIA-CVID (0.343±s.e. 0.13) are the most strongly correlated. Variations across the MHC strongly contribute to SNP-h² in T1D and JIA, but do not significantly contribute to the pairwise rG. Together, our results partition contributions of shared versus disease-specific genomic variations to pAID heritability, identifying pAIDs with unexpected risk sharing, while recapitulating known associations between autoimmune diseases previously reported in adult cohorts.
|
21
|
Mittag F, Römer M, Zell A. Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies. PLoS One 2015; 10:e0135832. [PMID: 26285210 PMCID: PMC4540285 DOI: 10.1371/journal.pone.0135832] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/15/2015] [Accepted: 07/27/2015] [Indexed: 12/31/2022] Open
Abstract
Various attempts have been made to predict the individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies only investigated one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms to GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it proved to yield the best predictive performances for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models like the linear support vector machine (SVM) as they allow for better subsequent interpretation without significant loss of accuracy.
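The abstract above recommends an additive encoding of SNP genotypes. In the usual sense of the term, each genotype is mapped to the number of copies of the minor allele it carries (0, 1 or 2). The sketch below illustrates that mapping; the function name `additive_encode`, the two-letter genotype strings, and the choice of "G" as the minor allele are assumptions made for the example and are not details taken from the study.

```python
def additive_encode(genotype, minor_allele):
    """Return the number of copies of minor_allele in a diploid genotype.

    genotype: two-character string such as "AG" (one character per allele).
    minor_allele: single character naming the less frequent allele.
    The result is 0, 1 or 2, so heterozygotes get the same code regardless
    of allele order ("AG" and "GA" both map to 1).
    """
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == minor_allele)


# Hypothetical genotypes at one SNP with minor allele "G".
genotypes = ["AA", "AG", "GA", "GG"]
encoded = [additive_encode(g, minor_allele="G") for g in genotypes]
print(encoded)
```

Encoding each SNP as a single numeric column keeps the feature matrix compact, which suits linear models such as the linear SVM the authors favour. Missing genotype calls are not handled here and would need an explicit policy in practice.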
Affiliation(s)
- Florian Mittag
- Cognitive Systems Group, University of Tübingen, Tübingen, Germany
- Michael Römer
- Cognitive Systems Group, University of Tübingen, Tübingen, Germany
- Andreas Zell
- Cognitive Systems Group, University of Tübingen, Tübingen, Germany
|
22
|
Jeon SH, Jeon EH, Lee JY, Kim YS, Yoon HJ, Hong SP, Lee JH. The potential of interleukin 12 receptor beta 2 (IL12RB2) and tumor necrosis factor receptor superfamily member 8 (TNFRSF8) gene as diagnostic biomarkers of oral lichen planus (OLP). Acta Odontol Scand 2015; 73:588-94. [PMID: 25915578 DOI: 10.3109/00016357.2014.967719] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
Abstract
OBJECTIVE This study evaluated the potential of interleukin 12 receptor beta 2 and tumor necrosis factor receptor superfamily member 8 as diagnostic biomarkers of oral lichen planus (OLP). MATERIALS AND METHODS The mRNA expression of IL12RB2 and TNFRSF8 in FFPE OLP samples (OLP group, n = 38) was investigated with quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis and compared with that of chronic non-specific mucositis (Non-OLP group, n = 25) and normal mucosa (Normal group, n = 18). Predictive models of the expression of IL12RB2 and TNFRSF8 were constructed using support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA), neural network (NN) and naive Bayes (NB) methods. RESULTS Normalized expression of IL12RB2 in the OLP group (3.78 ± 1.67) was significantly higher than in the Normal group (1.97 ± 1.12), but lower than in the Non-OLP group (6.86 ± 1.67). TNFRSF8 gene expression in the OLP group (7.46 ± 1.51) was significantly higher than in the Normal group (2.90 ± 1.61), but no significant difference was found between the OLP and Non-OLP groups. The ratio of IL12RB2/TNFRSF8 in the OLP group (0.52 ± 0.23) was significantly lower than in the Normal group (0.74 ± 0.39) and the Non-OLP group (1.07 ± 0.38). In the predictive modeling, the area under the receiver operating characteristic (ROC) curve (AUC) ranged from 0.83 to 0.92 and accuracy was higher than 0.75 for all methods. CONCLUSIONS The IL12RB2/TNFRSF8 ratio can be a useful diagnostic tool for OLP.
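The AUC values reported above summarize how well a classifier's scores separate cases from non-cases. As a minimal sketch, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name, labels and scores below are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier outputs, where higher means "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three positive and three negative samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of this size; rank-based implementations are preferable for large data sets.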
Affiliation(s)
- Seung-Ho Jeon
- Department of Oral and Maxillofacial Surgery, School of Dentistry
|
23
|
de Oliveira FC, Borges CCH, Almeida FN, e Silva FF, da Silva Verneque R, da Silva MVGB, Arbex W. SNPs selection using support vector regression and genetic algorithms in GWAS. BMC Genomics 2014; 15 Suppl 7:S4. [PMID: 25573332 PMCID: PMC4243330 DOI: 10.1186/1471-2164-15-s7-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
Introduction This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in simulated database 1, with additive effects only, and in the real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of this total, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, initially with 50,752 SNPs, we reduced the set to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): by applying the wrapper based on the genetic algorithm and Support Vector Regression with the Pearson Universal kernel, many redundant markers were eliminated, increasing the prediction accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels.
|
24
|
Schlaudraff F, Gründemann J, Fauler M, Dragicevic E, Hardy J, Liss B. Orchestrated increase of dopamine and PARK mRNAs but not miR-133b in dopamine neurons in Parkinson's disease. Neurobiol Aging 2014; 35:2302-15. [PMID: 24742361 PMCID: PMC4099518 DOI: 10.1016/j.neurobiolaging.2014.03.016] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 08/06/2013] [Revised: 02/27/2014] [Accepted: 03/14/2014] [Indexed: 01/25/2023]
Abstract
Progressive loss of substantia nigra dopamine neurons (SN DA) is a hallmark of aging and of Parkinson's disease (PD). Mutations in PARK genes cause familial PD forms. Increased expression of alpha-synuclein (PARK4) is a disease-triggering event in familial PD and also observed in SN DA neurons in sporadic PD but related transcriptional changes are unknown. With optimized single-cell quantitative real-time polymerase chain reaction analysis, we compared messenger RNA and microRNA levels in SN DA neurons from sporadic PD patients and controls. Non-optimally matched donor ages and RNA integrities are common problems when analyzing human samples. We dissected the influence of distinct ages and RNA integrities of our samples by applying a specifically-optimized, linear-mixed-effects model to quantitative real-time polymerase chain reaction-data. We identified that elevated alpha-synuclein messenger RNA levels in SN DA neurons of human PD brains were positively correlated with corresponding elevated levels of mRNAs for functional compensation of progressive SN DA loss and for enhanced proteasomal (PARK5/UCHL1) and lysosomal (PARK9/ATPase13A2) function, possibly counteracting alpha-synuclein toxicity. In contrast, microRNA miR-133b levels, previously implicated in transcriptional dysregulation in PD, were not altered in SN DA neurons in PD.
Affiliation(s)
- Falk Schlaudraff
- Department of Applied Physiology, Institute of Applied Physiology, University of Ulm, Ulm, Germany
- Jan Gründemann
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- Michael Fauler
- Department of Applied Physiology, Institute of Applied Physiology, University of Ulm, Ulm, Germany
- Elena Dragicevic
- Department of Applied Physiology, Institute of Applied Physiology, University of Ulm, Ulm, Germany
- John Hardy
- Department of Molecular Neuroscience and Reta Lila Weston Laboratories, Institute of Neurology, London, UK
- Birgit Liss
- Department of Applied Physiology, Institute of Applied Physiology, University of Ulm, Ulm, Germany.
|
25
|
Negi S, Juyal G, Senapati S, Prasad P, Gupta A, Singh S, Kashyap S, Kumar A, Kumar U, Gupta R, Kaur S, Agrawal S, Aggarwal A, Ott J, Jain S, Juyal RC, Thelma BK. A genome-wide association study reveals ARL15, a novel non-HLA susceptibility gene for rheumatoid arthritis in North Indians. Arthritis Rheum 2014; 65:3026-35. [PMID: 23918589 DOI: 10.1002/art.38110] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 03/13/2013] [Accepted: 07/25/2013] [Indexed: 12/25/2022]
Abstract
OBJECTIVE Genome-wide association studies (GWAS) and their subsequent meta-analyses have changed the landscape of genetics in rheumatoid arthritis (RA) by uncovering several novel genes. Such studies are heavily weighted by samples from Caucasian populations, but they explain only a small proportion of total heritability. Our previous studies in genetically distinct North Indian RA cohorts have demonstrated apparent allelic/genetic heterogeneity between North Indian and Western populations, warranting GWAS in non-European populations. We undertook this study to detect additional disease-associated loci that may be collectively important in the presence or absence of genes with a major effect. METHODS High-quality genotypes for >600,000 single-nucleotide polymorphisms (SNPs) in 706 RA patients and 761 controls from North India were generated in the discovery stage. Twelve SNPs showing suggestive association (P < 5 × 10⁻⁵) were then tested in an independent cohort of 927 RA patients and 1,148 controls. Additional disease-associated loci were determined using support vector machine (SVM) analyses. Fine-mapping of novel loci was performed by using imputation. RESULTS In addition to the expected association of the HLA locus with RA, we identified association with a novel intronic SNP of ARL15 (rs255758) on chromosome 5 (Pcombined = 6.57 × 10⁻⁶; odds ratio 1.42). Genotype-phenotype correlation by assaying adiponectin levels demonstrated the functional significance of this novel gene in disease pathogenesis. SVM analysis confirmed this association along with that of a few more replication stage genes. CONCLUSION In this first GWAS of RA among North Indians, ARL15 emerged as a novel genetic risk factor in addition to the classic HLA locus, which suggests that population-specific genetic loci as well as those shared between Asian and European populations contribute to RA etiology. Furthermore, our study reveals the potential of machine learning methods in unraveling gene-gene interactions using GWAS data.
Affiliation(s)
- Sapna Negi
- National Institute of Immunology, New Delhi, India
|
26
|
Büchel F, Saliger S, Dräger A, Hoffmann S, Wrzodek C, Zell A, Kahle PJ. Parkinson's disease: dopaminergic nerve cell model is consistent with experimental finding of increased extracellular transport of α-synuclein. BMC Neurosci 2013; 14:136. [PMID: 24195591 PMCID: PMC3871002 DOI: 10.1186/1471-2202-14-136] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 02/26/2013] [Accepted: 10/28/2013] [Indexed: 12/13/2022]
Abstract
Background Parkinson’s disease is an age-related disease whose pathogenesis is not completely known. Animal models exist for investigating the disease but not all results can be easily transferred to humans. Therefore, mathematical or probabilistic models for the human disease are to be constructed in silico in order to predict specific processes within a cell, such as the dopamine metabolism and transport processes in a neuron. Results We present a Systems Biology Markup Language (SBML) model of a whole dopaminergic nerve cell consisting of 139 reactions and 111 metabolites which includes, among others, dopamine metabolism and transport, oxidative stress, aggregation of α-synuclein (αSYN), lysosomal and proteasomal degradation, and mitophagy. The predictive power of the model was investigated using flux balance analysis for the identification of steady model states. To this end, we performed six experiments: (i) investigation of the normal cell behavior, (ii) increase of O2, (iii) increase of ATP, (iv) influence of neurotoxins, (v) increase of αSYN in the cell, and (vi) increase of dopamine synthesis. The SBML model is available in the BioModels database with identifier MODEL1302200000. Conclusion It is possible to simulate the normal behavior of an in vivo nerve cell with the developed model. We show that the model is sensitive to neurotoxins and oxidative stress. Further, an increased level of αSYN induces apoptosis, and an increased flux of αSYN to the extracellular space was observed.
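For readers unfamiliar with flux balance analysis, the textbook formulation is a linear program over the reaction fluxes at steady state. The statement below is the generic form, not the authors' exact setup: the objective vector $c$ and the flux bounds are assumptions, since the abstract does not specify them. Only the dimensions (111 metabolites, 139 reactions) come from the abstract.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \qquad \text{(steady state: no net accumulation of any metabolite)} \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}} \qquad \text{(capacity and reversibility bounds)}
\end{aligned}
\qquad S \in \mathbb{R}^{m \times n},\; m = 111,\; n = 139
```

Here $S$ is the stoichiometric matrix with one row per metabolite and one column per reaction, and $v$ is the vector of reaction fluxes. Perturbation experiments such as those listed in the abstract would typically be expressed by changing the bounds on the affected fluxes and re-solving.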
Affiliation(s)
- Finja Büchel
- Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, 72076 Tübingen, Germany.
|
27
|
Niu B, Zhang Y, Ding J, Lu Y, Wang M, Lu W, Yuan X, Yin J. Predicting network of drug-enzyme interaction based on machine learning method. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:214-23. [PMID: 23907006 DOI: 10.1016/j.bbapap.2013.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/01/2012] [Revised: 07/16/2013] [Accepted: 07/18/2013] [Indexed: 12/11/2022]
Abstract
It is important to correctly and efficiently map drugs and enzymes to their possible interaction network in modern drug research. In this work, a novel approach was introduced to encode drug and enzyme molecules with physicochemical molecular descriptors and pseudo amino acid composition, respectively. Based on this encoding method, Random Forest was adopted to build the drug-enzyme interaction network. After selecting the optimal features that are able to represent the main factors of drug-enzyme interaction in our prediction, a total of 129 features were attained which can be clustered into nine categories: Elemental Analysis, Geometry, Chemistry, Amino Acid Composition, Secondary Structure, Polarity, Molecular Volume, Codon Diversity and Electrostatic Charge. It is further found that Geometry features were the most important of all the features. As a result, our predicting model achieved an MCC of 0.915 and a sensitivity of 87.9% at the specificity level of 99.8% for 10-fold cross-validation test, and achieved an MCC of 0.895 and a sensitivity of 95.7% at the specificity level of 95.4% for independent set test. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
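The performance figures quoted above (MCC, sensitivity, specificity) are all derived from the four cells of a binary confusion matrix. The following sketch shows the standard definitions; the function names and the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of actual negatives predicted negative."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 90, 10, 95, 5
print(sensitivity(tp, fn))            # 0.9
print(specificity(tn, fp))            # 0.95
print(round(mcc(tp, tn, fp, fn), 3))
```

Unlike accuracy, the MCC stays informative when the two classes are very unequal in size, which is usually the case for interaction prediction, where non-interacting pairs vastly outnumber interacting ones.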
Affiliation(s)
- Bing Niu
- College of Life Science, Shanghai University, 99 Shang-Da Road, Shanghai 200072, China
|