1
Hu H, Yu L, Cheng Y, Xiong Y, Qi D, Li B, Zhang X, Zheng F. Identification and validation of oxidative stress-related diagnostic markers for recurrent pregnancy loss: insights from machine learning and molecular analysis. Mol Divers 2024:10.1007/s11030-024-10947-0. [PMID: 39225907 DOI: 10.1007/s11030-024-10947-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Accepted: 07/24/2024] [Indexed: 09/04/2024]
Abstract
It has been recognized that oxidative stress (OS) is implicated in the etiology of recurrent pregnancy loss (RPL), yet the biomarkers reflecting oxidative stress in association with RPL remain scarce. The dataset GSE165004 was retrieved from the Gene Expression Omnibus (GEO) database. From the GeneCards database, a compendium of 789 oxidative stress-related genes (OSRGs) was compiled. By intersecting differentially expressed genes (DEGs) in normal and RPL samples with OSRGs, differentially expressed OSRGs (DE-OSRGs) were identified. In addition, four machine learning algorithms were employed for the selection of diagnostic markers for RPL. The Receiver Operating Characteristic (ROC) curves for these genes were generated and a predictive nomogram for the diagnostic markers was established. The functions and pathways associated with the diagnostic markers were elucidated, and the correlations between immune cells and diagnostic markers were examined. Potential therapeutics targeting the diagnostic markers were proposed based on data from the Comparative Toxicogenomics Database and ClinicalTrials.gov. The candidate biomarker genes from the four models were further validated in RPL tissue samples using RT-PCR and immunohistochemistry. A set of 20 DE-OSRGs was identified, with 4 genes (KRAS, C2orf69, CYP17A1, and UCP3) being recognized by machine learning algorithms as diagnostic markers exhibiting robust diagnostic capabilities. The nomogram constructed demonstrated favorable predictive accuracy. Pathways including ribosome, peroxisome, Parkinson's disease, oxidative phosphorylation, Huntington's disease, and Alzheimer's disease were co-enriched by KRAS, C2orf69, and CYP17A1. Cell chemotaxis terms were commonly enriched by all four diagnostic markers. Significant differences in the abundance of five cell types, namely eosinophils, monocytes, natural killer cells, regulatory T cells, and T follicular helper cells, were observed between normal and RPL samples. A total of 180 drugs were predicted to target the diagnostic markers, including C544151, D014635, and CYP17A1. In the validation cohort of RPL patients, the LASSO model demonstrated superiority over other models. The expression levels of KRAS, C2orf69, and CYP17A1 were significantly reduced in RPL, while UCP3 levels were elevated, indicating their suitability as molecular markers for RPL. Four oxidative stress-related diagnostic markers (KRAS, C2orf69, CYP17A1, and UCP3) have been proposed to diagnose and potentially treat RPL.
Affiliation(s)
- Hui Hu
- Department of Laboratory Medicine, Shanghai East Hospital, Tongji University School of Medicine, 800 Yuntai Road, Pudong New District, Shanghai, 200123, China
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
- Li Yu
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
- Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430015, China
- Yating Cheng
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
- Department of Clinical Laboratory, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430014, China
- Yao Xiong
- Reproductive Center, Zhongshan Hospital of Wuhan University, Wuhan, 430060, China
- Daoxi Qi
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
- Boyu Li
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
- Xiaokang Zhang
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
- Fang Zheng
- Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China
2
Song JJ, Chobrutskiy A, Chobrutskiy BI, Cios KJ, Huda TI, Eakins RA, Diaz MJ, Blanck G. Chemical Complementarity of Tumor Resident, Adaptive Immune Receptor CDR3s and Previously Defined Hepatitis C Virus Epitopes Correlates with Improved Outcomes in Hepatocellular Carcinoma. Viral Immunol 2023; 36:669-677. [PMID: 38052065 DOI: 10.1089/vim.2023.0078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
To better understand how adaptive immune receptors (IRs) in hepatocellular carcinoma (HCC) microenvironments are related to disease outcomes, we employed a chemical complementarity scoring algorithm to quantify electrostatic complementarity between HCC tumor TRB or IGH complementarity-determining region 3 (CDR3) amino acid (AA) sequences and previously characterized hepatitis C virus (HCV) epitopes. High electrostatic complementarity between HCC-resident CDR3s and 12 HCV epitopes was associated with greater survival probabilities, as indicated by two distinct HCC IR CDR3 datasets. Two of the HCV epitopes, HCV*71871 (TRB) and HCV*13458 (IGH), were also determined to represent significantly larger electrostatic CDR3-HCV epitope complementarity in HCV-positive HCC cases, compared with HCV-negative HCC cases, with the CDR3s representing yet a third, independent HCC dataset. Overall, the results indicated the utility of CDR3 AA sequences as biomarkers for HCC patient stratification and as potential guides for the development of therapeutic reagents.
Affiliation(s)
- Joanna J Song
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Andrea Chobrutskiy
- Department of Pediatrics, Oregon Health and Science University Hospital, Portland, Oregon, USA
- Boris I Chobrutskiy
- Department of Internal Medicine, Oregon Health and Science University Hospital, Portland, Oregon, USA
- Konrad J Cios
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Taha I Huda
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Rachel A Eakins
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Michael J Diaz
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- George Blanck
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida, USA
3
Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23094439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Genome-wide association studies have proven their ability to improve human health outcomes by identifying genotypes associated with phenotypes. Various works have attempted to predict the risk of diseases for individuals based on genotype data. This prediction can either be considered as an analysis model that can lead to a better understanding of gene functions that underlie human disease or as a black box in order to be used in decision support systems and in early disease detection. Deep learning techniques have gained more popularity recently. In this work, we propose a deep-learning framework for disease risk prediction. The proposed framework employs a multilayer perceptron (MLP) in order to predict individuals' disease status. The proposed framework was applied to the Wellcome Trust Case-Control Consortium (WTCCC), the UK National Blood Service (NBS) Control Group, and the 1958 British Birth Cohort (58C) datasets. The performance comparison of the proposed framework showed that the proposed approach outperformed the other methods in predicting disease risk, achieving an area under the curve (AUC) up to 0.94.
Affiliation(s)
- Hadeel Alzoubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Raid Alzubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Naeem Ramzan
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley PA1 2BE, UK
4
Biscontin A, Zarantonello L, Russo A, Costa R, Montagnese S. Toward a Molecular Approach to Chronotype Assessment. J Biol Rhythms 2022; 37:272-282. [PMID: 35583112 DOI: 10.1177/07487304221099365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The aim of the present study was to develop a Polygenic Score-based model for molecular chronotype assessment. Questionnaire-based phenotypical chronotype assessment was used as a reference. In total, 54 extremely morning/morning (MM/M; 35 females, 39.7 ± 3.8 years) and 44 extremely evening/evening (EE/E; 20 females, 27.3 ± 7.7 years) individuals donated a buccal DNA sample for genotyping by sequencing of the entire genetic variability of 19 target genes known to be involved in circadian rhythmicity and/or sleep duration. Targeted genotyping was performed using the single primer enrichment technology and a specifically designed panel of 5526 primers. Among 2868 high-quality polymorphisms, a cross-validation approach led to the identification of 83 chronotype predictive variants, including previously known and also novel chronotype-associated polymorphisms. A large (35 single-nucleotide polymorphisms [SNPs]) and also a small (13 SNPs) panel were obtained, both with an estimated predictive validity of approximately 80%. Potential mechanistic hypotheses for the role of some of the newly identified variants in modulating chronotype are formulated. Once validated in independent populations encompassing the whole range of chronotypes, the identified panels might become useful within the setting of both circadian public health initiatives and precision medicine.
Affiliation(s)
- Antonella Russo
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Rodolfo Costa
- Department of Biology, University of Padova, Padova, Italy
- Institute of Neuroscience, National Research Council, Padova, Italy
- Chronobiology Section, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
5
Isik YE, Gormez Y, Aydin Z, Bakir-Gungor B. The Determination of Distinctive Single Nucleotide Polymorphism Sets for the Diagnosis of Behçet's Disease. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1909-1918. [PMID: 33476272 DOI: 10.1109/tcbb.2021.3053429] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Behçet's Disease (BD) is a multi-system inflammatory disorder in which the etiology remains unclear. The most probable hypothesis is that genetic tendency and environmental factors play roles in the development of BD. In order to find the essential reasons, genetic changes on thousands of genes should be analyzed. Besides, there is a need for extra analysis to find out which genetic factor affects the disease. Machine learning approaches have high potential for extracting the knowledge from genomics and selecting the representative Single Nucleotide Polymorphisms (SNPs) as the most effective features for the clinical diagnosis process. In this study, we have attempted to identify representative SNPs using feature selection methods, incorporating biological information and aimed to develop a machine-learning model for diagnosing Behçet's disease. By combining biological information and machine learning classifiers, up to 99.64 percent accuracy of disease prediction is achieved using only 13,611 out of 311,459 SNPs. In addition, we revealed the SNPs that are most distinctive by performing repeated feature selection in cross-validation experiments.
6
Cheng CF, Lin YJ, Lin MC, Liang WM, Chen CC, Chen CH, Wu JY, Lin TH, Liao CC, Huang SM, Hsieh AR, Tsai FJ. Genetic risk score constructed from common genetic variants is associated with cardiovascular disease risk in type 2 diabetes mellitus. J Gene Med 2020; 23:e3305. [PMID: 33350037 DOI: 10.1002/jgm.3305] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/12/2020] [Revised: 10/21/2020] [Accepted: 12/08/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Patients with type 2 diabetes mellitus (T2DM) experience a two-fold increased risk of cardiovascular diseases. Genome-wide association studies (GWAS) have identified T2DM susceptibility genetic variants. Interestingly, the genetic variants associated with cardiovascular disease risk in T2DM Han Chinese remain to be elucidated. The present study aimed to investigate the genetic variants associated with cardiovascular disease risk in T2DM. METHODS: We performed bootstrapping, GWAS and an investigation of genetic variants associated with cardiovascular disease risk in a discovery T2DM cohort and in a replication cohort. The discovery cohort included 326 cardiovascular disease patients and 1209 noncardiovascular disease patients. The replication cohort included 68 cardiovascular disease patients and 317 noncardiovascular disease patients. The main outcome measures were genetic variants for genetic risk score (GRS) in cardiovascular disease risk in T2DM. RESULTS: In total, 35 genetic variants were associated with cardiovascular disease risk. A GRS was generated by combining risk alleles from these variants weighted by their estimated effect sizes (log odds ratio [OR]). T2DM patients with weighted GRS ≥ 12.63 had an approximately 15-fold increase in cardiovascular disease risk (odds ratio = 15.67, 95% confidence interval [CI] = 10.33-24.00) compared to patients with weighted GRS < 10.39. With the addition of weighted GRS, receiver-operating characteristic curves showed that area under the curve with conventional risk factors was improved from 0.719 (95% CI = 0.689-0.750) to 0.888 (95% CI = 0.866-0.910). CONCLUSIONS: These 35 genetic variants are associated with cardiovascular disease risk in T2DM, alone and cumulatively. T2DM patients with higher levels of weighted genetic risk score have higher cardiovascular disease risks.
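The weighted genetic risk score described in the abstract above (risk-allele counts weighted by the log odds ratio of each variant) can be illustrated with a minimal sketch. The variant names, odds ratios, and genotype below are made-up placeholders, not values from the study, and the function name `weighted_grs` is hypothetical. The abstract does not say whether the published score was rescaled, so this sketch uses the plain weighted sum.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Return the sum over variants of (risk-allele count) * log(odds ratio).

    risk_allele_counts maps a variant name to 0, 1, or 2 risk alleles.
    odds_ratios maps the same variant names to per-allele odds ratios.
    """
    score = 0.0
    for variant, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1, or 2")
        score += count * math.log(odds_ratios[variant])
    return score


# Hypothetical effect sizes and one hypothetical genotype (illustration only).
example_odds_ratios = {"snp_a": 1.5, "snp_b": 2.0, "snp_c": 1.2}
example_genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

example_score = weighted_grs(example_genotype, example_odds_ratios)
print(round(example_score, 4))
```

Weighting by log(OR) rather than the raw odds ratio keeps the score additive on the log-odds scale, which is the scale on which logistic regression effect sizes combine.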
Affiliation(s)
- Chi-Fung Cheng
- Graduate Institute of Biostatistics, School of Public Health, China Medical University, Taichung, Taiwan
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Ying-Ju Lin
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Mei-Chen Lin
- Graduate Institute of Biostatistics, School of Public Health, China Medical University, Taichung, Taiwan
- Wen-Miin Liang
- Graduate Institute of Biostatistics, School of Public Health, China Medical University, Taichung, Taiwan
- Ching-Chu Chen
- Division of Endocrinology and Metabolism, Department of Medicine, China Medical University Hospital, Taichung, Taiwan
- Chien-Hsiun Chen
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Institute of Biomedical Sciences, Taipei, Taiwan
- Jer-Yuarn Wu
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Institute of Biomedical Sciences, Taipei, Taiwan
- Ting-Hsu Lin
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Chiu-Chu Liao
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Shao-Mei Huang
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Ai-Ru Hsieh
- Department of Statistics, Tamkang University, New Taipei, Taiwan
- Fuu-Jen Tsai
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Department of Biotechnology and Bioinformatics, Asia University, Taichung, Taiwan
7
Lin YJ, Cheng CF, Wang CH, Liang WM, Tang CH, Tsai LP, Chen CH, Wu JY, Hsieh AR, Lee MTM, Lin TH, Liao CC, Huang SM, Zhang Y, Tsai CH, Tsai FJ. Genetic Architecture Associated With Familial Short Stature. J Clin Endocrinol Metab 2020; 105:5805154. [PMID: 32170311 DOI: 10.1210/clinem/dgaa131] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/09/2019] [Accepted: 03/10/2020] [Indexed: 12/21/2022]
Abstract
CONTEXT: Human height is an inheritable, polygenic trait under complex and multilocus genetic regulation. Familial short stature (FSS; also called genetic short stature) is the most common type of short stature and is insufficiently known. OBJECTIVE: To investigate the FSS genetic profile and develop a polygenic risk predisposition score for FSS risk prediction. DESIGN AND SETTING: The FSS participant group of Han Chinese ancestry was diagnosed by pediatric endocrinologists in Taiwan. PATIENTS AND INTERVENTIONS: The genetic profiles of 1163 participants with FSS were identified by using a bootstrapping subsampling and genome-wide association studies (GWAS) method. MAIN OUTCOME MEASURES: Genetic profile, polygenic risk predisposition score for risk prediction. RESULTS: Ten novel genetic single nucleotide polymorphisms (SNPs) and 9 reported GWAS human height-related SNPs were identified for FSS risk. These 10 novel SNPs served as a polygenic risk predisposition score for FSS risk prediction (area under the curve: 0.940 in the testing group). This FSS polygenic risk predisposition score was also associated with the height reduction regression tendency in the general population. CONCLUSION: A polygenic risk predisposition score composed of 10 genetic SNPs is useful for FSS risk prediction and the height reduction tendency. Thus, it might contribute to FSS risk in the Han Chinese population from Taiwan.
Affiliation(s)
- Ying-Ju Lin
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Chi-Fung Cheng
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Department of Health Services Administration, China Medical University, Taichung, Taiwan
- Chung-Hsing Wang
- Children's Hospital of China Medical University, Taichung, Taiwan
- Wen-Miin Liang
- Department of Health Services Administration, China Medical University, Taichung, Taiwan
- Chih-Hsin Tang
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan
- Li-Ping Tsai
- Department of Pediatrics, Taipei Tzu Chi Hospital, New Taipei City, Taiwan
- Chien-Hsiun Chen
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jer-Yuarn Wu
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ai-Ru Hsieh
- Department of Statistics, Tamkang University, New Taipei City, Taiwan
- Ting-Hsu Lin
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Chiu-Chu Liao
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Shao-Mei Huang
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Yanfei Zhang
- Genomic Medicine Institute, Geisinger, Danville, Pennsylvania, USA
- Chang-Hai Tsai
- Department of Biotechnology and Bioinformatics, Asia University, Taichung, Taiwan
- Fuu-Jen Tsai
- Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Children's Hospital of China Medical University, Taichung, Taiwan
- Department of Biotechnology and Bioinformatics, Asia University, Taichung, Taiwan
8
Hsieh AR, Huang YC, Yang YF, Lin HJ, Lin JM, Chang YW, Wu CM, Liao WL, Tsai FJ. Lack of association of genetic variants for diabetic retinopathy in Taiwanese patients with diabetic nephropathy. BMJ Open Diabetes Res Care 2020; 8:8/1/e000727. [PMID: 31958309 PMCID: PMC7039583 DOI: 10.1136/bmjdrc-2019-000727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/02/2019] [Revised: 12/11/2019] [Accepted: 01/04/2020] [Indexed: 12/23/2022]
Abstract
OBJECTIVE: Diabetic nephropathy (DN) and diabetic retinopathy (DR) comprise major microvascular complications of diabetes that occur with a high concordance rate in patients and are considered to potentially share pathogeneses. In this case-control study, we sought to investigate whether DR-related single nucleotide polymorphisms (SNPs) exert pleiotropic effects on renal function outcomes among patients with diabetes. RESEARCH DESIGN AND METHODS: A total of 33 DR-related SNPs were identified by replicating published SNPs and via a genome-wide association study. Furthermore, we assessed the cumulative effects by creating a weighted genetic risk score and evaluated the discriminatory and prediction ability of these genetic variants using DN cases according to estimated glomerular filtration rate (eGFR) status along with a cohort with early renal functional decline (ERFD). RESULTS: Multivariate logistic regression models revealed that the DR-related SNPs afforded no individual or cumulative genetic effect on the nephropathy risk, eGFR status or ERFD outcome among patients with type two diabetes in Taiwan. CONCLUSION: Our findings indicate that larger studies would be necessary to clearly ascertain the effects of individual genetic variants and further investigation is also required to identify other genetic pathways underlying DN.
Affiliation(s)
- Ai-Ru Hsieh
- Department of Statistics, Tamkang University, Taipei, Taiwan
- Yu-Chuen Huang
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Human Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Ya-Fei Yang
- Kidney Institute and Division of Nephrology, Department of Internal Medicine, China Medical University Hospital, Taichung, Taiwan
- Hui-Ju Lin
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Department of Ophthalmology, China Medical University Hospital, Taichung, Taiwan
- Jane-Ming Lin
- Department of Ophthalmology, China Medical University Hospital, Taichung, Taiwan
- Ya-Wen Chang
- Human Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Chia-Ming Wu
- Human Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Wen-Ling Liao
- Graduate Institute of Integrated Medicine, China Medical University, Taichung, Taiwan
- Center for Personalized Medicine, China Medical University Hospital, Taichung, Taiwan
- Fuu-Jen Tsai
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Human Genetic Center, Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- Department of Health and Nutrition Biotechnology, Asia University, Taichung, Taiwan
9
Karim MN, Reid CM, Tran L, Cochrane A, Billah B. Variable selection methods for multiple regressions influence the parsimony of risk prediction models for cardiac surgery. J Thorac Cardiovasc Surg 2017; 153:1128-1135.e3. [DOI: 10.1016/j.jtcvs.2016.11.028] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/01/2015] [Revised: 10/24/2016] [Accepted: 11/14/2016] [Indexed: 11/24/2022]
10
Oh JH, Kerns S, Ostrer H, Powell SN, Rosenstein B, Deasy JO. Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes. Sci Rep 2017; 7:43381. [PMID: 28233873 PMCID: PMC5324069 DOI: 10.1038/srep43381] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/28/2016] [Accepted: 01/23/2017] [Indexed: 12/25/2022]
Abstract
The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis are used to identify key biological processes and proteins that were plausible based on other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.
Affiliation(s)
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Sarah Kerns
- Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY 14620, USA
- Harry Ostrer
- Department of Pathology, Albert Einstein College of Medicine, New York, NY 10461, USA
- Simon N Powell
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Barry Rosenstein
- Department of Radiation Oncology, Mount Sinai School of Medicine, New York, NY 10029, USA
- Joseph O Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
11
Computational Biosensors: Molecules, Algorithms, and Detection Platforms. MODELING, METHODOLOGIES AND TOOLS FOR MOLECULAR AND NANO-SCALE COMMUNICATIONS 2017. [PMCID: PMC7123247 DOI: 10.1007/978-3-319-50688-3_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Advanced nucleic acid-based sensor-applications require computationally intelligent biosensors that are able to concurrently perform complex detection and classification of samples within an in vitro platform. Realization of these cutting-edge computational biosensor systems necessitates innovation and integration of three key technologies: molecular probes with computational capabilities, algorithmic methods to enable in vitro computational post processing and classification, and immobilization and detection approaches that enable the realization of deployable computational biosensor platforms. We provide an overview of current technologies, including our contributions towards the development of computational biosensor systems.
12
Mieth B, Kloft M, Rodríguez JA, Sonnenburg S, Vobruba R, Morcillo-Suárez C, Farré X, Marigorta UM, Fehr E, Dickhaus T, Blanchard G, Schunk D, Navarro A, Müller KR. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. Sci Rep 2016; 6:36671. [PMID: 27892471 PMCID: PMC5125008 DOI: 10.1038/srep36671] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/31/2016] [Accepted: 10/06/2016] [Indexed: 12/21/2022]
Abstract
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
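The two-step idea in the abstract above (screen SNPs with a learned model, then run hypothesis tests only on the candidates) can be sketched as follows. This is not the COMBI implementation. The importance scores stand in for the weights of an already trained support vector machine and are simply supplied as input. The allelic chi-square test and the Bonferroni threshold over the selected candidates are placeholder assumptions for the paper's test and threshold correction. All names and counts are hypothetical.

```python
import math


def chi2_p_value_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def allelic_chi2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def screen_then_test(importance, tables, top_k, alpha=0.05):
    """Keep the top_k SNPs by absolute importance, then test only those."""
    # Step 1 (screening): rank SNPs by the magnitude of their importance score.
    candidates = sorted(importance, key=lambda s: abs(importance[s]), reverse=True)[:top_k]
    # Step 2 (testing): placeholder Bonferroni threshold over the candidates.
    threshold = alpha / top_k
    hits = {}
    for snp in candidates:
        p_value = chi2_p_value_1df(allelic_chi2(tables[snp]))
        if p_value < threshold:
            hits[snp] = p_value
    return hits


# Hypothetical importance scores (stand-ins for SVM weights) and
# hypothetical case/control allele-count tables.
example_importance = {"snp_1": 0.9, "snp_2": -0.7, "snp_3": 0.1}
example_tables = {
    "snp_1": [[60, 40], [40, 60]],
    "snp_2": [[50, 50], [50, 50]],
    "snp_3": [[70, 30], [30, 70]],
}

example_hits = screen_then_test(example_importance, example_tables, top_k=2)
print(sorted(example_hits))
```

Because the same data drive both the screening and the testing steps, a plain Bonferroni cut-off like the one used here is not guaranteed to control the error rate. The abstract states only that COMBI applies "an adequate threshold correction", so the published method should be consulted for the actual calibration.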
Affiliation(s)
- Bettina Mieth
  - Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
- Marius Kloft
  - Department of Computer Science, Humboldt University of Berlin, Berlin, 10099, Germany
- Juan Antonio Rodríguez
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Robin Vobruba
  - Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
- Carlos Morcillo-Suárez
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Xavier Farré
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Urko M. Marigorta
  - School of Biology, Georgia Institute of Technology, Atlanta, 30332, GA, USA
- Ernst Fehr
  - Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich, 8006, Switzerland
- Thorsten Dickhaus
  - Institute for Statistics (FB 3), University of Bremen, Bremen, 28359, Germany
- Gilles Blanchard
  - Department of Mathematics, University of Potsdam, Potsdam, 14476, Germany
- Daniel Schunk
  - Department of Economics, University of Mainz, Mainz, 55099, Germany
- Arcadi Navarro
  - Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, 08010, Spain
  - Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, 08003, Spain
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
  - Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
|
13
|
Roqueiro D, Witteveen MJ, Anttila V, Terwindt GM, van den Maagdenberg AMJM, Borgwardt K. In silico phenotyping via co-training for improved phenotype prediction from genotype. Bioinformatics 2015; 31:i303-10. [PMID: 26072497 PMCID: PMC4765855 DOI: 10.1093/bioinformatics/btv254] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction. Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium. Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html Contact: karsten.borgwardt@bsse.ethz.ch or menno.witteveen@bsse.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online.
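The augmentation step described in this abstract can be sketched as follows. This is a one-directional simplification under stated assumptions, not the authors' code: a classifier trained on the clinical view labels unphenotyped patients, and only confident labels are added to the genotype training set. The nearest-centroid classifier, the margin-based confidence rule, and all function names are hypothetical choices made to keep the example self-contained.

```python
def fit_centroids(X, y):
    """Return {label: mean feature vector} for a nearest-centroid classifier."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_with_margin(centroids, x):
    """Return (closest label, gap between the two smallest squared distances).

    Assumes at least two classes. A large gap means a confident prediction.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label)
        for label, c in centroids.items()
    )
    return dists[0][1], dists[1][0] - dists[0][0]


def augment_with_in_silico_phenotypes(
    geno_lab, clin_lab, y_lab, geno_unl, clin_unl, min_margin
):
    """Impute phenotypes for unlabeled patients from their clinical view and
    append the confident ones to the genotype training set."""
    clin_model = fit_centroids(clin_lab, y_lab)
    geno_aug, y_aug, imputed = list(geno_lab), list(y_lab), []
    for i, x in enumerate(clin_unl):
        label, margin = predict_with_margin(clin_model, x)
        if margin >= min_margin:  # skip ambiguous patients
            geno_aug.append(geno_unl[i])
            y_aug.append(label)
            imputed.append((i, label))
    return geno_aug, y_aug, imputed
```

A genotype-based predictor can then be fitted on the enlarged set, for example with `fit_centroids(geno_aug, y_aug)`. The threshold `min_margin` controls the trade-off between adding more training patients and adding wrongly imputed labels.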
Affiliation(s)
- Damian Roqueiro
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology and Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Menno J Witteveen
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology and Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Verneri Anttila
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology and Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Gisela M Terwindt
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology and Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Arn M J M van den Maagdenberg
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology and Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Karsten Borgwardt
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Neurology and Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
|
14
|
Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics 2015; 31:i197-205. [PMID: 26072483 PMCID: PMC4542785 DOI: 10.1093/bioinformatics/btv268] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022]
Abstract
Motivation: Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued computational biological problems. Because different types of biological data usually provide complementary information regarding the underlying GRN, a model that integrates big data of diverse types is expected to increase both the power and accuracy of GRN inference. Towards this goal, we propose a novel algorithm named iRafNet: integrative random forest for gene regulatory network inference. Results: iRafNet is a flexible, unified integrative framework that allows information from heterogeneous data, such as protein–protein interactions, transcription factor (TF)-DNA binding, and gene knock-down, to be jointly considered for GRN inference. Using test data from the DREAM4 and DREAM5 challenges, we demonstrate that iRafNet outperforms the original random forest-based network inference algorithm (GENIE3), and is highly comparable to the community learning approach. We apply iRafNet to construct a GRN in Saccharomyces cerevisiae and demonstrate that it improves the performance in predicting TF-target gene regulations and provides additional functional insights into the predicted gene regulations. Availability and implementation: The R code of the iRafNet implementation and a tutorial are available at: http://research.mssm.edu/tulab/software/irafnet.html Contact: zhidong.tu@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Francesca Petralia
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Pei Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jialiang Yang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zhidong Tu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
15
|
The Prediction of Radiotherapy Toxicity Using Single Nucleotide Polymorphism-Based Models: A Step Toward Prevention. Semin Radiat Oncol 2015; 25:281-91. [PMID: 26384276 DOI: 10.1016/j.semradonc.2015.05.006] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Indexed: 01/14/2023]
Abstract
Radiotherapy is a mainstay of cancer treatment, used in either a curative or palliative manner to treat approximately 50% of patients with cancer. Normal tissue toxicity limits the doses used in standard radiation therapy protocols and impedes improvements in radiotherapy efficacy. Damage to surrounding normal tissues can produce reactions ranging from bothersome symptoms that negatively affect quality of life to severe life-threatening complications. Improved ways of predicting, before treatment, the risk for development of normal tissue toxicity may allow for more personalized treatment and reduce the incidence and severity of late effects. There is increasing recognition that the cause of normal tissue toxicity is multifactorial and includes genetic factors in addition to radiation dose and volume of exposure, underlying comorbidities, age, concomitant chemotherapy or hormonal therapy, and use of other medications. An understanding of the specific genetic risk factors for normal tissue response to radiation has the potential to enhance our ability to predict adverse outcomes at the treatment-planning stage. Therefore, the field of radiogenomics has focused upon the identification of genetic variants associated with normal tissue toxicity resulting from radiotherapy. Innovative analytic methods are being applied to the discovery of risk variants and development of integrative predictive models that build on traditional normal tissue complication probability models by incorporating genetic information. Results from initial studies provide promising evidence that genetic-based risk models could play an important role in the implementation of precision medicine for radiation oncology through enhancing the ability to predict normal tissue reactions and thereby improve cancer treatment.
|
16
|
Wang W, Zhou X, Liu Z, Sun F. Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 2015; 16 Suppl 1:S6. [PMID: 25708095 PMCID: PMC4331705 DOI: 10.1186/1471-2105-16-s1-s6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/12/2022]
Abstract
With the development of various high throughput technologies and analysis methods, researchers can study different aspects of a biological phenomenon simultaneously or one aspect repeatedly with different experimental techniques and analysis methods. The output from each study is a rank list of components of interest. Aggregation of the rank lists of components, such as proteins, genes and single nucleotide variants (SNVs), produced by these experiments has been proven to be helpful in both filtering the noise and bringing forth a more complete understanding of the biological problems. Currently available rank aggregation methods do not consider the network information that has been observed to provide vital contributions in many data integration studies. We developed network tuned rank aggregation methods incorporating network information and demonstrated their superior performance over aggregation methods without network information. The methods are tested on predicting the Gene Ontology function of yeast proteins. We validate the methods using combinations of three gene expression data sets and three protein interaction networks as well as an integrated network obtained by combining the three networks. Results show that the aggregated rank lists are more meaningful if a protein interaction network is incorporated. Among the methods compared, CGI_RRA and CGI_Endeavour, which integrate rank lists with networks using CGI [1] followed by rank aggregation using either robust rank aggregation (RRA) [2] or Endeavour [3], perform the best. Finally, we use the methods to locate target genes of transcription factors.
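As a point of reference for what "rank aggregation" means here, the sketch below combines several ranked gene lists by mean rank (a Borda-style rule). This is a deliberately simple baseline chosen for illustration: it is not RRA, Endeavour, or CGI, and it uses no network information, which is precisely what the abstract says the proposed methods add. The function name and the rule for genes missing from a list are assumptions.

```python
def mean_rank_aggregate(rank_lists):
    """Aggregate ranked lists (best item first) by average rank.

    An item missing from a list is assigned rank len(list) + 1, i.e. one
    place below that list's last entry (an assumption of this sketch).
    Ties in mean rank are broken alphabetically for determinism.
    """
    items = set().union(*rank_lists)

    def mean_rank(item):
        total = 0
        for lst in rank_lists:
            total += (lst.index(item) + 1) if item in lst else (len(lst) + 1)
        return total / len(rank_lists)

    return sorted(items, key=lambda it: (mean_rank(it), it))


# Three hypothetical experiments ranking the same candidate genes.
lists = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B"],
]
consensus = mean_rank_aggregate(lists)
```

With these inputs the mean ranks are 4/3 for A, 2 for B, 3 for C and 11/3 for D, so the consensus order is A, B, C, D.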
|
17
|
Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014; 10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023]
Affiliation(s)
- Sebastian Okser
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Tapio Pahikkala
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Antti Airola
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Tapio Salakoski
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Samuli Ripatti
  - Hjelt Institute, University of Helsinki, Helsinki, Finland
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Tero Aittokallio
  - Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|