Reference Citation Analysis

For: Ban HJ, Heo JY, Oh KS, Park KJ. Identification of type 2 diabetes-associated combination of SNPs using support vector machine. BMC Genet 2010;11:26. [PMID: 20416077 PMCID: PMC2875201 DOI: 10.1186/1471-2156-11-26] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 06/23/2009] [Accepted: 04/23/2010] [Indexed: 12/25/2022]

Cited by Other Article(s)

Agho CA, Śliwka J, Nassar H, Niinemets Ü, Runno-Paurson E. Machine Learning-Based Identification of Mating Type and Metalaxyl Response in Phytophthora infestans Using SSR Markers. Microorganisms 2024;12:982. [PMID: 38792811 PMCID: PMC11124124 DOI: 10.3390/microorganisms12050982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024]

Abstract

Phytophthora infestans is the causal agent of late blight in potato. The occurrence of both A1 and A2 mating types of P. infestans in the field may result in sexual reproduction and the generation of recombinant strains. Such strains, with new combinations of traits, can be highly aggressive and resistant to fungicides, making the disease difficult to control in the field. Metalaxyl-resistant isolates are now more prevalent in potato fields. Understanding the genetic structure of P. infestans and rapidly identifying the mating type and metalaxyl response of field isolates are prerequisites for effective monitoring and management of late blight. In studies of the genotypic and phenotypic diversity of P. infestans, molecular assays and phenotypic assays for traits such as mating type and metalaxyl response are typically conducted separately. As a result, there is a pressing need to reduce the experimental workload and to assess the aggressiveness of different strains more efficiently. We think that using genetic markers not only to estimate genotypic diversity but also, with machine learning techniques, to identify the mating type and fungicide response can guide and speed up decision-making in late blight management, especially when mating type and metalaxyl resistance data are not available. This technique can also be applied to determine these phenotypic traits for dead isolates. In this study, over 600 P. infestans isolates from different populations (Estonia, the Pskov region, and Poland) were classified for mating type and metalaxyl response using machine learning techniques based on simple sequence repeat (SSR) markers. For both traits, random forest and the support vector machine achieved good accuracy of over 70%, whereas the decision tree and artificial neural network models were less accurate. There were also associations (p < 0.05) between the traits and some of the alleles detected, but machine learning prediction based on multilocus SSR genotypes offered better prediction accuracy.
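
The abstract does not say how the multilocus SSR genotypes were turned into classifier inputs. A minimal sketch of one common approach, assuming each allele observed at each locus becomes a binary presence/absence feature, is shown below. The function name, locus names and allele sizes are hypothetical illustrations, not the authors' encoding; the resulting matrix could then be passed to any random forest or support vector machine implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each isolate maps an SSR locus name to the
# allele sizes (in base pairs) observed at that locus.
Genotype = Dict[str, Tuple[int, ...]]


def allele_presence_matrix(
    isolates: Sequence[Genotype],
) -> Tuple[List[str], List[List[int]]]:
    """Encode multilocus SSR genotypes as binary allele-presence features.

    Each feature is a "locus:allele" pair; a row holds 1 if the isolate
    carries that allele at that locus and 0 otherwise (missing loci give 0).
    """
    # Collect every distinct (locus, allele) pair seen across all isolates.
    features = sorted(
        {
            (locus, allele)
            for g in isolates
            for locus, alleles in g.items()
            for allele in alleles
        }
    )
    names = [f"{locus}:{allele}" for locus, allele in features]
    # One row per isolate, one column per (locus, allele) feature.
    matrix = [
        [1 if allele in g.get(locus, ()) else 0 for locus, allele in features]
        for g in isolates
    ]
    return names, matrix


# Hypothetical example with two isolates typed at two loci.
isolates = [
    {"Pi02": (160, 162), "Pi04": (166,)},
    {"Pi02": (162,), "Pi04": (170,)},
]
names, matrix = allele_presence_matrix(isolates)
print(names)   # ['Pi02:160', 'Pi02:162', 'Pi04:166', 'Pi04:170']
print(matrix)  # [[1, 1, 1, 0], [0, 1, 0, 1]]
```

Presence/absence coding keeps the feature count equal to the number of distinct alleles, which stays manageable for a modest SSR panel; an alternative would be to use allele sizes directly as numeric features.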

Kulshreshtha A, Bhatnagar S. Structural effect of the H992D/H418D mutation of angiotensin-converting enzyme in the Indian population: implications for health and disease. J Biomol Struct Dyn 2024:1-18. [PMID: 38411559 DOI: 10.1080/07391102.2024.2321246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 02/14/2024] [Indexed: 02/28/2024]

Koshko L, Scofield S, Debarba L, Stilgenbauer L, Fakhoury P, Jayarathne H, Perez-Mojica JE, Griggs E, Lempradl A, Sadagurski M. Prenatal benzene exposure in mice alters offspring hypothalamic development predisposing to metabolic disease in later life. Chemosphere 2023;330:138738. [PMID: 37084897 PMCID: PMC10199724 DOI: 10.1016/j.chemosphere.2023.138738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 04/10/2023] [Accepted: 04/18/2023] [Indexed: 05/03/2023]

Abstract

Maternal exposure to environmental contaminants during pregnancy poses a significant threat to a developing fetus, as these substances can easily cross the placenta and disrupt the neurodevelopment of offspring. Specifically, the hypothalamus is essential in the regulation of metabolism, notably during critical windows of development. An abnormal hormonal and inflammatory milieu during development can trigger persistent changes in the function of hypothalamic circuits, leading to long-lasting effects on the body's energy homeostasis and metabolism. We recently demonstrated that gestational exposure to clinically relevant levels of benzene induces severe metabolic dysregulation in the offspring. Given the central role of the hypothalamus in metabolic control, we hypothesized that prenatal exposure to benzene impacts hypothalamic development, contributing to the adverse metabolic effects in the offspring. C57BL/6JB dams were exposed to benzene at 50 ppm in inhalation chambers exclusively during pregnancy (from E0.5 to E19). Transcriptomic analysis of the exposed offspring at postnatal day 21 (P21) revealed hypothalamic changes in genes related to metabolic regulation, inflammation, and neurodevelopment exclusively in males. Moreover, the hypothalamus of prenatally benzene-exposed male offspring displayed alterations in orexigenic and anorexigenic projections, impairments in leptin signaling, and increased microgliosis. Additional exposure to benzene during lactation did not promote further microgliosis or astrogliosis in the offspring, while a high-fat diet (HFD) challenge in adulthood exacerbated impairments in glucose metabolism and hypothalamic inflammation in benzene-exposed offspring of both sexes. These findings reveal the persistent adverse effects of prenatal benzene exposure on hypothalamic circuits and neuroinflammation, predisposing the offspring to long-lasting metabolic health conditions.

Zhang Y, Zhang X, Li F, Lin C, Zhang D, Duan B, Zhao Y, Li X, Xu D, Cheng J, Zhao L, Wang J, Wang W. Expression profiles of the CD274 and PLEKHH2 gene and association of its polymorphism with hematologic parameters in sheep. Vet Immunol Immunopathol 2023;259:110597. [PMID: 37094535 DOI: 10.1016/j.vetimm.2023.110597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 04/10/2023] [Accepted: 04/20/2023] [Indexed: 04/26/2023]

Affiliation(s)

Yukun Zhang College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Xiaoxue Zhang College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
Fadi Li College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Changchun Lin College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
Deyin Zhang College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Benzhen Duan Department of Medical Microbiology and Parasitology, School of Basic Medical Sciences, Fudan University, Shanghai 200433, China; Key Laboratory of Medical Molecular Virology, MOE & NHC, School of Basic Medical Sciences, Fudan University, Shanghai 200433, China
Yuan Zhao College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Xiaolong Li College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Dan Xu College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Jiangbo Cheng College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
Liming Zhao College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
Jianghui Wang College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
Weimin Wang College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China.

Koshko L, Scofield S, Debarba L, Stilgenbauer L, Sacla M, Fakhoury P, Jayarathne H, Perez-Mojica JE, Griggs E, Lempradl A, Sadagurski M. Prenatal benzene exposure alters offspring hypothalamic development predisposing to metabolic disease in later life. bioRxiv 2023:2023.01.05.522910. [PMID: 36711607 PMCID: PMC9881982 DOI: 10.1101/2023.01.05.522910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]

Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021;184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]

Affiliation(s)

Dustin Griesemer Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
James R Xue Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
Steven K Reilly Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
Jacob C Ulirsch Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Kalki Kukreja Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
Joe R Davis BigHat Biosciences, San Carlos, CA 94070, USA
Masahiro Kanai Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
David K Yang Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
John C Butts The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
Mehmet H Guney Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
Jeremy Luban Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
Stephen B Montgomery Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
Hilary K Finucane Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Carl D Novina Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Ryan Tewhey The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
Pardis C Sabeti Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA

Gupta D, Choudhury A, Gupta U, Singh P, Prasad M. Computational approach to clinical diagnosis of diabetes disease: a comparative study. Multimedia Tools and Applications 2021;80:30091-30116. [DOI: 10.1007/s11042-020-10242-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2019] [Revised: 10/14/2020] [Accepted: 12/09/2020] [Indexed: 08/30/2023]

Muneeb M, Henschel A. Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods. BMC Bioinformatics 2021;22:198. [PMID: 33874881 PMCID: PMC8056510 DOI: 10.1186/s12859-021-04077-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023]

Abstract

Background

Genotype–phenotype predictions are of great importance in genetics. They can help to find genetic mutations that cause variation among human beings. The many approaches for finding such associations can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are good for finding the actual SNPs causing variation, whereas machine learning techniques are suited to cases where we just want to classify people into different categories. In this article, we examined the eye-color and type 2 diabetes phenotypes. The proposed technique is a hybrid approach combining elements of statistical techniques with machine learning.

Results

The main dataset for the eye-color phenotype consists of 806 people: 404 have blue-green eyes and 402 have brown eyes. After preprocessing, we generated 8 different datasets containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for the machine learning algorithms. We used nine classifiers (random forest, extreme gradient boosting, ANN, LSTM, GRU, BiLSTM, 1D CNN, ensembles of ANN, and ensembles of LSTM), which achieved best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96. Stacked ensembles of LSTM outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type 2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model achieved an accuracy of 0.97.
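
The three mutation categories map naturally onto an additive 0/1/2 genotype code, a common input format for the classifiers listed above. The sketch below assumes that "no", "partial" and "full" mutation mean zero, one or two copies of a non-reference allele in a diploid genotype, and that each SNP's reference allele is known. This is an interpretation rather than the authors' exact preprocessing, and the function names are hypothetical.

```python
from typing import List, Sequence


def mutation_code(genotype: str, ref: str) -> int:
    """Count non-reference alleles in a diploid genotype string such as "AG".

    Assumed mapping: 0 = no mutation (both alleles match the reference),
    1 = partial mutation (heterozygous), 2 = full mutation (homozygous
    non-reference).
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a diploid genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele != ref)


def encode_individual(genotypes: Sequence[str], refs: Sequence[str]) -> List[int]:
    """Encode one person's genotypes at several SNPs as a numeric feature vector."""
    if len(genotypes) != len(refs):
        raise ValueError("genotypes and refs must have the same length")
    return [mutation_code(g, r) for g, r in zip(genotypes, refs)]


# Hypothetical person typed at three SNPs whose reference allele is "A".
print(encode_individual(["AA", "AG", "GG"], ["A", "A", "A"]))  # [0, 1, 2]
```

Stacking one such vector per person gives a people-by-SNPs matrix; selecting SNPs by a threshold, as the abstract describes, would then amount to keeping a subset of its columns.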

Conclusion

Genotype–phenotype predictions are very useful, especially in forensics. They can help to identify associations of SNP variants with traits and diseases. With more datasets, the accuracy of machine learning predictions could be improved. Moreover, the non-linearity of the machine learning model and the combination of SNP mutations used during training improve prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.

Yu F, He B, Chen L, Wang F, Zhu H, Dong Y, Pan S. Intermuscular Fat Content in Young Chinese Men With Newly Diagnosed Type 2 Diabetes: Based on MR mDIXON-Quant Quantitative Technique. Front Endocrinol (Lausanne) 2021;12:536018. [PMID: 33868161 PMCID: PMC8044767 DOI: 10.3389/fendo.2021.536018] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/18/2020] [Accepted: 03/12/2021] [Indexed: 12/27/2022]

Abstract

OBJECTIVE

Skeletal muscle fat content is one of the important contributors to insulin resistance (IR), but its diagnostic value remains unknown, especially in the Chinese population. Therefore, we aimed to analyze differences in skeletal muscle fat content and various functional MRI parameters between diabetic patients and control subjects to evaluate the early indicators of diabetes. In addition, we aimed to investigate the associations among skeletal muscle fat content, magnetic resonance parameters of skeletal muscle function and IR in type 2 diabetic patients and control subjects.

METHODS

We enrolled 12 patients (age: 29-38 years, BMI: 25-28 kg/m²) who were newly diagnosed with type 2 diabetes (intravenous plasma glucose concentration ≥ 11.1 mmol/L or fasting blood glucose concentration ≥ 7.0 mmol/L), together with 12 control subjects (age: 26-33 years, BMI: 21-28 kg/m²). Fasting blood samples were collected for the measurement of glucose, insulin, 2-hour postprandial blood glucose (PBG2h), and glycated hemoglobin (HbA1c). Magnetic resonance scans of the lower extremity and abdomen were performed to evaluate visceral fat content, as well as skeletal muscle metabolism and function through transverse relaxation time (T2), fractional anisotropy (FA), and apparent diffusion coefficient (ADC) values.

RESULTS

We found a significant difference in intermuscular fat (IMAT) between the diabetes group and the control group (p < 0.05): the ratio of IMAT in the thigh muscles was higher in the diabetes group than in the control group. In the entire cohort, IMAT was positively correlated with HOMA-IR, HbA1c, T2, and FA, and the T2 value was correlated with HOMA-IR, PBG2h, and HbA1c (p < 0.05). There were also significant differences in T2 and FA values between the diabetes group and the control group (p < 0.05). In ROC analysis, with an IMAT cutoff of 8.85%, the sensitivity and specificity of IMAT were 100% and 83.3%, respectively. With a cutoff of 39.25 ms, the sensitivity and specificity of the T2 value were 66.7% and 91.7%, respectively. All statistical analyses were adjusted for age, BMI, and visceral fat content.
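
The cutoff figures follow from a simple counting rule. With 12 patients and 12 controls, the reported percentages are consistent with whole counts (100% = 12/12, 83.3% = 10/12, 66.7% = 8/12, 91.7% = 11/12). The sketch below computes sensitivity and specificity for the rule "a value above the cutoff is positive" and, as one common but here assumed way of choosing a cutoff from an ROC analysis, scans candidate cutoffs for the maximum Youden index. The abstract does not state how its cutoffs were chosen, the function names are hypothetical, and for simplicity the scan tests observed values rather than midpoints between them.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    case_values: Sequence[float],
    control_values: Sequence[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for the rule "value > cutoff is positive"."""
    true_pos = sum(1 for v in case_values if v > cutoff)      # cases flagged positive
    true_neg = sum(1 for v in control_values if v <= cutoff)  # controls left negative
    return true_pos / len(case_values), true_neg / len(control_values)


def youden_best_cutoff(
    case_values: Sequence[float],
    control_values: Sequence[float],
) -> Tuple[float, float]:
    """Scan every observed value as a cutoff and return (cutoff, J) maximising
    Youden's J = sensitivity + specificity - 1; ties keep the smaller cutoff."""
    best: Optional[Tuple[float, float]] = None
    for cutoff in sorted(set(case_values) | set(control_values)):
        sens, spec = sensitivity_specificity(case_values, control_values, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    if best is None:
        raise ValueError("no observed values to use as cutoffs")
    return best


# Hypothetical IMAT percentages: every patient above 8.85%, 2 of 12 controls above it.
patients = [9.0] * 12
controls = [5.0] * 10 + [9.5, 10.0]
print(sensitivity_specificity(patients, controls, 8.85))  # sensitivity 1.0, specificity 10/12
```

Youden's J weights sensitivity and specificity equally; a study prioritising screening might instead fix a minimum sensitivity and maximise specificity subject to it.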

CONCLUSION

Deposition of IMAT in skeletal muscles seems to be an important determinant of IR in type 2 diabetes. A skeletal muscle IMAT value greater than 8.85% and a T2 value greater than 39.25 ms are suggestive of IR.

Sun S, Dong B, Zou Q. Revisiting genome-wide association studies from statistical modelling to machine learning. Brief Bioinform 2020;22:5943789. [PMID: 33126243 DOI: 10.1093/bib/bbaa263] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/10/2020] [Revised: 09/06/2020] [Accepted: 09/11/2020] [Indexed: 11/14/2022]

Narmadha D, Pravin A. An intelligent computer-aided approach for target protein prediction in infectious diseases. Soft Comput 2020. [DOI: 10.1007/s00500-020-04815-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]

Giacobbo LC, Perin MAA, Pereira TM, Garmendia MO, Reichow A, Melo AC, Castilhos BB, Trevilatto PC. RANK / RANKL / OPG gene polymorphisms and loss of orthodontic mini‐implants. Orthod Craniofac Res 2020;23:210-222. [DOI: 10.1111/ocr.12360] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/10/2019] [Revised: 12/09/2019] [Accepted: 12/10/2019] [Indexed: 01/20/2023]

Xu Y, Cao L, Zhao X, Yao Y, Liu Q, Zhang B, Wang Y, Mao Y, Ma Y, Ma JZ, Payne TJ, Li MD, Li L. Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches. Front Psychiatry 2020;11:416. [PMID: 32477189 PMCID: PMC7241440 DOI: 10.3389/fpsyt.2020.00416] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/24/2019] [Accepted: 04/23/2020] [Indexed: 12/22/2022]

Affiliation(s)

Yi Xu State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Liyu Cao State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Xinyi Zhao State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Yinghao Yao State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Qiang Liu State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Bin Zhang State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Yan Wang State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Ying Mao State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Yunlong Ma State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Jennie Z Ma Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
Thomas J Payne Department of Otolaryngology and Communicative Sciences, University of Mississippi Medical Center, Jackson, MS, United States
Ming D Li State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
Lanjuan Li State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China

GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs. G3: Genes, Genomes, Genetics 2019;9:2823-2834. [PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023]

Rybicka M, Woziwodzka A, Romanowski T, Sznarkowska A, Stalke P, Dręczewski M, Bielawski KP. Host genetic background affects the course of infection and treatment response in patients with chronic hepatitis B. J Clin Virol 2019;120:1-5. [PMID: 31505315 DOI: 10.1016/j.jcv.2019.09.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/20/2019] [Revised: 08/02/2019] [Accepted: 09/02/2019] [Indexed: 01/16/2023]

Naidenov B, Lim A, Willyerd K, Torres NJ, Johnson WL, Hwang HJ, Hoyt P, Gustafson JE, Chen C. Pan-Genomic and Polymorphic Driven Prediction of Antibiotic Resistance in Elizabethkingia. Front Microbiol 2019;10:1446. [PMID: 31333599 PMCID: PMC6622151 DOI: 10.3389/fmicb.2019.01446] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/09/2019] [Accepted: 06/07/2019] [Indexed: 01/21/2023]

Elmansy D, Koyutürk M. Cross-population analysis for functional characterization of type II diabetes variants. BMC Bioinformatics 2019;20:320. [PMID: 31216985 PMCID: PMC6584529 DOI: 10.1186/s12859-019-2835-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]

Abstract

BACKGROUND

As Genome-Wide Association Studies (GWAS) have been increasingly used with data from various populations, it has been observed that data from different populations reveal different sets of Single Nucleotide Polymorphisms (SNPs) that are associated with the same disease. Using Type II Diabetes (T2D) as a test case, we develop measures and methods to characterize the functional overlap of SNPs associated with the same disease across populations.

RESULTS

We introduce the notion of an Overlap Matrix as a general means of characterizing the functional overlap between different SNP sets at different genomic and functional granularities. Using SNP-to-gene mapping, functional annotation databases, and functional association networks, we assess the degree of functional overlap across nine populations of Asian and European ethnic origin. We further assess the generalizability of the method by applying it to a dataset for another complex disease, prostate cancer. Our results show that more overlap is captured as more functional data are incorporated at each stage of the pipeline, from SNP-level to network-level overlap analyses. We hypothesize that these observed differences in the functional mechanisms of T2D across populations can also explain the common use of different prescription drugs in different populations. We show that this hypothesis is concordant with the literature on the functional mechanisms of prescription drugs.
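
The abstract does not define the entries of the Overlap Matrix. As a minimal sketch under the assumption that each entry is a Jaccard index between two populations' sets, the code below builds a pairwise overlap matrix and shows how coarsening SNPs to genes can reveal overlap that is invisible at the SNP level. The function names, SNP IDs and gene labels are hypothetical placeholders, not the authors' data or method.

```python
from typing import Dict, Mapping, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed overlap measure: len(A ∩ B) / len(A ∪ B), or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_matrix(sets: Mapping[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Jaccard overlap between every pair of named sets (e.g. populations)."""
    return {p: {q: jaccard(sets[p], sets[q]) for q in sets} for p in sets}


def to_genes(snps: Set[str], snp_to_gene: Mapping[str, str]) -> Set[str]:
    """Coarsen a SNP set to the genes its SNPs map to; unmapped SNPs are dropped."""
    return {snp_to_gene[s] for s in snps if s in snp_to_gene}


# Hypothetical example: two populations with disjoint SNP hits that share a gene.
snp_sets = {"PopA": {"rs_a1", "rs_a2"}, "PopB": {"rs_b1"}}
snp_to_gene = {"rs_a1": "GENE_X", "rs_a2": "GENE_Y", "rs_b1": "GENE_X"}
snp_level = overlap_matrix(snp_sets)
gene_level = overlap_matrix({p: to_genes(s, snp_to_gene) for p, s in snp_sets.items()})
print(snp_level["PopA"]["PopB"], gene_level["PopA"]["PopB"])  # 0.0 0.5
```

The same `overlap_matrix` call could be repeated on annotation terms or network neighbourhoods to mimic coarser granularities, which is the intuition behind overlap growing along the pipeline.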

CONCLUSION

Our results show that although the etiology of a complex disease can be associated with distinct processes that are affected in different populations, network-based annotations can capture more functional overlap across populations. These results support the notion that it can be useful to take ethnicity into account in making personalized treatment decisions for complex diseases.

Hwa HL, Wu MY, Lin CP, Hsieh WH, Yin HI, Lee TT, Lee JCI. A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier. Forensic Sci Med Pathol 2019;15:67-74. [PMID: 30649693 DOI: 10.1007/s12024-018-0071-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Accepted: 12/05/2018] [Indexed: 11/26/2022]

Vivian‐Griffiths T, Baker E, Schmidt KM, Bracher‐Smith M, Walters J, Artemiou A, Holmans P, O'Donovan MC, Owen MJ, Pocklington A, Escott‐Price V. Predictive modeling of schizophrenia from genomic data: Comparison of polygenic risk score with kernel support vector machines approach. Am J Med Genet B Neuropsychiatr Genet 2019;180:80-85. [PMID: 30516002 PMCID: PMC6492016 DOI: 10.1002/ajmg.b.32705] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/20/2018] [Revised: 09/03/2018] [Accepted: 11/09/2018] [Indexed: 11/07/2022]

Affiliation(s)

Timothy Vivian‐Griffiths: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Emily Baker: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Karl M. Schmidt: School of Mathematics, Cardiff University, Cardiff, United Kingdom
Matthew Bracher‐Smith: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
James Walters: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Andreas Artemiou: School of Mathematics, Cardiff University, Cardiff, United Kingdom
Peter Holmans: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Michael C. O'Donovan: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Michael J. Owen: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Andrew Pocklington: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
Valentina Escott‐Price: Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom

Valdés MG, Galván-Femenía I, Ripoll VR, Duran X, Yokota J, Gavaldà R, Rafael-Palou X, de Cid R. Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data. BMC Syst Biol 2018;12:97. [PMID: 30458782 PMCID: PMC6245589 DOI: 10.1186/s12918-018-0615-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]

Abstract

BACKGROUND

During the last decade, interest in applying machine learning algorithms to genomic data has increased in many bioinformatics applications. Analyzing this type of data entails difficulties in managing high-dimensional data, handling class imbalance for knowledge extraction, identifying important features, and classifying individuals. In this study, we propose a general framework to tackle these challenges with different machine learning algorithms and techniques. We apply this framework to lung cancer patients, identifying genetic signatures for classifying drug treatment response. We intersect the relevant SNPs with the GWAS Catalog of the National Human Genome Research Institute and explore the RegulomeDB and GTEx databases for functional analysis.

RESULTS

The machine learning based solution proposed in this study is a scalable and flexible alternative to the classical univariate regression approach for analyzing large-scale data. From 36 experiments executed using the machine learning framework, we obtain good classification performance from the top 5 models, which have the highest cross-validation score and the smallest standard deviation. A total of 1,224 SNPs corresponding to the key features of the top 20 models (mean cross-validation F1 >= 0.65) were compared with the GWAS Catalog, and none of them intersected with reported genome-wide significant hits. Among these, new genetic signatures in MAE, CEP104, PRKCZ and ADRB2 show relevant biological regulatory functionality related to lung physiology.
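The model-selection rule in this paragraph can be sketched as follows: compute the F1 score on each cross-validation fold, keep models whose mean F1 is at least 0.65, and rank them by highest mean and then smallest standard deviation. The fold scores and model names are hypothetical, and using the standard deviation only as a tie-breaker is an assumption; the abstract does not specify how the two criteria were combined.

```python
from statistics import mean, stdev


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; more informative than accuracy when classes are imbalanced."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rank_models(fold_scores, min_mean=0.65):
    """Keep models with mean CV F1 >= min_mean, sorted by mean (descending), then SD (ascending)."""
    summary = [(name, mean(s), stdev(s)) for name, s in fold_scores.items()]
    kept = [row for row in summary if row[1] >= min_mean]
    return sorted(kept, key=lambda row: (-row[1], row[2]))


# Hypothetical per-fold F1 scores for three candidate models.
scores = {
    "model_a": [0.70, 0.68, 0.72],
    "model_b": [0.66, 0.76, 0.71],
    "model_c": [0.55, 0.60, 0.58],
}
for name, m, sd in rank_models(scores):
    print(f"{name}: mean F1 = {m:.3f}, SD = {sd:.3f}")
```

Here `model_c` is dropped by the 0.65 threshold, and `model_b` ranks above `model_a` on mean F1 even though its folds vary more; a rule that weights stability more heavily would need an explicit combined score.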

CONCLUSIONS

We have defined a machine learning framework for a large, unbalanced dataset of SNP-array and imputed genotyping data from a pharmacogenomics study of lung cancer patients receiving first-line platinum-based treatment. This approach found genomic signals that did not reach genome-wide significance in the univariate regression approach (GWAS Catalog) but are valuable for classifying patients; only a few of them have a related biological function. The effects of these variants can be explained by the recently proposed omnigenic model hypothesis, which states that complex traits can be influenced not only by the "core genes", mainly found through genome-wide significant SNPs, but also, and mostly, by the remaining genes outside the "core pathways", whose biological functionality is apparently unrelated.

Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM Trans Comput Biol Bioinform 2018;15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]

Abo Alchamlat S, Farnir F. KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies. BMC Bioinformatics 2017;18:184. [PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/06/2016] [Accepted: 03/11/2017] [Indexed: 12/30/2022]

Adipose tissue macrophage in immune regulation of metabolism. Sci China Life Sci 2016;59:1232-1240. [DOI: 10.1007/s11427-016-0155-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/03/2016] [Accepted: 09/13/2016] [Indexed: 12/14/2022]

Graversen C, Olesen AE, Staahl C, Drewes AM, Farina D. Multivariate analysis of single-sweep evoked brain potentials for pharmaco-electroencephalography. Neuropsychobiology 2016;71:241-52. [PMID: 26278118 DOI: 10.1159/000375310] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/13/2014] [Accepted: 01/12/2015] [Indexed: 11/19/2022]

Abstract

BACKGROUND AND AIMS

Current findings on altered evoked potentials (EPs) caused by morphine are based on common alterations for a group of subjects after drug administration. However, this ignores the analysis of individual responses, which may explain the clinical differences in efficacy. Therefore, we explored the individual responses to morphine in terms of the altered single-sweep characteristics in a placebo-controlled crossover study. To account for multifactorial mechanisms, several characteristics were assessed simultaneously by multivariate pattern analysis (MVPA).

METHODS

EPs were recorded from 62 channels and obtained before and after morphine and placebo administration during repeated electrical stimulations of the oesophagus in 12 healthy males. Additionally, the pain detection threshold was recorded to reflect the subjective analgesic effect in each subject. The characteristics of the sweeps were extracted by a multivariate matching pursuit algorithm with Gabor atoms implemented with a variable amplitude and constant phase across the sweeps. The single-sweep amplitudes were used as input to an MVPA algorithm to discriminate individual responses. The accuracy of the MVPA for each individual subject was used for correlation analysis of the analgesic effect.

RESULTS

The mean classification accuracy when discriminating pre- and posttreatment morphine responses was 72% (p = 0.01). The individual classification accuracy was positively correlated to the analgesic effect of morphine (p = 0.03). Furthermore, the 2 posttreatment responses were classified and validated by the classification of the 2 pretreatment responses (p = 0.001).
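The abstract does not say how its p-values were computed. One generic way to ask whether a classification accuracy such as 72% exceeds chance is a label-permutation test, sketched below under that assumption. For brevity it shuffles the true labels against fixed predictions; a full permutation test for a trained classifier would instead repeat the whole training and cross-validation on each shuffled label vector. The labels are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_perm=1000, seed=0):
    """One-sided permutation p-value for the observed accuracy.

    Shuffles the true labels against the fixed predictions and counts how often
    the shuffled accuracy is at least as high as the observed one.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labelling itself,
    # so the p-value can never be exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical labels: 20 pre-treatment (0) and 20 post-treatment (1) responses.
labels = [0] * 20 + [1] * 20
obs, p = permutation_p_value(labels, labels, n_perm=999)
print(obs, p)
```

With `n_perm` permutations the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations bounds how small a reported p-value can be.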

CONCLUSIONS

The alterations in the single-sweep EPs after morphine reflect the analgesic effect. The MVPA approach is a novel methodology for monitoring the individual efficacy of analgesics.

Luna GI, da Silva ICR, Sanchez MN. Association between -308G/A TNFA Polymorphism and Susceptibility to Type 2 Diabetes Mellitus: A Systematic Review. J Diabetes Res 2016;2016:6309484. [PMID: 27822481 PMCID: PMC5086378 DOI: 10.1155/2016/6309484] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/04/2016] [Revised: 07/12/2016] [Accepted: 09/14/2016] [Indexed: 12/29/2022]

Statistical and Computational Methods for Genetic Diseases: An Overview. Comput Math Methods Med 2015;2015:954598. [PMID: 26106440 PMCID: PMC4464008 DOI: 10.1155/2015/954598] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/16/2014] [Accepted: 04/23/2015] [Indexed: 12/19/2022]

Karambataki M, Malousi A, Kouidou S. Risk-associated coding synonymous SNPs in type 2 diabetes and neurodegenerative diseases: genetic silence and the underrated association with splicing regulation and epigenetics. Mutat Res 2014;770:85-93. [PMID: 25771874 DOI: 10.1016/j.mrfmmm.2014.09.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/07/2014] [Revised: 09/15/2014] [Accepted: 09/16/2014] [Indexed: 06/04/2023]

Abstract

Single nucleotide polymorphisms (SNPs) are tentatively critical with regard to disease predisposition, but coding synonymous SNPs (sSNPs) are generally considered "neutral". Nevertheless, sSNPs in serine/arginine-rich (SR) and splice-site (SS) exonic splicing enhancers (ESEs), or in exonic CpG methylation targets, could be decisive for splicing, particularly in aging-related conditions, where mis-splicing is frequently observed. We identified 33 T2D-related genes and 28 genes related to neurodegenerative diseases (NDs) by investigating the impact of the corresponding coding sSNPs on splicing and by using gene ontology data and computational tools. Potentially critical (prominent) sSNPs comply with the following criteria: changing the splicing potential of prominent SR-ESEs or of significant SS-ESEs by >1.5 units (Δscore), or forming or deleting ESEs with the maximum splicing score. We also noted the formation or disruption of CpGs (tentative methylation sites of epigenetic sSNPs). All disease association studies involving sSNPs are also reported. Only 21/670 coding SNPs reported in the 33 T2D-related genes, mostly epigenetic, were found to be prominent coding synonymous SNPs. No prominent sSNPs have been recorded in three key T2D-related genes (GCGR, PPARGC1A, IGF1). Similarly, 20/366 coding SNPs in ND-related genes, mostly epigenetic, were identified as prominent coding synonymous SNPs. Meta-analysis showed that 17 of the above prominent sSNPs were previously investigated in association with various pathological conditions. Three out of four sSNPs (all epigenetic) were associated with T2D and one with NDs (branch site sSNP). Five were associated with other or related pathological conditions. None of the four sSNPs introducing new ESEs was found to be disease-associated. sSNPs introducing smaller Δscore changes (<1.5) in key proteins (INSR, IRS1, DISC1) were also correlated with pathological conditions.
These data reveal that genetic variation in splicing-regulatory and particularly CpG sites might be related to disease predisposition, and that in-silico analysis is useful for identifying sSNPs, which might be falsely identified as silent or synonymous.
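The "prominent sSNP" criteria in this abstract translate into a simple filter. This sketch assumes each candidate has already been restricted to a prominent SR-ESE or significant SS-ESE, treats Δscore as the change in splicing score from the reference to the alternative allele, and flags changes in either direction; the class and field names are hypothetical, and the splicing scores would come from whichever ESE-prediction tool is used.

```python
from dataclasses import dataclass


@dataclass
class SynonymousSNP:
    snp_id: str                          # hypothetical identifier
    ref_score: float                     # ESE splicing score with the reference allele
    alt_score: float                     # ESE splicing score with the alternative allele
    max_score_ese_changed: bool = False  # a maximum-score ESE is formed or deleted

    @property
    def delta_score(self) -> float:
        return self.alt_score - self.ref_score


def is_prominent(snp: SynonymousSNP, threshold: float = 1.5) -> bool:
    """Flag an sSNP whose |delta score| exceeds the threshold or that forms/deletes a maximum-score ESE."""
    return abs(snp.delta_score) > threshold or snp.max_score_ese_changed


candidates = [
    SynonymousSNP("sSNP_1", ref_score=3.0, alt_score=1.2),  # delta about -1.8 -> prominent
    SynonymousSNP("sSNP_2", ref_score=2.0, alt_score=2.4),  # delta about 0.4 -> not prominent
    SynonymousSNP("sSNP_3", ref_score=2.0, alt_score=2.1, max_score_ese_changed=True),
]
print([s.snp_id for s in candidates if is_prominent(s)])  # ['sSNP_1', 'sSNP_3']
```

Because the threshold test is strict (`>`), a change of exactly 1.5 units is not flagged, matching the ">1.5 units" wording.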

de Oliveira FC, Borges CCH, Almeida FN, e Silva FF, da Silva Verneque R, da Silva MVGB, Arbex W. SNPs selection using support vector regression and genetic algorithms in GWAS. BMC Genomics 2014;15 Suppl 7:S4. [PMID: 25573332 PMCID: PMC4243330 DOI: 10.1186/1471-2164-15-s7-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]

Genome-wide association studies identified novel loci for non-high-density lipoprotein cholesterol and its postprandial lipemic response. Hum Genet 2014;133:919-30. [PMID: 24604477 DOI: 10.1007/s00439-014-1435-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/03/2013] [Accepted: 02/24/2014] [Indexed: 12/27/2022]

Qian Y, Besenbacher S, Mailund T, Schierup MH. Identifying disease associated genes by network propagation. BMC Syst Biol 2014;8 Suppl 1:S6. [PMID: 24565229 PMCID: PMC4080512 DOI: 10.1186/1752-0509-8-s1-s6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/21/2023]

Danila MI, Reynolds RJ, Tiwari HK, Bridges SL. Ethnic-specific genetic analyses in rheumatoid arthritis: incremental gains but valuable contributions to the big picture. Arthritis Rheum 2013;65:3014-6. [PMID: 23918636 DOI: 10.1002/art.38111] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/10/2013] [Accepted: 07/25/2013] [Indexed: 12/29/2022]

A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. Biomed Res Int 2013;2013:432375. [PMID: 24228248 PMCID: PMC3818807 DOI: 10.1155/2013/432375] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 07/17/2013] [Revised: 08/26/2013] [Accepted: 08/27/2013] [Indexed: 01/04/2023]

Hajiloo M, Damavandi B, Hooshsadat M, Sangi F, Mackey JR, Cass CE, Greiner R, Damaraju S. Breast cancer prediction using genome wide single nucleotide polymorphism data. BMC Bioinformatics 2013;14 Suppl 13:S3. [PMID: 24266904 PMCID: PMC3891310 DOI: 10.1186/1471-2105-14-s13-s3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]

Abstract

BACKGROUND

This paper introduces and applies a genome wide predictive study to learn a model that predicts whether a new subject will develop breast cancer or not, based on her SNP profile.

RESULTS

We first genotyped 696 female subjects (348 breast cancer cases and 348 apparently healthy controls), predominantly of Caucasian origin, from Alberta, Canada, using Affymetrix Human SNP 6.0 arrays. Next, we applied the EIGENSTRAT population stratification correction method to remove 73 subjects not belonging to the Caucasian population. We then filtered out any SNP that had any missing calls, whose genotype frequency deviated from Hardy-Weinberg equilibrium, or whose minor allele frequency was less than 5%. Finally, we applied a combination of the MeanDiff feature selection method and the KNN learning method to this filtered dataset to produce a breast cancer prediction model. The LOOCV accuracy of this classifier is 59.55%. Random permutation tests show that this result is significantly better than the baseline accuracy of 51.52%. Sensitivity analysis shows that the classifier is fairly robust to the number of MeanDiff-selected SNPs. External validation on the CGEMS breast cancer dataset, the only other publicly available breast cancer dataset, shows that this combination of MeanDiff and KNN leads to a LOOCV accuracy of 60.25%, which is significantly better than its baseline of 50.06%. We then considered a dozen different combinations of feature selection and learning methods, but found that none of these combinations produced a better predictive model than ours. We also considered various biological feature selection methods to produce predictive models, such as selecting SNPs reported in recent genome wide association studies to be associated with breast cancer, selecting SNPs in genes associated with KEGG cancer pathways, or selecting SNPs associated with breast cancer in the F-SNP database, but again found that none of these models achieved accuracy better than baseline.
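A compact sketch of a MeanDiff + KNN pipeline evaluated by LOOCV follows. It assumes MeanDiff ranks SNPs by the absolute difference between the mean additive genotype (0, 1 or 2 minor alleles) of cases and controls, and that KNN uses squared Euclidean distance with a majority vote; the number of SNPs, the number of neighbours and the toy data are hypothetical. SNP selection is repeated inside each leave-one-out fold so that the held-out subject never influences which SNPs are chosen, a standard precaution against optimistic accuracy estimates; the abstract does not state the authors' exact protocol.

```python
from collections import Counter


def mean_diff_select(X, y, k):
    """Indices of the k SNPs with the largest |mean(cases) - mean(controls)|."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]

    def col_mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    n_snps = len(X[0])
    scores = [abs(col_mean(cases, j) - col_mean(controls, j)) for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: -scores[j])[:k]


def knn_predict(train_X, train_y, x, features, n_neighbors):
    """Majority label among the n_neighbors closest subjects on the selected SNPs."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k_snps, n_neighbors=3):
    """Leave-one-out accuracy, re-running MeanDiff selection inside every fold."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = mean_diff_select(train_X, train_y, k_snps)
        prediction = knn_predict(train_X, train_y, X[i], features, n_neighbors)
        correct += int(prediction == y[i])
    return correct / len(X)


# Hypothetical toy data: rows are subjects, columns are SNPs (0/1/2 coding).
X = [[2, 0], [2, 1], [2, 2], [2, 0], [0, 1], [0, 2], [0, 0], [0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control
print(mean_diff_select(X, y, 1), loocv_accuracy(X, y, k_snps=1))
```

With real genome-wide data the nested selection dominates the cost, since MeanDiff is recomputed over all SNPs once per held-out subject.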

CONCLUSIONS

We anticipate producing more accurate breast cancer prediction models by recruiting more study subjects, providing more accurate labelling of phenotypes (to accommodate the heterogeneity of breast cancer), measuring other genomic alterations such as point mutations and copy number variations, and incorporating non-genetic information about subjects such as environmental and lifestyle factors.

Wang T, Ji X, Luo C, Fan J, Hou Z, Chen M, Han R, Ni C. Polymorphisms in SELE gene and risk of coal workers' pneumoconiosis in Chinese: a case-control study. PLoS One 2013;8:e73254. [PMID: 24066042 PMCID: PMC3774684 DOI: 10.1371/journal.pone.0073254] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/25/2013] [Accepted: 07/18/2013] [Indexed: 12/14/2022]

Upstill-Goddard R, Eccles D, Ennis S, Rafiq S, Tapper W, Fliege J, Collins A. Support Vector Machine classifier for estrogen receptor positive and negative early-onset breast cancer. PLoS One 2013;8:e68606. [PMID: 23894323 PMCID: PMC3716652 DOI: 10.1371/journal.pone.0068606] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/11/2013] [Accepted: 05/30/2013] [Indexed: 12/20/2022]

Kang C, Yu H, Yi GS. Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data. BMC Med Inform Decis Mak 2013;13 Suppl 1:S3. [PMID: 23566118 PMCID: PMC3618247 DOI: 10.1186/1472-6947-13-s1-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]

Abstract

Background

Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity.

Methods

We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration, which compares the error rates of SNP combinations obtained under various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning, can enhance the statistical power of SNP combinations. T2D causal SNP combinations are selected from the optimal SNP dataset using random forests with variable selection. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) that considers pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. Prediction error rates are measured for SNP sets from functional module-based filtration, which selects, from the genome-wide SNPs, those SNPs lying within functional modules identified by expanded GSEA.
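The filtration step described here combines a significance threshold with LD pruning. The sketch below shows one simple version under stated assumptions: a Bonferroni threshold of alpha divided by the number of tested SNPs, and greedy pruning that walks SNPs from most to least significant and drops any SNP whose squared genotype correlation (used here as a proxy for LD r²) with an already kept SNP reaches a cut-off. The SNP names, p-values, alpha and r² cut-off are hypothetical, and the p-value range-based thresholds and error-rate comparison are not modelled.

```python
def r_squared(a, b):
    """Squared Pearson correlation of two genotype vectors (a common proxy for LD r^2)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return 0.0 if va == 0 or vb == 0 else cov * cov / (va * vb)


def bonferroni_ld_filter(pvalues, genotypes, alpha=0.05, r2_max=0.8):
    """Keep SNPs with p < alpha / m, then greedily drop those in LD with a more significant kept SNP."""
    threshold = alpha / len(pvalues)
    candidates = sorted((p, snp) for snp, p in pvalues.items() if p < threshold)
    kept = []
    for _, snp in candidates:
        if all(r_squared(genotypes[snp], genotypes[k]) < r2_max for k in kept):
            kept.append(snp)
    return kept


# Hypothetical association p-values and genotypes (0/1/2) for six subjects.
pvalues = {"snp1": 1e-8, "snp2": 1e-7, "snp3": 1e-6, "snp4": 0.2}
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # perfectly correlated with snp1 -> pruned
    "snp3": [0, 0, 1, 1, 2, 2],  # r^2 = 0.25 with snp1 -> kept
    "snp4": [1, 1, 0, 2, 1, 0],  # fails the Bonferroni threshold
}
print(bonferroni_ld_filter(pvalues, genotypes))  # ['snp1', 'snp3']
```

Varying `alpha` (or replacing the Bonferroni cut with p-value ranges) and comparing downstream error rates would be the natural way to extend this toward the optimal filtration the abstract describes.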

Results

A T2D causal SNP combination containing 101 SNPs is selected from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset using optimal filtration criteria, with an error rate of 10.25%. Matching these 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration show no significant difference from those of randomly selected SNP sets and of T2D causal SNP combinations from optimal filtration.

Conclusions

We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.

Malovini A, Barbarini N, Bellazzi R, de Michelis F. Hierarchical Naive Bayes for genetic association studies. BMC Bioinformatics 2012;13 Suppl 14:S6. [PMID: 23095471 PMCID: PMC3439732 DOI: 10.1186/1471-2105-13-s14-s6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]

Lee H, Wang B, Wu X, Zhang H, Xu F. Decision tree classifier makes genotyping more intuitive and more efficient. Tissue Antigens 2012;80:188-190. [PMID: 22708606 DOI: 10.1111/j.1399-0039.2012.01901.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2012] [Revised: 04/28/2012] [Accepted: 05/11/2012] [Indexed: 06/01/2023]

Pahikkala T, Okser S, Airola A, Salakoski T, Aittokallio T. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms Mol Biol 2012;7:11. [PMID: 22551170 PMCID: PMC3606421 DOI: 10.1186/1748-7188-7-11] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/08/2011] [Accepted: 04/23/2012] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that analyze single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants, both individually and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible.

RESULTS

We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS.
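To make the selection criterion concrete, here is a deliberately naive greedy forward selection for regularized least squares (RLS, ridge regression without an intercept). It assumes, as is usual for greedy RLS, that each candidate feature is scored by the leave-one-out (LOO) squared error, computed with the standard hat-matrix shortcut e_i / (1 - H_ii) instead of n refits. Unlike the algorithm described above, it recomputes every candidate from scratch, so it does not have the linear running time obtained through the matrix-calculus shortcuts; the function names, the regularization value and the toy data are hypothetical.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def loo_error(X, y, features, lam):
    """Mean squared LOO error of RLS on the given columns, via e_i / (1 - H_ii)."""
    rows = [[x[j] for j in features] for x in X]
    k = len(features)
    a = [[sum(r[p] * r[q] for r in rows) + (lam if p == q else 0.0) for q in range(k)]
         for p in range(k)]
    a_inv = invert(a)
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    w = [sum(a_inv[p][q] * xty[q] for q in range(k)) for p in range(k)]
    total = 0.0
    for r, yi in zip(rows, y):
        pred = sum(wp * rp for wp, rp in zip(w, r))
        h = sum(r[p] * a_inv[p][q] * r[q] for p in range(k) for q in range(k))
        total += ((yi - pred) / (1.0 - h)) ** 2
    return total / len(y)


def greedy_rls(X, y, n_select, lam=1.0):
    """Naive greedy forward selection that adds the feature minimising LOO error."""
    selected = []
    for _ in range(n_select):
        remaining = [j for j in range(len(X[0])) if j not in selected]
        best = min(remaining, key=lambda j: loo_error(X, y, selected + [j], lam))
        selected.append(best)
    return selected


# Hypothetical toy data: 6 subjects, 3 features, labels coded as +1 / -1.
y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
X = [
    [1.0, 1.0, 0.0],
    [1.0, -1.0, 0.0],
    [1.0, 1.0, 0.0],
    [-1.0, -1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [-1.0, -1.0, 0.0],
]
print(greedy_rls(X, y, n_select=1))
```

Scoring each candidate this way costs a fresh k-by-k inversion per feature per step, which is exactly the cost the paper's shortcuts avoid; the sketch only shows what is being optimized.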

CONCLUSIONS

Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.

Bogan JS. Regulation of glucose transporter translocation in health and diabetes. Annu Rev Biochem 2012;81:507-32. [PMID: 22482906 DOI: 10.1146/annurev-biochem-060109-094246] [Citation(s) in RCA: 191] [Impact Index Per Article: 15.9] [Indexed: 11/09/2022]

Clarke D, Bhardwaj N, Gerstein MB. Novel insights through the integration of structural and functional genomics data with protein networks. J Struct Biol 2012;179:320-6. [PMID: 22343087 DOI: 10.1016/j.jsb.2012.02.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/21/2011] [Revised: 02/02/2012] [Accepted: 02/02/2012] [Indexed: 12/13/2022]

Tokuda Y, Yagi T, Yoshii K, Ikeda Y, Fuwa M, Ueno M, Nakano M, Omi N, Tanaka M, Mori K, Kageyama M, Nagasaki I, Yagi K, Kinoshita S, Tashiro K. An approach to predict the risk of glaucoma development by integrating different attribute data. Springerplus 2012;1:41. [PMID: 23961367 PMCID: PMC3725912 DOI: 10.1186/2193-1801-1-41] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/29/2012] [Accepted: 10/15/2012] [Indexed: 11/10/2022]

Cosgun E, Limdi NA, Duarte CW. High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans. Bioinformatics 2011;27:1384-9. [PMID: 21450715 DOI: 10.1093/bioinformatics/btr159] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 12/30/2022]

Roshan U, Chikkagoudar S, Wei Z, Wang K, Hakonarson H. Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest. Nucleic Acids Res 2011;39:e62. [PMID: 21317188 PMCID: PMC3089490 DOI: 10.1093/nar/gkr064] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]

Machine learning techniques for single nucleotide polymorphism--disease classification models in schizophrenia. Molecules 2010;15:4875-89. [PMID: 20657396 PMCID: PMC6257637 DOI: 10.3390/molecules15074875] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/04/2010] [Revised: 07/08/2010] [Accepted: 07/09/2010] [Indexed: 11/16/2022]

Lin HJ, Huang YC, Lin JM, Wu JY, Chen LA, Lin CJ, Tsui YP, Chen CP, Tsai FJ. Single-nucleotide polymorphisms in chromosome 3p14.1- 3p14.2 are associated with susceptibility of type 2 diabetes with cataract. Mol Vis 2010;16:1206-14. [PMID: 20664687 PMCID: PMC2901187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2010] [Accepted: 06/24/2010] [Indexed: 12/04/2022]