1
Agho CA, Śliwka J, Nassar H, Niinemets Ü, Runno-Paurson E. Machine Learning-Based Identification of Mating Type and Metalaxyl Response in Phytophthora infestans Using SSR Markers. Microorganisms 2024; 12:982. [PMID: 38792811 PMCID: PMC11124124 DOI: 10.3390/microorganisms12050982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024] Open
Abstract
Phytophthora infestans is the causal agent of late blight in potato. The occurrence of P. infestans with both A1 and A2 mating types in the field may result in sexual reproduction and the generation of recombinant strains. Such strains, with new combinations of traits, can be highly aggressive and resistant to fungicides, making the disease difficult to control in the field. Metalaxyl-resistant isolates are now more prevalent in potato fields. Understanding the genetic structure of P. infestans and rapidly identifying its mating types and metalaxyl response in the field are prerequisites for effective late blight monitoring and management. In studies of the genotypic and phenotypic diversity of P. infestans, molecular marker assays and phenotypic assays, such as those for mating type and metalaxyl response, are typically conducted separately. As a result, there is a pressing need to reduce the experimental workload and to assess the aggressiveness of different strains more efficiently. We think that employing genetic markers not only to estimate genotypic diversity but also to identify the mating type and fungicide response using machine learning techniques can guide and speed up decision-making in late blight disease management, especially when mating type and metalaxyl resistance data are not available. This technique can also be applied to determine these phenotypic traits for dead isolates. In this study, over 600 P. infestans isolates from different populations (Estonia, the Pskov region, and Poland) were classified for mating type and metalaxyl response using machine learning techniques based on simple sequence repeat (SSR) markers. For both traits, the random forest and support vector machine models achieved accuracies of over 70%, whereas the decision tree and artificial neural network models were less accurate. There were also associations (p < 0.05) between the traits and some of the alleles detected, but machine learning prediction based on multilocus SSR genotypes offered better prediction accuracy.
Affiliation(s)
- Collins A. Agho
- Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Kreutzwaldi 1, 51006 Tartu, Estonia
- Jadwiga Śliwka
- Plant Breeding and Acclimatization Institute—National Research Institute in Radzików, Department of Potato Genetics and Parental Lines, Platanowa Str. 19, 05-831 Młochów, Poland
- Helina Nassar
- Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Kreutzwaldi 1, 51006 Tartu, Estonia
- Ülo Niinemets
- Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Kreutzwaldi 1, 51006 Tartu, Estonia
- Estonian Academy of Sciences, Kohtu 6, 10130 Tallinn, Estonia
- Eve Runno-Paurson
- Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Kreutzwaldi 1, 51006 Tartu, Estonia
2
Kulshreshtha A, Bhatnagar S. Structural effect of the H992D/H418D mutation of angiotensin-converting enzyme in the Indian population: implications for health and disease. J Biomol Struct Dyn 2024:1-18. [PMID: 38411559 DOI: 10.1080/07391102.2024.2321246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 02/14/2024] [Indexed: 02/28/2024]
Abstract
Nonsynonymous SNPs (nsSNPs) of the renin-angiotensin system (RAS) pathway that are unique to the Indian population were investigated in view of the pathway's importance as an endocrine system. nsSNPs of the RAS pathway genes were mined from the IndiGenome database. Damaging nsSNPs were predicted using SIFT, PredictSNP, SNPs&GO, SNAP2 and Protein Variation Effect Analyzer. Loss of function was predicted based on protein stability change using I-Mutant, PremPS and ConSurf. The structural impact of the nsSNPs was predicted using HOPE and Missense3D, followed by modeling, refinement, and energy minimization. Molecular dynamics studies were carried out using GROMACS v2021.1. Twenty-three Indian nsSNPs of the RAS pathway genes were selected for structural analysis, and 8 were predicted to be damaging. Further sequence analysis showed that the change of the HEMGH zinc-binding motif to HEMGD in the somatic ACE-C domain (sACE-C) H992D and testis ACE (tACE) H418D variants resulted in loss of zinc coordination, which is essential for enzymatic activity in this metalloprotease. There was a loss of internal interactions around the zinc coordination residues in the protein structural network. This was also confirmed by principal component analysis, the free energy landscape and residue contact maps. Both mutations lead to broadening of the AngI binding cavity. The H992D mutation in sACE-C is likely to be favorable for cardiovascular health, but may lead to renal abnormalities with a secondary impact on the heart. H418D in tACE is potentially associated with male infertility. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Akanksha Kulshreshtha
- Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi, India
- Sonika Bhatnagar
- Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi, India
3
Koshko L, Scofield S, Debarba L, Stilgenbauer L, Fakhoury P, Jayarathne H, Perez-Mojica JE, Griggs E, Lempradl A, Sadagurski M. Prenatal benzene exposure in mice alters offspring hypothalamic development predisposing to metabolic disease in later life. Chemosphere 2023; 330:138738. [PMID: 37084897 PMCID: PMC10199724 DOI: 10.1016/j.chemosphere.2023.138738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 04/10/2023] [Accepted: 04/18/2023] [Indexed: 05/03/2023]
Abstract
Maternal exposure to environmental contaminants during pregnancy poses a significant threat to a developing fetus, as these substances can easily cross the placenta and disrupt the neurodevelopment of offspring. Specifically, the hypothalamus is essential in the regulation of metabolism, notably during critical windows of development. An abnormal hormonal and inflammatory milieu during development can trigger persistent changes in the function of hypothalamic circuits, leading to long-lasting effects on the body's energy homeostasis and metabolism. We recently demonstrated that gestational exposure to clinically relevant levels of benzene induces severe metabolic dysregulation in the offspring. Given the central role of the hypothalamus in metabolic control, we hypothesized that prenatal exposure to benzene impacts hypothalamic development, contributing to the adverse metabolic effects in the offspring. C57BL/6JB dams were exposed to benzene at 50 ppm in the inhalation chambers exclusively during pregnancy (from E0.5 to E19). Transcriptomic analysis of the exposed offspring at postnatal day 21 (P21) revealed hypothalamic changes in genes related to metabolic regulation, inflammation, and neurodevelopment exclusively in males. Moreover, the hypothalamus of prenatally benzene-exposed male offspring displayed alterations in orexigenic and anorexigenic projections, impairments in leptin signaling, and increased microgliosis. Additional exposure to benzene during lactation did not promote further microgliosis or astrogliosis in the offspring, while the high-fat diet (HFD) challenge in adulthood exacerbated glucose metabolism and hypothalamic inflammation in benzene-exposed offspring of both sexes. These findings reveal the persistent adverse effects of prenatal benzene exposure on hypothalamic circuits and neuroinflammation, predisposing the offspring to long-lasting metabolic health conditions.
Affiliation(s)
- Lisa Koshko
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA
- Sydney Scofield
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA
- Lucas Debarba
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA
- Lukas Stilgenbauer
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA
- Patrick Fakhoury
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA
- Hashan Jayarathne
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA
- Ellen Griggs
- Van Andel Research Institute, Grand Rapids, MI, USA
- Marianna Sadagurski
- Department of Biological Sciences, Institute of Environmental Health Sciences, Integrative Biosciences Center (IBio), Wayne State University, Detroit, MI, USA.
4
Zhang Y, Zhang X, Li F, Lin C, Zhang D, Duan B, Zhao Y, Li X, Xu D, Cheng J, Zhao L, Wang J, Wang W. Expression profiles of the CD274 and PLEKHH2 gene and association of its polymorphism with hematologic parameters in sheep. Vet Immunol Immunopathol 2023; 259:110597. [PMID: 37094535 DOI: 10.1016/j.vetimm.2023.110597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 04/10/2023] [Accepted: 04/20/2023] [Indexed: 04/26/2023]
Abstract
CD274 and PLEKHH2 have been identified as genes related to immunity and to multiple diseases, and have recently garnered significant interest. However, their role in regulating immune functions in sheep remains largely unexplored. In this study, we aimed to investigate the effects of polymorphisms in CD274 and PLEKHH2 on hematologic parameters in 915 sheep. Our results showed that the CD274 and PLEKHH2 genes were most highly expressed in the spleen and tail fat, respectively, as determined by qRT-PCR. We also identified a G to A mutation (g.11858 G > A) in exon 4 of CD274, and a C to G mutation (g.38384 C > G) in intron 8 of PLEKHH2. Association analysis revealed that CD274 g.11858 G > A was significantly associated with RBC, HCT, MCHC, and MCV (P < 0.05), while PLEKHH2 g.38384 C > G was significantly associated with HCT, MPV, MCHC, and MCV (P < 0.05). These results suggest that the CD274 and PLEKHH2 genes may play a role in regulating blood physiological indicators and could be potential functional candidates for influencing immune traits in sheep breeding programs.
Affiliation(s)
- Yukun Zhang
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Xiaoxue Zhang
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Fadi Li
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Changchun Lin
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Deyin Zhang
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Benzhen Duan
- Department of Medical Microbiology and Parasitology, School of Basic Medical Sciences, Fudan University, Shanghai 200433, China; Key Laboratory of Medical Molecular Virology, MOE & NHC, School of Basic Medical Sciences, Fudan University, Shanghai 200433, China
- Yuan Zhao
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Xiaolong Li
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Dan Xu
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Jiangbo Cheng
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China
- Liming Zhao
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Jianghui Wang
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Weimin Wang
- College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, China; State Key Laboratory of Grassland Agro-Ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, Ministry of Education, Lanzhou University, Lanzhou 730020, China.
5
Koshko L, Scofield S, Debarba L, Stilgenbauer L, Sacla M, Fakhoury P, Jayarathne H, Perez-Mojica JE, Griggs E, Lempradl A, Sadagurski M. Prenatal benzene exposure alters offspring hypothalamic development predisposing to metabolic disease in later life. bioRxiv 2023:2023.01.05.522910. [PMID: 36711607 PMCID: PMC9881982 DOI: 10.1101/2023.01.05.522910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
The hypothalamus is essential in the regulation of metabolism, notably during critical windows of development. An abnormal hormonal and inflammatory milieu during development can trigger persistent changes in the function of hypothalamic circuits, leading to long-lasting effects on the body’s energy homeostasis and metabolism. We recently demonstrated that gestational exposure to benzene at smoking levels induces severe metabolic dysregulation in the offspring. Given the central role of the hypothalamus in metabolic control, we hypothesized that prenatal exposure to benzene impacts hypothalamic development, contributing to the adverse metabolic effects in the offspring. C57BL/6JB dams were exposed to benzene in the inhalation chambers exclusively during pregnancy (from E0.5 to E19). The transcriptome analysis of the offspring hypothalamus at postnatal day 21 (P21) revealed changes in genes related to metabolic regulation, inflammation, and neurodevelopment exclusively in benzene-exposed male offspring. Moreover, the hypothalamus of prenatally benzene-exposed male offspring displayed alterations in orexigenic and anorexigenic projections, impairments in leptin signaling, and increased microgliosis. Additional exposure to benzene during lactation did not promote further microgliosis or astrogliosis in the offspring, while the high-fat diet (HFD) challenge in adulthood exacerbated glucose metabolism and hypothalamic inflammation in benzene-exposed offspring of both sexes. These findings reveal the persistent impact of prenatal benzene exposure on hypothalamic circuits and neuroinflammation, predisposing the offspring to long-lasting metabolic health conditions.
6
Griesemer D, Xue JR, Reilly SK, Ulirsch JC, Kukreja K, Davis JR, Kanai M, Yang DK, Butts JC, Guney MH, Luban J, Montgomery SB, Finucane HK, Novina CD, Tewhey R, Sabeti PC. Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell 2021; 184:5247-5260.e19. [PMID: 34534445 PMCID: PMC8487971 DOI: 10.1016/j.cell.2021.08.025] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 12/12/2020] [Revised: 05/25/2021] [Accepted: 08/19/2021] [Indexed: 12/11/2022]
Abstract
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Affiliation(s)
- Dustin Griesemer
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
- Steven K Reilly
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA
- Jacob C Ulirsch
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Kalki Kukreja
- Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
- Joe R Davis
- BigHat Biosciences, San Carlos, CA 94070, USA
- Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA 02115, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- David K Yang
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA
- John C Butts
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA
- Mehmet H Guney
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Jeremy Luban
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Carl D Novina
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME 04609, USA; Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME 04469, USA; Tufts University School of Medicine, Boston, MA 02111, USA
- Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02143, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02143, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
7
Gupta D, Choudhury A, Gupta U, Singh P, Prasad M. Computational approach to clinical diagnosis of diabetes disease: a comparative study. Multimedia Tools and Applications 2021; 80:30091-30116. [DOI: 10.1007/s11042-020-10242-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2019] [Revised: 10/14/2020] [Accepted: 12/09/2020] [Indexed: 08/30/2023]
8
Muneeb M, Henschel A. Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods. BMC Bioinformatics 2021; 22:198. [PMID: 33874881 PMCID: PMC8056510 DOI: 10.1186/s12859-021-04077-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023] Open
Abstract
Background: Genotype–phenotype predictions are of great importance in genetics. These predictions can help to find genetic mutations causing variations in human beings. There are many approaches for finding the association, which can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are good for finding the actual SNPs causing variation, whereas machine learning techniques are good where we just want to classify people into different categories. In this article, we examined the eye-color and type-2 diabetes phenotypes. The proposed technique is a hybrid approach that combines parts of statistical techniques with machine learning. Results: The main dataset for the eye-color phenotype consists of 806 people: 404 have blue-green eyes and 402 have brown eyes. After preprocessing, we generated 8 different datasets containing different numbers of SNPs, using the mutation difference and thresholding at each individual SNP. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. After that, the data were transformed for machine learning algorithms. We used about 9 classifiers (RandomForest, Extreme Gradient Boosting, ANN, LSTM, GRU, BILSTM, 1DCNN, ensembles of ANN, and ensembles of LSTM), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTM outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type-2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97. Conclusion: Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variant associations with traits and diseases. Given more datasets, the prediction performance of machine learning models can be increased. Moreover, the non-linearity in the machine learning model and the combination of SNP mutations while training the model increase the prediction performance. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
Affiliation(s)
- Muhammad Muneeb
- Department of Electrical Engineering and Computer Science, Center for Biotechnology Khalifa University, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Andreas Henschel
- Department of Electrical Engineering and Computer Science, Center for Biotechnology Khalifa University, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
9
Yu F, He B, Chen L, Wang F, Zhu H, Dong Y, Pan S. Intermuscular Fat Content in Young Chinese Men With Newly Diagnosed Type 2 Diabetes: Based on MR mDIXON-Quant Quantitative Technique. Front Endocrinol (Lausanne) 2021; 12:536018. [PMID: 33868161 PMCID: PMC8044767 DOI: 10.3389/fendo.2021.536018] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/18/2020] [Accepted: 03/12/2021] [Indexed: 12/27/2022] Open
Abstract
OBJECTIVE: Skeletal muscle fat content is one of the important contributors to insulin resistance (IR), but its diagnostic value remains unknown, especially in the Chinese population. Therefore, we aimed to analyze differences in skeletal muscle fat content and various functional MRI parameters between diabetic patients and control subjects to evaluate the early indicators of diabetes. In addition, we aimed to investigate the associations among skeletal muscle fat content, magnetic resonance parameters of skeletal muscle function and IR in type 2 diabetic patients and control subjects. METHODS: We enrolled 12 patients (age: 29-38 years, BMI: 25-28 kg/m²) who were newly diagnosed with type 2 diabetes (intravenous plasma glucose concentration ≥ 11.1 mmol/l or fasting blood glucose concentration ≥ 7.0 mmol/l), together with 12 control subjects (age: 26-33 years, BMI: 21-28 kg/m²). Fasting blood samples were collected for the measurement of glucose, insulin, 2-hour postprandial blood glucose (PBG2h), and glycated hemoglobin (HbA1c). Magnetic resonance scans of the lower extremity and abdomen were performed to evaluate visceral fat content as well as skeletal muscle metabolism and function through transverse relaxation time (T2), fractional anisotropy (FA) and apparent diffusion coefficient (ADC) values. RESULTS: We found a significant difference in intermuscular fat (IMAT) between the diabetes group and the control group (p < 0.05); the ratio of IMAT in the thigh muscles of the diabetes group was higher than that of the control group. In the entire cohort, IMAT was positively correlated with HOMA-IR, HbA1c, T2, and FA, and the T2 value was correlated with HOMA-IR, PBG2h and HbA1c (p < 0.05). There were also significant differences in T2 and FA values between the diabetes group and the control group (p < 0.05). According to the ROC analysis, with an IMAT cutoff value of 8.85%, the sensitivity and specificity of IMAT were 100% and 83.3%, respectively. With a T2 cutoff value of 39.25 ms, the sensitivity and specificity of the T2 value were 66.7% and 91.7%, respectively. All the statistical analyses were adjusted for age, BMI and visceral fat content. CONCLUSION: Deposition of IMAT in skeletal muscles seems to be an important determinant of IR in type 2 diabetes. A skeletal muscle IMAT value greater than 8.85% and a T2 value greater than 39.25 ms are suggestive of IR.
Affiliation(s)
- Fuyao Yu
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, China
- Bing He
- Department of Endocrinology, Shengjing Hospital of China Medical University, Shenyang, China
- Li Chen
- Department of Medicine, Medical College of Georgia, Georgia Prevention Institute, Augusta, GA, United States
- Fengzhe Wang
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, China
- Haidong Zhu
- Department of Medicine, Medical College of Georgia, Georgia Prevention Institute, Augusta, GA, United States
- Yanbin Dong
- Department of Medicine, Medical College of Georgia, Georgia Prevention Institute, Augusta, GA, United States
- Shinong Pan
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, China
- *Correspondence: Shinong Pan,
10
Sun S, Dong B, Zou Q. Revisiting genome-wide association studies from statistical modelling to machine learning. Brief Bioinform 2020; 22:5943789. [PMID: 33126243 DOI: 10.1093/bib/bbaa263] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/10/2020] [Revised: 09/06/2020] [Accepted: 09/11/2020] [Indexed: 11/14/2022] Open
Abstract
Over the last decade, genome-wide association studies (GWAS) have discovered thousands of genetic variants underlying complex human diseases and agriculturally important traits. These findings have been utilized to dissect the biological basis of diseases, to develop new drugs, to advance precision medicine and to boost breeding. However, the potential of GWAS is still underexploited due to methodological limitations. Many challenges have emerged, including detecting epistasis and single-nucleotide polymorphisms (SNPs) with small effects and distinguishing causal variants from other SNPs associated through linkage disequilibrium. These issues have motivated advancements in GWAS analyses in two contrasting cultures-statistical modelling and machine learning. In this review, we systematically present the basic concepts and the benefits and limitations in both methods. We further discuss recent efforts to mitigate their weaknesses. Additionally, we summarize the state-of-the-art tools for detecting the missed signals, ultrarare mutations and gene-gene interactions and for prioritizing SNPs. Our work can offer both theoretical and practical guidelines for performing GWAS analyses and for developing further new robust methods to fully exploit the potential of GWAS.
Affiliation(s)
- Shanwen Sun
- Institute of Fundamental and Frontier Sciences at the University of Electronic Science and Technology of China, Chengdu, China
| | - Benzhi Dong
- College of Computer Science and Engineering, Northeast Forestry University, Harbin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences at the University of Electronic Science and Technology of China, Chengdu, China
|
11
|
Narmadha D, Pravin A. An intelligent computer-aided approach for target protein prediction in infectious diseases. Soft Comput 2020. [DOI: 10.1007/s00500-020-04815-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]
|
12
|
Giacobbo LC, Perin MAA, Pereira TM, Garmendia MO, Reichow A, Melo AC, Castilhos BB, Trevilatto PC. RANK/RANKL/OPG gene polymorphisms and loss of orthodontic mini-implants. Orthod Craniofac Res 2020; 23:210-222. [DOI: 10.1111/ocr.12360] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/10/2019] [Revised: 12/09/2019] [Accepted: 12/10/2019] [Indexed: 01/20/2023]
Affiliation(s)
- Thaís Munhoz Pereira
- School of Life Sciences, Pontifícia Universidade Católica do Paraná, Curitiba, Brazil
|
13
|
Xu Y, Cao L, Zhao X, Yao Y, Liu Q, Zhang B, Wang Y, Mao Y, Ma Y, Ma JZ, Payne TJ, Li MD, Li L. Prediction of Smoking Behavior From Single Nucleotide Polymorphisms With Machine Learning Approaches. Front Psychiatry 2020; 11:416. [PMID: 32477189 PMCID: PMC7241440 DOI: 10.3389/fpsyt.2020.00416] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/24/2019] [Accepted: 04/23/2020] [Indexed: 12/22/2022] Open
Abstract
Smoking is a complex behavior with a heritability as high as 50%. Given such a large genetic contribution, it provides an opportunity to prevent those individuals who are susceptible to smoking dependence from ever starting to smoke by predicting their inherited predisposition with their genomic profiles. Although previous studies have identified many susceptibility variants for smoking, they have limited power to predict smoking behavior. We applied the support vector machine (SVM) and random forest (RF) methods to build prediction models for smoking behavior. We first used 1,431 smokers and 1,503 non-smokers of African origin for model building with a 10-fold cross-validation and then tested the prediction models on an independent dataset consisting of 213 smokers and 224 non-smokers. The SVM model with 500 top single nucleotide polymorphisms (SNPs) selected using logistic regression (p < 0.01) as the feature selection method achieved an area under the curve (AUC) of 0.691, 0.721, and 0.720 for the training, test, and independent test samples, respectively. The RF model with 500 top SNPs selected using logistic regression (p < 0.01) achieved AUCs of 0.671, 0.665, and 0.667 for the training, test, and independent test samples, respectively. Finally, we used the combined logistic (p < 0.01) and LASSO (λ = 10⁻³) regression to select features and the SVM algorithm for model building. The SVM model with 500 top SNPs achieved AUCs of 0.756, 0.776, and 0.897 for the training, test, and independent test samples, respectively. We conclude that machine learning methods are promising means to build predictive models for smoking.
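The AUC values quoted in this abstract can be read as the probability that a randomly chosen smoker receives a higher model score than a randomly chosen non-smoker. The following is a minimal sketch of that rank-based computation. The function name `auc_from_scores` and the scores and labels are hypothetical and are not taken from the study, which does not describe its evaluation code.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (illustrative only, not data from the paper).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]  # 1 = smoker, 0 = non-smoker
auc = auc_from_scores(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the reported values between roughly 0.67 and 0.90.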
Affiliation(s)
- Yi Xu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Liyu Cao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xinyi Zhao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yinghao Yao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Qiang Liu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Bin Zhang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yan Wang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Ying Mao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yunlong Ma
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jennie Z Ma
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States
| | - Thomas J Payne
- Department of Otolaryngology and Communicative Sciences, University of Mississippi Medical Center, Jackson, MS, United States
| | - Ming D Li
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.,Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
| | - Lanjuan Li
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
|
14
|
GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs. G3-Genes Genomes Genetics 2019; 9:2823-2834. [PMID: 31296617 PMCID: PMC6723123 DOI: 10.1534/g3.119.400452] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023]
Abstract
In order to gain insight into the genetic architecture of economically important traits in pigs and to derive suitable genetic markers to improve these traits in breeding programs, many studies have been conducted to map quantitative trait loci. Shortcomings of these studies were low mapping resolution, large confidence intervals for quantitative trait loci positions and large linkage disequilibrium blocks. Here, we overcome these shortcomings by pooling four large F2 designs to produce smaller linkage disequilibrium blocks and by resequencing the founder generation at high coverage and the F1 generation at low coverage for subsequent imputation of the F2 generation to whole genome sequencing marker density. This led to the discovery of more than 32 million variants, 8 million of which have not been previously reported. The pooling of the four F2 designs enabled us to perform a joint genome-wide association study, which led to the identification of numerous significantly associated variant clusters on chromosomes 1, 2, 4, 7, 17 and 18 for the growth and carcass traits average daily gain, back fat thickness, meat fat ratio, and carcass length. We not only confirmed previously reported quantitative trait loci but also discovered new ones. As a result, several new candidate genes are discussed, among them BMP2 (bone morphogenetic protein 2), which we recently discovered in a related study. Variant effect prediction revealed that 15 high impact variants for the traits back fat thickness, meat fat ratio and carcass length were among the statistically significantly associated variants.
|
15
|
Rybicka M, Woziwodzka A, Romanowski T, Sznarkowska A, Stalke P, Dręczewski M, Bielawski KP. Host genetic background affects the course of infection and treatment response in patients with chronic hepatitis B. J Clin Virol 2019; 120:1-5. [PMID: 31505315 DOI: 10.1016/j.jcv.2019.09.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/20/2019] [Revised: 08/02/2019] [Accepted: 09/02/2019] [Indexed: 01/16/2023]
Abstract
BACKGROUND Hepatitis B virus (HBV) utilizes proteins encoded by the host to infect hepatocytes and replicate. Recently, several novel host factors have been identified and described as important to the HBV lifecycle. The influence of host genetic background on chronic hepatitis B (CHB) pathogenesis is still poorly understood. OBJECTIVES Here, we aimed to investigate the association of NTCP, FXRα, HNF1α, HNF4α, and TDP2 genetic polymorphisms with the natural course of CHB and antiviral treatment response. STUDY DESIGN We genotyped 18 single-nucleotide polymorphisms using MALDI-TOF mass spectrometry in 136 patients with CHB and 100 healthy individuals. We investigated associations of the selected polymorphisms with biochemical, serological and hepatic markers of disease progression and treatment response. RESULTS No significant differences in genotypic or allelic distribution between CHB and control groups were observed. Within TDP2, rs3087943 variations were associated with treatment response, and rs1047782 modified the risk of advanced liver inflammation. Rs7154439 within NTCP was associated with HBeAg seroconversion after 48 weeks of nucleos(t)ide analogue treatment. HNF1α genotypes were associated with treatment response, liver damage and baseline HBeAg presence. HNF4α rs1800961 predicted PEG-IFNα treatment-induced HBsAg clearance in long-term follow up. CONCLUSIONS This study indicates host genetic background relevance in the course of CHB and confirms the role of recently described genes for HBV infection. The obtained results might serve as a starting point for validation studies on the clinical application of selected genetic variants to predict individual risks of CHB-induced liver failure and treatment response.
Affiliation(s)
- Magda Rybicka
- Department of Molecular Diagnostics, Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland.
| | - Anna Woziwodzka
- Department of Molecular Diagnostics, Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland.
| | - Tomasz Romanowski
- Department of Molecular Diagnostics, Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland.
| | - Alicja Sznarkowska
- Department of Molecular Diagnostics, Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland; International Centre for Cancer Vaccine Science, University of Gdansk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland.
| | - Piotr Stalke
- Department of Infectious Diseases, Medical University of Gdansk, ul. Powstania Styczniowego 9b, 81-519 Gdynia, Poland.
| | - Marcin Dręczewski
- Department of Infectious Diseases, Medical University of Gdansk, ul. Powstania Styczniowego 9b, 81-519 Gdynia, Poland.
| | - Krzysztof Piotr Bielawski
- Department of Molecular Diagnostics, Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland.
|
16
|
Naidenov B, Lim A, Willyerd K, Torres NJ, Johnson WL, Hwang HJ, Hoyt P, Gustafson JE, Chen C. Pan-Genomic and Polymorphic Driven Prediction of Antibiotic Resistance in Elizabethkingia. Front Microbiol 2019; 10:1446. [PMID: 31333599 PMCID: PMC6622151 DOI: 10.3389/fmicb.2019.01446] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/09/2019] [Accepted: 06/07/2019] [Indexed: 01/21/2023] Open
Abstract
The Elizabethkingia are a genetically diverse genus of emerging pathogens that exhibit multidrug resistance to a range of common antibiotics. Two representative species, Elizabethkingia bruuniana and E. meningoseptica, were phenotypically tested to determine minimum inhibitory concentrations (MICs) for five antibiotics. Ultra-long read sequencing with Oxford Nanopore Technologies (ONT) and subsequent de novo assembly produced complete, gapless circular genomes for each strain. Alignment based annotation with Prokka identified 5,480 features in E. bruuniana and 5,203 features in E. meningoseptica, where none of these identified genes or gene combinations corresponded to observed phenotypic resistance values. Pan-genomic analysis, performed with an additional 19 Elizabethkingia strains, identified a core-genome size of 2,658,537 bp, 32 uniquely identifiable intrinsic chromosomal antibiotic resistance core-genes and 77 antibiotic resistance pan-genes. Using core-SNPs and pan-genes in combination with six machine learning (ML) algorithms, binary classification of clindamycin and vancomycin resistance achieved f1 scores of 0.94 and 0.84, respectively. Performance on the more challenging multiclass problem for fusidic acid, rifampin and ciprofloxacin resulted in f1 scores of 0.70, 0.75, and 0.54, respectively. By producing two sets of quality biological predictors, pan-genome genes and core-genome SNPs, from long-read sequence data and applying an ensemble of ML techniques, our results demonstrated that accurate phenotypic inference, at multiple AMR resolutions, can be achieved.
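The f1 scores reported in this abstract combine precision and recall into a single number. The sketch below shows the per-class computation and an unweighted (macro) average for a multiclass problem. The abstract does not state how the multiclass scores were averaged, so macro averaging is an assumption here, and the susceptible/intermediate/resistant labels are hypothetical.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical resistance calls: S = susceptible, I = intermediate, R = resistant.
y_true = ["S", "S", "I", "I", "R", "R"]
y_pred = ["S", "S", "I", "R", "R", "R"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

With macro averaging, a rare class that is predicted poorly lowers the score as much as a common one, which is one plausible reason multiclass scores sit below binary ones.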
Affiliation(s)
- Bryan Naidenov
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
| | - Alexander Lim
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
| | - Karyn Willyerd
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
| | - Nathanial J. Torres
- Department of Cell Biology, Microbiology and Molecular Biology, University of South Florida, Tampa, FL, United States
| | - William L. Johnson
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
| | - Hong Jin Hwang
- 110F Henry Bellmon Research Center, Bioinformatics Graduate Certificate Program and Genomics Core Facility, Oklahoma State University, Stillwater, OK, United States
| | - Peter Hoyt
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
- 110F Henry Bellmon Research Center, Bioinformatics Graduate Certificate Program and Genomics Core Facility, Oklahoma State University, Stillwater, OK, United States
| | - John E. Gustafson
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
| | - Charles Chen
- Department of Biochemistry and Molecular Biology, 246 Noble Research Center, Oklahoma State University, Stillwater, OK, United States
|
17
|
Abstract
BACKGROUND As Genome-Wide Association Studies (GWAS) have been increasingly used with data from various populations, it has been observed that data from different populations reveal different sets of Single Nucleotide Polymorphisms (SNPs) that are associated with the same disease. Using Type II Diabetes (T2D) as a test case, we develop measures and methods to characterize the functional overlap of SNPs associated with the same disease across populations. RESULTS We introduce the notion of an Overlap Matrix as a general means of characterizing the functional overlap between different SNP sets at different genomic and functional granularities. Using SNP-to-gene mapping, functional annotation databases, and functional association networks, we assess the degree of functional overlap across nine populations from Asian and European ethnic origins. We further assess the generalizability of the method by applying it to a dataset for another complex disease - Prostate Cancer. Our results show that more overlap is captured as more functional data is incorporated as we go through the pipeline, starting from SNPs and ending at network overlap analyses. We hypothesize that these observed differences in the functional mechanisms of T2D across populations can also explain the common use of different prescription drugs in different populations. We show that this hypothesis is concordant with the literature on the functional mechanisms of prescription drugs. CONCLUSION Our results show that although the etiology of a complex disease can be associated with distinct processes that are affected in different populations, network-based annotations can capture more functional overlap across populations. These results support the notion that it can be useful to take ethnicity into account in making personalized treatment decisions for complex diseases.
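The Overlap Matrix idea can be pictured as a population-by-population table whose entries measure how much two sets (of SNPs, genes, or network neighbourhoods) share. The abstract does not define the overlap measure, so the sketch below assumes the Jaccard index purely for illustration. The population names and gene identifiers are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_matrix(sets_by_population):
    """Return sorted population names and a square matrix of pairwise overlaps."""
    names = sorted(sets_by_population)
    matrix = [
        [jaccard(sets_by_population[r], sets_by_population[c]) for c in names]
        for r in names
    ]
    return names, matrix


# Hypothetical gene sets mapped from disease-associated SNPs in each population.
gene_sets = {
    "pop_A": {"g1", "g2", "g3"},
    "pop_B": {"g2", "g3", "g4"},
    "pop_C": {"g5"},
}
names, matrix = overlap_matrix(gene_sets)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

Repeating the same computation at coarser granularities (genes, then functional annotations, then network neighbours) would show whether overlap grows as more functional data is incorporated, which is the trend the abstract reports.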
Affiliation(s)
- Dalia Elmansy
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106 USA
| | - Mehmet Koyutürk
- Department of Electrical Engineering and Computer Science, Center for Proteomics and Bioinformatics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106 USA
|
18
|
Hwa HL, Wu MY, Lin CP, Hsieh WH, Yin HI, Lee TT, Lee JCI. A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier. Forensic Sci Med Pathol 2019; 15:67-74. [PMID: 30649693 DOI: 10.1007/s12024-018-0071-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Accepted: 12/05/2018] [Indexed: 11/26/2022]
Abstract
Single nucleotide polymorphism (SNP) profiling is an effective means of individual identification and ancestry inferences in forensic genetics. This study established a SNP panel for the simultaneous individual identification and ancestry assignment of Caucasian and four East and Southeast Asian populations. We analyzed 220 SNPs (125 autosomal, 17 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs) of the DNA samples from 563 unrelated individuals of five populations (89 Caucasian, 234 Taiwanese Han, 90 Filipino, 79 Indonesian and 71 Vietnamese) and 18 degraded DNA samples. Informativeness for assignment (I_n) was used to select ancestry informative SNPs (AISNPs). A machine learning classifier, support vector machine (SVM), was used for ancestry assignment. Of the 220 SNPs, 62 were individual identification SNPs (IISNPs) (51 autosomal and 11 X-chromosomal SNPs) and 191 were AISNPs (100 autosomal, 13 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs). The 51 autosomal IISNPs offered cumulative random match probabilities (cRMPs) ranging from 1.56 × 10⁻²¹ to 3.16 × 10⁻²² among these five populations. Using AISNPs with the SVM, the overall accuracy rate of ancestry inference achieved in the testing dataset between Caucasian, Taiwanese Han, and Filipino populations was 88.9%, whereas it was 70.0% between Caucasians and each of the four East and Southeast Asian populations. For the 18 degraded DNA samples with incomplete profiling, the accuracy rate of ancestry assignment was 94.4%. We have developed a 220-SNP panel for simultaneous individual identification and ethnic origin differentiation between Caucasian and the four East and Southeast Asian populations. This SNP panel may assist with DNA analysis of forensic casework.
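To give a sense of scale for the cumulative random match probabilities quoted above, the sketch below computes a cRMP for a panel of biallelic SNPs. It assumes Hardy-Weinberg equilibrium within each locus and independence between loci, which are standard simplifications and not details stated in the abstract. The allele frequencies are hypothetical.

```python
import math


def locus_rmp(p):
    """Random match probability at one biallelic SNP with allele frequency p:
    the sum of squared genotype frequencies under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def cumulative_rmp(allele_freqs):
    """Product of per-locus RMPs, assuming the loci are independent."""
    return math.prod(locus_rmp(p) for p in allele_freqs)


# Hypothetical panel: 51 SNPs, each with allele frequency 0.5 (the most
# informative case for a biallelic marker).
freqs = [0.5] * 51
crmp = cumulative_rmp(freqs)
print(f"{crmp:.2e}")
```

Under these idealised assumptions a 51-SNP panel gives a cRMP of about 1.9 × 10⁻²², the same order of magnitude as the range reported for the 51 autosomal IISNPs.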
Affiliation(s)
- Hsiao-Lin Hwa
- Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, No. 1, Sec. 1, Jen Ai Rd, Taipei, 100, Taiwan
- Department of Obstetrics and Gynecology, National Taiwan University Hospital, No. 7 Chung Shan S. Rd, Taipei, 100, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, No. 7 Chung Shan S. Rd, Taipei, 100, Taiwan
| | - Ming-Yih Wu
- Department of Obstetrics and Gynecology, National Taiwan University Hospital, No. 7 Chung Shan S. Rd, Taipei, 100, Taiwan
| | - Chih-Peng Lin
- Yourgene Bioscience, No.376-5 Fuxing Rd., Shulin Dist, New Taipei City, 238, Taiwan
| | - Wei Hsin Hsieh
- Yourgene Bioscience, No.376-5 Fuxing Rd., Shulin Dist, New Taipei City, 238, Taiwan
| | - Hsiang-I Yin
- Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, No. 1, Sec. 1, Jen Ai Rd, Taipei, 100, Taiwan
| | - Tsui-Ting Lee
- Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, No. 1, Sec. 1, Jen Ai Rd, Taipei, 100, Taiwan
| | - James Chun-I Lee
- Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, No. 1, Sec. 1, Jen Ai Rd, Taipei, 100, Taiwan.
|
19
|
Vivian‐Griffiths T, Baker E, Schmidt KM, Bracher‐Smith M, Walters J, Artemiou A, Holmans P, O'Donovan MC, Owen MJ, Pocklington A, Escott‐Price V. Predictive modeling of schizophrenia from genomic data: Comparison of polygenic risk score with kernel support vector machines approach. Am J Med Genet B Neuropsychiatr Genet 2019; 180:80-85. [PMID: 30516002 PMCID: PMC6492016 DOI: 10.1002/ajmg.b.32705] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/20/2018] [Revised: 09/03/2018] [Accepted: 11/09/2018] [Indexed: 11/07/2022]
Abstract
A major controversy in psychiatric genetics is whether nonadditive genetic interaction effects contribute to the risk of highly polygenic disorders. We applied a support vector machines (SVMs) approach, which is capable of building linear and nonlinear models using kernel methods, to classify cases from controls in a large schizophrenia case-control sample of 11,853 subjects (5,554 cases and 6,299 controls) and compared its prediction accuracy with the polygenic risk score (PRS) approach. We also investigated whether SVMs are a suitable approach to detecting nonlinear genetic effects, that is, interactions. We found that PRS provided more accurate case/control classification than either linear or nonlinear SVMs, and give a tentative explanation why PRS outperforms both multivariate regression and linear kernel SVMs. In addition, we observe that nonlinear kernel SVMs showed higher classification accuracy than linear SVMs when a large number of SNPs are entered into the model. We conclude that SVMs are a potential tool for assessing the presence of interactions, prior to searching for them explicitly.
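A polygenic risk score, in its usual form, is a weighted sum of risk-allele counts, with weights taken from GWAS effect sizes. The sketch below shows that additive structure, which is the reason a PRS cannot represent SNP-by-SNP interactions while a nonlinear kernel SVM in principle can. The function name, weights, and genotypes are hypothetical and not drawn from the study.

```python
def polygenic_risk_score(allele_counts, weights):
    """Additive PRS: sum over SNPs of (risk-allele count) x (effect-size weight)."""
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(count * w for count, w in zip(allele_counts, weights))


# Hypothetical per-SNP weights (e.g. log odds ratios from a discovery GWAS).
weights = [0.10, -0.05, 0.20]

# Hypothetical genotypes coded as 0, 1 or 2 copies of the effect allele.
person_a = [2, 1, 0]
person_b = [0, 2, 1]

prs_a = polygenic_risk_score(person_a, weights)
prs_b = polygenic_risk_score(person_b, weights)
print(round(prs_a, 3), round(prs_b, 3))
```

Because each SNP contributes independently of the others, two individuals with the same per-SNP counts always receive the same score regardless of which combinations of alleles co-occur.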
Affiliation(s)
- Timothy Vivian‐Griffiths
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Emily Baker
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Karl M. Schmidt
- School of Mathematics, Cardiff University, Cardiff, United Kingdom
| | - Matthew Bracher‐Smith
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - James Walters
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | | | - Peter Holmans
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Michael C. O'Donovan
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Michael J. Owen
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Andrew Pocklington
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Valentina Escott‐Price
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
|
20
|
Valdés MG, Galván-Femenía I, Ripoll VR, Duran X, Yokota J, Gavaldà R, Rafael-Palou X, de Cid R. Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data. BMC Systems Biology 2018; 12:97. [PMID: 30458782 PMCID: PMC6245589 DOI: 10.1186/s12918-018-0615-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]
Abstract
BACKGROUND During the last decade, interest in applying machine learning algorithms to genomic data has increased in many bioinformatics applications. Analyzing this type of data entails difficulties in managing high-dimensional data, handling class imbalance during knowledge extraction, identifying important features and classifying individuals. In this study, we propose a general framework to tackle these challenges with different machine learning algorithms and techniques. We apply the configuration of this framework on lung cancer patients, identifying genetic signatures for classifying response to drug treatment. We intersect these relevant SNPs with the GWAS Catalog of the National Human Genome Research Institute and explore the RegulomeDB and GTEx databases for functional analysis. RESULTS The machine learning based solution proposed in this study is a scalable and flexible alternative to the classical univariate regression approach for analyzing large-scale data. From 36 experiments executed using the machine learning framework design, we obtain good classification performance from the top 5 models with the highest cross-validation score and the smallest standard deviation. A total of 1,224 SNPs corresponding to the key features from the top 20 models (cross-validation F1 mean >= 0.65) were compared with the GWAS Catalog, finding no intersection with reported genome-wide significant hits. Among these, new genetic signatures in MAE, CEP104, PRKCZ and ADRB2 show relevant biological regulatory functionality related to lung physiology. CONCLUSIONS We have defined a machine learning framework using a large, unbalanced data set of SNP-array and imputed genotyping data from a pharmacogenomics study of lung cancer patients subjected to first-line platinum-based treatment. This approach found genomic signals without genome-wide significance in the univariate regression approach (GWAS Catalog) that are valuable for classifying patients, only a few of them with a related biological function. The effects of these variants can be explained by the recently proposed omnigenic model hypothesis, which states that complex traits can be influenced not only by the "core genes", mainly found through genome-wide significant SNPs, but also by the remaining genes outside the "core pathways", whose biological functions are apparently unrelated.
Affiliation(s)
- María Gabriela Valdés
- Eurecat. Technology Centre of Catalonia, Av. Diagonal 177, 9th floor, Barcelona, 08018 Spain
| | - Iván Galván-Femenía
- PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). Genomes for Life - GCAT lab Group, Badalona, Spain
| | - Vicent Ribas Ripoll
- Eurecat. Technology Centre of Catalonia, Av. Diagonal 177, 9th floor, Barcelona, 08018 Spain
| | - Xavier Duran
- PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). Genomes for Life - GCAT lab Group, Badalona, Spain
| | - Jun Yokota
- PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). CancerGenome Biology, Badalona, Spain
| | - Ricard Gavaldà
- Universitat Politècnica de Catalunya, Barcelona, Spain
- Barcelona Graduate School of Mathematics, BGSMath, Barcelona, Spain
| | - Xavier Rafael-Palou
- Eurecat. Technology Centre of Catalonia, Av. Diagonal 177, 9th floor, Barcelona, 08018 Spain
| | - Rafael de Cid
- PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). Genomes for Life - GCAT lab Group, Badalona, Spain
|
21
|
Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
In this era of genome-wide association studies (GWAS), the quest for understanding the genetic architecture of complex diseases is rapidly increasing more than ever before. The development of high throughput genotyping and next generation sequencing technologies enables genetic epidemiological analysis of large scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages to detect the SNP interactions that contribute to diseases. The issues that need to be considered when developing these models are addressed in this review. The paper also reviews the achievements in data simulation to evaluate the performance of these models. Further, it discusses the future of SNP interaction analysis.
|
22
|
Abo Alchamlat S, Farnir F. KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies. BMC Bioinformatics 2017; 18:184. [PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2016] [Accepted: 03/11/2017] [Indexed: 12/30/2022] Open
Abstract
Background Finding epistatic interactions in large association studies such as genome-wide association studies (GWAS), given the large volume of genomic data now available, is a challenging and largely unsolved problem. Few previous studies could handle genome-wide data because of the difficulty of searching a combinatorially explosive search space and of statistically evaluating epistatic interactions with a limited number of samples. Our work is a contribution to this field. We propose a novel approach combining K-Nearest Neighbors (KNN) and Multi Dimensional Reduction (MDR) methods for detecting gene-gene interactions as a possible alternative to existing algorithms, especially in situations where the number of involved determinants is high. After describing the approach, we compare our method (KNN-MDR) with a set of other top-performing methods (i.e., MDR, BOOST, BHIT, MegaSNPHunter and AntEpiSeeker) for detecting interactions, using simulated data as well as real genome-wide data. Results Experimental results on both simulated data and real genome-wide data show that KNN-MDR has interesting properties in terms of accuracy and power, and that, in many cases, it significantly outperforms its recent competitors. Conclusions The presented methodology (KNN-MDR) is valuable in the context of loci and interactions mapping and can be seen as an interesting addition to the arsenal used in complex traits analyses.
Affiliation(s)
- Sinan Abo Alchamlat
- Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liege, Belgium
- Frédéric Farnir
- Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liege, Belgium
|
23
|
Adipose tissue macrophage in immune regulation of metabolism. SCIENCE CHINA-LIFE SCIENCES 2016; 59:1232-1240. [DOI: 10.1007/s11427-016-0155-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/03/2016] [Accepted: 09/13/2016] [Indexed: 12/14/2022]
|
24
|
Graversen C, Olesen AE, Staahl C, Drewes AM, Farina D. Multivariate analysis of single-sweep evoked brain potentials for pharmaco-electroencephalography. Neuropsychobiology 2016; 71:241-52. [PMID: 26278118 DOI: 10.1159/000375310] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/13/2014] [Accepted: 01/12/2015] [Indexed: 11/19/2022]
Abstract
BACKGROUND AND AIMS Current findings on altered evoked potentials (EPs) caused by morphine are based on common alterations for a group of subjects after drug administration. However, this ignores the analysis of individual responses, which may explain the clinical differences in efficacy. Therefore, we explored the individual responses to morphine in terms of the altered single-sweep characteristics in a placebo-controlled crossover study. To account for multifactorial mechanisms, several characteristics were assessed simultaneously by multivariate pattern analysis (MVPA). METHODS EPs were recorded from 62 channels and obtained before and after morphine and placebo administration during repeated electrical stimulations of the oesophagus in 12 healthy males. Additionally, the pain detection threshold was recorded to reflect the subjective analgesic effect in each subject. The characteristics of the sweeps were extracted by a multivariate matching pursuit algorithm with Gabor atoms implemented with a variable amplitude and constant phase across the sweeps. The single-sweep amplitudes were used as input to an MVPA algorithm to discriminate individual responses. The accuracy of the MVPA for each individual subject was used for correlation analysis of the analgesic effect. RESULTS The mean classification accuracy when discriminating pre- and posttreatment morphine responses was 72% (p = 0.01). The individual classification accuracy was positively correlated to the analgesic effect of morphine (p = 0.03). Furthermore, the 2 posttreatment responses were classified and validated by the classification of the 2 pretreatment responses (p = 0.001). CONCLUSIONS The alterations in the single-sweep EPs after morphine reflect the analgesic effect. The MVPA approach is a novel methodology for monitoring the individual efficacy of analgesics.
Affiliation(s)
- Carina Graversen
- Department of Gastroenterology and Hepatology, Mech-Sense, Aalborg University Hospital, Aalborg, Denmark
|
25
|
Luna GI, da Silva ICR, Sanchez MN. Association between -308G/A TNFA Polymorphism and Susceptibility to Type 2 Diabetes Mellitus: A Systematic Review. J Diabetes Res 2016; 2016:6309484. [PMID: 27822481 PMCID: PMC5086378 DOI: 10.1155/2016/6309484] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/04/2016] [Revised: 07/12/2016] [Accepted: 09/14/2016] [Indexed: 12/29/2022] Open
Abstract
Diabetes mellitus (DM) is considered to be a worldwide epidemic disease and its type 2 form comprises more than 95% of all cases. Tumor necrosis factor-alpha (TNF-α) is a proinflammatory cytokine. Its dysregulation has been implicated in a variety of human diseases, including type 2 diabetes mellitus (T2DM). The control of expression of this cytokine is associated with insulin resistance and has a strong genetic influence. In order to understand this relationship, the literature from all case-control studies from 2000 to date was reviewed. The genotype frequency results presented in ten publications covering different ethnicities were compared. The correlation between the TNFA promoter genotypes and the risk of developing T2DM remains controversial due to the many discrepancies between the different studies available. Ethnic differences may play a role in these conflicting results, since the distribution of TNFA promoter polymorphisms differs between individuals of different racial origin. Hence, although the relationship between T2DM incidence and presence of polymorphisms at position -308 of the TNFA gene is not entirely clear, the results of these studies suggest the need for further investigation.
Affiliation(s)
- Geisa Izetti Luna
- Programa de Pós-Graduação em Saúde Coletiva, Universidade de Brasília, Brasília, DF, Brazil
- Mauro Niskier Sanchez
- Programa de Pós-Graduação em Saúde Coletiva, Universidade de Brasília, Brasília, DF, Brazil
|
26
|
Statistical and Computational Methods for Genetic Diseases: An Overview. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:954598. [PMID: 26106440 PMCID: PMC4464008 DOI: 10.1155/2015/954598] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 09/16/2014] [Accepted: 04/23/2015] [Indexed: 12/19/2022]
Abstract
The identification of causes of genetic diseases has been carried out by several approaches with increasing complexity. Innovation in genetic methodologies leads to the production of large amounts of data that need the support of statistical and computational methods to be correctly processed. The aim of the paper is to provide an overview of statistical and computational methods, with particular attention to methods for sequence analysis and complex diseases.
|
27
|
Karambataki M, Malousi A, Kouidou S. Risk-associated coding synonymous SNPs in type 2 diabetes and neurodegenerative diseases: genetic silence and the underrated association with splicing regulation and epigenetics. Mutat Res 2014; 770:85-93. [PMID: 25771874 DOI: 10.1016/j.mrfmmm.2014.09.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2014] [Revised: 09/15/2014] [Accepted: 09/16/2014] [Indexed: 06/04/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are tentatively critical with regard to disease predisposition, but coding synonymous SNPs (sSNPs) are generally considered "neutral". Nevertheless, sSNPs in serine/arginine-rich (SR) and splice-site (SS) exonic splicing enhancers (ESEs) or in exonic CpG methylation targets could be decisive for splicing, particularly in aging-related conditions, where mis-splicing is frequently observed. We identified 33 T2D-related genes and 28 genes related to neurodegenerative diseases (NDs) by investigating the impact of the corresponding coding sSNPs on splicing, using gene ontology data and computational tools. Potentially critical (prominent) sSNPs comply with the following criteria: changing the splicing potential of prominent SR-ESEs or of significant SS-ESEs by >1.5 units (Δscore), or formation/deletion of ESEs with maximum splicing score. We also noted the formation/disruption of CpGs (tentative methylation sites of epigenetic sSNPs). All disease association studies involving sSNPs are also reported. Only 21/670 coding SNPs, mostly epigenetic, reported in 33 T2D-related genes, were found to be prominent coding synonymous. No prominent sSNPs have been recorded in three key T2D-related genes (GCGR, PPARGC1A, IGF1). Similarly, 20/366 coding synonymous were identified in ND-related genes, mostly epigenetic. Meta-analysis showed that 17 of the above prominent sSNPs were previously investigated in association with various pathological conditions. Three out of four sSNPs (all epigenetic) were associated with T2D and one with NDs (branch site sSNP). Five were associated with other or related pathological conditions. None of the four sSNPs introducing new ESEs was found to be disease-associated. sSNPs introducing smaller Δscore changes (<1.5) in key proteins (INSR, IRS1, DISC1) were also correlated to pathological conditions. These data reveal that genetic variation in splicing-regulatory and particularly CpG sites might be related to disease predisposition and that in-silico analysis is useful for identifying sSNPs, which might be falsely identified as silent or synonymous.
Affiliation(s)
- M Karambataki
- Lab of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- A Malousi
- Lab of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- S Kouidou
- Lab of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
28
|
de Oliveira FC, Borges CCH, Almeida FN, e Silva FF, da Silva Verneque R, da Silva MVGB, Arbex W. SNPs selection using support vector regression and genetic algorithms in GWAS. BMC Genomics 2014; 15 Suppl 7:S4. [PMID: 25573332 PMCID: PMC4243330 DOI: 10.1186/1471-2164-15-s7-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Introduction This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in simulated database 1, with additive effects only, and in the real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of this total, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, initially with 50,752 SNPs, we reduced the set to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): with the application of the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel, many redundant markers were eliminated, increasing the prediction accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels.
|
29
|
Genome-wide association studies identified novel loci for non-high-density lipoprotein cholesterol and its postprandial lipemic response. Hum Genet 2014; 133:919-30. [PMID: 24604477 DOI: 10.1007/s00439-014-1435-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2013] [Accepted: 02/24/2014] [Indexed: 12/27/2022]
Abstract
Non-high-density lipoprotein cholesterol (NHDL) is an independent and superior predictor of CVD risk as compared to low-density lipoprotein alone. It represents a spectrum of atherogenic lipid fractions with possibly a distinct genomic signature. We performed genome-wide association studies (GWAS) to identify loci influencing baseline NHDL and its postprandial lipemic (PPL) response. We carried out GWAS in 4,241 participants of European descent. Our discovery cohort included 928 subjects from the Genetics of Lipid-Lowering Drugs and Diet Network Study. Our replication cohorts included 3,313 subjects from the Heredity and Phenotype Intervention Heart Study and Family Heart Study. A linear mixed model using the kinship matrix was used for association tests. The best association signal was found in a tri-genic region at RHOQ-PIGF-CRIPT for baseline NHDL (lead SNP rs6544903, discovery p = 7e-7, MAF = 2%; validation p = 6e-4 at 0.1 kb upstream neighboring SNP rs3768725, and 5e-4 at 0.7 kb downstream neighboring SNP rs6733143, MAF = 10%). The lead and neighboring SNPs were not perfect surrogate proxies to each other (D' = 1, r² = 0.003) but they seemed to be partially dependent (likelihood ratio test p = 0.04). Other suggestive loci (discovery p < 1e-6) included LOC100419812 and LOC100288337 for baseline NHDL, and LOC100420502 and CDH13 for NHDL PPL response that were not replicated (p > 0.01). The current and first GWAS of NHDL yielded an interesting common variant in RHOQ-PIGF-CRIPT influencing baseline NHDL levels. Another common variant in CDH13 for NHDL response to dietary high-fat intake challenge was also suggested. Further validations for both loci from large independent studies, especially interventional studies, are warranted.
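The pairing of D' = 1 with a very small r² reported above can look contradictory; it typically arises when two SNPs have very different minor allele frequencies. The following is a minimal Python sketch under stated assumptions: the haplotype frequency is hypothetical (the abstract gives only approximate MAFs of 2% and 10%), and the assumption that the two minor alleles never share a haplotype is ours, not the paper's. The function name `ld_stats` is illustrative.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying both minor alleles
    p_a, p_b: minor allele frequencies at the two loci
    """
    d = p_ab - p_a * p_b
    # D' normalises |D| by its maximum attainable value given the allele frequencies.
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical inputs: MAFs of 2% and 10%, minor alleles never on the same haplotype.
d, d_prime, r2 = ld_stats(p_ab=0.0, p_a=0.02, p_b=0.10)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.4f}")
```

Under these assumed inputs D' reaches 1 while r² stays near 0.002, the same order of magnitude as the value in the abstract: complete LD in the D' sense does not make one SNP a good proxy for the other.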
|
30
|
Qian Y, Besenbacher S, Mailund T, Schierup MH. Identifying disease associated genes by network propagation. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 1:S6. [PMID: 24565229 PMCID: PMC4080512 DOI: 10.1186/1752-0509-8-s1-s6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
Background Genome-wide association studies have identified many individual genes associated with complex traits. However, pathway and network information have not been fully exploited in searches for genetic determinants, and including this information may increase our understanding of the underlying biology of common diseases. Results In this study, we propose a framework to address this problem in a principled way, with the underlying hypothesis that complex disease operates through multiple connected genes. Associations inferred from GWAS are translated into prior scores for vertices in a protein-protein interaction network, and these scores are propagated through the network. Permutation is used to select genes that are guilty-by-association and thus consistently obtain high scores after network propagation. We apply the approach to Crohn's disease data and call candidate genes that have been reported by other independent GWAS, but not in the analysed data set. A prediction model based on these candidate genes shows good predictive power as measured by the area under the receiver operating characteristic curve (AUC) in 10-fold cross-validation. Conclusions Our network propagation method applied to a genome-wide association study increases association findings over other approaches.
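The idea of propagating GWAS-derived prior scores through an interaction network can be sketched as follows. This is an illustration only: the abstract does not specify the propagation scheme, so the sketch assumes a random walk with restart, and the four-node graph, node names, `restart` value and function name `propagate` are all hypothetical.

```python
def propagate(adjacency, prior, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency: {node: set of neighbours}; every node needs at least one neighbour
    prior: {node: non-negative prior score}; must not sum to zero
    """
    nodes = sorted(adjacency)
    total = sum(prior.get(n, 0.0) for n in nodes)
    p0 = {n: prior.get(n, 0.0) / total for n in nodes}
    scores = dict(p0)
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current score.
            inflow = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            new[n] = restart * p0[n] + (1 - restart) * inflow
        delta = max(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical path graph A - B - C - D with all prior evidence on A.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = propagate(graph, {"A": 1.0})
for node in sorted(scores):
    print(node, round(scores[node], 4))
```

Nodes close to the seed receive more of the propagated score than distant ones, which is the guilt-by-association effect described above. A permutation step, as in the abstract, would repeat the propagation with shuffled priors to judge which scores are unusually high.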
|
31
|
Danila MI, Reynolds RJ, Tiwari HK, Bridges SL. Ethnic-specific genetic analyses in rheumatoid arthritis: incremental gains but valuable contributions to the big picture. ARTHRITIS AND RHEUMATISM 2013; 65:3014-6. [PMID: 23918636 DOI: 10.1002/art.38111] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/10/2013] [Accepted: 07/25/2013] [Indexed: 12/29/2022]
|
32
|
A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. BIOMED RESEARCH INTERNATIONAL 2013; 2013:432375. [PMID: 24228248 PMCID: PMC3818807 DOI: 10.1155/2013/432375] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/17/2013] [Revised: 08/26/2013] [Accepted: 08/27/2013] [Indexed: 01/04/2023]
Abstract
Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and with environmental factors to affect complex multifactorial diseases. These gene-gene interactions are also denoted as epistasis, a phenomenon that cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Hence, several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have been used to identify susceptibility genes in such common, multifactorial diseases. This paper gives an overview of machine learning methods, describing the methodology of each method and its application in detecting gene-gene and gene-environment interactions. Lastly, the paper discusses each machine learning method and presents its strengths and weaknesses in detecting gene-gene interactions in complex human disease.
|
33
|
Hajiloo M, Damavandi B, Hooshsadat M, Sangi F, Mackey JR, Cass CE, Greiner R, Damaraju S. Breast cancer prediction using genome wide single nucleotide polymorphism data. BMC Bioinformatics 2013; 14 Suppl 13:S3. [PMID: 24266904 PMCID: PMC3891310 DOI: 10.1186/1471-2105-14-s13-s3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This paper introduces and applies a genome wide predictive study to learn a model that predicts whether a new subject will develop breast cancer or not, based on her SNP profile. RESULTS We first genotyped 696 female subjects (348 breast cancer cases and 348 apparently healthy controls), predominantly of Caucasian origin from Alberta, Canada using Affymetrix Human SNP 6.0 arrays. Then, we applied the EIGENSTRAT population stratification correction method to remove 73 subjects not belonging to the Caucasian population. Then, we filtered out any SNP that had any missing calls, whose genotype frequency deviated from Hardy-Weinberg equilibrium, or whose minor allele frequency was less than 5%. Finally, we applied a combination of the MeanDiff feature selection method and the KNN learning method to this filtered dataset to produce a breast cancer prediction model. The LOOCV accuracy of this classifier is 59.55%. Random permutation tests show that this result is significantly better than the baseline accuracy of 51.52%. Sensitivity analysis shows that the classifier is fairly robust to the number of MeanDiff-selected SNPs. External validation on the CGEMS breast cancer dataset, the only other publicly available breast cancer dataset, shows that this combination of MeanDiff and KNN leads to a LOOCV accuracy of 60.25%, which is significantly better than its baseline of 50.06%. We then considered a dozen different combinations of feature selection and learning method, but found that none of these combinations produces a better predictive model than our model. We also considered various biological feature selection methods like selecting SNPs reported in recent genome wide association studies to be associated with breast cancer, selecting SNPs in genes associated with KEGG cancer pathways, or selecting SNPs associated with breast cancer in the F-SNP database to produce predictive models, but again found that none of these models achieved accuracy better than baseline.
CONCLUSIONS We anticipate producing more accurate breast cancer prediction models by recruiting more study subjects, providing more accurate labelling of phenotypes (to accommodate the heterogeneity of breast cancer), measuring other genomic alterations such as point mutations and copy number variations, and incorporating non-genetic information about subjects such as environmental and lifestyle factors.
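The pipeline named in this abstract (feature selection, then KNN, scored by leave-one-out cross-validation) can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the abstract does not define MeanDiff, so the sketch assumes it ranks SNPs by the absolute difference in mean genotype between cases and controls; the eight-subject dataset and all function names are hypothetical. Feature selection is repeated inside each fold so the held-out subject never influences which SNPs are chosen.

```python
from collections import Counter


def mean_diff_rank(X, y, k_features):
    """Indices of the k features with the largest |mean(cases) - mean(controls)|."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    diffs = []
    for j in range(len(X[0])):
        m1 = sum(r[j] for r in cases) / len(cases)
        m0 = sum(r[j] for r in controls) / len(controls)
        diffs.append((abs(m1 - m0), j))
    diffs.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in diffs[:k_features]]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(train_X)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k_features, k_neighbours):
    """Leave-one-out accuracy with feature selection redone in every fold."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        selected = mean_diff_rank(train_X, train_y, k_features)
        reduced = [[row[j] for j in selected] for row in train_X]
        query = [X[i][j] for j in selected]
        correct += knn_predict(reduced, train_y, query, k_neighbours) == y[i]
    return correct / len(X)


# Hypothetical genotypes coded 0/1/2; only the first SNP separates cases (1) from controls (0).
X = [[2, 0, 1], [2, 1, 0], [2, 2, 1], [2, 1, 2],
     [0, 1, 1], [0, 0, 2], [0, 2, 0], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(loocv_accuracy(X, y, k_features=1, k_neighbours=3))
```

The toy data are perfectly separable by construction, so the accuracy here says nothing about real SNP data, where the abstract reports roughly 60%.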
|
34
|
Wang T, Ji X, Luo C, Fan J, Hou Z, Chen M, Han R, Ni C. Polymorphisms in SELE gene and risk of coal workers' pneumoconiosis in Chinese: a case-control study. PLoS One 2013; 8:e73254. [PMID: 24066042 PMCID: PMC3774684 DOI: 10.1371/journal.pone.0073254] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2013] [Accepted: 07/18/2013] [Indexed: 12/14/2022] Open
Abstract
Background Coal workers' pneumoconiosis (CWP) is characterized by chronic pulmonary inflammation and fibrotic nodular lesions that usually lead to progressive fibrosis. Inflammation is the first step in the development of CWP. E-selectin, an adhesion molecule, is involved in the development of various inflammatory diseases. Methods We investigated the association between the functional polymorphisms in SELE and the risk of CWP in a Han Chinese population. Three polymorphisms (T1880C/rs5355, T1559C/rs5368, A16089G/rs4786) in SELE were genotyped and analyzed in a case-control study with 697 CWP cases and 694 controls. The genotyping was based on the TaqMan method with the ABI 7900HT Real Time PCR system. Results The SELE rs5368 CT genotype was associated with a significantly increased risk of CWP (OR = 1.28, 95% CI = 1.02–1.60, P = 0.03) relative to the CC genotype. Classification and regression tree (CART) and multifactor dimensionality reduction (MDR) analyses were used to predict the interactions among risk factors of CWP. The MDR analysis found that the best interaction model was the two-factor model that contains pack-years smoked and SELE rs5368 genotypes. For non-smokers, the CART analysis showed an increased risk of CWP for carriers of the SELE rs5368 variant genotype compared with the common genotype (OR = 1.51; 95% CI = 1.11–2.05, P = 0.0069). Conclusion The results suggest that the T1559C/rs5368 polymorphism and smoking are involved in the susceptibility to CWP. Further studies are warranted to validate these findings.
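Odds ratios with 95% confidence intervals, as quoted in this abstract, can be computed from a 2x2 genotype-by-status table. The sketch below uses the standard log-odds (Woolf) interval and is only an illustration: the counts are hypothetical because the abstract does not report genotype counts, and the published estimates may be covariate-adjusted, which this unadjusted calculation would not reproduce. The function name `odds_ratio_ci` is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval.

    a: cases with the variant genotype      b: cases with the reference genotype
    c: controls with the variant genotype   d: controls with the reference genotype
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
or_, lo, hi = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

An interval whose lower bound exceeds 1, as in both results quoted above, indicates an association that is significant at the 5% level.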
Affiliation(s)
- Ting Wang
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Xiaoming Ji
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Chen Luo
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Jingjing Fan
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Zhiguo Hou
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Minjuan Chen
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Ruhui Han
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Chunhui Ni
- Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China
|
35
|
Upstill-Goddard R, Eccles D, Ennis S, Rafiq S, Tapper W, Fliege J, Collins A. Support Vector Machine classifier for estrogen receptor positive and negative early-onset breast cancer. PLoS One 2013; 8:e68606. [PMID: 23894323 PMCID: PMC3716652 DOI: 10.1371/journal.pone.0068606] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2013] [Accepted: 05/30/2013] [Indexed: 12/20/2022] Open
Abstract
Two major breast cancer sub-types are defined by the expression of estrogen receptors on tumour cells. Cancers with large numbers of receptors are termed estrogen receptor positive and those with few are estrogen receptor negative. Using genome-wide single nucleotide polymorphism genotype data for a sample of early-onset breast cancer patients we developed a Support Vector Machine (SVM) classifier from 200 germline variants associated with estrogen receptor status (p<0.0005). Using a linear kernel Support Vector Machine, we achieved classification accuracy exceeding 93%. The model indicates that polygenic variation in more than 100 genes is likely to underlie the estrogen receptor phenotype in early-onset breast cancer. Functional classification of the genes involved identifies enrichment of functions linked to the immune system, which is consistent with the current understanding of the biological role of estrogen receptors in breast cancer.
Affiliation(s)
- Rosanna Upstill-Goddard
- Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Diana Eccles
- Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Sarah Ennis
- Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Sajjad Rafiq
- Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- William Tapper
- Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Joerg Fliege
- Centre for Operational Research, Management Science and Information Systems, University of Southampton, Southampton, United Kingdom
- Andrew Collins
- Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
|
36
|
Kang C, Yu H, Yi GS. Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data. BMC Med Inform Decis Mak 2013; 13 Suppl 1:S3. [PMID: 23566118 PMCID: PMC3618247 DOI: 10.1186/1472-6947-13-s1-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Background Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity. Methods We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) considering pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. The prediction error rates are measured for SNP sets from functional module-based filtration, which selects SNPs within functional modules from genome-wide SNPs based on expanded GSEA. Results A T2D causal SNP combination containing 101 SNPs from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset is selected using optimal filtration criteria, with an error rate of 10.25%. Matching the 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration show no significant difference from the prediction error rates of randomly selected SNP sets and T2D causal SNP combinations from optimal filtration.
Conclusions We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.
Affiliation(s)
- Chiyong Kang
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
|
37
|
Malovini A, Barbarini N, Bellazzi R, de Michelis F. Hierarchical Naive Bayes for genetic association studies. BMC Bioinformatics 2012; 13 Suppl 14:S6. [PMID: 23095471 PMCID: PMC3439732 DOI: 10.1186/1471-2105-13-s14-s6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Background Genome Wide Association Studies represent powerful approaches that aim at disentangling the genetic and molecular mechanisms underlying complex traits. The usual "one-SNP-at-a-time" testing strategy cannot capture the multi-factorial nature of this kind of disorder. We propose a Hierarchical Naïve Bayes classification model for taking into account associations in SNP data characterized by Linkage Disequilibrium. Validation shows that our model reaches classification performance superior to that obtained by the standard Naïve Bayes classifier on simulated and real datasets. Methods In the Hierarchical Naïve Bayes implemented, the SNPs mapping to the same region of Linkage Disequilibrium are considered as "details" or "replicates" of the locus, each contributing to the overall effect of the region on the phenotype. A latent variable for each block, which models the "population" of correlated SNPs, can then be used to summarize the available information. The classification is thus performed relying on the conditional probability distributions of the latent variables and on the SNP data available. Results The developed methodology has been tested on simulated datasets, each composed of 300 cases, 300 controls and a variable number of SNPs. Our approach has also been applied to two real datasets on the genetic bases of Type 1 Diabetes and Type 2 Diabetes generated by the Wellcome Trust Case Control Consortium. Conclusions The approach proposed in this paper, called Hierarchical Naïve Bayes, allows the classification of examples for which genetic information on structurally correlated SNPs is available. It improves on Naïve Bayes performance by properly handling the within-loci variability.
Affiliation(s)
- Alberto Malovini
- Department of Industrial and Information Engineering, University of Pavia, Pavia, 27100, Italy.
|
38
|
Lee H, Wang B, Wu X, Zhang H, Xu F. Decision tree classifier makes genotyping more intuitive and more efficient. Tissue Antigens 2012; 80:188-190. [PMID: 22708606 DOI: 10.1111/j.1399-0039.2012.01901.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2012] [Revised: 04/28/2012] [Accepted: 05/11/2012] [Indexed: 06/01/2023]
|
39
|
Pahikkala T, Okser S, Airola A, Salakoski T, Aittokallio T. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms Mol Biol 2012; 7:11. [PMID: 22551170 PMCID: PMC3606421 DOI: 10.1186/1748-7188-7-11] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/08/2011] [Accepted: 04/23/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. RESULTS We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. 
On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. CONCLUSIONS Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
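The wrapper idea described in this abstract can be sketched as a greedy forward selection around regularized least squares (RLS, i.e. ridge regression). The version below is deliberately naive: it refits the model for every candidate feature, so it does not reproduce the matrix-calculus shortcuts that give greedy RLS its linear running time. Scoring candidates by leave-one-out error, the regularization value, and all names are assumptions made for illustration.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination; a is n x n, b is n x m."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + k] / m[i][i] for k in range(len(b[0]))] for i in range(n)]

def loo_squared_error(X, y, features, lam):
    """Exact leave-one-out squared error of ridge regression on the given columns."""
    n, k = len(X), len(features)
    Xs = [[row[j] for j in features] for row in X]
    # A = Xs^T Xs + lam * I
    A = [[sum(Xs[i][p] * Xs[i][q] for i in range(n)) + (lam if p == q else 0.0)
          for q in range(k)] for p in range(k)]
    Xt = [[Xs[i][p] for i in range(n)] for p in range(k)]
    W = solve(A, Xt)  # W = A^-1 Xs^T, shape k x n
    w = [sum(W[p][i] * y[i] for i in range(n)) for p in range(k)]
    err = 0.0
    for i in range(n):
        pred = sum(Xs[i][p] * w[p] for p in range(k))
        h = sum(Xs[i][p] * W[p][i] for p in range(k))  # diagonal of the hat matrix
        err += ((y[i] - pred) / (1.0 - h)) ** 2        # leave-one-out residual
    return err

def greedy_select(X, y, n_select, lam=1.0):
    """Add, one at a time, the feature that most reduces the leave-one-out error."""
    selected, remaining = [], list(range(len(X[0])))
    for _ in range(n_select):
        best = min(remaining, key=lambda j: loo_squared_error(X, y, selected + [j], lam))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: the label equals the middle feature
toy_X = [[1, 1, -1], [-1, 1, 1], [1, -1, 1], [-1, -1, -1], [1, 1, 1], [-1, -1, 1]]
toy_y = [1, 1, -1, -1, 1, -1]
```

A filter approach would instead rank features once by a per-feature statistic; the wrapper re-evaluates the predictor itself at every step, which is why its cost matters at genome-wide scale.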
Affiliation(s)
- Tapio Pahikkala
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science, Turku, Finland
- Sebastian Okser
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science, Turku, Finland
- Antti Airola
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science, Turku, Finland
- Tapio Salakoski
  - Department of Information Technology, University of Turku, Turku, Finland
  - Turku Centre for Computer Science, Turku, Finland
- Tero Aittokallio
  - Turku Centre for Computer Science, Turku, Finland
  - Department of Mathematics, University of Turku, Turku, Finland
  - Data Mining and Modeling group, Turku Centre for Biotechnology, Turku, Finland
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
40
|
Abstract
To enhance glucose uptake into muscle and fat cells, insulin stimulates the translocation of GLUT4 glucose transporters from intracellular membranes to the cell surface. This response requires the intersection of insulin signaling and vesicle trafficking pathways, and it is compromised in the setting of overnutrition to cause insulin resistance. Insulin signals through AS160/Tbc1D4 and Tbc1D1 to modulate Rab GTPases and through the Rho GTPase TC10α to act on other targets. In unstimulated cells, GLUT4 is incorporated into specialized storage vesicles containing IRAP, LRP1, sortilin, and VAMP2, which are sequestered by TUG, Ubc9, and other proteins. Insulin mobilizes these vesicles directly to the plasma membrane, and it modulates the trafficking itinerary so that cargo recycles from endosomes during ongoing insulin exposure. Knowledge of how signaling and trafficking pathways are coordinated will be essential to understanding the pathogenesis of diabetes and the metabolic syndrome and may also inform a wide range of other physiologies.
Affiliation(s)
- Jonathan S Bogan
- Section of Endocrinology and Metabolism, Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut 06520-8020, USA.
|
41
|
Clarke D, Bhardwaj N, Gerstein MB. Novel insights through the integration of structural and functional genomics data with protein networks. J Struct Biol 2012; 179:320-6. [PMID: 22343087 DOI: 10.1016/j.jsb.2012.02.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/21/2011] [Revised: 02/02/2012] [Accepted: 02/02/2012] [Indexed: 12/13/2022]
Abstract
In recent years, major advances in genomics, proteomics, macromolecular structure determination, and the computational resources capable of processing and disseminating the large volumes of data generated by each have played major roles in advancing a more systems-oriented appreciation of biological organization. One product of systems biology has been the delineation of graph models for describing genome-wide protein-protein interaction networks. The network organization and topology which emerges in such models may be used to address fundamental questions in an array of cellular processes, as well as biological features intrinsic to the constituent proteins (or "nodes") themselves. However, graph models alone constitute an abstraction which neglects the underlying biological and physical reality that the network's nodes and edges are highly heterogeneous entities. Here, we explore some of the advantages of introducing a protein structural dimension to such models, as the marriage of conventional network representations with macromolecular structural data helps to place static node and edge constructs in a biologically more meaningful context. We emphasize that 3D protein structures constitute a valuable conceptual and predictive framework by discussing examples of the insights provided, such as enabling in silico predictions of protein-protein interactions, providing rational and compelling classification schemes for network elements, as well as revealing interesting intrinsic differences between distinct node types, such as disorder and evolutionary features, which may then be rationalized in light of their respective functions within networks.
Affiliation(s)
- Declan Clarke
- Department of Chemistry, Yale University, New Haven, CT 06520, USA
|
42
|
Tokuda Y, Yagi T, Yoshii K, Ikeda Y, Fuwa M, Ueno M, Nakano M, Omi N, Tanaka M, Mori K, Kageyama M, Nagasaki I, Yagi K, Kinoshita S, Tashiro K. An approach to predict the risk of glaucoma development by integrating different attribute data. SpringerPlus 2012; 1:41. [PMID: 23961367 PMCID: PMC3725912 DOI: 10.1186/2193-1801-1-41] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/29/2012] [Accepted: 10/15/2012] [Indexed: 11/10/2022]
Abstract
Primary open-angle glaucoma (POAG) is one of the major causes of blindness worldwide and is considered to be influenced by inherited and environmental factors. Recently, we reported a genome-wide association study of susceptibility to POAG that compared patients and controls. In addition, serum cytokine levels, which are affected by environmental and postnatal factors, could also be obtained simultaneously in both patients and controls. Here, to predict the diagnosis of POAG effectively, we developed an “integration approach” in which data with different attributes were combined using several machine learning methods and random sampling. Two data sets were prepared for this study. One is the “training data set”, which consisted of 42 POAG patients and 42 controls. The other is the “test data set”, which consisted of 73 POAG patients and 52 controls. We first examined the genotype and cytokine data using the training data set with general machine learning methods. After the integration approach was applied, we obtained stable accuracy using the support vector machine method with the radial basis function. Although our approach was based on well-known machine learning methods and a simple process, we demonstrated that integrating two kinds of attributes, genotype and cytokines, was effective and helpful in the diagnostic prediction of POAG.
Affiliation(s)
- Yuichi Tokuda
- Department of Genomic Medical Sciences, Kyoto Prefectural University of Medicine, Kajiicho 465, Kawaramachi-Hirokoji, Kamigyo-ku, Kyoto, 602-8566 Japan
|
43
|
Cosgun E, Limdi NA, Duarte CW. High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans. Bioinformatics 2011; 27:1384-9. [PMID: 21450715 DOI: 10.1093/bioinformatics/btr159] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 12/30/2022]
Abstract
MOTIVATION With complex traits and diseases having potential genetic contributions of thousands of genetic factors, and with current genotyping arrays consisting of millions of single nucleotide polymorphisms (SNPs), powerful high-dimensional statistical techniques are needed to comprehensively model the genetic variance. Machine learning techniques have many advantages including lack of parametric assumptions, and high power and flexibility. RESULTS We have applied three machine learning approaches: Random Forest Regression (RFR), Boosted Regression Tree (BRT) and Support Vector Regression (SVR) to the prediction of warfarin maintenance dose in a cohort of African Americans. We have developed a multi-step approach that selects SNPs, builds prediction models with different subsets of selected SNPs along with known associated genetic and environmental variables and tests the discovered models in a cross-validation framework. Preliminary results indicate that our modeling approach gives much higher accuracy than previous models for warfarin dose prediction. A model size of 200 SNPs (in addition to the known genetic and environmental variables) gives the best accuracy. The R² between the predicted and actual square root of warfarin dose in this model was on average 66.4% for RFR, 57.8% for SVR and 56.9% for BRT. Thus RFR had the best accuracy, but all three techniques achieved better performance than the current published R² of 43% in a sample of mixed ethnicity, and 27% in an African American sample. In summary, machine learning approaches for high-dimensional pharmacogenetic prediction, and for prediction of clinical continuous traits of interest, hold great promise and warrant further research.
Affiliation(s)
- Erdal Cosgun
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
|
44
|
Roshan U, Chikkagoudar S, Wei Z, Wang K, Hakonarson H. Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest. Nucleic Acids Res 2011; 39:e62. [PMID: 21317188 PMCID: PMC3089490 DOI: 10.1093/nar/gkr064] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values within the Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies as given by top ranked SNPs by the three methods. Software and webserver are available at http://svmsnps.njit.edu.
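The first stage of the procedure described here, ranking SNPs by a 1 df chi-squared statistic and keeping the top 2r, can be sketched as below. Two assumptions are made that the abstract does not spell out: the statistic is taken to be the allelic test on a 2x2 table of allele counts, and "within the Bonferroni correction" is read as p < alpha/m for m tested SNPs. The SVM or random forest re-ranking of the retained SNPs is not shown, and all function names are hypothetical.

```python
import math

def allelic_chi2(case_genotypes, control_genotypes):
    """1-df chi-squared on allele counts; genotypes coded as 0/1/2 minor-allele copies."""
    a = sum(case_genotypes)              # minor alleles in cases
    b = 2 * len(case_genotypes) - a      # major alleles in cases
    c = sum(control_genotypes)           # minor alleles in controls
    d = 2 * len(control_genotypes) - c   # major alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def chi2_pvalue_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def top_2r(snps_cases, snps_controls, alpha=0.05):
    """Return the indices of the top 2r SNPs by chi-squared rank, together with r."""
    m = len(snps_cases)
    stats = [allelic_chi2(ca, co) for ca, co in zip(snps_cases, snps_controls)]
    # r = number of SNPs passing the Bonferroni-corrected threshold (assumed reading)
    r = sum(1 for s in stats if chi2_pvalue_1df(s) < alpha / m)
    ranked = sorted(range(m), key=lambda j: stats[j], reverse=True)
    return ranked[:2 * r], r

# Hypothetical toy data: 3 SNPs, 10 cases and 10 controls each
toy_cases = [[2] * 10, [1] * 10, [1] * 9 + [2]]
toy_controls = [[0] * 10, [1] * 10, [1] * 10]
```

The indices returned by `top_2r` would then be handed to a multivariate learner, whose feature weights or importances give the second ranking that the abstract compares against the chi-squared order.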
Affiliation(s)
- Usman Roshan
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA.
|
45
|
Machine learning techniques for single nucleotide polymorphism–disease classification models in schizophrenia. Molecules 2010; 15:4875-89. [PMID: 20657396 PMCID: PMC6257637 DOI: 10.3390/molecules15074875] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 06/04/2010] [Revised: 07/08/2010] [Accepted: 07/09/2010] [Indexed: 11/16/2022]
Abstract
Single nucleotide polymorphisms (SNPs) can be used as inputs in disease computational studies such as pattern searching and classification models. Schizophrenia is an example of a complex disease with an important social impact. The multiple causes of this disease create the need of new genetic or proteomic patterns that can diagnose patients using biological information. This work presents a computational study of disease machine learning classification models using only single nucleotide polymorphisms at the HTR2A and DRD3 genes from Galician (Northwest Spain) schizophrenic patients. These classification models establish for the first time, to the best knowledge of the authors, a relationship between the sequence of the nucleic acid molecule and schizophrenia (Quantitative Genotype – Disease Relationships) that can automatically recognize schizophrenia DNA sequences and correctly classify between 78.3–93.8% of schizophrenia subjects when using datasets which include simulated negative subjects and a linear artificial neural network.
|
46
|
Lin HJ, Huang YC, Lin JM, Wu JY, Chen LA, Lin CJ, Tsui YP, Chen CP, Tsai FJ. Single-nucleotide polymorphisms in chromosome 3p14.1-3p14.2 are associated with susceptibility of type 2 diabetes with cataract. Mol Vis 2010; 16:1206-14. [PMID: 20664687 PMCID: PMC2901187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2010] [Accepted: 06/24/2010] [Indexed: 12/04/2022]
Abstract
PURPOSE Type 2 diabetes (T2D) is highly prevalent worldwide, and cataracts have a high incidence in T2D patients. In this study, we identify genetic variants that predispose T2D patients to cataracts in the Han Chinese population residing in Taiwan. METHODS We conducted a genome-wide association study with a total of 1,715 cases and 2,000 random controls. In the haplotype study, we defined haplotype 1 (Ht1) to haplotype 4 (Ht4) as the alternative alleles of the diabetes mellitus (DM) and cataract-related chromosome 3p14.1-3p14.2 polymorphisms. RESULTS The most significant associations were detected with rs11129182, rs17047573, and rs17047586 in chromosome 3p14.1-3p14.2 (p = 3.52×10⁻⁷, 8.35×10⁻⁸, and 7.65×10⁻⁸, respectively). In genotype analysis, the "CT" genotype of rs11129182, the "GG" genotype of rs17047573, and the "GG" genotype of rs17047586 were significantly different in the T2D and cataract groups (OR = 3.03, 7.47, and 7.51, respectively; 95% confidence interval (CI): 1.97-4.65, 3.36-16.6, and 3.38-16.7, respectively). In the haplotype study, the distributions of Ht3 and Ht4 differed significantly between the DM and cataract group and the control group (p = 0.0004). The odds ratio (OR) of Ht4 was 1.89, with a 95% CI of 1.36-2.65. CONCLUSIONS The major functions of the genes involve voltage-dependent anion-selective channel proteins, long myosin light chain kinase, adenylyl cyclase-associated proteins, and retinoic acid receptors, all of which are closely related to the pathogenesis of T2D and cataractogenesis. These findings help clarify the pathogenesis of cataracts in T2D patients.
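As a reading aid for the odds ratios and intervals quoted above, the sketch below computes an odds ratio and a log-scale (Woolf-type) 95% confidence interval from a 2x2 table. The abstract does not say which interval method the authors used, so the method is an assumption, and the counts in the usage line are hypothetical and unrelated to the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a, b: cases with / without the genotype
    c, d: controls with / without the genotype
    All four cells must be non-zero for the log-scale standard error to exist.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to exercise the function
toy_or, toy_lower, toy_upper = odds_ratio_ci(20, 80, 10, 90)
```

An interval built this way is symmetric around the odds ratio on the log scale, and it excludes 1 exactly when the association is significant at the corresponding level.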
Affiliation(s)
- Hui-Ju Lin
  - Department of Medical Genetics, China Medical University Hospital, Taichung, Taiwan
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan
  - Department of Ophthalmology, China Medical University Hospital, Taichung, Taiwan
- Yu-Chuen Huang
  - Department of Medical Genetics, China Medical University Hospital, Taichung, Taiwan
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan
- Jane-Ming Lin
  - Department of Medical Genetics, China Medical University Hospital, Taichung, Taiwan
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan
  - Department of Ophthalmology, China Medical University Hospital, Taichung, Taiwan
- Jer-Yuarn Wu
  - National Genotyping Center, Academia Sinica, Taipei, Taiwan
- Liuh-An Chen
  - Department of Medical Genetics, China Medical University Hospital, Taichung, Taiwan
- Chao-Jen Lin
  - Department of Pediatrics, Changhua Christian Hospital, Taiwan
- Yung-Ping Tsui
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan
  - Department of Ophthalmology, China Medical University Hospital, Taichung, Taiwan
- Chih-Ping Chen
  - Departments of Obstetrics and Gynecology and Medical Research, Mackay Memorial Hospital, Taipei, Taiwan
- Fuu-Jen Tsai
  - Department of Medical Genetics, China Medical University Hospital, Taichung, Taiwan
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, Taichung, Taiwan
|