1
Laskar RS, Qu C, Huyghe JR, Harrison T, Hayes RB, Cao Y, Campbell PT, Steinfelder R, Talukdar FR, Brenner H, Ogino S, Brendt S, Bishop DT, Buchanan DD, Chan AT, Cotterchio M, Gruber SB, Gsur A, van Guelpen B, Jenkins MA, Keku TO, Lynch BM, Le Marchand L, Martin RM, McCarthy K, Moreno V, Pearlman R, Song M, Tsilidis KK, Vodička P, Woods MO, Wu K, Hsu L, Gunter MJ, Peters U, Murphy N. Genome-wide association studies and Mendelian randomization analyses provide insights into the causes of early-onset colorectal cancer. Ann Oncol 2024; 35:523-536. [PMID: 38408508 PMCID: PMC11213623 DOI: 10.1016/j.annonc.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/30/2024] [Accepted: 02/20/2024] [Indexed: 02/28/2024] Open
Abstract
BACKGROUND The incidence of early-onset colorectal cancer (EOCRC; diagnosed <50 years of age) is rising globally; however, the causes underlying this trend are largely unknown. CRC has strong genetic and environmental determinants, yet common genetic variants and causal modifiable risk factors underlying EOCRC are unknown. We conducted the first EOCRC-specific genome-wide association study (GWAS) and Mendelian randomization (MR) analyses to explore germline genetic and causal modifiable risk factors associated with EOCRC. PATIENTS AND METHODS We conducted a GWAS meta-analysis of 6176 EOCRC cases and 65 829 controls from the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), the Colorectal Transdisciplinary Study (CORECT), the Colon Cancer Family Registry (CCFR), and the UK Biobank. We then used the EOCRC GWAS to investigate 28 modifiable risk factors using two-sample MR. RESULTS We found two novel risk loci for EOCRC at 1p34.1 and 4p15.33, which were not previously associated with CRC risk. We identified a deleterious coding variant (rs36053993, G396D) at polyposis-associated DNA repair gene MUTYH (odds ratio 1.80, 95% confidence interval 1.47-2.22) but show that most of the common genetic susceptibility was from noncoding signals enriched in epigenetic markers present in gastrointestinal tract cells. We identified new EOCRC-susceptibility genes, and in addition to pathways such as transforming growth factor (TGF) β, suppressor of Mothers Against Decapentaplegic (SMAD), bone morphogenetic protein (BMP) and phosphatidylinositol kinase (PI3K) signaling, our study highlights a role for insulin signaling and immune/infection-related pathways in EOCRC. 
In our MR analyses, we found novel evidence of probable causal associations for higher levels of body size and metabolic factors (such as body fat percentage, waist circumference, waist-to-hip ratio, basal metabolic rate, and fasting insulin), higher alcohol drinking, and lower education attainment with increased EOCRC risk. CONCLUSIONS Our novel findings indicate inherited susceptibility to EOCRC and suggest modifiable lifestyle and metabolic targets that could also be used to risk-stratify individuals for personalized screening strategies or other interventions.
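As an illustration of the two-sample MR step mentioned in this abstract, the following minimal sketch computes a fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics. IVW is a widely used MR estimator, but the abstract does not state which estimators the authors applied, so the choice is an assumption; the function name and all numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.

    beta_exposure: per-variant effects on the risk factor (exposure GWAS)
    beta_outcome:  per-variant effects on the disease (outcome GWAS)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / (se * se)          # inverse variance of the outcome effect
        numerator += weight * bx * by
        denominator += weight * bx * bx
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three genetic instruments.
estimate, std_err = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.30],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.02, 0.01],
)
print(f"IVW log-odds per unit exposure: {estimate:.3f} (SE {std_err:.3f})")
```

In this toy input every outcome effect is exactly half of the matching exposure effect, so the estimate recovers a slope of 0.5. Real analyses would add sensitivity estimators and instrument-strength checks.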
Affiliation(s)
- R S Laskar
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France; Early Cancer Institute, Department of Oncology, School of Clinical Medicine, University of Cambridge, Cambridge, UK.
- C Qu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
- J R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
- T Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
- R B Hayes
- Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York
- Y Cao
- Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, St Louis; Division of Gastroenterology, Department of Medicine, Washington University School of Medicine, St Louis; Alvin J. Siteman Cancer Center, St Louis
- P T Campbell
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, USA
- R Steinfelder
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
- F R Talukdar
- Epigenomics and Mechanisms Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- H Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- S Ogino
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston; Program in Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston; Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston
- S Brendt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, USA
- D T Bishop
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- D D Buchanan
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville; University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Melbourne; Genomic Medicine and Family Cancer Clinic, Royal Melbourne Hospital, Parkville, Australia
- A T Chan
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston; Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, USA
- M Cotterchio
- Ontario Health (Cancer Care Ontario), Toronto; Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- S B Gruber
- Department of Medical Oncology & Therapeutics Research, City of Hope National Medical Center, Duarte, USA
- A Gsur
- Center for Cancer Research, Medical University of Vienna, Vienna, Austria
- B van Guelpen
- Department of Radiation Sciences, Oncology Unit, Umeå University, Umeå; Wallenberg Centre for Molecular Medicine, Umeå University, Umeå, Sweden
- M A Jenkins
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- T O Keku
- Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, USA
- B M Lynch
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia; Cancer Epidemiology Division, Cancer Council Victoria, Melbourne; Physical Activity Laboratory, Baker Heart and Diabetes Institute, Melbourne, Australia
- R M Martin
- Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol; Population Health Sciences, Bristol Medical School, University of Bristol, Bristol; National Institute for Health Research (NIHR) Bristol Biomedical Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol, Bristol
- K McCarthy
- Department of Colorectal Surgery, North Bristol NHS Trust, Bristol, UK
- V Moreno
- Cancer Prevention and Control Program, Catalan Institute of Oncology-IDIBELL, L'Hospitalet de Llobregat, Barcelona; CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- R Pearlman
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus
- M Song
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston; Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, USA; Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, USA
- K K Tsilidis
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK; Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
- P Vodička
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, Prague; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, Prague; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, Pilsen, Czech Republic
- M O Woods
- Memorial University of Newfoundland, Discipline of Genetics, St. John's, Canada
- K Wu
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, USA
- L Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
- M J Gunter
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France; Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- U Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle; Department of Epidemiology, University of Washington, Seattle, USA
- N Murphy
- Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France.
2
COVID-GWAB: A Web-Based Prediction of COVID-19 Host Genes via Network Boosting of Genome-Wide Association Data. Biomolecules 2022; 12:biom12101446. [PMID: 36291657 PMCID: PMC9599684 DOI: 10.3390/biom12101446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/01/2022] [Accepted: 10/02/2022] [Indexed: 11/17/2022] Open
Abstract
Host genetics affect both the susceptibility and response to viral infection. Searching for host genes that contribute to COVID-19, the Host Genetics Initiative (HGI) was formed to investigate the genetic factors involved in COVID-19 via genome-wide association studies (GWAS). The GWAS suffer from limited statistical power and in general, only a few genes can pass the conventional significance thresholds. This statistical limitation may be overcome by boosting weak association signals through integrating independent functional information such as molecular interactions. Additionally, the boosted results can be evaluated by various independent data for further connections to COVID-19. We present COVID-GWAB, a web-based tool to boost original GWAS signals from COVID-19 patients by taking the signals of the interactome neighbors. COVID-GWAB takes summary statistics from the COVID-19 HGI or user input data and reprioritizes candidate host genes for COVID-19 using HumanNet, a co-functional human gene network. The current version of COVID-GWAB provides the pre-processed data of releases 5, 6, and 7 of the HGI. Additionally, COVID-GWAB provides web interfaces for a summary of augmented GWAS signals, prediction evaluations by appearance frequency in COVID-19 literature, single-cell transcriptome data, and associated pathways. The web server also enables browsing the candidate gene networks.
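The idea of boosting a gene's weak association signal with the signals of its network neighbors can be sketched as follows. This is only an illustration under stated assumptions: the additive weighting, the `neighbor_weight` parameter, the function name, and the gene names are all hypothetical and are not taken from COVID-GWAB or HumanNet.

```python
def boost_scores(gene_scores, network, neighbor_weight=0.5):
    """Add a weighted share of each neighbor's score to a gene's own score.

    gene_scores: gene -> association score (e.g. -log10 of a gene-level p-value)
    network:     gene -> {neighbor: edge weight} in a co-functional network
    neighbor_weight: hypothetical mixing parameter for the neighbor signal
    """
    boosted = {}
    for gene, own_score in gene_scores.items():
        neighbors = network.get(gene, {})
        neighbor_signal = sum(
            edge_weight * gene_scores.get(neighbor, 0.0)
            for neighbor, edge_weight in neighbors.items()
        )
        boosted[gene] = own_score + neighbor_weight * neighbor_signal
    return boosted


# Hypothetical input: GENE_A is weak on its own but linked to a strong gene.
scores = {"GENE_A": 1.0, "GENE_B": 4.0, "GENE_C": 0.5}
links = {"GENE_A": {"GENE_B": 1.0}, "GENE_B": {"GENE_A": 1.0}, "GENE_C": {}}
for gene, score in sorted(boost_scores(scores, links).items(), key=lambda kv: -kv[1]):
    print(gene, score)
```

A gene with a modest score of its own moves up the ranking when it is connected to strongly associated genes, while an isolated gene keeps its original score.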
3
Rivero-García I, Castresana-Aguirre M, Guglielmo L, Guala D, Sonnhammer ELL. Drug repurposing improves disease targeting 11-fold and can be augmented by network module targeting, applied to COVID-19. Sci Rep 2021; 11:20687. [PMID: 34667255 PMCID: PMC8526804 DOI: 10.1038/s41598-021-99721-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/09/2021] [Accepted: 09/30/2021] [Indexed: 12/14/2022] Open
Abstract
This analysis presents a systematic evaluation of the extent of therapeutic opportunities that can be obtained from drug repurposing by connecting drug targets with disease genes. When using FDA-approved indications as a reference level we found that drug repurposing can offer an average of an 11-fold increase in disease coverage, with the maximum number of diseases covered per drug being increased from 134 to 167 after extending the drug targets with their high confidence first neighbors. Additionally, by network analysis to connect drugs to disease modules we found that drugs on average target 4 disease modules, yet the similarity between disease modules targeted by the same drug is generally low and the maximum number of disease modules targeted per drug increases from 158 to 229 when drug targets are neighbor-extended. Moreover, our results highlight that drug repurposing is more dependent on target proteins being shared between diseases than on polypharmacological properties of drugs. We apply our drug repurposing and network module analysis to COVID-19 and show that Fostamatinib is the drug with the highest module coverage.
Affiliation(s)
- Inés Rivero-García
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Miguel Castresana-Aguirre
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Luca Guglielmo
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Erik L. L. Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
4
MacNamara A, Nakic N, Amin Al Olama A, Guo C, Sieber KB, Hurle MR, Gutteridge A. Network and pathway expansion of genetic disease associations identifies successful drug targets. Sci Rep 2020; 10:20970. [PMID: 33262371 PMCID: PMC7708424 DOI: 10.1038/s41598-020-77847-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/07/2020] [Accepted: 11/06/2020] [Indexed: 11/24/2022] Open
Abstract
Genetic evidence of disease association has often been used as a basis for selecting drug targets for complex common diseases. Likewise, the propagation of genetic evidence through gene or protein interaction networks has been shown to accurately infer novel disease associations at genes for which no direct genetic evidence can be observed. However, an empirical test of the utility of combining these approaches for drug discovery has been lacking. In this study, we examine genetic associations arising from an analysis of 648 UK Biobank GWAS and evaluate whether targets identified as proxies of direct genetic hits are enriched for successful drug targets, as measured by historical clinical trial data. We find that protein networks formed from specific functional linkages such as protein complexes and ligand–receptor pairs are suitable for even naïve guilt-by-association network propagation approaches. In addition, more sophisticated approaches applied to global protein–protein interaction networks and pathway databases also successfully retrieve targets enriched for clinically successful drug targets. We conclude that network propagation of genetic evidence can be used for drug target identification.
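A naive guilt-by-association step of the kind this abstract refers to can be sketched as collecting the first neighbors of direct genetic hits in an interaction network. The function name, the edge-list representation, and the gene labels below are hypothetical; the abstract does not describe the authors' actual propagation procedure in this detail.

```python
def proxy_targets(genetic_hits, interaction_edges):
    """Return genes linked to at least one direct genetic hit.

    genetic_hits:      iterable of genes with direct genetic evidence
    interaction_edges: iterable of (gene_a, gene_b) undirected pairs, e.g.
                       members of one protein complex or a ligand-receptor pair
    Genes that are themselves direct hits are excluded from the result.
    """
    hits = set(genetic_hits)
    proxies = set()
    for gene_a, gene_b in interaction_edges:
        if gene_a in hits and gene_b not in hits:
            proxies.add(gene_b)
        if gene_b in hits and gene_a not in hits:
            proxies.add(gene_a)
    return proxies


# Hypothetical example: G1 is the only direct hit.
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G1")]
print(sorted(proxy_targets({"G1"}, edges)))
```

Only direct neighbors are returned here; G3 is two steps away from the hit and is therefore not treated as a proxy in this one-hop sketch.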
Affiliation(s)
- Cong Guo
- Human Genetics, GSK, Collegeville, PA, USA
5
Ma X, Wang P, Xu G, Yu F, Ma Y. Integrative genomics analysis of various omics data and networks identify risk genes and variants vulnerable to childhood-onset asthma. BMC Med Genomics 2020; 13:123. [PMID: 32867763 PMCID: PMC7457797 DOI: 10.1186/s12920-020-00768-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/06/2020] [Accepted: 08/17/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Childhood-onset asthma is highly affected by genetic components. In recent years, many genome-wide association studies (GWAS) have reported a large group of genetic variants and susceptible genes associated with asthma-related phenotypes including childhood-onset asthma. However, the regulatory mechanisms of these genetic variants for childhood-onset asthma susceptibility remain largely unknown. METHODS In the current investigation, we conducted a two-stage, Sherlock-based integrative genomics analysis to explore the cis- and/or trans-regulatory effects of genome-wide SNPs on gene expression as well as childhood-onset asthma risk by incorporating a large-scale GWAS dataset (N = 314,633) and two independent expression quantitative trait loci (eQTL) datasets (N = 1890). Furthermore, we applied various bioinformatics analyses, including MAGMA gene-based analysis, pathway enrichment analysis, drug/disease-based enrichment analysis, computer-based permutation analysis, PPI network analysis, gene co-expression analysis and differential gene expression analysis, to prioritize susceptible genes associated with childhood-onset asthma. RESULTS Based on comprehensive genomics analyses, we found 31 genes with multiple eSNPs to be convincing candidates for childhood-onset asthma risk, such as PSMB9 (cis-rs4148882 and cis-rs2071534) and TAP2 (cis-rs9267798, cis-rs4148882, cis-rs241456, and trans-10,447,456). These 31 genes interacted functionally with each other in our PPI network analysis. Our pathway enrichment analysis showed that numerous KEGG pathways, including antigen processing and presentation, type I diabetes mellitus, and asthma, were significantly enriched in relation to childhood-onset asthma risk. The co-expression patterns among the 31 genes were remarkably altered according to asthma status, and 25 of the 31 genes (25/31 = 80.65%) showed significantly or suggestively differential expression between the asthma group and the control group.
CONCLUSIONS We provide strong evidence to highlight 31 candidate genes for childhood-onset asthma risk, and offer a new insight into the genetic pathogenesis of childhood-onset asthma.
Affiliation(s)
- Xiuqing Ma
- Department of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100853 China
- Peilan Wang
- Outpatient Department, Chinese PLA General Hospital, Beijing, 100853 China
- Guobing Xu
- Department of Cardiovascular Medicine, Zhongxiang People’s Hospital, Zhongxiang, 431900 Hubei Province China
- Fang Yu
- Department of Pediatrics, Chinese PLA General Hospital, Beijing, 100853 China
- Yunlong Ma
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, 325027 P. R. China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang University School of Medicine, Hangzhou, China
6
Ratnakumar A, Weinhold N, Mar JC, Riaz N. Protein-Protein interactions uncover candidate 'core genes' within omnigenic disease networks. PLoS Genet 2020; 16:e1008903. [PMID: 32678846 PMCID: PMC7390454 DOI: 10.1371/journal.pgen.1008903] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/24/2019] [Revised: 07/29/2020] [Accepted: 06/01/2020] [Indexed: 01/09/2023] Open
Abstract
Genome wide association studies (GWAS) of human diseases have generally identified many loci associated with risk with relatively small effect sizes. The omnigenic model attempts to explain this observation by suggesting that diseases can be thought of as networks, where genes with direct involvement in disease-relevant biological pathways are named ‘core genes’, while peripheral genes influence disease risk via their interactions or regulatory effects on core genes. Here, we demonstrate a method for identifying candidate core genes solely from genes in or near disease-associated SNPs (GWAS hits) in conjunction with protein-protein interaction network data. Applied to 1,381 GWAS studies from 5 ancestries, we identify a total of 1,865 candidate core genes in 343 GWAS studies. Our analysis identifies several well-known disease-related genes that are not identified by GWAS, including BRCA1 in Breast Cancer, Amyloid Precursor Protein (APP) in Alzheimer’s Disease, INS in A1C measurement and Type 2 Diabetes, and PCSK9 in LDL cholesterol, amongst others. Notably, candidate core genes are preferentially enriched for disease relevance over GWAS hits and are enriched for both Clinvar pathogenic variants and known drug targets, consistent with the predictions of the omnigenic model. We subsequently use parent term annotations provided by the GWAS catalog to merge related GWAS studies and identify candidate core genes in over-arching disease processes such as cancer, where we identify 109 candidate core genes.
A recent theory suggests that only a small number of genes, called ‘core genes’, underpin the biology of a disease; for most diseases, these core genes remain unknown. The suggested methods for finding them require complex and expensive experiments. We reasoned that if we merge currently available datasets in smart ways, we may be able to uncover these ‘core genes’. Our method finds “hub” proteins by merging lists of genes previously linked with disease to information on how proteins interact with each other. We found that many of these hub proteins have central roles in disease, such as insulin for both A1C measurement and Type 2 Diabetes, BRCA1 in Breast cancer, and Amyloid Precursor Protein in Alzheimer’s Disease. We think these ‘hub’ proteins are candidate ‘core genes’, and offer our method as a way to find ‘core genes’ by utilizing publicly available reference datasets.
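The hub-finding idea summarized above, merging a list of disease-linked genes with protein interaction data, can be illustrated with a simple count of how many distinct GWAS-hit genes each protein interacts with. This is a hedged sketch only: the counting criterion, the `min_links` threshold, the function name, and the labels are hypothetical, and the published method may rely on a statistical test that the abstract does not spell out.

```python
from collections import defaultdict


def rank_hub_candidates(gwas_genes, ppi_edges, min_links=2):
    """Rank proteins by the number of distinct GWAS-hit genes they interact with.

    gwas_genes: iterable of genes in or near disease-associated SNPs
    ppi_edges:  iterable of (protein_a, protein_b) undirected interactions
    min_links:  hypothetical cutoff on the number of GWAS-hit partners
    Returns a list of (protein, partner_count), highest count first.
    """
    hits = set(gwas_genes)
    partners = defaultdict(set)
    for a, b in ppi_edges:
        if a == b:
            continue                      # ignore self-interactions
        if a in hits:
            partners[b].add(a)
        if b in hits:
            partners[a].add(b)
    ranked = [(protein, len(linked)) for protein, linked in partners.items()
              if len(linked) >= min_links]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical network: HUB is not a GWAS hit but touches all three hits.
edges = [("G1", "HUB"), ("G2", "HUB"), ("G3", "HUB"), ("G1", "X"), ("G2", "G3")]
print(rank_hub_candidates({"G1", "G2", "G3"}, edges))
```

Note that a protein can rank highly without being a GWAS hit itself, which mirrors the abstract's point that well-known disease genes can be recovered even when GWAS does not identify them directly.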
Affiliation(s)
- Abhirami Ratnakumar
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Nils Weinhold
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Jessica C. Mar
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Australia
- Nadeem Riaz
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
7
Sarkar D, Maranas CD. SNPeffect: identifying functional roles of SNPs using metabolic networks. Plant J 2020; 103:512-531. [PMID: 32167625 PMCID: PMC9328443 DOI: 10.1111/tpj.14746] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/30/2019] [Accepted: 02/20/2020] [Indexed: 05/04/2023]
Abstract
Genetic sources of phenotypic variation have been a focus of plant studies aimed at improving agricultural yield and understanding adaptive processes. Genome-wide association studies identify the genetic background behind a trait by examining associations between phenotypes and single-nucleotide polymorphisms (SNPs). Although such studies are common, biological interpretation of the results remains a challenge; especially due to the confounding nature of population structure and the systematic biases thus introduced. Here, we propose a complementary analysis (SNPeffect) that offers putative genotype-to-phenotype mechanistic interpretations by integrating biochemical knowledge encoded in metabolic models. SNPeffect is used to explain differential growth rate and metabolite accumulation in A. thaliana and P. trichocarpa accessions as the outcome of SNPs in enzyme-coding genes. To this end, we also constructed a genome-scale metabolic model for Populus trichocarpa, the first for a perennial woody tree. As expected, our results indicate that growth is a complex polygenic trait governed by carbon and energy partitioning. The predicted set of functional SNPs in both species are associated with experimentally characterized growth-determining genes and also suggest putative ones. Functional SNPs were found in pathways such as amino acid metabolism, nucleotide biosynthesis, and cellulose and lignin biosynthesis, in line with breeding strategies that target pathways governing carbon and energy partition.
Affiliation(s)
- Debolina Sarkar
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA
- Costas D. Maranas
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA
8
Leal LG, David A, Jarvelin MR, Sebert S, Männikkö M, Karhunen V, Seaby E, Hoggart C, Sternberg MJE. Identification of disease-associated loci using machine learning for genotype and network data integration. Bioinformatics 2020; 35:5182-5190. [PMID: 31070705 PMCID: PMC6954643 DOI: 10.1093/bioinformatics/btz310] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/22/2018] [Revised: 03/28/2019] [Accepted: 04/25/2019] [Indexed: 01/19/2023] Open
Abstract
Motivation Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. Availability and implementation An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luis G Leal
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Alessia David
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Marjo-Riita Jarvelin
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland; Biocenter Oulu, University of Oulu, Oulu 90220, Finland; Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland; Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK; Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
- Sylvain Sebert
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland; Biocenter Oulu, University of Oulu, Oulu 90220, Finland
- Minna Männikkö
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
- Ville Karhunen
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland; Biocenter Oulu, University of Oulu, Oulu 90220, Finland; Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland; Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK; Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
- Eleanor Seaby
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Clive Hoggart
- Department of Medicine, Imperial College London, London W2 1PG, UK
- Michael J E Sternberg
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
9
Genome‑wide analysis of DNA methylation and gene expression changes in an ovalbumin‑induced asthma mouse model. Mol Med Rep 2020; 22:1709-1716. [PMID: 32705270 PMCID: PMC7411290 DOI: 10.3892/mmr.2020.11245] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/02/2019] [Accepted: 02/04/2020] [Indexed: 12/12/2022] Open
Abstract
The aim of the present study was to establish an integrated network of DNA methylation and RNA expression in an ovalbumin (OVA)-induced asthma model, and to investigate the epigenetically-regulated genes involved in asthma development. Genome-wide CpG-DNA methylation profiling was conducted through the use of a methylated DNA immunoprecipitation microarray and RNA sequencing was performed using three lung samples from mice with OVA-induced asthma. A total of 35,401 differentially methylated regions (DMRs) were identified between mice with OVA-induced asthma and control mice. Of these, 3,060 were located in promoter regions and 370 of the genes containing these DMRs demonstrated an inverse correlation between methylation and gene expression. Kyoto Encyclopedia of Genes and Genomes pathway analysis identified that 368 genes were upregulated or downregulated in OVA-induced asthma samples, including genes involved in ‘chemokine signalling pathway’, ‘focal adhesion’, ‘leukocyte transendothelial migration’ and ‘vascular smooth muscle contraction signaling’ pathways. Integrated network analysis identified four hub genes, consisting of three upregulated genes [forkhead box O1 (FOXO1), SP1 transcription factor (SP1) and amyloid β precursor protein (APP)], and one downregulated gene [RUNX family transcription factor 1 (RUNX1)], all of which demonstrated an association between DNA methylation and gene expression. These genes were highly interconnected nodes in the Ingenuity Pathway Analysis module and were functionally significant. A total of four interconnected hub genes, FOXO1, RUNX1, SP1 and APP, were identified from the integrated DNA methylation and gene expression networks involved in asthma development. These results suggested that modulating these four genes could effectively control the development of asthma.
|
10
|
Gazouli M, Dovrolis N, Franke A, Spyrou GM, Sechi LA, Kolios G. Differential genetic and functional background in inflammatory bowel disease phenotypes of a Greek population: a systems bioinformatics approach. Gut Pathog 2019; 11:31. [PMID: 31249629 PMCID: PMC6570833 DOI: 10.1186/s13099-019-0312-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/14/2019] [Accepted: 05/30/2019] [Indexed: 12/13/2022] Open
Abstract
Background Crohn’s disease (CD) and ulcerative colitis (UC) are the two main entities of inflammatory bowel disease (IBD). Previous works have identified more than 200 risk factors (including loci and signaling pathways) in populations of predominantly European ancestry. Our study was conducted on an extended population-specific cohort of 573 Greek IBD patients (364 CD and 209 UC) and 445 controls. Aims To highlight the different genetic and functional background of IBD and its phenotypes, utilizing contemporary systems bioinformatics methodologies. Methods Disease-associated SNPs, obtained via our own 89-loci IBD risk GWAS panel, were detected with the whole genome association analysis toolset PLINK. These SNPs were used as input for two novel and different pathway analysis methods to detect functional interactions. Specifically, PathwayConnector was used to create complementary networks of interacting pathways, whereas the online database of protein interactions STRING provided protein–protein association networks and their derived pathways. Network analysis metrics were employed to identify proteins with high significance and subsequently to rank the signaling pathways they participate in. Results The reported complementary pathway and enriched protein–protein association networks reveal several novel and well-known key players in the functional background of IBD, such as the Toll-like receptor, TNF, Jak-STAT, PI3K-Akt, T cell receptor, apoptosis, MAPK and B cell receptor signaling pathways. IBD subphenotypes are found to have distinct genetic and functional profiles, which can contribute to their accurate identification and classification. As a secondary result, we identify an extended network of diseases with a common molecular background to IBD. Conclusions IBD’s burden on the quality of life of patients and its intricate functional background constantly present us with new challenges. Our data and methodology provide researchers with new insights into a specific population, and also into possible differentiation markers of disease classification and progression. This work not only provides new insights into the interplay among IBD risk variants and their related signaling pathways and elucidates the mechanisms underlying IBD and its clinical sequelae, but also introduces a generalized bioinformatics-based methodology which can be applied to studies of different disorders.
Affiliation(s)
- Maria Gazouli
- 1Laboratory of Biology, Medical School, National and Kapodistrian University of Athens, Michalakopoulou 176, 11527 Athens, Greece
| | - Nikolas Dovrolis
- 2Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, Xanthi, Greece
| | - Andre Franke
- 3Institute of Clinical Molecular Biology, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - George M Spyrou
- 4Bioinformatics ERA Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Leonardo A Sechi
- 5Department of Biomedical Sciences, University of Sassari, Sassari, Italy
| | - George Kolios
- 2Laboratory of Pharmacology, Department of Medicine, Democritus University of Thrace, Xanthi, Greece
| |
|
11
|
Yao V, Kaletsky R, Keyes W, Mor DE, Wong AK, Sohrabi S, Murphy CT, Troyanskaya OG. An integrative tissue-network approach to identify and test human disease genes. Nat Biotechnol 2018; 36:nbt.4246. [PMID: 30346941 PMCID: PMC7021177 DOI: 10.1038/nbt.4246] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 01/10/2018] [Accepted: 08/08/2018] [Indexed: 01/09/2023]
Abstract
Effective discovery of causal disease genes must overcome the statistical challenges of quantitative genetics studies and the practical limitations of human biology experiments. Here we developed diseaseQUEST, an integrative approach that combines data from human genome-wide disease studies with in silico network models of tissue- and cell-type-specific function in model organisms to prioritize candidates within functionally conserved processes and pathways. We used diseaseQUEST to predict candidate genes for 25 different diseases and traits, including cancer, longevity, and neurodegenerative diseases. Focusing on Parkinson's disease (PD), a diseaseQUEST-directed Caenorhabditis elegans behavioral screen identified several candidate genes, which we experimentally verified and found to be associated with age-dependent motility defects mirroring PD clinical symptoms. Furthermore, knockdown of the top candidate gene, bcat-1, encoding a branched-chain amino acid transferase, caused spasm-like 'curling' and neurodegeneration in C. elegans, paralleling decreased BCAT1 expression in PD patient brains. diseaseQUEST is modular and generalizable to other model organisms and human diseases of interest.
Affiliation(s)
- Victoria Yao
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
| | - Rachel Kaletsky
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA
| | - William Keyes
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA
| | - Danielle E Mor
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA
| | - Aaron K Wong
- Flatiron Institute, Simons Foundation, New York, New York, USA
| | - Salman Sohrabi
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA
| | - Coleen T Murphy
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA
| | - Olga G Troyanskaya
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Flatiron Institute, Simons Foundation, New York, New York, USA
| |
|
12
|
Enabling Precision Medicine through Integrative Network Models. J Mol Biol 2018; 430:2913-2923. [DOI: 10.1016/j.jmb.2018.07.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/01/2018] [Revised: 06/15/2018] [Accepted: 07/03/2018] [Indexed: 11/17/2022]
|
13
|
Wang C, Li H, Cao L, Wang G. Identification of differentially expressed genes associated with asthma in children based on the bioanalysis of the regulatory network. Mol Med Rep 2018; 18:2153-2163. [PMID: 29956778 PMCID: PMC6072229 DOI: 10.3892/mmr.2018.9205] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/04/2017] [Accepted: 05/18/2018] [Indexed: 12/23/2022] Open
Abstract
Asthma, the most common chronic respiratory tract disease in children, is characterized by allergy, recurring airway obstruction and bronchospasm. The aim of the present study was to screen critical differentially expressed genes (DEGs) involved in asthma in children. Gene expression in different tissues was compared between asthmatic children and healthy control subjects in order to identify DEGs associated with asthma. Protein-protein interaction (PPI) networks were constructed for the DEGs and weighted gene co-expression network analysis methods were used to further determine the functional modules associated with DEGs in different tissue samples. In addition, the gene co-expression network was constructed. Gene Ontology function analysis and pathway analysis were conducted to identify critical DEGs. The results identified numerous DEGs from the different tissue samples, including 1,662 DEGs from nasal-epithelium tissue samples, 572 DEGs from peripheral blood (PB) samples and 146 DEGs from PB mononuclear cell samples. In the PPI network, F-box only protein 6 (FBXO6), histone deacetylase 1 (HDAC1) and amyloid β precursor protein (APP) were hub genes and played an important role in the process of asthma. In addition, proliferating cell nuclear antigen (PCNA), integrin α-4 (ITGA4), catenin α-1 (CTNNA1), nuclear factor-κB1 (NF-κB1) and mechanistic target of rapamycin (MTOR) may be critical DEGs involved in the progression of asthma in children. These results suggested that FBXO6, HDAC1 and APP may interact with PCNA, ITGA4, CTNNA1, NF-κB1 and MTOR in the progression of asthma in children.
Affiliation(s)
- Chunyan Wang
- Department of Pediatrics, Shanghai Fengxian Fengcheng Hospital, Shanghai 200000, P.R. China
| | - Hengtao Li
- Department of Pediatrics, Shanghai Fengxian Fengcheng Hospital, Shanghai 200000, P.R. China
| | - Lanfang Cao
- Department of Pediatrics, Ren Ji Hospital Affiliated to Shanghai Jiaotong University School of Medicine, Shanghai 200000, P.R. China
| | - Genzai Wang
- Department of Pediatrics, Shanghai Fengxian Fengcheng Hospital, Shanghai 200000, P.R. China
| |
|