1
Frederiksen SD, Avramović V, Maroilley T, Lehman A, Arbour L, Tarailo-Graovac M. Rare disorders have many faces: in silico characterization of rare disorder spectrum. Orphanet J Rare Dis 2022; 17:76. [PMID: 35193637 PMCID: PMC8864832 DOI: 10.1186/s13023-022-02217-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/22/2021] [Accepted: 02/06/2022] [Indexed: 11/18/2022] Open
Abstract
Background: The diagnostic journey for many rare disease patients remains challenging despite the use of the latest advances in genetic technology. We hypothesize that some patients remain undiagnosed due to more complex diagnostic scenarios that are currently not considered in genome analysis pipelines. To better understand this, we characterized the rare disorder (RD) spectrum using various bioinformatics resources (e.g., Orphanet/Orphadata, Human Phenotype Ontology, Reactome pathways) combined with custom-made R scripts.
Results: Our in silico characterization led to the identification of 145 borderline-common, 412 rare and 2967 ultra-rare disorders. Based on these findings and point prevalence, we would expect approximately 6.53%, 0.34%, and 0.30% of individuals in a randomly selected population to have a borderline-common, rare, and ultra-rare disorder, respectively (equivalent to 1 RD patient in 14 people). Importantly, our analyses revealed that (1) a higher proportion of borderline-common disorders were caused by multiple gene defects and/or other factors compared with the rare and ultra-rare disorders, (2) the phenotypic expressivity was more variable for the borderline-common disorders than for the rarer disorders, and (3) unique clinical characteristics were observed across the disorder categories forming the spectrum.
Conclusions: Recognizing that RD patients who remain unsolved even after genome sequencing might belong to the more common end of the RD spectrum supports the use of computational pipelines that account for more complex genetic and phenotypic scenarios.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13023-022-02217-9.
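The "1 RD patient in 14 people" figure in the abstract above can be checked by summing the three expected proportions. The following is a minimal Python sketch of that arithmetic only; it assumes the three categories are mutually exclusive so that the percentages simply add, and the variable names are illustrative rather than taken from the authors' R scripts.

```python
# Expected share of a randomly selected population per disorder category,
# in percent, as quoted in the abstract.
proportions_pct = {
    "borderline-common": 6.53,
    "rare": 0.34,
    "ultra-rare": 0.30,
}

# Assumption: the categories do not overlap, so the percentages are additive.
total_pct = sum(proportions_pct.values())

# Convert the combined percentage into a "1 in N people" figure.
one_in_n = 100 / total_pct

print(f"combined proportion: {total_pct:.2f}%")
print(f"roughly 1 in {round(one_in_n)} people")
```

Under that additivity assumption the combined proportion is about 7.17%, which rounds to 1 in 14.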
Affiliation(s)
- Simona D Frederiksen
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Vladimir Avramović
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Tatiana Maroilley
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Anna Lehman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
- Laura Arbour
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
- Maja Tarailo-Graovac
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
2
Babbi G, Martelli PL, Casadio R. PhenPath: a tool for characterizing biological functions underlying different phenotypes. BMC Genomics 2019; 20:548. [PMID: 31307376 PMCID: PMC6631446 DOI: 10.1186/s12864-019-5868-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022] Open
Abstract
Background: Many diseases are associated with complex patterns of symptoms and phenotypic manifestations. Parsimonious explanations aim at reconciling the multiplicity of phenotypic traits with the perturbation of one or a few biological functions. For this, it is necessary to characterize human phenotypes at the molecular and functional levels by exploiting gene annotations and known relations among genes, diseases and phenotypes. This characterization makes it possible to implement tools that retrieve functions shared among phenotypes co-occurring in the same patient, thereby facilitating the formulation of hypotheses about the molecular causes of the disease.
Results: We introduce PhenPath, a new resource consisting of two parts: PhenPathDB and PhenPathTOOL. The former is a database collecting the human genes associated with the phenotypes described in Human Phenotype Ontology (HPO) and OMIM Clinical Synopses. Phenotypes are then associated with biological functions and pathways by means of NET-GE, a network-based method for functional enrichment of sets of genes. The present version considers only phenotypes related to diseases. PhenPathDB collects information for 18 OMIM Clinical Synopses and 7137 HPO phenotypes, related to 4292 diseases and 3446 genes. Enrichment of Gene Ontology annotations endows some 87.7, 86.9 and 73.6% of HPO phenotypes with Biological Process, Molecular Function and Cellular Component terms, respectively. Furthermore, 58.8 and 77.8% of HPO phenotypes are also enriched for KEGG and Reactome pathways, respectively. Based on PhenPathDB, PhenPathTOOL analyzes user-defined sets of phenotypes, retrieving the diseases, genes and functional terms that they share. This information can provide clues for interpreting the co-occurrence of phenotypes in a patient.
Conclusions: The resource allows finding molecular features useful for investigating diseases characterized by multiple phenotypes, and in this way it can help researchers and physicians identify the molecular mechanisms and biological functions underlying the concomitant manifestation of phenotypes. The resource is freely available at http://phenpath.biocomp.unibo.it.
Electronic supplementary material: The online version of this article (10.1186/s12864-019-5868-x) contains supplementary material, which is available to authorized users.
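The abstract above describes retrieving the genes shared by a user-defined set of phenotypes. A minimal Python sketch of that one idea follows, written as a plain set intersection. It is not PhenPathTOOL's implementation and does not reproduce the NET-GE enrichment step; the phenotype and gene identifiers are hypothetical placeholders, not PhenPathDB content.

```python
# Hypothetical phenotype -> gene annotations (placeholders for illustration).
phenotype_genes = {
    "PHENO_1": {"GENE_A", "GENE_B", "GENE_C"},
    "PHENO_2": {"GENE_B", "GENE_C", "GENE_D"},
    "PHENO_3": {"GENE_C", "GENE_E"},
}


def shared_genes(phenotypes, annotations):
    """Return the genes annotated to every phenotype in the query set."""
    gene_sets = [annotations[p] for p in phenotypes]
    # An empty query has no shared genes by convention.
    return set.intersection(*gene_sets) if gene_sets else set()


# Genes common to two co-occurring phenotypes, then to all three.
pair = shared_genes(["PHENO_1", "PHENO_2"], phenotype_genes)
triple = shared_genes(["PHENO_1", "PHENO_2", "PHENO_3"], phenotype_genes)
print(sorted(pair))
print(sorted(triple))
```

The same intersection could be applied to disease or functional-term annotations if those were stored in the same phenotype-keyed shape.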
Affiliation(s)
- Giulia Babbi
- University of Bologna, FABIT, Via San Donato 15, 40126, Bologna, Italy; Department of BIGEA, University of Bologna, Piazza di Porta S. Donato, 1, 40126, Bologna, Italy
- Pier Luigi Martelli
- University of Bologna, FABIT, Via San Donato 15, 40126, Bologna, Italy; Interdepartmental Center "Luigi Galvani" for integrated studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, CIG, Via G. Petroni 26, 40126, Bologna, Italy
- Rita Casadio
- University of Bologna, FABIT, Via San Donato 15, 40126, Bologna, Italy; Interdepartmental Center "Luigi Galvani" for integrated studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, CIG, Via G. Petroni 26, 40126, Bologna, Italy; CNR, Institute of Biomembrane and Bioenergetics (IBIOM), Via Giovanni Amendola 165/A, 70126, Bari, Italy
3
Jhamb D, Magid-Slav M, Hurle MR, Agarwal P. Pathway analysis of GWAS loci identifies novel drug targets and repurposing opportunities. Drug Discov Today 2019; 24:1232-1236. [PMID: 30935985 DOI: 10.1016/j.drudis.2019.03.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/30/2018] [Revised: 02/09/2019] [Accepted: 03/26/2019] [Indexed: 12/29/2022]
Abstract
Genome-wide association studies (GWAS) have made considerable progress and there is emerging evidence that genetics-based targets can lead to 28% more launched drugs. We analyzed 1589 GWAS across 1456 pathways to translate these often imprecise genetic loci into therapeutic hypotheses for 182 diseases. These pathway-based genetic targets were validated by testing whether current drug targets were enriched in the pathway space for the same indication. Remarkably, 30% of diseases had significantly more targets in these pathways than expected by chance; the comparable number for GWAS alone (without pathway analysis) was zero. This study shows that a systematic global pathway analysis can translate genetic findings into therapeutic hypotheses for both new drug discovery and repositioning opportunities for current drugs.
Affiliation(s)
- Deepali Jhamb
- Computational Biology, GSK R&D, Collegeville, PA, USA
- Mark R Hurle
- Computational Biology, GSK R&D, Collegeville, PA, USA
4
Pei G, Sun H, Dai Y, Liu X, Zhao Z, Jia P. Investigation of multi-trait associations using pathway-based analysis of GWAS summary statistics. BMC Genomics 2019; 20:79. [PMID: 30712509 PMCID: PMC6360716 DOI: 10.1186/s12864-018-5373-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/26/2022] Open
Abstract
Background: Genome-wide association studies (GWAS) have been successful in identifying disease-associated genetic variants. Recently, an increasing number of GWAS summary statistics have been made available to the research community, providing extensive repositories for studies of human complex diseases. In particular, the study of cross-trait associations at the genetic level can benefit from large-scale GWAS summary statistics by using genetic variants that are associated with multiple traits. However, direct assessment of cross-trait associations using susceptibility loci has been challenging due to the complex genetic architectures of most diseases, calling for methods that can integrate functional interpretation and imply biological mechanisms.
Results: We developed an analytical framework for systematic integration of cross-trait associations. It incorporates two different approaches to detect enriched pathways and requires only summary statistics. We demonstrated the framework using 25 traits belonging to four phenotype groups. Our results revealed an average of 54 significantly associated pathways (range: 18 to 175) per trait. We further showed that pathway-based analysis provided increased power to estimate cross-trait associations compared to gene-level analysis. Based on Fisher's Exact Test (FET), we identified a total of 24 (53) pairs of trait-trait associations at adjusted pFET < 1 × 10⁻³ (pFET < 0.01) among the 25 traits. Our trait-trait association network revealed not only many relationships among the traits within the same group but also novel relationships among traits from different groups, which warrant further investigation in the future.
Conclusions: Our study revealed that risk variants for 25 different traits aggregated in particular biological pathways and that these pathways were frequently shared among traits. Our results confirmed known mechanisms and also suggested several novel insights into the etiology of multi-traits.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-5373-7) contains supplementary material, which is available to authorized users.
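The abstract above reports trait-trait associations scored with Fisher's Exact Test. It does not give the layout of the contingency table, so the following Python sketch is an illustration under stated assumptions, not the authors' framework. It assumes a 2×2 table counting pathways that are significant for both traits, for one trait only, or for neither, and a one-sided (enrichment) test. The function name and all counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a: pathways significant for both traits
    b: pathways significant for trait A only
    c: pathways significant for trait B only
    d: pathways significant for neither trait
    Returns P(overlap >= a) under the hypergeometric null distribution.
    """
    n = a + b + c + d          # size of the pathway universe
    row1 = a + b               # pathways significant for trait A
    col1 = a + c               # pathways significant for trait B
    max_overlap = min(row1, col1)
    # Sum hypergeometric probabilities for every overlap at least as large as a.
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, max_overlap + 1)
    )
    return tail / comb(n, col1)


# Hypothetical counts: 1000 pathways tested, 54 significant for trait A,
# 60 significant for trait B, 20 significant for both.
p_value = fisher_exact_greater(20, 34, 40, 906)
print(f"one-sided p = {p_value:.3g}")
```

In a multi-trait setting such as the one described, each trait pair would yield one p-value, and those p-values would then need multiple-testing adjustment; the adjustment method is not stated in the abstract and is not shown here.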
Affiliation(s)
- Guangsheng Pei
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Hua Sun
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Yulin Dai
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Xiaoming Liu
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
5
Mishra B, Kumar N, Mukhtar MS. Systems Biology and Machine Learning in Plant-Pathogen Interactions. Mol Plant Microbe Interact 2019; 32:45-55. [PMID: 30418085 DOI: 10.1094/mpmi-08-18-0221-fi] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Indexed: 05/02/2023]
Abstract
Systems biology is an inclusive approach to studying static and dynamic emergent properties on a global scale by integrating multiomics datasets to establish qualitative and quantitative associations among multiple biological components. With an abundance of improved high-throughput -omics datasets, network-based analyses and machine learning technologies are playing a pivotal role in the comprehensive understanding of biological systems. Network topological features reveal the most important nodes within a network and prioritize significant molecular components for diverse biological networks, including coexpression, protein-protein interaction, and gene regulatory networks. Machine learning techniques provide enormous predictive power through specific feature extraction from biological data. Deep learning, a subtype of machine learning, has plausible future applications because it does not require a domain expert for feature extraction. Inspired by diverse domains of biology, we here review classic systems biology techniques applied in plant immunity thus far. We also discuss additional advanced approaches in both graph theory and machine learning, which may provide new insights for understanding plant-microbe interactions. Finally, we propose a hybrid approach in plant immune systems that harnesses the power of both network biology and machine learning, with the potential to be applicable to both model systems and agronomically important crop plants.
Affiliation(s)
- M Shahid Mukhtar
- Department of Biology and Nutrition Obesity Research Center, University of Alabama at Birmingham, 1300 University Blvd., Birmingham 35294, U.S.A.
6
Suravajhala P, Benso A. Prioritizing single-nucleotide polymorphisms and variants associated with clinical mastitis. Adv Appl Bioinform Chem 2017; 10:57-64. [PMID: 28652783 PMCID: PMC5473491 DOI: 10.2147/aabc.s123604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022] Open
Abstract
Next-generation sequencing technology has provided resources to easily explore and identify candidate single-nucleotide polymorphisms (SNPs) and variants. However, there remains a challenge in identifying and inferring the causal SNPs from sequence data. A problem with different methods that predict the effect of mutations is that they produce false positives. In this hypothesis, we provide an overview of methods known for identifying causal variants and discuss the challenges, fallacies, and prospects in discerning candidate SNPs. We then propose a three-point classification strategy, which could be an additional annotation method in identifying causalities.
Affiliation(s)
- Prashanth Suravajhala
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Alfredo Benso
- Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
7
Zhu L, Jiang K, Webber K, Wong L, Liu T, Chen Y, Jarvis JN. Chromatin landscapes and genetic risk for juvenile idiopathic arthritis. Arthritis Res Ther 2017; 19:57. [PMID: 28288683 PMCID: PMC5348874 DOI: 10.1186/s13075-017-1260-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/30/2016] [Accepted: 02/13/2017] [Indexed: 02/07/2023] Open
Abstract
Background: The transcriptomes of peripheral blood cells in children with juvenile idiopathic arthritis (JIA) have distinct transcriptional aberrations that suggest impairment of transcriptional regulation. To gain a better understanding of this phenomenon, we studied known JIA genetic risk loci, the majority of which are located in non-coding regions, where transcription is regulated and coordinated on a genome-wide basis. We examined human neutrophils and primary CD4+ T cells to identify genes and functional elements located within those risk loci.
Methods: We analyzed RNA sequencing (RNA-Seq) data, H3K27ac and H3K4me1 chromatin immunoprecipitation-sequencing (ChIP-Seq) data, and previously published chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) data to characterize the chromatin landscapes within the known JIA-associated risk loci.
Results: In both neutrophils and primary CD4+ T cells, the majority of the JIA-associated linkage disequilibrium (LD) blocks contained H3K27ac and/or H3K4me1 marks. These LD blocks were also binding sites for a small group of transcription factors, particularly in neutrophils. Furthermore, these regions showed abundant intronic and intergenic transcription in neutrophils. In neutrophils, none of the genes that were differentially expressed between untreated patients with JIA and healthy children were located within the JIA-risk LD blocks. In CD4+ T cells, multiple genes, including HLA-DQA1, HLA-DQB2, TRAF1, and IRF1, were associated with the long-distance interacting regions within the LD regions as determined from ChIA-PET data.
Conclusions: These findings suggest that genetic risk contributes to the aberrant transcriptional control observed in JIA. Furthermore, these findings demonstrate the challenges of identifying the actual causal variants within complex genomic/chromatin landscapes.
Electronic supplementary material: The online version of this article (doi:10.1186/s13075-017-1260-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Lisha Zhu
- Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Kaiyu Jiang
- Department of Pediatrics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Karstin Webber
- Graduate Program in Biological Sciences, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Laiping Wong
- Department of Pediatrics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Tao Liu
- Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA; Genetics, Genomics, & Bioinformatics Program, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Yanmin Chen
- Department of Pediatrics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- James N Jarvis
- Department of Pediatrics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA; Genetics, Genomics, & Bioinformatics Program, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
8
Lee J, Jo K, Lee S, Kang J, Kim S. Prioritizing biological pathways by recognizing context in time-series gene expression data. BMC Bioinformatics 2016; 17:477. [PMID: 28155707 PMCID: PMC5259824 DOI: 10.1186/s12859-016-1335-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022] Open
Abstract
Background: The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene can be involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus, activation of even a single gene can result in activation of many pathways. This complex relationship often makes pathway analysis very difficult. While much more powerful pathway analysis methods are needed, a readily available alternative is to incorporate information from the literature.
Results: In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context, which is termed relevance in this paper. However, if few or no articles have been reported, then we should rely on the results from the pathway analysis tools, which is termed significance in this paper. We realized this concept as an algorithm by introducing a Context Score and an Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets.
Conclusions: Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for the pathway-based analysis of gene expression data, since the user can specify the context of the biological experiment as a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRAP.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1335-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jusang Lee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Kyuri Jo
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Sunwon Lee
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
9
Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016; 590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]
Abstract
Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both.
Affiliation(s)
- Burkhard Rost
- Department of Informatics and Bioinformatics, Institute for Advanced Studies, Technical University of Munich, Garching, Germany
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
10
Brodie A, Azaria JR, Ofran Y. How far from the SNP may the causative genes be? Nucleic Acids Res 2016; 44:6046-54. [PMID: 27269582 PMCID: PMC5291268 DOI: 10.1093/nar/gkw500] [Citation(s) in RCA: 112] [Impact Index Per Article: 12.4] [Received: 09/03/2015] [Revised: 05/20/2016] [Accepted: 05/22/2016] [Indexed: 02/03/2023] Open
Abstract
While GWAS identify many disease-associated SNPs, using them to decipher disease mechanisms is hindered by the difficulty of mapping SNPs to genes. Most SNPs are in non-coding regions and it is often hard to identify the genes they implicate. To explore how far a SNP may be from the affected genes, we used a pathway-based approach. We found that affected genes are often up to 2 Mbp away from the associated SNP, and are not necessarily the closest genes to the SNP. Existing approaches for mapping SNPs to genes leave many SNPs unmapped and reveal only 86 significant phenotype-pathway associations for all known GWAS hits combined. The pathway-based approach we propose here allows mapping of virtually all SNPs to genes and reveals 435 statistically significant phenotype-pathway associations. In searching for mechanisms that may explain the relationships between SNPs and distant genes, we found that SNPs that are mapped to distant genes have significantly more large insertions/deletions around them than other SNPs, suggesting that these SNPs may sometimes be markers for large insertions/deletions that may affect large genomic regions.
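The abstract above argues that the nearest gene is not always the affected one and that candidates can lie up to 2 Mbp from the SNP. The Python sketch below illustrates only the first step such an analysis might take: collecting every gene inside a distance window rather than the single closest gene. It is not the authors' pathway-based method, which then scores candidates by pathway membership. The `Gene` class, function names, gene names, and coordinates are all hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 2_000_000  # 2 Mbp, the distance quoted in the abstract


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a SNP to the nearest gene edge (0 if inside the gene)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def candidate_genes(chrom, pos, genes, window=WINDOW_BP):
    """Names of genes on the same chromosome within `window` bp, nearest first."""
    hits = [
        (distance_to_gene(pos, g), g.name)
        for g in genes
        if g.chrom == chrom and distance_to_gene(pos, g) <= window
    ]
    return [name for _, name in sorted(hits)]


# Hypothetical annotation: two genes in range, one too far, one on another chromosome.
genes = [
    Gene("GENE_A", "chr1", 1_000_000, 1_050_000),
    Gene("GENE_B", "chr1", 2_900_000, 2_950_000),
    Gene("GENE_C", "chr1", 6_000_000, 6_100_000),
    Gene("GENE_D", "chr2", 1_000_000, 1_100_000),
]
candidates = candidate_genes("chr1", 1_200_000, genes)
print(candidates)
```

A nearest-gene mapping would keep only the first element of the returned list; keeping the whole list leaves more distant genes available for a downstream pathway-level comparison.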
Affiliation(s)
- Aharon Brodie
- The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan 52900, Israel
- Johnathan Roy Azaria
- The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan 52900, Israel
- Yanay Ofran
- The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan 52900, Israel