1
|
Li Z, Chen L. Predicting functional consequences of SNPs on mRNA translation via machine learning. Nucleic Acids Res 2023; 51:7868-7881. [PMID: 37427781 PMCID: PMC10450169 DOI: 10.1093/nar/gkad576] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 05/18/2023] [Accepted: 06/23/2023] [Indexed: 07/11/2023] Open
Abstract
The functional impact of single nucleotide polymorphisms (SNPs) on translation has yet to be considered when prioritizing disease-causing SNPs from genome-wide association studies (GWAS). Here we apply machine learning models to genome-wide ribosome profiling data to predict SNP function by forecasting ribosome collisions during mRNA translation. SNPs causing remarkable ribosome occupancy changes are named RibOc-SNPs (Ribosome-Occupancy-SNPs). We found that disease-related SNPs tend to cause notable changes in ribosome occupancy, suggesting translational regulation as an essential pathogenesis step. Nucleotide conversions, such as 'G → T', 'T → G' and 'C → A', are enriched in RibOc-SNPs, with the most significant impact on ribosome occupancy, while 'A → G' (or 'A→ I' RNA editing) and 'G → A' are less deterministic. Among amino acid conversions, 'Glu → stop (codon)' shows the most significant enrichment in RibOc-SNPs. Interestingly, there is selection pressure on stop codons with a lower collision likelihood. RibOc-SNPs are enriched at the 5'-coding sequence regions, implying hot spots of translation initiation regulation. Strikingly, ∼22.1% of the RibOc-SNPs lead to opposite changes in ribosome occupancy on alternative transcript isoforms, suggesting that SNPs can amplify the differences between splicing isoforms by oppositely regulating their translation efficiency.
Collapse
Affiliation(s)
- Zheyu Li
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, 925 Bloom Walk, Los Angeles, CA 90089, USA
| | - Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA
| |
Collapse
|
2
|
Richer S, Tian Y, Schoenfelder S, Hurst L, Murrell A, Pisignano G. Widespread allele-specific topological domains in the human genome are not confined to imprinted gene clusters. Genome Biol 2023; 24:40. [PMID: 36869353 PMCID: PMC9983196 DOI: 10.1186/s13059-023-02876-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Accepted: 02/13/2023] [Indexed: 03/05/2023] Open
Abstract
BACKGROUND There is widespread interest in the three-dimensional chromatin conformation of the genome and its impact on gene expression. However, these studies frequently do not consider parent-of-origin differences, such as genomic imprinting, which result in monoallelic expression. In addition, genome-wide allele-specific chromatin conformation associations have not been extensively explored. There are few accessible bioinformatic workflows for investigating allelic conformation differences and these require pre-phased haplotypes which are not widely available. RESULTS We developed a bioinformatic pipeline, "HiCFlow," that performs haplotype assembly and visualization of parental chromatin architecture. We benchmarked the pipeline using prototype haplotype phased Hi-C data from GM12878 cells at three disease-associated imprinted gene clusters. Using Region Capture Hi-C and Hi-C data from human cell lines (1-7HB2, IMR-90, and H1-hESCs), we can robustly identify the known stable allele-specific interactions at the IGF2-H19 locus. Other imprinted loci (DLK1 and SNRPN) are more variable and there is no "canonical imprinted 3D structure," but we could detect allele-specific differences in A/B compartmentalization. Genome-wide, when topologically associating domains (TADs) are unbiasedly ranked according to their allele-specific contact frequencies, a set of allele-specific TADs could be defined. These occur in genomic regions of high sequence variation. In addition to imprinted genes, allele-specific TADs are also enriched for allele-specific expressed genes. We find loci that have not previously been identified as allele-specific expressed genes such as the bitter taste receptors (TAS2Rs). CONCLUSIONS This study highlights the widespread differences in chromatin conformation between heterozygous loci and provides a new framework for understanding allele-specific expressed genes.
Collapse
Affiliation(s)
- Stephen Richer
- Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
| | - Yuan Tian
- Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- UCL Cancer Institute, University College London, Paul O'Gorman Building, London, UK
| | | | - Laurence Hurst
- Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
| | - Adele Murrell
- Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK.
| | - Giuseppina Pisignano
- Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK.
| |
Collapse
|
3
|
Kuksa PP, Greenfest-Allen E, Cifello J, Ionita M, Wang H, Nicaretta H, Cheng PL, Lee WP, Wang LS, Leung YY. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Hum Mol Genet 2022; 31:R62-R72. [PMID: 35943817 PMCID: PMC9585666 DOI: 10.1093/hmg/ddac191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022] Open
Abstract
Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the library of and offers unique opportunities for investigating both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional challenges to predicting functional impacts in terms of data analysis and interpretation. This review focuses on the recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions which will enhance the data and tools necessary for the effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.
Collapse
Affiliation(s)
- Pavel P Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Emily Greenfest-Allen
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jeffrey Cifello
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Matei Ionita
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Hui Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Heather Nicaretta
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Po-Liang Cheng
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
Collapse
|
4
|
Song H, Liu Y, Tan Y, Zhang Y, Jin W, Chen L, Wu S, Yan J, Li J, Chen Z, Chen S, Wang K. Recurrent noncoding somatic and germline WT1 variants converge to disrupt MYB binding in acute promyelocytic leukemia. Blood 2022; 140:1132-1144. [PMID: 35653587 PMCID: PMC9461475 DOI: 10.1182/blood.2021014945] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2021] [Accepted: 05/24/2022] [Indexed: 11/22/2022] Open
Abstract
Genetic alternations can occur at noncoding regions, but how they contribute to cancer pathogenesis is poorly understood. Here, we established a mutational landscape of cis-regulatory regions (CREs) in acute promyelocytic leukemia (APL) based on whole-genome sequencing analysis of paired tumor and germline samples from 24 patients and epigenetic profiling of 16 patients. Mutations occurring in CREs occur preferentially in active enhancers bound by the complex of master transcription factors in APL. Among significantly enriched mutated CREs, we found a recurrently mutated region located within the third intron of WT1, an essential regulator of normal and malignant hematopoiesis. Focusing on noncoding mutations within this WT1 intron, an analysis on 169 APL patients revealed that somatic mutations were clustered into a focal hotspot region, including one site identified as a germline polymorphism contributing to APL risk. Significantly decreased WT1 expression was observed in APL patients bearing somatic and/or germline noncoding WT1 variants. Furthermore, biallelic WT1 inactivation was recurrently found in APL patients with noncoding WT1 variants, which resulted in the complete loss of WT1. The high incidence of biallelic inactivation suggested the tumor suppressor activity of WT1 in APL. Mechanistically, noncoding WT1 variants disrupted MYB binding on chromatin and suppressed the enhancer activity and WT1 expression through destroying the chromatin looping formation. Our study highlights the important role of noncoding variants in the leukemogenesis of APL.
Collapse
Affiliation(s)
- Huan Song
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yabin Liu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Yun Tan
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yi Zhang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Wen Jin
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Sino-French Research Center for Life Sciences and Genomics, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; and
| | - Li Chen
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Shishuang Wu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Jinsong Yan
- Department of Hematology, the Second Hospital of Dalian Medical University, Dalian, China
| | - Junmin Li
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zhu Chen
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Saijuan Chen
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Kankan Wang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Sino-French Research Center for Life Sciences and Genomics, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; and
| |
Collapse
|
5
|
Brooks-Warburton J, Modos D, Sudhakar P, Madgwick M, Thomas JP, Bohar B, Fazekas D, Zoufir A, Kapuy O, Szalay-Beko M, Verstockt B, Hall LJ, Watson A, Tremelling M, Parkes M, Vermeire S, Bender A, Carding SR, Korcsmaros T. A systems genomics approach to uncover patient-specific pathogenic pathways and proteins in ulcerative colitis. Nat Commun 2022; 13:2299. [PMID: 35484353 PMCID: PMC9051123 DOI: 10.1038/s41467-022-29998-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Accepted: 04/06/2022] [Indexed: 12/11/2022] Open
Abstract
We describe a precision medicine workflow, the integrated single nucleotide polymorphism network platform (iSNP), designed to determine the mechanisms by which SNPs affect cellular regulatory networks, and how SNP co-occurrences contribute to disease pathogenesis in ulcerative colitis (UC). Using SNP profiles of 378 UC patients we map the regulatory effects of the SNPs to a human signalling network containing protein-protein, miRNA-mRNA and transcription factor binding interactions. With unsupervised clustering algorithms we group these patient-specific networks into four distinct clusters driven by PRKCB, HLA, SNAI1/CEBPB/PTPN1 and VEGFA/XPO5/POLH hubs. The pathway analysis identifies calcium homeostasis, wound healing and cell motility as key processes in UC pathogenesis. Using transcriptomic data from an independent patient cohort, with three complementary validation approaches focusing on the SNP-affected genes, the patient specific modules and affected functions, we confirm the regulatory impact of non-coding SNPs. iSNP identified regulatory effects for disease-associated non-coding SNPs, and by predicting the patient-specific pathogenic processes, we propose a systems-level way to stratify patients.
Collapse
Affiliation(s)
- Johanne Brooks-Warburton
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Department of Clinical, Pharmaceutical and Biological Sciences, University of Hertfordshire, Hertford, UK
- Gastroenterology Department, Lister Hospital, Stevenage, UK
| | - Dezso Modos
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Padhmanand Sudhakar
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
| | - Matthew Madgwick
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
| | - John P Thomas
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
| | - Balazs Bohar
- Earlham Institute, Norwich Research Park, Norwich, UK
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
| | - David Fazekas
- Earlham Institute, Norwich Research Park, Norwich, UK
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
| | - Azedine Zoufir
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Orsolya Kapuy
- Department of Molecular Biology, Semmelweis University, Budapest, Hungary
| | | | - Bram Verstockt
- KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
- University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
| | - Lindsay J Hall
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Norwich Medical School, University of East Anglia, Norwich, UK
- School of Life Sciences, ZIEL - Institute for Food & Health, Technical University of Munich, 80333, Freising, Germany
| | - Alastair Watson
- Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - Mark Tremelling
- Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
| | - Miles Parkes
- Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
| | - Severine Vermeire
- KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
- University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
| | - Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Simon R Carding
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK.
- Norwich Medical School, University of East Anglia, Norwich, UK.
| | - Tamas Korcsmaros
- Earlham Institute, Norwich Research Park, Norwich, UK.
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK.
| |
Collapse
|
6
|
Towards the Genetic Architecture of Complex Gene Expression Traits: Challenges and Prospects for eQTL Mapping in Humans. Genes (Basel) 2022; 13:genes13020235. [PMID: 35205280 PMCID: PMC8871770 DOI: 10.3390/genes13020235] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Revised: 01/21/2022] [Accepted: 01/25/2022] [Indexed: 12/10/2022] Open
Abstract
The discovery of expression quantitative trait loci (eQTLs) and their target genes (eGenes) has not only compensated for the limitations of genome-wide association studies for complex phenotypes but has also provided a basis for predicting gene expression. Efforts have been made to develop analytical methods in statistical genetics, a key discipline in eQTL analysis. In particular, mixed model– and deep learning–based analytical methods have been extremely beneficial in mapping eQTLs and predicting gene expression. Nevertheless, we still face many challenges associated with eQTL discovery. Here, we discuss two key aspects of these challenges: 1, the complexity of eTraits with various factors such as polygenicity and epistasis and 2, the voluminous work required for various types of eQTL profiles. The properties and prospects of statistical methods, including the mixed model method, Bayesian inference, the deep learning method, and the integration method, are presented as future directions for eQTL discovery. This review will help expedite the design and use of efficient methods for eQTL discovery and eTrait prediction.
Collapse
|