1
Ohtani H, Liu M, Liang G, Jang HJ, Jones PA. Efficient activation of hundreds of LTR12C elements reveals cis-regulatory function determined by distinct epigenetic mechanisms. Nucleic Acids Res 2024:gkae498. [PMID: 38874474 DOI: 10.1093/nar/gkae498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 05/23/2024] [Accepted: 06/05/2024] [Indexed: 06/15/2024] Open
Abstract
Long terminal repeats (LTRs), which often contain promoter and enhancer sequences of intact endogenous retroviruses (ERVs), are known to be co-opted as cis-regulatory elements for fine-tuning host-coding gene expression. Since LTRs are mainly silenced by the deposition of repressive epigenetic marks, substantial activation of LTRs has been found in human cells after treatment with epigenetic inhibitors. Although the LTR12C family makes up the majority of ERVs activated by epigenetic inhibitors, how these epigenetically and transcriptionally activated LTR12C elements can regulate the host-coding gene expression remains unclear due to genome-wide alteration of transcriptional changes after epigenetic inhibitor treatments. Here, we specifically transactivated >600 LTR12C elements by using single guide RNA-based dCas9-SunTag-VP64, a site-specific targeting CRISPR activation (CRISPRa) system, with minimal off-target events. Interestingly, most of the transactivated LTR12C elements acquired the H3K27ac-marked enhancer feature, while only 20% were co-marked with promoter-associated H3K4me3 modifications. The enrichment of the H3K4me3 signal was intricately associated with downstream regions of LTR12C, such as internal regions of intact ERV9 or other types of retrotransposons. Here, we leverage an optimized CRISPRa system to identify two distinct epigenetic signatures that define LTR12C transcriptional activation, which modulate the expression of proximal protein-coding genes.
Affiliation(s)
- Hitoshi Ohtani
  - Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA
  - Department of Animal Sciences, Graduate School of Bioagricultural Sciences, Nagoya University, Chikusa-ku, Nagoya, Aichi 464-8601, Japan
- Minmin Liu
  - Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA
- Gangning Liang
  - Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
- H Josh Jang
  - Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA
- Peter A Jones
  - Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA
2
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
3
Kırboğa KK, Küçüksille EU. Identifying Cardiovascular Disease Risk Factors in Adults with Explainable Artificial Intelligence. Anatol J Cardiol 2023; 27:657-663. [PMID: 37624075 PMCID: PMC10621606 DOI: 10.14744/anatoljcardiol.2023.3214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 07/03/2023] [Indexed: 08/26/2023] Open
Abstract
BACKGROUND: The aim of this study was to evaluate the relationship between risk factors causing cardiovascular diseases and their importance with explainable machine learning models.
METHODS: In this retrospective study, multiple databases were searched, and data on 11 risk factors of 70 000 patients were obtained. Data included risk factors highly associated with cardiovascular disease and having/not having any cardiovascular disease. The explainable prediction model was constructed using 7 machine learning algorithms: Random Forest Classifier, Extreme Gradient Boost Classifier, Decision Tree Classifier, KNeighbors Classifier, Support Vector Machine Classifier, and GaussianNB. Receiver operating characteristic curve, Brier scores, and mean accuracy were used to assess the model's performance. The interpretability of the predicted results was examined using Shapley additive description values.
RESULTS: The accuracy, area under the curve values, and Brier scores of the Extreme Gradient Boost model (the best prediction model for cardiovascular disease risk factors) were calculated as 0.739, 0.803, and 0.260, respectively. The most important risk factors in the permutation feature importance method and explainable artificial intelligence-Shapley's explanations method are systolic blood pressure (ap_hi) [0.1335 ± 0.0045 w (weight)], cholesterol (0.0341 ± 0.0022 w), and age (0.0211 ± 0.0036 w).
CONCLUSION: The created explainable machine learning model has become a successful clinical model that can predict cardiovascular patients and explain the impact of risk factors. Especially in the clinical setting, this model, which has an accurate, explainable, and transparent algorithm, will help encourage early diagnosis of patients with cardiovascular diseases, risk factors, and possible treatment options.
Affiliation(s)
- Kevser Kübra Kırboğa
  - Department of Bioengineering, Bilecik Seyh Edebali University, Faculty of Engineering, Bilecik, Türkiye
  - Informatics Institute, İstanbul Technical University, İstanbul, Türkiye
- Ecir Uğur Küçüksille
  - Department of Computer Engineering, Süleyman Demirel University, Isparta, Türkiye
4
Sokač M, Kjær A, Dyrskjøt L, Haibe-Kains B, Aerts HJWL, Birkbak NJ. Spatial transformation of multi-omics data unlocks novel insights into cancer biology. eLife 2023; 12:RP87133. [PMID: 37669321 PMCID: PMC10479962 DOI: 10.7554/elife.87133] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/07/2023] Open
Abstract
The application of next-generation sequencing (NGS) has transformed cancer research. As costs have decreased, NGS has increasingly been applied to generate multiple layers of molecular data from the same samples, covering genomics, transcriptomics, and methylomics. Integrating these types of multi-omics data in a combined analysis is now becoming a common issue with no obvious solution, often handled on an ad hoc basis, with multi-omics data arriving in a tabular format and analyzed using computationally intensive statistical methods. These methods particularly ignore the spatial orientation of the genome and often apply stringent p-value corrections that likely result in the loss of true positive associations. Here, we present GENIUS (GEnome traNsformatIon and spatial representation of mUltiomicS data), a framework for integrating multi-omics data using deep learning models developed for advanced image analysis. The GENIUS framework is able to transform multi-omics data into images with genes displayed as spatially connected pixels and successfully extract relevant information with respect to the desired output. We demonstrate the utility of GENIUS by applying the framework to multi-omics datasets from the Cancer Genome Atlas. Our results are focused on predicting the development of metastatic cancer from primary tumors, and demonstrate how through model inference, we are able to extract the genes which are driving the model prediction and are likely associated with metastatic disease progression. We anticipate our framework to be a starting point and strong proof of concept for multi-omics data transformation and analysis without the need for statistical correction.
Affiliation(s)
- Mateo Sokač
  - Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Center, Aarhus University, Aarhus, Denmark
- Asbjørn Kjær
  - Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Center, Aarhus University, Aarhus, Denmark
- Lars Dyrskjøt
  - Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Benjamin Haibe-Kains
  - Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- Hugo JWL Aerts
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, United States
  - Departments of Radiation Oncology and Radiology, Brigham and Women’s Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, United States
  - Radiology and Nuclear Medicine, CARIM & GROW, Maastricht University, Maastricht, Netherlands
- Nicolai J Birkbak
  - Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Center, Aarhus University, Aarhus, Denmark
5
Edwards DM, Davies P, Hebenstreit D. Synergising single-cell resolution and 4sU labelling boosts inference of transcriptional bursting. Genome Biol 2023; 24:138. [PMID: 37328900 PMCID: PMC10276402 DOI: 10.1186/s13059-023-02977-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 05/25/2023] [Indexed: 06/18/2023] Open
Abstract
Despite the recent rise of RNA-seq datasets combining single-cell (sc) resolution with 4-thiouridine (4sU) labelling, analytical methods exploiting their power to dissect transcriptional bursting are lacking. Here, we present a mathematical model and Bayesian inference implementation to facilitate genome-wide joint parameter estimation and confidence quantification (R package: burstMCMC). We demonstrate that, unlike conventional scRNA-seq, 4sU scRNA-seq resolves temporal parameters and furthermore boosts inference of dimensionless parameters via a synergy between single-cell resolution and 4sU labelling. We apply our method to published 4sU scRNA-seq data and linked with ChIP-seq data, we uncover previously obscured associations between different parameters and histone modifications.
Affiliation(s)
- Philip Davies
  - School of Life Sciences, University of Warwick, Coventry, UK
6
Li C, Alike Y, Hou J, Long Y, Zheng Z, Meng K, Yang R. Machine learning model successfully identifies important clinical features for predicting outpatients with rotator cuff tears. Knee Surg Sports Traumatol Arthrosc 2023:10.1007/s00167-022-07298-4. [PMID: 36629889 DOI: 10.1007/s00167-022-07298-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/23/2022] [Accepted: 12/20/2022] [Indexed: 01/12/2023]
Abstract
PURPOSE: The aim of this study is to develop a machine learning model to identify important clinical features related to rotator cuff tears (RCTs) using explainable artificial intelligence (XAI) for efficiently predicting outpatients with RCTs.
METHODS: A retrospective review of a local clinical registry dataset was performed to include patients with shoulder pain and dysfunction who underwent questionnaires and physical examinations between 2019 and 2022. RCTs were diagnosed by shoulder arthroscopy. Six machine-learning algorithms (Stacking, Gradient Boosting Machine, Bagging, Random Forest, Extreme Gradient Boost (XGBoost), and Adaptive Boosting) were developed for the prediction. The performance of the models was assessed by the area under the receiver operating characteristic curve (AUC), Brier scores, and Decision curve. The interpretability of the predicted outcomes was evaluated using Shapley additive explanation (SHAP) values.
RESULTS: A total of 1684 patients who completed questionnaires and clinical tests were included, and 417 patients with RCTs underwent shoulder arthroscopy. In six machine learning algorithms for predicting RCTs, the accuracy, AUC values, and Brier scores were in the range of 0.81-0.86, 0.75-0.92, and 0.15-0.19, respectively. The XGBoost model showed superior performance with accuracy, AUC, and Brier scores of 0.85 (95% confidence interval, 0.82-0.87), 0.92 (95% confidence interval, 0.90-0.94), and 0.15 (95% confidence interval, 0.14-0.16), respectively. The Shapley plot showed the impact of the clinical features on predicting RCTs. The most important variables were Jobe test, Bear hug test, and age for prediction, with mean SHAP values of 1.458, 0.950, and 0.790, respectively.
CONCLUSION: The machine learning model successfully identified important clinical variables for predicting patients with RCTs. In addition, the best algorithm was also integrated into a digital application to provide predictions in outpatient settings. This tool may assist patients in reducing their pain experience and providing prompt treatments.
LEVEL OF EVIDENCE: Level III.
Affiliation(s)
- Cheng Li
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
- Yamuhanmode Alike
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
- Jingyi Hou
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
- Yi Long
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
- Zhenze Zheng
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
- Ke Meng
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
- Rui Yang
  - Department of Orthopedics, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, 107 Yan Jiang Road West, Guangzhou, 510120, Guangdong, China
7
Zhao J, Huai J. Role of primary aging hallmarks in Alzheimer's disease. Theranostics 2023; 13:197-230. [PMID: 36593969 PMCID: PMC9800733 DOI: 10.7150/thno.79535] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 10/04/2022] [Accepted: 11/15/2022] [Indexed: 12/03/2022] Open
Abstract
Alzheimer's disease (AD) is the most common neurodegenerative disease, which severely threatens the health of the elderly and causes significant economic and social burdens. The causes of AD are complex and include heritable but mostly aging-related factors. The primary aging hallmarks include genomic instability, telomere wear, epigenetic changes, and loss of protein stability, which play a dominant role in the aging process. Although AD is closely associated with the aging process, the underlying mechanisms involved in AD pathogenesis have not been well characterized. This review summarizes the available literature about primary aging hallmarks and their roles in AD pathogenesis. By analyzing published literature, we attempted to uncover the possible mechanisms of aberrant epigenetic markers with related enzymes, transcription factors, and loss of proteostasis in AD. In particular, the importance of oxidative stress-induced DNA methylation and DNA methylation-directed histone modifications and proteostasis are highlighted. A molecular network of gene regulatory elements that undergoes a dynamic change with age may underlie age-dependent AD pathogenesis, and can be used as a new drug target to treat AD.
8
Osuntoki IG, Harrison A, Dai H, Bao Y, Zabet NR. ZipHiC: a novel Bayesian framework to identify enriched interactions and experimental biases in Hi-C data. Bioinformatics 2022; 38:3523-3531. [PMID: 35678507 PMCID: PMC9272800 DOI: 10.1093/bioinformatics/btac387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Revised: 05/23/2022] [Accepted: 06/07/2022] [Indexed: 11/26/2022] Open
Abstract
Motivation: Several computational and statistical methods have been developed to analyze data generated through the 3C-based methods, especially the Hi-C. Most of the existing methods do not account for dependency in Hi-C data.
Results: Here, we present ZipHiC, a novel statistical method to explore Hi-C data focusing on the detection of enriched contacts. ZipHiC implements a Bayesian method based on a hidden Markov random field (HMRF) model and the Approximate Bayesian Computation (ABC) to detect interactions in two-dimensional space based on a Hi-C contact frequency matrix. ZipHiC uses data on the sources of biases related to the contact frequency matrix, allows borrowing information from neighbours using the Potts model and improves computation speed using the ABC model. In addition to outperforming existing tools on both simulated and real data, our model also provides insights into different sources of biases that affect Hi-C data. We show that some datasets display higher biases from DNA accessibility or Transposable Elements content. Furthermore, our analysis in Drosophila melanogaster showed that approximately half of the detected significant interactions connect promoters with other parts of the genome, indicating a functional biological role. Finally, we found that the micro-C datasets display higher biases from DNA accessibility compared to a similar Hi-C experiment, but this can be corrected by ZipHiC.
Availability and implementation: The R scripts are available at https://github.com/igosungithub/HMRFHiC.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Itunu G Osuntoki
  - Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
  - Statistics, Modelling and Economics Department, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Andrew Harrison
  - Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Hongsheng Dai
  - Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Yanchun Bao
  - Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Nicolae Radu Zabet
  - School of Life Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT, United Kingdom