1. Wen W, Zhong J, Zhang Z, Jia L, Chu T, Wang N, Danko CG, Wang Z. dHICA: a deep transformer-based model enables accurate histone imputation from chromatin accessibility. Brief Bioinform 2024; 25:bbae459. [PMID: 39316943 PMCID: PMC11421843 DOI: 10.1093/bib/bbae459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/13/2024] [Accepted: 09/04/2024] [Indexed: 09/26/2024] [Open Access]
Abstract
Histone modifications (HMs) are pivotal in various biological processes, including transcription, replication, and DNA repair, significantly impacting chromatin structure. These modifications underpin the molecular mechanisms of cell-type-specific gene expression and complex diseases. However, annotating HMs across different cell types solely using experimental approaches is impractical due to cost and time constraints. Herein, we present dHICA (deep histone imputation using chromatin accessibility), a novel deep learning framework that integrates DNA sequences and chromatin accessibility data to predict multiple HM tracks. Employing the transformer architecture alongside dilated convolutions, dHICA boasts an extensive receptive field and captures more cell-type-specific information. dHICA outperforms state-of-the-art baselines and achieves superior performance in cell-type-specific loci and gene elements, aligning with biological expectations. Furthermore, dHICA's imputations hold significant potential for downstream applications, including chromatin state segmentation and elucidating the functional implications of SNPs (Single Nucleotide Polymorphisms). In conclusion, dHICA serves as a valuable tool for advancing the understanding of chromatin dynamics, offering enhanced predictive capabilities and interpretability.
Affiliation(s)
- Wen Wen
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Jiaxin Zhong
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Zhaoxi Zhang
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Lijuan Jia
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Tinyi Chu
- Meinig School of Biomedical Engineering, Cornell University, Weill Hall, Ithaca, NY 14853, United States
- Nating Wang
- Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building, Ithaca, NY 14853, United States
- Charles G Danko
- Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Hungerford Hill Rd, Ithaca, NY 14853, United States
- Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Tower Rd, Ithaca, NY 14853, United States
- Zhong Wang
- School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
2. Shang Z, Chauhan V, Devi K, Patil S. Artificial Intelligence, the Digital Surgeon: Unravelling Its Emerging Footprint in Healthcare - The Narrative Review. J Multidiscip Healthc 2024; 17:4011-4022. [PMID: 39165254 PMCID: PMC11333562 DOI: 10.2147/jmdh.s482757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 08/09/2024] [Indexed: 08/22/2024] [Open Access]
Abstract
Background: Artificial Intelligence (AI) holds transformative potential for the healthcare industry, offering innovative solutions for diagnosis, treatment planning, and improving patient outcomes. As AI continues to be integrated into healthcare systems, it promises advancements across various domains. This review explores the diverse applications of AI in healthcare, along with the challenges and limitations that need to be addressed. The aim is to provide a comprehensive overview of AI's impact on healthcare and to identify areas for further development and focus.
Main Applications: The review discusses the broad range of AI applications in healthcare. In medical imaging and diagnostics, AI enhances the accuracy and efficiency of diagnostic processes, aiding in early disease detection. AI-powered clinical decision support systems assist healthcare professionals in patient management and decision-making. Predictive analytics using AI enables the prediction of patient outcomes and identification of potential health risks. AI-driven robotic systems have revolutionized surgical procedures, improving precision and outcomes. Virtual assistants and chatbots enhance patient interaction and support, providing timely information and assistance. In the pharmaceutical industry, AI accelerates drug discovery and development by identifying potential drug candidates and predicting their efficacy. Additionally, AI improves administrative efficiency and operational workflows in healthcare, streamlining processes and reducing costs. AI-powered remote monitoring and telehealth solutions expand access to healthcare, particularly in underserved areas.
Challenges and Limitations: Despite the significant promise of AI in healthcare, several challenges persist. Ensuring the reliability and consistency of AI-driven outcomes is crucial. Privacy and security concerns must be navigated carefully, particularly in handling sensitive patient data. Ethical considerations, including bias and fairness in AI algorithms, need to be addressed to prevent unintended consequences. Overcoming these challenges is critical for the ethical and successful integration of AI in healthcare.
Conclusion: The integration of AI into healthcare is advancing rapidly, offering substantial benefits in improving patient care and operational efficiency. However, addressing the associated challenges is essential to fully realize the transformative potential of AI in healthcare. Future efforts should focus on enhancing the reliability, transparency, and ethical standards of AI technologies to ensure they contribute positively to global health outcomes.
Affiliation(s)
- Zifang Shang
- Guangdong Engineering Technological Research Centre of Clinical Molecular Diagnosis and Antibody Drugs, Meizhou People’s Hospital (Huangtang Hospital), Meizhou Academy of Medical Sciences, Meizhou, People’s Republic of China
- Varun Chauhan
- Multi-Disciplinary Research Unit, Government Institute of Medical Sciences, Greater Noida, India
- Kirti Devi
- Department of Medicine, Government Institute of Medical Sciences, Greater Noida, India
- Sandip Patil
- Department of Haematology and Oncology, Shenzhen Children’s Hospital, Shenzhen, People’s Republic of China
3. Rachid Zaim S, Pebworth MP, McGrath I, Okada L, Weiss M, Reading J, Czartoski JL, Torgerson TR, McElrath MJ, Bumol TF, Skene PJ, Li XJ. MOCHA's advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts. Nat Commun 2024; 15:6828. [PMID: 39122670 PMCID: PMC11316085 DOI: 10.1038/s41467-024-50612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 07/13/2024] [Indexed: 08/12/2024] [Open Access]
Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is being increasingly used to study gene regulation. However, major analytical gaps limit its utility in studying gene regulatory programs in complex diseases. In response, MOCHA (Model-based single cell Open CHromatin Analysis) presents major advances over existing analysis tools, including: 1) improving identification of sample-specific open chromatin, 2) statistical modeling of technical drop-out with zero-inflated methods, 3) mitigation of false positives in single cell analysis, 4) identification of alternative transcription-starting-site regulation, and 5) modules for inferring temporal gene regulatory networks from longitudinal data. These advances, in addition to open chromatin analyses, provide a robust framework after quality control and cell labeling to study gene regulatory programs in human disease. We benchmark MOCHA with four state-of-the-art tools to demonstrate its advances. We also construct cross-sectional and longitudinal gene regulatory networks, identifying potential mechanisms of COVID-19 response. MOCHA provides researchers with a robust analytical tool for functional genomic inference from scATAC-seq data.
Affiliation(s)
- Lauren Okada
- Allen Institute for Immunology, Seattle, WA, USA
- Morgan Weiss
- Allen Institute for Immunology, Seattle, WA, USA
- Julie L Czartoski
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- M Juliana McElrath
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Xiao-Jun Li
- Allen Institute for Immunology, Seattle, WA, USA.
4. Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
5. Ozturk K, Panwala R, Sheen J, Ford K, Jayne N, Portell A, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1. Cell Rep 2024; 43:114436. [PMID: 38968069 PMCID: PMC11345852 DOI: 10.1016/j.celrep.2024.114436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 07/07/2024] [Open Access]
Abstract
Single-gene missense mutations remain challenging to interpret. Here, we deploy scalable functional screening by sequencing (SEUSS), a Perturb-seq method, to generate mutations at protein interfaces of RUNX1 and quantify their effect on activities of downstream cellular programs. We evaluate single-cell RNA profiles of 115 mutations in myelogenous leukemia cells and categorize them into three functionally distinct groups, wild-type (WT)-like, loss-of-function (LoF)-like, and hypomorphic, that we validate in orthogonal assays. LoF-like variants dominate the DNA-binding site and are recurrent in cancer; however, recurrence alone does not predict functional impact. Hypomorphic variants share characteristics with LoF-like but favor protein interactions, promoting gene expression indicative of nerve growth factor (NGF) response and cytokine recruitment of neutrophils. Accessible DNA near differentially expressed genes frequently contains RUNX1-binding motifs. Finally, we reclassify 16 variants of uncertain significance and train a classifier to predict 103 more. Our work demonstrates the potential of targeting protein interactions to better define the landscape of phenotypes reachable by missense mutations.
Affiliation(s)
- Kivilcim Ozturk
- Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Rebecca Panwala
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Jeanna Sheen
- School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Kyle Ford
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Nathan Jayne
- School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Andrew Portell
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Dong-Er Zhang
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Stephan Hutter
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 Munich, Germany
- Torsten Haferlach
- MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 Munich, Germany
- Trey Ideker
- Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Prashant Mali
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
- Hannah Carter
- Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA.
6. Gong M, Yu Y, Wang Z, Zhang J, Wang X, Fu C, Zhang Y, Wang X. scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis. Comput Biol Med 2024; 171:108230. [PMID: 38442554 DOI: 10.1016/j.compbiomed.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding intercellular heterogeneity regulation. Despite the progress in computational methods for analyzing this data, there is still a lack of a comprehensive analytical framework and a user-friendly online analysis tool. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto), to overcome the challenge. Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of DNA sequence's grammar by being pre-trained on unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task of scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. To our knowledge, this work is expected to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
Affiliation(s)
- Meiqin Gong
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China
- Yun Yu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Zixuan Wang
- College of Electronics and Information Engineering, Sichuan University, Chengdu, 610065, China
- Junming Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Xiongyi Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Cheng Fu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Xiaodong Wang
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
7. Varshney A, Manickam N, Orchard P, Tovar A, Zhang Z, Feng F, Erdos MR, Narisu N, Ventresca C, Nishino K, Rai V, Stringham HM, Jackson AU, Tamsen T, Gao C, Yang M, Koues OI, Welch JD, Burant CF, Williams LK, Jenkinson C, DeFronzo RA, Norton L, Saramies J, Lakka TA, Laakso M, Tuomilehto J, Mohlke KL, Kitzman JO, Koistinen HA, Liu J, Boehnke M, Collins FS, Scott LJ, Parker SCJ. Population-scale skeletal muscle single-nucleus multi-omic profiling reveals extensive context specific genetic regulation. bioRxiv 2023:2023.12.15.571696. [PMID: 38168419 PMCID: PMC10760134 DOI: 10.1101/2023.12.15.571696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Skeletal muscle, the largest human organ by weight, is relevant to several polygenic metabolic traits and diseases including type 2 diabetes (T2D). Identifying genetic mechanisms underlying these traits requires pinpointing the relevant cell types, regulatory elements, target genes, and causal variants. Here, we used genetic multiplexing to generate population-scale single nucleus (sn) chromatin accessibility (snATAC-seq) and transcriptome (snRNA-seq) maps across 287 frozen human skeletal muscle biopsies representing 456,880 nuclei. We identified 13 cell types that collectively represented 983,155 ATAC summits. We integrated genetic variation to discover 6,866 expression quantitative trait loci (eQTL) and 100,928 chromatin accessibility QTL (caQTL) (5% FDR) across the five most abundant cell types, cataloging caQTL peaks that atlas-level snATAC maps often miss. We identified 1,973 eGenes colocalized with caQTL and used mediation analyses to construct causal directional maps for chromatin accessibility and gene expression. 3,378 genome-wide association study (GWAS) signals across 43 relevant traits colocalized with sn-e/caQTL, 52% in a cell-specific manner. 77% of GWAS signals colocalized with caQTL and not eQTL, highlighting the critical importance of population-scale chromatin profiling for GWAS functional studies. GWAS-caQTL colocalization showed distinct cell-specific regulatory paradigms. For example, a C2CD4A/B T2D GWAS signal colocalized with caQTL in muscle fibers and multiple chromatin loop models nominated VPS13C, a glucose uptake gene. Sequence of the caQTL peak overlapping caSNP rs7163757 showed allelic regulatory activity differences in a human myocyte cell line massively parallel reporter assay. These results illuminate the genetic regulatory architecture of human skeletal muscle at high-resolution epigenomic, transcriptomic, and cell state scales and serve as a template for population-scale multi-omic mapping in complex tissues and traits.
Affiliation(s)
- Arushi Varshney
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Nandini Manickam
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Peter Orchard
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Adelaide Tovar
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Zhenhao Zhang
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fan Feng
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Michael R Erdos
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Narisu Narisu
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Christa Ventresca
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Dept. of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Kirsten Nishino
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vivek Rai
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Heather M Stringham
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Anne U Jackson
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Tricia Tamsen
- Biomedical Research Core Facilities Advanced Genomics Core, University of Michigan, Ann Arbor, MI, USA
- Chao Gao
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Mao Yang
- Department of Internal Medicine, Center for Individualized and Genomic Medicine Research, Henry Ford Hospital, Detroit, MI, USA
- Olivia I Koues
- Biomedical Research Core Facilities Advanced Genomics Core, University of Michigan, Ann Arbor, MI, USA
- Joshua D Welch
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Charles F Burant
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- L Keoki Williams
- Department of Internal Medicine, Center for Individualized and Genomic Medicine Research, Henry Ford Hospital, Detroit, MI, USA
- Chris Jenkinson
- South Texas Diabetes and Obesity Research Institute, School of Medicine, University of Texas, Rio Grande Valley, TX, USA
- Ralph A DeFronzo
- Department of Medicine/Diabetes Division, University of Texas Health, San Antonio, TX, USA
- Luke Norton
- Department of Medicine/Diabetes Division, University of Texas Health, San Antonio, TX, USA
- Jouko Saramies
- Savitaipale Health Center, South Karelia Central Hospital, Lappeenranta, Finland
- Timo A Lakka
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Markku Laakso
- Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland
- Jaakko Tuomilehto
- Dept. of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Dept. of Public Health, University of Helsinki, Helsinki, Finland
- Diabetes Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
- Karen L Mohlke
- Dept. of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Jacob O Kitzman
- Dept. of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Heikki A Koistinen
- Dept. of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Department of Medicine, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Minerva Foundation Institute for Medical Research, Helsinki, Finland
- Jie Liu
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Michael Boehnke
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Francis S Collins
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Laura J Scott
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Stephen C J Parker
- Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Dept. of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
|
8
|
Berson E, Sreenivas A, Phongpreecha T, Perna A, Grandi FC, Xue L, Ravindra NG, Payrovnaziri N, Mataraso S, Kim Y, Espinosa C, Chang AL, Becker M, Montine KS, Fox EJ, Chang HY, Corces MR, Aghaeepour N, Montine TJ. Whole genome deconvolution unveils Alzheimer's resilient epigenetic signature. Nat Commun 2023; 14:4947. [PMID: 37587197 PMCID: PMC10432546 DOI: 10.1038/s41467-023-40611-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/02/2023] [Accepted: 08/03/2023] [Indexed: 08/18/2023] Open
Abstract
Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) accurately depicts the chromatin regulatory state and altered mechanisms guiding gene expression in disease. However, bulk sequencing entangles information from different cell types and obscures cellular heterogeneity. To address this, we developed Cellformer, a deep learning method that deconvolutes bulk ATAC-seq into cell type-specific expression across the whole genome. Cellformer enables cost-effective cell type-specific open chromatin profiling in large cohorts. Applied to 191 bulk samples from 3 brain regions, Cellformer identifies cell type-specific gene regulatory mechanisms involved in resilience to Alzheimer's disease, an uncommon group of cognitively healthy individuals that harbor a high pathological load of Alzheimer's disease. Cell type-resolved chromatin profiling unveils cell type-specific pathways and nominates potential epigenetic mediators underlying resilience that may illuminate therapeutic opportunities to limit the cognitive impact of the disease. Cellformer is freely available to facilitate future investigations using high-throughput bulk ATAC-seq data.
Affiliation(s)
- Eloise Berson
- Department of Pathology, Stanford University, Stanford, CA, USA.
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
| | - Anjali Sreenivas
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Thanaphong Phongpreecha
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Amalia Perna
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Fiorella C Grandi
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Lei Xue
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Neal G Ravindra
- Department of Pathology, Stanford University, Stanford, CA, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Neelufar Payrovnaziri
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Samson Mataraso
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Yeasul Kim
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Camilo Espinosa
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Alan L Chang
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | - Martin Becker
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
| | | | - Edward J Fox
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Howard Y Chang
- Center for Personal Dynamic Regulomes, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - M Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
|
9
|
Ozturk K, Panwala R, Sheen J, Ford K, Payne N, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS. bioRxiv 2023:2023.08.03.551876. [PMID: 37577681 PMCID: PMC10418284 DOI: 10.1101/2023.08.03.551876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional mutations (non-WT-like) clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss of function variants but had gene expression profiles indicative of response to neural growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
|
10
|
Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble W, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. bioRxiv 2023:2023.07.27.550836. [PMID: 37546906 PMCID: PMC10402156 DOI: 10.1101/2023.07.27.550836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
Affiliation(s)
- Vianne R. Gao
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
| | - Rui Yang
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
| | - Arnav Das
- University of Washington, Seattle, WA, USA
| | - Renhe Luo
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| | - Hanzhi Luo
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Dylan R. McNally
- Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
| | - Ioannis Karagiannidis
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Martin A. Rivas
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Zhong-Min Wang
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Darko Barisic
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Alireza Karbalayghareh
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Wilfred Wong
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
| | - Yingqian A. Zhan
- Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Christopher R. Chin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | | | | | - Effie Apostolou
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Michael G. Kharas
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Wendy Béguelin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Aaron D. Viny
- Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
| | - Danwei Huangfu
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| | - Alexander Y. Rudensky
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Ari M. Melnick
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Christina S. Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
|
11
|
Osheter T, Campisi Pinto S, Randieri C, Perrotta A, Linder C, Weisman Z. Semi-Autonomic AI LF-NMR Sensor for Industrial Prediction of Edible Oil Oxidation Status. Sensors (Basel) 2023; 23:2125. [PMID: 36850723 PMCID: PMC9962559 DOI: 10.3390/s23042125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Revised: 02/09/2023] [Accepted: 02/12/2023] [Indexed: 06/18/2023]
Abstract
The evaluation of an oil's oxidation status during industrial production is highly important for monitoring the oil's purity and nutritional value during production, transportation, storage, and cooking. The oil and food industry is seeking a real-time, non-destructive, rapid, robust, and low-cost sensor for nutritional oil's material characterization. Towards this goal, a ¹H LF-NMR relaxation sensor application based on the chemical and structural profiling of non-oxidized and oxidized oils was developed. This study dealt with a relatively large-scale oil oxidation database, which included crude data of a ¹H LF-NMR relaxation curve, and its reconstruction into T1 and T2 spectral fingerprints, self-diffusion coefficient D, and conventional standard chemical test results. This study used a convolutional neural network (CNN) that was trained to classify T2 relaxation curves into three ordinal classes representing three different oil oxidation levels (non-oxidized, partial oxidation, and high level of oxidation). Supervised learning was used on the T2 signals paired with the ground-truth labels of oxidation values as per conventional chemical lab oxidation tests. The test data results (not used for training) show a high classification accuracy (95%). The proposed AI method integrates a large training set, an LF-NMR sensor, and a machine learning program that meets the requirements of the oil and food industry and can be further developed for other applications.
Affiliation(s)
- Tatiana Osheter
- Phyto-Lipid Biotech Lab (PLBL), Department of Biotechnology Engineering, Ben Gurion University of the Negev, Beer Sheva 8499000, Israel
| | - Salvatore Campisi Pinto
- Phyto-Lipid Biotech Lab (PLBL), Department of Biotechnology Engineering, Ben Gurion University of the Negev, Beer Sheva 8499000, Israel
| | | | - Andrea Perrotta
- eCampus University, Via Isimbardi, 10, 22060 Novedrate, Italy
| | - Charles Linder
- Phyto-Lipid Biotech Lab (PLBL), Department of Biotechnology Engineering, Ben Gurion University of the Negev, Beer Sheva 8499000, Israel
| | - Zeev Weisman
- Phyto-Lipid Biotech Lab (PLBL), Department of Biotechnology Engineering, Ben Gurion University of the Negev, Beer Sheva 8499000, Israel
|
12
|
Liu D, Zinski A, Mishra A, Noh H, Park GH, Qin Y, Olorife O, Park JM, Abani CP, Park JS, Fung J, Sawaqed F, Coyle JT, Stahl E, Bendl J, Fullard JF, Roussos P, Zhang X, Stanton PK, Yin C, Huang W, Kim HY, Won H, Cho JH, Chung S. Impact of schizophrenia GWAS loci converge onto distinct pathways in cortical interneurons vs glutamatergic neurons during development. Mol Psychiatry 2022; 27:4218-4233. [PMID: 35701597 DOI: 10.1038/s41380-022-01654-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/26/2022] [Revised: 05/24/2022] [Accepted: 05/31/2022] [Indexed: 02/07/2023]
Abstract
Remarkable advances have been made in schizophrenia (SCZ) GWAS, but gleaning biological insight from these loci is challenging. Genetic influences on gene expression (e.g., eQTLs) are cell type-specific, but most studies that attempt to clarify GWAS loci's influence on gene expression have employed tissues with mixed cell compositions that can obscure cell-specific effects. Furthermore, enriched SCZ heritability in the fetal brain underscores the need to study the impact of SCZ risk loci in specific developing neurons. MGE-derived cortical interneurons (cINs) are consistently affected in SCZ brains and show enriched SCZ heritability in human fetal brains. We identified SCZ GWAS risk genes that are dysregulated in iPSC-derived homogeneous populations of developing SCZ cINs. These SCZ GWAS loci differential expression (DE) genes converge on the PKC pathway. Their disruption results in PKC hyperactivity in developing cINs, leading to arborization deficits. We show that the fine-mapped GWAS locus in the ATP2A2 gene of the PKC pathway harbors enhancer marks by ATACseq and ChIPseq, and regulates ATP2A2 expression. We also generated developing glutamatergic neurons (GNs), another population with enriched SCZ heritability, and confirmed their functionality after transplantation into the mouse brain. Then, we identified SCZ GWAS risk genes that are dysregulated in developing SCZ GNs. GN-specific SCZ GWAS loci DE genes converge on the ion transporter pathway, distinct from those for cINs. Disruption of the pathway gene CACNA1D resulted in deficits of Ca2+ currents in developing GNs, suggesting compromised neuronal function by GWAS loci pathway deficits during development. This study allows us to identify cell type-specific and developmental stage-specific mechanisms of SCZ risk gene function, and may aid in identifying mechanism-based novel therapeutic targets.
Affiliation(s)
- Dongxin Liu
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA.
- Department of Developmental Cell Biology, Key Laboratory of Cell Biology, Ministry of Public Health, and Key Laboratory of Medical Cell Biology, Ministry of Education, China Medical University, Shenyang, China.
| | - Amy Zinski
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Akanksha Mishra
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Haneul Noh
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
- Department of Psychiatry, McLean Hospital/Harvard Medical School, Belmont, MA, 02478, USA
| | - Gun-Hoo Park
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Yiren Qin
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Oshoname Olorife
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - James M Park
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Chiderah P Abani
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Joy S Park
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Janice Fung
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Farah Sawaqed
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Joseph T Coyle
- Department of Psychiatry, McLean Hospital/Harvard Medical School, Belmont, MA, 02478, USA
| | - Eli Stahl
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
| | - Jaroslav Bendl
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
| | - John F Fullard
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
| | - Panos Roussos
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, New York, NY, 10029, USA
- Mental Illness Research Education and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, 10468, USA
| | - Xiaolei Zhang
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Patric K Stanton
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA
| | - Changhong Yin
- Department of Pathology, New York Medical College, Valhalla, NY, 10595, USA
| | - Weihua Huang
- Department of Pathology, New York Medical College, Valhalla, NY, 10595, USA
| | - Hae-Young Kim
- Department of Public Health, New York Medical College, Valhalla, NY, USA
| | - Hyejung Won
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Jun-Hyeong Cho
- Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA, 92521, USA
| | - Sangmi Chung
- Department of Cell biology and Anatomy, New York Medical College, Valhalla, NY, 10595, USA.
- Department of Psychiatry, McLean Hospital/Harvard Medical School, Belmont, MA, 02478, USA.
|
13
|
Shi P, Nie Y, Yang J, Zhang W, Tang Z, Xu J. Fundamental and practical approaches for single-cell ATAC-seq analysis. aBIOTECH 2022; 3:212-223. [PMID: 36313930 PMCID: PMC9590475 DOI: 10.1007/s42994-022-00082-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/06/2022] [Accepted: 09/07/2022] [Indexed: 11/28/2022]
Abstract
Assays for transposase-accessible chromatin through high-throughput sequencing (ATAC-seq) are effective tools in the study of genome-wide chromatin accessibility landscapes. With the rapid development of single-cell technology, open chromatin regions that play essential roles in epigenetic regulation have been measured at the single-cell level using single-cell ATAC-seq approaches. The application of scATAC-seq has become as popular as that of scRNA-seq. However, owing to the nature of scATAC-seq data, which are sparse and noisy, processing the data requires different methodologies and empirical experience. This review presents a practical guide for processing scATAC-seq data, from quality evaluation to downstream analysis, for various applications. In addition to the epigenomic profiling from scATAC-seq, we also discuss recent studies in which the function of non-coding variants has been investigated based on cell type-specific cis-regulatory elements and how to use the by-product genetic information obtained from scATAC-seq to infer single-cell copy number variants and trace cell lineage. We anticipate that this review will assist researchers in designing and implementing scATAC-seq assays to facilitate research in diverse fields.
Affiliation(s)
- Peiyu Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
| | - Yage Nie
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510275 China
| | - Jiawen Yang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
| | - Weixing Zhang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
| | - Zhongjie Tang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
| | - Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275 China
|
14
|
scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat Methods 2022; 19:1088-1096. [PMID: 35941239 DOI: 10.1038/s41592-022-01562-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 09/08/2021] [Accepted: 06/27/2022] [Indexed: 12/25/2022]
Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC) shows great promise for studying cellular heterogeneity in epigenetic landscapes, but there remain important challenges in the analysis of scATAC data due to the inherent high dimensionality and sparsity. Here we introduce scBasset, a sequence-based convolutional neural network method to model scATAC data. We show that by leveraging the DNA sequence information underlying accessibility peaks and the expressiveness of a neural network model, scBasset achieves state-of-the-art performance across a variety of tasks on scATAC and single-cell multiome datasets, including cell clustering, scATAC profile denoising, data integration across assays and transcription factor activity inference.
|
15
|
Encoding and decoding NF-κB nuclear dynamics. Curr Opin Cell Biol 2022; 77:102103. [DOI: 10.1016/j.ceb.2022.102103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 03/16/2022] [Accepted: 04/24/2022] [Indexed: 11/22/2022]
|
16
|
Shannon J, Sundaresan A, Bukulmez O, Jiao Z, Doody K, Capelouto S, Carr B, Banaszynski LA. Chromatin Accessibility Analysis from Fresh and Cryopreserved Human Ovarian Follicles. Mol Hum Reprod 2022; 28:gaac020. [PMID: 35674368 DOI: 10.1093/molehr/gaac020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Revised: 05/06/2022] [Indexed: 11/14/2022] Open
Abstract
Understanding how gene regulatory elements influence ovarian follicle development has important implications in clinically relevant settings. This includes understanding decreased fertility with age and understanding the short-lived graft function commonly observed after ovarian tissue cryopreservation and subsequent autologous transplantation as a fertility preservation treatment. The Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) is a powerful tool to identify distal and proximal regulatory elements important for activity-dependent gene regulation and hormonal and environmental responses such as those involved in germ cell maturation and human fertility. Original ATAC protocols were optimized for fresh cells, a major barrier to implementing this technique for clinical tissue samples which are more often than not frozen and stored. While recent advances have improved data obtained from stored samples, this technique has yet to be applied to human ovarian follicles, perhaps due to the difficulty in isolating follicles in sufficient quantities from stored clinical samples. Further, it remains unknown whether the process of cryopreservation affects the quality of the data obtained from ovarian follicles. Here, we generate ATAC-seq data sets from matched fresh and cryopreserved human ovarian follicles. We find that data obtained from cryopreserved samples are of reduced quality but consistent with data obtained from fresh samples, suggesting that the act of cryopreservation does not significantly affect biological interpretation of chromatin accessibility data. Our study encourages the use of this method to uncover the role of chromatin regulation in a number of clinical settings with the ultimate goal of improving fertility.
Affiliation(s)
- Jennifer Shannon
  - Department of Obstetrics and Gynecology: Division of Reproductive Endocrinology and Infertility, UT Southwestern Medical Center, Dallas, TX, 75390, USA
  - Cecil H. and Ida Green Center for Reproductive Biology Sciences, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Aishwarya Sundaresan
  - Cecil H. and Ida Green Center for Reproductive Biology Sciences, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Orhan Bukulmez
  - Department of Obstetrics and Gynecology: Division of Reproductive Endocrinology and Infertility, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Zexu Jiao
  - Department of Obstetrics and Gynecology: Division of Reproductive Endocrinology and Infertility, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Kaitlin Doody
  - Department of Obstetrics and Gynecology: Division of Reproductive Endocrinology and Infertility, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Sarah Capelouto
  - Department of Obstetrics and Gynecology: Division of Reproductive Endocrinology and Infertility, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Bruce Carr
  - Department of Obstetrics and Gynecology: Division of Reproductive Endocrinology and Infertility, UT Southwestern Medical Center, Dallas, TX, 75390, USA
  - Cecil H. and Ida Green Center for Reproductive Biology Sciences, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Laura A Banaszynski
  - Cecil H. and Ida Green Center for Reproductive Biology Sciences, UT Southwestern Medical Center, Dallas, TX, 75390, USA
|
17
|
Deep learning modeling m6A deposition reveals the importance of downstream cis-element sequences. Nat Commun 2022; 13:2720. [PMID: 35581216 PMCID: PMC9114009 DOI: 10.1038/s41467-022-30209-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 04/06/2022] [Indexed: 11/08/2022] Open
Abstract
The N6-methyladenosine (m6A) modification is deposited to nascent transcripts on chromatin, but its site-specificity mechanism is mostly unknown. Here we model the m6A deposition to pre-mRNA by iM6A (intelligent m6A), a deep learning method, demonstrating that the site-specific m6A methylation is primarily determined by the flanking nucleotide sequences. iM6A accurately models the m6A deposition (AUROC = 0.99) and, surprisingly, uncovers that the cis-elements regulating the m6A deposition preferentially reside within the 50 nt downstream of the m6A sites. The m6A enhancers mostly include part of the RRACH motif and the m6A silencers generally contain CG/GT/CT motifs. Our finding is supported by both independent experimental validations and evolutionary conservation. Moreover, our work provides evidence that mutations resulting in synonymous codons can affect the m6A deposition and that the TGA stop codon favors m6A deposition nearby. Our iM6A deep learning modeling enables fast-paced biological discovery that would be cost-prohibitive and impractical with traditional experimental approaches, and uncovers a key cis-regulatory mechanism for m6A site-specific deposition.
|
18
|
LaFave LM, Savage RE, Buenrostro JD. Single-Cell Epigenomics Reveals Mechanisms of Cancer Progression. Annual Review of Cancer Biology 2022. [DOI: 10.1146/annurev-cancerbio-070620-094453] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
Cancer initiation is driven by the cooperation between genetic and epigenetic aberrations that disrupt gene regulatory programs critical to maintaining specialized cellular functions. After initiation, cells acquire additional genetic and epigenetic alterations influenced by tumor-intrinsic and -extrinsic mechanisms, which increase intratumoral heterogeneity, reshape the cell's underlying gene regulatory networks and promote cancer evolution. Furthermore, environmental or therapeutic insults drive the selection of heterogeneous cell states, with implications for cancer initiation, maintenance, and drug resistance. The advancement of single-cell genomics has begun to uncover the full repertoire of chromatin and gene expression states (cell states) that exist within individual tumors. These single-cell analyses suggest that cells diversify in their regulatory states upon transformation by co-opting damage-induced and nonlineage regulatory programs that can lead to epigenomic plasticity. Here, we review these recent studies related to regulatory state changes in cancer progression and highlight the growing single-cell epigenomics toolkit poised to address unresolved questions in the field.
Affiliation(s)
- Lindsay M. LaFave
  - Department of Cell Biology and Albert Einstein Cancer Center, Albert Einstein College of Medicine, Montefiore Health System, Bronx, NY, USA
- Rachel E. Savage
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Jason D. Buenrostro
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
|
19
|
Ashuach T, Reidenbach DA, Gayoso A, Yosef N. PeakVI: A deep generative model for single-cell chromatin accessibility analysis. Cell Reports Methods 2022; 2:100182. [PMID: 35475224 PMCID: PMC9017241 DOI: 10.1016/j.crmeth.2022.100182] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/18/2021] [Revised: 01/08/2022] [Accepted: 02/23/2022] [Indexed: 12/20/2022]
Abstract
Single-cell ATAC sequencing (scATAC-seq) is a powerful and increasingly popular technique to explore the regulatory landscape of heterogeneous cellular populations. However, the high noise levels, degree of sparsity, and scale of the generated data make its analysis challenging. Here, we present PeakVI, a probabilistic framework that leverages deep neural networks to analyze scATAC-seq data. PeakVI fits an informative latent space that preserves biological heterogeneity while correcting batch effects and accounting for technical effects, such as library size and region-specific biases. In addition, PeakVI provides a technique for identifying differential accessibility at a single-region resolution, which can be used for cell-type annotation as well as identification of key cis-regulatory elements. We use public datasets to demonstrate that PeakVI is scalable, stable, robust to low-quality data, and outperforms current analysis methods on a range of critical analysis tasks. PeakVI is publicly available and implemented in the scvi-tools framework.
Affiliation(s)
- Tal Ashuach
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Daniel A. Reidenbach
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Adam Gayoso
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Nir Yosef
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
  - Chan Zuckerberg BioHub, San Francisco, CA, USA
|
20
|
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse sc-RNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. 
The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Chen Ming
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Qian Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Xianxiao Zhou
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Peng Xu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Azra Krek
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Yonejung Yoon
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Lap Ho
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Miranda E. Orr
  - Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
  - Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Guo-Cheng Yuan
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
|
21
|
Xu Y, Liu X, Cao X, Huang C, Liu E, Qian S, Liu X, Wu Y, Dong F, Qiu CW, Qiu J, Hua K, Su W, Wu J, Xu H, Han Y, Fu C, Yin Z, Liu M, Roepman R, Dietmann S, Virta M, Kengara F, Zhang Z, Zhang L, Zhao T, Dai J, Yang J, Lan L, Luo M, Liu Z, An T, Zhang B, He X, Cong S, Liu X, Zhang W, Lewis JP, Tiedje JM, Wang Q, An Z, Wang F, Zhang L, Huang T, Lu C, Cai Z, Wang F, Zhang J. Artificial intelligence: A powerful paradigm for scientific research. Innovation (N Y) 2021; 2:100179. [PMID: 34877560 PMCID: PMC8633405 DOI: 10.1016/j.xinn.2021.100179] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 08/29/2021] [Accepted: 10/26/2021] [Indexed: 12/18/2022] Open
Abstract
Artificial intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields including science and technology, industry, and even our day-to-day life. The ML techniques have been developed to analyze high-throughput data with a view to obtaining useful insights, categorizing, predicting, and making evidence-based decisions in novel ways, which will promote the growth of novel applications and fuel the sustainable booming of AI. This paper undertakes a comprehensive survey on the development and application of AI in different aspects of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics, and chemistry. The challenges that each discipline of science meets, and the potentials of AI techniques to handle these challenges, are discussed in detail. Moreover, we shed light on new research trends entailing the integration of AI into each scientific discipline. The aim of this paper is to provide a broad research guideline on fundamental sciences with potential infusion of AI, to help motivate researchers to deeply understand the state-of-the-art applications of AI-based fundamental sciences, and thereby to help promote the continuous development of these fundamental sciences.
Affiliation(s)
- Yongjun Xu
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xin Liu
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xin Cao
  - Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai 200032, China
- Changping Huang
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Enke Liu
  - Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
  - Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
- Sen Qian
  - Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Xingchen Liu
  - Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
- Yanjun Wu
  - Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Fengliang Dong
  - National Center for Nanoscience and Technology, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Cheng-Wei Qiu
  - Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore
- Junjun Qiu
  - Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China
  - Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China
- Keqin Hua
  - Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China
  - Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China
- Wentao Su
  - School of Food Science and Technology, Dalian Polytechnic University, Dalian 116034, China
- Jian Wu
  - Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310058, China
- Huiyu Xu
  - Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100191, China
- Yong Han
  - Zhejiang Provincial People’s Hospital, Hangzhou 310014, China
- Chenguang Fu
  - School of Materials Science and Engineering, Zhejiang University, Hangzhou 310027, China
- Zhigang Yin
  - Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China
- Miao Liu
  - Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
  - Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
- Ronald Roepman
  - Medical Center, Radboud University, 6500 Nijmegen, the Netherlands
- Sabine Dietmann
  - Institute for Informatics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Marko Virta
  - Department of Microbiology, University of Helsinki, 00014 Helsinki, Finland
- Fredrick Kengara
  - School of Pure and Applied Sciences, Bomet University College, Bomet 20400, Kenya
- Ze Zhang
  - Agriculture College of Shihezi University, Xinjiang 832000, China
- Lifu Zhang
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
  - Agriculture College of Shihezi University, Xinjiang 832000, China
- Taolan Zhao
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Ji Dai
  - The Brain Cognition and Brain Disease Institute, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen 518055, China
- Liang Lan
  - Department of Communication Studies, Hong Kong Baptist University, Hong Kong, China
- Ming Luo
  - South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510650, China
- Zhaofeng Liu
  - Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Tao An
  - Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China
- Bin Zhang
  - Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
- Xiao He
  - Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
- Shan Cong
  - Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou 215123, China
- Xiaohong Liu
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- Wei Zhang
  - Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
- James P. Lewis
  - Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China
- James M. Tiedje
  - Center for Microbial Ecology, Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Qi Wang
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Zhejiang Lab, Hangzhou 311121, China
- Zhulin An
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Fei Wang
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Libo Zhang
  - Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
- Chuan Lu
  - Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion SY23 3FL, UK
- Zhipeng Cai
  - Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
- Fang Wang
  - Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jiabao Zhang
  - Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
|
22
|
Blanco MA, Sykes DB, Gu L, Wu M, Petroni R, Karnik R, Wawer M, Rico J, Li H, Jacobus WD, Jambhekar A, Cheloufi S, Meissner A, Hochedlinger K, Scadden DT, Shi Y. Chromatin-state barriers enforce an irreversible mammalian cell fate decision. Cell Rep 2021; 37:109967. [PMID: 34758323 DOI: 10.1016/j.celrep.2021.109967] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/30/2020] [Revised: 05/12/2021] [Accepted: 10/19/2021] [Indexed: 12/13/2022] Open
Abstract
Stem and progenitor cells have the capacity to balance self-renewal and differentiation. Hematopoietic myeloid progenitors replenish more than 25 billion terminally differentiated neutrophils every day under homeostatic conditions and can increase this output in response to stress or infection. At what point along the spectrum of maturation do progenitors lose capacity for self-renewal and become irreversibly committed to differentiation? Using a system of conditional myeloid development that can be toggled between self-renewal and differentiation, we interrogate determinants of this "point of no return" in differentiation commitment. Irreversible commitment is due primarily to loss of open regulatory site access and disruption of a positive feedback transcription factor activation loop. Restoration of the transcription factor feedback loop extends the window of cell plasticity and alters the point of no return. These findings demonstrate how the chromatin state enforces and perpetuates cell fate and identify potential avenues for manipulating cell identity.
Affiliation(s)
- M Andrés Blanco
  - Department of Biomedical Sciences, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA; Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- David B Sykes
  - Center for Regenerative Medicine, Massachusetts General Hospital, Boston, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA
- Lei Gu
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA; Cardiopulmonary Institute (CPI), Bad Nauheim, Germany; Epigenetics Laboratory, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Mengjun Wu
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Ricardo Petroni
  - Department of Biomedical Sciences, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Rahul Karnik
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Mathias Wawer
  - Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Joshua Rico
  - Department of Biomedical Sciences, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Haitao Li
  - Department of Biomedical Sciences, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA
- William D Jacobus
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Ashwini Jambhekar
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Sihem Cheloufi
  - Department of Biochemistry, Stem Cell Center, University of California, Riverside, Riverside, CA, USA
- Alexander Meissner
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA; Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Konrad Hochedlinger
  - Center for Regenerative Medicine, Massachusetts General Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA; Department of Molecular Biology and Cancer Center, Massachusetts General Hospital, Boston, MA, USA
- David T Scadden
  - Center for Regenerative Medicine, Massachusetts General Hospital, Boston, MA, USA; Harvard Stem Cell Institute, Cambridge, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Yang Shi
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA; Ludwig Institute for Cancer Research, Oxford Branch, Oxford University, Oxford, UK
|
23
|
Li Z, Kuppe C, Ziegler S, Cheng M, Kabgani N, Menzel S, Zenke M, Kramann R, Costa IG. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat Commun 2021; 12:6386. [PMID: 34737275 PMCID: PMC8568974 DOI: 10.1038/s41467-021-26530-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 05/17/2021] [Accepted: 10/04/2021] [Indexed: 12/18/2022] Open
Abstract
A major drawback of single-cell ATAC-seq (scATAC-seq) is its sparsity, i.e., open chromatin regions with no reads due to loss of DNA material during the scATAC-seq protocol. Here, we propose scOpen, a computational method based on regularized non-negative matrix factorization for imputing and quantifying the open chromatin status of regulatory regions from sparse scATAC-seq experiments. We show that scOpen improves crucial downstream analysis steps of scATAC-seq data, such as clustering, visualization, cis-regulatory DNA interactions, and delineation of regulatory features. We demonstrate the power of scOpen to dissect regulatory changes in the development of fibrosis in the kidney. This identifies a role for Runx1 and its target genes in promoting fibroblast-to-myofibroblast differentiation, which drives kidney fibrosis.
Affiliation(s)
- Zhijian Li
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Christoph Kuppe
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
  - Division of Nephrology and Clinical Immunology, RWTH Aachen University, 52074, Aachen, Germany
- Susanne Ziegler
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Mingbo Cheng
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Nazanin Kabgani
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Sylvia Menzel
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Martin Zenke
  - Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, 52074, Aachen, Germany
  - Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
- Rafael Kramann
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University Medical School, 52074, Aachen, Germany
  - Division of Nephrology and Clinical Immunology, RWTH Aachen University, 52074, Aachen, Germany
  - Department of Internal Medicine, Nephrology and Transplantation, Erasmus Medical Center, 3015GD, Rotterdam, The Netherlands
- Ivan G Costa
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074, Aachen, Germany
|
24
|
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overtake astronomy as the paradigmatic producer of big data. This has been made possible by huge advancements in the field of high-throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist of studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
- CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
- CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
- CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
|
25
|
Dibaeinia P, Sinha S. Deciphering enhancer sequence using thermodynamics-based models and convolutional neural networks. Nucleic Acids Res 2021; 49:10309-10327. [PMID: 34508359 PMCID: PMC8501998 DOI: 10.1093/nar/gkab765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Revised: 08/18/2021] [Accepted: 08/25/2021] [Indexed: 11/18/2022]
Abstract
Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer 'grammar' in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, such as different cell types and experimental conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference. CoNSEPT is freely available at: https://github.com/PayamDiba/CoNSEPT.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|