1. Tian Y, Wu X, Luo S, Xiong D, Liu R, Hu L, Yuan Y, Shi G, Yao J, Huang Z, Fu F, Yang X, Tang Z, Zhang J, Hu K. A multi-omic single-cell landscape of cellular diversification in the developing human cerebral cortex. Comput Struct Biotechnol J 2024; 23:2173-2189. [PMID: 38827229; PMCID: PMC11141146; DOI: 10.1016/j.csbj.2024.05.019] [Received: 02/20/2024] [Revised: 05/09/2024] [Accepted: 05/13/2024] [Indexed: 06/04/2024] Open
Abstract
The vast neuronal diversity in the human neocortex is vital for high-order brain functions, necessitating elucidation of the regulatory mechanisms underlying such unparalleled diversity. However, recent studies have yet to comprehensively reveal the diversity of neurons and the molecular logic of neocortical origin in humans at single-cell resolution through profiling transcriptomic or epigenomic landscapes, owing to the application of unimodal data alone to depict exceedingly heterogeneous populations of neurons. In this study, we generated a comprehensive compendium of the developing human neocortex by simultaneously profiling gene expression and open chromatin from the same cell. We computationally reconstructed the differentiation trajectories of excitatory projection neurons of cortical origin and inferred the regulatory logic governing lineage bifurcation decisions for neuronal diversification. We demonstrated that neuronal diversity arises from progenitor cell lineage specificity and postmitotic differentiation at distinct stages. Our data paves the way for understanding the primarily coordinated regulatory logic for neuronal diversification in the neocortex.
Affiliation(s)
- Yuhan Tian
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Xia Wu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Songhao Luo
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
- Dan Xiong
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Rong Liu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Lanqi Hu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Yuchen Yuan
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Guowei Shi
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Junjie Yao
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Zhiwei Huang
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
- Fang Fu
  - Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou 511436, China
- Xin Yang
  - Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou 511436, China
- Zhonghui Tang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
- Jiajun Zhang
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China
- Kunhua Hu
  - Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
  - Public Platform Laboratory, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou 510630, China
2. Zhu T, Xia C, Yu R, Zhou X, Xu X, Wang L, Zong Z, Yang J, Liu Y, Ming L, You Y, Chen D, Xie W. Comprehensive mapping and modelling of the rice regulome landscape unveils the regulatory architecture underlying complex traits. Nat Commun 2024; 15:6562. [PMID: 39095348; PMCID: PMC11297339; DOI: 10.1038/s41467-024-50787-y] [Received: 09/08/2023] [Accepted: 07/19/2024] [Indexed: 08/04/2024] Open
Abstract
Unraveling the regulatory mechanisms that govern complex traits is pivotal for advancing crop improvement. Here we present a comprehensive regulome atlas for rice (Oryza sativa), charting the chromatin accessibility across 23 distinct tissues from three representative varieties. Our study uncovers 117,176 unique open chromatin regions (OCRs), accounting for ~15% of the rice genome, a notably higher proportion than previously reported in plants. Integrating RNA-seq data from matched tissues, we confidently predict 59,075 OCR-to-gene links, with enhancers constituting 69.54% of these associations, including many known enhancer-to-gene links. Leveraging this resource, we re-evaluate genome-wide association study results and discover a previously unknown function of OsbZIP06 in seed germination, which we subsequently confirm through experimental validation. We optimize deep learning models to decode regulatory grammar, achieving robust modeling of tissue-specific chromatin accessibility. This approach allows us to predict cross-variety regulatory dynamics from genomic sequences, shedding light on the genetic underpinnings of cis-regulatory divergence and morphological disparities between varieties. Overall, our study establishes a foundational resource for rice functional genomics and precision molecular breeding, providing valuable insights into regulatory mechanisms governing complex traits.
Affiliation(s)
- Tao Zhu
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
  - Chemistry and Biomedicine Innovation Center, Nanjing University, Nanjing, 210023, China
- Chunjiao Xia
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Ranran Yu
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Xinkai Zhou
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Xingbing Xu
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Lin Wang
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Zhanxiang Zong
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Junjiao Yang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Yinmeng Liu
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Luchang Ming
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Yuxin You
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Dijun Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Gastroenterology, Nanjing Drum Tower Hospital, National Resource Center for Mutant Mice, School of Life Sciences, Nanjing University, Nanjing, 210023, China
  - Chemistry and Biomedicine Innovation Center, Nanjing University, Nanjing, 210023, China
- Weibo Xie
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
  - Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, 430070, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
3. Kathail P, Shuai RW, Chung R, Ye CJ, Loeb GB, Ioannidis NM. Current genomic deep learning models display decreased performance in cell type-specific accessible regions. Genome Biol 2024; 25:202. [PMID: 39090688; PMCID: PMC11293111; DOI: 10.1186/s13059-024-03335-2] [Received: 09/15/2023] [Accepted: 07/10/2024] [Indexed: 08/04/2024] Open
Abstract
BACKGROUND: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. RESULTS: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax (through single-task learning or high capacity multi-task models) can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. CONCLUSIONS: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. We also identify strategies to maximize performance in cell type-specific accessible regions.
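The kind of stratified evaluation described in this abstract can be illustrated with a minimal sketch. It assumes each region is annotated with the number of cell types in which it is accessible, and it uses Pearson correlation between observed and predicted accessibility as the per-stratum score. The function names, the stratification variable, the metric, and the toy numbers are all illustrative assumptions, not the authors' pipeline or results.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def stratified_pearson(regions):
    """Score predictions separately for each level of cell type specificity.

    regions: iterable of (n_cell_types_accessible, observed, predicted).
    Returns {n_cell_types_accessible: Pearson r within that stratum}.
    """
    groups = defaultdict(lambda: ([], []))
    for n_types, observed, predicted in regions:
        groups[n_types][0].append(observed)
        groups[n_types][1].append(predicted)
    return {k: pearson(obs, pred) for k, (obs, pred) in groups.items()}


# Toy values only (hypothetical, not data from the paper):
# regions accessible in 1 cell type vs. regions accessible in 5 cell types.
toy_regions = [
    (1, 0.2, 0.9), (1, 0.8, 0.5), (1, 0.5, 0.6),
    (5, 0.1, 0.1), (5, 0.5, 0.6), (5, 0.9, 0.8),
]
scores = stratified_pearson(toy_regions)
```

Reporting one score per stratum, rather than a single genome-wide number, is what makes a drop in cell type-specific regions visible.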
Affiliation(s)
- Pooja Kathail
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Richard W Shuai
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Ryan Chung
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Chun Jimmie Ye
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
  - Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gabriel B Loeb
  - Division of Nephrology, Department of Medicine, University of California, San Francisco, CA, USA
  - Cardiovascular Research Institute, University of California, San Francisco, CA, USA
- Nilah M Ioannidis
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
4. Kathail P, Shuai RW, Chung R, Ye CJ, Loeb GB, Ioannidis NM. Current genomic deep learning models display decreased performance in cell type specific accessible regions. bioRxiv 2024:2024.07.05.602265. [PMID: 39026761; PMCID: PMC11257480; DOI: 10.1101/2024.07.05.602265] [Indexed: 07/20/2024]
Abstract
Background: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type specific CREs contain a large proportion of complex disease heritability. Results: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks), and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models (Enformer and Sei), varies across the genome and is reduced in cell type specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type specific regulatory syntax (through single-task learning or high capacity multi-task models) can improve performance in cell type specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. Conclusions: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type specific accessible regions. We also identify strategies to maximize performance in cell type specific accessible regions.
Affiliation(s)
- Pooja Kathail
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Richard W. Shuai
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Ryan Chung
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Chun Jimmie Ye
  - Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gabriel B. Loeb
  - Division of Nephrology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
  - Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, USA
- Nilah M. Ioannidis
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
5. Yin C, Hair SC, Byeon GW, Bromley P, Meuleman W, Seelig G. Iterative deep learning-design of human enhancers exploits condensed sequence grammar to achieve cell type-specificity. bioRxiv 2024:2024.06.14.599076. [PMID: 38915713; PMCID: PMC11195158; DOI: 10.1101/2024.06.14.599076] [Indexed: 06/26/2024]
Abstract
An important and largely unsolved problem in synthetic biology is how to target gene expression to specific cell types. Here, we apply iterative deep learning to design synthetic enhancers with strong differential activity between two human cell lines. We initially train models on published datasets of enhancer activity and chromatin accessibility and use them to guide the design of synthetic enhancers that maximize predicted specificity. We experimentally validate these sequences, use the measurements to re-optimize the predictor, and design a second generation of enhancers with improved specificity. Our design methods embed relevant transcription factor binding site (TFBS) motifs with higher frequencies than comparable endogenous enhancers while using a more selective motif vocabulary, and we show that enhancer activity is correlated with transcription factor expression at the single cell level. Finally, we characterize causal features of top enhancers via perturbation experiments and show enhancers as short as 50bp can maintain specificity.
Affiliation(s)
- Christopher Yin
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Gun Woo Byeon
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
- Peter Bromley
  - Altius Institute for Biomedical Sciences, Seattle, WA
- Wouter Meuleman
  - Altius Institute for Biomedical Sciences, Seattle, WA
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Georg Seelig
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
6. Shi Q, Song F, Zhou X, Chen X, Cao J, Na J, Fan Y, Zhang G, Zheng L. Early Predicting Osteogenic Differentiation of Mesenchymal Stem Cells Based on Deep Learning Within One Day. Ann Biomed Eng 2024; 52:1706-1718. [PMID: 38488988; DOI: 10.1007/s10439-024-03483-3] [Received: 11/03/2023] [Accepted: 02/24/2024] [Indexed: 03/17/2024]
Abstract
Osteogenic differentiation of mesenchymal stem cells (MSCs) is proposed to be critical for bone tissue engineering and regenerative medicine. However, the current approach for evaluating osteogenic differentiation mainly involves immunohistochemical staining of specific markers, which often can be detected only at day 5-7 of osteogenic induction. Deep learning (DL) is a significant technology for realizing artificial intelligence (AI). Computer vision, a branch of AI, has been shown to achieve high-precision image recognition using convolutional neural networks (CNNs). Our goal was to train CNNs to quantitatively measure the osteogenic differentiation of MSCs. To this end, bright-field images of MSCs during early osteogenic differentiation (day 0, 1, 3, 5, and 7) were captured using a simple optical phase contrast microscope to train CNNs. The results showed that the CNNs could be trained to recognize undifferentiated cells and differentiating cells with an accuracy of 0.961 on the independent test set. In addition, we found that CNNs successfully distinguished differentiated cells at a very early stage (only 1 day). Further analysis showed that overall morphological features of MSCs were the main basis for the CNN classification. In conclusion, detection of MSC differentiation can be achieved early and accurately through simple bright-field images and DL networks, which may also provide a potential and novel method for the field of cell detection in the near future.
Affiliation(s)
- Qiusheng Shi
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Fan Song
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Xiaocheng Zhou
  - Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
- Xinyuan Chen
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Jingqi Cao
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Jing Na
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Yubo Fan
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Guanglei Zhang
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
- Lisha Zheng
  - Key Laboratory of Biomechanics and Mechanobiology (Beihang University), Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, No. 37 Xueyuan Road, Haidian District, Beijing, 100191, China
7. Pratt HE, Andrews G, Shedd N, Phalke N, Li T, Pampari A, Jensen M, Wen C, PsychENCODE Consortium, Gandal MJ, Geschwind DH, Gerstein M, Moore J, Kundaje A, Colubri A, Weng Z. Using a comprehensive atlas and predictive models to reveal the complexity and evolution of brain-active regulatory elements. Sci Adv 2024; 10:eadj4452. [PMID: 38781344; PMCID: PMC11114231; DOI: 10.1126/sciadv.adj4452] [Received: 07/03/2023] [Accepted: 04/25/2024] [Indexed: 05/25/2024]
Abstract
Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.
Affiliation(s)
- Henry E. Pratt
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Gregory Andrews
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Nicole Shedd
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Nishigandha Phalke
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Tongxin Li
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
  - Khoury College of Computer Science, Northeastern University, Boston, MA 02115, USA
- Anusri Pampari
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Matthew Jensen
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Cindy Wen
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Michael J. Gandal
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Daniel H. Geschwind
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Mark Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT 06520, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Jill Moore
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Andrés Colubri
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Zhiping Weng
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
8. Duncan AG, Mitchell JA, Moses AM. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics 2024; 40:btae190. [PMID: 38588559; DOI: 10.1093/bioinformatics/btae190] [Received: 09/18/2023] [Revised: 01/12/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024]
Abstract
MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence to regulatory function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models with suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited. RESULTS: Inspired by the success of comparative genomics, we show that augmenting genomic sequences with evolutionarily related sequences from other species, which we term phylogenetic augmentation, improves the performance of deep learning models trained on regulatory genomic sequences to predict high-throughput functional assay measurements. Additionally, we show that phylogenetic augmentation can rescue model performance when the training set is down-sampled and permits deep learning on a real-world small dataset, demonstrating that this approach improves data efficiency. Overall, this data augmentation method represents a solution for improving model performance that is applicable to many supervised deep-learning problems in genomics. AVAILABILITY AND IMPLEMENTATION: The open-source GitHub repository agduncan94/phylogenetic_augmentation_paper includes the code for rerunning the analyses here and recreating the figures.
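The idea of phylogenetic augmentation can be sketched in a few lines, under loud assumptions: each training sequence has an identifier, a precomputed list of homologous sequences from other species is available for some identifiers, and the label of the original sequence is reused for whichever homolog is sampled. The function name `phylo_augment`, the `p_swap` parameter, and the toy sequences are hypothetical and are not taken from the authors' repository.

```python
import random


def phylo_augment(batch, homologs, rng, p_swap=1.0):
    """Return a new batch in which sequences may be swapped for a homolog.

    batch:    list of (seq_id, sequence, label) tuples from the training species.
    homologs: dict mapping seq_id to a list of homologous sequences from other
              species (assumed to be precomputed, e.g. from a genome alignment).
    rng:      random.Random instance, passed in for reproducibility.
    p_swap:   probability of replacing a sequence when homologs exist
              (an assumed knob for this sketch).
    """
    augmented = []
    for seq_id, seq, label in batch:
        candidates = homologs.get(seq_id, [])
        if candidates and rng.random() < p_swap:
            # The label is kept: the homolog is assumed to share the
            # regulatory function measured for the original sequence.
            seq = rng.choice(candidates)
        augmented.append((seq_id, seq, label))
    return augmented


# Toy example with made-up sequences and labels.
toy_batch = [("enh1", "ACGTACGT", 1.7), ("enh2", "TTGACCAA", 0.2)]
toy_homologs = {"enh1": ["ACGTACGA", "ACCTACGT"]}
augmented_batch = phylo_augment(toy_batch, toy_homologs, random.Random(0))
```

In a setup like this, augmentation would normally be applied to training batches only, with validation and test sets left as the original-species sequences so that reported performance reflects the species of interest.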
Affiliation(s)
- Andrew G Duncan
  - Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
- Alan M Moses
  - Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
9. Qiu W, Dincer AB, Janizek JD, Celik S, Pittet M, Naxerova K, Lee SI. A deep profile of gene expression across 18 human cancers. bioRxiv 2024:2024.03.17.585426. [PMID: 38559197; PMCID: PMC10980029; DOI: 10.1101/2024.03.17.585426] [Indexed: 04/04/2024]
Abstract
Clinically and biologically valuable information may reside untapped in large cancer gene expression data sets. Deep unsupervised learning has the potential to extract this information with unprecedented efficacy but has thus far been hampered by a lack of biological interpretability and robustness. Here, we present DeepProfile, a comprehensive framework that addresses current challenges in applying unsupervised deep learning to gene expression profiles. We use DeepProfile to learn low-dimensional latent spaces for 18 human cancers from 50,211 transcriptomes. DeepProfile outperforms existing dimensionality reduction methods with respect to biological interpretability. Using DeepProfile interpretability methods, we show that genes that are universally important in defining the latent spaces across all cancer types control immune cell activation, while cancer type-specific genes and pathways define molecular disease subtypes. By linking DeepProfile latent variables to secondary tumor characteristics, we discover that tumor mutation burden is closely associated with the expression of cell cycle-related genes. DNA mismatch repair and MHC class II antigen presentation pathway expression, on the other hand, are consistently associated with patient survival. We validate these results through Kaplan-Meier analyses and nominate tumor-associated macrophages as an important source of survival-correlated MHC class II transcripts. Our results illustrate the power of unsupervised deep learning for discovery of novel cancer biology from existing gene expression data.
Affiliation(s)
- Wei Qiu
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
| | - Ayse B. Dincer
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
| | - Joseph D. Janizek
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
- Medical Scientist Training Program, University of Washington, Seattle, WA
| | | | - Mikael Pittet
- Department of Pathology and Immunology, University of Geneva, Switzerland
- Ludwig Institute for Cancer Research, Lausanne Branch, Switzerland
| | - Kamila Naxerova
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Su-In Lee
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
| |
|
10
|
Tostado CP, Da Ong LX, Heng JJW, Miccolis C, Chia S, Seow JJW, Toh Y, DasGupta R. An AI-assisted integrated, scalable, single-cell phenomic-transcriptomic platform to elucidate intratumor heterogeneity against immune response. Bioeng Transl Med 2024; 9:e10628. [PMID: 38435825 PMCID: PMC10905538 DOI: 10.1002/btm2.10628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 11/16/2023] [Indexed: 03/05/2024] Open
Abstract
We present a novel framework combining single-cell phenotypic data with single-cell transcriptomic analysis to identify factors underpinning heterogeneity in antitumor immune response. We developed a pairwise, tumor-immune discretized interaction assay between natural killer (NK-92MI) cells and patient-derived head and neck squamous cell carcinoma (HNSCC) cell lines on a microfluidic cell-trapping platform. Furthermore, we generated a deep-learning computer vision algorithm that is capable of automating the acquisition and analysis of a large, live-cell imaging data set (>1 million) of paired tumor-immune interactions spanning a time course of 24 h across multiple HNSCC lines (n = 10). Finally, we combined the response data measured by Kaplan-Meier survival analysis against NK-mediated killing with downstream single-cell transcriptomic analysis to interrogate molecular signatures associated with NK-effector response. As proof-of-concept for the proposed framework, we efficiently identified MHC class I-driven cytotoxic resistance as a key mechanism for immune evasion in nonresponders, while enhanced expression of cell adhesion molecules was found to be correlated with sensitivity against NK-mediated cytotoxicity. We conclude that this integrated, data-driven phenotypic approach holds tremendous promise in advancing the rapid identification of new mechanisms and therapeutic targets related to immune evasion and response.
Affiliation(s)
- Christopher P. Tostado
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
- Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
| | - Lucas Xian Da Ong
- Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
| | - Joel Jia Wei Heng
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Carlo Miccolis
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Shumei Chia
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Justine Jia Wen Seow
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| | - Yi‐Chin Toh
- Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
- School of Mechanical, Medical and Process Engineering, Queensland University of Technology, Brisbane, Australia
- Centre for Biomedical Technologies, Queensland University of Technology, Brisbane, Australia
| | - Ramanuj DasGupta
- Genome Institute of Singapore, Laboratory of Precision Oncology and Cancer Evolution, Singapore, Singapore
| |
|
11
|
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman W, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023] Open
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
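The flank-trimming step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the JASPAR implementation: the uniform background, the `pseudocount` value, the `min_ic` threshold, and both function names are assumptions made for illustration only. A position frequency matrix (PFM) is represented here as a list of columns, each holding counts in A, C, G, T order.

```python
import math

def information_content(column, pseudocount=0.8):
    """Information content in bits of one PFM column (uniform background assumed)."""
    total = sum(column) + pseudocount
    ic = 2.0  # log2(4), the maximum for a four-letter alphabet
    for count in column:
        # The pseudocount (hypothetical value) keeps log2 defined for zero counts.
        p = (count + pseudocount / 4) / total
        ic += p * math.log2(p)
    return ic

def trim_flanks(pfm, min_ic=0.5):
    """Remove leading and trailing columns whose information content is below min_ic.

    Low-information columns inside the motif are kept; only the flanks are trimmed.
    """
    informative = [i for i, col in enumerate(pfm)
                   if information_content(col) >= min_ic]
    if not informative:
        return []
    return pfm[informative[0]:informative[-1] + 1]

# Hypothetical four-column profile: near-uniform flanks around two conserved positions.
example_pfm = [[5, 5, 5, 5], [20, 0, 0, 0], [0, 0, 20, 0], [6, 4, 5, 5]]
trimmed = trim_flanks(example_pfm)
```

Trimming only from the ends, rather than dropping every low-information column, preserves the spacing between conserved positions, which matters for motifs with degenerate internal positions.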
Affiliation(s)
- Ieva Rauluseviciute
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Rafael Riudavets-Puig
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Romain Blanc-Mathieu
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Katalin Ferenc
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Vipin Kumar
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Roza Berhanu Lemma
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Jérémy Lucas
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jeanne Chèneby
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Damir Baranasic
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
| | - Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Sveinung Gundersen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Morten Johansen
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
| | - Eivind Hovig
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
| | - Boris Lenhard
- MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
| | - Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - François Parcy
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
| |
|
12
|
Pechmann S. Single-cell expression predicts neuron-specific protein homeostasis networks. Open Biol 2024; 14:230386. [PMID: 38262604 PMCID: PMC10805596 DOI: 10.1098/rsob.230386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 11/17/2023] [Indexed: 01/25/2024] Open
Abstract
The protein homeostasis network keeps proteins in their correct shapes and avoids unwanted aggregation. In turn, the accumulation of aberrantly misfolded proteins has been directly associated with the onset of ageing-associated neurodegenerative diseases such as Alzheimer's and Parkinson's. However, a detailed and rational understanding of how protein homeostasis is achieved in health, and how it can be targeted for therapeutic intervention in diseases remains missing. Here, large-scale single-cell expression data from the Allen Brain Map are analysed to investigate the transcription regulation of the core protein homeostasis network across the human brain. Remarkably, distinct expression profiles suggest specialized protein homeostasis networks with systematic adaptations in excitatory neurons, inhibitory neurons and non-neuronal cells. Moreover, several chaperones and Ubiquitin ligases are found transcriptionally coregulated with genes important for synapse formation and maintenance, thus linking protein homeostasis to the regulation of neuronal function. Finally, evolutionary analyses highlight the conservation of an elevated interaction density in the chaperone network, suggesting that one of the most exciting aspects of chaperone action may yet be discovered in their collective action at the systems level. More generally, our work highlights the power of computational analyses for breaking down complexity and gaining complementary insights into fundamental biological problems.
|
13
|
Chung HK, Liu C, Sun M, Casillas E, Chen T, Chick B, Wang J, Ma S, Mcdonald B, He P, Yang Q, Varanasi SK, Mann T, Chen D, Hoffmann F, Tripple V, Hang Y, Ho J, Cho UH, Williams A, Wang Y, Hargreaves D, Kaech SM, Wang W. Multiomics atlas-assisted discovery of transcription factors enables specific cell state programming. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.03.522354. [PMID: 36711632 PMCID: PMC9881845 DOI: 10.1101/2023.01.03.522354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
The same types of cells can assume diverse states with varying functionalities. Effective cell therapy can be achieved by specifically driving a desirable cell state, which requires the elucidation of key transcription factors (TFs). Here, we integrated epigenomic and transcriptomic data at the systems level to identify TFs that define different CD8+ T cell states in an unbiased manner. These TF profiles can be used for cell state programming that aims to maximize the therapeutic potential of T cells. For example, T cells can be programmed to avoid a terminal exhaustion state (TexTerm), a dysfunctional T cell state that is often found in tumors or chronic infections. However, TexTerm exhibits high similarity with the beneficial tissue-resident memory T states (TRM) in terms of their locations and transcription profiles. Our bioinformatic analysis predicted Zscan20, a novel TF, to be uniquely active in TexTerm. Consistently, Zscan20 knock-out thwarted the differentiation of TexTerm in vivo, but not that of TRM. Furthermore, perturbation of Zscan20 programs T cells into an effector-like state that confers superior tumor and virus control and synergizes with immune checkpoint therapy. We also identified Jdp2 and Nfil3 as powerful TexTerm drivers. In short, our multiomics-based approach discovered novel TFs that enhance anti-tumor immunity, and enable highly effective cell state programming.
One-sentence summary: Multiomics atlas enables the systematic identification of cell-state specifying transcription factors for therapeutic cell state programming.
|
14
|
Sasse A, Ng B, Spiro AE, Tasaki S, Bennett DA, Gaiteri C, De Jager PL, Chikina M, Mostafavi S. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. Nat Genet 2023; 55:2060-2064. [PMID: 38036778 DOI: 10.1038/s41588-023-01524-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 09/08/2023] [Indexed: 12/02/2023]
Abstract
Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired whole genome sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learned sequence motif grammar and suggest new model training strategies to improve performance.
Affiliation(s)
- Alexander Sasse
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Bernard Ng
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Anna E Spiro
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Christopher Gaiteri
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Philip L De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology, and the Taub Institute for the Study of Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, USA
| | - Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
| |
|
15
|
Nair S, Ameen M, Sundaram L, Pampari A, Schreiber J, Balsubramani A, Wang YX, Burns D, Blau HM, Karakikes I, Wang KC, Kundaje A. Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.04.560808. [PMID: 37873116 PMCID: PMC10592962 DOI: 10.1101/2023.10.04.560808] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/25/2023]
Abstract
Ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM) transforms differentiated cells into induced pluripotent stem cells. To refine our mechanistic understanding of reprogramming, especially during the early stages, we profiled chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of human fibroblast reprogramming. Using neural networks that map DNA sequence to ATAC-seq profiles at base-resolution, we annotated cell-state-specific predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of Tn5-bias corrected TF footprints, linked peaks to putative target genes, and elucidated rewiring of TF-to-gene cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei. Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution, connect TF stoichiometry and motif syntax to diversification of cell fate trajectories, and provide new perspectives on the dynamics and role of transient regulatory elements in somatic silencing.
Affiliation(s)
- Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Mohamed Ameen
- Department of Cancer Biology, Stanford University, Stanford, CA, USA
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Dermatology, Stanford University, Stanford, CA, USA
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | | | - Anusri Pampari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jacob Schreiber
- Department of Genetics, Stanford University, Stanford, CA, USA
| | | | - Yu Xin Wang
- Baxter Laboratory for Stem Cell Biology, Stanford University, Stanford, CA, USA
| | - David Burns
- Baxter Laboratory for Stem Cell Biology, Stanford University, Stanford, CA, USA
| | - Helen M Blau
- Baxter Laboratory for Stem Cell Biology, Stanford University, Stanford, CA, USA
- Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA
| | - Ioannis Karakikes
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
| | - Kevin C Wang
- Department of Dermatology, Stanford University, Stanford, CA, USA
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| |
|
16
|
Brennan KJ, Weilert M, Krueger S, Pampari A, Liu HY, Yang AWH, Morrison JA, Hughes TR, Rushlow CA, Kundaje A, Zeitlinger J. Chromatin accessibility in the Drosophila embryo is determined by transcription factor pioneering and enhancer activation. Dev Cell 2023; 58:1898-1916.e9. [PMID: 37557175 PMCID: PMC10592203 DOI: 10.1016/j.devcel.2023.07.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/28/2022] [Revised: 05/09/2023] [Accepted: 07/13/2023] [Indexed: 08/11/2023]
Abstract
Chromatin accessibility is integral to the process by which transcription factors (TFs) read out cis-regulatory DNA sequences, but it is difficult to differentiate between TFs that drive accessibility and those that do not. Deep learning models that learn complex sequence rules provide an unprecedented opportunity to dissect this problem. Using zygotic genome activation in Drosophila as a model, we analyzed high-resolution TF binding and chromatin accessibility data with interpretable deep learning and performed genetic validation experiments. We identify a hierarchical relationship between the pioneer TF Zelda and the TFs involved in axis patterning. Zelda consistently pioneers chromatin accessibility proportional to motif affinity, whereas patterning TFs augment chromatin accessibility in sequence contexts where they mediate enhancer activation. We conclude that chromatin accessibility occurs in two tiers: one through pioneering, which makes enhancers accessible but not necessarily active, and the second when the correct combination of TFs leads to enhancer activation.
Affiliation(s)
- Kaelan J Brennan
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Anusri Pampari
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA
| | - Hsiao-Yun Liu
- Department of Biology, New York University, New York, NY 10003, USA
| | - Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Jason A Morrison
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | | | - Anshul Kundaje
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA; Department of Genetics, Stanford University, Palo Alto, CA 94305, USA
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Department of Pathology & Laboratory Medicine, The University of Kansas Medical Center, Kansas City, KS 66160, USA
| |
|
17
|
Hepkema J, Lee NK, Stewart BJ, Ruangroengkulrith S, Charoensawan V, Clatworthy MR, Hemberg M. Predicting the impact of sequence motifs on gene regulation using single-cell data. Genome Biol 2023; 24:189. [PMID: 37582793 PMCID: PMC10426127 DOI: 10.1186/s13059-023-03021-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 07/21/2023] [Indexed: 08/17/2023] Open
Abstract
The binding of transcription factors at proximal promoters and distal enhancers is central to gene regulation. Identifying regulatory motifs and quantifying their impact on expression remains challenging. Using a convolutional neural network trained on single-cell data, we infer putative regulatory motifs and cell type-specific importance. Our model, scover, explains 29% of the variance in gene expression in multiple mouse tissues. Applying scover to distal enhancers identified using scATAC-seq from the developing human brain, we identify cell type-specific motif activities in distal enhancers. Scover can identify regulatory motifs and their importance from single-cell data where all parameters and outputs are easily interpretable.
Affiliation(s)
- Jacob Hepkema
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Nicholas Keone Lee
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
| | - Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Siwat Ruangroengkulrith
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Varodom Charoensawan
- Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, 7310, Thailand
- Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
| | - Menna R Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, CB2 0QQ, UK
- Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 0QQ, UK
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
- Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, 02115, USA
| |
|
18
|
Monti R, Ohler U. Toward Identification of Functional Sequences and Variants in Noncoding DNA. Annu Rev Biomed Data Sci 2023; 6:191-210. [PMID: 37262323 DOI: 10.1146/annurev-biodatasci-122120-110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants. These and other algorithms are used to predict damaging sequence variants. Finally, we introduce rare-variant association tests that incorporate functional annotations and predictions in order to increase interpretability and statistical power.
Affiliation(s)
- Remo Monti
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
- Digital Health-Machine Learning, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | - Uwe Ohler
- Max Delbrück Center for Molecular Medicine (MDC), Helmholtz Association of German Research Centers, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
| |
|
19
|
Chowdhary K, Benoist C. A variegated model of transcription factor function in the immune system. Trends Immunol 2023; 44:530-541. [PMID: 37258360 PMCID: PMC10332489 DOI: 10.1016/j.it.2023.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/26/2023] [Accepted: 05/01/2023] [Indexed: 06/02/2023]
Abstract
Specific combinations of transcription factors (TFs) control the gene expression programs that underlie specialized immune responses. Previous models of TF function in immunocytes had restricted each TF to a single functional categorization [e.g., lineage-defining (LDTFs) vs. signal-dependent TFs (SDTFs)] within one cell type. Synthesizing recent results, we instead propose a variegated model of immunological TF function, whereby many TFs have flexible and different roles across distinct cell states, contributing to cell phenotypic diversity. We discuss evidence in support of this variegated model, describe contextual inputs that enable TF diversification, and look to the future to imagine warranted experimental and computational tools to build quantitative and predictive models of immunocyte gene regulatory networks.
|
20
|
Penhaskashi J, Sekimoto O, Chiappelli F. Permafrost viremia and immune tweening. Bioinformation 2023; 19:685-691. [PMID: 37885785 PMCID: PMC10598357 DOI: 10.6026/97320630019685] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/01/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 10/28/2023] Open
Abstract
The immune system, an exquisitely regulated physiological system, utilizes a wide spectrum of soluble factors and multiple cell populations and subpopulations at diverse states of maturation to monitor and protect the organism against foreign organisms. Immune surveillance is ensured by distinguishing self-antigens from self-associated with non-self (e.g., viral) peptides presented by major histocompatibility complexes (MHC). Pathology is often identified as unregulated inflammatory responses (e.g., cytokine storm), or recognizing self as a non-self entity (i.e., auto-immunity). Artificial intelligence (AI), and in particular specific machine learning (ML) paradigms (e.g., Deep Learning [DL]) proffer powerful algorithms to better understand and more accurately predict immune responses, immune regulation and homeostasis, and immune reactivity to challenges (i.e., immune allostasis) by their intrinsic ability to interpret immune parameters, pathways and events by analyzing large amounts of complex data and drawing predictive inferences (i.e., immune tweening). We propose here that DL models play an increasingly significant role in better defining and characterizing immunological surveillance to ancient and novel virus species released by thawing permafrost.
Affiliation(s)
- Jaden Penhaskashi
- Division of West Valley Dental Implant Center, Encino, CA 91316, USA
| | | | - Francesco Chiappelli
- Dental Group of Sherman Oaks, CA 91403, USA
- Center for the Health Sciences, UCLA, Los Angeles, CA, USA
| |
|
21
|
Balcı AT, Ebeid MM, Benos PV, Kostka D, Chikina M. An intrinsically interpretable neural network architecture for sequence-to-function learning. Bioinformatics 2023; 39:i413-i422. [PMID: 37387140 PMCID: PMC10311317 DOI: 10.1093/bioinformatics/btad271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one can often not explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called totally interpretable sequence-to-function model (tiSFM). tiSFM improves upon the performance of standard multilayer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multilayer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. RESULTS We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. AVAILABILITY AND IMPLEMENTATION The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.
Affiliation(s)
- Ali Tuğrul Balcı
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| | - Mark Maher Ebeid
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| | - Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, FL 32610, United States
| | - Dennis Kostka
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| | - Maria Chikina
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, PA 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, United States
| |
|
22
|
Novakovsky G, Fornes O, Saraswat M, Mostafavi S, Wasserman WW. ExplaiNN: interpretable and transparent neural networks for genomics. Genome Biol 2023; 24:154. [PMID: 37370113 DOI: 10.1186/s13059-023-02985-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/20/2022] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Deep learning models such as convolutional neural networks (CNNs) excel in genomic tasks but lack interpretability. We introduce ExplaiNN, which combines the expressiveness of CNNs with the interpretability of linear models. ExplaiNN can predict TF binding, chromatin accessibility, and de novo motifs, achieving performance comparable to state-of-the-art methods. Its predictions are transparent, providing global (cell state level) as well as local (individual sequence level) biological insights into the data. ExplaiNN can serve as a plug-and-play platform for pretrained models and annotated position weight matrices. ExplaiNN aims to accelerate the adoption of deep learning in genomic sequence analysis by domain experts.
Affiliation(s)
- Gherman Novakovsky
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
| | - Manu Saraswat
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington (UW), Seattle, USA
| | - Wyeth W Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada.
| |
|
23
|
Janizek JD, Dincer AB, Celik S, Chen H, Chen W, Naxerova K, Lee SI. Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models. Nat Biomed Eng 2023; 7:811-829. [PMID: 37127711 PMCID: PMC11149694 DOI: 10.1038/s41551-023-01034-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/12/2021] [Accepted: 04/01/2023] [Indexed: 05/03/2023]
Abstract
Machine learning may aid the choice of optimal combinations of anticancer drugs by explaining the molecular basis of their synergy. By combining accurate models with interpretable insights, explainable machine learning promises to accelerate data-driven cancer pharmacology. However, owing to the highly correlated and high-dimensional nature of transcriptomic data, naively applying current explainable machine-learning strategies to large transcriptomic datasets leads to suboptimal outcomes. Here by using feature attribution methods, we show that the quality of the explanations can be increased by leveraging ensembles of explainable machine-learning models. We applied the approach to a dataset of 133 combinations of 46 anticancer drugs tested in ex vivo tumour samples from 285 patients with acute myeloid leukaemia and uncovered a haematopoietic-differentiation signature underlying drug combinations with therapeutic synergy. Ensembles of machine-learning models trained to predict drug combination synergies on the basis of gene-expression data may improve the feature attribution quality of complex machine-learning models.
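The idea of averaging feature attributions over an ensemble can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps in bootstrap-resampled ordinary least-squares models and a simple linear attribution (coefficient times centered feature value) in place of the complex models and attribution methods used in the paper. The function name `ensemble_attributions` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def ensemble_attributions(X, y, n_models=20, seed=0):
    """Average per-feature attributions over bootstrap-resampled linear models.

    Assumption: each ensemble member is an ordinary least-squares fit, and the
    attribution of feature j for a sample is coef_j * (x_j - mean_j).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    X_centered = X - X.mean(axis=0)
    per_model = np.zeros((n_models, p))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)                 # bootstrap resample
        design = np.column_stack([X[idx], np.ones(n)])   # features plus intercept
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        # Mean absolute attribution of each feature across all samples
        per_model[m] = np.abs(X_centered * coef[:p]).mean(axis=0)
    return per_model.mean(axis=0), per_model

# Toy data (hypothetical): only feature 0 drives the response
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.1 * data_rng.normal(size=200)

mean_attr, per_model = ensemble_attributions(X, y)
top_feature = int(np.argmax(mean_attr))
```

Comparing the rows of `per_model` with `mean_attr` shows how much individual models disagree on a feature's importance, which is the variability that averaging over the ensemble is meant to reduce.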
Affiliation(s)
- Joseph D Janizek
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
| | - Ayse B Dincer
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Safiye Celik
- Recursion Pharmaceuticals, Salt Lake City, UT, USA
| | - Hugh Chen
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - William Chen
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Kamila Naxerova
- Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
| | - Su-In Lee
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
| |
|
24
|
Herrera-Uribe J, Lim KS, Byrne KA, Daharsh L, Liu H, Corbett RJ, Marco G, Schroyen M, Koltes JE, Loving CL, Tuggle CK. Integrative profiling of gene expression and chromatin accessibility elucidates specific transcriptional networks in porcine neutrophils. Front Genet 2023; 14:1107462. [PMID: 37287538 PMCID: PMC10242145 DOI: 10.3389/fgene.2023.1107462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 04/27/2023] [Indexed: 06/09/2023] Open
Abstract
Neutrophils are vital components of the immune system for limiting the invasion and proliferation of pathogens in the body. Surprisingly, the functional annotation of porcine neutrophils is still limited. The transcriptomic and epigenetic assessment of porcine neutrophils from healthy pigs was performed by bulk RNA sequencing and transposase accessible chromatin sequencing (ATAC-seq). First, we sequenced and compared the transcriptome of porcine neutrophils with eight other immune cell transcriptomes to identify a neutrophil-enriched gene list within a detected neutrophil co-expression module. Second, we used ATAC-seq analysis to report for the first time the genome-wide chromatin accessible regions of porcine neutrophils. A combined analysis using both transcriptomic and chromatin accessibility data further defined the neutrophil co-expression network controlled by transcription factors likely important for neutrophil lineage commitment and function. We identified chromatin accessible regions around promoters of neutrophil-specific genes that were predicted to be bound by neutrophil-specific transcription factors. Additionally, published DNA methylation data from porcine immune cells including neutrophils were used to link low DNA methylation patterns to accessible chromatin regions and genes with highly enriched expression in porcine neutrophils. In summary, our data provides the first integrative analysis of the accessible chromatin regions and transcriptional status of porcine neutrophils, contributing to the Functional Annotation of Animal Genomes (FAANG) project, and demonstrates the utility of chromatin accessible regions to identify and enrich our understanding of transcriptional networks in a cell type such as neutrophils.
Affiliation(s)
- Juber Herrera-Uribe
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Kyu-Sang Lim
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Department of Animal Resource Science, Kongju National University, Yesan, Republic of Korea
| | - Kristen A. Byrne
- USDA-Agriculture Research Service, National Animal Disease Center, Food Safety and Enteric Pathogens Research Unit, Ames, IA, United States
| | - Lance Daharsh
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Haibo Liu
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Ryan J. Corbett
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Gianna Marco
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Martine Schroyen
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - James E. Koltes
- Department of Animal Science, Iowa State University, Ames, IA, United States
| | - Crystal L. Loving
- USDA-Agriculture Research Service, National Animal Disease Center, Food Safety and Enteric Pathogens Research Unit, Ames, IA, United States
| | | |
|
25
|
Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023; 24:125-137. [PMID: 36192604 DOI: 10.1038/s41576-022-00532-2] [Citation(s) in RCA: 63] [Impact Index Per Article: 63.0] [Accepted: 08/31/2022] [Indexed: 01/24/2023]
Abstract
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Nick Dexter
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
| |
|
26
|
Shin B, Rothenberg EV. Multi-modular structure of the gene regulatory network for specification and commitment of murine T cells. Front Immunol 2023; 14:1108368. [PMID: 36817475 PMCID: PMC9928580 DOI: 10.3389/fimmu.2023.1108368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 01/11/2023] [Indexed: 02/04/2023] Open
Abstract
T cells develop from multipotent progenitors by a gradual process dependent on intrathymic Notch signaling and coupled with extensive proliferation. The stages leading them to T-cell lineage commitment are well characterized by single-cell and bulk RNA analyses of sorted populations and by direct measurements of precursor-product relationships. This process depends not only on Notch signaling but also on multiple transcription factors, some associated with stemness and multipotency, some with alternative lineages, and others associated with T-cell fate. These factors interact in opposing or semi-independent T cell gene regulatory network (GRN) subcircuits that are increasingly well defined. A newly comprehensive picture of this network has emerged. Importantly, because key factors in the GRN can bind to markedly different genomic sites at one stage than they do at other stages, the genes they significantly regulate are also stage-specific. Global transcriptome analyses of perturbations have revealed an underlying modular structure to the T-cell commitment GRN, separating decisions to lose "stem-ness" from decisions to block alternative fates. Finally, the updated network sheds light on the intimate relationship between the T-cell program, which depends on the thymus, and the innate lymphoid cell (ILC) program, which does not.
Affiliation(s)
- Boyoung Shin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States
| | - Ellen V. Rothenberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States
| |
|
27
|
Balcı AT, Ebeid MM, Benos PV, Kostka D, Chikina M. An intrinsically interpretable neural network architecture for sequence to function learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.25.525572. [PMID: 36747873 PMCID: PMC9900791 DOI: 10.1101/2023.01.25.525572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Motivation: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post-hoc analyses, and even then, we often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called tiSFM (totally interpretable sequence to function model). tiSFM improves upon the performance of standard multi-layer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multi-layer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. Results: tiSFM's model architecture makes use of convolutions with a fixed set of kernel weights representing known transcription factor (TF) binding site motifs. We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. Availability and implementation: The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python. Contact: atb44@pitt.edu.
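A convolution with fixed kernel weights that represent a known motif amounts to sliding a position weight matrix along a one-hot encoded sequence. The sketch below illustrates only that single operation, not the tiSFM architecture; the helper names (`one_hot`, `scan_motif`) and the toy matrix for the made-up motif "TGAC" are assumptions for illustration and do not come from the ATAConv code.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) matrix; unknown bases stay all-zero."""
    encoded = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded

def scan_motif(seq: str, pwm: np.ndarray) -> np.ndarray:
    """Slide a fixed (width, 4) weight matrix along the sequence.

    The weights are never updated, so each output value is directly
    readable as a match score for that motif at that position.
    """
    x = one_hot(seq)
    width = pwm.shape[0]
    n_positions = len(seq) - width + 1
    return np.array([np.sum(x[i:i + width] * pwm) for i in range(n_positions)])

# Hypothetical log-odds-style matrix favouring the toy motif "TGAC"
toy_pwm = np.full((4, 4), -1.0)
for pos, base in enumerate("TGAC"):
    toy_pwm[pos, BASES.index(base)] = 2.0

scores = scan_motif("AATGACGT", toy_pwm)
best_position = int(np.argmax(scores))  # max-pooling over positions
best_score = float(scores.max())
```

Because the kernel is fixed to a known motif, any learned weight placed on `best_score` downstream can be read as the contribution of that transcription factor's motif, which is the sense in which such parameters are interpretable.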
Affiliation(s)
- Ali Tuğrul Balcı
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| | - Mark Maher Ebeid
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| | - Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, 32610, United States
| | - Dennis Kostka
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| | - Maria Chikina
- Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Pittsburgh, 15213, United States
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States
| |
|
28
|
Kawaguchi RK, Tang Z, Fischer S, Rajesh C, Tripathy R, Koo PK, Gillis J. Learning single-cell chromatin accessibility profiles using meta-analytic marker genes. Brief Bioinform 2023; 24:bbac541. [PMID: 36549922 PMCID: PMC9851328 DOI: 10.1093/bib/bbac541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 09/29/2022] [Accepted: 11/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for a sparse scATAC-seq annotation across the data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable to or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.
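Annotation by simple aggregation of marker genes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a precomputed cell-by-gene activity matrix, uses the mean over each marker set as the aggregate, and labels each cell with the highest-scoring set. The function name `annotate_by_markers`, the two marker sets, and the toy values are hypothetical.

```python
import numpy as np

def annotate_by_markers(activity, gene_names, marker_sets):
    """Label each cell with the type whose marker genes have the highest mean activity.

    activity: (cells, genes) array of gene-activity scores (assumed precomputed).
    marker_sets: dict mapping a cell-type name to a list of marker gene names.
    """
    gene_index = {gene: i for i, gene in enumerate(gene_names)}
    cell_types = sorted(marker_sets)
    scores = np.zeros((activity.shape[0], len(cell_types)))
    for k, cell_type in enumerate(cell_types):
        # Markers missing from the matrix are skipped rather than raising
        cols = [gene_index[g] for g in marker_sets[cell_type] if g in gene_index]
        if cols:
            scores[:, k] = activity[:, cols].mean(axis=1)
    labels = [cell_types[k] for k in scores.argmax(axis=1)]
    return labels, scores

# Toy example with illustrative marker sets and made-up activity values
genes = ["Slc17a7", "Neurod6", "Gad1", "Gad2"]
markers = {"excitatory": ["Slc17a7", "Neurod6"], "inhibitory": ["Gad1", "Gad2"]}
activity = np.array([
    [3.0, 1.0, 0.0, 0.0],   # cell 0: signal on excitatory markers
    [0.0, 1.0, 2.0, 4.0],   # cell 1: signal on inhibitory markers
])

labels, scores = annotate_by_markers(activity, genes, markers)
```

Using several markers per cell type means a single dropout, such as the zero for `Slc17a7` in the second cell, shifts the aggregate only slightly, which is one way to read the abstract's point about redundant markers helping with sparse data.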
Affiliation(s)
| | - Ziqi Tang
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Stephan Fischer
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Chandana Rajesh
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Rohit Tripathy
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Peter K Koo
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, Cold Spring Harbor 11724, USA
- Department of Physiology and Donnelly Centre for Cellular & Biomolecular Research Department, University of Toronto, Ontario M5S 3E1, Canada
| |
|
29
|
George S, Martin JAJ, Graziani V, Sanz-Moreno V. Amoeboid migration in health and disease: Immune responses versus cancer dissemination. Front Cell Dev Biol 2023; 10:1091801. [PMID: 36699013 PMCID: PMC9869768 DOI: 10.3389/fcell.2022.1091801] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/07/2022] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
Cell migration is crucial for efficient immune responses and is aberrantly used by cancer cells during metastatic dissemination. Amoeboid migrating cells use myosin II-powered blebs to propel themselves, and change morphology and direction. Immune cells use amoeboid strategies to respond rapidly to infection or tissue damage, which require quick passage through several barriers, including blood, lymph and interstitial tissues, with complex and varied environments. Amoeboid migration is also used by metastatic cancer cells to aid their migration, dissemination and survival, whereby key mechanisms are hijacked from professionally motile immune cells. We explore important parallels observed between amoeboid immune and cancer cells. We also consider key distinctions that separate the lifespan, state and fate of these cell types as they migrate and/or fulfil their function. Finally, we reflect on unexplored areas of research that would enhance our understanding of how tumour cells use immune cell strategies during metastasis, and how to target these processes.
|
30
|
Milanese JS, Marcotte R, Costain WJ, Kablar B, Drouin S. Roles of Skeletal Muscle in Development: A Bioinformatics and Systems Biology Overview. ADVANCES IN ANATOMY, EMBRYOLOGY, AND CELL BIOLOGY 2023; 236:21-55. [PMID: 37955770 DOI: 10.1007/978-3-031-38215-4_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2023]
Abstract
The ability to assess various cellular events consequent to perturbations, such as genetic mutations, disease states and therapies, has been recently revolutionized by technological advances in multiple "omics" fields. The resulting deluge of information has enabled and necessitated the development of tools required to both process and interpret the data. While of tremendous value to basic researchers, the amount and complexity of the data have made it extremely difficult to manually draw inferences and identify factors key to the study objectives. The challenges of data reduction and interpretation are being met by the development of increasingly complex tools that integrate disparate knowledge bases and synthesize coherent models based on current biological understanding. This chapter presents an example of how genomics data can be integrated with biological network analyses to gain further insight into the developmental consequences of genetic perturbations. State-of-the-art methods for conducting similar studies are discussed along with modern methods used to analyze and interpret the data.
Affiliation(s)
| | - Richard Marcotte
- Human Health Therapeutics, National Research Council of Canada, Montreal, QC, Canada
| | - Willard J Costain
- Human Health Therapeutics, National Research Council of Canada, Ottawa, ON, Canada
| | - Boris Kablar
- Department of Medical Neuroscience, Anatomy and Pathology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
| | - Simon Drouin
- Human Health Therapeutics, National Research Council of Canada, Montreal, QC, Canada
| |
|
31
|
Alatawneh R, Salomon Y, Eshel R, Orenstein Y, Birnbaum RY. Deciphering transcription factors and their corresponding regulatory elements during inhibitory interneuron differentiation using deep neural networks. Front Cell Dev Biol 2023; 11:1034604. [PMID: 36891511 PMCID: PMC9986276 DOI: 10.3389/fcell.2023.1034604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 01/23/2023] [Indexed: 02/22/2023] Open
Abstract
During neurogenesis, the generation and differentiation of neuronal progenitors into inhibitory gamma-aminobutyric acid-containing interneurons is dependent on the combinatorial activity of transcription factors (TFs) and their corresponding regulatory elements (REs). However, the roles of neuronal TFs and their target REs in inhibitory interneuron progenitors are not fully elucidated. Here, we developed a deep-learning-based framework to identify enriched TF motifs in gene REs (eMotif-RE), such as poised/repressed enhancers and putative silencers. Using epigenetic datasets (e.g., ATAC-seq and H3K27ac/me3 ChIP-seq) from cultured interneuron-like progenitors, we distinguished between active enhancer sequences (open chromatin with H3K27ac) and non-active enhancer sequences (open chromatin without H3K27ac). Using our eMotif-RE framework, we discovered enriched motifs of TFs such as ASCL1, SOX4, and SOX11 in the active enhancer set, suggesting a cooperative function for ASCL1 and SOX4/11 in active enhancers of neuronal progenitors. In addition, we found enriched ZEB1 and CTCF motifs in the non-active set. Using an in vivo enhancer assay, we showed that most of the tested putative REs from the non-active enhancer set have no enhancer activity. Two of the eight REs (25%) showed function as poised enhancers in the neuronal system. Moreover, mutated REs for ZEB1 and CTCF motifs increased their in vivo activity as enhancers, indicating a repressive effect of ZEB1 and CTCF on these REs that likely function as repressed enhancers or silencers. Overall, our work integrates a novel framework based on deep learning together with a functional assay that elucidated novel functions of TFs and their corresponding REs. Our approach can be applied to better understand gene regulation not only in inhibitory interneuron differentiation but also in other tissues and cell types.
Affiliation(s)
- Rawan Alatawneh
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Yahel Salomon
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Reut Eshel
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Yaron Orenstein
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
| | - Ramon Y Birnbaum
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- The Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| |
|
32
|
Cazares TA, Rizvi FW, Iyer B, Chen X, Kotliar M, Bejjani AT, Wayman JA, Donmez O, Wronowski B, Parameswaran S, Kottyan LC, Barski A, Weirauch MT, Prasath VBS, Miraldi ER. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput Biol 2023; 19:e1010863. [PMID: 36719906 PMCID: PMC9917285 DOI: 10.1371/journal.pcbi.1010863] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/06/2022] [Revised: 02/10/2023] [Accepted: 01/10/2023] [Indexed: 02/01/2023] Open
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built "maxATAC", a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of high-performance TFBS prediction models for ATAC-seq. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling improved TFBS prediction in vivo. We demonstrate maxATAC's capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.
Affiliation(s)
- Tareian A. Cazares
- Immunology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Faiz W. Rizvi
- Systems Biology and Physiology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Balaji Iyer
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Xiaoting Chen
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Michael Kotliar
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Anthony T. Bejjani
- Molecular and Developmental Biology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Joseph A. Wayman
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Omer Donmez
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Benjamin Wronowski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Sreeja Parameswaran
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Leah C. Kottyan
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Artem Barski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Matthew T. Weirauch
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - V. B. Surya Prasath
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Emily R. Miraldi
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
|
33
|
Toneyan S, Tang Z, Koo PK. Evaluating deep learning for predicting epigenomic profiles. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00570-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
34
|
Current challenges in understanding the role of enhancers in disease. Nat Struct Mol Biol 2022; 29:1148-1158. [PMID: 36482255 DOI: 10.1038/s41594-022-00896-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/30/2022] [Accepted: 11/04/2022] [Indexed: 12/13/2022]
Abstract
Enhancers play a central role in the spatiotemporal control of gene expression and tend to work in a cell-type-specific manner. In addition, they are suggested to be major contributors to phenotypic variation, evolution and disease. There is growing evidence that enhancer dysfunction due to genetic, structural or epigenetic mechanisms contributes to a broad range of human diseases referred to as enhanceropathies. Such mechanisms often underlie the susceptibility to common diseases, but can also play a direct causal role in cancer or Mendelian diseases. Despite the recent gain of insights into enhancer biology and function, we still have a limited ability to predict how enhancer dysfunction impacts gene expression. Here we discuss the major challenges that need to be overcome when studying the role of enhancers in disease etiology and highlight opportunities and directions for future studies, aiming to disentangle the molecular basis of enhanceropathies.
|
35
|
Chambost AJ, Berabez N, Cochet-Escartin O, Ducray F, Gabut M, Isaac C, Martel S, Idbaih A, Rousseau D, Meyronet D, Monnier S. Machine learning-based detection of label-free cancer stem-like cell fate. Sci Rep 2022; 12:19066. [PMID: 36352045 PMCID: PMC9646748 DOI: 10.1038/s41598-022-21822-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 10/04/2022] [Indexed: 11/11/2022] Open
Abstract
The detection of cancer stem-like cells (CSCs) is mainly based on molecular markers or functional tests giving a posteriori results. Therefore, label-free and real-time detection of single CSCs remains a difficult challenge. The recent development of microfluidics has made it possible to perform high-throughput single-cell imaging under controlled conditions and geometries. Such a throughput requires adapted image analysis pipelines while providing the necessary amount of data for the development of machine-learning algorithms. In this paper, we provide a data-driven study to assess the complexity of brightfield time-lapses to monitor the fate of isolated cancer stem-like cells in non-adherent conditions. We combined for the first time individual cell fate and cell state temporality analysis in a unique algorithm. We show that with our experimental system and on two different primary cell lines our optimized deep learning-based algorithm outperforms classical computer vision and shallow learning-based algorithms in terms of accuracy while being faster than cutting-edge convolutional neural networks (CNNs). With this study, we show that tailoring our deep learning-based algorithm to the image analysis problem yields better results than pre-trained models. As a result, such a rapid and accurate CNN is compatible with the rise of high-throughput data generation and opens the door to on-the-fly CSC fate analysis.
Affiliation(s)
- Alexis J. Chambost
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France; Univ Lyon, CNRS, Institut Lumière Matière, Univ Claude Bernard Lyon 1, 69622 Villeurbanne, France; Pathology Institute, Hospices Civils de Lyon, Lyon, France
| | - Nabila Berabez
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Olivier Cochet-Escartin
- Univ Lyon, CNRS, Institut Lumière Matière, Univ Claude Bernard Lyon 1, 69622 Villeurbanne, France
| | - François Ducray
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France; Neuro-oncology Department, Hospices Civils de Lyon, Lyon, France
| | - Mathieu Gabut
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Caroline Isaac
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Sylvie Martel
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France
| | - Ahmed Idbaih
- Institut du Cerveau - Paris Brain Institute - ICM, Inserm, CNRS, AP-HP, Hôpital Universitaire La Pitié Salpêtrière, DMU Neurosciences, Sorbonne Université, Paris, France
| | - David Rousseau
- Laboratoire Angevin de Recherche en Ingénierie des Systèmes (LARIS), UMR Inrae IRHS, Université d’Angers, 49000 Angers, France
| | - David Meyronet
- Cancer Initiation and Tumor Cell Identity Department, Cancer Research Centre of Lyon (CRCL) INSERM 1052, CNRS UMR5286, Centre Léon Bérard, Université Claude Bernard Lyon 1, 69008 Lyon, Villeurbanne, France; Pathology Institute, Hospices Civils de Lyon, Lyon, France
| | - Sylvain Monnier
- Univ Lyon, CNRS, Institut Lumière Matière, Univ Claude Bernard Lyon 1, 69622 Villeurbanne, France
|
36
|
Li J, Wang J, Zhang P, Wang R, Mei Y, Sun Z, Fei L, Jiang M, Ma L, E W, Chen H, Wang X, Fu Y, Wu H, Liu D, Wang X, Li J, Guo Q, Liao Y, Yu C, Jia D, Wu J, He S, Liu H, Ma J, Lei K, Chen J, Han X, Guo G. Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nat Genet 2022; 54:1711-1720. [PMID: 36229673 DOI: 10.1038/s41588-022-01197-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/18/2021] [Accepted: 08/31/2022] [Indexed: 11/09/2022]
Abstract
Despite extensive efforts to generate and analyze reference genomes, genetic models to predict gene regulation and cell fate decisions are lacking for most species. Here, we generated whole-body single-cell transcriptomic landscapes of zebrafish, Drosophila and earthworm. We then integrated cell landscapes from eight representative metazoan species to study gene regulation across evolution. Using these uniformly constructed cross-species landscapes, we developed a deep-learning-based strategy, Nvwa, to predict gene expression and identify regulatory sequences at the single-cell level. We systematically compared cell-type-specific transcription factors to reveal conserved genetic regulation in vertebrates and invertebrates. Our work provides a valuable resource and offers a new strategy for studying regulatory grammar in diverse biological systems.
Affiliation(s)
- Jiaqi Li
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Jingjing Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Peijing Zhang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Renying Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yuqing Mei
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Zhongyi Sun
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Lijiang Fei
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Mengmeng Jiang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Lifeng Ma
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Weigao E
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Haide Chen
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Xinru Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Yuting Fu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Hanyu Wu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Daiyuan Liu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xueyi Wang
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jingyu Li
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Qile Guo
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China
| | - Yuan Liao
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, China
| | - Chengxuan Yu
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Danmei Jia
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jian Wu
- Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, First Affiliated Hospital School of Medicine, Zhejiang University, Hangzhou, China
| | - Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, China
| | - Huanju Liu
- Women's Hospital and Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
| | - Jun Ma
- Women's Hospital and Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
| | - Kai Lei
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
| | - Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, China
| | - Xiaoping Han
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, China
| | - Guoji Guo
- Center for Stem Cell and Regenerative Medicine and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China; Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, China; Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, China
|
37
|
Yin Y, Liu XZ, Tian Q, Fan YX, Ye Z, Meng TQ, Wei GH, Xiong CL, Li HG, He X, Zhou LQ. Transcriptome and DNA methylome analysis of peripheral blood samples reveals incomplete restoration and transposable element activation after 3-months recovery of COVID-19. Front Cell Dev Biol 2022; 10:1001558. [PMID: 36263014 PMCID: PMC9574079 DOI: 10.3389/fcell.2022.1001558] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/23/2022] [Accepted: 09/15/2022] [Indexed: 01/08/2023] Open
Abstract
Comprehensive analyses showed that SARS-CoV-2 infection caused COVID-19 and induced strong immune responses and sometimes severe illnesses. However, cellular features of recovered patients and long-term health consequences remain largely unexplored. In this study, we collected peripheral blood samples from nine recovered COVID-19 patients (median age of 36 years old) from Hubei province, China, 3 months after discharge, as well as 5 age- and gender-matched healthy controls, and carried out RNA-seq and whole-genome bisulfite sequencing to identify hallmarks of recovered COVID-19 patients. Our analyses showed significant changes both in transcript abundance and DNA methylation of genes and transposable elements (TEs) in recovered COVID-19 patients. We identified 425 upregulated genes, 214 downregulated genes, and 18,516 differentially methylated regions (DMRs) in total. Aberrantly expressed genes and DMRs were found to be associated with immune responses and other related biological processes, implicating prolonged overreaction of the immune system in response to SARS-CoV-2 infection. Notably, a significant number of TEs were aberrantly activated and their activation was positively correlated with COVID-19 severity. Moreover, differentially methylated TEs may regulate adjacent gene expression as regulatory elements. These identified transcriptomic and epigenomic signatures define and drive the features of recovered COVID-19 patients, helping determine the risks of long COVID-19, and guiding clinical intervention.
Affiliation(s)
- Ying Yin
- Department of Physiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan, China
| | - Xiao-zhao Liu
- Department of Physiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan, China
| | - Qing Tian
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Yi-xian Fan
- Department of Physiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan, China
| | - Zhen Ye
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Tian-qing Meng
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Gong-hong Wei
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University Shanghai Cancer Center, Shanghai Medical College of Fudan University, Shanghai, China
| | - Cheng-liang Xiong
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
| | - Hong-gang Li
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- *Correspondence: Hong-gang Li; Ximiao He; Li-quan Zhou
| | - Ximiao He
- Department of Physiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan, China
- *Correspondence: Hong-gang Li; Ximiao He; Li-quan Zhou
| | - Li-quan Zhou
- Institute of Reproductive Health, Center for Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- *Correspondence: Hong-gang Li; Ximiao He; Li-quan Zhou
|
38
|
Liu C, Omilusik K, Toma C, Kurd NS, Chang JT, Goldrath AW, Wang W. Systems-level identification of key transcription factors in immune cell specification. PLoS Comput Biol 2022; 18:e1010116. [PMID: 36156073 PMCID: PMC9536753 DOI: 10.1371/journal.pcbi.1010116] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/20/2022] [Revised: 10/06/2022] [Accepted: 08/10/2022] [Indexed: 01/30/2023] Open
Abstract
Transcription factors (TFs) are crucial for regulating cell differentiation during the development of the immune system. However, the key TFs for orchestrating the specification of distinct immune cells are not fully understood. Here, we integrated the transcriptomic and epigenomic measurements in 73 mouse and 61 human primary cell types, respectively, that span the immune cell differentiation pathways. We constructed the cell-type-specific transcriptional regulatory network and assessed the global importance of TFs based on the Taiji framework, which is a method we have previously developed that can infer the global impact of TFs using integrated transcriptomic and epigenetic data. Integrative analysis across cell types revealed putative driver TFs in cell lineage-specific differentiation in both mouse and human systems. We have also identified TF combinations that play important roles in specific developmental stages. Furthermore, we validated the functions of predicted novel TFs in murine CD8+ T cell differentiation and showed the importance of Elf1 and Prdm9 in the effector versus memory T cell fate specification and Kdm2b and Tet3 in promoting differentiation of CD8+ tissue resident memory (Trm) cells, validating the approach. Thus, we have developed a bioinformatic approach that provides a global picture of the regulatory mechanisms that govern cellular differentiation in the immune system and aids the discovery of novel mechanisms in cell fate decisions.
Affiliation(s)
- Cong Liu
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
| | - Kyla Omilusik
- Division of Biological Sciences, University of California San Diego, La Jolla, California, United States of America
| | - Clara Toma
- Division of Biological Sciences, University of California San Diego, La Jolla, California, United States of America
| | - Nadia S. Kurd
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
| | - John T. Chang
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
| | - Ananda W. Goldrath
- Division of Biological Sciences, University of California San Diego, La Jolla, California, United States of America
| | - Wei Wang
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, California, United States of America
|
39
|
Wu S, Yin Y, Wang X. The epigenetic regulation of the germinal center response. Biochim Biophys Acta Gene Regul Mech 2022; 1865:194828. [PMID: 35643396 DOI: 10.1016/j.bbagrm.2022.194828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 05/22/2022] [Indexed: 06/15/2023]
Abstract
In response to T-cell-dependent antigens, antigen-experienced B cells migrate to the center of the B-cell follicle to seed the germinal center (GC) response after cognate interactions with CD4+ T cells. These GC B cells eventually mature into memory and long-lived antibody-secreting plasma cells, thus generating long-lived humoral immunity. Within GC, B cells undergo somatic hypermutation of their B cell receptors (BCR) and positive selection for the emergence of high-affinity antigen-specific B-cell clones. However, this process may be dangerous, as the accumulation of aberrant mutations could result in malignant transformation of GC B cells or give rise to autoreactive B cell clones that can cause autoimmunity. Because of this, better understanding of GC development provides diagnostic and therapeutic clues to the underlying pathologic process. A productive GC response is orchestrated by multiple mechanisms. An emerging important regulator of GC reaction is epigenetic modulation, which has key transcriptional regulatory properties. In this review, we summarize the current knowledge on the biology of epigenetic mechanisms in the regulation of GC reaction and outline its importance in identification of immunotherapy decision making.
Affiliation(s)
- Shusheng Wu
- Department of Immunology, State Key Laboratory of Reproductive Medicine, NHC Key Laboratory of Antibody Technique, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Yuye Yin
- Department of Immunology, State Key Laboratory of Reproductive Medicine, NHC Key Laboratory of Antibody Technique, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Xiaoming Wang
- Department of Immunology, State Key Laboratory of Reproductive Medicine, NHC Key Laboratory of Antibody Technique, Nanjing Medical University, Nanjing, Jiangsu, China.
|
40
|
Fu X, Bates PA. Application of deep learning methods: From molecular modelling to patient classification. Exp Cell Res 2022; 418:113278. [PMID: 35810775 DOI: 10.1016/j.yexcr.2022.113278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 06/16/2022] [Accepted: 07/05/2022] [Indexed: 11/28/2022]
Abstract
We are now well into the information-driven age, with complex, heterogeneous datasets in the biological sciences continuing to grow at a rapid pace. Moreover, efforts to distil such datasets to find new governing principles are underway. Leading the surge are new and exciting algorithmic developments in computer simulation and machine learning, most notably, for the latter, those centred on deep learning. However, practical applications of cell-centric computations within the biological sciences, even when carefully benchmarked against existing experimental datasets, remain challenging. Here we discuss the application of deep learning methodologies to support our understanding of cell functionality and as an aid to patient classification. Whilst comprehensive end-to-end deep learning approaches that utilise knowledge of the cell and its molecular components to aid human disease classification, which are important for opening the door to more effective molecular and cell-based therapies, are yet to be implemented, we illustrate that many deep learning applications have been developed to tackle components of such an ambitious pipeline. We end our discussion on what the future may hold, especially how an integrated framework of computer simulations and deep learning, in conjunction with wet-bench experimentation, could help reveal the governing principles underlying cell functionalities within the tissue environments in which cells operate.
Affiliation(s)
- Xiao Fu
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK.
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK.
| |
|
41
|
Kundaje A, Meuleman W. Automated sequence-based annotation and interpretation of the human genome. Nat Genet 2022; 54:916-917. [PMID: 35817978 DOI: 10.1038/s41588-022-01123-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Anshul Kundaje
- Department of Genetics, Stanford University, Palo Alto, CA, USA.
- Department of Computer Science, Stanford University, Palo Alto, CA, USA.
| | - Wouter Meuleman
- Altius Institute for Biomedical Sciences, Seattle, WA, USA.
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
| |
|
42
|
Zhou W, Gao F, Romero-Wolf M, Jo S, Rothenberg EV. Single-cell deletion analyses show control of pro-T cell developmental speed and pathways by Tcf7, Spi1, Gata3, Bcl11a, Erg, and Bcl11b. Sci Immunol 2022; 7:eabm1920. [PMID: 35594339 PMCID: PMC9273332 DOI: 10.1126/sciimmunol.abm1920] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 11/02/2022]
Abstract
As early T cell precursors transition from multipotentiality to T lineage commitment, they change expression of multiple transcription factors. It is unclear whether individual transcription factors directly control choices between T cell identity and some alternative fate or whether these factors mostly affect proliferation or survival during the normal commitment process. Here, we unraveled the impacts of deleting individual transcription factors at two stages in early T cell development, using synchronized in vitro differentiation systems, single-cell RNA-seq with batch indexing, and controlled gene-disruption strategies. First, using a customized method for single-cell CRISPR disruption, we defined how the early-acting transcription factors Bcl11a, Erg, Spi1 (PU.1), Gata3, and Tcf7 (TCF1) function before commitment. The results revealed a kinetic tug of war within individual cells between T cell factors Tcf7 and Gata3 and progenitor factors Spi1 and Bcl11a, with an unexpected guidance role for Erg. Second, we tested how activation of transcription factor Bcl11b during commitment altered ongoing cellular programs. In knockout cells where Bcl11b expression was prevented, the cells did not undergo developmental arrest, instead following an alternative path as T lineage commitment was blocked. A stepwise, time-dependent regulatory cascade began with immediate-early transcription factor activation and E protein inhibition, finally leading Bcl11b knockout cells toward exit from the T cell pathway. Last, gene regulatory networks of transcription factor cross-regulation were extracted from the single-cell transcriptome results, characterizing the specification network operating before T lineage commitment and revealing its links to both the Bcl11b knockout alternative network and the network consolidating T cell identity during commitment.
Affiliation(s)
- Wen Zhou
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, CA 91125 USA
- Program in Biochemistry and Molecular Biophysics, California Institute of Technology
- Current address: BillionToOne, Menlo Park, CA
| | - Fan Gao
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, CA 91125 USA
- Caltech Bioinformatics Resource Center, Beckman Institute of Caltech
| | - Maile Romero-Wolf
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, CA 91125 USA
- Current address: Center for Stem Cell Biology and Regenerative Medicine, University of Southern California
| | - Suin Jo
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, CA 91125 USA
- Current address: Washington University of St. Louis
| | - Ellen V. Rothenberg
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, CA 91125 USA
| |
|
43
|
Lorzadeh A, Hammond C, Wang F, Knapp DJHF, Wong JC, Zhu JYA, Cao Q, Heravi-Moussavi A, Carles A, Wong M, Sharafian Z, Steif J, Moksa M, Bilenky M, Lavoie PM, Eaves CJ, Hirst M. Polycomb contraction differentially regulates terminal human hematopoietic differentiation programs. BMC Biol 2022; 20:104. [PMID: 35550087 PMCID: PMC9102747 DOI: 10.1186/s12915-022-01315-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2021] [Accepted: 04/28/2022] [Indexed: 12/05/2022] Open
Abstract
Background: Lifelong production of the many types of mature blood cells from less differentiated progenitors is a hierarchically ordered process that spans multiple cell divisions. The nature and timing of the molecular events required to integrate the environmental signals, transcription factor activity, epigenetic modifications, and changes in gene expression involved are thus complex and still poorly understood. To address this gap, we generated comprehensive reference epigenomes of 8 phenotypically defined subsets of normal human cord blood. Results: We describe a striking contraction of H3K27me3 density in differentiated myelo-erythroid cells that resembles a punctate pattern previously ascribed to pluripotent embryonic stem cells. Phenotypically distinct progenitor cell types display a nearly identical repressive H3K27me3 signature characterized by large organized chromatin K27-modification domains that are retained by mature lymphoid cells but lost in terminally differentiated monocytes and erythroblasts. We demonstrate that inhibition of polycomb group members predicted to control large organized chromatin K27-modification domains influences lymphoid and myeloid fate decisions of primary neonatal hematopoietic progenitors in vitro. We further show that a majority of active enhancers appear in early progenitors, a subset of which are DNA hypermethylated and become hypomethylated and induced during terminal differentiation. Conclusion: Primitive human hematopoietic cells display a unique repressive H3K27me3 signature that is retained by mature lymphoid cells but is lost in monocytes and erythroblasts. Intervention data indicate that control of this chromatin state change is a requisite part of the process whereby normal human hematopoietic progenitor cells make lymphoid and myeloid fate decisions. Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-022-01315-1.
Affiliation(s)
- A Lorzadeh
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - C Hammond
- Terry Fox Laboratory, BC Cancer, Vancouver, Canada; Department of Medicine, UBC, Vancouver, Canada
| | - F Wang
- Terry Fox Laboratory, BC Cancer, Vancouver, Canada; Department of Medical Genetics, UBC, Vancouver, Canada
| | - D J H F Knapp
- Terry Fox Laboratory, BC Cancer, Vancouver, Canada; Department of Medicine, UBC, Vancouver, Canada
| | - J Ch Wong
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - J Y A Zhu
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - Q Cao
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - A Heravi-Moussavi
- Canada's Michael Smith Genome Science Centre, BC Cancer, Vancouver, Canada
| | - A Carles
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - M Wong
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - Z Sharafian
- BC Children's Hospital Research Institute, Department of Pediatrics, UBC, Vancouver, Canada
| | - J Steif
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - M Moksa
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada
| | - M Bilenky
- Canada's Michael Smith Genome Science Centre, BC Cancer, Vancouver, Canada
| | - P M Lavoie
- BC Children's Hospital Research Institute, Department of Pediatrics, UBC, Vancouver, Canada
| | - C J Eaves
- Terry Fox Laboratory, BC Cancer, Vancouver, Canada; Department of Medicine, UBC, Vancouver, Canada; Department of Medical Genetics, UBC, Vancouver, Canada
| | - M Hirst
- Department of Microbiology and Immunology, Michael Smith Laboratories, UBC, Vancouver, Canada; Canada's Michael Smith Genome Science Centre, BC Cancer, Vancouver, Canada.
| |
|
44
|
Lee SE, Rudd BD, Smith NL. Fate-mapping mice: new tools and technology for immune discovery. Trends Immunol 2022; 43:195-209. [PMID: 35094945 PMCID: PMC8882138 DOI: 10.1016/j.it.2022.01.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/22/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/20/2022]
Abstract
The fate-mapping mouse has become an essential tool in the immunologist's toolbox. Although traditionally used by developmental biologists to trace the origins of cells, immunologists are turning to fate-mapping to better understand the development and function of immune cells. Thus, an expansion in the variety of fate-mapping mouse models has occurred to answer fundamental questions about the immune system. These models are also being combined with new genetic tools to study cancer, infection, and autoimmunity. In this review, we summarize different types of fate-mapping mice and describe emerging technologies that might allow immunologists to leverage this valuable tool and expand our functional knowledge of the immune system.
Affiliation(s)
- Scarlett E Lee
- Department of Microbiology and Immunology, Cornell University, Ithaca, NY 14850, USA
| | - Brian D Rudd
- Department of Microbiology and Immunology, Cornell University, Ithaca, NY 14850, USA
| | - Norah L Smith
- Department of Microbiology and Immunology, Cornell University, Ithaca, NY 14850, USA.
| |
|
45
|
Bongrand P. Is There a Need for a More Precise Description of Biomolecule Interactions to Understand Cell Function? Curr Issues Mol Biol 2022; 44:505-525. [PMID: 35723321 PMCID: PMC8929073 DOI: 10.3390/cimb44020035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 01/15/2022] [Accepted: 01/17/2022] [Indexed: 11/16/2022] Open
Abstract
An important goal of biological research is to explain and hopefully predict cell behavior from the molecular properties of cellular components. Accordingly, much work was done to build extensive “omic” datasets and develop theoretical methods, including computer simulation and network analysis to process as quantitatively as possible the parameters contained in these resources. Furthermore, substantial effort was made to standardize data presentation and make experimental results accessible to data scientists. However, the power and complexity of current experimental and theoretical tools make it more and more difficult to assess the capacity of gathered parameters to support optimal progress in our understanding of cell function. The purpose of this review is to focus on biomolecule interactions, the interactome, as a specific and important example, and examine the limitations of the explanatory and predictive power of parameters that are considered as suitable descriptors of molecular interactions. Recent experimental studies on important cell functions, such as adhesion and processing of environmental cues for decision-making, support the suggestion that it should be rewarding to complement standard binding properties such as affinity and kinetic constants, or even force dependence, with less frequently used parameters such as conformational flexibility or size of binding molecules.
Affiliation(s)
- Pierre Bongrand
- Lab Adhesion and Inflammation (LAI), Inserm UMR 1067, Cnrs UMR 7333, Aix-Marseille Université UM 61, Marseille 13009, France
| |
|
46
|
The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. Nat Genet 2021; 53:1564-1576. [PMID: 34650237 PMCID: PMC8763320 DOI: 10.1038/s41588-021-00947-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 11/09/2020] [Accepted: 09/01/2021] [Indexed: 01/24/2023]
Abstract
Transcription factors bind DNA sequence motif vocabularies in cis-regulatory elements (CREs) to modulate chromatin state and gene expression during cell state transitions. A quantitative understanding of how motif lexicons influence dynamic regulatory activity has been elusive due to the combinatorial nature of the cis-regulatory code. To address this, we undertook multiomic data profiling of chromatin and expression dynamics across epidermal differentiation to identify 40,103 dynamic CREs associated with 3,609 dynamically expressed genes, then applied an interpretable deep-learning framework to model the cis-regulatory logic of chromatin accessibility. This analysis framework identified cooperative DNA sequence rules in dynamic CREs regulating synchronous gene modules with diverse roles in skin differentiation. Massively parallel reporter assay analysis validated temporal dynamics and cooperative cis-regulatory logic. Variants linked to human polygenic skin disease were enriched in these time-dependent combinatorial motif rules. This integrative approach shows the combinatorial cis-regulatory lexicon of epidermal differentiation and represents a general framework for deciphering the organizational principles of the cis-regulatory code of dynamic gene regulation.
|
47
|
Morilla I. Repairing the human with artificial intelligence in oncology. Artif Intell Cancer 2021; 2:60-68. [DOI: 10.35713/aic.v2.i5.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Revised: 10/26/2021] [Accepted: 10/27/2021] [Indexed: 02/06/2023] Open
Abstract
Artificial intelligence is a groundbreaking tool for learning and analysing higher-level features extracted from any dataset at large scale. This ability makes it ideal for tackling complex problems that arise in the biomedical domain in general and in oncology in particular. In this work, we aim to provide a global view of the growth of this mathematical discipline by linking it to related subdomains such as transfer, reinforcement and federated learning. In addition, we introduce the recently popular method of topological data analysis, which improves the performance of learning models.
Affiliation(s)
- Ian Morilla
- Laboratoire Analyse, Géométrie et Applications - Institut Galilée, Sorbonne Paris Nord University, Paris 75006, France
| |
|
48
|
Novakovsky G, Saraswat M, Fornes O, Mostafavi S, Wasserman WW. Biologically relevant transfer learning improves transcription factor binding prediction. Genome Biol 2021; 22:280. [PMID: 34579793 PMCID: PMC8474956 DOI: 10.1186/s13059-021-02499-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/21/2020] [Accepted: 09/15/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task. RESULTS: We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF. CONCLUSIONS: Our results confirm that transfer learning is a powerful technique for TF binding prediction.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - Manu Saraswat
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada.
| | - Sara Mostafavi
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Canadian Institute for Advanced Research, CIFAR AI Chair, and Child and Brain Development, Toronto, ON, M5G 1M1, Canada
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada.
| |
|
49
|
Weidemüller P, Kholmatov M, Petsalaki E, Zaugg JB. Transcription factors: Bridge between cell signaling and gene regulation. Proteomics 2021; 21:e2000034. [PMID: 34314098 DOI: 10.1002/pmic.202000034] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 03/19/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/17/2023]
Abstract
Transcription factors (TFs) are key regulators of intrinsic cellular processes, such as differentiation and development, and of the cellular response to external perturbation through signaling pathways. In this review we focus on the role of TFs as a link between signaling pathways and gene regulation. Cell signaling tends to result in the modulation of a set of TFs that then lead to changes in the cell's transcriptional program. We highlight the molecular layers at which TF activity can be measured and the associated technical and conceptual challenges. These layers include post-translational modifications (PTMs) of the TF, regulation of TF binding to DNA through chromatin accessibility and epigenetics, and expression of target genes. We highlight that a large number of TFs are understudied in both signaling and gene regulation studies, and that our knowledge about known TF targets has a strong literature bias. We argue that TFs serve as a perfect bridge between the fields of gene regulation and signaling, and that separating these fields hinders our understanding of cell functions. Multi-omics approaches that measure multiple dimensions of TF activity are ideally suited to study the interplay of cell signaling and gene regulation using TFs as the anchor to link the two fields.
Affiliation(s)
- Paula Weidemüller
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Maksim Kholmatov
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
| | - Evangelia Petsalaki
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Judith B Zaugg
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
| |
|
50
|
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Mariia Kokina
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Victor Garcia
- School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
| | - Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Science for Life Laboratory, Stockholm, Sweden
| |
|