1
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P G Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
2
Abassah-Oppong S, Zoia M, Mannion BJ, Rouco R, Tissières V, Spurrell CH, Roland V, Darbellay F, Itum A, Gamart J, Festa-Daroux TA, Sullivan CS, Kosicki M, Rodríguez-Carballo E, Fukuda-Yuzawa Y, Hunter RD, Novak CS, Plajzer-Frick I, Tran S, Akiyama JA, Dickel DE, Lopez-Rios J, Barozzi I, Andrey G, Visel A, Pennacchio LA, Cobb J, Osterwalder M. A gene desert required for regulatory control of pleiotropic Shox2 expression and embryonic survival. Nat Commun 2024; 15:8793. [PMID: 39389973 PMCID: PMC11467299 DOI: 10.1038/s41467-024-53009-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 09/26/2024] [Indexed: 10/12/2024] Open
Abstract
Approximately a quarter of the human genome consists of gene deserts, large regions devoid of genes often located adjacent to developmental genes and thought to contribute to their regulation. However, defining the regulatory functions embedded within these deserts is challenging due to their large size. Here, we explore the cis-regulatory architecture of a gene desert flanking the Shox2 gene, which encodes a transcription factor indispensable for proximal limb, craniofacial, and cardiac pacemaker development. We identify the gene desert as a regulatory hub containing more than 15 distinct enhancers recapitulating anatomical subdomains of Shox2 expression. Ablation of the gene desert leads to embryonic lethality due to Shox2 depletion in the cardiac sinus venosus, caused in part by the loss of a specific distal enhancer. The gene desert is also required for stylopod morphogenesis, mediated via distributed proximal limb enhancers. In summary, our study establishes a multi-layered role of the Shox2 gene desert in orchestrating pleiotropic developmental expression through modular arrangement and coordinated dynamics of tissue-specific enhancers.
Affiliation(s)
- Samuel Abassah-Oppong
  - Department of Biological Sciences, University of Calgary, 2500 University Drive N.W., Calgary, AB, T2N 1N4, Canada
  - Department of Biological Sciences, Fort Hays State University, Hays, KS, 67601, USA
- Matteo Zoia
  - Department for BioMedical Research (DBMR), University of Bern, 3008, Bern, Switzerland
- Brandon J Mannion
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA, 94720, USA
- Raquel Rouco
  - Department of Genetic Medicine and Development and iGE3, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Virginie Tissières
  - Department for BioMedical Research (DBMR), University of Bern, 3008, Bern, Switzerland
  - Centro Andaluz de Biología del Desarrollo (CABD), CSIC-Universidad Pablo de Olavide-Junta de Andalucía, 41013, Seville, Spain
  - Department of Cardiology, Bern University Hospital, 3010, Bern, Switzerland
- Cailyn H Spurrell
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Virginia Roland
  - Department for BioMedical Research (DBMR), University of Bern, 3008, Bern, Switzerland
- Fabrice Darbellay
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Department of Genetic Medicine and Development and iGE3, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Anja Itum
  - Department of Biological Sciences, University of Calgary, 2500 University Drive N.W., Calgary, AB, T2N 1N4, Canada
- Julie Gamart
  - Department for BioMedical Research (DBMR), University of Bern, 3008, Bern, Switzerland
  - Department of Cardiology, Bern University Hospital, 3010, Bern, Switzerland
- Tabitha A Festa-Daroux
  - Department of Biological Sciences, University of Calgary, 2500 University Drive N.W., Calgary, AB, T2N 1N4, Canada
- Carly S Sullivan
  - Department of Biological Sciences, University of Calgary, 2500 University Drive N.W., Calgary, AB, T2N 1N4, Canada
- Michael Kosicki
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Eddie Rodríguez-Carballo
  - Department of Genetics and Evolution, University of Geneva, Geneva, Switzerland
  - Department of Molecular Biology, University of Geneva, Geneva, Switzerland
- Yoko Fukuda-Yuzawa
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Riana D Hunter
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Catherine S Novak
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Ingrid Plajzer-Frick
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Stella Tran
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jennifer A Akiyama
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Diane E Dickel
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Javier Lopez-Rios
  - Centro Andaluz de Biología del Desarrollo (CABD), CSIC-Universidad Pablo de Olavide-Junta de Andalucía, 41013, Seville, Spain
  - School of Health Sciences, Universidad Loyola Andalucía, Seville, Spain
- Iros Barozzi
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Center for Cancer Research, Medical University of Vienna, Vienna, Austria
- Guillaume Andrey
  - Department of Genetic Medicine and Development and iGE3, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Axel Visel
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - School of Natural Sciences, University of California, Merced, Merced, CA, 95343, USA
- Len A Pennacchio
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Comparative Biochemistry Program, University of California, Berkeley, CA, 94720, USA
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- John Cobb
  - Department of Biological Sciences, University of Calgary, 2500 University Drive N.W., Calgary, AB, T2N 1N4, Canada
- Marco Osterwalder
  - Department for BioMedical Research (DBMR), University of Bern, 3008, Bern, Switzerland
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Department of Cardiology, Bern University Hospital, 3010, Bern, Switzerland
3
Xie W, Yao Z, Yuan Y, Too J, Li F, Wang H, Zhan Y, Wu X, Wang Z, Zhang G. W2V-repeated index: Prediction of enhancers and their strength based on repeated fragments. Genomics 2024; 116:110906. [PMID: 39084477 DOI: 10.1016/j.ygeno.2024.110906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 07/10/2024] [Accepted: 07/24/2024] [Indexed: 08/02/2024]
Abstract
Enhancers are crucial in gene expression regulation, dictating the specificity and timing of transcriptional activity, which highlights the importance of their identification for unravelling the intricacies of genetic regulation. Therefore, it is critical to identify enhancers and their strengths. Repeated sequences in the genome are repeats of the same or symmetrical fragments. There is a great deal of evidence that repetitive sequences contain enormous amounts of genetic information. Thus, we introduce the W2V-Repeated Index, designed to identify enhancer sequence fragments and evaluate their strength through the analysis of repeated K-mer sequences in enhancer regions. Utilizing the word2vec algorithm for numerical conversion and Manta Ray Foraging Optimization for feature selection, this method effectively captures the frequency and distribution of K-mer sequences. By concentrating on repeated K-mer sequences, it minimizes computational complexity and facilitates the analysis of larger K values. Experiments indicate that our method performs better than all other advanced methods on almost all indicators.
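The idea of restricting attention to repeated K-mers can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `repeated_kmers` and the threshold of two occurrences are assumptions made for illustration, and the word2vec embedding and Manta Ray Foraging Optimization steps are omitted.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int) -> dict[str, int]:
    """Return the k-mers that occur at least twice in seq, with their counts.

    Hypothetical helper: "repeated" is assumed here to mean "seen two or
    more times in the same sequence"; the paper may define it differently.
    """
    seq = seq.upper()
    # Slide a window of width k over the sequence and count each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only the k-mers that repeat; singletons are discarded.
    return {kmer: n for kmer, n in counts.items() if n >= 2}


if __name__ == "__main__":
    print(repeated_kmers("ACGTACGTTT", k=3))
```

Discarding singleton K-mers shrinks the vocabulary that later steps have to handle, which is consistent with the abstract's claim that larger K values become tractable.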
Affiliation(s)
- Weiming Xie
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Zhaomin Yao
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Yizhe Yuan
  - China Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200240, China
- Jingwei Too
  - Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, Durian Tunggal, 76100 Melaka, Malaysia
- Fei Li
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Hongyu Wang
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Ying Zhan
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Xiaodan Wu
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
- Zhiguo Wang
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
- Guoxu Zhang
  - Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
4
Hu W, Li Y, Wu Y, Guan L, Li M. A deep learning model for DNA enhancer prediction based on nucleotide position aware feature encoding. iScience 2024; 27:110030. [PMID: 38868182 PMCID: PMC11167433 DOI: 10.1016/j.isci.2024.110030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 04/23/2024] [Accepted: 05/16/2024] [Indexed: 06/14/2024] Open
Abstract
Enhancers are genomic DNA elements that regulate the expression of neighboring genes and are crucial for biological processes such as cell differentiation and stress response. However, current machine learning methods for predicting DNA enhancers often underutilize hidden features in gene sequences, limiting model accuracy. Hence, this article proposes the PDCNN model, a deep learning-based enhancer prediction method. PDCNN extracts statistical nucleotide representations from gene sequences, discerning positional distribution information of nucleotides in modifier-like DNA sequences. With a convolutional neural network structure, PDCNN employs dual convolutional and fully connected layers. The cross-entropy loss is minimized iteratively with a gradient descent algorithm, enhancing prediction accuracy. Model parameters are fine-tuned to select optimal combinations for training, achieving over 95% accuracy. Comparative analysis with traditional methods and existing models demonstrates PDCNN's robust feature extraction capability. It outperforms advanced machine learning methods in identifying DNA enhancers, presenting an effective method with broad implications for genomics, biology, and medical research.
Affiliation(s)
- Wenxing Hu
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Yelin Li
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Yan Wu
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Lixin Guan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Mengshan Li
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
5
Barth D, Van R, Cardwell J, Han MV. Supervised learning of enhancer-promoter specificity based on genome-wide perturbation studies highlights areas for improvement in learning. Bioinformatics 2024; 40:btae367. [PMID: 38870532 PMCID: PMC11211214 DOI: 10.1093/bioinformatics/btae367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 05/29/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024] Open
Abstract
MOTIVATION: Understanding the rules that govern enhancer-driven transcription remains a central unsolved problem in genomics. Now that multiple massively parallel enhancer perturbation assays have been published, there are enough data to learn to predict enhancer-promoter (EP) relationships in a data-driven manner. RESULTS: We applied machine learning to one of the largest enhancer perturbation studies integrated with transcription factor (TF) and histone modification ChIP-seq. The results uncovered a discrepancy in the prediction of genome-wide data compared to data from targeted experiments. Relative strength of contact was important for prediction, confirming the basic principle of EP regulation. Novel features such as the density of the enhancers/promoters in the genomic region were found to be important, highlighting our lack of understanding of how other elements in the region contribute to the regulation. Several TF peaks were identified that improved the prediction by identifying the negatives and reducing false positives. In summary, integrating genomic assays with enhancer perturbation studies increased the accuracy of the model, and provided novel insights into the understanding of enhancer-driven transcription. AVAILABILITY AND IMPLEMENTATION: The trained models, data, and the source code are available at http://doi.org/10.5281/zenodo.11290386 and https://github.com/HanLabUNLV/sleps.
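The abstract does not say how the enhancer/promoter density feature is computed. As a rough sketch under stated assumptions, one plausible reading is a count of annotated elements within a fixed window around a locus. The function name `element_density`, the use of element midpoints, the window size, and the restriction to a single chromosome are all hypothetical choices for illustration and are not taken from the paper or its repository.

```python
from bisect import bisect_left, bisect_right


def element_density(midpoints: list[int], locus: int, window: int) -> int:
    """Count elements whose midpoint lies within +/- window bp of locus.

    Assumes all coordinates are on the same chromosome (hypothetical
    simplification); boundaries are inclusive on both sides.
    """
    pts = sorted(midpoints)
    # Binary search for the first midpoint >= locus - window and the
    # first midpoint > locus + window; the difference is the count.
    lo = bisect_left(pts, locus - window)
    hi = bisect_right(pts, locus + window)
    return hi - lo


if __name__ == "__main__":
    enhancer_midpoints = [100, 5_000, 12_000, 50_000, 51_000]
    print(element_density(enhancer_midpoints, locus=10_000, window=10_000))
```

A count like this could be fed to a classifier alongside contact strength and ChIP-seq signal; whether the authors normalize by window length or weight by distance is not stated in the abstract.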
Affiliation(s)
- Dylan Barth
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, United States
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, United States
- Richard Van
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, United States
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, United States
- Jonathan Cardwell
  - Department of Medicine, University of Colorado School of Medicine, Denver, CO 80045, United States
- Mira V Han
  - School of Life Sciences, University of Nevada, Las Vegas, NV 89154, United States
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, United States
6
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. ARXIV 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-Dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
7
Mehmood F, Arshad S, Shoaib M. ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction. Brief Bioinform 2024; 25:bbae030. [PMID: 38385876 PMCID: PMC10885011 DOI: 10.1093/bib/bbae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/30/2023] [Accepted: 01/11/2024] [Indexed: 02/23/2024] Open
Abstract
Enhancers play an important role in the process of gene expression regulation. The abundance or absence of enhancers in a DNA sequence, and irregularities in enhancer strength, affect gene expression and can lead to the initiation and propagation of diverse genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction through experimental approaches are expensive, time-consuming and error-prone. To accelerate research on enhancer identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks use machine and deep learning methods that take raw DNA sequences and predict an enhancer's presence and strength. However, these frameworks still lack in performance and are not useful in real-time analysis. This paper presents a novel deep learning framework that uses language modeling strategies for transforming DNA sequences into statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion to predict a group of nucleotides, also known as k-mers, based on the context of existing k-mers in a sequence. At the classification stage, it presents a novel classifier that reaps the benefits of two different architectures: convolutional neural network and attention mechanism. The proposed framework is evaluated over the enhancer identification benchmark dataset, where it outperforms the existing best-performing framework by 5% and 9% in terms of accuracy and MCC, respectively. Similarly, when evaluated over the enhancer strength prediction benchmark dataset, it outperforms the existing best-performing framework by 4% and 7% in terms of accuracy and MCC, respectively.
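Several abstracts in this list report accuracy and MCC (Matthews correlation coefficient). As a reminder of how the two metrics relate, the sketch below computes both from binary confusion-matrix counts using the standard definitions. The function name `accuracy_and_mcc` and the example counts are illustrative only and are not taken from any of the cited papers.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute accuracy and Matthews correlation coefficient (MCC).

    tp/tn/fp/fn are the true-positive, true-negative, false-positive and
    false-negative counts of a binary classifier.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By a common convention, MCC is set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Made-up counts for illustration.
    acc, mcc = accuracy_and_mcc(tp=90, tn=80, fp=20, fn=10)
    print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, so a gain in MCC is not directly comparable to a gain in accuracy of the same size.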
Affiliation(s)
- Faiza Mehmood
  - Department of Computer Science, University of Engineering and Technology Lahore (Faisalabad Campus), Pakistan
- Shazia Arshad
  - Department of Computer Science, University of Engineering and Technology Lahore, 54890, Pakistan
- Muhammad Shoaib
  - Department of Computer Science, University of Engineering and Technology Lahore, 54890, Pakistan
8
Wang J, Zhang H, Chen N, Zeng T, Ai X, Wu K. PorcineAI-Enhancer: Prediction of Pig Enhancer Sequences Using Convolutional Neural Networks. Animals (Basel) 2023; 13:2935. [PMID: 37760334 PMCID: PMC10526013 DOI: 10.3390/ani13182935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/21/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open
Abstract
Understanding the mechanisms of gene expression regulation is crucial in animal breeding. Cis-regulatory DNA sequences, such as enhancers, play a key role in regulating gene expression. Identifying enhancers is challenging, despite the use of experimental techniques and computational methods. Enhancer prediction in the pig genome is particularly significant due to the costliness of high-throughput experimental techniques. The study constructed a high-quality database of pig enhancers by integrating information from multiple sources. A deep learning prediction framework called PorcineAI-enhancer was developed for the prediction of pig enhancers. This framework employs convolutional neural networks for feature extraction and classification. PorcineAI-enhancer showed excellent performance in predicting pig enhancers, validated on an independent test dataset. The model demonstrated reliable prediction capability for unknown enhancer sequences and performed remarkably well on tissue-specific enhancer sequences. The study developed a deep learning prediction framework, PorcineAI-enhancer, for predicting pig enhancers. The model demonstrated significant predictive performance and potential for tissue-specific enhancers. This research provides valuable resources for future studies on gene expression regulation in pigs.
Affiliation(s)
- Ji Wang
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Han Zhang
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Nanzhu Chen
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tong Zeng
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Xiaohua Ai
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Keliang Wu
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
9
Wang W, Wu Q, Li C. iEnhancer-DCSA: identifying enhancers via dual-scale convolution and spatial attention. BMC Genomics 2023; 24:393. [PMID: 37442977 DOI: 10.1186/s12864-023-09468-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 06/20/2023] [Indexed: 07/15/2023] Open
Abstract
BACKGROUND: Due to the dynamic nature of enhancers, identifying enhancers and their strength are major bioinformatics challenges. With the development of deep learning, several models have facilitated enhancer detection in recent years. However, existing studies either neglect motif information of different lengths or treat the features at all spatial locations equally. How to effectively use multi-scale motif information while ignoring irrelevant information is a question worthy of serious consideration. In this paper, we propose an accurate and stable predictor, iEnhancer-DCSA, mainly composed of dual-scale fusion and spatial attention, which automatically extracts features of motifs of different lengths and selectively focuses on the important features. RESULTS: Our experimental results demonstrate that iEnhancer-DCSA is remarkably superior to existing state-of-the-art methods on the test dataset. In particular, the accuracy and MCC of enhancer identification are improved by 3.45% and 9.41%, respectively. Meanwhile, the accuracy and MCC of enhancer classification are improved by 7.65% and 18.1%, respectively. Furthermore, we conduct ablation studies to demonstrate the effectiveness of dual-scale fusion and spatial attention. CONCLUSIONS: iEnhancer-DCSA will be a valuable computational tool in identifying and classifying enhancers, especially for those not included in the training dataset.
Affiliation(s)
- Wenjun Wang
  - School of Software Engineering, South China University of Technology, Guangzhou, China
  - School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, China
  - Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, Guangzhou, China
- Qingyao Wu
  - School of Software Engineering, South China University of Technology, Guangzhou, China
  - Pazhou Lab, Guangzhou, China
  - Peng Cheng Laboratory, Shenzhen, China
- Chunshan Li
  - Department of Computer Science and Technology, Harbin Institute of Technology, Weihai, China
10
Trigila AP, Castagna VC, Berasain L, Montini D, Rubinstein M, Gomez-Casati ME, Franchini LF. Accelerated Evolution Analysis Uncovers PKNOX2 as a Key Transcription Factor in the Mammalian Cochlea. Mol Biol Evol 2023; 40:msad128. [PMID: 37247388 PMCID: PMC10337857 DOI: 10.1093/molbev/msad128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 04/12/2023] [Accepted: 04/17/2023] [Indexed: 05/31/2023] Open
Abstract
The genetic bases underlying the evolution of morphological and functional innovations of the mammalian inner ear are poorly understood. Gene regulatory regions are thought to play an important role in the evolution of form and function. To uncover crucial hearing genes whose regulatory machinery evolved specifically in mammalian lineages, we mapped accelerated noncoding elements (ANCEs) in inner ear transcription factor (TF) genes and found that PKNOX2 harbors the largest number of ANCEs within its transcriptional unit. Using reporter gene expression assays in transgenic zebrafish, we determined that four PKNOX2-ANCEs drive differential expression patterns when compared with ortholog sequences from close outgroup species. Because the functional role of PKNOX2 in cochlear hair cells has not been previously investigated, we decided to study Pknox2 null mice generated by CRISPR/Cas9 technology. We found that Pknox2-/- mice exhibit reduced distortion product otoacoustic emissions (DPOAEs) and auditory brainstem response (ABR) thresholds at high frequencies together with an increase in peak 1 amplitude, consistent with a higher number of inner hair cell (IHC)-auditory nerve synapses observed at the cochlear basal region. A comparative cochlear transcriptomic analysis of Pknox2-/- and Pknox2+/+ mice revealed that key auditory genes are under Pknox2 control. Hence, we report that PKNOX2 plays a critical role in cochlear sensitivity at higher frequencies and that its transcriptional regulation underwent lineage-specific evolution in mammals. Our results provide novel insights about the contribution of PKNOX2 to normal auditory function and to the evolution of high-frequency hearing in mammals.
Collapse
Affiliation(s)
- Anabella P Trigila
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Valeria C Castagna
- Facultad de Medicina, Instituto de Farmacología, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Lara Berasain
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Dante Montini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Marcelo Rubinstein
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
| | | | - Lucía F Franchini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
|
11
|
Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Affiliation(s)
- Le Thi Phan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Changmin Oh
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
|
12
|
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023]
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
| | - Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
|
13
|
Whalen S, Inoue F, Ryu H, Fair T, Markenscoff-Papadimitriou E, Keough K, Kircher M, Martin B, Alvarado B, Elor O, Laboy Cintron D, Williams A, Hassan Samee MA, Thomas S, Krencik R, Ullian EM, Kriegstein A, Rubenstein JL, Shendure J, Pollen AA, Ahituv N, Pollard KS. Machine learning dissection of human accelerated regions in primate neurodevelopment. Neuron 2023; 111:857-873.e8. [PMID: 36640767 PMCID: PMC10023452 DOI: 10.1016/j.neuron.2022.12.026] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 06/21/2022] [Revised: 09/29/2022] [Accepted: 12/18/2022] [Indexed: 01/15/2023]
Abstract
Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 human accelerated regions (HARs), finding 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in chimpanzee and human neural progenitor cells. The species-specific enhancer activity of HARs was accurately predicted from the presence and absence of transcription factor footprints in each species. Despite these striking cis effects, activity of a given HAR sequence was nearly identical in human and chimpanzee cells. This suggests that HARs did not evolve to compensate for changes in the trans environment but instead altered their ability to bind factors present in both species. Thus, ML prioritized variants with functional effects on human neurodevelopment and revealed an unexpected reason why HARs may have evolved so rapidly.
Affiliation(s)
- Sean Whalen
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Hane Ryu
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
| | | | - Kathleen Keough
- Gladstone Institutes, San Francisco, CA 94158, USA; Pharmaceutical Sciences and Pharmacogenomics Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Martin Kircher
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany; Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
| | - Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Beatriz Alvarado
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Orry Elor
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Dianne Laboy Cintron
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | | | | | - Sean Thomas
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Robert Krencik
- Department of Neurosurgery, Center for Neuroregeneration, Houston Methodist Research Institute, Houston, TX, USA
| | - Erik M Ullian
- Departments of Ophthalmology and Physiology, University of California, San Francisco, San Francisco, CA, USA; Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA
| | - Arnold Kriegstein
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
| | - John L Rubenstein
- Department of Psychiatry, University of California, San Francisco, San Francisco, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| | - Alex A Pollen
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| | - Katherine S Pollard
- Gladstone Institutes, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics and Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, CA, USA; Chan-Zuckerberg Biohub, San Francisco, CA, USA.
|
14
|
Wang C, Zou Q, Ju Y, Shi H. Enhancer-FRL: Improved and Robust Identification of Enhancers and Their Activities Using Feature Representation Learning. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:967-975. [PMID: 36063523 DOI: 10.1109/tcbb.2022.3204365] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/04/2023]
Abstract
Enhancers are crucial for precise regulation of gene expression, but enhancer identification and strength prediction are challenging because enhancers are freely distributed and the genome contains a tremendous number of similar fractions. Although several bioinformatics tools have been developed, shortfalls in these models remain, and their performance needs further improvement. In the present study, a two-layer predictor called Enhancer-FRL was proposed for identifying enhancers (enhancer versus nonenhancer) and their activities (strong versus weak). More specifically, to build an efficient model, a feature representation learning scheme was applied to generate a 50D probabilistic vector based on 10 feature encodings and five machine learning algorithms. Subsequently, the multiview probabilistic features were integrated to construct the final prediction model. Compared with single feature-based models, Enhancer-FRL showed significant performance improvement and model robustness. Performance assessment on the independent test dataset indicated that the proposed model outperformed state-of-the-art available toolkits. The Enhancer-FRL webserver is freely accessible at http://lab.malab.cn/~wangchao/softwares/Enhancer-FRL/. The code and datasets can be downloaded from the webserver page or from GitHub at https://github.com/wangchao-malab/Enhancer-FRL/.
|
15
|
Jia J, Lei R, Qin L, Wu G, Wei X. iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module. Front Genet 2023; 14:1132018. [PMID: 36936423 PMCID: PMC10014624 DOI: 10.3389/fgene.2023.1132018] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/26/2022] [Accepted: 02/13/2023] [Indexed: 03/06/2023]
Abstract
Enhancers play a crucial role in controlling gene transcription and expression. Therefore, bioinformatics places much emphasis on predicting enhancers and their strength. Because conventional biomedical experiments are too slow and too expensive, fast and accurate computational methods are needed. This paper proposes a new predictor called iEnhancer-DCSV, built on a modified densely connected convolutional network (DenseNet) and an improved convolutional block attention module (CBAM). Sequences were encoded using one-hot and nucleotide chemical property (NCP) encodings. DenseNet was used to extract advanced features from the raw encoding. The channel attention and spatial attention modules were used to evaluate the significance of the advanced features, which were then input into a fully connected neural network to yield the prediction probabilities. Finally, ensemble learning was applied to the final classification results via voting. According to the experimental results on the test set, the first layer of enhancer recognition achieved an accuracy of 78.95%, and the Matthews correlation coefficient value was 0.5809. The second layer of enhancer strength prediction achieved an accuracy of 80.70%, and the Matthews correlation coefficient value was 0.6609. The iEnhancer-DCSV method can be found at https://github.com/leirufeng/iEnhancer-DCSV. It is easy to obtain the desired results without using the complex mathematical formulas involved.
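To make the two sequence encodings named in this abstract concrete, here is a minimal sketch in Python. It is not taken from the iEnhancer-DCSV repository: the function name `encode_sequence`, the choice to concatenate the two encodings per position, and the all-zero row for ambiguous bases are assumptions made for illustration. The NCP triplets follow the commonly used convention (ring structure, functional group, hydrogen-bond strength).

```python
# Illustrative sketch only; names and layout are assumptions, not the authors' code.

# One-hot: one indicator per nucleotide, in A, C, G, T order.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# NCP: (purine ring = 1, amino group = 1, weak hydrogen bonding = 1).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_sequence(seq):
    """Return one 7-value row per base: 4 one-hot values followed by 3 NCP values."""
    rows = []
    for base in seq.upper():
        if base in ONE_HOT:
            rows.append(ONE_HOT[base] + NCP[base])
        else:
            # Assumption: ambiguous bases such as N are encoded as all zeros.
            rows.append((0,) * 7)
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGT", encode_sequence("ACGT")):
        print(base, row)
```

A sequence of length L becomes an L x 7 matrix under this layout; whether the published model concatenates the encodings or feeds them through separate branches is not stated in the abstract.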
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- *Correspondence: Jianhua Jia; Rufeng Lei
| | - Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- *Correspondence: Jianhua Jia; Rufeng Lei
| | - Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
| | - Genqiang Wu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
| | - Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, Nanchang, China
|
16
|
Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework. PLoS Comput Biol 2022; 18:e1010779. [PMID: 36520922 PMCID: PMC9836277 DOI: 10.1371/journal.pcbi.1010779] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Revised: 01/12/2023] [Accepted: 11/29/2022] [Indexed: 12/23/2022]
Abstract
Enhancers are short non-coding DNA sequences outside of the target promoter regions that can be bound by specific proteins to increase a gene's transcriptional activity, which has a crucial role in the spatiotemporal and quantitative regulation of gene expression. However, enhancers do not have specific sequence motifs or structures, and their scattered distribution in the genome makes the identification of enhancers from human cell lines particularly challenging. Here we present a novel, stacked multivariate fusion framework called SMFM, which enables a comprehensive identification and analysis of enhancers from regulatory DNA sequences as well as their interpretation. Specifically, to characterize the hierarchical relationships of enhancer sequences, multi-source biological information and dynamic semantic information are fused to represent regulatory DNA enhancer sequences. Then, we implement a deep learning-based sequence network to learn the feature representation of the enhancer sequences comprehensively and to extract the implicit relationships in the dynamic semantic information. Ultimately, an ensemble machine learning classifier is trained based on the refined multi-source features and dynamic implicit relations obtained from the deep learning-based sequence network. Benchmarking experiments demonstrated that SMFM significantly outperforms other existing methods using several evaluation metrics. In addition, an independent test set was used to validate the generalization performance of SMFM by comparing it to other state-of-the-art enhancer identification methods. Moreover, we performed motif analysis based on the contribution scores of different bases of enhancer sequences to the final identification results. We also conducted interpretability analysis of the identified enhancer sequences based on attention weights of EnhancerBERT, a fine-tuned BERT model that provides new insights into exploring the gene semantic information likely to underlie the discovered enhancers in an interpretable manner. Finally, in a human placenta study with 4,562 active distal gene regulatory enhancers, SMFM successfully exposed tissue-related placental development and the differential mechanism, demonstrating the generalizability and stability of our proposed framework.
|
17
|
iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022; 208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022]
|
18
|
Abstract
Human accelerated regions (HARs) are the fastest-evolving sequences in the human genome. When HARs were discovered in 2006, their function was mysterious due to scant annotation of the noncoding genome. Diverse technologies, from transgenic animals to machine learning, have consistently shown that HARs function as gene regulatory enhancers with significant enrichment in neurodevelopment. It is now possible to quantitatively measure the enhancer activity of thousands of HARs in parallel and model how each nucleotide contributes to gene expression. These strategies have revealed that many human HAR sequences function differently than their chimpanzee orthologs, though individual nucleotide changes in the same HAR may have opposite effects, consistent with compensatory substitutions. To fully evaluate the role of HARs in human evolution, it will be necessary to experimentally and computationally dissect them across more cell types and developmental stages.
Affiliation(s)
- Sean Whalen
- Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
| | - Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Chan Zuckerberg Biohub, San Francisco, California, USA
|
19
|
Zug R, Uller T. Evolution and dysfunction of human cognitive and social traits: A transcriptional regulation perspective. Evol Hum Sci 2022; 4:e43. [PMID: 37588924 PMCID: PMC10426018 DOI: 10.1017/ehs.2022.42] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 08/11/2022] [Accepted: 09/11/2022] [Indexed: 11/07/2022]
Abstract
Evolutionary changes in brain and craniofacial development have endowed humans with unique cognitive and social skills, but also predisposed us to debilitating disorders in which these traits are disrupted. What are the developmental genetic underpinnings that connect the adaptive evolution of our cognition and sociality with the persistence of mental disorders with severe negative fitness effects? We argue that loss of function of genes involved in transcriptional regulation represents a crucial link between the evolution and dysfunction of human cognitive and social traits. The argument is based on the haploinsufficiency of many transcriptional regulator genes, which makes them particularly sensitive to loss-of-function mutations. We discuss how human brain and craniofacial traits evolved through partial loss of function (i.e. reduced expression) of these genes, a perspective compatible with the idea of human self-domestication. Moreover, we explain why selection against loss-of-function variants supports the view that mutation-selection-drift, rather than balancing selection, underlies the persistence of psychiatric disorders. Finally, we discuss testable predictions.
Affiliation(s)
- Roman Zug
- Department of Biology, Lund University, Lund, Sweden
| | - Tobias Uller
- Department of Biology, Lund University, Lund, Sweden
|
20
|
Butt AH, Alkhalifah T, Alturise F, Khan YD. A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns. Sci Rep 2022; 12:15183. [PMID: 36071071 PMCID: PMC9452539 DOI: 10.1038/s41598-022-19099-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/28/2022] [Accepted: 08/24/2022] [Indexed: 11/26/2022]
Abstract
Enhancers regulate gene expression by playing a crucial role in the synthesis of RNAs and proteins, although they do not directly encode proteins or RNA molecules. To control gene expression, it is important to predict enhancers and their potency. Given their distance from the target gene, lack of common motifs, and tissue/cell specificity, enhancer regions are thought to be difficult to predict in DNA sequences. Recently, a number of bioinformatics tools were created to distinguish enhancers from other regulatory components and to pinpoint their advantages. However, the quality of these prediction methods, and with it their practical application value, still needs to be improved. Based on nucleotide composition and statistical moment-based features, the current study proposes a novel method for distinguishing enhancers from non-enhancers and evaluating their strength. The proposed method outperformed state-of-the-art techniques in terms of accuracy under fivefold and tenfold cross-validation, achieving accuracies of 86.5% and 72.3% in enhancer site prediction and enhancer strength prediction, respectively. These results point to the potential for more efficient and successful outcomes when statistical moment-based features are used. The current study's source code is available to the research community at https://github.com/csbioinfopk/enpred.
Affiliation(s)
- Ahmad Hassan Butt
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia.
| | - Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
| | - Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
|
21
|
Zeng L, Liu Y, Yu ZG, Liu Y. iEnhancer-DLRA: identification of enhancers and their strengths by a self-attention fusion strategy for local and global features. Brief Funct Genomics 2022; 21:399-407. [PMID: 35942693 DOI: 10.1093/bfgp/elac023] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/07/2022] [Revised: 06/30/2022] [Accepted: 07/12/2022] [Indexed: 11/14/2022]
Abstract
Identification and classification of enhancers are highly significant because they play crucial roles in controlling gene transcription. Recently, several deep learning-based methods for identifying enhancers and their strengths have been developed. However, existing methods are usually limited because they use only local or only global features. The combination of local and global features is critical to further improve the prediction performance. In this work, we propose a novel deep learning-based method, called iEnhancer-DLRA, to identify enhancers and their strengths. iEnhancer-DLRA extracts local and multi-scale global features of sequences by using a residual convolutional network and two bidirectional long short-term memory networks. Then, a self-attention fusion strategy is proposed to deeply integrate these local and global features. The experimental results on the independent test dataset indicate that iEnhancer-DLRA performs better than nine existing state-of-the-art methods in both identification and classification of enhancers in almost all metrics. iEnhancer-DLRA achieves 13.8% (for identifying enhancers) and 12.6% (for classifying strengths) improvement in accuracy compared with the best existing state-of-the-art method. This is the first time that the accuracy of an enhancer identifier exceeds 0.9 and the accuracy of the enhancer classifier exceeds 0.8 on the independent test set. Moreover, iEnhancer-DLRA achieves superior predictive performance on the rice dataset compared with the state-of-the-art method RicENN.
Affiliation(s)
- Li Zeng
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China
| | - Yang Liu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
|
22
|
Huang G, Luo W, Zhang G, Zheng P, Yao Y, Lyu J, Liu Y, Wei DQ. Enhancer-LSTMAtt: A Bi-LSTM and Attention-Based Deep Learning Method for Enhancer Recognition. Biomolecules 2022; 12:995. [PMID: 35883552 PMCID: PMC9313278 DOI: 10.3390/biom12070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/03/2022] [Accepted: 07/07/2022] [Indexed: 01/27/2023]
Abstract
Enhancers are short DNA segments that play a key role in biological processes, such as accelerating transcription of target genes. Because an enhancer can reside anywhere in a genome sequence, it is difficult to identify enhancers precisely. We present a bidirectional long short-term memory (Bi-LSTM) and attention-based deep learning method (Enhancer-LSTMAtt) for enhancer recognition. Enhancer-LSTMAtt is an end-to-end deep learning model that consists mainly of a deep residual neural network, a Bi-LSTM, and feed-forward attention. We extensively compared Enhancer-LSTMAtt with 19 state-of-the-art methods by 5-fold cross-validation, 10-fold cross-validation, and an independent test. Enhancer-LSTMAtt achieved competitive performance, especially in the independent test. We implemented Enhancer-LSTMAtt as a user-friendly web application. Enhancer-LSTMAtt is applicable not only to recognizing enhancers but also to distinguishing strong enhancers from weak enhancers. We believe Enhancer-LSTMAtt is a promising tool for identifying enhancers.
Affiliation(s)
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Corresponding author
| | - Wei Luo
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
| | - Guiyang Zhang
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
| | - Peijie Zheng
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
| | - Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
| | - Jianyi Lyu
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
| | - Yuewu Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410083, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
|
23
|
Gao Y, Chen Y, Feng H, Zhang Y, Yue Z. RicENN: Prediction of Rice Enhancers with Neural Network Based on DNA Sequences. Interdiscip Sci 2022; 14:555-565. [PMID: 35190950 DOI: 10.1007/s12539-022-00503-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2021] [Revised: 01/07/2022] [Accepted: 01/18/2022] [Indexed: 01/22/2023]
Abstract
Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Because enhancers vary widely in location and are scattered freely across noncoding regions of the genome, identifying them is a crucial but challenging task in understanding the biological mechanisms of model plants. Recently, neural network models have gained popularity for predicting the function of genomic elements. Although several computational models have shown great advantages in tackling this challenge, a further study of the identification of rice enhancers from DNA sequences is still lacking. We present RicENN, a novel deep learning framework capable of accurately identifying enhancers of rice, integrating convolutional neural networks (CNNs), bidirectional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract the sequence features from original DNA sequences using six types of autocorrelation encodings. Moreover, we verified through an ablation study that the integrated model achieves the best performance. Finally, our deep learning framework realized a reliable prediction of the rice enhancers. The results show that RicENN outperforms available alternative approaches in rice, achieving an area under the receiver operating characteristic curve (AUROC) and an area under the precision-recall curve (AUPRC) of 0.960 and 0.960 on cross-validation, and 0.879 and 0.877 on independent tests, respectively. This study develops a hybrid model that combines the merits of different neural network architectures, shows the potential of deep learning for biological sequence analysis, and contributes to the acceleration of functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/.
Affiliation(s)
- Yujia Gao
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Yiqiong Chen
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Haisong Feng
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Youhua Zhang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| | - Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
| |
|
24
|
Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022] Open
Abstract
The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. The levels of regulation and the mechanisms involved are multiple. Transcription is the main control mechanism for most genes, being the downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies are focused on analyzing the contribution of enhancers in the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next generation sequencing techniques. All this information has generated a large volume of information that has been transferred to a growing number of public repositories that store this information. In this article, we analyze the content of those public repositories that contain information about human enhancers with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis will be based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, most types of enhancers are not represented in the databases and there is need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain
| | | |
|
25
|
Amilpur S, Bhukya R. A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction. J Bioinform Comput Biol 2022; 20:2250005. [PMID: 35264081 DOI: 10.1142/s0219720022500056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Enhancers are short regulatory DNA fragments that are bound with proteins called activators. They are free-bound and distant elements, which play a vital role in controlling gene expression. It is challenging to identify enhancers and their strength due to their dynamic nature. Although some machine learning methods exist to accelerate the identification process, their prediction accuracy and efficiency need further improvement. In this regard, we propose a two-layer prediction model with an enhanced feature extraction strategy that combines features from an improved position-specific amino acid propensity (PSTKNC) method with Enhanced Nucleic Acid Composition (ENAC) and Composition of k-spaced Nucleic Acid Pairs (CKSNAP). The feature sets from all three feature extraction approaches were concatenated and then sent through a simple artificial neural network (ANN) to accurately identify enhancers in the first layer and their strength in the second layer. Experiments are conducted on benchmark chromatin nine cell lines dataset. A 10-fold cross-validation method is employed to evaluate the model's performance. The results show that the proposed model gives an outstanding performance, with an accuracy of 94.50% and a Matthews correlation coefficient (MCC) of 0.8903 in predicting enhancers, and also does fairly well on the independent test when compared with all other existing methods.
Affiliation(s)
- Santhosh Amilpur
- Computer Science and Engineering, National Institute of Technology Warangal, Warangal Telangana 506004, India
| | - Raju Bhukya
- Computer Science and Engineering, National Institute of Technology Warangal, Warangal Telangana 506004, India
| |
|
26
|
iEnhancer-Deep: A Computational Predictor for Enhancer Sites and Their Strength Using Deep Learning. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12042120] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 01/27/2023]
Abstract
Enhancers are short motifs that contain high position variability and free scattering. Identifying these non-coding DNA fragments and their strength is vital because they play an important role in the control of gene regulation. Enhancer identification is more complicated than that of other genetic factors due to free scattering and their very high amount of locational variation. To address this difficulty, several computational tools in bioinformatics have been created over the last few years, but current learning models are still lacking. To overcome these limitations, we introduce iEnhancer-Deep, a deep learning-based framework that uses One-Hot Encoding and a convolutional neural network for model construction, primarily for the identification of enhancers and secondarily for the classification of their strength. Parallels between iEnhancer-Deep and existing state-of-the-art methodologies were drawn to evaluate the performance of the proposed model. Furthermore, a cross-species test was carried out to assess the generalizability of the proposed model. In general, the results show that the proposed model produced comparable results with the state-of-the-art models.
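The one-hot encoding step mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the iEnhancer-Deep code: the function name `one_hot_encode`, the A/C/G/T column order, and the choice to map unknown bases to an all-zero row are assumptions made here for illustration.

```python
from typing import List

# Column order of the encoding (an assumption; any fixed order works).
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return an L x 4 matrix with one row per base of `seq`.

    Each row has a single 1 in the column of its nucleotide.
    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in one_hot_encode("ACGTN"):
        print(row)
```

A matrix of this shape is the kind of input a one-dimensional convolutional network typically consumes, with the four columns acting as channels.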
|
27
|
Jain M, Garg R. Enhancers as potential targets for engineering salinity stress tolerance in crop plants. PHYSIOLOGIA PLANTARUM 2021; 173:1382-1391. [PMID: 33837536 DOI: 10.1111/ppl.13421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/08/2021] [Revised: 03/19/2021] [Accepted: 04/06/2021] [Indexed: 06/12/2023]
Abstract
Enhancers represent noncoding regulatory regions of the genome located distantly from their target genes. They regulate gene expression programs in a context-specific manner via interacting with promoters of one or more target genes and are generally associated with transcription factor binding sites and epi(genomic)/chromatin features, such as regions of chromatin accessibility and histone modifications. The enhancers are difficult to identify due to the modularity of their associated features. Although enhancers have been studied extensively in human and animals, only a handful of them has been identified in few plant species till date due to nonavailability of plant-specific experimental and computational approaches for their discovery. Being an important regulatory component of the genome, enhancers represent potential targets for engineering agronomic traits, including salinity stress tolerance in plants. Here, we provide a review of the available experimental and computational approaches along with the associated sequence and chromatin/epigenetic features for the discovery of enhancers in plants. In addition, we provide insights into the challenges and future prospects of enhancer research in plant biology with emphasis on potential applications in engineering salinity stress tolerance in crop plants.
Affiliation(s)
- Mukesh Jain
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Rohini Garg
- Department of Life Sciences, School of Natural Sciences, Shiv Nadar University, Gautam Buddha Nagar, Uttar Pradesh, India
| |
|
28
|
Lyu Y, Zhang Z, Li J, He W, Ding Y, Guo F. iEnhancer-KL: A Novel Two-Layer Predictor for Identifying Enhancers by Position Specific of Nucleotide Composition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2809-2815. [PMID: 33481715 DOI: 10.1109/tcbb.2021.3053608] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
An enhancer is a short region of DNA with the ability to recruit transcription factors and their complexes, increasing the likelihood of the transcription of a particular gene. Considering the importance of enhancers, enhancer identification is a prevailing problem in computational biology. In this paper, we propose a novel two-layer enhancer predictor called iEnhancer-KL, using computational biology algorithms to identify enhancers and then classify these enhancers into strong or weak types. Kullback-Leibler (KL) divergence is creatively taken into consideration to improve the feature extraction method PSTNPss. Then, LASSO is used to reduce the dimension of features and finally helps to get better prediction performance. Furthermore, the selected features are tested on several machine learning models, and the SVM algorithm achieves the best performance. The rigorous cross-validation indicates that our predictor is remarkably superior to the existing state-of-the-art methods with an Acc of 84.23 percent and the MCC of 0.6849 for identifying enhancers. Our code and results can be freely downloaded from https://github.com/Not-so-middle/iEnhancer-KL.git.
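Several abstracts in this list report accuracy (Acc) and the Matthews correlation coefficient (MCC). As a reading aid, the sketch below computes both from the four cells of a binary confusion matrix using the standard definitions. The function names and the convention of returning 0.0 when the MCC denominator is zero are choices made here, not something stated in the cited papers.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here) so the function never divides by zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy counts for illustration only; not data from any cited study.
    print(accuracy(40, 40, 10, 10), mcc(40, 40, 10, 10))
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy for enhancer versus non-enhancer classification.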
|
29
|
Liang Y, Zhang S, Qiao H, Cheng Y. iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:8797-8814. [PMID: 34814323 DOI: 10.3934/mbe.2021434] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
An enhancer is a non-coding DNA fragment that can be bound by proteins to activate transcription of a gene, and hence plays an important role in regulating gene expression. Enhancer identification is very challenging and more complicated than that of other genetic factors due to their position variation and free scattering. In addition, it has been shown that genetic variation in enhancers is related to human diseases. Therefore, identification of enhancers and their strength has important biological meaning. In this paper, a novel model named iEnhancer-MFGBDT is developed to identify enhancers and their strength by fusing multiple features and gradient boosting decision tree (GBDT). Multiple features include k-mer and reverse complement k-mer nucleotide composition based on DNA sequence, and second-order moving average, normalized Moreau-Broto auto-cross correlation and Moran auto-cross correlation based on dinucleotide physical structural property matrix. Then we use GBDT to select features and perform classification successively. The accuracies reach 78.67% and 66.04% for identifying enhancers and their strength on the benchmark dataset, respectively. Compared with other models, the results show that our model is a useful and effective tool to identify enhancers and their strength, of which the datasets and source codes are available at https://github.com/shengli0201/iEnhancer-MFGBDT1.
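Two of the sequence features named in this abstract, k-mer composition and reverse complement k-mer composition, can be sketched as follows. This is an illustrative reading of those feature names, not the iEnhancer-MFGBDT implementation: the function names, the normalization by the number of valid windows, and the choice of the lexicographically smaller string as the representative of each k-mer and its reverse complement are all assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer: str) -> str:
    """Reverse the k-mer and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every possible k-mer (4**k entries).

    Windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    windows = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(windows)
    total = len(windows)
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def rc_kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Merge each k-mer with its reverse complement into one feature."""
    merged: Dict[str, float] = {}
    for kmer, freq in kmer_frequencies(seq, k).items():
        canonical = min(kmer, reverse_complement(kmer))
        merged[canonical] = merged.get(canonical, 0.0) + freq
    return merged


if __name__ == "__main__":
    print(len(kmer_frequencies("ACGTACGT", 2)), len(rc_kmer_frequencies("ACGTACGT", 2)))
```

Merging reverse complements shrinks the feature vector (for k = 2, from 16 entries to 10) and makes the features independent of which DNA strand was sequenced.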
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
| | - Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Huijuan Qiao
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Yinan Cheng
- Department of Statistics, University of California at Davis, Davis, CA 95616, USA
| |
|
30
|
Libé-Philippot B, Vanderhaeghen P. Cellular and Molecular Mechanisms Linking Human Cortical Development and Evolution. Annu Rev Genet 2021; 55:555-581. [PMID: 34535062 DOI: 10.1146/annurev-genet-071719-020705] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 11/09/2022]
Abstract
The cerebral cortex is at the core of brain functions that are thought to be particularly developed in the human species. Human cortex specificities stem from divergent features of corticogenesis, leading to increased cortical size and complexity. Underlying cellular mechanisms include prolonged patterns of neuronal generation and maturation, as well as the amplification of specific types of stem/progenitor cells. While the gene regulatory networks of corticogenesis appear to be largely conserved among all mammals including humans, they have evolved in primates, particularly in the human species, through the emergence of rapidly divergent transcriptional regulatory elements, as well as recently duplicated novel genes. These human-specific molecular features together control key cellular milestones of human corticogenesis and are often affected in neurodevelopmental disorders, thus linking human neural development, evolution, and diseases.
Affiliation(s)
- Baptiste Libé-Philippot
- VIB-KU Leuven Center for Brain & Disease Research, KU Leuven Department of Neurosciences, Leuven Brain Institute, 3000 Leuven, Belgium; Institut de Recherches Interdisciplinaires en Biologie Humaine et Moléculaire (IRIBHM) and ULB Neuroscience Institute (UNI), Université Libre de Bruxelles (ULB), 1070 Brussels, Belgium
| | - Pierre Vanderhaeghen
- VIB-KU Leuven Center for Brain & Disease Research, KU Leuven Department of Neurosciences, Leuven Brain Institute, 3000 Leuven, Belgium; Institut de Recherches Interdisciplinaires en Biologie Humaine et Moléculaire (IRIBHM) and ULB Neuroscience Institute (UNI), Université Libre de Bruxelles (ULB), 1070 Brussels, Belgium
| |
|
31
|
iEnhancer-RD: Identification of enhancers and their strength using RKPK features and deep neural networks. Anal Biochem 2021; 630:114318. [PMID: 34364858 DOI: 10.1016/j.ab.2021.114318] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/20/2021] [Revised: 07/02/2021] [Accepted: 07/27/2021] [Indexed: 11/20/2022]
Abstract
Enhancers are regulatory elements involved in gene expression. They are parts of DNA that can enhance the transcription rate of a gene. However, the identification of enhancers by biological experimental methods is time-consuming and expensive. Therefore, there is an urgent need for more efficient methods to identify them. In this study, we propose a new feature extraction method, RKPK, which combines three feature methods and uses the recursive feature elimination algorithm for feature selection, and apply a deep neural network as the classifier to construct the iEnhancer-RD method for enhancer identification. It is a two-layer classification architecture in which the first layer (layer I) identifies enhancers from a set of DNA sequences, and the second layer (layer II) divides the identified enhancers into two subgroups, namely strong and weak enhancers. An independent dataset test indicates that the proposed method is significantly better than most existing methods, attaining accuracies of 78.8% and 70.5% in the two layers, respectively. Our iEnhancer-RD architecture is implemented in Python and is available at https://github.com/YangHuan639/iEnhancer-RD.
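The two-layer architecture described here (layer I separates enhancers from non-enhancers, layer II splits the enhancers into strong and weak) reduces to a simple cascade. The sketch below shows only that control flow. The classifiers are passed in as plain functions, and the GC-content rules used in the demo are toy placeholders with no biological validity; they are not the RKPK features or the deep neural network from the paper.

```python
from typing import Callable, List

# A classifier here is any function from a DNA sequence to True/False.
Classifier = Callable[[str], bool]


def two_layer_predict(
    seqs: List[str], is_enhancer: Classifier, is_strong: Classifier
) -> List[str]:
    """Apply layer I to every sequence and layer II only to predicted enhancers."""
    labels = []
    for seq in seqs:
        if not is_enhancer(seq):
            labels.append("non-enhancer")
        elif is_strong(seq):
            labels.append("strong enhancer")
        else:
            labels.append("weak enhancer")
    return labels


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases; used only by the toy classifiers below."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    # Hypothetical stand-ins for trained models (illustration only).
    toy_layer1 = lambda s: gc_fraction(s) >= 0.5
    toy_layer2 = lambda s: gc_fraction(s) >= 0.75
    print(two_layer_predict(["ATAT", "GCAT", "GCGC"], toy_layer1, toy_layer2))
```

Because layer II only sees sequences that layer I accepted, errors in the first layer propagate to the second, which is one reason the two layers are usually evaluated and reported separately.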
|
32
|
Basith S, Hasan MM, Lee G, Wei L, Manavalan B. Integrative machine learning framework for the identification of cell-specific enhancers from the human genome. Brief Bioinform 2021; 22:6315815. [PMID: 34226917 DOI: 10.1093/bib/bbab252] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 03/16/2021] [Revised: 06/08/2021] [Accepted: 06/14/2021] [Indexed: 02/06/2023] Open
Abstract
Enhancers are deoxyribonucleic acid (DNA) fragments which when bound by transcription factors enhance the transcription of related genes. Due to its sporadic distribution and similar fractions, identification of enhancers from the human genome seems a daunting task. Compared to the traditional experimental approaches, computational methods with easy-to-use platforms could be efficiently applied to annotate enhancers' functions and physiological roles. In this aspect, several bioinformatics tools have been developed to identify enhancers. Despite their spectacular performances, existing methods have certain drawbacks and limitations, including fixed length of sequences being utilized for model development and cell-specificity negligence. A novel predictor would be beneficial in the context of genome-wide enhancer prediction by addressing the above-mentioned issues. In this study, we constructed new datasets for eight different cell types. Utilizing these data, we proposed an integrative machine learning (ML)-based framework called Enhancer-IF for identifying cell-specific enhancers. Enhancer-IF comprehensively explores a wide range of heterogeneous features with five commonly used ML methods (random forest, extremely randomized tree, multilayer perceptron, support vector machine and extreme gradient boosting). Specifically, these five classifiers were trained with seven encodings and obtained 35 baseline models. The output of these baseline models was integrated and again inputted to five classifiers for the construction of five meta-models. Finally, the integration of five meta-models through ensemble learning improved the model robustness. Our proposed approach showed an excellent prediction performance compared to the baseline models on both training and independent datasets in different cell types, thus highlighting the superiority of our approach in the identification of the enhancers. 
We assume that Enhancer-IF will be a valuable tool for screening and identifying potential enhancers from the human DNA sequences.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | - Md Mehedi Hasan
- Tulane University, USA; Kyushu Institute of Technology, Japan
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | - Leyi Wei
- Xiamen University, China; Shandong University, China
| | | |
|
33
|
Cai L, Ren X, Fu X, Peng L, Gao M, Zeng X. iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor. Bioinformatics 2021; 37:1060-1067. [PMID: 33119044 DOI: 10.1093/bioinformatics/btaa914] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 08/20/2020] [Revised: 09/30/2020] [Accepted: 10/15/2020] [Indexed: 01/10/2023] Open
Abstract
MOTIVATION Enhancers are non-coding DNA fragments with high position variability and free scattering. They play an important role in controlling gene expression. As machine learning has become more widely used in identifying enhancers, a number of bioinformatic tools have been developed. Although several models for identifying enhancers and their strengths have been proposed, their accuracy and efficiency have yet to be improved. RESULTS We propose a two-layer predictor called 'iEnhancer-XG.' It comprises a one-layer predictor (for identifying enhancers) and a second classifier (for their strength) and uses 'XGBoost' as a base classifier and five feature extraction methods, namely, k-Spectrum Profile, Mismatch k-tuple, Subsequence Profile, Position-specific scoring matrix (PSSM) and Pseudo dinucleotide composition (PseDNC). Each method has an independent output. We place the feature vector matrix into the ensemble learning for fusion. This experiment involves the method of 'SHapley Additive explanations' to provide interpretability for the previous black box machine learning methods and improve their credibility. The accuracies of the ensemble learning method are 0.811 (first layer) and 0.657 (second layer). The rigorous 10-fold cross-validation confirms that the proposed method is significantly better than existing technologies. AVAILABILITY AND IMPLEMENTATION The source code and dataset for the enhancer predictions have been uploaded to https://github.com/jimmyrate/ienhancer-xg. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lijun Cai
- College of Computer Science and Electronic Engineering, Hunan University, 410082 Changsha, Hunan, China
| | - Xuanbai Ren
- College of Computer Science and Electronic Engineering, Hunan University, 410082 Changsha, Hunan, China
| | - Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410082 Changsha, Hunan, China
| | - Li Peng
- College of Computer Science and Engineering, Hunan University of Science and Technology, 411103 XiangTan, China
| | - Mingyu Gao
- College of Computer Science and Electronic Engineering, Hunan University, 410082 Changsha, Hunan, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, 410082 Changsha, Hunan, China
| |
|
34
|
Parisi C, Vashisht S, Winata CL. Fish-Ing for Enhancers in the Heart. Int J Mol Sci 2021; 22:3914. [PMID: 33920121 PMCID: PMC8069060 DOI: 10.3390/ijms22083914] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/15/2021] [Revised: 04/07/2021] [Accepted: 04/08/2021] [Indexed: 12/19/2022] Open
Abstract
Precise control of gene expression is crucial to ensure proper development and biological functioning of an organism. Enhancers are non-coding DNA elements which play an essential role in regulating gene expression. They contain specific sequence motifs serving as binding sites for transcription factors which interact with the basal transcription machinery at their target genes. Heart development is regulated by intricate gene regulatory network ensuring precise spatiotemporal gene expression program. Mutations affecting enhancers have been shown to result in devastating forms of congenital heart defect. Therefore, identifying enhancers implicated in heart biology and understanding their mechanism is key to improve diagnosis and therapeutic options. Despite their crucial role, enhancers are poorly studied, mainly due to a lack of reliable way to identify them and determine their function. Nevertheless, recent technological advances have allowed rapid progress in enhancer discovery. Model organisms such as the zebrafish have contributed significant insights into the genetics of heart development through enabling functional analyses of genes and their regulatory elements in vivo. Here, we summarize the current state of knowledge on heart enhancers gained through studies in model organisms, discuss various approaches to discover and study their function, and finally suggest methods that could further advance research in this field.
Affiliation(s)
- Costantino Parisi
- International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland; (C.P.); (S.V.)
| | - Shikha Vashisht
- International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland; (C.P.); (S.V.)
| | - Cecilia Lanny Winata
- International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland; (C.P.); (S.V.)
- Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
| |
|
35
|
Benito-Kwiecinski S, Giandomenico SL, Sutcliffe M, Riis ES, Freire-Pritchett P, Kelava I, Wunderlich S, Martin U, Wray GA, McDole K, Lancaster MA. An early cell shape transition drives evolutionary expansion of the human forebrain. Cell 2021; 184:2084-2102.e19. [PMID: 33765444 PMCID: PMC8054913 DOI: 10.1016/j.cell.2021.02.050] [Citation(s) in RCA: 118] [Impact Index Per Article: 39.3] [Received: 07/06/2020] [Revised: 12/10/2020] [Accepted: 02/22/2021] [Indexed: 12/12/2022]
Abstract
The human brain has undergone rapid expansion since humans diverged from other great apes, but the mechanism of this human-specific enlargement is still unknown. Here, we use cerebral organoids derived from human, gorilla, and chimpanzee cells to study developmental mechanisms driving evolutionary brain expansion. We find that neuroepithelial differentiation is a protracted process in apes, involving a previously unrecognized transition state characterized by a change in cell shape. Furthermore, we show that human organoids are larger due to a delay in this transition, associated with differences in interkinetic nuclear migration and cell cycle length. Comparative RNA sequencing (RNA-seq) reveals differences in expression dynamics of cell morphogenesis factors, including ZEB2, a known epithelial-mesenchymal transition regulator. We show that ZEB2 promotes neuroepithelial transition, and its manipulation and downstream signaling leads to acquisition of nonhuman ape architecture in the human context and vice versa, establishing an important role for neuroepithelial cell shape in human brain expansion. Human brain organoids are expanded relative to nonhuman apes prior to neurogenesis. Ape neural progenitors go through a newly identified transition morphotype state. Delayed morphological transition with shorter cell cycles underlies human expansion. ZEB2 is an evolutionary regulator of this transition.
Affiliation(s)
- Silvia Benito-Kwiecinski
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Stefano L Giandomenico
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Magdalena Sutcliffe
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Erlend S Riis
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, UK
| | - Paula Freire-Pritchett
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Iva Kelava
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Stephanie Wunderlich
- Leibniz Research Laboratories for Biotechnology and Artificial Organs (LEBAO), REBIRTH-Research Center for Translational and Regenerative Medicine, Hannover Medical School, 30625 Hannover, Germany; Biomedical Research in Endstage and Obstructive Lung Disease (BREATH), Member of the German Center for Lung Research (DZL), Hannover Medical School, 30625 Hannover, Germany
| | - Ulrich Martin
- Leibniz Research Laboratories for Biotechnology and Artificial Organs (LEBAO), REBIRTH-Research Center for Translational and Regenerative Medicine, Hannover Medical School, 30625 Hannover, Germany; Biomedical Research in Endstage and Obstructive Lung Disease (BREATH), Member of the German Center for Lung Research (DZL), Hannover Medical School, 30625 Hannover, Germany
| | - Gregory A Wray
- Department of Biology, Duke University, Biological Sciences Building, 124 Science Drive, Durham, NC 27708, USA
| | - Kate McDole
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Madeline A Lancaster
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, UK.
| |
|
36
|
Mu X, Wang Y, Duan M, Liu S, Li F, Wang X, Zhang K, Huang L, Zhou F. A Novel Position-Specific Encoding Algorithm (SeqPose) of Nucleotide Sequences and Its Application for Detecting Enhancers. Int J Mol Sci 2021; 22:3079. [PMID: 33802922 PMCID: PMC8002641 DOI: 10.3390/ijms22063079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/29/2020] [Revised: 03/04/2021] [Accepted: 03/11/2021] [Indexed: 11/16/2022] Open
Abstract
Enhancers are short genomic regions exerting tissue-specific regulatory roles, usually for remote coding regions. Enhancers are observed in both prokaryotic and eukaryotic genomes, and their detections facilitate a better understanding of the transcriptional regulation mechanism. The accurate detection and transcriptional regulation strength evaluation of the enhancers remain a major bioinformatics challenge. Most of the current studies utilized the statistical features of short fixed-length nucleotide sequences. This study introduces the location information of each k-mer (SeqPose) into the encoding strategy of a DNA sequence and employs the attention mechanism in the two-layer bi-directional long-short term memory (BD-LSTM) model (spEnhancer) for the enhancer detection problem. The first layer of the delivered classifier discriminates between enhancers and non-enhancers, and the second layer evaluates the transcriptional regulation strength of the detected enhancer. The SeqPose-encoded features are selected by the Chi-squared test, and 45 positions are removed from further analysis. The existing studies may focus on selecting the statistical DNA sequence descriptors with large contributions to the prediction models. This study does not utilize these statistical DNA sequence descriptors. Then the word vector of the SeqPose-encoded features is obtained by using the word embedding layer. This study hypothesizes that different word vector features may contribute differently to the enhancer detection model, and assigns different weights to these word vectors through the attention mechanism in the BD-LSTM model. The previous study generously provided the training and independent test datasets, and the proposed spEnhancer is compared with the three existing state-of-the-art studies using the same experimental procedure. 
The leave-one-out validation data on the training dataset shows that the proposed spEnhancer achieves similar detection performances as the three existing studies. While spEnhancer achieves the best overall performance metric MCC for both of the two binary classification problems on the independent test dataset. The experimental data shows that the strategy of removing redundant positions (SeqPose) may help improve the DNA sequence-based prediction models. spEnhancer may serve well as a complementary model to the existing studies, especially for the novel query enhancers that are not included in the training dataset.
Affiliation(s)
- Xuechen Mu
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
- School of Mathematics, Jilin University, Changchun 130012, China
| | - Yueying Wang
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun 130021, China
| | - Meiyu Duan
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
| | - Shuai Liu
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
| | - Fei Li
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
| | - Xiuli Wang
- School of Mathematics, Jilin University, Changchun 130012, China
| | - Kai Zhang
- School of Mathematics, Jilin University, Changchun 130012, China
| | - Lan Huang
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
| | - Fengfeng Zhou
- Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
|
37
|
Zeng W, Chen S, Cui X, Chen X, Gao Z, Jiang R. SilencerDB: a comprehensive database of silencers. Nucleic Acids Res 2021; 49:D221-D228. [PMID: 33045745 PMCID: PMC7778955 DOI: 10.1093/nar/gkaa839] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/14/2020] [Revised: 09/14/2020] [Accepted: 09/18/2020] [Indexed: 12/20/2022] Open
Abstract
Gene regulatory elements, including promoters, enhancers, silencers, etc., control transcriptional programs in a spatiotemporal manner. Though these elements are known to be able to induce either positive or negative transcriptional control, the community has been mostly studying enhancers which amplify transcription initiation, with less emphasis given to silencers which repress gene expression. To facilitate the study of silencers and the investigation of their potential roles in transcriptional control, we developed SilencerDB (http://health.tsinghua.edu.cn/silencerdb/), a comprehensive database of silencers by manually curating silencers from 2300 published articles. The current version, SilencerDB 1.0, contains (i) 33 060 validated silencers from experimental methods, and (ii) 5 045 547 predicted silencers from state-of-the-art machine learning methods. The functionality of SilencerDB includes (a) standardized categorization of silencers in a tree-structured class hierarchy based on species, organ, tissue and cell line and (b) comprehensive annotations of silencers with the nearest gene and potential regulatory genes. SilencerDB, to the best of our knowledge, is the first comprehensive database at this scale dedicated to silencers, with reliable annotations and user-friendly interactive database features. We believe this database has the potential to enable advanced understanding of silencers in regulatory mechanisms and to empower researchers to devise diverse applications of silencers in disease development.
Affiliation(s)
- Wanwen Zeng
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China; College of Software, Nankai University, Tianjin 300071, China
| | - Shengquan Chen
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
|
38
|
Schreiber J, Singh R, Bilmes J, Noble WS. A pitfall for machine learning methods aiming to predict across cell types. Genome Biol 2020; 21:282. [PMID: 33213499 PMCID: PMC7678316 DOI: 10.1186/s13059-020-02177-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/18/2020] [Accepted: 10/07/2020] [Indexed: 01/19/2023] Open
Abstract
Machine learning models that predict genomic activity are most useful when they make accurate predictions across cell types. Here, we show that when the training and test sets contain the same genomic loci, the resulting model may falsely appear to perform well by effectively memorizing the average activity associated with each locus across the training cell types. We demonstrate this phenomenon in the context of predicting gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data becomes available, future projects will increasingly risk suffering from this issue.
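The pitfall described above can be made concrete with a baseline that uses no features at all and simply predicts, for each locus, the average activity seen across the training cell types. The sketch below is an illustration under assumed toy data; the function names, the dictionary layout, and the numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def average_activity_baseline(training_signal):
    """Predict each locus as its mean activity across training cell types.

    training_signal maps a cell-type name to a list of activity values,
    one per locus, with loci in the same order for every cell type.
    """
    tracks = list(training_signal.values())
    n_loci = len(tracks[0])
    return [mean(track[i] for track in tracks) for i in range(n_loci)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical activity values at three loci in two training cell types.
training = {"cell_A": [1.0, 5.0, 2.0], "cell_B": [3.0, 7.0, 2.0]}
# Hypothetical held-out cell type measured at the same three loci.
held_out = [2.5, 6.5, 1.0]

baseline = average_activity_baseline(training)
score = pearson(baseline, held_out)
```

If a trained model does not clearly beat this baseline on a held-out cell type, its apparent accuracy may mostly reflect locus-level averages rather than cell-type-specific signal.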
Affiliation(s)
- Jacob Schreiber
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA
| | - Ritambhara Singh
- Department of Genome Science, University of Washington, Seattle, USA; Current Affiliation: Department of Computer Science, and Center for Computational Molecular Biology, Brown University, Providence, 02906, RI, United States
| | - Jeffrey Bilmes
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA; Department of Electrical & Computer Engineering, University of Washington, Seattle, USA
| | - William Stafford Noble
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA; Department of Genome Science, University of Washington, Seattle, USA
|
39
|
Babenko V, Babenko R, Orlov Y. Analyzing a putative enhancer of optic disc morphology. BMC Genet 2020; 21:73. [PMID: 33092545 PMCID: PMC7583307 DOI: 10.1186/s12863-020-00873-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/08/2020] [Accepted: 06/23/2020] [Indexed: 01/06/2023] Open
Abstract
Background: Genome-wide association studies have identified the CDC7-TGFBR3 intergenic region on chromosome 1 to be strongly associated with optic disc area size. The mechanism of its function remained unclear until new data on eQTL markers emerged from the Genotype-Tissue Expression project. The target region was found to contain a strong silencer of the distal (800 kb) Transcription Factor (TF) gene GFI1 (Growth Factor Independent Transcription Repressor 1) specifically in neuroendocrine cells (pituitary gland). GFI1 has also been reported to be involved in the development of sensory neurons and hematopoiesis. Therefore, GFI1, being a developmental gene, is likely to affect optic disc area size by altering the expression of the associated genes via long-range interactions. Results: Distribution of haplotypes in the putative enhancer region has been assessed using the data on four continental supergroups generated by the 1000 Genomes Project. The East Asian (EAS) populations were shown to manifest a highly homogeneous unimodal haplotype distribution pattern within the region, with the major haplotype occurring with a frequency of 0.9. Another, European-specific haplotype was observed with a frequency of 0.21. The major haplotype appears to be involved in silencing GFI1 repressor gene expression, which might be the cause of the increased optic disc area characteristic of the EAS populations. The enhancer/eQTL region overlaps an AluJo element, which implies that this particular regulatory element is primate-specific and confined to few tissues. Conclusion: Population-specific distribution of GFI1 enhancer alleles may predispose certain ethnic groups to glaucoma.
Affiliation(s)
- Vladimir Babenko
- Institute of Cytology and Genetics, Lavrentyeva 10, Novosibirsk, 630090, Russia; Novosibirsk State University, Pirogova Str 2, Novosibirsk, 630090, Russia
| | - Roman Babenko
- Institute of Cytology and Genetics, Lavrentyeva 10, Novosibirsk, 630090, Russia; Novosibirsk State University, Pirogova Str 2, Novosibirsk, 630090, Russia
| | - Yuri Orlov
- Institute of Cytology and Genetics, Lavrentyeva 10, Novosibirsk, 630090, Russia; Novosibirsk State University, Pirogova Str 2, Novosibirsk, 630090, Russia; I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Trubetskaya 8-2, Moscow, 119991, Russia
|
40
|
Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 2020; 64:426-448. [PMID: 32961076 DOI: 10.1139/gen-2020-0104] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
Enhancers are cis-regulatory sequences located distally to target genes. These sequences consolidate developmental and environmental cues to coordinate gene expression in a tissue-specific manner. Enhancer function and tissue specificity depend on the expressed set of transcription factors, which recognize binding sites and recruit cofactors that regulate local chromatin organization and gene transcription. Unlike other genomic elements, enhancers are challenging to identify because they function independently of orientation, are often distant from their promoters, have poorly defined boundaries, and display no reading frame. In addition, there are no defined genetic or epigenetic features that are unambiguously associated with enhancer activity. Over recent years there have been developments in both empirical assays and computational methods for enhancer prediction. We review genome-wide tools, CRISPR advancements, and high-throughput screening approaches that have improved our ability to both observe and manipulate enhancers in vitro at the level of primary genetic sequences, chromatin states, and spatial interactions. We also highlight contemporary animal models and their importance to enhancer validation. Together, these experimental systems and techniques complement one another and broaden our understanding of enhancer function in development, evolution, and disease.
Affiliation(s)
- Ian C Tobias
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Luis E Abatti
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Sakthi D Moorthy
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Shanelle Mullany
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Tiegh Taylor
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Nawrah Khader
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Mario A Filice
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
| | - Jennifer A Mitchell
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
|
41
|
Markenscoff-Papadimitriou E, Whalen S, Przytycki P, Thomas R, Binyameen F, Nowakowski TJ, Kriegstein AR, Sanders SJ, State MW, Pollard KS, Rubenstein JL. A Chromatin Accessibility Atlas of the Developing Human Telencephalon. Cell 2020; 182:754-769.e18. [PMID: 32610082 PMCID: PMC7415678 DOI: 10.1016/j.cell.2020.06.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 10/24/2019] [Revised: 03/16/2020] [Accepted: 05/29/2020] [Indexed: 12/26/2022]
Abstract
To discover regulatory elements driving the specificity of gene expression in different cell types and regions of the developing human brain, we generated an atlas of open chromatin from nine dissected regions of the mid-gestation human telencephalon, as well as microdissected upper and deep layers of the prefrontal cortex. We identified a subset of open chromatin regions (OCRs), termed predicted regulatory elements (pREs), that are likely to function as developmental brain enhancers. pREs showed temporal, regional, and laminar differences in chromatin accessibility and were correlated with gene expression differences across regions and gestational ages. We identified two functional de novo variants in a pRE for autism risk gene SLC6A1, and using CRISPRa, demonstrated that this pRE regulates SLC6A1. Additionally, mouse transgenic experiments validated enhancer activity for pREs proximal to FEZF2 and BCL11A. Thus, this atlas serves as a resource for decoding neurodevelopmental gene regulation in health and disease.
Affiliation(s)
- Eirene Markenscoff-Papadimitriou
- Department of Psychiatry, Langley Porter Psychiatric Institute, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Sean Whalen
- Gladstone Institutes, San Francisco, CA, USA
| | - Fadya Binyameen
- Department of Psychiatry, Langley Porter Psychiatric Institute, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Tomasz J Nowakowski
- Department of Psychiatry, Langley Porter Psychiatric Institute, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA; Chan-Zuckerberg Biohub, San Francisco, CA, USA; Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA
| | - Arnold R Kriegstein
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Stephan J Sanders
- Department of Psychiatry, Langley Porter Psychiatric Institute, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Matthew W State
- Department of Psychiatry, Langley Porter Psychiatric Institute, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Katherine S Pollard
- Gladstone Institutes, San Francisco, CA, USA; Chan-Zuckerberg Biohub, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA; Quantitative Biology Institute, University of California, San Francisco, San Francisco, CA, USA.
| | - John L Rubenstein
- Department of Psychiatry, Langley Porter Psychiatric Institute, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
|
42
|
Supervised enhancer prediction with epigenetic pattern recognition and targeted validation. Nat Methods 2020; 17:807-814. [PMID: 32737473 PMCID: PMC8073243 DOI: 10.1038/s41592-020-0907-8] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 09/25/2017] [Accepted: 06/18/2020] [Indexed: 12/20/2022]
Abstract
Enhancers are important noncoding elements, but they have been traditionally hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mouse and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription-factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model effectively discriminating between enhancers and promoters.
|
43
|
Akerberg BN, Pu WT. Genetic and Epigenetic Control of Heart Development. Cold Spring Harb Perspect Biol 2020; 12:cshperspect.a036756. [PMID: 31818853 DOI: 10.1101/cshperspect.a036756] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
A transcriptional program implemented by transcription factors and epigenetic regulators governs cardiac development and disease. Mutations in these factors are important causes of congenital heart disease. Here, we review selected recent advances in our understanding of the transcriptional and epigenetic control of heart development, including determinants of cardiac transcription factor chromatin occupancy, the gene regulatory network that regulates atrial septation, the chromatin landscape and cardiac gene regulation, and the role of Brg/Brahma-associated factor (BAF), nucleosome remodeling and histone deacetylation (NuRD), and Polycomb epigenetic regulatory complexes in heart development.
Affiliation(s)
- Brynn N Akerberg
- Department of Cardiology, Boston Children's Hospital, Boston, Massachusetts 02115, USA
| | - William T Pu
- Department of Cardiology, Boston Children's Hospital, Boston, Massachusetts 02115, USA; Harvard Stem Cell Institute, Cambridge, Massachusetts 02138, USA
|
44
|
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. A comprehensive review of computational prediction of genome-wide features. Brief Bioinform 2020; 21:120-134. [PMID: 30462144 PMCID: PMC10233247 DOI: 10.1093/bib/bby110] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/10/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022] Open
Abstract
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make the in silico feature prediction possible through statistical or machine learning models. With the accumulation of a vast amount of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review on these published works, categorized by the prediction targets, including protein binding site, enhancer, DNA methylation, chromatin structure and gene expression. We also provide discussions on some important points and possible future directions.
Affiliation(s)
- Tianlei Xu
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| | - Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Peng Jin
- Department of Human Genetics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
45
|
Nguyen QH, Nguyen-Vo TH, Le NQK, Do TTT, Rahardja S, Nguyen BP. iEnhancer-ECNN: identifying enhancers and their strength using ensembles of convolutional neural networks. BMC Genomics 2019; 20:951. [PMID: 31874637 PMCID: PMC6929481 DOI: 10.1186/s12864-019-6336-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND: Enhancers are non-coding DNA fragments which are crucial in gene regulation (e.g. transcription and translation). Because enhancers vary widely in location and are scattered freely across the 98% of the genome that is non-coding, their identification is more complicated than that of other genetic factors. To address this biological issue, several in silico studies have been done to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have achieved improved performance, shortfalls in these learning models remain. To overcome the limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.'s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess the model performance. RESULTS: Our experimental results demonstrate that iEnhancer-ECNN has better performance compared to other state-of-the-art methods using the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively. CONCLUSIONS: iEnhancer-ECNN outperforms other previously proposed methods with significant improvement in most of the evaluation metrics. Strong gains in the MCC of both layers are highly meaningful in assuring the stability of our models.
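The abstract reports accuracy, sensitivity, and MCC for binary enhancer classifiers. As a reminder of how these quantities relate, here is a short sketch that computes them from true and predicted labels; the labels are made-up toy values, and the convention of returning 0.0 when the MCC denominator is zero is an assumption of this sketch rather than something stated in the source.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels (hypothetical): 1 = enhancer, 0 = non-enhancer.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often preferred as a single summary metric for binary classifiers.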
Affiliation(s)
- Quang H Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
| | - Thanh-Hoang Nguyen-Vo
- School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington, 6142, New Zealand
| | - Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Keelung Road, Da'an District, Taipei City, 106, Taiwan (R.O.C.)
| | - Trang T T Do
- Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam
| | - Susanto Rahardja
- School of Marine Science and Technology, Northwestern Polytechnical University, 127 West Youyi Road, Xi'an 710072, China.
| | - Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington, 6142, New Zealand.
|
46
|
Liu B, Li K, Huang DS, Chou KC. iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach. Bioinformatics 2019; 34:3835-3842. [PMID: 29878118 DOI: 10.1093/bioinformatics/bty458] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Received: 01/23/2018] [Accepted: 06/06/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation: Identification of enhancers and their strength is important because they play a critical role in controlling gene expression. Although some bioinformatics tools have been developed, they are limited to discriminating enhancers from non-enhancers. Recently, a two-layer predictor called 'iEnhancer-2L' was developed that can also predict an enhancer's strength. However, its prediction quality needs further improvement to enhance its practical application value. Results: A new predictor called 'iEnhancer-EL' is proposed that contains two layers of predictors: the first (for identifying enhancers) is formed by fusing an array of six key individual classifiers, and the second (for their strength) is formed by fusing an array of ten key individual classifiers. All these key classifiers were selected from 171 elementary classifiers formed by SVM (Support Vector Machine) based on kmer, subsequence profile and PseKNC (Pseudo K-tuple Nucleotide Composition), respectively. Rigorous cross-validations have indicated that the proposed predictor is remarkably superior to the existing state-of-the-art one in this area. Availability and implementation: A web server for iEnhancer-EL has been established at http://bioinformatics.hitsz.edu.cn/iEnhancer-EL/, by which users can easily get their desired results without the need to go through the mathematical details. Supplementary information: Supplementary data are available at Bioinformatics online.
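The abstract says each layer is formed by fusing several individual classifiers but does not state the fusion rule. One common way to fuse binary classifiers is simple majority voting, sketched below; this is an assumed stand-in for illustration and should not be read as the fusion scheme used by iEnhancer-EL. The function name and the toy predictions are hypothetical.

```python
def majority_vote(predictions):
    """Fuse binary predictions from several classifiers by majority vote.

    predictions is a list with one entry per classifier; each entry is a
    list of 0/1 labels, one per sample. A sample is labelled 1 only when
    strictly more than half of the classifiers vote 1, so ties (possible
    with an even number of classifiers) fall back to 0.
    """
    n_classifiers = len(predictions)
    fused = []
    for votes in zip(*predictions):
        fused.append(1 if sum(votes) * 2 > n_classifiers else 0)
    return fused


# Toy outputs of three hypothetical classifiers on four samples.
classifier_outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
fused_labels = majority_vote(classifier_outputs)
```

With an even number of members, such as the six classifiers in the first layer, a tie-breaking rule has to be chosen; the sketch resolves ties toward the negative class, which is an arbitrary choice.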
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Gordon Life Science Institute, Belmont, MA, USA
| | - Kai Li
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
47
|
Kreimer A, Yan Z, Ahituv N, Yosef N. Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types. Hum Mutat 2019; 40:1299-1313. [PMID: 31131957 PMCID: PMC6771677 DOI: 10.1002/humu.23820] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Revised: 05/18/2019] [Accepted: 05/24/2019] [Indexed: 01/01/2023]
Abstract
Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications in understanding genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.
Affiliation(s)
- Anat Kreimer
- Department of Electrical Engineering and Computer Sciences, Center for Computational BiologyUniversity of CaliforniaBerkeleyCalifornia
- Department of Bioengineering and Therapeutic SciencesUniversity of California, San FranciscoSan FranciscoCalifornia
| | - Zhongxia Yan
- Department of Electrical Engineering and Computer Sciences, Center for Computational BiologyUniversity of CaliforniaBerkeleyCalifornia
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts
- Chan Zuckerberg Biohub, San Francisco, California
| |
|
48
|
Enhancer prediction with histone modification marks using a hybrid neural network model. Methods 2019; 166:48-56. [DOI: 10.1016/j.ymeth.2019.03.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/16/2018] [Revised: 02/28/2019] [Accepted: 03/16/2019] [Indexed: 01/19/2023] Open
|
49
|
Perenthaler E, Yousefi S, Niggl E, Barakat TS. Beyond the Exome: The Non-coding Genome and Enhancers in Neurodevelopmental Disorders and Malformations of Cortical Development. Front Cell Neurosci 2019; 13:352. [PMID: 31417368 PMCID: PMC6685065 DOI: 10.3389/fncel.2019.00352] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/31/2019] [Accepted: 07/16/2019] [Indexed: 12/22/2022] Open
Abstract
The development of the human cerebral cortex is a complex and dynamic process, in which neural stem cell proliferation, neuronal migration, and post-migratory neuronal organization need to occur in a well-organized fashion. Alterations at any of these crucial stages can result in malformations of cortical development (MCDs), a group of genetically heterogeneous neurodevelopmental disorders that present with developmental delay, intellectual disability and epilepsy. Recent progress in genetic technologies, such as next generation sequencing, most often focusing on all protein-coding exons (e.g., whole exome sequencing), allowed the discovery of more than 100 genes associated with various types of MCDs. Although this has considerably increased the diagnostic yield, most MCD cases remain unexplained. As whole exome sequencing investigates only a minor part of the human genome (1-2%), it is likely that patients in whom no disease-causing mutation has been identified could harbor mutations in genomic regions beyond the exome. Even though functional annotation of non-coding regions is still lagging behind that of protein-coding genes, tremendous progress has been made in the field of gene regulation. One group of non-coding regulatory regions are enhancers, which can be distantly located upstream or downstream of genes and which can mediate temporal and tissue-specific transcriptional control via long-distance interactions with promoter regions. Although some examples exist in the literature that link alterations of enhancers to genetic disorders, a widespread appreciation of the putative roles of these sequences in MCDs is still lacking. Here, we summarize the current state of knowledge on cis-regulatory regions and discuss novel technologies such as massively-parallel reporter assay systems, CRISPR-Cas9-based screens and computational approaches that help to further elucidate the emerging role of the non-coding genome in disease. Moreover, we discuss existing literature on mutations or copy number alterations of regulatory regions involved in brain development. We foresee that the future implementation of the knowledge obtained through ongoing gene regulation studies will benefit patients and will provide an explanation to part of the missing heritability of MCDs and other genetic disorders.
Affiliation(s)
- Tahsin Stefan Barakat
- Department of Clinical Genetics, Erasmus MC – University Medical Center, Rotterdam, Netherlands
| |
|
50
|
Caporale AL, Gonda CM, Franchini LF. Transcriptional Enhancers in the FOXP2 Locus Underwent Accelerated Evolution in the Human Lineage. Mol Biol Evol 2019; 36:2432-2450. [PMID: 31359064 DOI: 10.1093/molbev/msz173] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2018] [Revised: 04/26/2019] [Accepted: 07/16/2019] [Indexed: 12/11/2022] Open
Abstract
Unique human features such as complex language are the result of molecular evolutionary changes that modified developmental programs of our brain. The human-specific evolution of the forkhead box P2 (FOXP2) gene coding region has been linked to the emergence of speech and language in humankind. However, little is known about how the expression of FOXP2 is regulated and if its regulatory machinery evolved in a lineage-specific manner in humans. In order to identify FOXP2 regulatory regions containing human-specific changes we used databases of human accelerated non-coding sequences or HARs. We found that the topologically associating domain (TAD) determined using developing human cerebral cortex containing the FOXP2 locus includes two clusters of 12 HARs, placing the locus occupied by FOXP2 among the top regions showing fast acceleration rates in non-coding regions in the human genome. Using in vivo enhancer assays in zebrafish, we found that at least five FOXP2-HARs behave as transcriptional enhancers throughout different developmental stages. In addition, we found that at least two FOXP2-HARs direct the expression of the reporter gene EGFP to foxP2-expressing regions and cells. Moreover, we uncovered two FOXP2-HARs showing reporter expression gain of function in the nervous system when compared with the chimpanzee ortholog sequences. Our results indicate that regulatory sequences in the FOXP2 locus underwent a human-specific evolutionary process suggesting that the transcriptional machinery controlling this gene could have also evolved differentially in the human lineage.
Affiliation(s)
- Alfredo Leandro Caporale
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Catalina M Gonda
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Lucía Florencia Franchini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| |
|