1
Kuldell JC, Kaplan CD. RNA Polymerase II Activity Control of Gene Expression and Involvement in Disease. J Mol Biol 2025; 437:168770. [PMID: 39214283 DOI: 10.1016/j.jmb.2024.168770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 08/26/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Gene expression is dependent on RNA Polymerase II (Pol II) activity in eukaryotes. In addition to determining the rate of RNA synthesis for all protein coding genes, Pol II serves as a platform for the recruitment of factors and regulation of co-transcriptional events, from RNA processing to chromatin modification and remodeling. The transcriptome can be shaped by changes in Pol II kinetics affecting RNA synthesis itself or because of alterations to co-transcriptional events that are responsive to or coupled with transcription. Genetic, biochemical, and structural approaches to Pol II in model organisms have revealed critical insights into how Pol II works and the types of factors that regulate it. The complexity of Pol II regulation generally increases with organismal complexity. In this review, we describe fundamental aspects of how Pol II activity can shape gene expression, discuss recent advances in how Pol II elongation is regulated on genes, and how altered Pol II function is linked to human disease and aging.
Affiliation(s)
- James C Kuldell
- Department of Biological Sciences, 202A LSA, Fifth and Ruskin Avenues, University of Pittsburgh, Pittsburgh PA 15260, United States
- Craig D Kaplan
- Department of Biological Sciences, 202A LSA, Fifth and Ruskin Avenues, University of Pittsburgh, Pittsburgh PA 15260, United States
2
Hosseini SH, Roussel MR. Analytic delay distributions for a family of gene transcription models. Math Biosci Eng 2024; 21:6225-6262. [PMID: 39176425 DOI: 10.3934/mbe.2024273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2024]
Abstract
Models intended to describe the time evolution of a gene network must somehow include transcription, the DNA-templated synthesis of RNA, and translation, the RNA-templated synthesis of proteins. In eukaryotes, the DNA template for transcription can be very long, often consisting of tens of thousands of nucleotides, and lengthy pauses may punctuate this process. Accordingly, transcription can last for many minutes, in some cases hours. There is a long history of introducing delays in gene expression models to take the transcription and translation times into account. Here we study a family of detailed transcription models that includes initiation, elongation, and termination reactions. We establish a framework for computing the distribution of transcription times, and work out these distributions for some typical cases. For elongation, a fixed delay is a good model provided elongation is fast compared to initiation and termination, and there are no sites where long pauses occur. The initiation and termination phases of the model then generate a nontrivial delay distribution, and elongation shifts this distribution by an amount corresponding to the elongation delay. When initiation and termination are relatively fast, the distribution of elongation times can be approximated by a Gaussian. A convolution of this Gaussian with the initiation and termination time distributions gives another analytic approximation to the transcription time distribution. If there are long pauses during elongation, because of the modularity of the family of models considered, the elongation phase can be partitioned into reactions generating a simple delay (elongation through regions where there are no long pauses), and reactions whose distribution of waiting times must be considered explicitly (initiation, termination, and motion through regions where long pauses are likely). 
In these cases, the distribution of transcription times again involves a nontrivial part and a shift due to fast elongation processes.
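The fixed-delay case described in this abstract can be illustrated with a short simulation. This is only a sketch under assumptions that are not taken from the paper: initiation and termination are each reduced to a single exponential waiting time, elongation is a constant delay, and the rate constants and the function names `sample_transcription_time` and `sample_many` are hypothetical.

```python
import random

def sample_transcription_time(k_init, k_term, elong_delay, rng):
    """Draw one transcription time as: exponential initiation wait
    + fixed elongation delay + exponential termination wait.
    (Single-step initiation/termination is an illustrative assumption.)"""
    return rng.expovariate(k_init) + elong_delay + rng.expovariate(k_term)

def sample_many(n, k_init=0.5, k_term=1.0, elong_delay=10.0, seed=0):
    """Draw n transcription times with hypothetical rate constants."""
    rng = random.Random(seed)
    return [sample_transcription_time(k_init, k_term, elong_delay, rng)
            for _ in range(n)]

times = sample_many(20000)
mean_time = sum(times) / len(times)
# The fixed elongation delay only shifts the distribution: no sample can be
# shorter than elong_delay, and the mean is 1/k_init + 1/k_term + elong_delay.
print(min(times), mean_time)
```

In this simplified picture the shape of the distribution comes entirely from the initiation and termination waits, and the elongation delay moves it along the time axis without changing that shape.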
Affiliation(s)
- S Hossein Hosseini
- Alberta RNA Research and Training Institute, Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
- Marc R Roussel
- Alberta RNA Research and Training Institute, Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
3
Ren L, Ma W, Wang Y. Predicting RNA polymerase II transcriptional elongation pausing and associated histone code. Brief Bioinform 2024; 25:bbae246. [PMID: 38783706 PMCID: PMC11116834 DOI: 10.1093/bib/bbae246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 04/12/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
RNA Polymerase II (Pol II) transcriptional elongation pausing is an integral part of the dynamic regulation of gene transcription in the genome of metazoans. It plays a pivotal role in many vital biological processes and disease progression. However, experimentally measuring genome-wide Pol II pausing is technically challenging and the precise governing mechanism underlying this process is not fully understood. Here, we develop RP3 (RNA Polymerase II Pausing Prediction), a network regularized logistic regression machine learning method, to predict Pol II pausing events by integrating genome sequence, histone modification, gene expression, chromatin accessibility, and protein-protein interaction data. RP3 can accurately predict Pol II pausing in diverse cellular contexts and unveil the transcription factors that are associated with the Pol II pausing machinery. Furthermore, we utilize a forward feature selection framework to systematically identify the combination of histone modification signals associated with Pol II pausing. RP3 is freely available at https://github.com/AMSSwanglab/RP3.
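The abstract does not spell out the form of the network regularizer, so the following is a generic sketch rather than the RP3 implementation. It assumes a Laplacian-style penalty, lam * sum over edges (i, j) of (w_i - w_j)^2, added to the mean logistic loss, so that features linked in an interaction network are pulled toward similar weights. The toy data, the edge list, and the function name `fit_network_logistic` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_network_logistic(X, y, edges, lam=0.5, lr=0.1, steps=2000):
    """Gradient descent on mean logistic loss plus an assumed network
    penalty lam * sum_{(i, j) in edges} (w_i - w_j)^2."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        for i, j in edges:
            # Gradient of lam * (w_i - w_j)^2 with respect to w_i and w_j.
            diff = 2.0 * lam * (w[i] - w[j])
            grad_w[i] += diff
            grad_w[j] -= diff
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data: feature 0 tracks the label, feature 1 is uninformative,
# and the (hypothetical) network links features 0 and 1.
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
edges = [(0, 1)]

w_free, _ = fit_network_logistic(X, y, edges, lam=0.0)
w_reg, _ = fit_network_logistic(X, y, edges, lam=0.5)
gap_free = abs(w_free[0] - w_free[1])
gap_reg = abs(w_reg[0] - w_reg[1])
print(gap_free, gap_reg)
```

With the penalty switched on, the weights of the two linked features end up closer together than in the unpenalized fit, which is the qualitative effect a network regularizer is meant to have.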
Affiliation(s)
- Lixin Ren
- School of Mathematics and Physics, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China
- Wanbiao Ma
- School of Mathematics and Physics, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China
- Yong Wang
- CEMS, NCMIS, MDIS, Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, 55 Zhongguancun East Road, Haidian District, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, 19A Yuquan Road, Shijingshan District, Beijing 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 32 Jiaochang Donglu, Wuhua District, Kunming 650223, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 1 Xiangshan Zhi Nong, West Lake District, Hangzhou 330106, China
4
Liu L, Zhao Y, Siepel A. DNA-sequence and epigenomic determinants of local rates of transcription elongation. bioRxiv 2023:2023.12.21.572932. [PMID: 38187771 PMCID: PMC10769381 DOI: 10.1101/2023.12.21.572932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Across all branches of life, transcription elongation is a crucial, regulated phase in gene expression. Many recent studies in eukaryotes have focused on the regulation of promoter-proximal pausing of RNA Polymerase II (Pol II), but rates of productive elongation also vary substantially throughout the gene body, both within and across genes. Here, we introduce a probabilistic model for systematically evaluating potential determinants of the local elongation rate based on nascent RNA sequencing (NRS) data. Our model is derived from a unified model for both the kinetics of Pol II movement along the DNA template and the generation of NRS read counts at steady state. It allows for a continuously variable elongation rate along the gene body, with the rate at each nucleotide defined by a generalized linear relationship with nearby genomic and epigenomic features. High-dimensional feature vectors are accommodated through a sparse-regression extension. We show with simulations that the model allows accurate detection of associated features and accurate prediction of local elongation rates. In an analysis of public PRO-seq and epigenomic data, we identify several features that are strongly associated with reductions in the local elongation rate, including DNA methylation, splice sites, RNA stem-loops, CTCF binding sites, and several histone marks, including H3K36me3 and H4K20me1. By contrast, low-complexity sequences and H3K79me2 marks are associated with increases in elongation rate. In an analysis of DNA k-mers, we find that cytosine nucleotides are strongly associated with reductions in local elongation rate, particularly when preceded by guanines and followed by adenines or thymines. Increases in elongation rate are associated with thymines and A+T-rich k-mers. These associations are generally shared across cell types, and by considering them our model is effective at predicting features of held-out PRO-seq data. 
Overall, our analysis is the first to permit genome-wide predictions of relative nucleotide-specific elongation rates based on complex sets of genomic and epigenomic covariates. We have made predictions available for the K562, CD14+, MCF-7, and HeLa-S3 cell types in a UCSC Genome Browser track.
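A toy version of the "generalized linear relationship" between local features and elongation rate might look like the sketch below. The log link, the coefficient values, the two binary features, and the function names `local_rates` and `expected_density` are assumptions made for illustration and are not taken from the preprint. The density step uses the standard steady-state argument that, with a constant flux of polymerases and no premature termination, occupancy at a site is inversely proportional to the local rate.

```python
import math

def local_rates(features, beta, base_rate=1.0):
    """Per-nucleotide elongation rate under an assumed log link:
    rate_i = base_rate * exp(beta . x_i)."""
    return [base_rate * math.exp(sum(b * x for b, x in zip(beta, row)))
            for row in features]

def expected_density(rates, flux=1.0):
    """Steady-state Pol II density, assuming constant flux along the gene:
    density_i = flux / rate_i."""
    return [flux / r for r in rates]

# Hypothetical features per site: [methylated, A+T-rich].
features = [
    [0, 0],  # baseline site
    [1, 0],  # methylated site
    [0, 1],  # A+T-rich site
]
# Hypothetical coefficients; the signs follow the associations reported in
# the abstract (methylation slows elongation, A+T-rich sequence speeds it up).
beta = [-0.7, 0.4]

rates = local_rates(features, beta)
density = expected_density(rates)
print(rates, density)
```

Slower sites accumulate more polymerase, which is why read counts from nascent RNA sequencing carry information about relative local elongation rates.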
Affiliation(s)
- Lingjie Liu
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY
- Yixin Zhao
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Graduate Program in Genetics, Stony Brook University, Stony Brook, NY
5
Shi C, Yang X, Zhang J, Zhou T. Stochastic modeling of the mRNA life process: A generalized master equation. Biophys J 2023; 122:4023-4041. [PMID: 37653725 PMCID: PMC10598292 DOI: 10.1016/j.bpj.2023.08.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/16/2023] [Revised: 06/29/2023] [Accepted: 08/29/2023] [Indexed: 09/02/2023] Open
Abstract
The mRNA life cycle is a complex biochemical process, involving transcription initiation, elongation, termination, splicing, and degradation. Each of these molecular events is multistep and can create a memory. The effect of this molecular memory on gene expression is not clear, although there are many related yet scattered experimental reports. To address this important issue, we develop a general theoretical framework formulated as a master equation in the sense of queueing theory, which can reduce to multiple previously studied gene models in limiting cases. This framework allows us to interpret experimental observations, extract kinetic parameters from experimental data, and identify how the mRNA kinetics vary under regulatory influences. Notably, it allows us to evaluate the influences of elongation processes on mature RNA distribution; e.g., we find that a non-exponential elongation time can induce bimodal mRNA expression and that there is an optimal elongation noise intensity at which the mature RNA noise reaches its lowest level. In short, our framework can not only provide insight into complex mRNA life processes but also bridge theoretical studies and experimental data.
Affiliation(s)
- Changhong Shi
- State Key Laboratory of Respiratory Disease, School of Public Health, Guangzhou Medical University, Guangzhou, China
- Xiyan Yang
- School of Financial Mathematics and Statistics, Guangdong University of Finance, Guangzhou, China
- Jiajun Zhang
- School of Mathematics and Computational Science and Guangdong Province Key Laboratory of Computational Science, Sun Yat-Sen University, Guangzhou, China
- Tianshou Zhou
- School of Mathematics and Computational Science and Guangdong Province Key Laboratory of Computational Science, Sun Yat-Sen University, Guangzhou, China
6
Zhang T, Li L, Sun H, Xu D, Wang G. DeepICSH: a complex deep learning framework for identifying cell-specific silencers and their strength from the human genome. Brief Bioinform 2023; 24:bbad316. [PMID: 37643374 DOI: 10.1093/bib/bbad316] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 08/11/2023] [Indexed: 08/31/2023] Open
Abstract
Silencers are noncoding DNA sequence fragments located on the genome that suppress gene expression. The variation of silencers in specific cells is closely related to gene expression and cancer development. Computational approaches that exclusively rely on DNA sequence information for silencer identification fail to account for the cell specificity of silencers, resulting in diminished accuracy. Despite the discovery of several transcription factors and epigenetic modifications associated with silencers on the genome, there is still no definitive biological signal or combination thereof to fully characterize silencers, posing challenges in selecting suitable biological signals for their identification. Therefore, we propose a sophisticated deep learning framework called DeepICSH, which is based on multiple biological data sources. Specifically, DeepICSH leverages a deep convolutional neural network to automatically capture biologically relevant signal combinations strongly associated with silencers, originating from a diverse array of biological signals. Furthermore, the utilization of attention mechanisms facilitates the scoring and visualization of these signal combinations, whereas the employment of skip connections facilitates the fusion of multilevel sequence features and signal combinations, thereby empowering the accurate identification of silencers within specific cells. Extensive experiments on HepG2 and K562 cell line data sets demonstrate that DeepICSH outperforms state-of-the-art methods in silencer identification. Notably, we introduce for the first time a deep learning framework based on multi-omics data for classifying strong and weak silencers, achieving favorable performance. In conclusion, DeepICSH shows great promise for advancing the study and analysis of silencers in complex diseases. The source code is available at https://github.com/lyli1013/DeepICSH.
Affiliation(s)
- Tianjiao Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Liangyu Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Hailong Sun
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Dali Xu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
7
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology (Basel) 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
8
Akcan TS, Vilov S, Heinig M. Predictive model of transcriptional elongation control identifies trans regulatory factors from chromatin signatures. Nucleic Acids Res 2023; 51:1608-1624. [PMID: 36727445 PMCID: PMC9976927 DOI: 10.1093/nar/gkac1272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 12/09/2022] [Accepted: 01/12/2023] [Indexed: 02/03/2023] Open
Abstract
Promoter-proximal Polymerase II (Pol II) pausing is a key rate-limiting step for gene expression. DNA and RNA-binding trans-acting factors regulating the extent of pausing have been identified. However, we lack a quantitative model of how interactions of these factors determine pausing, so the relative importance of implicated factors is unknown. Moreover, previously unknown regulators might exist. Here we address this gap with a machine learning model that accurately predicts the extent of promoter-proximal Pol II pausing from large-scale genome and transcriptome binding maps and gene annotation and sequence composition features. We demonstrate high accuracy and generalizability of the model by validation on an independent cell line, which reveals the model's cell-line-agnostic character. Model interpretation in light of prior knowledge about molecular functions of regulatory factors confirms the interconnection of pausing with other RNA processing steps. Harnessing underlying feature contributions, we assess the relative importance of each factor, quantify their predictive effects and systematically identify previously unknown regulators of pausing. We additionally identify 16 previously unknown 7SK ncRNA interacting RNA-binding proteins predictive of pausing. Our work provides a framework to further our understanding of the regulation of the critical early steps in transcriptional elongation.
Affiliation(s)
- Toray S Akcan
- Institute of Computational Biology, Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University Munich, Munich, Germany
- Sergey Vilov
- Institute of Computational Biology, Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Matthias Heinig
- Institute of Computational Biology, Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University Munich, Munich, Germany
- DZHK (German Centre for Cardiovascular Research), Munich Heart Association, Partner Site Munich, 10785 Berlin, Germany
9
Predicting CRISPR/Cas9 Repair Outcomes by Attention-Based Deep Learning Framework. Cells 2022; 11:cells11111847. [PMID: 35681543 PMCID: PMC9180579 DOI: 10.3390/cells11111847] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2022] [Revised: 05/24/2022] [Accepted: 06/02/2022] [Indexed: 02/01/2023] Open
Abstract
As a simple and programmable nuclease-based genome editing tool, the CRISPR/Cas9 system has been widely used in target-gene repair and gene-expression regulation. The DNA mutation generated by CRISPR/Cas9-mediated double-strand breaks determines its biological and phenotypic effects. Experiments have demonstrated that CRISPR/Cas9-generated cellular-repair outcomes depend on local sequence features. Therefore, the repair outcomes after a DNA break can be predicted from sequences near the cleavage sites. However, existing prediction methods rely on manually constructed features or insufficiently detailed prediction labels. They cannot reach clinical-level prediction accuracy, and their performance is limited by existing knowledge about CRISPR/Cas9 editing. We predict 557 repair labels of DNA, covering the vast majority of Cas9-generated mutational outcomes, and build a deep learning model called Apindel to predict CRISPR/Cas9 editing outcomes. Apindel automatically learns the sequence features of DNA with the GloVe model, introduces location information through Positional Encoding (PE), and embeds the trained word-vector matrices into a deep learning model containing a BiLSTM and an attention mechanism. Apindel has better performance and more detailed prediction categories than the most advanced DNA-mutation-predicting models. It also reveals that nucleotides at different positions relative to the cleavage sites have different influences on CRISPR/Cas9 editing outcomes.
10
Jeronimo C, Robert F. The histone chaperone FACT: a guardian of chromatin structure integrity. Transcription 2022; 13:16-38. [PMID: 35485711 PMCID: PMC9467567 DOI: 10.1080/21541264.2022.2069995] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022] Open
Abstract
The identification of FACT as a histone chaperone enabling transcription through chromatin in vitro has strongly shaped how its roles are envisioned. However, FACT has been implicated in essentially all aspects of chromatin biology, from transcription to DNA replication, DNA repair, and chromosome segregation. In this review, we focus on recent literature describing the role and mechanisms of FACT during transcription. We highlight the prime importance of FACT in preserving chromatin integrity during transcription and challenge its role as an elongation factor. We also review evidence for FACT's role as a cell-type/gene-specific regulator of gene expression and briefly summarize current efforts at using FACT inhibition as an anti-cancer strategy.
Affiliation(s)
- Célia Jeronimo
- Institut de recherches cliniques de Montréal, Montréal, Québec, Canada
- François Robert
- Institut de recherches cliniques de Montréal, Montréal, Québec, Canada
- Département de Médecine, Faculté de Médecine, Université de Montréal, Montréal, Québec, Canada
- Faculty of Medicine, Division of Experimental Medicine, McGill University, Montréal, Québec, Canada
11
Zhu L, Li W. Roles of Physicochemical and Structural Properties of RNA-Binding Proteins in Predicting the Activities of Trans-Acting Splicing Factors with Machine Learning. Int J Mol Sci 2022; 23:ijms23084426. [PMID: 35457243 PMCID: PMC9030803 DOI: 10.3390/ijms23084426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 04/13/2022] [Accepted: 04/14/2022] [Indexed: 02/06/2023] Open
Abstract
Trans-acting splicing factors play a pivotal role in modulating alternative splicing by specifically binding to cis-elements in pre-mRNAs. There are approximately 1500 RNA-binding proteins (RBPs) in the human genome, but the activities of these RBPs in alternative splicing are unknown. Since determining RBP activities through experimental methods is expensive and time consuming, the development of an efficient computational method for predicting the activities of RBPs in alternative splicing from their sequences is of great practical importance. Recently, a machine learning model for predicting the activities of splicing factors was built based on features of single and dual amino acid compositions. Here, we explored the role of physicochemical and structural properties in predicting their activities in alternative splicing using machine learning approaches and found that the prediction performance is significantly improved by including these properties. By combining the minimum redundancy–maximum relevance (mRMR) method and forward feature searching strategy, a promising feature subset with 24 features was obtained to predict the activities of RBPs. The feature subset consists of 16 dual amino acid compositions, 5 physicochemical features, and 3 structural features. The physicochemical and structural properties were as important as the sequence composition features for an accurate prediction of the activities of splicing factors. The hydrophobicity and distribution of coil are suggested to be the key physicochemical and structural features, respectively.
Affiliation(s)
- Wenjin Li
- Correspondence: Tel.: +86-0755-26942336
12
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. Entropy (Basel) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
- Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland