1
Hsiao YC, Dutta A. Network Modeling and Control of Dynamic Disease Pathways, Review and Perspectives. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1211-1230. [PMID: 38498762 DOI: 10.1109/tcbb.2024.3378155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Dynamic disease pathways are a combination of complex dynamical processes among bio-molecules in a cell that leads to diseases. Network modeling of disease pathways considers disease-related bio-molecules (e.g. DNA, RNA, transcription factors, enzymes, proteins, and metabolites) and their interaction (e.g. DNA methylation, histone modification, alternative splicing, and protein modification) to study disease progression and predict therapeutic responses. These bio-molecules and their interactions are the basic elements in the study of the misregulation in the disease-related gene expression that lead to abnormal cellular responses. Gene regulatory networks, cell signaling networks, and metabolic networks are the three major types of intracellular networks for the study of the cellular responses elicited from extracellular signals. The disease-related cellular responses can be prevented or regulated by designing control strategies to manipulate these extracellular or other intracellular signals. The paper reviews the regulatory mechanisms, the dynamic models, and the control strategies for each intracellular network. The applications, limitations and the prospective for modeling and control are also discussed.
2
Feng H, Wu L, Zhao B, Huff C, Zhang J, Wu J, Lin L, Wei P, Wu C. Benchmarking DNA Foundation Models for Genomic Sequence Classification. bioRxiv: The Preprint Server for Biology 2024:2024.08.16.608288. [PMID: 39185205 PMCID: PMC11343214 DOI: 10.1101/2024.08.16.608288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
The rapid advancement of DNA foundation language models has revolutionized the field of genomics, enabling the decoding of complex patterns and regulatory mechanisms within DNA sequences. However, the current evaluation of these models often relies on fine-tuning and limited datasets, which introduces biases and limits the assessment of their true potential. Here, we present a benchmarking study of three recent DNA foundation language models, including DNABERT-2, Nucleotide Transformer version-2 (NT-v2), and HyenaDNA, focusing on the quality of their zero-shot embeddings across a diverse range of genomic tasks and species through analyses of 57 real datasets. We found that DNABERT-2 exhibits the most consistent performance across human genome-related tasks, while NT-v2 excels in epigenetic modification detection. HyenaDNA stands out for its exceptional runtime scalability and ability to handle long input sequences. Importantly, we demonstrate that using mean token embedding consistently improves the performance of all three models compared to the default setting of sentence-level summary token embedding, with average AUC improvements ranging from 4.3% to 9.7% for different DNA foundation models. Furthermore, the performance differences between these models are significantly reduced when using mean token embedding. Our findings provide a framework for selecting and optimizing DNA language models, guiding researchers in applying these tools effectively in genomic studies.
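The two pooling strategies compared in this abstract can be illustrated with a minimal sketch. It assumes the sentence-level summary token is the first token of the sequence (as in BERT-style models; this may not hold for every model in the benchmark) and that padding positions are marked by a 0 in an attention mask. The function names and toy vectors are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def summary_token_embedding(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Return the first token's vector (assumed here to be the summary token)."""
    return list(token_embeddings[0])


def mean_token_embedding(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of all non-padding tokens, dimension by dimension."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy input: four token vectors of width 2; the last position is padding.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

summary_vec = summary_token_embedding(tokens)   # uses only the first token
mean_vec = mean_token_embedding(tokens, mask)   # averages the three real tokens
```

Either vector could then be passed to a downstream classifier; the abstract reports that the mean-pooled variant gave higher average AUC in the authors' benchmark.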
Affiliation(s)
- Haonan Feng
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Lang Wu
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, 96813, USA
- Bingxin Zhao
  - Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Chad Huff
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jianjun Zhang
  - Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Jia Wu
  - Department of Imaging Physics, Division of Diagnostic Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Lifeng Lin
  - Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, 85724, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Chong Wu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
  - Institute for Data Science in Oncology, The UT MD Anderson Cancer Center, Houston, TX, 77030, USA
3
Liu L, Yahaya BS, Li J, Wu F. Enigmatic role of auxin response factors in plant growth and stress tolerance. Frontiers in Plant Science 2024; 15:1398818. [PMID: 38903418 PMCID: PMC11188990 DOI: 10.3389/fpls.2024.1398818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Abiotic and biotic stresses globally constrain plant growth and impede the optimization of crop productivity. The phytohormone auxin is involved in nearly every aspect of plant development. Auxin acts as a chemical messenger that influences gene expression through a short nuclear pathway, mediated by a family of specific DNA-binding transcription factors known as Auxin Response Factors (ARFs). ARFs thus act as effectors of auxin response and translate chemical signals into the regulation of auxin responsive genes. Since the initial discovery of the first ARF in Arabidopsis, advancements in genetics, biochemistry, genomics, and structural biology have facilitated the development of models elucidating ARF action and their contributions to generating specific auxin responses. Yet, significant gaps persist in our understanding of ARF transcription factors despite these endeavors. Unraveling the functional roles of ARFs in regulating stress response, alongside elucidating their genetic and molecular mechanisms, is still in its nascent phase. Here, we review recent research outcomes on ARFs, detailing their involvement in regulating leaf, flower, and root organogenesis and development, as well as stress responses and their corresponding regulatory mechanisms: including gene expression patterns, functional characterization, transcriptional, post-transcriptional and post-translational regulation across diverse stress conditions. Furthermore, we delineate unresolved questions and forthcoming challenges in ARF research.
Affiliation(s)
- Ling Liu
  - Faculty of Agriculture, Forestry and Food Engineering, Yibin University, Yibin, Sichuan, China
- Baba Salifu Yahaya
  - Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Wenjiang, Sichuan, China
- Jing Li
  - Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Wenjiang, Sichuan, China
- Fengkai Wu
  - Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
  - Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Wenjiang, Sichuan, China
4
Shao KM, Shao WH. Transcription Factors in the Pathogenesis of Lupus Nephritis and Their Targeted Therapy. Int J Mol Sci 2024; 25:1084. [PMID: 38256157 PMCID: PMC10816397 DOI: 10.3390/ijms25021084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 01/07/2024] [Accepted: 01/09/2024] [Indexed: 01/24/2024] [Open Access]
Abstract
Systemic lupus erythematosus (SLE) is a prototype inflammatory autoimmune disease, characterized by breakdown of immunotolerance to self-antigens. Renal involvement, known as lupus nephritis (LN), is one of the leading causes of morbidity and a significant contributor to mortality in SLE. Despite current pathophysiological advances, further studies are needed to fully understand complex mechanisms underlying the development and progression of LN. Transcription factors (TFs) are proteins that regulate the expression of genes and play a crucial role in the development and progression of LN. The mechanisms of TF promoting or inhibiting gene expression are complex, and studies have just begun to reveal the pathological roles of TFs in LN. Understanding TFs in the pathogenesis of LN can provide valuable insights into this disease's mechanisms and potentially lead to the development of targeted therapies for its management. This review will focus on recent findings on TFs in the pathogenesis of LN and newly developed TF-targeted therapy in renal inflammation.
Affiliation(s)
- Kasey M. Shao
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Wen-Hai Shao
  - Division of Rheumatology, Allergy and Immunology, Department of Internal Medicine, College of Medicine, University of Cincinnati, Cincinnati, OH 45267, USA
5
Yuan HY, Kagale S, Ferrie AMR. Multifaceted roles of transcription factors during plant embryogenesis. Frontiers in Plant Science 2024; 14:1322728. [PMID: 38235196 PMCID: PMC10791896 DOI: 10.3389/fpls.2023.1322728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 12/11/2023] [Indexed: 01/19/2024]
Abstract
Transcription factors (TFs) are diverse groups of regulatory proteins. Through their specific binding domains, TFs bind to their target genes and regulate their expression, therefore TFs play important roles in various growth and developmental processes. Plant embryogenesis is a highly regulated and intricate process during which embryos arise from various sources and undergo development; it can be further divided into zygotic embryogenesis (ZE) and somatic embryogenesis (SE). TFs play a crucial role in the process of plant embryogenesis with a number of them acting as master regulators in both ZE and SE. In this review, we focus on the master TFs involved in embryogenesis such as BABY BOOM (BBM) from the APETALA2/Ethylene-Responsive Factor (AP2/ERF) family, WUSCHEL and WUSCHEL-related homeobox (WOX) from the homeobox family, LEAFY COTYLEDON 2 (LEC2) from the B3 family, AGAMOUS-Like 15 (AGL15) from the MADS family and LEAFY COTYLEDON 1 (LEC1) from the Nuclear Factor Y (NF-Y) family. We aim to present the recent progress pertaining to the diverse roles these master TFs play in both ZE and SE in Arabidopsis, as well as other plant species including crops. We also discuss future perspectives in this context.
Affiliation(s)
- Alison M. R. Ferrie
  - Aquatic and Crop Resource Development Research Center, National Research Council Canada, Saskatoon, SK, Canada
6
Liu M, Mai JW, Luo DX, Liu GX, Xu T, Xin WJ, Lin SY, Li ZY. NFATc2-dependent epigenetic downregulation of the TSC2/Beclin-1 pathway is involved in neuropathic pain induced by oxaliplatin. Mol Pain 2023; 19:17448069231158289. [PMID: 36733258 PMCID: PMC9941598 DOI: 10.1177/17448069231158289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 01/16/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023] [Open Access]
Abstract
Neuropathic pain is a common dose-limiting side effect of oxaliplatin, which hampers the effective treatment of tumors. Here, we found that upregulation of transcription factor NFATc2 decreased the expression of Beclin-1, a critical molecule in autophagy, in the spinal dorsal horn, and contributed to neuropathic pain following oxaliplatin treatment. Meanwhile, manipulating autophagy levels by intrathecal injection of rapamycin (RAPA) or 3-methyladenine (3-MA) differentially altered mechanical allodynia in oxaliplatin-treated or naïve rats. Utilizing chromatin immunoprecipitation-sequencing (ChIP-seq) assay combined with bioinformatics analysis, we found that NFATc2 negatively regulated the transcription of tuberous sclerosis complex protein 2 (TSC2), which contributed to the oxaliplatin-induced Beclin-1 downregulation. Further assays revealed that NFATc2 regulated histone H4 acetylation and methylation in the TSC2 promoter site 1 in rats' dorsal horns with oxaliplatin treatment. These results suggested that NFATc2 mediated the epigenetic downregulation of the TSC2/Beclin-1 autophagy pathway and contributed to oxaliplatin-induced mechanical allodynia, which provided a new therapeutic insight for chemotherapy-induced neuropathic pain.
Affiliation(s)
- Meng Liu
  - Department of Anesthesia and Pain Medicine, Guangzhou First People’s Hospital, Guangzhou, China
- Jing-Wen Mai
  - Department of Anesthesiology, Huizhou Central People’s Hospital, Huizhou, China
- De-Xing Luo
  - Department of Anesthesiology, Huizhou Central People’s Hospital, Huizhou, China
- Guan-Xi Liu
  - The First School of Clinical Medicine, Southern Medical University, Guangzhou, China
  - The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou Huiai Hospital, Guangzhou, China
- Ting Xu
  - Department of Emergency Medicine, The First Affiliated Hospital of Sun Yat-Sen University and Zhongshan Medical School, Sun Yat-Sen University, China
- Wen-Jun Xin
  - Department of Emergency Medicine, The First Affiliated Hospital of Sun Yat-Sen University and Zhongshan Medical School, Sun Yat-Sen University, China
- Su-Yan Lin
  - Department of Neurology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zhen-Yu Li
  - Department of Emergency Medicine, The First Affiliated Hospital of Sun Yat-Sen University and Zhongshan Medical School, Sun Yat-Sen University, China
7
Nguyen Q, Tran HV, Nguyen BP, Do TTT. Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition. ACS Omega 2022; 7:32322-32330. [PMID: 36119976 PMCID: PMC9475634 DOI: 10.1021/acsomega.2c03696] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/14/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
Transcription factors (TFs) play an important role in gene expression and regulation of 3D genome conformation. TFs have ability to bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments which are near the transcription initiation site and form complexes that allow polymerase enzymes to bind to initiate transcription. Previous studies showed that methylated DNAs had ability to inhibit and prevent TFs from binding to DNA fragments. However, recent studies have found that there were TFs that could bind to methylated DNA fragments. The identification of these TFs is an important steppingstone to a better understanding of cellular gene expression mechanisms. However, as experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose to use the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those of other studies on the same problems.
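The reduced g-gap dipeptide composition named in this abstract can be sketched as follows, under stated assumptions. Each residue is first mapped to a group in a reduced amino-acid alphabet; the feature for an ordered group pair is then the fraction of residue pairs separated by exactly g intervening positions that fall into that pair. The six-group alphabet below is a hypothetical illustration, not the clustering used by the authors, and the function name is likewise an assumption.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical reduced alphabet covering the 20 standard amino acids.
# The paper's actual clustering is not given in the abstract.
REDUCED_GROUPS = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine / proline
}
RESIDUE_TO_GROUP = {
    residue: group for group, members in REDUCED_GROUPS.items() for residue in members
}


def reduced_g_gap_composition(sequence: str, g: int) -> Dict[Tuple[str, str], float]:
    """Frequency of each reduced-alphabet pair whose residues are g positions apart."""
    # Residues outside the 20 standard amino acids are skipped in this sketch.
    reduced = [RESIDUE_TO_GROUP[r] for r in sequence.upper() if r in RESIDUE_TO_GROUP]
    n_pairs = len(reduced) - g - 1
    if g < 0 or n_pairs <= 0:
        raise ValueError("sequence too short for the requested gap")
    counts = Counter((reduced[i], reduced[i + g + 1]) for i in range(n_pairs))
    return {pair: count / n_pairs for pair, count in counts.items()}


features_g0 = reduced_g_gap_composition("AKAK", g=0)  # adjacent residues
features_g1 = reduced_g_gap_composition("AKAK", g=1)  # one residue in between
```

Only observed pairs are returned here; a fixed-length input for a classifier such as a support vector machine would enumerate all 36 ordered group pairs and fill unobserved ones with zero.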
Affiliation(s)
- Quang H. Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Hoang V. Tran
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Binh P. Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Trang T. T. Do
  - School of Innovation, Design and Technology, Wellington Institute of Technology, 21 Kensington Avenue, Lower Hutt 5012, New Zealand
8
New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:genes13020284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022] [Open Access]
Abstract
De novo genes are novel genes which emerge from non-coding DNA. Until now, little is known about de novo genes’ properties, correlated to their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5′ Untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlates with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promotor motifs, while intergenic and intronic genes are more enriched in enhancers, even if the TATA motif is most commonly found upstream in these genes. Intergenic and intronic 5′ UTRs of proto-genes have a lower potential to stabilise mRNA structures than exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impacts these properties.
9
Yan W, Deng XW, Yang C, Tang X. The Genome-Wide EMS Mutagenesis Bias Correlates With Sequence Context and Chromatin Structure in Rice. Frontiers in Plant Science 2021; 12:579675. [PMID: 33841451 PMCID: PMC8025102 DOI: 10.3389/fpls.2021.579675] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/06/2020] [Accepted: 02/17/2021] [Indexed: 06/12/2023]
Abstract
Ethyl methanesulfonate (EMS) is a chemical mutagen believed to mainly induce G/C to A/T transitions randomly in plant genomes. However, mutant screening for phenotypes often gets multiple alleles for one gene but no mutant for other genes. We investigated the potential EMS mutagenesis bias and the possible correlations with sequence context and chromatin structure using the whole genome resequencing data collected from 52 rice EMS mutants. We defined the EMS-induced single nucleotide polymorphic sites (SNPs) and explored the genomic factors associated with EMS mutagenesis bias. Compared with natural SNPs presented in the Rice3K project, EMS showed a preference on G/C sites with flanking sequences also higher in GC contents. The composition of local dinucleotides and trinucleotides was also associated with the efficiency of EMS mutagenesis. The biased distribution of EMS-induced SNPs was positively correlated with CpG numbers, transposable element contents, and repressive epigenetic markers but negatively with gene expression, the euchromatin marker DNase I hypersensitive sites, and active epigenetic markers, suggesting that sequence context and chromatin structure might correlate with the efficiency of EMS mutagenesis. Exploring the genome-wide features of EMS mutagenesis and correlations with epigenetic modifications will help in the understanding of DNA repair mechanism.
Affiliation(s)
- Wei Yan
  - Guangdong Provincial Key Laboratory of Biotechnology for Plant Development, School of Life Sciences, South China Normal University, Guangzhou, China
  - Shenzhen Institute of Molecular Crop Design, Shenzhen, China
- Xing Wang Deng
  - Shenzhen Institute of Molecular Crop Design, Shenzhen, China
  - School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Chengwei Yang
  - Guangdong Provincial Key Laboratory of Biotechnology for Plant Development, School of Life Sciences, South China Normal University, Guangzhou, China
- Xiaoyan Tang
  - Guangdong Provincial Key Laboratory of Biotechnology for Plant Development, School of Life Sciences, South China Normal University, Guangzhou, China
  - Shenzhen Institute of Molecular Crop Design, Shenzhen, China
10
Morenikeji OB, Strutton E, Wallace M, Bernard K, Yip E, Thomas BN. Dissecting Transcription Factor-Target Interaction in Bovine Coronavirus Infection. Microorganisms 2020; 8:E1323. [PMID: 32872640 PMCID: PMC7564962 DOI: 10.3390/microorganisms8091323] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/21/2020] [Revised: 08/27/2020] [Accepted: 08/27/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
Coronaviruses are RNA viruses that cause significant disease within many species, including cattle. Bovine coronavirus (BCoV) infects cattle and wild ruminants, both as a respiratory and enteric pathogen, and possesses a significant economic threat to the cattle industry. Transcription factors are proteins that activate or inhibit transcription through DNA binding and have become new targets for disease therapies. This study utilized in silico tools to identify potential transcription factors that can serve as biomarkers for regulation of BCoV pathogenesis in cattle, both for testing and treatment. A total of 11 genes were identified as significantly expressed during BCoV infection through literature searches and functional analyses. Eleven transcription factors were predicted to target those genes (AREB6, YY1, LMO2, C-Rel, NKX2-5, E47, RORAlpha1, HLF, E4BP4, ARNT, CREB). Function, network, and phylogenetic analyses established the significance of many transcription factors within the immune response. This study establishes new information on the transcription factors and genes related to host-pathogen interactome in BCoV infection, particularly transcription factors YY1, AREB6, LMO2, and NKX2, which appear to have strong potential as diagnostic markers, and YY1 as a potential target for drug therapies.
Affiliation(s)
- Olanrewaju B. Morenikeji
  - Department of Biology, Hamilton College, Clinton, NY 13323, USA; (O.B.M.); (E.S.); (M.W.); (K.B.); (E.Y.)
- Ellis Strutton
  - Department of Biology, Hamilton College, Clinton, NY 13323, USA; (O.B.M.); (E.S.); (M.W.); (K.B.); (E.Y.)
- Madeleine Wallace
  - Department of Biology, Hamilton College, Clinton, NY 13323, USA; (O.B.M.); (E.S.); (M.W.); (K.B.); (E.Y.)
- Kahleel Bernard
  - Department of Biology, Hamilton College, Clinton, NY 13323, USA; (O.B.M.); (E.S.); (M.W.); (K.B.); (E.Y.)
- Elaine Yip
  - Department of Biology, Hamilton College, Clinton, NY 13323, USA; (O.B.M.); (E.S.); (M.W.); (K.B.); (E.Y.)
- Bolaji N. Thomas
  - Department of Biomedical Sciences, College of Health Sciences and Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
11
Ru X, Cao P, Li L, Zou Q. Selecting Essential MicroRNAs Using a Novel Voting Method. Molecular Therapy. Nucleic Acids 2019; 18:16-23. [PMID: 31479921 PMCID: PMC6727015 DOI: 10.1016/j.omtn.2019.07.019] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/28/2019] [Revised: 06/20/2019] [Accepted: 07/08/2019] [Indexed: 02/06/2023]
Abstract
Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
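The abstract does not spell out the voting rule, so the sketch below uses plain majority voting as a stand-in rather than the authors' method. It assumes five binary classifiers, one per feature-extraction method named above, each emitting a 0/1 label per miRNA. The prediction values are made-up toy data, and `PseDSSPC` is a hypothetical shorthand key for the pseudo-distance structure status pair composition.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Combine per-model label lists into one label per sample by majority."""
    model_outputs = list(predictions.values())
    if not model_outputs:
        raise ValueError("no model predictions supplied")
    n_samples = len(model_outputs[0])
    if any(len(output) != n_samples for output in model_outputs):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for sample_votes in zip(*model_outputs):
        # With an odd number of binary classifiers there are no ties.
        label, _ = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Toy predictions (1 = essential miRNA, 0 = non-essential) for four samples.
predictions = {
    "Mismatch": [1, 0, 1, 0],
    "PseDSSPC": [1, 0, 0, 0],
    "Subsequence": [0, 1, 1, 0],
    "Kmer": [1, 0, 1, 1],
    "Triplet": [0, 0, 0, 1],
}
ensemble_labels = majority_vote(predictions)
```

Using an odd number of models keeps the binary vote free of ties, which is one practical reason an ensemble of five is convenient.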
Affiliation(s)
- Xiaoqing Ru
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Peigang Cao
  - Department of Cardiology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Lihong Li
  - School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
12
LncRNA REG1CP promotes tumorigenesis through an enhancer complex to recruit FANCJ helicase for REG3A transcription. Nat Commun 2019; 10:5334. [PMID: 31767869 PMCID: PMC6877513 DOI: 10.1038/s41467-019-13313-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 10/11/2018] [Accepted: 11/01/2019] [Indexed: 01/03/2023] [Open Access]
Abstract
Protein products of the regenerating islet-derived (REG) gene family are important regulators of many cellular processes. Here we functionally characterise a non-protein coding product of the family, the long noncoding RNA (lncRNA) REG1CP that is transcribed from a DNA fragment at the family locus previously thought to be a pseudogene. REG1CP forms an RNA–DNA triplex with a homopurine stretch at the distal promoter of the REG3A gene, through which the DNA helicase FANCJ is tethered to the core promoter of REG3A where it unwinds double stranded DNA and facilitates a permissive state for glucocorticoid receptor α (GRα)-mediated REG3A transcription. As such, REG1CP promotes cancer cell proliferation and tumorigenicity and its upregulation is associated with poor outcome of patients. REG1CP is also transcriptionally inducible by GRα, indicative of feedforward regulation. These results reveal the function and regulation of REG1CP and suggest that REG1CP may constitute a target for cancer treatment. The regenerating islet-derived (REG) protein family suppresses cell death and promotes cell proliferation. Here the authors report that the lncRNA REG1CP forms an RNA–DNA triplex at the promoter of REG3A gene to increase its expression.
13
Budden DM, Hurley DG, Crampin EJ. Modelling the conditional regulatory activity of methylated and bivalent promoters. Epigenetics Chromatin 2015; 8:21. [PMID: 26097508 PMCID: PMC4474576 DOI: 10.1186/s13072-015-0013-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional interactions that emerge when genes are subject to different regulatory mechanisms. Although chromatin immunoprecipitation-based histone modification data are often used as proxies for chromatin accessibility, the association between these variables and expression often depends upon the presence of other epigenetic markers (e.g. DNA methylation or histone variants). These conditional interactions are poorly handled by previous predictive models and reduce the reliability of downstream biological inference.
RESULTS: We have previously demonstrated that integrating both transcription factor and histone modification data within a single predictive model is rendered ineffective by their statistical redundancy. In this study, we evaluate four proposed methods for quantifying gene-level DNA methylation levels and demonstrate that inclusion of these data in predictive modelling frameworks is also subject to this critical limitation in data integration. Based on the hypothesis that statistical redundancy in epigenetic data is caused by conditional regulatory interactions within a dynamic chromatin context, we construct a new gene expression model which is the first to improve prediction accuracy by unsupervised identification of latent regulatory classes. We show that DNA methylation and H2A.Z histone variant data can be interpreted in this way to identify and explore the signatures of silenced and bivalent promoters, substantially improving genome-wide predictions of mRNA transcript abundance and downstream biological inference across multiple cell lines.
CONCLUSIONS: Previous models of gene expression have been applied successfully to several important problems in molecular biology, including the discovery of transcription factor roles, identification of regulatory elements responsible for differential expression patterns and comparative analysis of the transcriptome across distant species. Our analysis supports our hypothesis that statistical redundancy in epigenetic data is partially due to conditional relationships between these regulators and gene expression levels. This analysis provides insight into the heterogeneous roles of H3K4me3 and H3K27me3 in the presence of the H2A.Z histone variant (implicated in cancer progression) and how these signatures change during lineage commitment and carcinogenesis.
Affiliation(s)
- David M Budden
  - Systems Biology Laboratory, Melbourne School of Engineering, The University of Melbourne, 3010 Parkville, Australia
  - NICTA Victoria Research Laboratory, The University of Melbourne, 3010 Parkville, Australia
- Daniel G Hurley
  - Systems Biology Laboratory, Melbourne School of Engineering, The University of Melbourne, 3010 Parkville, Australia
- Edmund J Crampin
  - Systems Biology Laboratory, Melbourne School of Engineering, The University of Melbourne, 3010 Parkville, Australia
  - NICTA Victoria Research Laboratory, The University of Melbourne, 3010 Parkville, Australia
  - ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, 3010 Parkville, Australia
  - Department of Mathematics and Statistics, The University of Melbourne, 3010 Parkville, Australia
  - School of Medicine, The University of Melbourne, 3010 Parkville, Australia