1
Gonzalez-Avalos E, Onodera A, Samaniego-Castruita D, Rao A, Ay F. Predicting gene expression state and prioritizing putative enhancers using 5hmC signal. Genome Biol 2024; 25:142. [PMID: 38825692 PMCID: PMC11145787 DOI: 10.1186/s13059-024-03273-z] [Received: 04/03/2023] [Accepted: 05/11/2024] [Indexed: 06/04/2024]
Abstract
BACKGROUND Like its parent base 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC) is a direct epigenetic modification of cytosines in the context of CpG dinucleotides. 5hmC is the most abundant oxidized form of 5mC, generated through the action of TET dioxygenases at gene bodies of actively transcribed genes and at active or lineage-specific enhancers. Although such enrichments have been reported for 5hmC, predictive models of gene expression state or of putative regulatory regions based on 5hmC have not yet been developed. RESULTS Here, using only 5hmC enrichment in genic regions and their vicinity, we develop neural network models that predict gene expression state across 49 cell types. We show that our deep neural network models distinguish high from low expression state using only 5hmC levels and that these predictive models generalize to unseen cell types. Further, to leverage 5hmC signal in distal enhancers for expression prediction, we employ an Activity-by-Contact model and also develop a graph convolutional neural network model, both of which use Hi-C data and 5hmC enrichment to prioritize enhancer-promoter links. These approaches identify known and novel putative enhancers for key genes in multiple immune cell subsets. CONCLUSIONS Our work highlights the importance of 5hmC in gene regulation through proximal and distal mechanisms and provides a framework to link it to genome function. With recent advances in 6-letter DNA sequencing by short- and long-read techniques, profiling of 5mC and 5hmC may become routine in the near future, providing a broad range of applications for the methods developed here.
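The Activity-by-Contact (ABC) model mentioned above scores each candidate element for a gene as its activity multiplied by its contact frequency with the gene's promoter, divided by the sum of that product over all candidate elements for the same gene. The Python sketch below assumes, as the abstract suggests, that 5hmC enrichment serves as the activity term and Hi-C contact frequency as the contact term; the function name `abc_scores` and the toy values are hypothetical, and this is not the authors' code.

```python
from typing import Dict


def abc_scores(activity: Dict[str, float], contact: Dict[str, float]) -> Dict[str, float]:
    """Activity-by-Contact scores for the candidate elements of one gene.

    activity: element id -> activity (assumed here to be 5hmC enrichment)
    contact:  element id -> Hi-C contact frequency with the gene's promoter
    Elements missing from `contact` are treated as having zero contact.
    """
    # Numerator of the ABC score: activity x contact for every candidate element.
    products = {e: a * contact.get(e, 0.0) for e, a in activity.items()}
    total = sum(products.values())
    if total == 0:
        # No element has both activity and contact: nothing to prioritize.
        return {e: 0.0 for e in products}
    # Normalize so the scores for this gene sum to 1.
    return {e: p / total for e, p in products.items()}


if __name__ == "__main__":
    scores = abc_scores({"enh_a": 6.0, "enh_b": 2.0}, {"enh_a": 0.5, "enh_b": 0.5})
    print(scores)  # {'enh_a': 0.75, 'enh_b': 0.25}
```

Elements would then be ranked by score, with links above some cutoff kept as putative enhancer-promoter pairs; the cutoff is left to the analyst in this sketch.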
Affiliation(s)
- Edahi Gonzalez-Avalos
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Atsushi Onodera
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
- Department of Immunology, Graduate School of Medicine, Chiba University, Chiba, 260-8670, Japan
- Daniela Samaniego-Castruita
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
- Biological Sciences Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Anjana Rao
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA, 92093, USA
- Sanford Consortium for Regenerative Medicine, La Jolla, CA, 92093, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
- Ferhat Ay
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
2
Vishnevsky OV, Bocharnikov AV, Ignatieva EV. Peak Scores Significantly Depend on the Relationships between Contextual Signals in ChIP-Seq Peaks. Int J Mol Sci 2024; 25:1011. [PMID: 38256085 PMCID: PMC10816497 DOI: 10.3390/ijms25021011] [Received: 10/31/2023] [Revised: 12/13/2023] [Accepted: 01/09/2024] [Indexed: 01/24/2024]
Abstract
Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) is a central genome-wide method for in vivo analyses of DNA-protein interactions in various cellular conditions. Numerous studies have demonstrated the complex contextual organization of ChIP-seq peak sequences and the presence of binding sites for transcription factors in them. We assessed the dependence of the ChIP-seq peak score on the presence of different contextual signals in the peak sequences by analyzing these sequences from several ChIP-seq experiments using our fully enumerative GPU-based de novo motif discovery method, Argo_CUDA. The analysis revealed sets of significant IUPAC motifs corresponding to the binding sites of the target and partner transcription factors. For these ChIP-seq experiments, multiple regression models were constructed, demonstrating a significant dependence of the peak scores on the presence in the peak sequences not only of highly significant target motifs but also of less significant motifs corresponding to the binding sites of partner transcription factors. A significant correlation was shown between the presence of target FOXA2 motifs and partner HNF4G motifs; this finding is supported by experimental evidence in the scientific literature and demonstrates the important contribution of partner transcription factors to the binding of the target transcription factor to DNA and, consequently, to the peak score.
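The multiple regression described above can be illustrated as an ordinary least-squares fit of peak score on binary motif-presence indicators (target motif present, partner motif present). The pure-Python sketch below assumes that encoding; the function name `ols_fit` and the toy data are hypothetical, and the abstract does not specify the exact model terms or how significance was assessed.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept.

    X: one row per peak, one column per motif-presence indicator (0 or 1).
    y: peak scores.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


if __name__ == "__main__":
    # Columns: target motif present, partner motif present (toy data).
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1]]
    y = [10.0, 15.0, 12.0, 17.0, 15.0, 12.0]
    print(ols_fit(X, y))  # approximately [10.0, 5.0, 2.0]
```

In the toy data, the partner-motif coefficient (about 2.0) is the estimated increase in peak score when the partner motif is present with the target motif held fixed; a real analysis would also report standard errors and p-values.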
Affiliation(s)
- Oleg V. Vishnevsky
- Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
- Andrey V. Bocharnikov
- Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
- Elena V. Ignatieva
- Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
- Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
3
Neikes HK, Kliza KW, Gräwe C, Wester RA, Jansen PWTC, Lamers LA, Baltissen MP, van Heeringen SJ, Logie C, Teichmann SA, Lindeboom RGH, Vermeulen M. Quantification of absolute transcription factor binding affinities in the native chromatin context using BANC-seq. Nat Biotechnol 2023; 41:1801-1809. [PMID: 36973556 DOI: 10.1038/s41587-023-01715-w] [Received: 03/09/2022] [Accepted: 02/16/2023] [Indexed: 03/29/2023]
Abstract
Transcription factor binding across the genome is regulated by DNA sequence and chromatin features. However, it is not yet possible to quantify the impact of chromatin context on transcription factor binding affinities. Here, we report a method called binding affinities to native chromatin by sequencing (BANC-seq) to determine absolute apparent binding affinities of transcription factors to native DNA across the genome. In BANC-seq, a concentration range of a tagged transcription factor is added to isolated nuclei. Concentration-dependent binding is then measured per sample to quantify apparent binding affinities across the genome. BANC-seq adds a quantitative dimension to transcription factor biology, which enables stratification of genomic targets based on transcription factor concentration and prediction of transcription factor binding sites under non-physiological conditions, such as disease-associated overexpression of (onco)genes. Notably, whereas consensus DNA binding motifs for transcription factors are important to establish high-affinity binding sites, these motifs are not always strictly required to generate nanomolar-affinity interactions in the genome.
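The concentration series described above yields an apparent affinity once a binding curve is fitted to each genomic region. As a hedged sketch, the code below assumes a simple one-site saturation model, signal = Bmax · C / (Kd + C), fitted by a grid search over Kd; the model choice, the function name `fit_apparent_kd`, and the grid bounds are illustrative assumptions rather than the published BANC-seq analysis.

```python
import math
from typing import Sequence, Tuple


def fit_apparent_kd(
    conc_nM: Sequence[float],
    signal: Sequence[float],
    kd_min: float = 0.01,
    kd_max: float = 1e5,
    steps: int = 2001,
) -> Tuple[float, float]:
    """Fit signal = bmax * C / (Kd + C) for one genomic region.

    Kd is found by grid search over log10(Kd); for each candidate Kd the
    least-squares bmax has a closed form. Returns (kd_nM, bmax).
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best_sse, best_kd, best_bmax = float("inf"), kd_min, 0.0
    for i in range(steps):
        kd = 10 ** (lo + (hi - lo) * i / (steps - 1))
        occ = [c / (kd + c) for c in conc_nM]  # fractional occupancy per concentration
        denom = sum(f * f for f in occ)
        if denom == 0:
            continue
        bmax = sum(f * s for f, s in zip(occ, signal)) / denom
        sse = sum((s - bmax * f) ** 2 for f, s in zip(occ, signal))
        if sse < best_sse:
            best_sse, best_kd, best_bmax = sse, kd, bmax
    return best_kd, best_bmax


if __name__ == "__main__":
    conc = [1, 5, 10, 25, 50, 100, 250, 500, 1000]
    sig = [100 * c / (50 + c) for c in conc]  # synthetic curve with Kd = 50 nM
    kd, bmax = fit_apparent_kd(conc, sig)
    print(f"apparent Kd ~ {kd:.1f} nM, Bmax ~ {bmax:.1f}")
```

Solving Bmax in closed form keeps the search one-dimensional; a log-spaced grid suits affinities that can span several orders of magnitude.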
Affiliation(s)
- Hannah K Neikes
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Katarzyna W Kliza
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Cathrin Gräwe
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Roelof A Wester
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Pascal W T C Jansen
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Lieke A Lamers
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Marijke P Baltissen
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Simon J van Heeringen
- Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, the Netherlands
- Colin Logie
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, the Netherlands
- Rik G H Lindeboom
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Michiel Vermeulen
- Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- The Netherlands Cancer Institute, Amsterdam, the Netherlands
4
Pianfetti E, Lovino M, Ficarra E, Martignetti L. MiREx: mRNA levels prediction from gene sequence and miRNA target knowledge. BMC Bioinformatics 2023; 24:443. [PMID: 37993778 PMCID: PMC10666312 DOI: 10.1186/s12859-023-05560-1] [Received: 06/30/2023] [Accepted: 11/06/2023] [Indexed: 11/24/2023]
Abstract
Messenger RNA (mRNA) has an essential role in protein production. Predicting mRNA expression levels accurately is crucial for understanding gene regulation, and various models (statistical and neural network-based) have been developed for this purpose. Some models predict mRNA expression levels from the DNA sequence together with gene features (e.g., number of exons/introns, gene length). Other models include as predictive features information about long-range regulatory elements (e.g., enhancers/silencers) and transcriptional regulators, such as transcription factors (TFs) and small RNAs (e.g., microRNAs, miRNAs). Recently, a convolutional neural network (CNN) model called Xpresso was proposed for mRNA expression level prediction, leveraging the promoter sequence and mRNA half-life features (gene features). To push mRNA level prediction forward, we present miREx, a CNN-based tool that includes information about miRNA targets and expression levels in the model. Because each miRNA can target specific genes, the model exploits this information to guide the learning process. Not all miRNAs are included, only a selected subset with the highest impact on the model. miREx has been evaluated on four cancer primary sites from the Genomic Data Commons (GDC) database: lung, kidney, breast, and corpus uteri. Results show that mRNA level prediction benefits from selected miRNA target and expression information. Future developments of the model could include other transcriptional regulators or training on proteomics data to infer protein levels.
Affiliation(s)
- Elena Pianfetti
- Department of Engineering, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41225, Italy
- Marta Lovino
- Department of Engineering, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41225, Italy
- Elisa Ficarra
- Department of Engineering, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41225, Italy
- Loredana Martignetti
- Institut Curie, Rue d'Ulm 26, Paris, 75005, France
- Inserm U900, Paris, France
- CBIO-Centre for Computational Biology, Paris, France
- PSL Research University, Paris, France
5
Li L, Bao H, Xu Y, Yang W, Zhang Z, Ma K, Zhang K, Zhou J, Gong Y, Ci W, Gong K. Preliminary Study of Whole-Genome Bisulfite Sequencing and Transcriptome Sequencing in VHL Disease-Associated ccRCC. Mol Diagn Ther 2023; 27:741-752. [PMID: 37587253 DOI: 10.1007/s40291-023-00663-0] [Accepted: 07/02/2023] [Indexed: 08/18/2023]
Abstract
BACKGROUND Von Hippel-Lindau (VHL) disease is an autosomal dominant hereditary tumor syndrome with an incidence of approximately 1/36,000. VHL disease-associated clear cell renal cell carcinoma (ccRCC) is the most common congenital RCC. Although recent advances in treating RCC have improved the long-term prognosis of patients with VHL disease, kidney cancer is still the leading cause of death in these patients. Therefore, finding new targets for diagnosing and treating VHL disease-associated ccRCC remains essential. METHODS In this study, we collected matched tumor and normal tissue samples from 25 patients with VHL disease-associated ccRCC who were diagnosed and surgically treated in the Department of Urology, Peking University First Hospital. After screening, we performed whole-genome bisulfite sequencing (WGBS) on 23 pairs of tissues and RNA-seq on 6 pairs of tissues. We also compared the VHL disease-associated ccRCC transcriptome data with sporadic ccRCC transcriptome data from The Cancer Genome Atlas (TCGA) public database. RESULTS We found that the methylation level of VHL disease-associated ccRCC tumor tissues was significantly lower than that of normal tissues. Compared with normal tissues, the tumor tissues showed copy number alterations, namely 3p loss and 5q and 7q gains. We integrated RNA-seq and WGBS data to identify methylation-associated candidate genes in VHL disease-associated ccRCC; this revealed 124 hypermethylated and downregulated genes and 245 hypomethylated and upregulated genes. Comparing the VHL disease-associated ccRCC transcriptome data with the sporadic ccRCC transcriptome data from TCGA showed that the major pathways enriched among differentially expressed genes differed between the two.
CONCLUSIONS Our study mapped the multiomic landscape of copy number variation, methylation, and mRNA level changes in tumor and normal tissues of clear cell renal cell carcinoma with VHL syndrome, providing a solid foundation for mechanistic studies, biomarker screening, and therapeutic target discovery in clear cell renal cell carcinoma.
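The WGBS and RNA-seq integration described above amounts to selecting genes whose methylation and expression change in opposite directions. A minimal Python sketch, assuming per-gene methylation differences (tumor minus normal) and log2 fold changes have already been computed; the cutoffs, the function name `anticorrelated_genes`, and the toy values are hypothetical, since the abstract does not state the thresholds or the genomic regions over which methylation was summarized.

```python
from typing import Dict, List, Tuple


def anticorrelated_genes(
    delta_meth: Dict[str, float],
    log2fc: Dict[str, float],
    meth_cut: float = 0.2,
    fc_cut: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Return (hypermethylated & downregulated, hypomethylated & upregulated) genes.

    delta_meth: tumor minus normal methylation level (fraction, 0-1 scale)
    log2fc:     tumor vs normal log2 fold change in expression
    Cutoffs are illustrative placeholders, not values from the study.
    """
    hyper_down: List[str] = []
    hypo_up: List[str] = []
    # Only genes measured in both assays can be integrated.
    for gene in sorted(delta_meth.keys() & log2fc.keys()):
        dm, fc = delta_meth[gene], log2fc[gene]
        if dm >= meth_cut and fc <= -fc_cut:
            hyper_down.append(gene)
        elif dm <= -meth_cut and fc >= fc_cut:
            hypo_up.append(gene)
    return hyper_down, hypo_up


if __name__ == "__main__":
    dm = {"A": 0.30, "B": -0.25, "C": 0.05, "D": 0.40}
    fc = {"A": -1.5, "B": 2.0, "C": -3.0, "D": 1.2}
    print(anticorrelated_genes(dm, fc))  # (['A'], ['B'])
```

Gene D is hypermethylated yet upregulated, so it falls into neither group; genes that change in only one assay (C) are also excluded.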
Affiliation(s)
- Lei Li
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Hainan Bao
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, and China National Center for Bioinformation, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yawei Xu
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Wuping Yang
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Zedan Zhang
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Kaifang Ma
- Department of Urology, Beijing Tongren Hospital, Capital Medical University, No. 1 Dongjiaomingxiang Street, Dongcheng District, Beijing, 100730, China
- Kenan Zhang
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Jingcheng Zhou
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Yanqing Gong
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
- Weimin Ci
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, and China National Center for Bioinformation, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, China
- Kan Gong
- Department of Urology, Peking University First Hospital, Beijing, 100034, China
- Institution of Urology, Peking University, Beijing, 100034, China
- Beijing Key Laboratory of Urogenital Diseases (Male) Molecular Diagnosis and Treatment Center, Beijing, 100034, China
- National Urological Cancer Center, Beijing, 100034, China
6
Hasib RA, Ali MC, Rahman MH, Ahmed S, Sultana S, Summa SZ, Shimu MSS, Afrin Z, Jamal MAHM. Integrated gene expression profiling and functional enrichment analyses to discover biomarkers and pathways associated with Guillain-Barré syndrome and autism spectrum disorder to identify new therapeutic targets. J Biomol Struct Dyn 2023:1-23. [PMID: 37776011 DOI: 10.1080/07391102.2023.2262586] [Received: 03/15/2023] [Accepted: 09/17/2023] [Indexed: 10/01/2023]
Abstract
Guillain-Barré syndrome (GBS) is one of the most prominent acute immune-mediated peripheral neuropathies, while autism spectrum disorders (ASD) are a group of heterogeneous neurodevelopmental disorders. The mechanisms underlying the neuropathophysiology of these disorders remain unclear, and even after recent breakthroughs in molecular biology, the link between GBS and ASD remains a mystery. Therefore, we applied well-established bioinformatic techniques to identify potential biomarkers and drug candidates for GBS and ASD. Seventeen common differentially expressed genes (DEGs) were identified for these two disorders, which guided the rest of the research. These common genes were used to construct the protein-protein interaction (PPI) network and to identify pathways associated with both disorders. Based on the PPI network, hub gene and module analyses identified two common DEGs, CXCL9 and CXCL10, which were central to predicting the top drug candidates. Furthermore, TF-gene and TF-miRNA coregulatory networks were built to detect regulatory biomolecules. Among the drug candidates, imatinib had the highest docking and MM-GBSA scores with the well-known chemokine receptor CXCR3 and remained stable during a 100 ns molecular dynamics simulation, as validated by principal component analysis and a dynamic cross-correlation map. This study predicted the gene-based disease network for GBS and ASD and suggested prospective drug candidates. However, more in-depth research is required for clinical validation. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rizone Al Hasib
- Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
- Laboratory of Medical and Environmental Biotechnology, Islamic University, Kushtia, Bangladesh
- Md Chayan Ali
- Department of Biochemistry, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Md Habibur Rahman
- Department of Computer Science and Engineering, Islamic University, Kushtia, Bangladesh
- Center for Advanced Bioinformatics and Artificial Intelligent Research, Islamic University, Kushtia, Bangladesh
- Sabbir Ahmed
- Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
- Shaharin Sultana
- Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
- Laboratory of Medical and Environmental Biotechnology, Islamic University, Kushtia, Bangladesh
- Sadia Zannat Summa
- Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
- Laboratory of Medical and Environmental Biotechnology, Islamic University, Kushtia, Bangladesh
- Zinia Afrin
- Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
- Mohammad Abu Hena Mostofa Jamal
- Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
- Laboratory of Medical and Environmental Biotechnology, Islamic University, Kushtia, Bangladesh
7
Ochoa S, Hernández-Lemus E. Molecular mechanisms of multi-omic regulation in breast cancer. Front Oncol 2023; 13:1148861. [PMID: 37564937 PMCID: PMC10411627 DOI: 10.3389/fonc.2023.1148861] [Received: 01/20/2023] [Accepted: 07/05/2023] [Indexed: 08/12/2023]
Abstract
Breast cancer is a complex disease shaped by the concurrent influence of multiple genetic and environmental factors. Recent advances in genomics and other high-throughput biomolecular techniques (-omics) have provided numerous insights into the molecular mechanisms underlying breast cancer development and progression. A number of these mechanisms involve multiple layers of regulation. In this review, we summarize current knowledge on the role of multiple omics layers in the regulation of breast cancer, including the effects of DNA methylation, non-coding RNA, and other epigenomic changes. We comment on how integrating such diverse mechanisms is envisioned as key to a more comprehensive understanding of breast carcinogenesis and cancer biology, with relevance to prognostics, diagnostics and therapeutics. We also discuss the potential clinical implications of these findings and highlight areas for future research. Overall, our understanding of the molecular mechanisms of multi-omic regulation in breast cancer is rapidly increasing and has the potential to inform the development of novel therapeutic approaches for this disease.
Affiliation(s)
- Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
8
Cai X, Shi W, Lian J, Zhang G, Cai Y, Zhu L. Characterization of immune landscape and development of a novel N7-methylguanine-related gene signature to aid therapy in recurrent aphthous stomatitis. Inflamm Res 2023; 72:133-148. [PMID: 36352034 DOI: 10.1007/s00011-022-01665-0] [Received: 07/13/2022] [Revised: 10/24/2022] [Accepted: 10/26/2022] [Indexed: 11/11/2022]
Abstract
OBJECTIVES Recurrent aphthous stomatitis (RAS) is the most common inflammatory disease of the oral mucosa; it impairs quality of life and can even lead to tumors in susceptible populations. N7-methylguanine (m7G) plays a vital role in various cellular activities but has not yet been investigated in RAS. We aimed to characterize the immune landscape, construct an m7G-related gene signature, and investigate candidate drugs and gene-disease associations to aid therapy for RAS. METHODS m7G-related differentially expressed genes (DEGs) were screened. We outlined the immune microenvironment and studied the correlations between the m7G-related DEGs and immune cells/pathways. We performed functional enrichment analyses and constructed the protein-protein interaction (PPI) and multifactor regulatory networks in RAS. The m7G-related hub genes were extracted to formulate the corresponding m7G predictive signature. RESULTS We obtained 11 m7G-related DEGs and characterized a comprehensive immune infiltration landscape, which indicated several immune markers as possible immunotherapeutic targets. The PPI and multifactor regulatory networks were constructed and 4 hub genes (DDX58, IFI27, IFIT5, and PML) were identified, followed by validation of the corresponding m7G predictive signature for RAS. GO and KEGG analyses revealed the involvement of JAK-STAT and several immune-related pathways. Finally, we suggested candidate drugs and gene-disease associations for potential medical interventions in RAS. CONCLUSIONS The present study characterized a comprehensive immune infiltration landscape and suggested that m7G plays a vital role in RAS through immune-related pathways. This study provides new insight for future investigation of the mechanisms and therapeutic strategies for RAS.
Affiliation(s)
- Xueyao Cai
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, 639 Zhi-Zao-Ju Road, Huangpu District, Shanghai, 200011, China
- Wenjun Shi
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, 639 Zhi-Zao-Ju Road, Huangpu District, Shanghai, 200011, China
- Jie Lian
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, 639 Zhi-Zao-Ju Road, Huangpu District, Shanghai, 200011, China
- Guoyou Zhang
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, 639 Zhi-Zao-Ju Road, Huangpu District, Shanghai, 200011, China
- Yuchen Cai
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, 639 Zhi-Zao-Ju Road, Huangpu District, Shanghai, 200011, China
- Lian Zhu
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, 639 Zhi-Zao-Ju Road, Huangpu District, Shanghai, 200011, China
9
Nikolenko JV, Fursova NA, Mazina MY, Vorobyeva NE, Krasnov AN. The Drosophila CG9890 Protein is Involved in the Regulation of Ecdysone-Dependent Transcription. Mol Biol 2022. [DOI: 10.1134/s0026893322040082] [Indexed: 11/22/2022]
10
Pan-cancer identification of the relationship of metabolism-related differentially expressed transcription regulation with non-differentially expressed target genes via a gated recurrent unit network. Comput Biol Med 2022; 148:105883. [PMID: 35878490 DOI: 10.1016/j.compbiomed.2022.105883] [Received: 05/06/2022] [Revised: 07/10/2022] [Accepted: 07/16/2022] [Indexed: 11/20/2022]
Abstract
The transcriptome describes the expression of all genes in a sample. Most studies have investigated the differential patterns or discriminative power of transcript expression levels. In this study, we hypothesized that the quantitative correlations between the expression levels of transcription factors (TFs) and their regulated target genes (mRNAs) offer a novel view of healthy status, and that a disease sample exhibits a differential landscape (mqTrans) of transcription regulation compared with healthy status. We formulated the quantitative transcription regulation relationships of metabolism-related genes as a multi-input multi-output regression model using a gated recurrent unit (GRU) network. The GRU model was trained on healthy blood transcriptomes, and the expression levels of mRNAs were predicted from those of the TFs. The mqTrans feature of a gene was defined as the difference between its predicted and actual expression levels. A pan-cancer investigation of differentially expressed mqTrans features was conducted between early- and late-stage cancers in 26 cancer types from The Cancer Genome Atlas database. This study focused on differentially expressed mqTrans features whose genes did not show differential expression in their actual expression levels. These genes could not be detected by conventional differential expression analysis. Such dark biomarkers are worthy of further wet-lab investigation. The experimental data also showed that the proposed mqTrans features improved the classification of early- versus late-stage samples for some cancer types. Thus, mqTrans features serve as a complementary view of transcriptomes, an omics data type with mature high-throughput production technologies and abundant public resources.
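The mqTrans definition above reduces to a per-gene residual between TF-based predicted expression and observed expression. A minimal Python sketch, assuming the model predictions are already available as a dictionary; the function names `mqtrans` and `mqtrans_shift` and the toy values are hypothetical, and a real analysis would use a statistical test rather than a raw mean difference. The toy case mirrors the abstract's point: a gene whose actual expression does not change between stages can still show a shifted mqTrans value.

```python
from statistics import mean
from typing import Dict, List


def mqtrans(predicted: Dict[str, float], actual: Dict[str, float]) -> Dict[str, float]:
    """mqTrans value per gene for one sample: predicted minus actual expression.

    `predicted` is assumed to come from a model driven by TF expression
    (a GRU in the study); here it is simply taken as given.
    """
    return {g: predicted[g] - actual[g] for g in predicted if g in actual}


def mqtrans_shift(early: List[Dict[str, float]], late: List[Dict[str, float]], gene: str) -> float:
    """Mean mqTrans of `gene` in late-stage samples minus that in early-stage samples."""
    return mean(s[gene] for s in late) - mean(s[gene] for s in early)


if __name__ == "__main__":
    # Actual expression of G1 is identical across stages (5.0 and 5.5 in both),
    # but the TF-based prediction diverges in late-stage samples.
    early = [mqtrans({"G1": 5.0}, {"G1": 5.0}), mqtrans({"G1": 5.5}, {"G1": 5.5})]
    late = [mqtrans({"G1": 7.0}, {"G1": 5.0}), mqtrans({"G1": 7.5}, {"G1": 5.5})]
    print(mqtrans_shift(early, late, "G1"))  # 2.0
```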
11
Ripon Rouf ASM, Amin MA, Islam MK, Haque F, Ahmed KR, Rahman MA, Islam MZ, Kim B. Statistical Bioinformatics to Uncover the Underlying Biological Mechanisms That Linked Smoking with Type 2 Diabetes Patients Using Transcritpomic and GWAS Analysis. Molecules 2022; 27:molecules27144390. [PMID: 35889263 PMCID: PMC9323276 DOI: 10.3390/molecules27144390] [Received: 05/26/2022] [Revised: 06/30/2022] [Accepted: 07/04/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes (T2D) is a chronic metabolic disease defined by insulin resistance (impaired insulin sensitivity), decreased insulin production, and eventually failure of the pancreatic beta cells. Active smokers have a 30–40 percent higher risk of developing T2D. Moreover, T2D patients who actively smoke may gradually develop many complications. However, little research has yet addressed this issue. Hence, we have proposed a high-throughput, network-based quantitative pipeline employing statistical methods. Transcriptomic and GWAS data obtained from type 2 diabetes patients and active smokers were analysed. Differentially Expressed Genes (DEGs) were identified by comparing T2D patients' and smokers' tissue samples with those of healthy controls in transcriptomic gene expression datasets. We found 55 dysregulated genes shared between people with type 2 diabetes and those who smoked, 27 of which were upregulated and 28 of which were downregulated. These DEGs were functionally annotated to reveal the cell-associated molecular pathways and GO terms in which they are involved. Protein–protein interaction analysis was then conducted to discover hub proteins in the pathways. We also identified transcriptional and post-transcriptional regulators associated with T2D and smoking. In addition, we analysed GWAS data and found 57 common biomarker genes between T2D and smokers. The transcriptomic and GWAS results were then compared for more robust outcomes, identifying 1 significant common gene, 19 shared significant pathways, and 12 shared significant GO terms. Finally, we discovered protein–drug interactions for the identified biomarkers.
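The "shared DEGs" step can be sketched in two parts. First, call DEGs in each comparison using |log2FC| and adjusted p-value cutoffs. Then intersect the two DEG sets and split the shared genes into up- and downregulated groups. The abstract does not say how shared DEGs were defined. The cutoffs, the requirement that a gene change in the same direction in both comparisons, and the gene names below are all assumptions for illustration.

```python
def call_degs(log2fc, padj, fc_cut=1.0, p_cut=0.05):
    """Return {gene: "up" | "down"} for genes with |log2FC| >= fc_cut and adjusted p < p_cut."""
    degs = {}
    for gene, fc in log2fc.items():
        if abs(fc) >= fc_cut and padj[gene] < p_cut:
            degs[gene] = "up" if fc > 0 else "down"
    return degs


def shared_degs(degs_a, degs_b):
    """Genes called in both comparisons with the same direction, split into (up, down) lists."""
    common = [g for g in degs_a if g in degs_b and degs_a[g] == degs_b[g]]
    up = sorted(g for g in common if degs_a[g] == "up")
    down = sorted(g for g in common if degs_a[g] == "down")
    return up, down


# Hypothetical per-gene statistics for two case-vs-control comparisons.
t2d = call_degs({"G1": 1.5, "G2": -2.0, "G3": 0.2, "G4": 1.2},
                {"G1": 0.01, "G2": 0.001, "G3": 0.01, "G4": 0.2})
smk = call_degs({"G1": 2.0, "G2": -1.1, "G5": 3.0},
                {"G1": 0.02, "G2": 0.04, "G5": 0.001})
print(shared_degs(t2d, smk))  # (['G1'], ['G2'])
```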
Affiliation(s)
- Md. Al Amin
- Department of Computer Science & Engineering, Prime University, Dhaka 1216, Bangladesh
- Md. Khairul Islam
- Department of Information & Communication Technology, Islamic University, Kushtia 7003, Bangladesh
- Farzana Haque
- Department of Biotechnology and Genetic Engineering, Faculty of Biological Sciences, Islamic University, Kushtia 7003, Bangladesh
- Kazi Rejvee Ahmed
- Department of Pathology, College of Korean Medicine, Kyung Hee University, Hoegidong Dongdaemungu, Seoul 02447, Korea
- Md. Ataur Rahman
- Department of Pathology, College of Korean Medicine, Kyung Hee University, Hoegidong Dongdaemungu, Seoul 02447, Korea
- Korean Medicine-Based Drug Repositioning Cancer Research Center, College of Korean Medicine, Kyung Hee University, Seoul 02447, Korea
- Correspondence: M.A.R.; M.Z.I.; B.K.
- Md. Zahidul Islam
- Department of Information & Communication Technology, Islamic University, Kushtia 7003, Bangladesh
- Bonglee Kim
- Department of Pathology, College of Korean Medicine, Kyung Hee University, Hoegidong Dongdaemungu, Seoul 02447, Korea
- Korean Medicine-Based Drug Repositioning Cancer Research Center, College of Korean Medicine, Kyung Hee University, Seoul 02447, Korea
12
Tian H, He Y, Xue Y, Gao YQ. Expression regulation of genes is linked to their CpG density distributions around transcription start sites. Life Sci Alliance 2022; 5:e202101302. [PMID: 35580989 PMCID: PMC9113945 DOI: 10.26508/lsa.202101302] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/17/2021] [Revised: 05/07/2022] [Accepted: 05/09/2022] [Indexed: 11/24/2022]
Abstract
The CpG dinucleotide and its methylation play vital roles in gene regulation. Previous studies have divided genes into several categories based on CpG intensity around transcription start sites and found that housekeeping genes tend to have high CpG density, whereas tissue-specific genes are generally characterized by low CpG density. In this study, we investigated how the CpG density distribution of a gene affects its transcription and regulation pattern. Using a semi-supervised neural network that we designed, which incorporates data augmentation, we divided human genes into three clusters based on the CpG density distribution around the transcription start site; genes within each cluster share a similar CpG density distribution. Beyond sequence properties, these clusters exhibited distinctly different structural features, regulatory mechanisms, correlation patterns between expression level and CpG/TpG density, and variations in expression and epigenetic marks during tumorigenesis. For instance, the activation of cluster 3 genes relies more on 3D genome reorganization than that of cluster 1 and 2 genes, whereas cluster 2 genes show the strongest correlation between gene expression and H3K27me3. Genes exhibiting an uncoupled correlation between gene regulation and histone modifications are mainly in cluster 3. These results indicate that the usage of epigenetic marks in gene regulation is partially rooted in the sequence properties of genes, such as their CpG density distribution, and explain to some extent why the relation between epigenetic marks and gene expression is controversial.
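To give a rough idea of what a CpG density distribution around a TSS looks like, the sketch below counts CG dinucleotides per base pair in sliding windows. Several parameters are assumptions: the flank, window size and step, and the simple count-per-bp definition (rather than, for example, an observed/expected CpG ratio). The sequence is also assumed to be on the gene's own strand. This is not the paper's semi-supervised clustering pipeline. It only produces the kind of per-gene profile such a method would group.

```python
def cpg_density_profile(seq, tss, flank=1000, window=200, step=100):
    """CG dinucleotides per bp in sliding windows spanning [tss - flank, tss + flank).

    Returns (window start relative to the TSS, density) pairs. Windows that would run
    past either end of the sequence are skipped. Assumes seq is on the gene's own strand.
    """
    seq = seq.upper()
    profile = []
    for start in range(tss - flank, tss + flank - window + 1, step):
        if start < 0 or start + window > len(seq):
            continue
        win = seq[start:start + window]
        n_cpg = sum(1 for i in range(window - 1) if win[i:i + 2] == "CG")
        profile.append((start - tss, n_cpg / window))
    return profile


# Toy promoter: CpG-poor upstream of the TSS, CpG-rich from the TSS onward.
toy = "A" * 200 + "CG" * 50
print(cpg_density_profile(toy, tss=200, flank=100, window=100, step=100))
# [(-100, 0.0), (0, 0.5)]
```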
Affiliation(s)
- Hao Tian
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Yueying He
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Yue Xue
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Yi Qin Gao
- Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing, China; Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China; Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China
13
Hanson HE, Wang C, Schrey AW, Liebl AL, Ravinet M, Jiang RH, Martin LB. Epigenetic Potential and DNA Methylation in an Ongoing House Sparrow (Passer domesticus) Range Expansion. Am Nat 2022; 200:662-674. [DOI: 10.1086/720950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
14
Hasan I, Hossain A, Bhuiyan P, Miah S, Rahman H. A system biology approach to determine therapeutic targets by identifying molecular mechanisms and key pathways for type 2 diabetes that are linked to the development of tuberculosis and rheumatoid arthritis. Life Sci 2022; 297:120483. [DOI: 10.1016/j.lfs.2022.120483] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/10/2022] [Revised: 03/07/2022] [Accepted: 03/09/2022] [Indexed: 12/17/2022]
15
Girgis CM, Brennan-Speranza TC. Vitamin D and Skeletal Muscle: Current Concepts From Preclinical Studies. JBMR Plus 2021; 5:e10575. [PMID: 34950830 PMCID: PMC8674777 DOI: 10.1002/jbm4.10575] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/24/2021] [Revised: 10/07/2021] [Accepted: 10/24/2021] [Indexed: 12/12/2022]
Abstract
Muscle weakness has long been recognized as a hallmark feature of vitamin D deficiency. Until recently, however, the direct biomolecular effects of vitamin D on skeletal muscle were unclear. Although reservations have been raised in the past about whether the vitamin D receptor is expressed in muscle tissue, this special-issue review outlines the clear evidence from preclinical studies not only for expression of the receptor in muscle but also for roles of vitamin D activity in muscle development, mass, and strength. Additionally, muscle may serve as a dynamic storage site for vitamin D and play a central role in maintaining circulating 25-hydroxyvitamin D levels during periods of low sun exposure. © 2021 The Authors. JBMR Plus published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research.
Affiliation(s)
- Christian M Girgis
- Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia; Department of Diabetes and Endocrinology, Westmead Hospital, Sydney, NSW, Australia; Department of Endocrinology, Royal North Shore Hospital, Sydney, NSW, Australia
- Tara C Brennan-Speranza
- Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia; School of Medical Sciences, University of Sydney, Sydney, NSW, Australia; School of Public Health, University of Sydney, Sydney, NSW, Australia
16
Han Z, Yang T, Guo Y, Cui WH, Yao LJ, Li G, Wu AM, Li JH, Liu LJ. The transcription factor PagLBD3 contributes to the regulation of secondary growth in Populus. J Exp Bot 2021; 72:7092-7106. [PMID: 34313722 DOI: 10.1093/jxb/erab351] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/24/2021] [Accepted: 07/24/2021] [Indexed: 06/13/2023]
Abstract
LATERAL ORGAN BOUNDARIES DOMAIN (LBD) genes encode plant-specific transcription factors that participate in regulating various developmental processes. In this study, we genetically characterized PagLBD3, which encodes an important regulator of secondary growth in poplar (Populus alba × Populus glandulosa). Overexpression of PagLBD3 increased stem secondary growth in Populus, with a significantly higher rate of cambial cell differentiation into phloem, whereas dominant repression of PagLBD3 significantly decreased the rate of cambial cell differentiation into phloem. Furthermore, we identified 1756 genome-wide putative direct target genes (DTGs) of PagLBD3 through RNA sequencing (RNA-seq)-coupled DNA affinity purification followed by sequencing (DAP-seq) assays. Gene Ontology analysis revealed that genes regulated by PagLBD3 were enriched in biological pathways regulating meristem development, xylem development, and auxin transport. Several central regulator genes for vascular development, including PHLOEM INTERCALATED WITH XYLEM (PXY), WUSCHEL RELATED HOMEOBOX4 (WOX4), Secondary Wall-Associated NAC Domain 1s (SND1-B2), and Vascular-Related NAC-Domain 6s (VND6-B1), were identified as PagLBD3 DTGs. Together, our results indicate that PagLBD3 and its DTGs form a complex transcriptional network that modulates cambium activity and phloem/xylem differentiation.
Affiliation(s)
- Zhen Han
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
- Tong Yang
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
- Ying Guo
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
- Wen-Hui Cui
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
- Li-Juan Yao
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
- Gang Li
- College of Life Science, State Key Laboratory of Crop Biology, Shandong Agriculture University, Taian, Shandong 271018, China
- Ai-Min Wu
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Ji-Hong Li
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
- Li-Jun Liu
- College of Forestry, State Forestry and Grassland Administration Key Laboratory of Silviculture in downstream areas of the Yellow River, Shandong Agriculture University, Taian, Shandong 271018, China
17
Mutual dependency between lncRNA LETN and protein NPM1 in controlling the nucleolar structure and functions sustaining cell proliferation. Cell Res 2021; 31:664-683. [PMID: 33432115 PMCID: PMC8169757 DOI: 10.1038/s41422-020-00458-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/18/2020] [Accepted: 11/30/2020] [Indexed: 02/06/2023]
Abstract
Fundamental processes such as ribosomal RNA synthesis and chromatin remodeling take place in the nucleolus, which is hyperactive in fast-proliferating cells. The sophisticated regulatory mechanisms underlying the dynamic structure and functions of the nucleolus have yet to be fully explored. The present study uncovers a mutual functional dependency between a previously uncharacterized human long non-coding RNA, which we renamed LETN, and a key nucleolar protein, NPM1. Specifically, LETN, which is upregulated in multiple types of cancer, resides in the nucleolus through direct binding to NPM1. LETN plays a critical role in facilitating the formation of NPM1 pentamers, which are essential building blocks of the nucleolar granular component and control nucleolar functions. Repression of LETN or NPM1 led to similar, profound changes in nucleolar morphology and arrest of nucleolar functions, which in turn inhibited the proliferation of human cancer cells and neural progenitor cells. Interestingly, this interdependency between LETN and NPM1 is associated with evolutionarily new variations of NPM1 and the coincident emergence of LETN in higher primates. We propose that this human-specific protein-lncRNA axis provides an additional yet critical layer of regulation with high physiological relevance in both cancerous and normal developmental processes that require hyperactive nucleoli.
18
Marand AP, Chen Z, Gallavotti A, Schmitz RJ. A cis-regulatory atlas in maize at single-cell resolution. Cell 2021; 184:3041-3055.e21. [PMID: 33964211 DOI: 10.1101/2020.09.27.315499] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/02/2020] [Revised: 03/04/2021] [Accepted: 04/07/2021] [Indexed: 05/22/2023]
Abstract
cis-regulatory elements (CREs) encode the genomic blueprints of spatiotemporal gene expression programs enabling highly specialized cell functions. Using single-cell genomics in six maize organs, we determined the cis- and trans-regulatory factors defining diverse cell identities and coordinating chromatin organization by profiling transcription factor (TF) combinatorics, identifying TFs with non-cell-autonomous activity, and uncovering TFs underlying higher-order chromatin interactions. Cell-type-specific CREs were enriched for enhancer activity and within unmethylated long terminal repeat retrotransposons. Moreover, we found cell-type-specific CREs are hotspots for phenotype-associated genetic variants and were targeted by selection during modern maize breeding, highlighting the biological implications of this CRE atlas. Through comparison of maize and Arabidopsis thaliana developmental trajectories, we identified TFs and CREs with conserved and divergent chromatin dynamics, showcasing extensive evolution of gene regulatory networks. In addition to this rich dataset, we developed single-cell analysis software, Socrates, which can be used to understand cis-regulatory variation in any species.
Affiliation(s)
- Zongliang Chen
- Waksman Institute, Rutgers University, Piscataway, NJ 08854, USA
- Andrea Gallavotti
- Waksman Institute, Rutgers University, Piscataway, NJ 08854, USA; Department of Plant Biology, Rutgers University, New Brunswick, NJ 08901, USA
- Robert J Schmitz
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
19
Agarwal V, Shendure J. Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks. Cell Rep 2020; 31:107663. [PMID: 32433972 DOI: 10.1016/j.celrep.2020.107663] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 07/25/2018] [Revised: 06/11/2019] [Accepted: 04/28/2020] [Indexed: 01/06/2023]
Abstract
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
Affiliation(s)
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Calico Life Sciences LLC, South San Francisco, CA 94080, USA.
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA.
20
Muley VY. Mathematical Programming for Modeling Expression of a Gene Using Gurobi Optimizer to Identify Its Transcriptional Regulators. Methods Mol Biol 2021; 2328:99-113. [PMID: 34251621 DOI: 10.1007/978-1-0716-1534-8_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
The cell expresses various genes in specific contexts, depending on internal and external perturbations, to invoke appropriate responses. Transcription factors (TFs) orchestrate and define the expression levels of genes by binding to their regulatory regions. Dysregulated expression of TFs often leads to aberrant expression changes in their target genes and is responsible for several diseases, including cancers. Over the last two decades, several studies have experimentally identified target genes of several TFs. However, these studies are limited to a small fraction of the TFs encoded by an organism, and only to those amenable to experimental settings. Because of these experimental limitations, many computational techniques have been proposed to predict the target genes of TFs. Linear modeling of gene expression is one of the most promising computational approaches and is readily applicable to the thousands of expression datasets available in the public domain across diverse phenotypes. Linear models assume that the expression of a gene is a weighted sum of the expression levels of the TFs regulating it. In this chapter, I introduce mathematical programming for the linear modeling of gene expression, which has certain advantages over conventional statistical modeling approaches: it is fast, it scales to the genome level, and, most importantly, it allows mixed integer programming to tune the model outcome with prior knowledge of gene regulation.
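A small sketch of the linear-model assumption: fit a target gene as a weighted sum of TF expression levels, using at most `k` regulators. The chapter solves this kind of problem with mixed integer programming in Gurobi. The sketch instead swaps in an exhaustive best-subset search with ordinary least squares, which is only practical for a handful of TFs. It fits no intercept. The function names and toy data are hypothetical.

```python
from itertools import combinations


def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]  # fails if the chosen TF columns are collinear
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y, cols):
    """Least-squares weights (no intercept) for y using only TF columns `cols`; returns (weights, SSE)."""
    A = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]  # X_S^T X_S
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in cols]          # X_S^T y
    w = solve(A, rhs)
    sse = sum((t - sum(wk * row[c] for wk, c in zip(w, cols))) ** 2 for row, t in zip(X, y))
    return w, sse


def best_regulators(X, y, k):
    """Best-fitting set of at most k TFs (exhaustive search; smaller sets win near-ties)."""
    best = None
    for size in range(1, k + 1):
        for cols in combinations(range(len(X[0])), size):
            w, sse = ols(X, y, cols)
            if best is None or sse < best[2] - 1e-12:
                best = (cols, w, sse)
    return best


X = [[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 1, 1], [0, 2, 3]]  # samples x TFs
y = [2 * r[0] - r[2] for r in X]                             # target = 2*TF0 - 1*TF2
cols, w, sse = best_regulators(X, y, k=2)
print(cols, [round(v, 6) for v in w])  # (0, 2) [2.0, -1.0]
```

A MIP formulation would encode the "at most k regulators" limit as binary indicator variables. It would then let prior knowledge force particular TFs in or out of the model, which brute-force enumeration cannot scale to.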
21
Sharipov RN, Kondrakhin YV, Ryabova AS, Yevshin IS, Kolpakov FA. Assessment of transcriptional importance of cell line-specific features based on GTRD and FANTOM5 data. PLoS One 2020; 15:e0243332. [PMID: 33347457 PMCID: PMC7751965 DOI: 10.1371/journal.pone.0243332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2020] [Accepted: 11/19/2020] [Indexed: 11/18/2022]
Abstract
Creating a complete picture of transcriptional regulation seems to be an urgent task of modern biology. Regulation of transcription is a complex process carried out by transcription factors (TFs) and auxiliary proteins. Over the past decade, ChIP-Seq has become the most common experimental technology for studying genome-wide interactions between TFs and DNA. We assessed the transcriptional significance of cell line-specific features using regression analysis of ChIP-Seq datasets from the GTRD database and transcription start site (TSS) activities from the FANTOM5 expression atlas. For this purpose, we first generated a large number of features, each defined as the presence or absence of a TF in a given promoter region around the TSS. Using feature selection and regression analysis, we identified sets of the most important TFs that affect the expression activity of TSSs in human cell lines such as HepG2, K562 and HEK293. We demonstrated that some TFs can be classified as either repressors or activators depending on their location relative to the TSS.
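Here is one possible sketch of the feature construction just described: a binary presence/absence value for each TF in each promoter region around a TSS. Several details are illustrative assumptions, not GTRD's data format or the authors' actual region boundaries. These are the bin boundaries, the `TF@start:end` feature names, the half-open `(tf_name, start, end)` peak tuples, and the TF names.

```python
def promoter_features(tss, strand, peaks,
                      bins=((-1000, -500), (-500, 0), (0, 500), (500, 1000))):
    """Binary feature 'TF@lo:hi' = 1 if any peak of that TF overlaps the bin around the TSS.

    peaks: (tf_name, start, end) tuples on the TSS's chromosome, half-open coordinates.
    Bin offsets are measured in the direction of transcription, so on the minus strand
    offset o maps to genomic position tss - o.
    """
    tfs = sorted({tf for tf, _, _ in peaks})
    features = {}
    for lo, hi in bins:
        if strand == "+":
            b_start, b_end = tss + lo, tss + hi
        else:
            b_start, b_end = tss - hi + 1, tss - lo + 1
        for tf in tfs:
            hit = any(name == tf and s < b_end and e > b_start for name, s, e in peaks)
            features[f"{tf}@{lo}:{hi}"] = int(hit)
    return features


# Hypothetical peaks for two TFs near a TSS at position 10000.
peaks = [("TF_A", 9600, 9700), ("TF_B", 10100, 10200)]
plus = promoter_features(10000, "+", peaks)
print({k: v for k, v in plus.items() if v})  # {'TF_A@-500:0': 1, 'TF_B@0:500': 1}
```

Stacking such dictionaries for all TSSs gives a sparse binary design matrix. Feature selection and regression against TSS activity can then operate on it. With separate bins, the same TF can receive a positive weight in one region and a negative weight in another.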
Affiliation(s)
- Ruslan N. Sharipov
- Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
- Specialized Educational Scientific Center, Novosibirsk State University, Novosibirsk, Russian Federation
- BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Yury V. Kondrakhin
- Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
- BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Anna S. Ryabova
- Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
- BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Ivan S. Yevshin
- Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
- BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Fedor A. Kolpakov
- Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
- BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
22
Wang H, Liu Y, Guan H, Fan GL. The Regulation of Target Genes by Co-occupancy of Transcription Factors, c-Myc and Mxi1 with Max in the Mouse Cell Line. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191106103633] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Abstract
Background:
The regulatory effect of transcription factors on genes depends not only on their binding locations and the functions of the bound genes but also on their mode of binding.
Objective:
It is necessary to study how different binding modes affect the regulation of target genes.
Methods:
In this study, we provided a reliable theoretical basis for studying gene expression regulation by co-binding transcription factors and further revealed the specific regulation exerted by transcription factor co-binding in cancer cells.
Results:
Transcription factors tend to bind together with other transcription factors in regulatory regions, forming competitive or synergistic relationships that regulate target genes precisely.
Conclusion:
We found that genes up-regulated in cancer cells, relative to normal cells, were involved in regulating the immune system.
Affiliation(s)
- Hui Wang
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Yuan Liu
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Hua Guan
- ENT Department, Huhhot First Hospital, Hohhot, China
- Guo-Liang Fan
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
23
Singh R, Sophiarani Y. A report on DNA sequence determinants in gene expression. Bioinformation 2020; 16:422-431. [PMID: 32831525 PMCID: PMC7434957 DOI: 10.6026/97320630016422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/23/2020] [Accepted: 04/24/2020] [Indexed: 11/26/2022]
Abstract
The biased usage of nucleotides in coding sequences and its correlation with gene expression have been observed in several studies. A complex set of interactions between genes and other components of the expression system determines the amount of protein produced from coding sequences. The elongation rate of the polypeptide chain is known to be affected by both codon usage bias and specific amino acid compositional constraints. Therefore, it is of interest to review local DNA sequence elements and other positional and combinatorial constraints that play a significant role in gene expression.
Affiliation(s)
- Ravail Singh
- Indian Institute of Integrative Medicine, CSIR, Canal Road, Jammu-180001
24
Höllbacher B, Balázs K, Heinig M, Uhlenhaut NH. Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation. Comput Struct Biotechnol J 2020; 18:1330-1341. [PMID: 32612756 PMCID: PMC7306512 DOI: 10.1016/j.csbj.2020.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/27/2020] [Revised: 05/21/2020] [Accepted: 05/23/2020] [Indexed: 02/06/2023]
Abstract
Advancements in next-generation sequencing have led to the generation of ever more data, and the challenge is often how to combine and reconcile results from different omics studies, such as genome, epigenome and transcriptome. Here we provide an overview of the standard processing pipelines for ChIP-seq and RNA-seq as well as common downstream analyses. We describe popular multi-omics data integration approaches used to identify target genes and co-factors, and we discuss how machine learning techniques may predict transcriptional regulators and gene expression.
Affiliation(s)
- Barbara Höllbacher
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany; Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany; Department of Informatics, TUM, Munich 85748, Garching, Germany
- Kinga Balázs
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany
- Matthias Heinig
- Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany; Department of Informatics, TUM, Munich 85748, Garching, Germany
- N Henriette Uhlenhaut
- Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany; Metabolic Programming, TUM School of Life Sciences Weihenstephan, Munich 85354, Freising, Germany
25
Ochoa S, de Anda-Jáuregui G, Hernández-Lemus E. Multi-Omic Regulation of the PAM50 Gene Signature in Breast Cancer Molecular Subtypes. Front Oncol 2020; 10:845. [PMID: 32528899 PMCID: PMC7259379 DOI: 10.3389/fonc.2020.00845] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/01/2019] [Accepted: 04/29/2020] [Indexed: 12/24/2022]
Abstract
Breast cancer is a disease whose heterogeneity extends from the genomic to the clinical level. This heterogeneity is thought to be captured (at least partially) by the so-called breast cancer molecular subtypes. These molecular subtypes were initially defined by unsupervised clustering of gene expression and its correlation with already known histological, morphological, phenotypic and clinical features. Later, a 50-gene signature, PAM50, was defined to identify the biological subtype of a given sample in the clinical setting. The PAM50 signature was obtained using unsupervised statistical methods, and therefore no constraint was placed on the biological relevance (or lack thereof) of the selected genes beyond their predictive capacity. An open question remains: which regulatory elements drive the various expression behaviors of this gene set in the different molecular subtypes? This question becomes more relevant as more biological layers of regulation become measurable. In this work, we analyzed the expression regulation of the 50 genes in the PAM50 signature in terms of (a) gene co-expression, (b) transcription factors, (c) micro-RNAs, and (d) methylation. Using data from The Cancer Genome Atlas (TCGA) for the Luminal A, Luminal B, Basal, and HER2-enriched molecular subtypes, as well as normal tumor-adjacent tissue, we identified predictors of gene expression using an elastic net model. We compared and contrasted the sets of identified regulators for the gene signature in each molecular subtype and systematically compared them with the current literature. We also identified a unique set of predictors for the expression of PAM50 genes associated with each molecular subtype. Most selected predictors are exclusive to a single PAM50 gene, and predictors are not shared across subtypes. Only 13 coding transcripts and 2 miRNAs are selected in all four subtypes.
MiR-21 and miR-10b connect almost all the PAM50 genes in all the subtypes and in normal tissue, but do so in a mutually exclusive manner, suggesting a cancer switch from miR-10b coordination in normal tissue to miR-21. The sets of selected predictors for PAM50 genes that are functionally enriched across subtypes support the idea that different molecular regulatory mechanisms are at work. With this study, we aim to contribute to a wider understanding of the regulatory mechanisms that differentiate the expression of the PAM50 signature, which in turn could help explain the molecular basis of the differences between the molecular subtypes.
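The abstract only names the elastic net model. For readers unfamiliar with it, here is a minimal from-scratch sketch using cyclic coordinate descent. The penalty follows the scikit-learn `ElasticNet` parameterization (`alpha`, `l1_ratio`). The sketch assumes centered inputs, fits no intercept, and uses made-up toy data. It does not reproduce the authors' code or tuning. A predictor counts as "selected" when its coefficient is nonzero.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used in lasso / elastic-net updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200, tol=1e-8):
    """Minimize (1/2n)||y - Xb||^2 + alpha*(l1_ratio*||b||_1 + 0.5*(1 - l1_ratio)*||b||^2).

    Cyclic coordinate descent; assumes the columns of X and y are centered (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b (all zeros at the start)
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0:
                continue
            # Correlation of predictor j with the partial residual that excludes j.
            rho = sum(row[j] * (r + row[j] * b[j]) for row, r in zip(X, resid)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (z[j] + alpha * (1 - l1_ratio))
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                b[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]  # two centered, orthogonal predictors
y = [3 * r[0] for r in X]                  # only predictor 0 matters
print(elastic_net(X, y, alpha=0.1, l1_ratio=1.0))
```

The L1 part of the penalty sets the irrelevant coefficient exactly to zero, which is what makes the model usable for predictor selection. The L2 part shrinks the coefficients of correlated predictors together rather than arbitrarily keeping only one of them.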
Affiliation(s)
- Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Graduate Program in Biomedical Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Cátedras Conacyt para Jóvenes Investigadores, National Council on Science and Technology, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
26
Klein HU, Schäfer M, Bennett DA, Schwender H, De Jager PL. Bayesian integrative analysis of epigenomic and transcriptomic data identifies Alzheimer's disease candidate genes and networks. PLoS Comput Biol 2020; 16:e1007771. [PMID: 32255787 PMCID: PMC7138305 DOI: 10.1371/journal.pcbi.1007771] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/01/2019] [Accepted: 03/03/2020] [Indexed: 12/28/2022]
Abstract
Biomedical research studies have generated large multi-omic datasets to study complex diseases like Alzheimer’s disease (AD). An important aim of these studies is the identification of candidate genes that demonstrate congruent disease-related alterations across the different data types measured by the study. We developed a new method to detect such candidate genes in large multi-omic case-control studies that measure multiple data types in the same set of samples. The method is based on a gene-centric integrative coefficient quantifying to what degree consistent differences are observed in the different data types. For statistical inference, a Bayesian hierarchical model is used to study the distribution of the integrative coefficient. The model employs a conditional autoregressive prior to integrate a functional gene network and to share information between genes known to be functionally related. We applied the method to an AD dataset consisting of histone acetylation, DNA methylation, and RNA transcription data from human cortical tissue samples of 233 subjects, and we detected 816 genes with consistent differences between persons with AD and controls. The findings were validated in protein data and in RNA transcription data from two independent AD studies. Finally, we found three subnetworks of jointly dysregulated genes within the functional gene network which capture three distinct biological processes: myeloid cell differentiation, protein phosphorylation and synaptic signaling. Further investigation of the myeloid network indicated an upregulation of this network in early stages of AD prior to accumulation of hyperphosphorylated tau and suggested that increased CSF1 transcription in astrocytes may contribute to microglial activation in AD. 
Thus, we developed a method that integrates multiple data types and external knowledge of gene function to detect candidate genes, applied the method to an AD dataset, and identified several disease-related genes and processes demonstrating the usefulness of the integrative approach. Recent technological advances have led to a new generation of studies that interrogate multiple molecular levels in the same target tissue of a set of subjects, generating complex multi-omic datasets with which to study disease mechanism. These datasets of genetic, epigenomic, transcriptomic, and other data have the potential to reveal novel biological insights; however, integrative analyses remain challenging and require new computational methods. We developed an integrative Bayesian approach to detect genes with consistent differences between case and control samples across multiple data types. The method further integrates prior knowledge about gene function in the form of a gene functional similarity network to improve statistical inference by sharing information between related genes. We applied our method to an Alzheimer’s disease dataset of epigenomic and transcriptomic data and detected and then validated several novel and known candidate genes as well as three major disease-related biological processes. One of these processes reflected microglial activation and included the cytokine CSF1. Single-nucleus data revealed that CSF1 was primarily upregulated in astrocytes, implicating the involvement of this cell type in microglial activation. Hence, we demonstrated that integrative analysis approaches to multi-omic datasets can improve candidate gene detection and thereby generate new insights into complex diseases.
Affiliation(s)
- Hans-Ulrich Klein
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
- Martin Schäfer
- Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
- Holger Schwender
- Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- Philip L. De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
27
do Amaral MCF, Frisbie J, Crum RJ, Goldstein DL, Krane CM. Hepatic transcriptome of the freeze-tolerant Cope's gray treefrog, Dryophytes chrysoscelis: responses to cold acclimation and freezing. BMC Genomics 2020; 21:226. [PMID: 32164545 PMCID: PMC7069055 DOI: 10.1186/s12864-020-6602-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/20/2019] [Accepted: 02/20/2020] [Indexed: 11/10/2022]
Abstract
Background: Cope’s gray treefrog, Dryophytes chrysoscelis, withstands the physiological challenges of whole-body freezing, partly by accumulating cryoprotective compounds of hepatic origin, including glycerol, urea, and glucose. We hypothesized that expression of genes related to cryoprotectant mobilization and stress tolerance would be differentially regulated in response to cold. Using high-throughput RNA sequencing (RNA-Seq), a hepatic transcriptome was generated for D. chrysoscelis, and gene expression was compared among frogs that were warm-acclimated, cold-acclimated, and frozen. Results: A total of 159,556 transcripts were generated; 39% showed homology with known transcripts, and 34% of all transcripts were annotated. Gene-level analyses identified 34,936 genes, 85% of which were annotated. Cold acclimation induced differential expression of both genes and non-coding transcripts; freezing induced few additional changes. Transcript-level analysis followed by gene-level aggregation revealed 3582 differentially expressed genes, whereas analysis at the gene level revealed 1324 differentially regulated genes. Approximately 3.6% of differentially expressed sequences were non-coding and had no identifiable homology. Expression of several genes associated with cryoprotectant accumulation was altered during cold acclimation. Notably, glycerol kinase expression decreased with cold exposure, possibly promoting accumulation of glycerol, whereas glucose export was transcriptionally promoted by upregulation of glucose-6-phosphatase and downregulation of genes encoding various glycolytic enzymes. Several genes related to the heat shock protein response, DNA repair, and the ubiquitin proteasome pathway were upregulated in cold and frozen frogs, whereas genes involved in responses to oxidative stress and anoxia, both potential sources of cellular damage during freezing, were downregulated or unchanged.
Conclusion: Our study is the first to report transcriptomic responses to low temperature exposure in a freeze-tolerant vertebrate. The hepatic transcriptome of Dryophytes chrysoscelis is responsive to cold and freezing. Genes related to particular pathways, such as glycerol biosynthesis, were not all regulated in parallel. The physiological demands associated with cold and freezing, as well as the transcriptomic responses observed in this study, are shared with several organisms that face similar ecophysiological challenges, suggesting common regulatory mechanisms. The role of transcriptional regulation relative to other cellular processes, and of non-coding transcripts as elements of those responses, deserves further study.
Affiliation(s)
- M Clara F do Amaral
- Department of Biology, Mount St. Joseph University, 5701 Delhi Ave, Cincinnati, OH, 45233, USA
- James Frisbie
- Department of Biological Sciences, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH, 45435, USA
- Raphael J Crum
- Department of Biology, University of Dayton, 300 College Park Ave, Dayton, OH, 45469, USA
- David L Goldstein
- Department of Biological Sciences, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH, 45435, USA
- Carissa M Krane
- Department of Biology, University of Dayton, 300 College Park Ave, Dayton, OH, 45469, USA
28
Rahman MH, Peng S, Hu X, Chen C, Rahman MR, Uddin S, Quinn JM, Moni MA. A Network-Based Bioinformatics Approach to Identify Molecular Biomarkers for Type 2 Diabetes that Are Linked to the Progression of Neurological Diseases. Int J Environ Res Public Health 2020; 17:1035. [PMID: 32041280 PMCID: PMC7037290 DOI: 10.3390/ijerph17031035] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 01/22/2020] [Revised: 02/02/2020] [Accepted: 02/02/2020] [Indexed: 12/21/2022]
Abstract
Neurological diseases (NDs) are progressive disorders whose progression can be significantly affected by a range of common diseases that present as comorbidities. Clinical studies, including epidemiological and neuropathological analyses, indicate that patients with type 2 diabetes (T2D) have worse progression of NDs, suggesting pathogenic links between NDs and T2D. However, finding causal or predisposing factors that link T2D and NDs remains challenging. To address this, we developed a high-throughput, network-based quantitative pipeline using agnostic approaches to identify genes expressed abnormally in both T2D and NDs, and thereby some of the shared molecular pathways that may underpin the interaction between T2D and NDs. We used transcriptomic gene expression datasets from control and disease-affected individuals and identified differentially expressed genes (DEGs) in tissues of patients with T2D and NDs compared with unaffected control individuals. One hundred and ninety-seven DEGs (99 up-regulated and 98 down-regulated in affected individuals) common to both the T2D and the ND datasets were identified. Functional annotation of these DEGs revealed the involvement of significant cell-signaling-associated molecular pathways. The overlapping DEGs (i.e., those seen in both the T2D and ND datasets) were then used to extract the most significant GO terms. We validated these results against gold-standard benchmark databases and literature searches, which showed which genes and pathways had previously been linked to NDs or T2D and which are novel. Protein-protein interaction analysis identified hub proteins in the pathways (including DNM2, DNM1, MYH14, PACSIN2, TFRC, PDE4D, ENTPD1, PLK4, CDC20B, and CDC14A) that have not previously been described as playing a role in these diseases.
To reveal the transcriptional and post-transcriptional regulators of the DEGs, we used transcription factor (TF) interaction analysis and DEG-microRNA (miRNA) interaction analysis, respectively. We thus identified the following TFs as important in driving expression of our T2D/ND common genes: FOXC1, GATA2, FOXL1, YY1, E2F1, NFIC, NFYA, USF2, HINFP, MEF2A, SRF, NFKB1, PDE4D, CREB1, SP1, HOXA5, SREBF1, TFAP2A, STAT3, POU2F2, TP53, PPARG, and JUN. MicroRNAs that affect expression of these genes include miR-335-5p, miR-16-5p, miR-93-5p, miR-17-5p, and miR-124-3p. Thus, our transcriptomic data analysis identifies novel potential links between NDs and T2D pathologies that may underlie comorbidity interactions, and these links may include potential targets for therapeutic intervention. In sum, our neighborhood-based benchmarking and multilayer network topology methods identified novel putative biomarkers that indicate how T2D and these neurological diseases interact, as well as pathways that may be targeted for treatment in the future.
Affiliation(s)
- Md Habibur Rahman
- Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100190, China
- Department of Computer Science and Engineering, Islamic University, Kushtia 7003, Bangladesh
- Silong Peng
- Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100190, China
- Xiyuan Hu
- Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100190, China
- Chen Chen
- Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100190, China
- Md Rezanur Rahman
- Department of Biochemistry and Biotechnology, Khwaja Yunus Ali University, Enayetpur, Sirajgonj 6751, Bangladesh
- Shahadat Uddin
- Complex Systems Research Group & Project Management Program, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia
- Julian M.W. Quinn
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
- Mohammad Ali Moni
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
29
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. A comprehensive review of computational prediction of genome-wide features. Brief Bioinform 2020; 21:120-134. [PMID: 30462144 PMCID: PMC10233247 DOI: 10.1093/bib/bby110] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/10/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022]
Abstract
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make in silico feature prediction possible through statistical or machine learning models. With the accumulation of vast amounts of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review of these published works, categorized by prediction target, including protein binding sites, enhancers, DNA methylation, chromatin structure and gene expression. We also discuss some important points and possible future directions.
Affiliation(s)
- Tianlei Xu
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
- Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Peng Jin
- Department of Human Genetics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
30
Girgis CM. Vitamin D and Skeletal Muscle: Emerging Roles in Development, Anabolism and Repair. Calcif Tissue Int 2020; 106:47-57. [PMID: 31312865 DOI: 10.1007/s00223-019-00583-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 12/08/2018] [Accepted: 04/29/2019] [Indexed: 12/17/2022]
Abstract
This special issue article will focus on the morphologic and functional roles of vitamin D in muscle, from strength and contraction to development and ageing, and will characterise the controversy surrounding VDR expression in skeletal muscle, which is central to our understanding of vitamin D's effects on this tissue.
Affiliation(s)
- Christian M Girgis
- Department of Diabetes and Endocrinology, Westmead Hospital, Sydney, NSW, Australia.
- Department of Diabetes and Endocrinology, Royal North Shore Hospital, Sydney, NSW, Australia.
- University of Sydney, Sydney, NSW, Australia.
31
Yu Q, He Z, Zubkov D, Huang S, Kurochkin I, Yang X, Halene T, Willmitzer L, Giavalisco P, Akbarian S, Khaitovich P. Lipidome alterations in human prefrontal cortex during development, aging, and cognitive disorders. Mol Psychiatry 2020; 25:2952-2969. [PMID: 30089790 PMCID: PMC7577858 DOI: 10.1038/s41380-018-0200-8] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 01/12/2018] [Revised: 04/26/2018] [Accepted: 06/11/2018] [Indexed: 12/27/2022]
Abstract
Lipids are essential to brain function, yet they remain largely unexplored. Here we investigated the lipidome composition of prefrontal cortex gray matter in 396 cognitively healthy individuals with ages spanning 100 years, as well as in 67 adults diagnosed with autism spectrum disorder (ASD), schizophrenia (SZ), or Down syndrome (DS). Of the 5024 detected lipids, 95% showed significant age-dependent concentration differences that cluster into four temporal stages and result in a gradual increase in membrane fluidity from newborns to nonagenarians. Aging affects 14% of the brain lipidome, with late-life changes starting predominantly at 50–55 years of age, a period of general metabolic transition. All three diseases alter brain lipidome composition, leading, among other things, to concentration decreases in the glycerophospholipid metabolism and endocannabinoid signaling pathways. Lipid concentration decreases in SZ were further linked to genetic variants associated with the disease, indicating the relevance of the lipidome changes to disease progression.
Affiliation(s)
- Qianhui Yu
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, 200031, China
- Zhisong He
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, 200031, China; Skolkovo Institute of Science and Technology, Moscow, 143028, Russia
- Dmitry Zubkov
- Skolkovo Institute of Science and Technology, Moscow, 143028, Russia
- Shuyun Huang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, 200031, China; ShanghaiTech University, Shanghai, 200031, China
- Ilia Kurochkin
- Skolkovo Institute of Science and Technology, Moscow, 143028, Russia
- Xiaode Yang
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, 200031, China
- Tobias Halene
- Department of Psychiatry and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Lothar Willmitzer
- Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, Potsdam, 14476, Germany
- Patrick Giavalisco
- Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, Potsdam, 14476, Germany
- Schahram Akbarian
- Department of Psychiatry and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Philipp Khaitovich
- Skolkovo Institute of Science and Technology, Moscow, 143028, Russia; ShanghaiTech University, Shanghai, 200031, China; Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany; Comparative Biology Group, CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, 200031, China
32
Schmidt F, Schulz MH. On the problem of confounders in modeling gene expression. Bioinformatics 2019; 35:711-719. [PMID: 30084962 PMCID: PMC6530814 DOI: 10.1093/bioinformatics/bty674] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Revised: 06/21/2018] [Accepted: 08/02/2018] [Indexed: 01/01/2023]
Abstract
Motivation: Modeling of transcription factor (TF) binding from both ChIP-seq and chromatin accessibility data has become prevalent in computational biology. Several models have been proposed to generate new hypotheses on transcriptional regulation. However, there is no standard approach to derive TF binding scores from ChIP-seq and open chromatin experiments. Here, we review biases of various scoring approaches and their effects on the interpretation and reliability of predictive gene expression models. Results: We generated predictive models for gene expression using ChIP-seq and DNase1-seq data from DEEP and ENCODE. Via randomization experiments, we identified confounders in TF gene scores derived from both ChIP-seq and DNase1-seq data. We reviewed correction approaches for both data types, which reduced the influence of the identified confounders without harming model performance. Our analyses also highlighted further quality control measures, in addition to model performance, that may help ensure model reliability and avoid misinterpretation in future studies. Availability and implementation: The software used in this study is available online at https://github.com/SchulzLab/TEPIC. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florian Schmidt
- High-throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, Saarbrücken, Germany; Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Graduate School for Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
- Marcel H Schulz
- High-throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, Saarbrücken, Germany; Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
33
Yu R, Nielsen J. Big data in yeast systems biology. FEMS Yeast Res 2019; 19:foz070. [DOI: 10.1093/femsyr/foz070] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022]
Abstract
Systems biology uses computational and mathematical modeling to study complex interactions in a biological system. The yeast Saccharomyces cerevisiae, which has served as both an important model organism and a cell factory, has pioneered both the early development of such models and modeling concepts and the more recent integration of multi-omics big data into these models to elucidate fundamental principles of biology. Here, we review the advancement of big data technologies for gaining biological insight in three aspects of yeast systems biology: gene expression dynamics, cellular metabolism, and the regulatory network between gene expression and metabolism. The roles of big data and complementary modeling approaches, including the expansion of genome-scale metabolic models and machine learning methodologies, are discussed as key drivers of the rapid advancement of yeast systems biology.
Affiliation(s)
- Rosemary Yu
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- BioInnovation Institute, Ole Maaløes Vej 3, DK-2200 Copenhagen N, Denmark
34
Thormann V, Rothkegel MC, Schöpflin R, Glaser LV, Djuric P, Li N, Chung HR, Schwahn K, Vingron M, Meijsing SH. Genomic dissection of enhancers uncovers principles of combinatorial regulation and cell type-specific wiring of enhancer-promoter contacts. Nucleic Acids Res 2018; 46:2868-2882. [PMID: 29385519 PMCID: PMC5888794 DOI: 10.1093/nar/gky051] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 08/11/2017] [Accepted: 01/19/2018] [Indexed: 12/19/2022]
Abstract
Genomic binding of transcription factors, like the glucocorticoid receptor (GR), is linked to the regulation of genes. However, as we show here, GR binding is a poor predictor of GR-dependent gene regulation even when taking the 3D organization of the genome into account. To connect GR binding sites to the regulation of genes in the endogenous genomic context, we turned to genome editing. By deleting GR binding sites, individually or in combination, we uncovered how cooperative interactions between binding sites contribute to the regulation of genes. Specifically, for the GR target gene GILZ, we show that the simultaneous presence of a cluster of GR binding sites is required for the activity of an individual enhancer and that the GR-dependent regulation of GILZ depends on multiple GR-bound enhancers. Further, by deleting GR binding sites that are shared between different cell types, we show how cell type-specific genome organization and enhancer-blocking can result in cell type-specific wiring of promoter–enhancer contacts. This rewiring allows an individual GR binding site shared between different cell types to direct the expression of distinct transcripts and thereby contributes to the cell type-specific consequences of glucocorticoid signaling.
Affiliation(s)
- Verena Thormann
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Maika C Rothkegel
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Robert Schöpflin
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Laura V Glaser
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Petar Djuric
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Na Li
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Ho-Ryun Chung
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Kevin Schwahn
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Martin Vingron
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Sebastiaan H Meijsing
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
35
Soleimani VD, Nguyen D, Ramachandran P, Palidwor GA, Porter CJ, Yin H, Perkins TJ, Rudnicki MA. Cis-regulatory determinants of MyoD function. Nucleic Acids Res 2018; 46:7221-7235. [PMID: 30016497 PMCID: PMC6101602 DOI: 10.1093/nar/gky388] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/13/2016] [Accepted: 04/30/2018] [Indexed: 01/06/2023]
Abstract
The muscle-specific transcription factor MyoD orchestrates the myogenic gene expression program by binding to short DNA motifs called E-boxes within myogenic cis-regulatory elements (CREs). Genome-wide analyses of the MyoD cistrome by chromatin immunoprecipitation sequencing show that MyoD-bound CREs contain multiple E-boxes of various sequences. However, how E-box number, sequence and spatial arrangement within CREs collectively regulate the binding affinity and transcriptional activity of MyoD remains largely unknown. Here, by an integrative analysis of the MyoD cistrome combined with genome-wide analysis of key regulatory histones and gene expression data, we show that the affinity landscape of MyoD is driven by multiple E-boxes, and that the overall binding affinity, together with the associated nucleosome positioning and epigenetic features of the CREs, crucially depends on the variant sequences and positioning of the E-boxes within the CREs. By comparative genomic analysis of single nucleotide polymorphisms (SNPs) across publicly available data from 17 strains of laboratory mice, we show that variant sequences within the MyoD-bound motifs, but not their genome-wide counterparts, are under selection. Finally, we show that the quantitative regulatory effect of MyoD binding on nearby genes can, in part, be predicted by the motif composition of the CREs to which it binds. Taken together, our data suggest that motif number, sequence and spatial arrangement within myogenic CREs are important determinants of their cis-regulatory code.
Affiliation(s)
- Vahab D Soleimani
- Department of Human Genetics, McGill University, Montréal, QC H3A 1B1, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
- Duy Nguyen
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
- Parameswaran Ramachandran
- Sprott Centre for Stem Cell Research, Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada
- Gareth A Palidwor
- Sprott Centre for Stem Cell Research, Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada
- Christopher J Porter
- Sprott Centre for Stem Cell Research, Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada
- Hang Yin
- Center for Molecular Medicine, Department of Biochemistry and Molecular Biology, University of Georgia, GA 30602, USA
- Theodore J Perkins
- Sprott Centre for Stem Cell Research, Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada
- Michael A Rudnicki
- Sprott Centre for Stem Cell Research, Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada; Department of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
36
Holland P, Bergenholm D, Börlin CS, Liu G, Nielsen J. Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter usage between metabolic conditions. Nucleic Acids Res 2019; 47:4986-5000. [PMID: 30976803 PMCID: PMC6547448 DOI: 10.1093/nar/gkz253] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/11/2019] [Revised: 03/26/2019] [Accepted: 04/04/2019] [Indexed: 01/08/2023]
Abstract
Transcription factors (TFs) are central to transcriptional regulation, but they are often studied in relative isolation and without close control of the metabolic state of the cell. Here, we describe genome-wide binding (by ChIP-exo) of 15 yeast TFs in four chemostat conditions that cover a range of metabolic states. We integrate these data with transcriptomics and six additional recently mapped TFs to identify predictive models describing how TFs control gene expression in different metabolic conditions. Contributions by TFs to gene regulation are predicted to be mostly activating, additive and well approximated by assuming linear effects from the TF binding signal. Notably, using TF binding peaks from peak-finding algorithms gave distinctly worse predictions than simply summing the low-noise, high-resolution TF ChIP-exo reads on promoters. Finally, we discover indications of a novel functional role for three TFs (Gcn4, Ert1 and Sut1) during nitrogen-limited aerobic fermentation. Only in this condition do the three TFs show correlated binding to a large number of genes (enriched for glycolytic and translation processes) and a negative correlation with target gene transcript levels.
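The abstract reports that TF effects are well approximated as additive and linear in the summed promoter ChIP-exo signal. A minimal sketch of that idea, assuming reads are given as genomic coordinates, each gene has a single expression value, and an ordinary least-squares fit with an intercept is adequate; the window choice, the function names and the plain OLS solver are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence


def promoter_signal(read_positions: Sequence[int], start: int, end: int) -> int:
    """Count ChIP-exo reads whose position falls in the half-open promoter window [start, end)."""
    return sum(1 for p in read_positions if start <= p < end)


def fit_additive_model(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    X[g][j] is TF j's summed promoter signal for gene g; y[g] is gene g's expression.
    Returns [intercept, w_1, ..., w_k]; each w_j is TF j's linear effect, so
    contributions from different TFs simply add up.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]   # design matrix with intercept column
    k = len(A[0])
    # Normal equations: (A^T A) beta = A^T y
    M = [[sum(row[i] * row[j] for row in A) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * t for row, t in zip(A, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes A has full column rank)
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        v[c], v[p] = v[p], v[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
                v[r] -= f * v[c]
    return [v[i] / M[i][i] for i in range(k)]
```

With more TFs than genes the normal equations become singular, so a regularized fit would be needed in that regime.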
Affiliation(s)
- Petter Holland
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- David Bergenholm
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Christoph S Börlin
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Guodong Liu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby DK-2800, Denmark
37
Zhao Y, Schaafsma E, Cheng C. Applications of ENCODE data to Systematic Analyses via Data Integration. Curr Opin Syst Biol 2018; 11:57-64. [PMID: 31011690 DOI: 10.1016/j.coisb.2018.08.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/26/2022]
Abstract
Large-scale genomic data have been used to generate unprecedented biological findings and new hypotheses. To delineate functional elements in the human genome, the Encyclopedia of DNA Elements (ENCODE) project has generated an enormous amount of genomic data, comprising around 7,000 data profiles across different cell and tissue types. In this article, we review the systematic analyses that have integrated ENCODE data with other data sources to reveal new biological insights, ranging from human genome annotation to the identification of new candidate drugs. These analyses demonstrate the critical impact of ENCODE data on basic biology and translational research.
Affiliation(s)
- Yanding Zhao
- Department of Biomedical Data Science, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Department of Molecular and Systems Biology, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756
- Evelien Schaafsma
- Department of Biomedical Data Science, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Department of Molecular and Systems Biology, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756
- Chao Cheng
- Department of Biomedical Data Science, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Department of Molecular and Systems Biology, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756
38
Liu W, Rajapakse JC. Fusing gene expressions and transitive protein-protein interactions for inference of gene regulatory networks. BMC Syst Biol 2019; 13:37. [PMID: 30953534 PMCID: PMC6449891 DOI: 10.1186/s12918-019-0695-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/02/2022]
Abstract
Background Systematic fusion of multiple data sources for gene regulatory network (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse, and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework for fusing GE data and transitive protein interaction data to coherently build GRN. Results We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, a heuristic method is proposed to extend sparse PPIN by incorporating transitive linkages. We then propose a novel way to score extended protein interactions by combining topological properties of PPIN and correlations of GE. Following this, GE data and extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) in order to identify gene regulatory pathways and refine interaction scores that are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data by using the structural priors derived from the GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN for predicting transitive protein interactions and the effectiveness of the proposed PPIN-GE fusion method in improving the coverage and accuracy of the inferred GRN. Conclusion The GE and PPIN fusion model outperforms both state-of-the-art single-data-source models (CLR, GENIE3, TIGRESS) and existing fusion models under various constraints.
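The abstract names a transitive-linkage heuristic and a score that combines PPIN topology with GE correlation, but does not spell out either. Purely as a hypothetical illustration, one could propose a candidate edge u-w whenever u and w are not linked but share a neighbour, and score it as the Jaccard overlap of their neighbourhoods multiplied by the absolute Pearson correlation of their expression profiles. Both choices, and all function names, are assumptions rather than the authors' method; the sketch also assumes an undirected, symmetric adjacency map.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def transitive_candidates(adj: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Unlinked pairs (u, w) with u < w that share at least one neighbour v (u - v - w)."""
    candidates = set()
    for nbrs in adj.values():
        for u, w in combinations(sorted(nbrs), 2):
            if w not in adj.get(u, set()):
                candidates.add((u, w))
    return candidates


def score_candidate(u: str, w: str, adj: Dict[str, Set[str]],
                    expr: Dict[str, List[float]]) -> float:
    """Hypothetical score: neighbourhood Jaccard overlap times |expression correlation|."""
    union = adj[u] | adj[w]
    jaccard = len(adj[u] & adj[w]) / len(union) if union else 0.0
    return jaccard * abs(pearson(expr[u], expr[w]))
```

On the chain A-B-C-D, the candidates are A-C (via B) and B-D (via C); A-C shares one of two neighbours, so with perfectly correlated expression it would score 0.5.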
Affiliation(s)
- Wenting Liu
- School of Public Health and Management, Hubei University of Medicine, Shiyan, Hubei, China; Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA
- Jagath C Rajapakse
- School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
39
Pantera H, Moran JJ, Hung HA, Pak E, Dutra A, Svaren J. Regulation of the neuropathy-associated Pmp22 gene by a distal super-enhancer. Hum Mol Genet 2018; 27:2830-2839. [PMID: 29771329 DOI: 10.1093/hmg/ddy191] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/12/2018] [Accepted: 05/09/2018] [Indexed: 12/27/2022]
Abstract
Peripheral nerve myelination is adversely affected in the most common form of hereditary peripheral neuropathy, Charcot-Marie-Tooth disease. This form, classified as CMT1A, is caused by a 1.4 Mb duplication on chromosome 17 that includes the abundantly expressed Schwann cell myelin gene Peripheral Myelin Protein 22 (PMP22). This is one of the most common copy number variants causing neurological disease. Overexpression of Pmp22 in rodent models recapitulates several aspects of neuropathy, and reduction of Pmp22 in such models ameliorates the neuropathy phenotype. Recently we identified a potential super-enhancer approximately 90-130 kb upstream of the Pmp22 transcription start sites. This super-enhancer encompasses a cluster of individual enhancers that carry the active enhancer mark H3K27ac (acetylated histone H3 lysine 27), and coincides with smaller duplications identified in patients with milder CMT1A-like symptoms in whom the PMP22 coding region itself was not duplicated. In this study, we used genome editing to delete this super-enhancer and determine its role in Pmp22 regulation. Using allele-specific internal controls, our data show a significant decrease in Pmp22 transcript expression. Moreover, the P2 promoter of the Pmp22 gene, which is used in other cell types, is affected, but we find that the Schwann cell-specific P1 promoter is disproportionately more sensitive to loss of the super-enhancer. These data show for the first time that these upstream enhancers are required for full Pmp22 expression.
Affiliation(s)
- Harrison Pantera
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA; Molecular and Cellular Pharmacology Graduate Program, University of Wisconsin-Madison, Madison, WI 53705, USA
- John J Moran
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Holly A Hung
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Evgenia Pak
- Cytogenetics and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Amalia Dutra
- Cytogenetics and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- John Svaren
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA; Department of Comparative Biosciences, University of Wisconsin-Madison, Madison, WI 53705, USA
40
Ma S, Jiang T, Jiang R. Constructing tissue-specific transcriptional regulatory networks via a Markov random field. BMC Genomics 2018; 19:884. [PMID: 30598101 PMCID: PMC6311931 DOI: 10.1186/s12864-018-5277-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
BACKGROUND Recent advances in sequencing technologies have enabled parallel assays of chromatin accessibility and gene expression for major human cell lines. Such innovation provides a great opportunity to decode the phenotypic consequences of genetic variation via the construction of predictive gene regulatory network models. However, a computational method that systematically integrates chromatin accessibility information with gene expression data to recover complicated, tissue-specific regulatory relationships between genes is still lacking. RESULTS We propose a Markov random field (MRF) model for constructing tissue-specific transcriptional regulatory networks via integrative analysis of DNase-seq and RNA-seq data. Our method, named CSNets (cell-line specific regulatory networks), first infers regulatory networks for individual cell lines using chromatin accessibility information, and then fine-tunes these networks using the MRF based on pairwise similarity between cell lines derived from gene expression data. Using this method, we constructed regulatory networks specific to 110 human cell lines and 13 major tissues from ENCODE data. We demonstrated the high quality of these networks via comprehensive statistical analysis based on ChIP-seq profiles, functional annotations, taxonomic analysis, and literature surveys. We further applied these networks to analyze GWAS data for Crohn's disease and prostate cancer. The results were either consistent with the literature or provided biological insights into the regulatory mechanisms of these two complex diseases. The CSNets website is freely available at http://bioinfo.au.tsinghua.edu.cn/jianglab/CSNETS/ . CONCLUSIONS CSNets demonstrated the power of joint analysis of epigenomic and transcriptomic data for the accurate construction of gene regulatory networks.
Our work provides not only a useful resource of regulatory networks for the community, but also valuable experience in methodology development for multi-omics data integration.
Affiliation(s)
- Shining Ma
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program Stanford University, Stanford, CA 94305 USA
- Tao Jiang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084 China
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084 China
41
Lu R, Rogan PK. Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations. F1000Res 2018; 7:1933. [PMID: 31001412 PMCID: PMC6464064 DOI: 10.12688/f1000research.17363.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Accepted: 12/05/2018] [Indexed: 10/12/2023]
Abstract
Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets. Methods: Genes with correlated expression patterns across 53 tissues and TF targets were identified from Bray-Curtis similarity and TF knockdown experiments, respectively. Corresponding promoter sequences were reduced to DNase I-accessible intervals; TFBSs were then identified within these intervals using information theory-based position weight matrices for each TF (iPWMs) and clustered. Features from information-dense TFBS clusters were used to predict these genes with machine learning classifiers, which were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on cluster densities and on the regulatory states of target genes. Results: We initially chose the glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the expression patterns most similar to NR3C1. A Decision Tree classifier exhibited the largest area under the Receiver Operating Characteristic (ROC) curve in detecting such genes. Target gene prediction was confirmed using siRNA knockdown of TFs; predictions based on siRNA knockdown were more accurate than those based on CRISPR/Cas9 inactivation. In silico mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction.
Conclusions: Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
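The classifier comparison above rests on the area under the ROC curve. A minimal sketch of how that quantity can be computed from classifier scores and binary labels, using the equivalent pairwise (Mann-Whitney) formulation instead of integrating the curve explicitly; the decision tree itself is not reproduced, and the function name is illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 = target gene, 0 = non-target).

    Equals the probability that a randomly chosen positive is scored above a
    randomly chosen negative, with ties counted as one half.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of examples, which is fine for a few hundred genes; a rank-based version scales better on larger sets.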
Affiliation(s)
- Ruipeng Lu
- Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Peter K. Rogan
- Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada
- Cytognomix, London, Ontario, N5X 3X5, Canada
42
Lu R, Rogan PK. Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations. F1000Res 2018; 7:1933. [PMID: 31001412 PMCID: PMC6464064 DOI: 10.12688/f1000research.17363.2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Accepted: 03/28/2019] [Indexed: 12/20/2022]
Abstract
Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using machine learning (ML). Methods: Bray-Curtis similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers, which predict these gene targets and were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and to predict changes in gene regulation. Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the expression patterns most similar to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on the area under the receiver operating characteristic (ROC) curve. TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/Cas9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least one information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns.
Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
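Bray-Curtis similarity, used above to find genes whose tissue-wide expression resembles a query gene, is commonly defined as one minus the Bray-Curtis dissimilarity, sum|x_i - y_i| / sum(x_i + y_i), for non-negative vectors. A minimal sketch, assuming each gene has one non-negative expression value per tissue; the gene names and values in the example are made up for illustration and are not data from the study.

```python
from typing import Dict, List, Sequence, Tuple


def bray_curtis_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - sum|x_i - y_i| / sum(x_i + y_i) for two non-negative profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same tissues")
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 1.0  # both profiles are all zeros; treat them as identical
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / total


def most_similar(query: str, profiles: Dict[str, List[float]],
                 k: int = 2) -> List[Tuple[str, float]]:
    """Rank all other genes by Bray-Curtis similarity to the query gene's profile."""
    q = profiles[query]
    scored = [(g, bray_curtis_similarity(q, p)) for g, p in profiles.items() if g != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For toy profiles QUERY = [1, 2, 3], GENE_A = [1, 2, 4], GENE_B = [3, 2, 1] and GENE_C = [0, 0, 9], the ranking is GENE_A (about 0.92), GENE_B (about 0.67), then GENE_C (0.4).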
Affiliation(s)
- Ruipeng Lu
- Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Peter K. Rogan
- Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada
- Cytognomix, London, Ontario, N5X 3X5, Canada
43
Ng FSL, Ruau D, Wernisch L, Göttgens B. A graphical model approach visualizes regulatory relationships between genome-wide transcription factor binding profiles. Brief Bioinform 2018; 19:162-173. [PMID: 27780826 PMCID: PMC5496675 DOI: 10.1093/bib/bbw102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/08/2015] [Indexed: 11/16/2022]
Abstract
Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity in large numbers of chromatin immunoprecipitation assays with sequencing (ChIP-seq), such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and show that network visualization provides a promising alternative to distinguish ‘direct’ versus ‘indirect’ TF interactions. We applied four algorithms that measure ‘direct’ dependence to a compendium of 367 mouse haematopoietic TF ChIP-seq samples and obtained a consensus network, known as a ‘TF association network’, in which edges correspond to likely causal pairwise relationships between TFs. The ‘TF association network’ illustrates the role of TFs in developmental pathways, is reminiscent of combinatorial TF regulation, corresponds to known protein–protein interactions and indicates substantial TF-binding reorganization in leukemic cell types. With the rapid increase in TF ChIP-seq data sets, the approach presented here will be a powerful tool to study transcriptional programmes across a wide range of biological systems.
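The four ‘direct dependence’ algorithms are not named in the abstract. As a simple stand-in for a conditional-independence test (not necessarily one the authors used), the sketch below keeps a TF pair only if its binding profiles stay correlated after conditioning on each other TF in turn, using first-order partial correlation. It assumes roughly linear, non-constant signals that are not perfectly collinear with the conditioning TF; the threshold and function names are illustrative.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def direct_edges(profiles: Dict[str, Sequence[float]],
                 threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Keep a TF pair only if it stays correlated after conditioning on each other TF in turn."""
    names = sorted(profiles)
    edges = []
    for u, v in combinations(names, 2):
        if abs(pearson(profiles[u], profiles[v])) <= threshold:
            continue  # not even marginally associated
        others = [w for w in names if w not in (u, v)]
        if all(abs(partial_corr(profiles[u], profiles[v], profiles[w])) > threshold
               for w in others):
            edges.append((u, v))
    return edges
```

In a toy chain where TF_A and TF_C each track TF_B plus independent noise, TF_A and TF_C are strongly correlated (about 0.93), yet their partial correlation given TF_B is essentially zero, so only the edges TF_A-TF_B and TF_B-TF_C survive.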
Affiliation(s)
- Felicia S L Ng
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- David Ruau
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Lorenz Wernisch
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Berthold Göttgens
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Corresponding author: Berthold Gottgens, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge CB2 0XY, UK. Tel: 01223-336829; Fax: 01223-762670; E-mail:
44
Niu B, Coslo DM, Bataille AR, Albert I, Pugh BF, Omiecinski CJ. In vivo genome-wide binding interactions of mouse and human constitutive androstane receptors reveal novel gene targets. Nucleic Acids Res 2018; 46:8385-8403. [PMID: 30102401 PMCID: PMC6144799 DOI: 10.1093/nar/gky692] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/04/2018] [Revised: 07/17/2018] [Accepted: 07/20/2018] [Indexed: 12/13/2022]
Abstract
The constitutive androstane receptor (CAR; NR1I3) is a nuclear receptor orchestrating complex roles in cell and systems biology. Species differences in CAR's effector pathways remain poorly understood, including its role in regulating liver tumor promotion. We developed transgenic mouse models to assess genome-wide binding of mouse and human CAR, following receptor activation in liver with direct ligands and with phenobarbital, an indirect CAR activator. Genomic interaction profiles were integrated with transcriptional and biological pathway analyses. Newly identified CAR target genes included Gdf15 and Foxo3, important regulators of the carcinogenic process. Approximately 1000 genes exhibited differential binding interactions between mouse and human CAR, including the proto-oncogenes, Myc and Ikbke, which demonstrated preferential binding by mouse CAR as well as mouse CAR-selective transcriptional enhancement. The ChIP-exo analyses also identified distinct binding motifs for the respective mouse and human receptors. Together, the results provide new insights into the important roles that CAR contributes as a key modulator of numerous signaling pathways in mammalian organisms, presenting a genomic context that specifies species variation in biological processes under CAR's control, including liver cell proliferation and tumor promotion.
Affiliation(s)
- Ben Niu
- Center for Molecular Toxicology and Carcinogenesis, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Denise M Coslo
- Center for Molecular Toxicology and Carcinogenesis, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Alain R Bataille
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Istvan Albert
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- B Franklin Pugh
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Curtis J Omiecinski
- Center for Molecular Toxicology and Carcinogenesis, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA 16802, USA
45
Luo X, Wei Y. Nonparametric Bayesian learning of heterogeneous dynamic transcription factor networks. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1129] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
46
Chen D, Fu LY, Hu D, Klukas C, Chen M, Kaufmann K. The HTPmod Shiny application enables modeling and visualization of large-scale biological data. Commun Biol 2018; 1:89. [PMID: 30271970 PMCID: PMC6123733 DOI: 10.1038/s42003-018-0091-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/26/2018] [Accepted: 06/03/2018] [Indexed: 01/20/2023]
Abstract
The wave of high-throughput technologies in genomics and phenomics is enabling data to be generated on an unprecedented scale and at a reasonable cost. Exploring the large-scale data sets generated by these technologies to derive biological insights requires efficient bioinformatic tools. Here we introduce an interactive, open-source web application (HTPmod) for high-throughput biological data modeling and visualization. HTPmod is implemented with the Shiny framework, integrating the computational power and professional visualization of R and including various machine-learning approaches. We demonstrate that HTPmod can be used for modeling and visualizing large-scale, high-dimensional data sets (such as multiple omics data) in a broad context. By reinvestigating example data sets from recent studies, we find not only that HTPmod can reproduce results from the original studies in a straightforward fashion and within a reasonable time, but also that novel insights may be gained from fast reinvestigation of existing data by HTPmod. Dijun Chen et al. present HTPmod, a Shiny web application for modeling and visualization of large-scale genomic and phenomic datasets. The authors show that HTPmod can quickly reproduce analyses of high-throughput biological datasets and produce publication-quality figures.
Affiliation(s)
- Dijun Chen
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany; Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, Gatersleben, 06466, Germany
- Liang-Yu Fu
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
- Dahui Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Christian Klukas
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, Gatersleben, 06466, Germany; Digitalization in Research & Development (ROM), BASF SE, Ludwigshafen am Rhein, 67056, Germany
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
47
48
Zhang LQ, Li QZ. Estimating the effects of transcription factors binding and histone modifications on gene expression levels in human cells. Oncotarget 2017; 8:40090-40103. [PMID: 28454114 PMCID: PMC5522221 DOI: 10.18632/oncotarget.16988] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/15/2016] [Accepted: 03/11/2017] [Indexed: 12/22/2022]
Abstract
Transcription factors and histone modifications are vital for the regulation of gene expression. To estimate the effects of transcription factor binding and histone modifications on gene expression, we construct a statistical model from genome-wide binding data for 15 transcription factors, 10 histone modification profiles and DNase I hypersensitivity data in three mammalian cell lines. Our results show that POLR2A and H3K36me3 predict gene expression accurately and consistently in all three cell lines, and that H3K4me3, H3K27me3 and H3K9ac are more reliable predictors than other histone modifications in human embryonic stem cells. Moreover, genome-wide statistical redundancies exist within and between transcription factors and histone modifications, which may reflect the underlying regulatory mechanism. In further analysis, we find that even though transcription factors and histone modifications have similar effects on the expression levels of genes genome-wide, their contributions to predictive ability differ for genes in independent biological processes.
Affiliation(s)
- Lu-Qiang Zhang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Qian-Zhong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
49
Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 2018; 28:739-750. [PMID: 29588361 PMCID: PMC5932613 DOI: 10.1101/gr.227819.117] [Received: 07/17/2017] [Accepted: 03/23/2018] [Indexed: 01/10/2023]
Abstract
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.
Affiliation(s)
- Yakir A Reshef
- Department of Computer Science, Harvard University, Cambridge, Massachusetts 02138, USA
- Jasper Snoek
- Google Brain, Cambridge, Massachusetts 02142, USA
50
Guo WL, Huang DS. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency. Mol Biosyst 2017; 13:1827-1837. [PMID: 28718849 DOI: 10.1039/c7mb00155j] [Indexed: 12/20/2022]
Abstract
Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identifying the DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, cost and limited biological material make it difficult to map every TF in every cell line, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, which predicts additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset into a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. We assess the imputation methods against observed ChIP-seq TF binding profiles. The signals predicted by our method achieve overall similarity with the experimental data, and TFBSImpute significantly outperforms baseline approaches. In addition, motif analysis shows that TFBSImpute performs better than the baselines at capturing binding motifs enriched in the observed data, indicating that its higher performance is not simply due to averaging related samples. We anticipate that our approach will be a useful complement to experimental mapping of TF binding and will benefit further study of regulatory mechanisms and disease.
Affiliation(s)
- Wei-Li Guo
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China.