1
Muley VY. Prediction and Analysis of Transcription Factor Binding Sites: Practical Examples and Case Studies Using R Programming. Methods Mol Biol 2024; 2719:199-225. [PMID: 37803120; DOI: 10.1007/978-1-0716-3461-5_12]
Abstract
Transcription factors (TFs) bind to specific regions of DNA known as transcription factor binding sites (TFBSs) and modulate gene expression by interacting with the transcriptional machinery. TFBSs are typically located upstream of target genes, within a few thousand base pairs of the transcription start site. The binding of TFs to TFBSs influences the recruitment of the transcriptional machinery, thereby regulating gene transcription in a precise and specific manner. This chapter provides practical examples and case studies demonstrating the extraction of upstream gene regions from the genome, identification of TFBSs using the PWMEnrich R/Bioconductor package, interpretation of results, and preparation of publication-ready figures and tables. The EOMES promoter is used as a case study for single DNA sequence analysis, revealing potential regulation by the LHX9-FOXP1 complex during embryonic development. Additionally, an example is presented on how to investigate TFBSs in the upstream regions of a group of genes, using a case study of genes differentially expressed in response to human parainfluenza virus type 1 (HPIV1) infection and interferon-beta. Key regulators identified in this context include the STAT1:STAT2 heterodimer and interferon regulatory factor family proteins. The presented protocol is designed to be accessible to individuals with basic computer literacy. Understanding the interactions between TFs and TFBSs provides insights into the complex transcriptional regulatory networks that govern gene expression, with broad implications for fields such as developmental biology, immunology, and disease research.
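The TFBS identification step rests on position weight matrix (PWM) scoring. The sketch below illustrates only that general idea; it is not PWMEnrich's implementation (PWMEnrich is an R/Bioconductor package with its own background models and enrichment statistics). It assumes a uniform nucleotide background, a pseudocount of 0.5, and a hypothetical 4-bp toy motif, converts counts to log2-odds weights, and reports every window whose summed score reaches a chosen threshold.

```python
import math

BASES = "ACGT"


def counts_to_log_odds(counts, background=None, pseudocount=0.5):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / background[b])
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for each window whose summed log-odds score >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical motif counts (illustrative only, not a real TF motif).
toy_counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]
pwm = counts_to_log_odds(toy_counts)
print(scan("TTACGTAAACGTCC", pwm, threshold=4.0))
```

Only the strand given is scanned here; a fuller scan would also score the reverse complement, and real analyses take motifs from curated collections rather than toy counts.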
Affiliation(s)
- Vijaykumar Yogesh Muley
- Independent Researcher, Hingoli, India
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico

2
Muley VY. Deep Learning for Predicting Gene Regulatory Networks: A Step-by-Step Protocol in R. Methods Mol Biol 2024; 2719:265-294. [PMID: 37803123; DOI: 10.1007/978-1-0716-3461-5_15]
Abstract
Deep learning has emerged as a powerful tool for solving complex problems, including the reconstruction of gene regulatory networks in biology. These networks consist of transcription factors and their associations with the genes they regulate. Despite the utility of deep learning methods in studying gene expression and regulation, their accessibility remains limited for biologists, mainly because they require programming skills and a nuanced grasp of the underlying algorithms. This chapter presents a deep learning protocol that uses TensorFlow and the Keras API in R/RStudio, with the aim of making deep learning accessible to individuals without specialized expertise. The protocol focuses on the genome-wide prediction of regulatory interactions between transcription factors and genes, leveraging publicly available gene expression data in conjunction with well-established benchmarks. It encompasses the key phases of data preprocessing, design of neural network architectures, iterative model training and validation, and prediction of novel regulatory associations. Furthermore, it provides insights into parameter tuning for deep learning models. By following this protocol, researchers are expected to gain a comprehensive understanding of applying deep learning techniques to predict regulatory interactions. The protocol can be readily modified to serve diverse research problems, thereby empowering scientists to harness deep learning effectively in their investigations.
Affiliation(s)
- Vijaykumar Yogesh Muley
- Independent Researcher, Hingoli, India.
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México.

3
Muley VY. Search, Retrieve, Visualize, and Analyze Protein-Protein Interactions from Multiple Databases: A Guide for Experimental Biologists. Methods Mol Biol 2023; 2690:429-443. [PMID: 37450164; DOI: 10.1007/978-1-0716-3327-4_33]
Abstract
Functional annotation is lacking for over half of the proteins encoded in genomes, and model or representative organisms are no exception to this trend. One popular way of assigning putative functions to uncharacterized proteins is based on the functions of well-characterized proteins that physically interact with them, i.e., the guilt-by-association or functional context approach. In the last two decades, several powerful experimental and computational techniques have been used to determine protein-protein interactions (PPIs) at the genome level, and these data are made available through many public databases. PPI data are often complex and heterogeneously represented across databases, posing unique challenges in retrieving, integrating, and analyzing them even for trained computational biologists; the end users, experimental biologists, often struggle to work with the data for their protein of interest. This chapter provides stepwise protocols to import the interaction network of a protein of interest into Cytoscape using PSICQUIC, stringApp, and IntAct App. These are next-generation applications that import PPIs from multiple databases/resources and provide seamless functions to study the protein of interest and its functional context directly in Cytoscape.
Affiliation(s)
- Vijaykumar Yogesh Muley
- Independent Researcher, Jijamata Nagar, Hingoli, India.
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México.

4
Shovlin S, Delepine C, Swanson L, Bach S, Sahin M, Sur M, Kaufmann WE, Tropea D. Molecular Signatures of Response to Mecasermin in Children With Rett Syndrome. Front Neurosci 2022; 16:868008. [PMID: 35712450; PMCID: PMC9197456; DOI: 10.3389/fnins.2022.868008] [Received: 02/01/2022] [Accepted: 04/26/2022]
Abstract
Rett syndrome (RTT) is a devastating neurodevelopmental disorder without effective treatments. Attempts at developing targeted therapies have been relatively unsuccessful, at least in part because of the genotypic and phenotypic variability of the disorder. Therefore, identification of biomarkers of response and patient stratification are high priorities. Administration of Insulin-like Growth Factor 1 (IGF-1) and related compounds leads to significant reversal of RTT-like symptoms in preclinical mouse models. However, improvements in corresponding clinical trials have not been consistent. A 20-week phase I open-label trial of mecasermin (recombinant human IGF-1) in children with RTT demonstrated significant improvements in breathing phenotypes. However, a subsequent randomised controlled phase II trial did not show significant improvements in primary outcomes, although two secondary clinical endpoints showed positive changes. To identify molecular biomarkers of response and surrogate endpoints, we used RNA sequencing to measure differential gene expression in whole blood samples of participants in the above-mentioned phase I mecasermin trial. When all participants (n = 9) were analysed, gene expression was unchanged during the study (baseline vs. end of treatment, T0-T3). However, when participants were subclassified in terms of breathing phenotype improvement, specifically by their plethysmography-based apnoea index, individuals with moderate-severe apnoea and breathing improvement (Responder group) displayed significantly different transcript profiles compared to the other participants in the study (Mecasermin Study Reference group, MSR). Many of the differentially expressed genes are involved in the regulation of cell cycle processes and immune responses, as well as in IGF-1 signalling and breathing regulation.
While the Responder group showed limited gene expression changes in response to mecasermin, the MSR group displayed marked differences in the expression of genes associated with inflammatory processes (e.g., neutrophil activation, complement activation) throughout the trial. Our analyses revealed gene expression profiles associated with severe breathing phenotype and its improvement after mecasermin administration in RTT, and suggest that inflammatory/immune pathways and IGF-1 signalling contribute to treatment response. Overall, these data support the notion that transcript profiles have potential as biomarkers of response to IGF-1 and related compounds.
Affiliation(s)
- Stephen Shovlin
- Neuropsychiatric Genetics, Trinity Center for Health Sciences, Trinity Translational Medicine Institute, St James Hospital, Dublin, Ireland
- Chloe Delepine
- Department of Brain and Cognitive Sciences, Simons Center for the Social Brain, Picower Institute for Learning and Memory, MIT, Cambridge, MA, United States
- Lindsay Swanson
- Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital and Harvard Medical School, Boston, MA, United States
- Snow Bach
- Neuropsychiatric Genetics, Trinity Center for Health Sciences, Trinity Translational Medicine Institute, St James Hospital, Dublin, Ireland
- Mustafa Sahin
- Department of Neurology, Rosamund Stone Zander Translational Neuroscience Center, Boston Children's Hospital and Harvard Medical School, Boston, MA, United States
- Mriganka Sur
- Department of Brain and Cognitive Sciences, Simons Center for the Social Brain, Picower Institute for Learning and Memory, MIT, Cambridge, MA, United States
- Walter E Kaufmann
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, United States
- Department of Neurology, Boston Children's Hospital, Boston, MA, United States
- Daniela Tropea
- Neuropsychiatric Genetics, Trinity Center for Health Sciences, Trinity Translational Medicine Institute, St James Hospital, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
- FutureNeuro, The SFI Research Centre for Chronic and Rare Neurological Diseases, Dublin, Ireland

5
Muley VY, König R. Human transcriptional gene regulatory network compiled from 14 data resources. Biochimie 2021; 193:115-125. [PMID: 34740743; DOI: 10.1016/j.biochi.2021.10.016] [Received: 08/29/2021] [Revised: 10/28/2021] [Accepted: 10/29/2021]
Abstract
The transcriptional regulatory network (TRN) in a cell orchestrates the spatio-temporal expression of genes to generate cellular responses for the maintenance, reproduction, development, and survival of the cell and its host organism. Transcription factors (TF) regulate the expression of their target genes (TG) and are the fundamental units of the TRN. Given their importance for understanding cellular physiology, several databases have been developed to catalogue the human TRN based on low- and high-throughput experimental and computational studies. However, the literature lacks a comparative assessment of their strengths and weaknesses. We compared over 2.2 million regulatory pairs between 1379 TF and 22,518 TG from 14 resources. Our study reveals that the TF and TG were common across data resources, but their regulatory pairs were not. TF and TG of the regulatory pairs showed weak expression correlation, significant gene ontology overlap, co-citations in PubMed, and low numbers of TF-TG pairs representing transcriptional repression relationships. We assigned each TF-TG regulatory pair a combined confidence score reflecting its reliability based on its presence in multiple databases. The assembled TRN contains 2,246,598 TF-TG pairs, of which 44,284 carry information on the TF's activating or repressing effect on its TG; the network is available upon request. This study brings information about transcriptional regulation scattered over the literature and databases into one place, in the form of one of the most comprehensive and complete human TRNs assembled to date. It will be a valuable resource for benchmarking TRN prediction tools and for the scientific community working in functional genomics, gene expression, and regulation analysis.
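The idea of scoring a TF-TG pair by its presence across resources, and the observation that resources share TFs but not pairs, can be made concrete with a small Python sketch. The resources and pairs below are hypothetical, and the score is a deliberately simple support fraction (the share of resources reporting a pair); it is an illustrative assumption, not the combined confidence score defined in the paper.

```python
from collections import Counter
from itertools import combinations


def support_scores(resources):
    """Fraction of resources reporting each (TF, target) pair.

    Illustrative only: NOT the paper's combined confidence score.
    """
    counts = Counter(pair for pairs in resources.values() for pair in set(pairs))
    n = len(resources)
    return {pair: c / n for pair, c in counts.items()}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical toy resources (names and pairs are made up for illustration).
resources = {
    "resource_A": {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")},
    "resource_B": {("TF1", "G1"), ("TF2", "G4")},
    "resource_C": {("TF1", "G1"), ("TF2", "G3"), ("TF2", "G2")},
}

scores = support_scores(resources)
for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 2))

# Resources can agree on which TFs they cover while disagreeing on specific pairs.
for (name_a, a), (name_b, b) in combinations(resources.items(), 2):
    tf_overlap = jaccard({tf for tf, _ in a}, {tf for tf, _ in b})
    pair_overlap = jaccard(a, b)
    print(name_a, name_b, "TF overlap", round(tf_overlap, 2),
          "pair overlap", round(pair_overlap, 2))
```

In this toy example every pair of resources shares all of its TFs, yet the pair-level overlap stays low, which mirrors the qualitative pattern the abstract reports.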
Affiliation(s)
- Rainer König
- Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany; Integrated Research and Treatment Center, Center for Sepsis Control and Care, Jena University Hospital, Jena, Germany.

6
Ovens K, Eames BF, McQuillan I. Comparative Analyses of Gene Co-expression Networks: Implementations and Applications in the Study of Evolution. Front Genet 2021; 12:695399. [PMID: 34484293; PMCID: PMC8414652; DOI: 10.3389/fgene.2021.695399] [Received: 04/14/2021] [Accepted: 07/19/2021]
Abstract
Similarities and differences in the associations of biological entities among species can provide us with a better understanding of evolutionary relationships. Often the evolution of new phenotypes results from changes to interactions in pre-existing biological networks and comparing networks across species can identify evidence of conservation or adaptation. Gene co-expression networks (GCNs), constructed from high-throughput gene expression data, can be used to understand evolution and the rise of new phenotypes. The increasing abundance of gene expression data makes GCNs a valuable tool for the study of evolution in non-model organisms. In this paper, we cover motivations for why comparing these networks across species can be valuable for the study of evolution. We also review techniques for comparing GCNs in the context of evolution, including local and global methods of graph alignment. While some protein-protein interaction (PPI) bioinformatic methods can be used to compare co-expression networks, they often disregard highly relevant properties, including the existence of continuous and negative values for edge weights. Also, the lack of comparative datasets in non-model organisms has hindered the study of evolution using PPI networks. We also discuss limitations and challenges associated with cross-species comparison using GCNs, and provide suggestions for utilizing co-expression network alignments as an indispensable tool for evolutionary studies going forward.
Affiliation(s)
- Katie Ovens
- Augmented Intelligence & Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- B. Frank Eames
- Department of Anatomy, Physiology, & Pharmacology, University of Saskatchewan, Saskatoon, SK, Canada
- Ian McQuillan
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada

7
Aviña-Padilla K, Ramírez-Rafael JA, Herrera-Oropeza GE, Muley VY, Valdivia DI, Díaz-Valenzuela E, García-García A, Varela-Echavarría A, Hernández-Rosales M. Evolutionary Perspective and Expression Analysis of Intronless Genes Highlight the Conservation of Their Regulatory Role. Front Genet 2021; 12:654256. [PMID: 34306008; PMCID: PMC8302217; DOI: 10.3389/fgene.2021.654256] [Received: 01/15/2021] [Accepted: 06/01/2021]
Abstract
The structure of eukaryotic genes is generally a combination of exons interrupted by intragenic non-coding DNA regions (introns) that are removed by RNA splicing to generate the mature mRNA. A fraction of genes, however, comprise a single coding exon with introns in their untranslated regions, or are intronless genes (IGs) lacking introns entirely. The latter code for essential proteins involved in development, growth, and cell proliferation, and their expression has been proposed to be highly specialized for neuro-specific functions and linked to cancer, neuropathies, and developmental disorders. The abundant presence of introns in eukaryotic genomes is pivotal for the precise control of gene expression. Notwithstanding, IGs, being exempt from splicing, entail higher transcriptional fidelity, making them even more valuable for regulatory roles. This work aimed to infer the functional role and evolutionary history of IGs, centered on the mouse genome. IGs, a subgroup of single-exon genes that includes coding genes, non-coding genes, and pseudogenes, make up approximately 6% of a total of 21,527 genes. To understand their prevalence, biological relevance, and evolution, we identified and studied 1,116 IG functional proteins, validating their differential expression in transcriptomic data of embryonic mouse telencephalon. Our results showed that overall expression levels of IGs are lower than those of multiple-exon genes (MEGs). However, strongly up-regulated IGs include transcription factors (TFs) such as the class 3 of POU (HMG Box), Neurog1, Olig1, BHLHe22, and BHLHe23, among other essential genes including the β-cluster of protocadherins. Most striking was the finding that IG-encoded BHLH TFs fit the criteria to be classified as microproteins. Finally, predicted protein orthologs in six other genomes confirmed high conservation of IGs associated with regulating neural processes and with chromatin organization and epigenetic regulation in Vertebrata.
Moreover, this study highlights that IGs are essential modulators of regulatory processes, such as the Wnt signaling pathway, and of biological processes as pivotal as sensory organ development, at both the transcriptional and post-translational levels. Overall, our results suggest that IG proteins have specialized, prevalent, and unique biological roles and that functional divergence between IGs and MEGs is likely to be the result of specific evolutionary constraints.
Affiliation(s)
- Katia Aviña-Padilla
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico
- Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, Mexico
- Gabriel Emilio Herrera-Oropeza
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico
- Centre for Developmental Neurobiology, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
- Dulce I. Valdivia
- Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, Mexico
- Erik Díaz-Valenzuela
- Centro de Investigación y de Estudios Avanzados del IPN, Unidad Irapuato, Guanajuato, Mexico
- Andrés García-García
- Centro de Física Aplicada y Tecnología Avanzada, Universidad Nacional Autónoma de México, Querétaro, Mexico

8
Muley VY. Mathematical Programming for Modeling Expression of a Gene Using Gurobi Optimizer to Identify Its Transcriptional Regulators. Methods Mol Biol 2021; 2328:99-113. [PMID: 34251621; DOI: 10.1007/978-1-0716-1534-8_6]
Abstract
The cell expresses various genes in specific contexts in response to internal and external perturbations to invoke appropriate responses. Transcription factors (TFs) orchestrate and define the expression level of genes by binding to their regulatory regions. Dysregulated expression of TFs often leads to aberrant expression changes in their target genes and is responsible for several diseases, including cancers. In the last two decades, several studies have experimentally identified target genes of several TFs. However, these studies are limited to a small fraction of the total TFs encoded by an organism, and only to those amenable to experimental settings. These experimental limitations have led to many computational techniques being proposed to predict target genes of TFs. Linear modeling of gene expression is one of the most promising computational approaches, readily applicable to the thousands of expression datasets available in the public domain across diverse phenotypes. Linear models assume that the expression of a gene is the sum of the expression of the TFs regulating it. In this chapter, I introduce mathematical programming for the linear modeling of gene expression, which has certain advantages over conventional statistical modeling approaches. It is fast, scales to the genome level, and, most importantly, allows mixed-integer programming to tune the model outcome with prior knowledge on gene regulation.
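The linear model can be written as a small optimization problem in which each TF's expression is scaled by a fitted coefficient. The formulation below is a hedged sketch of one standard mixed-integer (best-subset selection) formulation, not necessarily the exact model used in the chapter. The symbols are introduced here for illustration: $y_s$ is the expression of the target gene in sample $s$ ($s = 1,\dots,S$), $x_{ts}$ the expression of candidate TF $t$ ($t = 1,\dots,T$), $\beta_t$ its coefficient, $z_t \in \{0,1\}$ an indicator that TF $t$ is selected as a regulator, $M$ an assumed upper bound on $|\beta_t|$, and $k$ the maximum number of regulators.

```latex
\begin{aligned}
\min_{\beta_0,\,\beta,\,z}\quad & \sum_{s=1}^{S}\Big(y_s-\beta_0-\sum_{t=1}^{T}\beta_t\,x_{ts}\Big)^{2}\\
\text{subject to}\quad & -M z_t \le \beta_t \le M z_t, && t=1,\dots,T,\\
& \sum_{t=1}^{T} z_t \le k,\\
& z_t\in\{0,1\}, && t=1,\dots,T.
\end{aligned}
```

Prior knowledge can enter by fixing indicators, for example $z_t = 1$ for a TF with experimentally supported binding to the gene, or $z_t = 0$ for a TF believed not to regulate it. With a squared-error objective this is a mixed-integer quadratic program, a problem class that Gurobi supports; replacing the squared residuals with absolute residuals, linearized through auxiliary variables, gives a mixed-integer linear program instead.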