1
Chaturvedi A, Som A. Inference of Dynamic Growth Regulatory Network in Cancer Using High-Throughput Transcriptomic Data. Methods Mol Biol 2024; 2719:51-77. [PMID: 37803112 DOI: 10.1007/978-1-0716-3461-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
Growth is regulated by variation in gene expression at different developmental stages of biological processes such as cell differentiation, disease progression, or drug response. In cancer, a stage-specific regulatory model constructed to infer the dynamic expression changes in genes contributing to tissue growth or proliferation is referred to as a dynamic growth regulatory network (dGRN). Over the past decade, gene expression data have been widely used to reconstruct dGRNs by computing correlations between differentially expressed genes (DEGs). A wide variety of pipelines are available for constructing GRNs from DEGs, and the choice of a particular method or tool depends on the nature of the study. In this protocol, we outline a step-by-step guide for the analysis of DEGs using RNA-Seq data, from data acquisition, pre-processing, and mapping to a reference genome through construction of a correlation-based co-expression network and further downstream analysis. We also outline the steps for incorporating publicly available interaction/regulation information into the dGRN, followed by relevant topological inferences. This tutorial is designed so that early-career researchers can refer to it for an accessible and comprehensive overview of the methodologies used to infer dGRNs from transcriptomic data.
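The correlation-based co-expression step mentioned in this abstract can be illustrated with a minimal sketch. It is not the protocol's own pipeline: the gene names, expression values, and the 0.8 cutoff on the absolute Pearson correlation are hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold.

    `threshold` is an assumed cutoff, not a value taken from the protocol.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical DEG expression values across four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 5.9, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(expr)
for g1, g2, r in edges:
    print(g1, g2, r)
```

Keeping the sign of r on each edge lets positively and negatively co-expressed pairs be told apart in later topological analysis.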
Affiliation(s)
- Aparna Chaturvedi: Centre of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad, Prayagraj, India
- Anup Som: Centre of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad, Prayagraj, India
2
Asghari A, Wall K, Gill M, Vecchio ND, Allahbakhsh F, Wu J, Deng N, Zheng WJ, Wu H, Umetani M, Maroufy V. A novel group of genes that cause endocrine resistance in breast cancer identified by dynamic gene expression analysis. Oncotarget 2022; 13:600-613. [PMID: 35401937 PMCID: PMC8986262 DOI: 10.18632/oncotarget.28225] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/15/2022] [Accepted: 03/25/2022] [Indexed: 11/25/2022]
Abstract
Breast cancer (BC) is the most common type of cancer diagnosed in women. Among female cancer deaths, BC is the second leading cause of death worldwide. For estrogen receptor-positive (ER-positive) breast cancers, endocrine therapy is an effective therapeutic approach. However, in many cases, an ER-positive tumor becomes unresponsive to endocrine therapy, and tumor regrowth occurs after treatment. While some genetic mutations contribute to resistance in some patients, the underlying causes of resistance to endocrine therapy are mostly undetermined. In this study, we used a recently developed statistical approach to investigate the dynamic behavior of gene expression during the development of endocrine resistance and identified a novel group of genes whose time-course expression changes significantly in a cell model of endocrine-resistant BC development. A subset of these genes was also differentially expressed in microarray analysis of endocrine-resistant and endocrine-sensitive tumor samples. Surprisingly, a subset of those genes was also differentially expressed in triple-negative breast cancer (TNBC) as compared with ER-positive BC. The findings suggest that shared genetic mechanisms may underlie the development of endocrine-resistant BC and TNBC. Our findings identify 34 novel genes for further study as potential therapeutic targets for the treatment of endocrine-resistant BC and TNBC.
Affiliation(s)
- Arvand Asghari: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA. These authors contributed equally to this work.
- Katherine Wall: Department of Biostatistics and Data Science, School of Public Health, UTHealth, Houston, TX 77030, USA. These authors contributed equally to this work.
- Michael Gill: Department of Biostatistics and Data Science, School of Public Health, UTHealth, Houston, TX 77030, USA
- Natascha Del Vecchio: Chicago Center for HIV Elimination, University of Chicago, Chicago, IL 60637, USA
- Farnaz Allahbakhsh: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA
- Jacky Wu: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA
- Nan Deng: Clinical Cancer Prevention Department, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- W Jim Zheng: School of Biomedical Informatics, UTHealth, Houston, TX 77030, USA
- Hulin Wu: Department of Biostatistics and Data Science, School of Public Health, UTHealth, Houston, TX 77030, USA
- Michihisa Umetani: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA; Health Research Institute, University of Houston, Houston, TX 77204, USA
- Vahed Maroufy: Department of Biostatistics and Data Science, School of Public Health, UTHealth, Houston, TX 77030, USA
3
Patra BG, Soltanalizadeh B, Deng N, Wu L, Maroufy V, Wu C, Zheng WJ, Roberts K, Wu H, Yaseen A. An informatics research platform to make public gene expression time-course datasets reusable for more scientific discoveries. Database (Oxford) 2020; 2020:baaa074. [PMID: 33247935 PMCID: PMC7698665 DOI: 10.1093/database/baaa074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2020] [Revised: 07/17/2020] [Accepted: 08/10/2020] [Indexed: 11/13/2022]
Abstract
The exponential growth of genomic/genetic data in the era of Big Data demands new solutions for making these data findable, accessible, interoperable and reusable. In this article, we present a web-based platform, the Gene Expression Time-Course Research (GETc) Platform, that enables the discovery and visualization of time-course gene expression data and analytical results from the NIH/NCBI-sponsored Gene Expression Omnibus (GEO). The analytical results are produced by an analytic pipeline based on the ordinary differential equation model. Furthermore, extracting scientific insights from these results and disseminating the findings requires close and efficient collaboration between domain experts from biomedical and scientific fields and data scientists. GETc therefore provides several recommendation functions and tools to facilitate effective collaboration. The GETc platform is a useful tool for researchers in the biomedical genomics community to present and communicate large numbers of analysis results from GEO. It is generalizable and broadly applicable across different biomedical research areas. GETc is a user-friendly and efficient web-based platform freely accessible at http://genestudy.org/.
Affiliation(s)
- Braja Gopal Patra: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA
- Babak Soltanalizadeh: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA
- Nan Deng: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA
- Leqing Wu: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA
- Vahed Maroufy: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA
- Canglin Wu: TechWave International, Inc., Houston, TX, USA
- W Jim Zheng: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Kirk Roberts: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Hulin Wu: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Ashraf Yaseen: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Houston, TX 77030, USA
4
Identification of Monotonically Differentially Expressed Genes across Pathologic Stages for Cancers. JOURNAL OF ONCOLOGY 2020; 2020:8458190. [PMID: 33273919 PMCID: PMC7676961 DOI: 10.1155/2020/8458190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2020] [Revised: 10/17/2020] [Accepted: 10/28/2020] [Indexed: 12/09/2022]
Abstract
Because cancer is a multistage progression process resulting from genetic sequence mutations, genes whose expression values increase or decrease monotonically across pathologic stages are potentially involved in tumor progression. Identifying such genes may provide clues about how human cancers advance, thereby facilitating more personalized treatments. By replacing the expression values of genes with their GeneRanks, we propose a procedure capable of identifying monotonically differentially expressed genes (MEGs) as the disease advances. On three real-world gene expression datasets covering three distinct cancer types (colon, esophageal, and lung cancers), the proposed procedure demonstrated excellent performance in detecting potential MEGs. To conclude, the proposed procedure can detect MEGs across pathologic stages of cancers very efficiently and is thus highly recommended.
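The idea of a monotonic trend across stages can be illustrated with a short sketch. This is not the GeneRank-based procedure the abstract proposes: it swaps in a plain Spearman rank correlation between stage and expression as a stand-in, and the gene names, values, and the 0.8 cutoff are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def monotonic_genes(expr, stages, min_abs_rho=0.8):
    """Label genes whose expression trends monotonically with stage.

    `min_abs_rho` is an assumed cutoff chosen for this illustration.
    """
    labels = {}
    for gene, values in expr.items():
        rho = spearman(stages, values)
        if abs(rho) >= min_abs_rho:
            labels[gene] = "increasing" if rho > 0 else "decreasing"
    return labels


# Hypothetical data: two samples per pathologic stage (I, II, III).
stages = [1, 1, 2, 2, 3, 3]
expr = {
    "GENE_UP": [1.0, 1.2, 2.1, 2.4, 3.3, 3.0],
    "GENE_DOWN": [5.0, 4.8, 3.9, 3.5, 2.0, 2.2],
    "GENE_FLAT": [2.0, 3.0, 2.1, 2.9, 2.0, 3.1],
}
labels = monotonic_genes(expr, stages)
print(labels)
```

Working on ranks rather than raw values makes the check insensitive to the scale of expression, which is the same motivation the abstract gives for replacing expression values with ranks.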
5
Maroufy V, Shah P, Asghari A, Deng N, Le RNU, Ramirez JC, Yaseen A, Zheng WJ, Umetani M, Wu H. Gene expression dynamic analysis reveals co-activation of Sonic Hedgehog and epidermal growth factor followed by dynamic silencing. Oncotarget 2020; 11:1358-1372. [PMID: 32341755 PMCID: PMC7170495 DOI: 10.18632/oncotarget.27547] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/25/2019] [Accepted: 03/14/2020] [Indexed: 12/02/2022]
Abstract
Aberrant activation of the Sonic Hedgehog (SHH) gene is observed in various cancers. Previous studies have shown a “cross-talk” effect between the canonical Hedgehog signaling pathway and the Epidermal Growth Factor (EGF) pathway when SHH is active in the presence of EGF. However, the precise mechanism of the cross-talk effect on the entire gene population has not been investigated. Here, we re-analyzed publicly available data to study how SHH and EGF cooperate to affect the dynamic activity of the gene population. We used genome dynamic analysis to explore the expression profiles under different conditions in a human medulloblastoma cell line. Ordinary differential equations, combined with statistical and computational tools, were used to extract the information hidden in the dynamic behavior of the gene population. Our results revealed that EGF stimulation plays a dominant role, overshadowing most of the SHH effects. We also identified cross-talk genes that exhibited expression profiles dissimilar to those seen under SHH or EGF stimulation alone. These unique cross-talk patterns were validated in a cell culture model. The cross-talk genes identified here may serve as valuable markers to study or test for EGF co-stimulatory effects in an SHH+ environment. Furthermore, these genes may play roles in cancer progression and may therefore be further explored as cancer treatment targets.
Affiliation(s)
- Vahed Maroufy: Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA. These authors contributed equally to this work.
- Pankil Shah: Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA. These authors contributed equally to this work.
- Arvand Asghari: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Nan Deng: Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Rosemarie N U Le: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Juan C Ramirez: Facultad de Ingeniería de Sistemas, Universidad Antonio Nariño, Bogota, Colombia
- Ashraf Yaseen: Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- W Jim Zheng: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Michihisa Umetani: Center for Nuclear Receptors and Cell Signaling, Department of Biology and Biochemistry, University of Houston, Houston, TX, USA; HEALTH Research Institute, University of Houston, Houston, TX, USA
- Hulin Wu: Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
6
Soltanalizadeh B, Gonzalez Rodriguez E, Maroufy V, Zheng WJ, Wu H. Modelling of hypoxia gene expression for three different cancer cell lines. Int J Comput Biol Drug Des 2020; 13:124-143. [PMID: 32153660 DOI: 10.1504/ijcbdd.2020.10026794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Gene dynamic analysis is essential for identifying target genes involved in the pathogenesis of various diseases, including cancer. Cancer prognosis is often influenced by hypoxia. We apply a multi-step pipeline to study dynamic gene expression in response to hypoxia in three cancer cell lines: prostate (DU145), colon (HT29), and breast (MCF7). We identified 26 distinct temporal expression patterns for the prostate cell line and 29 patterns each for the colon and breast cell lines. Module-based dynamic networks were developed for all three cell lines. Our analyses improve on existing results in multiple ways. They exploit the time-dependent nature of gene expression values to identify dynamically significant genes; hence, more key significant genes and transcription factors were identified. Our gene network returns significant information about biologically important gene modules. Furthermore, the network has potential for learning the regulatory path between transcription factors and their downstream genes. In addition, our findings suggest that changes in the expression of the genes BMP6 and ARSJ might play a key role in the time-dependent response to hypoxia in breast cancer.
Affiliation(s)
- Babak Soltanalizadeh: Department of Biostatistics & Data Science, University of Texas Health Science Center at Houston, Houston, TX, USA
- Erika Gonzalez Rodriguez: Center for Translational Injury Research, Department of Surgery, McGovern Medical School, UT Houston, Houston, TX, USA
- Vahed Maroufy: Department of Biostatistics & Data Science, University of Texas Health Science Center at Houston, Houston, TX, USA
- W Jim Zheng: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Hulin Wu: Department of Biostatistics & Data Science, University of Texas Health Science Center at Houston, Houston, TX, USA
7
Patra BG, Roberts K, Wu H. A content-based dataset recommendation system for researchers-a case study on Gene Expression Omnibus (GEO) repository. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:1. [PMID: 33002137 PMCID: PMC7659921 DOI: 10.1093/database/baaa064] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/24/2020] [Revised: 07/19/2020] [Accepted: 07/27/2020] [Indexed: 11/13/2022]
Abstract
It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps increase the visibility of the work. On the other hand, some researchers are held back by a lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second approach is implemented using a non-parametric clustering technique. These clusters are used to recommend datasets for each researcher using the cosine similarity between the vector representations of publication clusters and datasets. After manual evaluation by five researchers, the proposed method obtained a maximum normalized discounted cumulative gain at 10 (NDCG@10) of 0.89, a partial precision at 10 (p@10) of 0.78 and a strict p@10 of 0.61.
To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, reduce researchers' workload in identifying the right dataset and increase the reusability of biomedical datasets. Database URL: http://genestudy.org/recommends/#/.
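The ranking and evaluation steps named in this abstract, cosine similarity between vectors and NDCG at a cutoff, can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the vectors and dataset names are hypothetical, the gain is taken to be linear in the relevance grade, and the ideal ordering is computed from the judged list itself.

```python
from math import log2, sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(cluster_vec, dataset_vecs, k=10):
    """Return up to k dataset names, most similar to the cluster vector first."""
    ranked = sorted(
        dataset_vecs.items(),
        key=lambda item: cosine(cluster_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def ndcg_at_k(relevances, k=10):
    """NDCG@k for relevance grades listed in ranked order.

    Assumes linear gain (rel / log2(position + 1)); other gain
    definitions exist and would give different values.
    """
    def dcg(rels):
        return sum(r / log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical vectors for one publication cluster and three datasets.
cluster = [1.0, 0.0, 1.0]
datasets = {
    "DATASET_1": [1.0, 0.0, 1.0],
    "DATASET_2": [0.0, 1.0, 0.0],
    "DATASET_3": [1.0, 1.0, 1.0],
}
ranking = recommend(cluster, datasets)
print(ranking)
print(ndcg_at_k([1, 1, 0]), ndcg_at_k([0, 1]))
```

A perfectly ordered list scores 1.0, and placing a relevant item lower in the list reduces the score through the logarithmic position discount.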
Affiliation(s)
- Braja Gopal Patra: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Suite E-833, Houston, TX 77030, USA
- Kirk Roberts: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
- Hulin Wu: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, 1200 Pressler Street, Suite E-833, Houston, TX 77030, USA; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 600, Houston, TX 77030, USA
8
Deng N, Ramirez JC, Carey M, Miao H, Arias CA, Rice AP, Wu H. Investigation of temporal and spatial heterogeneities of the immune responses to Bordetella pertussis infection in the lung and spleen of mice via analysis and modeling of dynamic microarray gene expression data. Infect Dis Model 2019; 4:215-226. [PMID: 31236525 PMCID: PMC6579965 DOI: 10.1016/j.idm.2019.06.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/29/2019] [Revised: 06/06/2019] [Accepted: 06/06/2019] [Indexed: 12/24/2022]
Abstract
Bordetella pertussis (B. pertussis) is the causative agent of pertussis, also known as whooping cough. Although pertussis has been appropriately controlled by routine immunization of infants, it has experienced a resurgence since the beginning of the 21st century. Because elucidating the immune response to pertussis is crucial for improving therapeutic and preventive treatments, we re-analyzed a time-course microarray dataset of B. pertussis infection by applying a newly developed dynamic data analysis pipeline. Our results indicate that the immune response to B. pertussis is highly dynamic and heterologous across different organs during infection. Th1 and Th17 cells, two critical T helper cell populations in the immune response to B. pertussis, and follicular T helper cells (TFHs), which are also essential for generating antibodies, might be generated at different time points and in distinct locations after infection. This phenomenon may indicate that different lymphoid organs have unique functions during infection. These findings provide a better understanding of the basic immunology of bacterial infection, which may provide valuable insights for improving pertussis vaccine design in the future.
Affiliation(s)
- Nan Deng: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Juan C Ramirez: Facultad de Ingeniería de Sistemas, Universidad Antonio Nariño, Bogotá, Colombia
- Michelle Carey: School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
- Hongyu Miao: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Cesar A Arias: Center for Antimicrobial Resistance and Microbial Genomics (CARMiG), UTHealth McGovern Medical School, USA; Division of Infectious Diseases and Department of Microbiology and Molecular Genetics, UTHealth McGovern Medical School, USA; Center for Infectious Diseases, UTHealth School of Public Health, USA; Molecular Genetics and Antimicrobial Resistance Unit and International Center for Microbial Genomics, Universidad El Bosque, Bogota, Colombia
- Andrew P Rice: Department of Molecular Virology & Microbiology, Baylor College of Medicine, Houston, TX, USA
- Hulin Wu: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA