1
Zhang Y, Wu L, Wen X, Lv X. Identification and validation of risk score model based on gene set activity as a diagnostic biomarker for endometriosis. Heliyon 2023; 9:e18277. [PMID: 37539146 PMCID: PMC10395533 DOI: 10.1016/j.heliyon.2023.e18277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 06/28/2023] [Accepted: 07/13/2023] [Indexed: 08/05/2023] Open
Abstract
Objective The enigmatic nature of Endometriosis (EMS) pathogenesis necessitates investigating alterations in signaling pathway activity to enhance our comprehension of the disease's characteristics.
Methods Three published gene expression profiles (GSE11691, GSE25628, and GSE7305 datasets) were downloaded, and the "combat" algorithm was employed for batch correction, gene expression difference analysis, and pathway enrichment difference analysis. The protein-protein interaction (PPI) network was constructed to identify core genes, and the relative enrichment degree of gene sets was evaluated. The Lasso regression model identified candidate gene sets with diagnostic value, and a risk scoring diagnostic model was constructed for further validation on the GSE86534 and GSE5108 datasets. CIBERSORT was used to assess the composition of immune cells in EMS, and the correlation between EMS diagnostic value gene sets and immune cells was evaluated.
Results A total of 568 differentially expressed genes were identified between eutopic and ectopic endometrium, with 10 core genes in the PPI network associated with cell cycle regulation. Inflammation-related pathways, including cytokine-receptor signaling and chemokine signaling pathways, were significantly more active in ectopic endometrium compared to eutopic endometrium. Diagnostic gene sets for EMS, such as homologous recombination, base excision repair, DNA replication, P53 signaling pathway, adherens junction, and SNARE interactions in vesicular transport, were identified. The risk score's area under the curve (AUC) was 0.854, as indicated by the receiver operating characteristic (ROC) curve, and the risk score's diagnostic value was validated by the validation cohort. Immune cell infiltration analysis revealed correlations between the risk score and Macrophages M2, Plasma cells, resting NK cells, activated NK cells, and regulatory T cells.
Conclusion The risk scoring diagnostic model, based on pathway activity, demonstrates high diagnostic value and offers novel insights and strategies for the clinical diagnosis and treatment of Endometriosis.
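To illustrate the ROC evaluation this abstract reports, here is a minimal sketch that computes the area under the ROC curve from risk scores and binary labels using the rank-based (Mann-Whitney) formulation. The function name `roc_auc` and the toy scores are hypothetical and are not taken from the paper. The study's actual risk score comes from a Lasso model over gene set activity, which is not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): label 1 stands for ectopic tissue, 0 for eutopic.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of the size typical of the GEO datasets named above. A sort-based variant would be preferable for much larger inputs.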
Affiliation(s)
- Yi Zhang
  - Department of Gynecology, Second Affiliated Hospital of Hunan University of Traditional Chinese Medicine, Changsha 410005, China
- Lulu Wu
  - Department of Integrated Traditional Chinese and Western Medicine, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430014, China
- Xiang Wen
  - Department of Pathology, The First People's Hospital of Huizhou City, Huizhou 516000, China
- Xiuwei Lv
  - Department of Traditional Chinese Medicine, Rocket Force Medical Center of PLA, Beijing 100088, China
2
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:bioengineering10020173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
3
Castro-Mondragon JA, Aure M, Lingjærde O, Langerød A, Martens JWM, Børresen-Dale AL, Kristensen V, Mathelier A. Cis-regulatory mutations associate with transcriptional and post-transcriptional deregulation of gene regulatory programs in cancers. Nucleic Acids Res 2022; 50:12131-12148. [PMID: 36477895 PMCID: PMC9757053 DOI: 10.1093/nar/gkac1143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 11/03/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022] Open
Abstract
Most cancer alterations occur in the noncoding portion of the human genome, where regulatory regions control gene expression. The discovery of noncoding mutations altering the cells' regulatory programs has been limited to few examples with high recurrence or high functional impact. Here, we show that transcription factor binding sites (TFBSs) have similar mutation loads to those in protein-coding exons. By combining cancer somatic mutations in TFBSs and expression data for protein-coding and miRNA genes, we evaluate the combined effects of transcriptional and post-transcriptional alterations on the regulatory programs in cancers. The analysis of seven TCGA cohorts culminates with the identification of protein-coding and miRNA genes linked to mutations at TFBSs that are associated with a cascading trans-effect deregulation on the cells' regulatory programs. Our analyses of cis-regulatory mutations associated with miRNAs recurrently predict 12 mature miRNAs (derived from 7 precursors) associated with the deregulation of their target gene networks. The predictions are enriched for cancer-associated protein-coding and miRNA genes and highlight cis-regulatory mutations associated with the dysregulation of key pathways associated with carcinogenesis. By combining transcriptional and post-transcriptional regulation of gene expression, our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients.
Affiliation(s)
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Miriam Ragle Aure
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway; Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
- Ole Christian Lingjærde
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway; Centre for Bioinformatics, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway; KG Jebsen Centre for B-cell malignancies, Institute for Clinical Medicine, University of Oslo, Ullernchausseen 70, N-0372 Oslo, Norway
- Anita Langerød
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- John W M Martens
  - Erasmus MC Cancer Institute and Cancer Genomics Netherlands, University Medical Center Rotterdam, Department of Medical Oncology, 3015GD Rotterdam, The Netherlands
- Anne-Lise Børresen-Dale
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
- Vessela N Kristensen
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway; Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
4
Yang Q, Liu T, Wu T, Lei T, Li Y, Wang X. GGDB: A Grameneae genome alignment database of homologous genes hierarchically related to evolutionary events. PLANT PHYSIOLOGY 2022; 190:340-351. [PMID: 35789395 PMCID: PMC9434254 DOI: 10.1093/plphys/kiac297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
The genomes of Gramineae plants have been preferentially sequenced owing to their economic value. These genomes are often quite complex, for example harboring many duplicated genes, and are the main source of genetic innovation and often the result of recurrent polyploidization. Deciphering these complex genome structures and linking duplicated genes to specific polyploidization events are important for understanding the biology and evolution of plants. However, efforts have been hampered by the complexity of analyzing these genomes. Here, we analyzed 29 well-assembled and up-to-date Gramineae genome sequences by hierarchically relating duplicated genes in collinear regions to specific polyploidization or speciation events. We separated duplicated genes produced by each event, established lists of paralogous and orthologous genes, and ultimately constructed an online database, GGDB (http://www.grassgenome.com/). Homologous gene lists from each plant and between plants can be displayed, searched, and downloaded from the database. Interactive comparison tools are deployed to demonstrate homology among user-selected plants and to draw genome-scale or local alignment figures and gene-based phylogenetic trees corrected by exploiting gene collinearity. Using these tools and figures, users can easily detect structural changes in genomes and explore the effects of paleo-polyploidy on crop genome structure and function. The GGDB will provide a useful platform for improving our understanding of genome changes and functional innovation in Gramineae plants.
Affiliation(s)
- Qihang Yang
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tao Liu
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - College of Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tong Wu
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tianyu Lei
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Yuxian Li
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
5
Tan Y, Neto FBL, Neto UB. PALLAS: Penalized mAximum LikeLihood and pArticle Swarms for Inference of Gene Regulatory Networks From Time Series Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1807-1816. [PMID: 33170782 DOI: 10.1109/tcbb.2020.3037090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
We present PALLAS, a practical method for gene regulatory network (GRN) inference from time series data, which employs penalized maximum likelihood and particle swarms for optimization. PALLAS is based on the Partially-Observable Boolean Dynamical System (POBDS) model and thus does not require ad-hoc binarization of the data. The penalty in the likelihood is a LASSO regularization term, which encourages the resulting network to be sparse. PALLAS is able to scale to networks of realistic size under no prior knowledge, by virtue of a novel continuous-discrete Fish School Search particle swarm algorithm for efficient simultaneous maximization of the penalized likelihood over the discrete space of networks and the continuous space of observational parameters. The performance of PALLAS is demonstrated by a comprehensive set of experiments using synthetic data generated from real and artificial networks, as well as real time series microarray and RNA-seq data, where it is compared to several other well-known methods for gene regulatory network inference. The results show that PALLAS can infer GRNs more accurately than other methods, while being capable of working directly on gene expression data, without need of ad-hoc binarization. PALLAS is a fully-fledged program, written in python, and available on GitHub (https://github.com/yukuntan92/PALLAS).
6
Analyzing RNA-Seq Gene Expression Data Using Deep Learning Approaches for Cancer Classification. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12041850] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Ribonucleic acid Sequencing (RNA-Seq) analysis is particularly useful for obtaining insights into differentially expressed genes. However, it is challenging because of its high-dimensional data. Such analysis is a tool with which to find underlying patterns in data, e.g., for cancer specific biomarkers. In the past, analyses were performed on RNA-Seq data pertaining to the same cancer class as positive and negative samples, i.e., without samples of other cancer types. To perform multiple cancer type classification and to find differentially expressed genes, data for multiple cancer types need to be analyzed. Several repositories offer RNA-Seq data for various cancer types. In this paper, data from the Mendeley data repository for five cancer types are analyzed. As a first step, RNA-Seq values are converted to 2D images using normalization and zero padding. In the next step, relevant features are extracted and selected using Deep Learning (DL). In the last phase, classification is performed, and eight DL algorithms are used. Results and discussion are based on four different splitting strategies and k-fold cross validation for each DL classifier. Furthermore, a comparative analysis is performed with state of the art techniques discussed in literature. The results demonstrated that classifiers performed best at 70–30 split, and that Convolutional Neural Network (CNN) achieved the best overall results. Hence, CNN is the best DL model for classification among the eight studied DL models, and is easy to implement and simple to understand.
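The first step described in this abstract, converting RNA-Seq values to 2D images using normalization and zero padding, could look roughly like the sketch below. The abstract does not state the exact normalization or the image dimensions, so min-max scaling and a square grid are assumptions made here for illustration, and the function name `expression_to_image` is hypothetical.

```python
import math


def expression_to_image(values):
    """Min-max normalize a 1D expression vector and zero-pad it into a square grid."""
    if not values:
        raise ValueError("empty expression vector")
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant vector maps to all zeros, which avoids division by zero.
    normalized = [(v - lo) / span if span else 0.0 for v in values]
    # Smallest square side length that can hold every value.
    side = math.isqrt(len(normalized))
    if side * side < len(normalized):
        side += 1
    # Fill the unused trailing cells with zeros.
    padded = normalized + [0.0] * (side * side - len(normalized))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Five toy expression values (hypothetical) become a 3 x 3 image.
image = expression_to_image([2.0, 4.0, 6.0, 8.0, 10.0])
for row in image:
    print(row)
```

One consequence of this particular choice is that the smallest expression value also maps to 0.0 and so cannot be told apart from padding. A real pipeline might shift the normalized range or keep a separate mask if that distinction matters.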
7
Li H, Xiao X, Wu X, Ye L, Ji G. scLINE: A multi-network integration framework based on network embedding for representation of single-cell RNA-seq data. J Biomed Inform 2021; 122:103899. [PMID: 34481921 DOI: 10.1016/j.jbi.2021.103899] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/26/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 01/18/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is fast becoming a powerful technology that revolutionizes biomedical studies related to development, immunology and cancer by providing genome-scale transcriptional profiles at unprecedented throughput and resolution. However, due to the low capture rate and frequent drop-out events in the sequencing process, scRNA-seq data suffer from extremely high sparsity and variability, challenging the data analysis. Here we proposed a novel method called scLINE for learning low dimensional representations of scRNA-seq data. scLINE is based on the network embedding model that jointly considers multiple gene-gene interaction networks, facilitating the incorporation of prior biological knowledge for signal extraction. We comprehensively evaluated scLINE on eight single-cell datasets. Results show that scLINE achieved comparable or higher performance than competing methods, including PCA, t-SNE and Isomap, in terms of internal validation metrics and clustering accuracy. The low dimensional representations learned by scLINE are effective for downstream single-cell analysis, such as visualization, clustering and cell typing. We have implemented scLINE as an easy-to-use R package, which can be incorporated in other existing scRNA-seq analysis pipelines or tools for data preprocessing.
Affiliation(s)
- Huoyou Li
  - School of Mathematics and Information Engineering, Longyan University, China
- Xuesong Xiao
  - Department of Automation, Xiamen University, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, China
- Lishan Ye
  - Xiamen Health and Medical Big Data Center, XiaMen, Fujian, China
- Guoli Ji
  - Department of Automation, Xiamen University, China
8
Huckstep H, Fearnley LG, Davis MJ. Measuring pathway database coverage of the phosphoproteome. PeerJ 2021; 9:e11298. [PMID: 34113485 PMCID: PMC8162239 DOI: 10.7717/peerj.11298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022] Open
Abstract
Protein phosphorylation is one of the best known post-translational mechanisms playing a key role in the regulation of cellular processes. Over 100,000 distinct phosphorylation sites have been discovered through constant improvement of mass spectrometry based phosphoproteomics in the last decade. However, data saturation is occurring and the bottleneck of assigning biologically relevant functionality to phosphosites needs to be addressed. There has been finite success in using data-driven approaches to reveal phosphosite functionality due to a range of limitations. The alternate, more suitable approach is making use of prior knowledge from literature-derived databases. Here, we analysed seven widely used databases to shed light on their suitability to provide functional insights into phosphoproteomics data. We first determined the global coverage of each database at both the protein and phosphosite level. We also determined how consistent each database was in its phosphorylation annotations compared to a global standard. Finally, we looked in detail at the coverage of each database over six experimental datasets. Our analysis highlights the relative strengths and weaknesses of each database, providing a guide in how each can be best used to identify biological mechanisms in phosphoproteomic data.
Affiliation(s)
- Hannah Huckstep
  - Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Victoria, Australia
- Liam G. Fearnley
  - Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Victoria, Australia
  - Division of Population Health, Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Melissa J. Davis
  - Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Victoria, Australia
  - Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, Victoria, Australia
9
Gautam US, Mehra S, Kumari P, Alvarez X, Niu T, Tyagi JS, Kaushal D. Mycobacterium tuberculosis sensor kinase DosS modulates the autophagosome in a DosR-independent manner. Commun Biol 2019; 2:349. [PMID: 31552302 PMCID: PMC6754383 DOI: 10.1038/s42003-019-0594-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/21/2018] [Accepted: 09/03/2019] [Indexed: 01/03/2023] Open
Abstract
Dormancy is a key characteristic of the intracellular life-cycle of Mtb. The importance of the sensor kinase DosS in mycobacteria is attributed in part to our current findings that DosS is required for both persistence and full virulence of Mtb. Here we show that DosS is also required for optimal replication in macrophages and is involved in the suppression of TNF-α and autophagy pathways. Silencing of these pathways during the infection process restored full virulence in the MtbΔdosS mutant. Notably, a mutant of the response regulator DosR did not exhibit the attenuation in macrophages, suggesting that DosS can function independently of DosR. We identified four DosS targets in the Mtb genome: Rv0440, Rv2859c, Rv0994, and Rv0260c. These genes encode functions related to hypoxia adaptation, which are not directly controlled by DosR, e.g., protein recycling and chaperoning, biosynthesis of molybdenum cofactor and nitrogen metabolism. Our results strongly suggest a DosR-independent role for DosS in Mtb.
Affiliation(s)
- Uma S. Gautam
  - Tulane National Primate Research Center, Covington, LA 70433 USA
  - Present Address: Duke Human Vaccine Institute, Duke University School of Medicine, 909 S. LaSalle St., Durham, NC 27710 USA
- Smriti Mehra
  - Tulane National Primate Research Center, Covington, LA 70433 USA
  - Department of Pathobiological Sciences, Louisiana State University School of Veterinary Medicine, Baton Rouge, LA 70803 USA
  - Center for Experimental Infectious Diseases Research, Louisiana State University School of Veterinary Medicine, Baton Rouge, LA 70803 USA
- Priyanka Kumari
  - All India Institute of Medical Sciences, New Delhi, 110029 India
- Xavier Alvarez
  - Tulane National Primate Research Center, Covington, LA 70433 USA
- Tianhua Niu
  - Department of Biochemistry, Tulane University School of Medicine, New Orleans, 70112 LA USA
- Jaya S. Tyagi
  - All India Institute of Medical Sciences, New Delhi, 110029 India
  - Centre for Bio-design and Diagnostics, Translational Health Science and Technology Institute Faridabad, Haryana, 121001 India
- Deepak Kaushal
  - Tulane National Primate Research Center, Covington, LA 70433 USA
  - Department of Microbiology and Immunology, Tulane University School of Medicine, New Orleans, 70112 LA USA
10
Pasala C, Chilamakuri CSR, Katari SK, Nalamolu RM, Bitla AR, Umamaheswari A. An in silico study: Novel targets for potential drug and vaccine design against drug resistant H. pylori. Microb Pathog 2018; 122:156-161. [DOI: 10.1016/j.micpath.2018.05.037] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/21/2017] [Revised: 05/19/2018] [Accepted: 05/22/2018] [Indexed: 02/08/2023]
11
Jiang S, Zhou H, Liang J, Gerdt C, Wang C, Ke L, Schmidt SCS, Narita Y, Ma Y, Wang S, Colson T, Gewurz B, Li G, Kieff E, Zhao B. The Epstein-Barr Virus Regulome in Lymphoblastoid Cells. Cell Host Microbe 2018; 22:561-573.e4. [PMID: 29024646 DOI: 10.1016/j.chom.2017.09.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 04/28/2017] [Revised: 06/21/2017] [Accepted: 08/30/2017] [Indexed: 01/01/2023]
Abstract
Epstein-Barr virus (EBV) transforms B cells to continuously proliferating lymphoblastoid cell lines (LCLs), which represent an experimental model for EBV-associated cancers. EBV nuclear antigens (EBNAs) and LMP1 are EBV transcriptional regulators that are essential for LCL establishment, proliferation, and survival. Starting with the 3D genome organization map of LCL, we constructed a comprehensive EBV regulome encompassing 1,992 viral/cellular genes and enhancers. Approximately 30% of genes essential for LCL growth were linked to EBV enhancers. Deleting EBNA2 sites significantly reduced their target gene expression. Additional EBV super-enhancer (ESE) targets included MCL1, IRF4, and EBF. MYC ESE looping to the transcriptional start site of MYC was dependent on EBNAs. Deleting MYC ESEs greatly reduced MYC expression and LCL growth. EBNA3A/3C altered CDKN2A/B spatial organization to suppress senescence. EZH2 inhibition decreased the looping at the CDKN2A/B loci and reduced LCL growth. This study provides a comprehensive view of the spatial organization of chromatin during EBV-driven cellular transformation.
Affiliation(s)
- Sizun Jiang
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Hufeng Zhou
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Jun Liang
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Catherine Gerdt
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Chong Wang
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Liangru Ke
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Nasopharyngeal Carcinoma, Sun Yat-Sen Cancer Center, Sun Yat-Sen University, Guangzhou 510060, China
- Stefanie C S Schmidt
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Yohei Narita
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Yijie Ma
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Shuangqi Wang
  - National Key Laboratory of Crop Genetic Improvement, College of Life Sciences and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Tyler Colson
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Benjamin Gewurz
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Guoliang Li
  - National Key Laboratory of Crop Genetic Improvement, College of Life Sciences and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Elliott Kieff
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Bo Zhao
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
12
Li X, Chen W, Chen Y, Zhang X, Gu J, Zhang MQ. Network embedding-based representation learning for single cell RNA-seq data. Nucleic Acids Res 2017; 45:e166. [PMID: 28977434 PMCID: PMC5737094 DOI: 10.1093/nar/gkx750] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 05/17/2017] [Accepted: 08/17/2017] [Indexed: 11/13/2022] Open
Abstract
Single cell RNA-seq (scRNA-seq) techniques can reveal valuable insights into cell-to-cell heterogeneities. Projection of high-dimensional data into a low-dimensional subspace is a powerful strategy in general for mining such big data. However, scRNA-seq suffers from higher noise and lower coverage than traditional bulk RNA-seq, hence bringing in new computational difficulties. One major challenge is how to deal with the frequent drop-out events. These events, usually caused by the stochastic burst effect in gene transcription and the technical failure of RNA transcript capture, often cause traditional dimension reduction methods to work inefficiently. To overcome this problem, we have developed a novel Single Cell Representation Learning (SCRL) method based on network embedding. This method can efficiently implement data-driven non-linear projection and incorporate prior biological knowledge (such as pathway information) to learn more meaningful low-dimensional representations for both cells and genes. Benchmark results show that SCRL outperforms other dimensional reduction methods on several recent scRNA-seq datasets.
Affiliation(s)
- Xiangyu Li
  - MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division/Center for Synthetic & System Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Weizheng Chen
  - Institute of Network Computing and Information System, Department of Computer Science, Peking University, Beijing 100871, China
- Yang Chen
  - MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division/Center for Synthetic & System Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division/Center for Synthetic & System Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
  - MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division/Center for Synthetic & System Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Michael Q Zhang
  - MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division/Center for Synthetic & System Biology, Department of Automation, Tsinghua University, Beijing 100084, China; Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, 800 West Campbell Road, RL11 Richardson, TX 75080-3021, USA
Collapse
|
13
|
Hüls A, Ickstadt K, Schikowski T, Krämer U. Detection of gene-environment interactions in the presence of linkage disequilibrium and noise by using genetic risk scores with internal weights from elastic net regression. BMC Genet 2017; 18:55. [PMID: 28606108 PMCID: PMC5469185 DOI: 10.1186/s12863-017-0519-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/20/2017] [Accepted: 05/23/2017] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Analyses of gene-environment (GxE) interactions commonly use single nucleotide polymorphisms (SNPs) to characterize genetic susceptibility, an approach that mostly lacks power and has poor reproducibility. One promising approach to overcome this problem might be the use of weighted genetic risk scores (GRS), which are defined as weighted sums of risk alleles of gene variants. The gold standard is to use external weights from published meta-analyses. METHODS In this study, we used internal weights from the marginal genetic effects of the SNPs estimated by a multivariate elastic net regression and thereby provided a method that can be used if no external weights are available. We conducted a simulation study for the detection of GxE interactions and compared the power and type I error of single-SNP analyses with Bonferroni correction against the corresponding analyses with unweighted GRS and our weighted GRS approach, in scenarios with six risk SNPs and an increasing number of highly correlated SNPs (up to 210) and noise SNPs (up to 840). RESULTS Applying weighted GRS increased the power enormously in comparison to the common single-SNP approach (e.g. 94.2% vs. 35.4%, respectively, to detect a weak interaction with an OR ≈ 1.04 for six uncorrelated risk SNPs and n = 700 with a well-controlled type I error). Furthermore, weighted GRS outperformed the unweighted GRS, in particular in the presence of SNPs without any effect on the phenotype (e.g. 90.1% vs. 43.9%, respectively, when 20 noise SNPs were added to the six risk SNPs). The superior performance of the weighted GRS was confirmed in a real data application on lung inflammation in the SALIA cohort (n = 402). However, in scenarios with a high number of noise SNPs (>200 vs. 6 risk SNPs), larger sample sizes are needed to avoid an increased type I error, whereas a high number of correlated SNPs can be handled even in small samples (e.g. n = 400).
CONCLUSION Weighted GRS with weights from the marginal genetic effects of the SNPs estimated by a multivariate elastic net regression were shown to be a powerful tool to detect gene-environment interactions in scenarios of high linkage disequilibrium and noise.
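The definition of a weighted GRS as a weighted sum of risk alleles can be illustrated with a minimal sketch. The allele counts and weights below are hypothetical values chosen for illustration; they are not taken from the study, and the elastic net estimation of the weights is not reproduced here (the weights are simply assumed to be given). The function names are also assumptions.

```python
def weighted_grs(allele_counts, weights):
    """Weighted genetic risk score: sum over SNPs of weight * risk-allele count.

    allele_counts: number of risk alleles (0, 1 or 2) per SNP for one person.
    weights: one weight per SNP (assumed to be estimated elsewhere).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * g for w, g in zip(weights, allele_counts))


def unweighted_grs(allele_counts):
    """Unweighted score: every SNP contributes equally (plain allele count)."""
    return sum(allele_counts)


# Hypothetical example: three risk SNPs plus one noise SNP with weight 0.
example_weights = [0.10, 0.05, 0.20, 0.0]
example_person = [2, 1, 0, 2]

print(weighted_grs(example_person, example_weights))
print(unweighted_grs(example_person))
```

In this toy case the noise SNP adds two alleles to the unweighted score but contributes nothing to the weighted one, which is one intuition for why a weighted score can be more robust when many SNPs have no effect on the phenotype.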
Affiliation(s)
- Anke Hüls
  - IUF-Leibniz Research Institute for Environmental Medicine, Auf'm Hennekamp 50, 40225, Düsseldorf, Germany
  - Faculty of Statistics, TU Dortmund University, Dortmund, Germany
- Katja Ickstadt
  - Faculty of Statistics, TU Dortmund University, Dortmund, Germany
- Tamara Schikowski
  - IUF-Leibniz Research Institute for Environmental Medicine, Auf'm Hennekamp 50, 40225, Düsseldorf, Germany
- Ursula Krämer
  - IUF-Leibniz Research Institute for Environmental Medicine, Auf'm Hennekamp 50, 40225, Düsseldorf, Germany
|
14
|
Rezabakhsh A, Cheraghi O, Nourazarian A, Hassanpour M, Kazemi M, Ghaderi S, Faraji E, Rahbarghazi R, Avci ÇB, Bagca BG, Garjani A. Type 2 Diabetes Inhibited Human Mesenchymal Stem Cells Angiogenic Response by Over-Activity of the Autophagic Pathway. J Cell Biochem 2017; 118:1518-1530. [DOI: 10.1002/jcb.25814] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 11/05/2016] [Accepted: 11/28/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Aysa Rezabakhsh
  - Faculty of Pharmacy; Department of Pharmacology and Toxicology; Tabriz University of Medical Sciences; Tabriz Iran
  - Stem Cell Research Center; Tabriz University of Medical Sciences; Tabriz Iran
  - Student Research Committee of Tabriz University of Medical Sciences; Tabriz; Iran
- Omid Cheraghi
  - Faculty of Natural Sciences; Department of Biology; University of Tabriz; Tabriz Iran
- Alireza Nourazarian
  - Faculty of Medicine; Department of Biochemistry and Clinical Laboratories; Tabriz University of Medical Sciences; Tabriz Iran
- Mehdi Hassanpour
  - Faculty of Medicine; Department of Biochemistry and Clinical Laboratories; Tabriz University of Medical Sciences; Tabriz Iran
- Masoumeh Kazemi
  - Stem Cell Research Center; Tabriz University of Medical Sciences; Tabriz Iran
- Shahrooz Ghaderi
  - Faculty of Advanced Medical Sciences; Department of Molecular Medicine; Tabriz University of Medical Sciences; Tabriz Iran
- Esmaeil Faraji
  - Faculty of Medicine; Department of Internal Medicine; Tabriz University of Medical Sciences; Tabriz Iran
- Reza Rahbarghazi
  - Stem Cell Research Center; Tabriz University of Medical Sciences; Tabriz Iran
  - Faculty of Advanced Medical Sciences; Department of Applied Cell Sciences; Tabriz University of Medical Sciences; Tabriz Iran
- Çığır Biray Avci
  - Faculty of Medicine; Department of Medical Biology; Ege University; Izmir Turkey
- Bakiye Goker Bagca
  - Faculty of Medicine; Department of Medical Biology; Ege University; Izmir Turkey
- Alireza Garjani
  - Faculty of Pharmacy; Department of Pharmacology and Toxicology; Tabriz University of Medical Sciences; Tabriz Iran
  - Stem Cell Research Center; Tabriz University of Medical Sciences; Tabriz Iran
|
15
|
Goh WWB, Wong L. Integrating Networks and Proteomics: Moving Forward. Trends Biotechnol 2016; 34:951-959. [DOI: 10.1016/j.tibtech.2016.05.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/19/2016] [Revised: 05/23/2016] [Accepted: 05/24/2016] [Indexed: 11/28/2022]
|
16
|
Kruppa J, Kramer F, Beißbarth T, Jung K. A simulation framework for correlated count data of features subsets in high-throughput sequencing or proteomics experiments. Stat Appl Genet Mol Biol 2016; 15:401-414. [PMID: 27655448 DOI: 10.1515/sagmb-2015-0082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
As part of the data processing of high-throughput sequencing experiments, count data are produced that represent the number of reads mapping to specific genomic regions. Count data also arise in mass spectrometric experiments for the detection of protein-protein interactions. Evaluating new computational methods for the analysis of sequencing count data or spectral count data from proteomics experiments therefore requires artificial count data. Although some methods for the generation of artificial sequencing count data have been proposed, all of them either simulate single sequencing runs, thus omitting the correlation structure between the individual genomic features, or are limited to specific structures. We propose to draw correlated data from the multivariate normal distribution and round these continuous data in order to obtain discrete counts. In our approach, the required distribution parameters can either be constructed in different ways or estimated from real count data. Because rounding affects the correlation structure, we evaluate the use of shrinkage estimators that have already been used in the context of artificial expression data from DNA microarrays. Our approach turned out to be useful for the simulation of counts for defined subsets of features such as individual pathways or GO categories.
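The core idea of drawing correlated values from a multivariate normal distribution and rounding them to counts can be sketched as follows. This is only an illustration under stated assumptions: the means and covariance matrix are hypothetical, negative draws are clipped to zero (a choice made here, not described in the abstract), the function names are invented for this sketch, and the shrinkage estimation mentioned in the abstract is not included.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_counts(means, cov, n_samples, rng):
    """Draw multivariate normal samples and round them to non-negative counts."""
    lower = cholesky(cov)
    n = len(means)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # x = mean + L z has covariance L L^T = cov
        x = [means[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
             for i in range(n)]
        # Rounding gives discrete values; clipping at 0 is an assumption here.
        samples.append([max(0, round(v)) for v in x])
    return samples


def pearson(xs, ys):
    """Sample Pearson correlation, used to inspect the simulated counts."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical two-feature example with a target correlation of 0.6.
rng = random.Random(0)
counts = simulate_counts([50.0, 80.0], [[25.0, 18.0], [18.0, 36.0]], 2000, rng)
print(pearson([c[0] for c in counts], [c[1] for c in counts]))
```

Comparing the printed correlation with the target value gives a quick check of how much rounding distorts the correlation structure for a given parameter choice.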
|
17
|
Goh WWB, Wong L. Advancing Clinical Proteomics via Analysis Based on Biological Complexes: A Tale of Five Paradigms. J Proteome Res 2016; 15:3167-79. [DOI: 10.1021/acs.jproteome.6b00402] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/03/2023]
Affiliation(s)
- Wilson Wen Bin Goh
  - School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
  - Department of Pathology, National University of Singapore, 5 Lower Kent Ridge Road, Singapore 117417
|
18
|
Stable Gene Regulatory Network Modeling From Steady-State Data. Bioengineering (Basel) 2016; 3:bioengineering3020012. [PMID: 28952574 PMCID: PMC5597136 DOI: 10.3390/bioengineering3020012] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/19/2015] [Revised: 03/09/2016] [Accepted: 04/06/2016] [Indexed: 12/19/2022] Open
Abstract
Gene regulatory networks represent an abstract mapping of gene regulations in living cells. They aim to capture dependencies among molecular entities such as transcription factors, proteins and metabolites. In most applications, the regulatory network structure is unknown, and has to be reverse engineered from experimental data consisting of expression levels of the genes usually measured as messenger RNA concentrations in microarray experiments. Steady-state gene expression data are obtained from measurements of the variations in expression activity following the application of small perturbations to equilibrium states in genetic perturbation experiments. In this paper, the least absolute shrinkage and selection operator-vector autoregressive (LASSO-VAR) originally proposed for the analysis of economic time series data is adapted to include a stability constraint for the recovery of a sparse and stable regulatory network that describes data obtained from noisy perturbation experiments. The approach is applied to real experimental data obtained for the SOS pathway in Escherichia coli and the cell cycle pathway for yeast Saccharomyces cerevisiae. Significant features of this method are the ability to recover networks without inputting prior knowledge of the network topology, and the ability to be efficiently applied to large scale networks due to the convex nature of the method.
|
19
|
Wu X, Wu G, Yao X, Hou G, Jiang F. The clinicopathological significance and ethnic difference of FHIT hypermethylation in non-small-cell lung carcinoma: a meta-analysis and literature review. Drug Design, Development and Therapy 2016; 10:699-709. [PMID: 26929601 PMCID: PMC4760666 DOI: 10.2147/dddt.s85253] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022]
Abstract
Emerging evidence indicates that FHIT is a candidate tumor suppressor in many types of tumors including non-small-cell lung carcinoma (NSCLC). However, the prognostic value of FHIT hypermethylation and its correlation with the clinicopathological characteristics of NSCLC remain unclear. In this report, we performed a meta-analysis to evaluate the effects of FHIT hypermethylation on the incidence of NSCLC and clinicopathological characteristics of human NSCLC patients. Final analysis of 1,801 NSCLC patients from 18 eligible studies was performed. FHIT hypermethylation was found to be significantly higher in NSCLC than in normal lung tissue. The pooled odds ratio (OR) from ten studies, comprising 819 NSCLC and 792 normal lung tissues, was 7.51 (95% confidence interval [CI] = 2.98-18.91, P < 0.0001). Subgroup analysis based on ethnicity implied that FHIT hypermethylation level was higher in NSCLC tissues than in normal tissues in both Caucasians (P = 0.02) and Asians (P < 0.0001), indicating that the difference in Asians was much more significant. FHIT hypermethylation was also correlated with sex status, smoking status, as well as pathological types. In addition, patients with FHIT hypermethylation had a lower survival rate than those without (hazard ratio = 1.73, 95% CI = 1.10-2.71, P = 0.02). The results of this meta-analysis suggest that FHIT hypermethylation is associated with an increased risk and poor survival in NSCLC patients. FHIT hypermethylation, which induces the inactivation of the FHIT gene, plays an important role in carcinogenesis and clinical outcome and may serve as a potential diagnostic marker and drug target of NSCLC.
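For readers unfamiliar with how an odds ratio and its confidence interval are obtained, the sketch below computes them for a single 2x2 table using the standard log-based (Woolf) interval. It is not the pooling procedure used in the meta-analysis, and the counts are hypothetical values invented for illustration, as is the function name.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for one 2x2 table.

    a: tumour samples with hypermethylation, b: tumour samples without,
    c: normal samples with hypermethylation, d: normal samples without.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) for a single table (Woolf's method).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, not taken from any of the included studies.
print(odds_ratio_ci(30, 70, 10, 90))
```

An interval that excludes 1 corresponds to a difference between tumour and normal tissue that is significant at roughly the 5% level.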
Affiliation(s)
- Xiaoyu Wu
  - Department of Surgical Oncology, Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Nanjing, People's Republic of China
- Guannan Wu
  - Department of Surgical Oncology, Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Nanjing, People's Republic of China
- Xuequan Yao
  - Department of Surgical Oncology, Affiliated Hospital of Nanjing University of Traditional Chinese Medicine, Nanjing, People's Republic of China
- Gang Hou
  - Department of Respiratory Medicine, The First Hospital of China Medical University, Shenyang, People's Republic of China
- Feng Jiang
  - Department of Thoracic Surgery, Jiangsu Cancer Hospital, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, People's Republic of China
|
20
|
Liu Q, Song R, Li J. Inference of gene interaction networks using conserved subsequential patterns from multiple time course gene expression datasets. BMC Genomics 2015; 16 Suppl 12:S4. [PMID: 26681650 PMCID: PMC4682423 DOI: 10.1186/1471-2164-16-s12-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/02/2022] Open
Abstract
Motivation Deciphering gene interaction networks (GINs) from time-course gene expression (TCGx) data is highly valuable to understand gene behaviors (e.g., activation, inhibition, time-lagged causality) at the system level. Existing methods usually use a global or local proximity measure to infer GINs from a single dataset. As the noise contained in a single data set is hardly self-resolved, the results are sometimes not reliable. Also, these proximity measurements cannot handle the co-existence of the various in vivo positive, negative and time-lagged gene interactions. Methods and results We propose to infer reliable GINs from multiple TCGx datasets using a novel conserved subsequential pattern of gene expression. A subsequential pattern is a maximal subset of genes sharing positive, negative or time-lagged correlations of one expression template on their own subsets of time points. Based on these patterns, a GIN can be built from each of the datasets. It is assumed that reliable gene interactions would be detected repeatedly. We thus use conserved gene pairs from the individual GINs of the multiple TCGx datasets to construct a reliable GIN for a species. We apply our method on six TCGx datasets related to yeast cell cycle, and validate the reliable GINs using protein interaction networks, biopathways and transcription factor-gene regulations. We also compare the reliable GINs with those GINs reconstructed by a global proximity measure Pearson correlation coefficient method from single datasets. It has been demonstrated that our reliable GINs achieve much better prediction performance especially with much higher precision. The functional enrichment analysis also suggests that gene sets in a reliable GIN are more functionally significant. Our method is especially useful to decipher GINs from multiple TCGx datasets related to less studied organisms where little knowledge is available except gene expression data.
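The step of keeping only gene pairs that recur across the networks built from individual datasets can be sketched as below. This is a simplified illustration, not the authors' implementation: the subsequential-pattern mining that produces each network is not shown, the gene names are placeholders, and the `min_support` threshold and function name are assumptions introduced here.

```python
from collections import Counter


def conserved_edges(networks, min_support):
    """Return undirected gene pairs found in at least `min_support` networks.

    networks: one iterable of (gene, gene) pairs per dataset.
    """
    support = Counter()
    for edges in networks:
        # frozenset makes (a, b) and (b, a) the same edge; the set
        # comprehension counts each edge at most once per dataset.
        support.update({frozenset(edge) for edge in edges})
    return {edge for edge, n in support.items() if n >= min_support}


# Placeholder networks from three hypothetical datasets.
per_dataset = [
    [("geneA", "geneB"), ("geneB", "geneC")],
    [("geneB", "geneA"), ("geneC", "geneD")],
    [("geneA", "geneB"), ("geneB", "geneC")],
]
for edge in sorted(tuple(sorted(e)) for e in conserved_edges(per_dataset, 2)):
    print(edge)
```

Raising `min_support` trades coverage for reliability: fewer pairs survive, but each one has been detected independently in more datasets.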
|
21
|
Takahashi H, Kaniwa N, Saito Y, Sai K, Hamaguchi T, Shirao K, Shimada Y, Matsumura Y, Ohtsu A, Yoshino T, Doi T, Takahashi A, Odaka Y, Okuyama M, Sawada JI, Sakamoto H, Yoshida T. Construction of possible integrated predictive index based on EGFR and ANXA3 polymorphisms for chemotherapy response in fluoropyrimidine-treated Japanese gastric cancer patients using a bioinformatic method. BMC Cancer 2015; 15:718. [PMID: 26475168 PMCID: PMC4609065 DOI: 10.1186/s12885-015-1721-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/17/2015] [Accepted: 10/08/2015] [Indexed: 12/23/2022] Open
Abstract
Background Variability in drug response between individual patients is a serious concern in medicine. To identify single-nucleotide polymorphisms (SNPs) related to drug response variability, many genome-wide association studies have been conducted. Methods We previously applied a knowledge-based bioinformatic approach to a pharmacogenomics study in which 119 fluoropyrimidine-treated gastric cancer patients were genotyped at 109,365 SNPs using the Illumina Human-1 BeadChip. We identified the SNP rs2293347 in the human epidermal growth factor receptor (EGFR) gene as a novel genetic factor related to chemotherapeutic response. In the present study, we reanalyzed these hypothesis-free genomic data using extended knowledge. Results We identified rs2867461 in the annexin A3 (ANXA3) gene as another candidate. Using logistic regression, we confirmed that the performance of the rs2867461 + rs2293347 model was superior to those of the single-factor models. Furthermore, we propose a novel integrated predictive index (iEA) based on these two polymorphisms in EGFR and ANXA3. The p value for iEA was 1.47 × 10⁻⁸ by Fisher's exact test. Recent studies showed that mutations in EGFR are associated with high expression of dihydropyrimidine dehydrogenase, which is an inactivating and rate-limiting enzyme for fluoropyrimidine, and suggested that the combination of chemotherapy with fluoropyrimidine and EGFR-targeting agents is effective against EGFR-overexpressing gastric tumors, while ANXA3 overexpression confers resistance to tyrosine kinase inhibitors targeting the EGFR pathway. Conclusions These results suggest that the iEA index or a combination of polymorphisms in EGFR and ANXA3 may serve as predictive factors of drug response, and therefore could be useful for optimal selection of chemotherapy regimens.
Affiliation(s)
- Hiro Takahashi
  - Graduate School of Horticulture, Chiba University, 648 Matsudo, Matsudo, Chiba, 271-8510, Japan
  - Plant Biology Research Center, Chubu University, Matsumoto-cho 1200, Kasugai, Aichi, 487-8501, Japan
  - Division of Genetics, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Nahoko Kaniwa
  - Division of Medicinal Safety Science, National Institute of Health Sciences, 1-18-1 Kamiyoga, Setagaya-ku, Tokyo, 158-8501, Japan
- Yoshiro Saito
  - Division of Medicinal Safety Science, National Institute of Health Sciences, 1-18-1 Kamiyoga, Setagaya-ku, Tokyo, 158-8501, Japan
- Kimie Sai
  - Division of Medicinal Safety Science, National Institute of Health Sciences, 1-18-1 Kamiyoga, Setagaya-ku, Tokyo, 158-8501, Japan
- Tetsuya Hamaguchi
  - Gastrointestinal Medical Oncology Division, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Kuniaki Shirao
  - Gastrointestinal Medical Oncology Division, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Yasuhiro Shimada
  - Gastrointestinal Medical Oncology Division, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Yasuhiro Matsumura
  - Division of Developmental Therapeutics, Research Center for Innovative Oncology, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Atsushi Ohtsu
  - Department of Gastrointestinal Oncology, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Takayuki Yoshino
  - Department of Gastrointestinal Oncology, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Toshihiko Doi
  - Department of Gastrointestinal Oncology, National Cancer Center Hospital East, 6-5-1, Kashiwanoha, Kashiwa, Chiba, 277-8577, Japan
- Anna Takahashi
  - Plant Biology Research Center, Chubu University, Matsumoto-cho 1200, Kasugai, Aichi, 487-8501, Japan
- Yoko Odaka
  - Division of Genetics, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Misuzu Okuyama
  - Division of Genetics, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Jun-Ichi Sawada
  - Division of Functional Biochemistry and Genomics, National Institute of Health Sciences, 1-18-1 Kamiyoga, Setagaya-ku, Tokyo, 158-8501, Japan
  - Present address: Pharmaceutical and Medical Devices Agency, Shinkasumigaseki-building, 3-3-2 Kasumigaseki, Chiyoda-ku, Tokyo, 100-0013, Japan
- Hiromi Sakamoto
  - Division of Genetics, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
- Teruhiko Yoshida
  - Division of Genetics, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
|
22
|
Systematic analysis of somatic mutations impacting gene expression in 12 tumour types. Nat Commun 2015; 6:8554. [PMID: 26436532 PMCID: PMC4600750 DOI: 10.1038/ncomms9554] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 11/07/2014] [Accepted: 09/04/2015] [Indexed: 12/27/2022] Open
Abstract
We present a novel hierarchical Bayes statistical model, xseq, to systematically quantify the impact of somatic mutations on expression profiles. We establish the theoretical framework and robust inference characteristics of the method using computational benchmarking. We then use xseq to analyse thousands of tumour data sets available through The Cancer Genome Atlas, to systematically quantify somatic mutations impacting expression profiles. We identify 30 novel cis-effect tumour suppressor gene candidates, enriched in loss-of-function mutations and biallelic inactivation. Analysis of trans-effects of mutations and copy number alterations with xseq identifies mutations in 150 genes impacting expression networks, with 89 novel predictions. We reveal two important novel characteristics of mutation impact on expression: (1) patients harbouring known driver mutations exhibit different downstream gene expression consequences; (2) expression patterns for some mutations are stable across tumour types. These results have critical implications for identification and interpretation of mutations with consequent impact on transcription in cancer. Assessing functional impact of mutations in cancer on gene expression can improve our understanding of cancer biology and may identify potential therapeutic targets. Here, Ding et al. describe a novel statistical model named xseq for a systematic survey of how mutations impact transcriptome landscapes across 12 different tumour types.
|
23
|
Evasion of affinity-based selection in germinal centers by Epstein-Barr virus LMP2A. Proc Natl Acad Sci U S A 2015; 112:11612-7. [PMID: 26305967 DOI: 10.1073/pnas.1514484112] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 12/27/2022] Open
Abstract
Epstein-Barr virus (EBV) infects germinal center (GC) B cells and establishes persistent infection in memory B cells. EBV-infected B cells can cause B-cell malignancies in humans with T- or natural killer-cell deficiency. We now find that EBV-encoded latent membrane protein 2A (LMP2A) mimics B-cell antigen receptor (BCR) signaling in murine GC B cells, causing altered humoral immune responses and autoimmune diseases. Investigation of the impact of LMP2A on B-cell differentiation in mice that conditionally express LMP2A in GC B cells or all B-lineage cells found LMP2A expression enhanced not only BCR signals but also plasma cell differentiation in vitro and in vivo. Conditional LMP2A expression in GC B cells resulted in preferential selection of low-affinity antibody-producing B cells despite apparently normal GC formation. GC B-cell-specific LMP2A expression led to systemic lupus erythematosus-like autoimmune phenotypes in an age-dependent manner. Epigenetic profiling of LMP2A B cells found increased H3K27ac and H3K4me1 signals at the zinc finger and bric-a-brac, tramtrack domain-containing protein 20 locus. We conclude that LMP2A reduces the stringency of GC B-cell selection and may contribute to persistent EBV infection and pathogenesis by providing GC B cells with excessive prosurvival effects.
|
24
|
Gautam US, Mehra S, Kaushal D. In-Vivo Gene Signatures of Mycobacterium tuberculosis in C3HeB/FeJ Mice. PLoS One 2015; 10:e0135208. [PMID: 26270051 PMCID: PMC4535907 DOI: 10.1371/journal.pone.0135208] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 04/27/2015] [Accepted: 07/19/2015] [Indexed: 11/28/2022] Open
Abstract
Despite considerable progress in understanding the pathogenesis of Mycobacterium tuberculosis (Mtb), development of new therapeutics and vaccines against it has proven difficult. This is at least in part due to the use of less than optimal models of in-vivo Mtb infection, which has precluded a study of the physiology of the pathogen in niches where it actually persists. C3HeB/FeJ (Kramnik) mice develop human-like lesions when experimentally infected with Mtb and thus provide a faithful and highly tractable system to study the physiology of the pathogen in-vivo. We compared the transcriptomics of Mtb and various mutants in the DosR (DevR) regulon derived from Kramnik mouse granulomas to those cultured in-vitro. We recently showed that mutant ΔdosS is attenuated in C3HeB/FeJ mice. Aerosol exposure of mice to the mutant mycobacteria resulted in a substantially different and relatively weaker transcriptional response (≤ 20 genes were induced) for the functional category 'Information Pathways' in Mtb:ΔdosR; 'Lipid Metabolism' in Mtb:ΔdosT; 'Virulence, Detoxification, Adaptation' in both Mtb:ΔdosR and Mtb:ΔdosT; and the 'PE/PPE' family in all mutant strains compared to wild-type Mtb H37Rv, suggesting that the inability to induce DosR functions to different levels can modulate the interaction of the pathogen with the host. The Mtb genes expressed during growth in C3HeB/FeJ mice appear to reflect adaptation to differential nutrient utilization for survival in mouse lungs. Genes such as glnB, Rv0744c, Rv3281, sdhD/B, mce4A and dctA, which were downregulated in mutant ΔdosS, appear to be required for bacterial growth and for the flow of carbon/energy sources from host cells. We conclude that genes expressed in Mtb during the in-vivo chronic phase of infection in Kramnik mice mainly contribute to growth, cell wall processes, lipid metabolism, and virulence.
Affiliation(s)
- Uma Shankar Gautam
  - Tulane National Primate Research Center, Covington, Louisiana, United States of America
  - * E-mail: (DK); (USG)
- Smriti Mehra
  - Tulane National Primate Research Center, Covington, Louisiana, United States of America
  - Louisiana State University School of Veterinary Medicine Department of Pathobiological Sciences, Baton Rouge, Louisiana, United States of America
- Deepak Kaushal
  - Tulane National Primate Research Center, Covington, Louisiana, United States of America
  - Microbiology and Immunology, Tulane University School of Medicine, New Orleans, Louisiana, United States of America
  - * E-mail: (DK); (USG)
|
25
|
Dai H, Charnigo R. Compound hierarchical correlated beta mixture with an application to cluster mouse transcription factor DNA binding data. Biostatistics 2015; 16:641-54. [PMID: 25964663 DOI: 10.1093/biostatistics/kxv016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/28/2014] [Accepted: 04/10/2015] [Indexed: 11/12/2022] Open
Abstract
Modeling correlation structures is a challenge in bioinformatics, especially when dealing with high-throughput genomic data. A compound hierarchical correlated beta mixture (CBM) with an exchangeable correlation structure is proposed to cluster genetic vectors into mixture components. The correlation coefficient, [Formula: see text], is homogeneous within a mixture component and heterogeneous between mixture components. A random CBM with [Formula: see text] brings more flexibility in explaining correlation variations among genetic variables. The Expectation-Maximization (EM) algorithm and the Stochastic Expectation-Maximization (SEM) algorithm are used to estimate the parameters of the CBM. The number of mixture components can be determined using model selection criteria such as AIC, BIC and ICL-BIC. Extensive simulation studies were conducted to compare EM, SEM and the model selection criteria. Simulation results suggest that CBM outperforms the traditional beta mixture model with lower estimation bias and higher classification accuracy. The proposed method is applied to cluster transcription factor-DNA binding probability in mouse genome data generated by Lahdesmaki and others (2008, Probabilistic inference of transcription factor binding from multiple data sources. PLoS One, 3, e1820). The results reveal distinct clusters of transcription factors when binding to promoter regions of genes in the JAK-STAT, MAPK and two other pathways.
Affiliation(s)
- Hongying Dai
  - Research Development and Clinical Investigation, Children's Mercy Hospital, Kansas City, MO 64108, USA and Department of Biomedical & Health Informatics, University of Missouri-Kansas City, Kansas City, MO 64110, USA
- Richard Charnigo
  - Department of Statistics, University of Kentucky, Lexington, KY 40506, USA
|
26
|
Zhou H, Schmidt SCS, Jiang S, Willox B, Bernhardt K, Liang J, Johannsen EC, Kharchenko P, Gewurz BE, Kieff E, Zhao B. Epstein-Barr virus oncoprotein super-enhancers control B cell growth. Cell Host Microbe 2015; 17:205-16. [PMID: 25639793 DOI: 10.1016/j.chom.2014.12.013] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 06/11/2014] [Revised: 10/16/2014] [Accepted: 11/15/2014] [Indexed: 01/11/2023]
Abstract
Super-enhancers are clusters of gene-regulatory sites bound by multiple transcription factors that govern cell transcription, development, phenotype, and oncogenesis. By examining Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines (LCLs), we identified four EBV oncoproteins and five EBV-activated NF-κB subunits co-occupying ∼1,800 enhancer sites. Of these, 187 had markedly higher and broader histone H3K27ac signals, characteristic of super-enhancers, and were designated "EBV super-enhancers." EBV super-enhancer-associated genes included the MYC and BCL2 oncogenes, which enable LCL proliferation and survival. EBV super-enhancers were enriched for B cell transcription factor motifs and had high co-occupancy of STAT5 and NFAT transcription factors (TFs). EBV super-enhancer-associated genes were more highly expressed than other LCL genes. Disrupting EBV super-enhancers by the bromodomain inhibitor JQ1 or conditionally inactivating an EBV oncoprotein or NF-κB decreased MYC or BCL2 expression and arrested LCL growth. These findings provide insight into mechanisms of EBV-induced lymphoproliferation and identify potential therapeutic interventions.
Affiliation(s)
- Hufeng Zhou
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Stefanie C S Schmidt
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Sizun Jiang
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Bradford Willox
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Katharina Bernhardt
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Jun Liang
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Eric C Johannsen
- Department of Medicine and McArdle Laboratory for Cancer Research, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Peter Kharchenko
- Center for Biomedical Informatics, Harvard Medical School and Division of Hematology, Children's Hospital, Boston, MA 02115, USA
| | - Benjamin E Gewurz
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Elliott Kieff
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA.
| | - Bo Zhao
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA.
| |
|
27
|
Yu G, Zhu H, Domeniconi C. Predicting protein functions using incomplete hierarchical labels. BMC Bioinformatics 2015; 16:1. [PMID: 25591917 PMCID: PMC4384381 DOI: 10.1186/s12859-014-0430-y] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 07/24/2014] [Accepted: 12/11/2014] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Protein function prediction is to assign biological or biochemical functions to proteins, and it is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete. Current predictive models often assume that the labels of the labeled proteins are complete, i.e. no label is missing. But in real scenarios, we may be aware of only some hierarchical labels of a protein, and we may not know whether additional ones are actually present. The scenario of incomplete hierarchical labels, a challenging and practical problem, is seldom studied in protein function prediction. RESULTS In this paper, we propose an algorithm to Predict protein functions using Incomplete hierarchical LabeLs (PILL in short). PILL takes into account the hierarchical and the flat taxonomy similarity between function labels, and defines a Combined Similarity (ComSim) to measure the correlation between labels. PILL estimates the missing labels for a protein based on ComSim and the known labels of the protein, and uses a regularization to exploit the interactions between proteins for function prediction. PILL is shown to outperform other related techniques in replenishing the missing labels and in predicting the functions of completely unlabeled proteins on publicly available PPI datasets annotated with MIPS Functional Catalogue and Gene Ontology labels. CONCLUSION The empirical study shows that it is important to consider the incomplete annotation for protein function prediction. The proposed method (PILL) can serve as a valuable tool for protein function prediction using incomplete labels. The Matlab code of PILL is available upon request.
Affiliation(s)
- Guoxian Yu
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China.
- College of Computer and Information Sciences, Southwest University, Chongqing, China.
| | - Hailong Zhu
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China.
| | | |
|
28
|
Abelin ACT, Marinov GK, Williams BA, McCue K, Wold BJ. A ratiometric-based measure of gene co-expression. BMC Bioinformatics 2014; 15:331. [PMID: 25411051 PMCID: PMC4289233 DOI: 10.1186/1471-2105-15-331] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2014] [Accepted: 07/18/2014] [Indexed: 12/02/2022] Open
Abstract
Background Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as MIC. These measures depend primarily on the degree of association between the RNA levels of two genes and to a lesser extent on their variability. They focus on the similarity of expression value trajectories that change in like manner across samples. However, there are relationships of biological interest for which these classical measures are expected to be insensitive. These include genes whose expression levels are ratiometrically stable and genes whose variance is tightly constrained. Large-scale studies of relatively homogeneous samples, including single cell RNA-seq, are experimental settings in which such relationships might be especially pertinent. Results We develop and implement a ratiometric approach for detecting gene associations (abbreviated RA). It is based on the coefficient of variation of the measured expression ratio of each pair of genes. We apply it to a collection of lymphoblastoid RNA-seq data from the 1000 Genomes Project Consortium, a typical sample set with high overall homogeneity. RA is a selective method, reporting in this case ~1/4 of all possible gene pairs, yet these relationships include a distilled picture of biological relationships previously found by other methods. In addition, RA reveals expression relationships that are not detected by traditional correlation and mutual information methods. We also analyze data from individual lymphoblastoid cells and show that desirable properties of the RA method extend to single-cell RNA-seq. Conclusion We show that our ratiometric method identifies biologically significant relationships that are often missed or low-ranked by conventional association-based methods when applied to a relatively homogeneous dataset. The results open new questions about the regulatory mechanisms that produce strong RA relationships.
RA is scalable and potentially well suited for the analysis of thousands of bulk-RNA or single-cell transcriptomes.
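The ratiometric idea in this abstract can be illustrated with a minimal sketch. It assumes the score is simply the coefficient of variation (standard deviation divided by mean) of the per-sample ratio of two genes' expression values; the function name `ratio_cv` and the toy expression values are hypothetical, and the published method may normalize or threshold differently.

```python
import numpy as np

def ratio_cv(expr_a, expr_b):
    """Coefficient of variation of the per-sample expression ratio a/b.

    Illustrative only: assumes strictly positive expression values and
    no further normalization, which may differ from the published RA method.
    """
    a = np.asarray(expr_a, dtype=float)
    b = np.asarray(expr_b, dtype=float)
    ratio = a / b
    return ratio.std(ddof=1) / ratio.mean()

# Toy data (hypothetical): gene_b is always twice gene_a, gene_c is unrelated.
gene_a = [10.0, 20.0, 40.0, 80.0]
gene_b = [20.0, 40.0, 80.0, 160.0]
gene_c = [80.0, 10.0, 40.0, 20.0]

stable_pair = ratio_cv(gene_a, gene_b)    # ratio is constant, so the CV is zero
unstable_pair = ratio_cv(gene_a, gene_c)  # ratio varies widely across samples
```

A low value flags a ratiometrically stable pair even when both genes vary a great deal across samples, which is the kind of relationship the abstract says is easy to miss when only correlation or mutual information is ranked.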
Affiliation(s)
| | | | | | | | - Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Blvd, Pasadena, CA 91125, USA.
| |
|
29
|
Design pattern mining using distributed learning automata and DNA sequence alignment. PLoS One 2014; 9:e106313. [PMID: 25243670 PMCID: PMC4171372 DOI: 10.1371/journal.pone.0106313] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/11/2014] [Accepted: 07/30/2014] [Indexed: 11/19/2022] Open
Abstract
Context Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, and a related tool, DLA-DNA, covering all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools. Method The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. Results The results demonstrate that, whether the source code is built in a standard or a non-standard way with respect to design patterns, the proposed method performs close to DPJF and better than Pinot and PTIDEJ. The proposed model was tested on several source code bases and compared with related models and available tools; on average, its precision and recall are 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. Conclusion The proposed method is organized in two steps: in the first, elemental design patterns are identified; in the second, they are composed to recognize actual design patterns.
|
30
|
Analysis of gene expression profiles of soft tissue sarcoma using a combination of knowledge-based filtering with integration of multiple statistics. PLoS One 2014; 9:e106801. [PMID: 25188299 PMCID: PMC4154757 DOI: 10.1371/journal.pone.0106801] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 03/29/2014] [Accepted: 08/01/2014] [Indexed: 12/21/2022] Open
Abstract
The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult. Of the diverse histological subtypes, undifferentiated pleomorphic sarcoma (UPS) is particularly difficult to diagnose accurately, and its classification per se is still controversial. Recent advances in genomic technologies provide an excellent way to address such problems. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In the present study, we analyzed microarray data from 88 STS patients using a combination method that used knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. We identified 25 genes, including hypoxia-related genes (e.g., MIF, SCD1, P4HA1, ENO1, and STAT1) and cell cycle- and DNA repair-related genes (e.g., TACC3, PRDX1, PRKDC, and H2AFY). These genes showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. STAT1 showed a strong association with overall survival in UPS patients (logrank p = 1.84 × 10^-6 and adjusted p value 2.99 × 10^-3 after the permutation test). According to the literature, the 25 genes selected are useful not only as markers of differential diagnosis but also as prognostic/predictive markers and/or therapeutic targets for STS. Our combination method can identify genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation.
|
31
|
Jadhav A, Shanmugham B, Rajendiran A, Pan A. Unraveling novel broad-spectrum antibacterial targets in food and waterborne pathogens using comparative genomics and protein interaction network analysis. Infection, Genetics and Evolution 2014; 27:300-8. [PMID: 25128740 DOI: 10.1016/j.meegid.2014.08.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/20/2014] [Revised: 07/31/2014] [Accepted: 08/07/2014] [Indexed: 02/04/2023]
Abstract
Food and waterborne diseases are a growing concern in terms of human morbidity and mortality worldwide, even in the 21st century, emphasizing the need for new therapeutic interventions for these diseases. The current study aims at prioritizing broad-spectrum antibacterial targets, present in multiple food and waterborne bacterial pathogens, through a comparative genomics strategy coupled with a protein interaction network analysis. The pathways unique and common to all the pathogens under study (viz., methane metabolism, d-alanine metabolism, peptidoglycan biosynthesis, bacterial secretion system, two-component system, C5-branched dibasic acid metabolism), identified by comparative metabolic pathway analysis, were considered for the analysis. The proteins/enzymes involved in these pathways were prioritized following host non-homology analysis, essentiality analysis, gut flora non-homology analysis and protein interaction network analysis. The analyses revealed a set of promising broad-spectrum antibacterial targets, present in multiple food and waterborne pathogens, which are essential for bacterial survival, non-homologous to host and gut flora, and functionally important in the metabolic network. The identified broad-spectrum candidates, namely, integral membrane protein/virulence factor (MviN), preprotein translocase subunits SecB and SecG, carbon storage regulator (CsrA), and nitrogen regulatory protein P-II 1 (GlnB), contributed by the peptidoglycan pathway, bacterial secretion systems and two-component systems, were also found to be present in a wide range of other disease-causing bacteria. Cytoplasmic proteins SecG, CsrA and GlnB were considered as drug targets, while membrane proteins MviN and SecB were classified as vaccine targets. The identified broad-spectrum targets can aid in the design and development of antibacterial agents not only against food and waterborne pathogens but also against other pathogens.
Affiliation(s)
- Ankush Jadhav
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry 605014, India
| | - Buvaneswari Shanmugham
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry 605014, India
| | - Anjana Rajendiran
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry 605014, India
| | - Archana Pan
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry 605014, India.
| |
|
32
|
Koo I, Yao S, Zhang X, Kim S. Comparative analysis of false discovery rate methods in constructing metabolic association networks. J Bioinform Comput Biol 2014; 12:1450018. [PMID: 25152043 DOI: 10.1142/s0219720014500188] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/03/2023]
Abstract
The Gaussian graphical model (GGM)-based method, a key approach to reverse engineering biological networks, uses partial correlation to measure conditional dependence between two variables by controlling the contribution from other variables. After estimating partial correlation coefficients, one of the most critical processes in network construction is to control the false discovery rate (FDR) to assess the significant associations among variables. Various FDR methods have been proposed mainly for biomarker discovery, but it still remains unclear which FDR method performs better for network construction. Furthermore, no study has examined the effect of the network structure on network construction. We selected six FDR methods, the linear step-up procedure (BH95), the adaptive linear step-up procedure (BH00), Efron's local FDR (LFDR), Benjamini-Yekutieli's step-up procedure (BY01), Storey's q-value procedure (Storey01), and Storey-Taylor-Siegmund's adaptive step-up procedure (STS04), to evaluate their performances on network construction. We further considered two network structures, random and scale-free networks, to investigate their influence on network construction. Both simulated data and real experimental data suggest that STS04 provides the highest true positive rate (TPR) or F1 score, while BY01 has the highest positive predictive value (PPV) in network construction. In addition, no significant effect of the network structure is found on FDR methods.
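Of the six procedures named in this abstract, the linear step-up procedure (BH95, Benjamini-Hochberg) is the simplest to write down, and a minimal sketch follows. The function name `benjamini_hochberg`, the `alpha` level, and the edge p-values are hypothetical illustrations; the p-values stand in for tests on partial correlation coefficients and are not taken from the study.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Linear step-up procedure: return a boolean mask of rejected hypotheses."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # i/m * alpha for rank i
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        rejected[order[:k + 1]] = True             # reject everything up to that rank
    return rejected

# Hypothetical p-values for six candidate network edges.
edge_p_values = [0.001, 0.008, 0.039, 0.041, 0.0415, 0.60]
significant_edges = benjamini_hochberg(edge_p_values, alpha=0.05)
```

In this toy input the third and fourth p-values exceed their own rank thresholds, yet they are still rejected because the fifth one passes; that "step-up" behaviour is what distinguishes the procedure from a per-test cutoff.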
Affiliation(s)
- Imhoi Koo
- Department of Chemistry, University of Louisville, Louisville, Kentucky 40292, USA
| | | | | | | |
|
33
|
Lee S, Kim JY, Hwang J, Kim S, Lee JH, Han DH. Investigation of pathogenic genes in peri-implantitis from implant clustering failure patients: a whole-exome sequencing pilot study. PLoS One 2014; 9:e99360. [PMID: 24921256 PMCID: PMC4055653 DOI: 10.1371/journal.pone.0099360] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 02/25/2014] [Accepted: 05/13/2014] [Indexed: 01/21/2023] Open
Abstract
Peri-implantitis is a frequently occurring gum disease linked to multi-factorial traits with various environmental and genetic causalities and no known concrete pathogenesis. The varying severity of peri-implantitis among patients with relatively similar environments suggests a genetic aspect which needs to be investigated to understand and regulate the pathogenesis of the disease. Six unrelated individuals with clustered multiple implant failures due to severe peri-implantitis were chosen for this study. These six individuals had relatively healthy lifestyles, with minimal environmental causalities affecting peri-implantitis. Research was undertaken to investigate pathogenic genes in peri-implantitis, albeit with a small number of subjects and incomplete elimination of environmental causalities. Whole-exome sequencing was performed on saliva samples collected via a self-collection DNA kit. Common variants with minor allele frequencies (MAF) ≥ 0.05 from all control datasets were eliminated, and variants having high and moderate impact and loss of function were used for comparison. Gene set enrichment analysis was performed to reveal functional groups associated with the genetic variants. 2,022 genes were left after filtering against dbSNP, the 1000 Genomes East Asian population, and healthy Korean randomized subsample data (GSK project). 175 (p-value <0.05) out of 927 gene sets were obtained via GSEA (DAVID). The top 10 were chosen (p-value <0.05) from cluster enrichment, showing significance of cytoskeleton, cell adhesion, and metal ion binding. Network analysis was applied to find relationships between functional clusters. Among the functional groups, metal ion binding was located in the center of all clusters, indicating that dysregulation of metal ion concentration might affect cell morphology or cell adhesion, resulting in implant failure.
This result may demonstrate the feasibility of and provide pilot data for a larger research project aimed at discovering biomarkers for early diagnosis of peri-implantitis.
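The variant-filtering step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: a variant is treated as common if its minor allele frequency reaches the cutoff in any control dataset (the abstract's wording leaves open whether "any" or "all" is meant), and the `Variant` class, gene names, impact labels, and dataset keys are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    gene: str                                         # hypothetical gene label
    impact: str                                       # e.g. "HIGH", "MODERATE", "LOW"
    control_mafs: dict = field(default_factory=dict)  # control dataset -> MAF

def keep_variant(variant, maf_cutoff=0.05, impacts=("HIGH", "MODERATE")):
    """Keep rare variants (MAF below the cutoff in every control set) of sufficient impact."""
    is_common = any(maf >= maf_cutoff for maf in variant.control_mafs.values())
    return (not is_common) and variant.impact in impacts

# Hypothetical variants; control dataset names are placeholders.
variants = [
    Variant("GENE_A", "HIGH", {"controls_1": 0.01, "controls_2": 0.02}),
    Variant("GENE_B", "HIGH", {"controls_1": 0.20, "controls_2": 0.01}),  # common in one set
    Variant("GENE_C", "LOW", {"controls_1": 0.001}),                      # impact too low
    Variant("GENE_D", "MODERATE", {}),                                    # absent from controls
]

candidate_genes = sorted({v.gene for v in variants if keep_variant(v)})
```

Variants absent from every control dataset are kept here, on the assumption that missing frequency data means the variant was not observed in controls; a real pipeline would need to make that choice explicitly.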
Affiliation(s)
- Soohyung Lee
- Department of Prosthodontics, Oral Science Research Center, College of Dentistry, Yonsei University, Seoul, Korea
| | - Ji-Young Kim
- Department of Prosthodontics, Oral Science Research Center, College of Dentistry, Yonsei University, Seoul, Korea
| | - Jihye Hwang
- Department of IT Convergence and Engineering, Pohang University of Science and Technology, Pohang, Korea
| | - Sanguk Kim
- Department of IT Convergence and Engineering, Pohang University of Science and Technology, Pohang, Korea
| | - Jae-Hoon Lee
- Department of Prosthodontics, Oral Science Research Center, College of Dentistry, Yonsei University, Seoul, Korea
- * E-mail: (JHL); (DHH)
| | - Dong-Hoo Han
- Department of Prosthodontics, Oral Science Research Center, College of Dentistry, Yonsei University, Seoul, Korea
- * E-mail: (JHL); (DHH)
| |
|
34
|
Chen YA, Tripathi LP, Dessailly BH, Nyström-Persson J, Ahmad S, Mizuguchi K. Integrated pathway clusters with coherent biological themes for target prioritisation. PLoS One 2014; 9:e99030. [PMID: 24918583 PMCID: PMC4053319 DOI: 10.1371/journal.pone.0099030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 02/20/2014] [Accepted: 05/07/2014] [Indexed: 12/15/2022] Open
Abstract
Prioritising candidate genes for further experimental characterisation is an essential, yet challenging task in biomedical research. One way of achieving this goal is to identify specific biological themes that are enriched within the gene set of interest to obtain insights into the biological phenomena under study. Biological pathway data have been particularly useful in identifying functional associations of genes and/or gene sets. However, biological pathway information as compiled in varied repositories often differs in scope and content, preventing a more effective and comprehensive characterisation of gene sets. Here we describe a new approach to constructing biologically coherent gene sets from pathway data in major public repositories and employing them for functional analysis of large gene sets. We first revealed significant overlaps in gene content between different pathways and then defined a clustering method based on the shared gene content and the similarity of gene overlap patterns. We established the biological relevance of the constructed pathway clusters using independent quantitative measures and we finally demonstrated the effectiveness of the constructed pathway clusters in comparative functional enrichment analysis of gene sets associated with diverse human diseases gathered from the literature. The pathway clusters and gene mappings have been integrated into the TargetMine data warehouse and are likely to provide a concise, manageable and biologically relevant means of functional analysis of gene sets and to facilitate candidate gene prioritisation.
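Grouping pathways by shared gene content, as this abstract describes, can be illustrated with a small sketch. It assumes the Jaccard index as the overlap measure and single-linkage grouping above a fixed threshold; both are stand-ins chosen for brevity, not the similarity measure or clustering method used by the authors. The pathway names, gene labels, and `threshold` value are hypothetical.

```python
from itertools import combinations

def jaccard(genes_a, genes_b):
    """Shared gene content of two pathways as |A ∩ B| / |A ∪ B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_pathways(pathways, threshold=0.5):
    """Single-linkage grouping: pathways are linked if their Jaccard index >= threshold."""
    parent = {name: name for name in pathways}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(pathways, 2):
        if jaccard(pathways[a], pathways[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in pathways:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical pathways from two repositories with overlapping gene content.
pathways = {
    "db1:cell_cycle": {"G1", "G2", "G3", "G4"},
    "db2:cell_cycle": {"G2", "G3", "G4", "G5"},
    "db1:apoptosis": {"G6", "G7"},
}
pathway_clusters = cluster_pathways(pathways, threshold=0.5)
```

The two cell-cycle entries share three of five genes (Jaccard 0.6) and fall into one cluster, which mirrors the abstract's point that similar pathways from different repositories can be merged into a single coherent gene set before enrichment analysis.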
Affiliation(s)
- Yi-An Chen
- National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
| | | | | | | | - Shandar Ahmad
- National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
| | - Kenji Mizuguchi
- National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
| |
|
35
|
Zhou H, Gao S, Nguyen NN, Fan M, Jin J, Liu B, Zhao L, Xiong G, Tan M, Li S, Wong L. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. Biol Direct 2014; 9:5. [PMID: 24708540 PMCID: PMC4022245 DOI: 10.1186/1745-6150-9-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 11/01/2013] [Accepted: 03/26/2014] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. RESULTS We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs which might be useful for many related studies. Based on our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent homology-based approach, we have discovered several interesting properties which are reported here for the first time. We find that both host proteins and pathogen proteins involved in the host-pathogen PPIs tend to be hubs in their own intra-species PPI network. Also, both host and pathogen proteins involved in host-pathogen PPIs tend to have longer primary sequences, tend to have more domains, tend to be more hydrophilic, etc.
And the protein domains from both host and pathogen proteins involved in host-pathogen PPIs tend to have lower charge, and tend to be more hydrophilic. CONCLUSIONS Our stringent homology-based prediction approach provides a better strategy in predicting PPIs between eukaryotic hosts and prokaryotic pathogens than a conventional homology-based approach. The properties we have observed from the predicted H. sapiens-M. tuberculosis H37Rv PPI network are useful for understanding inter-species host-pathogen PPI networks and provide novel insights for host-pathogen interaction studies.
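The conventional homology-based transfer that this abstract uses as its baseline can be sketched in a few lines. The sketch only shows the generic idea of mapping a known template interaction onto a host-pathogen pair through homologs; it does not implement the stringent filters the authors add, and the function name, protein identifiers, and homolog tables are hypothetical.

```python
def transfer_interactions(template_ppis, host_homologs, pathogen_homologs):
    """Predict host-pathogen pairs from template PPIs via homolog lookup.

    template_ppis: iterable of (protein_x, protein_y) known interactions.
    host_homologs / pathogen_homologs: template protein -> homologs in that species.
    """
    predicted = set()
    for first, second in template_ppis:
        # A template interaction is undirected, so try both orientations.
        for x, y in ((first, second), (second, first)):
            for host_protein in host_homologs.get(x, ()):
                for pathogen_protein in pathogen_homologs.get(y, ()):
                    predicted.add((host_protein, pathogen_protein))
    return predicted

# Hypothetical template interactions and homolog tables.
template_ppis = [("T1", "T2"), ("T3", "T4")]
host_homologs = {"T1": ["HUMAN_P1"], "T4": ["HUMAN_P2", "HUMAN_P3"]}
pathogen_homologs = {"T2": ["MTB_Q1"], "T3": ["MTB_Q2"]}

predicted_pairs = transfer_interactions(template_ppis, host_homologs, pathogen_homologs)
```

Every template pair with a homolog on each side yields a prediction here, which is why an unfiltered transfer tends to over-predict and why the abstract argues for additional checks on protein type and interaction interface.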
Affiliation(s)
- Hufeng Zhou
- NUS Graduate School for Integrative Sciences & Engineering, National University of Singapore, Singapore, Singapore
- School of Computing, National University of Singapore, Singapore, Singapore
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Department of Microbiology and Immunobiology, Harvard University, Cambridge, USA
| | - Shangzhi Gao
- Department of Environmental Health, Harvard School of Public Health, Harvard University, Cambridge, USA
| | - Nam Ninh Nguyen
- School of Computing, National University of Singapore, Singapore, Singapore
| | - Mengyuan Fan
- NUS Graduate School for Integrative Sciences & Engineering, National University of Singapore, Singapore, Singapore
- School of Computing, National University of Singapore, Singapore, Singapore
| | - Jingjing Jin
- School of Computing, National University of Singapore, Singapore, Singapore
| | - Bing Liu
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Liang Zhao
- Bioinformatics Research Center, & School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
| | - Geng Xiong
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
| | - Min Tan
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Department of Microbiology and Immunobiology, Harvard University, Cambridge, USA
| | - Shijun Li
- Department of Medicine, Brigham and Women’s Hospital, Boston, USA
- Department of Microbiology and Immunobiology, Harvard University, Cambridge, USA
| | - Limsoon Wong
- School of Computing, National University of Singapore, Singapore, Singapore
| |
|
36
|
Zhou H, Rezaei J, Hugo W, Gao S, Jin J, Fan M, Yong CH, Wozniak M, Wong L. Stringent DDI-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. BMC Systems Biology 2013; 7 Suppl 6:S6. [PMID: 24564941 PMCID: PMC4029759 DOI: 10.1186/1752-0509-7-s6-s6] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are very important information to illuminate the infection mechanism of M. tuberculosis H37Rv. But current H. sapiens-M. tuberculosis H37Rv PPI data are very scarce. This seriously limits the study of the interaction between this important pathogen and its host H. sapiens. Computational prediction of H. sapiens-M. tuberculosis H37Rv PPIs is an important strategy to fill in the gap. Domain-domain interaction (DDI) based prediction is one of the frequently used computational approaches in predicting both intra-species and inter-species PPIs. However, the performance of DDI-based host-pathogen PPI prediction has been rather limited. RESULTS We develop a stringent DDI-based prediction approach with emphasis on (i) differences between the specific domain sequences on annotated regions of proteins under the same domain ID and (ii) calculation of the interaction strength of predicted PPIs based on the interacting residues in their interaction interfaces. We compare our stringent DDI-based approach to a conventional DDI-based approach for predicting PPIs based on gold standard intra-species PPIs and coherent informative Gene Ontology terms assessment. The assessment results show that our stringent DDI-based approach achieves much better performance in predicting PPIs than the conventional approach. Using our stringent DDI-based approach, we have predicted a small set of reliable H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies. We also analyze the H. sapiens-M. tuberculosis H37Rv PPIs predicted by our stringent DDI-based approach using cellular compartment distribution analysis, functional category enrichment analysis and pathway enrichment analysis. The analyses support the validity of our prediction result. Also, based on an analysis of the H. sapiens-M. 
tuberculosis H37Rv PPI network predicted by our stringent DDI-based approach, we have discovered some important properties of domains involved in host-pathogen PPIs. We find that both host and pathogen proteins involved in host-pathogen PPIs tend to have more domains than proteins involved in intra-species PPIs, and these domains have more interaction partners than domains on proteins involved in intra-species PPI. CONCLUSIONS The stringent DDI-based prediction approach reported in this work provides a stringent strategy for predicting host-pathogen PPIs. It also performs better than a conventional DDI-based approach in predicting PPIs. We have predicted a small set of accurate H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies.
|