1
Najafzadeh L, Mahmoudi M, Ebadi M, Dehghan Shasaltaneh M. Co-expression Network Analysis Reveals Key Genes Related to Ankylosing spondylitis Arthritis Disease: Computational and Experimental Validation. IRANIAN JOURNAL OF BIOTECHNOLOGY 2021; 19:e2630. [PMID: 34179194 PMCID: PMC8217537 DOI: 10.30498/ijb.2021.2630] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Abstract
BACKGROUND Ankylosing spondylitis (AS) is a type of arthritis which can cause inflammation in the vertebrae and joints between the spine and pelvis. However, our understanding of the exact genetic mechanisms of AS is still far from being clear. OBJECTIVE To study and find the mechanisms and possible biomarkers related to AS by surveying inter-gene correlations of networks. MATERIALS AND METHODS A weighted gene co-expression network was constructed among genes identified by microarray analysis, gene co-expression network analysis, and network clustering. Then receiver operating characteristic (ROC) curves were conducted to identify a significant module with the genes implicated in the AS pathogenesis. Real-time PCR was performed to validate the results of microarray analysis. RESULTS In the significant module obtained from the network analysis there were eight AS related genes (LSM3, MRPS11, NSMCE2, PSMA4, UBL5, RPL17, MRPL22 and RPS17) which have been reported in previous studies as hub genes. Further, in this module, eight significant enriched pathways were found with adjusted p-values < 0.001 consisting of oxidative phosphorylation, ribosome, nonalcoholic fatty liver disease, Alzheimer's, Huntington's, and Parkinson's diseases, spliceosome, and cardiac muscle contraction pathways which have been linked to AS. Furthermore, we identified nine AS related genes (UQCRB, UQCRH, UQCRHL, UQCRQ, COX7B, COX5B, COX6C, COX6A1 and COX7C) in these pathways which can play essential roles in controlling mitochondrial activity and pathogenesis of autoimmune diseases. Real-time PCR results showed that three genes including UQCRH, MRPS11, and NSMCE2 in AS patients were significantly differentially expressed compared with normal controls. CONCLUSIONS The results of the present study may contribute to understanding of AS molecular pathogenesis, thereby aiding the early prognosis, diagnosis, and effective therapies of the disease.
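The weighted co-expression network and hub genes mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common soft-thresholding form a_ij = |cor(i, j)|^beta with beta = 6; the abstract does not state the adjacency function or the power used, and the gene names and expression values below are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def weighted_adjacency(expr, beta=6):
    """Soft-thresholded adjacency a_ij = |cor(i, j)|**beta (assumed form)."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for g in genes:
        for h in genes:
            if g != h:
                adj[g][h] = abs(pearson(expr[g], expr[h])) ** beta
    return adj

def connectivity(adj):
    """Sum of edge weights per gene; high values mark hub candidates."""
    return {g: sum(nbrs.values()) for g, nbrs in adj.items()}

# Toy data (hypothetical): keys are genes, list entries are samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [2.0, 4.1, 6.0, 8.2, 9.9],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}
adj = weighted_adjacency(expr)
k = connectivity(adj)
hub = max(k, key=k.get)  # gene with the largest connectivity
```

Raising the correlation to a power suppresses weak correlations without a hard cutoff, so the weakly correlated geneD ends up with far lower connectivity than the three tightly correlated genes.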
Affiliation(s)
- Leila Najafzadeh
- Department of Biology, College of Science, Damghan Branch, Islamic Azad University, Damghan, Iran
- Mahdi Mahmoudi
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Mostafa Ebadi
- Department of Biology, College of Science, Damghan Branch, Islamic Azad University, Damghan, Iran
2
DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier. PLoS Comput Biol 2020; 16:e1008453. [PMID: 33206638 PMCID: PMC7710064 DOI: 10.1371/journal.pcbi.1008453] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/16/2020] [Revised: 12/02/2020] [Accepted: 10/20/2020] [Indexed: 12/21/2022]
Abstract
Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top performing CAFA2 methods as well as several state of the art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.
Gene–phenotype associations can help to understand the underlying mechanisms of many genetic diseases. However, experimental identification, often involving animal models, is time consuming and expensive. Computational methods that predict gene–phenotype associations can be used instead. We developed DeepPheno, a novel approach for predicting the phenotypes resulting from a loss of function of a single gene. We use gene functions and gene expression as information to predict phenotypes. Our method uses a neural network classifier that is able to account for hierarchical dependencies between phenotypes. We extensively evaluate our method and compare it with related approaches, and we show that DeepPheno results in better performance in several evaluations. Furthermore, we found that many of the new predictions made by our method have been added to phenotype association databases released one year later. Overall, DeepPheno simulates some aspects of human physiology and how molecular and physiological alterations lead to abnormal phenotypes.
3
Hu B, Ruan Y, Wei F, Qin G, Mo X, Wang X, Zou D. Identification of three glioblastoma subtypes and a six-gene prognostic risk index based on the expression of growth factors and cytokines. Am J Transl Res 2020; 12:4669-4682. [PMID: 32913540 PMCID: PMC7476164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2020] [Accepted: 07/22/2020] [Indexed: 06/11/2023]
Abstract
Glioblastoma multiforme (GBM) is the most common and invasive tumor of the central nervous system. Growth factors and cytokines (GFCKs) play a crucial role in tumor invasion. In the present study, GFCK expression profiles from GBM patients in the Chinese Glioma Genome Atlas were used to perform sample clustering with nonnegative matrix factorization. Three GBM subtypes were identified based on differences in GFCK expression, and the subtypes differed in characteristics and prognosis. A prognostic risk index (RI) comprising six GFCKs (BMP2, CCN3, GKN1, LIF, MDK, and SEMA3G) was defined using univariate Cox hazard analysis and multivariate stepwise Cox regression. The RI was validated in two independent data sets and may be independent of some known prognostic factors. Our results suggest that GBM occurs as different subtypes expressing different patterns of GFCKs and that these expression patterns can be captured in an RI that can predict prognosis.
Affiliation(s)
- Beiquan Hu
- Department of Neurosurgery, The First Affiliated Hospital, Jinan University, Guangzhou 510630, Guangdong, People’s Republic of China
- Department of Neurosurgery, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning 530022, Guangxi, People’s Republic of China
- Yushan Ruan
- Department of Neurosurgery, The Second Affiliated Hospital of Guangxi Medical University, Nanning 530000, Guangxi, People’s Republic of China
- Feng Wei
- Department of Neurosurgery, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning 530022, Guangxi, People’s Republic of China
- Gang Qin
- Department of Neurosurgery, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning 530022, Guangxi, People’s Republic of China
- Xianlun Mo
- Department of Neurosurgery, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning 530022, Guangxi, People’s Republic of China
- Xiangyu Wang
- Department of Neurosurgery, The First Affiliated Hospital, Jinan University, Guangzhou 510630, Guangdong, People’s Republic of China
- Donghua Zou
- Department of Neurology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning 530022, Guangxi, People’s Republic of China
4
Baez-Ortega A, Gori K. Computational approaches for discovery of mutational signatures in cancer. Brief Bioinform 2019; 20:77-88. [PMID: 28968631 DOI: 10.1093/bib/bbx082] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/27/2017] [Indexed: 01/07/2023]
Abstract
The accumulation of somatic mutations in a genome is the result of the activity of one or more mutagenic processes, each of which leaves its own imprint. The study of these DNA fingerprints, termed mutational signatures, holds important potential for furthering our understanding of the causes and evolution of cancer, and can provide insights of relevance for cancer prevention and treatment. In this review, we focus our attention on the mathematical models and computational techniques that have driven recent advances in the field.
Affiliation(s)
- Kevin Gori
- Transmissible Cancer Group, University of Cambridge
5
Che C, Lin R, Zeng X, Elmaaroufi K, Galeotti J, Xu M. Improved deep learning-based macromolecules structure classification from electron cryo-tomograms. MACHINE VISION AND APPLICATIONS 2018; 29:1227-1236. [PMID: 31511756 PMCID: PMC6738941 DOI: 10.1007/s00138-018-0949-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/14/2017] [Revised: 01/16/2018] [Accepted: 05/18/2018] [Indexed: 05/30/2023]
Abstract
Cellular processes are governed by macromolecular complexes inside the cell. Study of the native structures of macromolecular complexes has been extremely difficult due to lack of data. With recent breakthroughs in Cellular Electron Cryo-Tomography (CECT) 3D imaging technology, it is now possible for researchers to gain access to, and fully study, the macromolecular structures within single cells. However, systematic recovery of macromolecular structures from CECT is very difficult due to the high degree of structural complexity and practical imaging limitations. Previously, we proposed a deep learning-based image classification approach for large-scale systematic macromolecular structure separation from CECT data. However, our previous work was only a very initial step toward exploration of the full potential of deep learning-based macromolecule separation. In this paper, we focus on improving classification performance by proposing three newly designed individual CNN models: an extended version of Deep Small Receptive Field (DSRF3D), denoted DSRF3D-v2; a 3D residual block-based neural network, named RB3D; and a convolutional 3D (C3D)-based model, CB3D. We compare them with our previously developed model (DSRF3D) on 12 datasets with different SNRs and tilt angle ranges. The experiments show that our new models achieved significantly higher classification accuracies. The accuracies are not only higher than 0.9 on normal datasets, but the models also show potential to operate on datasets with high levels of noise and pronounced missing wedge effects.
Affiliation(s)
- Chengqian Che
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Ruogu Lin
- Department of Automation, Tsinghua University, Beijing, China
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
- Karim Elmaaroufi
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
- John Galeotti
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
6
Kim M, Tagkopoulos I. Data integration and predictive modeling methods for multi-omics datasets. Mol Omics 2018; 14:8-25. [DOI: 10.1039/c7mo00051k] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 12/24/2022]
Abstract
We provide an overview of opportunities and challenges in multi-omics predictive analytics with particular emphasis on data integration and machine learning methods.
Affiliation(s)
- Minseung Kim
- Department of Computer Science, University of California, Davis, USA
- Genome Center
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, USA
- Genome Center
7
Li YE, Xiao M, Shi B, Yang YCT, Wang D, Wang F, Marcia M, Lu ZJ. Identification of high-confidence RNA regulatory elements by combinatorial classification of RNA-protein binding sites. Genome Biol 2017; 18:169. [PMID: 28886744 PMCID: PMC5591525 DOI: 10.1186/s13059-017-1298-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 06/16/2017] [Accepted: 08/14/2017] [Indexed: 12/20/2022]
Abstract
Crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled researchers to characterize transcriptome-wide binding sites of RNA-binding protein (RBP) with high resolution. We apply a soft-clustering method, RBPgroup, to various CLIP-seq datasets to group together RBPs that specifically bind the same RNA sites. Such combinatorial clustering of RBPs helps interpret CLIP-seq data and suggests functional RNA regulatory elements. Furthermore, we validate two RBP–RBP interactions in cell lines. Our approach links proteins and RNA motifs known to possess similar biochemical and cellular properties and can, when used in conjunction with additional experimental data, identify high-confidence RBP groups and their associated RNA regulatory elements.
Affiliation(s)
- Yang Eric Li
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Mu Xiao
- Life Sciences Institute, Innovation Center for Cell Signaling Network, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Binbin Shi
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yu-Cheng T Yang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Dong Wang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Fei Wang
- Life Sciences Institute, Innovation Center for Cell Signaling Network, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Marco Marcia
- European Molecular Biology Laboratory, Grenoble Outstation, 71 Avenue des Martyrs, Grenoble, 38042, France
- Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
8
Gandy LM, Gumm J, Fertig B, Thessen A, Kennish MJ, Chavan S, Marchionni L, Xia X, Shankrit S, Fertig EJ. Synthesizer: Expediting synthesis studies from context-free data with information retrieval techniques. PLoS One 2017; 12:e0175860. [PMID: 28437440 PMCID: PMC5402950 DOI: 10.1371/journal.pone.0175860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2016] [Accepted: 03/31/2017] [Indexed: 11/18/2022]
Abstract
Scientists have unprecedented access to a wide variety of high-quality datasets. These datasets, which are often independently curated, commonly use unstructured spreadsheets to store their data. Standardized annotations are essential to perform synthesis studies across investigators, but are often not used in practice. Therefore, accurately combining records in spreadsheets from differing studies requires tedious and error-prone human curation. These efforts result in a significant time and cost barrier to synthesis research. We propose an information retrieval inspired algorithm, Synthesize, that merges unstructured data automatically based on both column labels and values. Application of the Synthesize algorithm to cancer and ecological datasets had high accuracy (on the order of 85-100%). We further implement Synthesize in an open source web application, Synthesizer (https://github.com/lisagandy/synthesizer). The software accepts input as spreadsheets in comma separated value (CSV) format, visualizes the merged data, and outputs the results as a new spreadsheet. Synthesizer includes an easy to use graphical user interface, which enables the user to finish combining data and obtain perfect accuracy. Future work will allow detection of units to automatically merge continuous data and application of the algorithm to other data formats, including databases.
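Merging spreadsheet columns "based on both column labels and values" can be sketched as follows. This is not the Synthesize algorithm itself, whose details the abstract does not give; it is a simple stand-in that blends Jaccard overlap of label tokens with Jaccard overlap of cell values. The function names, the equal weighting, the 0.4 threshold, and the two toy tables are all assumptions made for illustration.

```python
import re

def tokens(label):
    """Lowercase alphanumeric tokens of a column label."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def column_similarity(label_a, values_a, label_b, values_b, w_label=0.5):
    """Blend label-token overlap with overlap of the distinct cell values."""
    label_sim = jaccard(tokens(label_a), tokens(label_b))
    value_sim = jaccard({v.strip().lower() for v in values_a},
                        {v.strip().lower() for v in values_b})
    return w_label * label_sim + (1 - w_label) * value_sim

def match_columns(table_a, table_b, threshold=0.4):
    """Pair each column of table_a with its best-scoring column in table_b."""
    matches = {}
    for la, va in table_a.items():
        best = max(table_b, key=lambda lb: column_similarity(la, va, lb, table_b[lb]))
        if column_similarity(la, va, best, table_b[best]) >= threshold:
            matches[la] = best
    return matches

# Hypothetical spreadsheets from two studies with differently named columns.
table_a = {"Tumor Stage": ["I", "II", "III"], "Patient Sex": ["M", "F", "F"]}
table_b = {"sex": ["f", "m", "m"], "stage_of_tumor": ["II", "III", "IV"]}
matches = match_columns(table_a, table_b)
```

Using the values as well as the labels lets "Patient Sex" match "sex" strongly even though the labels only partly overlap, because both columns draw from the same small set of values.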
Affiliation(s)
- Lisa M. Gandy
- Department of Computer Science, Central Michigan University, Mt Pleasant, MI, United States of America
- * E-mail: (LMG); (EJF)
- Jordan Gumm
- Department of Computer Science, Central Michigan University, Mt Pleasant, MI, United States of America
- Benjamin Fertig
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
- Anne Thessen
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
- Michael J. Kennish
- Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, United States of America
- Sameer Chavan
- Colorado Center for Personalized Medicine, University of Colorado Denver, Denver, CO, United States of America
- Luigi Marchionni
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
- Xiaoxin Xia
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
- Shambhavi Shankrit
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
- Elana J. Fertig
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
- * E-mail: (LMG); (EJF)
9
Ji Z, Vokes SA, Dang CV, Ji H. Turning publicly available gene expression data into discoveries using gene set context analysis. Nucleic Acids Res 2015; 44:e8. [PMID: 26350211 PMCID: PMC4705686 DOI: 10.1093/nar/gkv873] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 02/23/2015] [Accepted: 08/20/2015] [Indexed: 12/17/2022]
Abstract
Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data.
Affiliation(s)
- Zhicheng Ji
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
- Steven A Vokes
- Department of Molecular Biosciences, The University of Texas at Austin, 2500 Speedway Stop A4800, Austin, TX 78712, USA
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, 2500 Speedway Stop A4800, Austin, TX 78712, USA
- Chi V Dang
- Abramson Cancer Center, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA
- Hongkai Ji
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
10
Kim M, Zorraquino V, Tagkopoulos I. Microbial forensics: predicting phenotypic characteristics and environmental conditions from large-scale gene expression profiles. PLoS Comput Biol 2015; 11:e1004127. [PMID: 25774498 PMCID: PMC4361189 DOI: 10.1371/journal.pcbi.1004127] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/26/2014] [Accepted: 01/14/2015] [Indexed: 01/13/2023]
Abstract
A tantalizing question in cellular physiology is whether the cellular state and environmental conditions can be inferred from the expression signature of an organism. To investigate this relationship, we created an extensive normalized gene expression compendium for the bacterium Escherichia coli that was further enriched with meta-information through an iterative learning procedure. We then constructed an ensemble method to predict environmental and cellular state, including strain, growth phase, medium, oxygen level, antibiotic and carbon source presence. Results show that gene expression is an excellent predictor of environmental structure, with multi-class ensemble models achieving balanced accuracy between 70.0% (±3.5%) and 98.3% (±2.3%) for the various characteristics. Interestingly, this performance can be significantly boosted when environmental and strain characteristics are simultaneously considered, as a composite classifier that captures the inter-dependencies of three characteristics (medium, phase and strain) achieved 10.6% (±1.0%) higher performance than any individual model. Contrary to expectations, only 59% of the top informative genes were also identified as differentially expressed under the respective conditions. Functional analysis of the respective genetic signatures implicates a wide spectrum of Gene Ontology terms and KEGG pathways with condition-specific information content, including iron transport, transferases, and enterobactin synthesis. Further experimental phenotypic-to-genotypic mapping that we conducted for knock-out mutants argues for the information content of top-ranked genes. This work demonstrates the degree to which genome-scale transcriptional information can be predictive of latent, heterogeneous and seemingly disparate phenotypic and environmental characteristics, with far-reaching applications.
The transcriptional profile of an organism contains clues about the environmental context in which it has evolved and currently lives, its behavior and cellular state. It is yet unclear, however, how much information can be efficiently extracted and how it can be used to classify new samples with respect to their environmental and genetic characteristics. Here, we have constructed an extensive transcriptome compendium of Escherichia coli that we have further enriched via an iterative learning approach. We then apply an ensemble of various machine learning algorithms to infer environmental and cellular information such as strain, growth phase, medium, oxygen level, antibiotic and carbon source. Functional analysis of the most informative genes provides mechanistic insights and palpable hypotheses regarding their role in each environmental or genetic context. Our work argues that genome-scale gene expression can be a multi-purpose marker for identifying latent, heterogeneous cellular and environmental states and that optimal classification can be achieved with a feature set of a couple hundred genes that might not necessarily have the most pronounced differential expression in the respective conditions.
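The balanced accuracy reported in this abstract can be made concrete with a short sketch. It assumes the common multi-class definition, the unweighted mean of per-class recall; the abstract does not spell out the definition used, and the labels and predictions below are invented for illustration.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical, imbalanced oxygen-level labels: 8 aerobic, 2 anaerobic samples.
y_true = ["aerobic"] * 8 + ["anaerobic"] * 2
y_pred = ["aerobic"] * 8 + ["aerobic", "anaerobic"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9
balanced = balanced_accuracy(y_true, y_pred)                       # 0.75
```

Plain accuracy rewards a classifier for favoring the majority class, whereas the balanced score drops to 0.75 here because half of the rare class is misclassified. That is why the metric suits compendia in which some conditions have few samples.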
Affiliation(s)
- Minseung Kim
- Department of Computer Science, University of California, Davis, Davis, California, United States of America
- UC Davis Genome Center, University of California, Davis, Davis, California, United States of America
- Violeta Zorraquino
- UC Davis Genome Center, University of California, Davis, Davis, California, United States of America
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, Davis, California, United States of America
- UC Davis Genome Center, University of California, Davis, Davis, California, United States of America
11
Johnson MD, Bell J, Clarke K, Chandler R, Pathak P, Xia Y, Marshall RL, Weinstock GM, Loman NJ, Winn PJ, Lund PA. Characterization of mutations in the PAS domain of the EvgS sensor kinase selected by laboratory evolution for acid resistance in Escherichia coli. Mol Microbiol 2014; 93:911-27. [PMID: 24995530 PMCID: PMC4283999 DOI: 10.1111/mmi.12704] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Accepted: 07/02/2014] [Indexed: 01/25/2023]
Abstract
Laboratory-based evolution and whole-genome sequencing can link genotype and phenotype. We used evolution of acid resistance in exponential phase Escherichia coli to study resistance to a lethal stress. Iterative selection at pH 2.5 generated five populations that were resistant to low pH in early exponential phase. Genome sequencing revealed multiple mutations, but the only gene mutated in all strains was evgS, part of a two-component system that has already been implicated in acid resistance. All these mutations were in the cytoplasmic PAS domain of EvgS, and were shown to be solely responsible for the resistant phenotype, causing strong upregulation at neutral pH of genes normally induced by low pH. Resistance to pH 2.5 in these strains did not require the transporter GadC, or the sigma factor RpoS. We found that EvgS-dependent constitutive acid resistance to pH 2.5 was retained in the absence of the regulators GadE or YdeO, but was lost if the oxidoreductase YdeP was also absent. A deletion in the periplasmic domain of EvgS abolished the response to low pH, but not the activity of the constitutive mutants. On the basis of these results we propose a model for how EvgS may become activated by low pH.
Affiliation(s)
- Matthew D Johnson
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK; Drug Delivery, Disposition & Dynamics, Monash Institute of Pharmaceutical Sciences, 381 Royal Parade, Parkville, 3062, Vic., Australia
12
Giannopoulou EG, Elemento O. Inferring chromatin-bound protein complexes from genome-wide binding assays. Genome Res 2013; 23:1295-306. [PMID: 23554462 PMCID: PMC3730103 DOI: 10.1101/gr.149419.112] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/17/2023]
Abstract
Genome-wide binding assays can determine where individual transcription factors bind in the genome. However, these factors rarely bind chromatin alone, but instead frequently bind to cis-regulatory elements (CREs) together with other factors thus forming protein complexes. Currently there are no integrative analytical approaches that can predict which complexes are formed on chromatin. Here, we describe a computational methodology to systematically capture protein complexes and infer their impact on gene expression. We applied our method to three human cell types, identified thousands of CREs, inferred known and undescribed complexes recruited to these CREs, and determined the role of the complexes as activators or repressors. Importantly, we found that the predicted complexes have a higher number of physical interactions between their members than expected by chance. Our work provides a mechanism for developing hypotheses about gene regulation via binding partners, and deciphering the interplay between combinatorial binding and gene expression.
Affiliation(s)
- Eugenia G Giannopoulou
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, Cornell University, New York, New York 10021, USA
13
Oldham MC, Langfelder P, Horvath S. Network methods for describing sample relationships in genomic datasets: application to Huntington's disease. BMC SYSTEMS BIOLOGY 2012; 6:63. [PMID: 22691535 PMCID: PMC3441531 DOI: 10.1186/1752-0509-6-63] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 03/01/2012] [Accepted: 05/03/2012] [Indexed: 01/08/2023]
Abstract
BACKGROUND Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis. RESULTS Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington's disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes. CONCLUSIONS These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. 
We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.
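As a rough illustration of the cor(K,C) measure described in this abstract, the following Python sketch computes the connectivity K and the clustering coefficient C of a sample network and then correlates them. It is not the authors' R software. It assumes a signed weighted adjacency between samples, a = ((1 + r) / 2)^beta with beta = 2, and one common weighted generalization of the clustering coefficient. The function names and the default beta are hypothetical choices made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sample_adjacency(samples, beta=2):
    """Signed weighted adjacency between samples (assumed form, zero diagonal).

    `samples` is a list of expression vectors, one per sample.
    """
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = ((1.0 + pearson(samples[i], samples[j])) / 2.0) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """K_i: sum of the adjacencies of node i to all other nodes."""
    return [sum(row) for row in adj]


def clustering_coefficient(adj):
    """Weighted clustering coefficient C_i for each node.

    C_i = sum_{j,k} a_ij a_jk a_ki / ((sum_j a_ij)^2 - sum_j a_ij^2),
    which requires a zero diagonal.
    """
    n = len(adj)
    coeffs = []
    for i in range(n):
        num = sum(
            adj[i][j] * adj[j][k] * adj[k][i]
            for j in range(n)
            for k in range(n)
        )
        k_i = sum(adj[i])
        den = k_i ** 2 - sum(a * a for a in adj[i])
        coeffs.append(num / den)
    return coeffs


def cor_k_c(adj):
    """Correlation between connectivity and clustering coefficient."""
    return pearson(connectivity(adj), clustering_coefficient(adj))
```

In this sketch, `cor_k_c(sample_adjacency(samples))` yields a single number per group of samples, so it can be compared across subgroups. The correlation is undefined (division by zero) when K or C is constant across samples, for example in a fully homogeneous toy network.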
Affiliation(s)
- Michael C Oldham
- Department of Neurology, The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, USA.
|
14
|
Drummond RSM, Sheehan H, Simons JL, Martínez-Sánchez NM, Turner RM, Putterill J, Snowden KC. The Expression of Petunia Strigolactone Pathway Genes is Altered as Part of the Endogenous Developmental Program. FRONTIERS IN PLANT SCIENCE 2012; 2:115. [PMID: 22645562 PMCID: PMC3355783 DOI: 10.3389/fpls.2011.00115] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 07/08/2011] [Accepted: 12/26/2011] [Indexed: 05/18/2023]
Abstract
Analysis of mutants with increased branching has revealed the strigolactone synthesis/perception pathway which regulates branching in plants. However, whether variation in this well conserved developmental signaling system contributes to the unique plant architectures of different species is yet to be determined. We examined petunia orthologs of the Arabidopsis MAX1 and MAX2 genes to characterize their role in petunia architecture. A single ortholog of MAX1, PhMAX1 which encodes a cytochrome P450, was identified and was able to complement the max1 mutant of Arabidopsis. Petunia has two copies of the MAX2 gene, PhMAX2A and PhMAX2B which encode F-Box proteins. Differences in the transcript levels of these two MAX2-like genes suggest diverging functions. Unlike PhMAX2B, PhMAX2A mRNA levels change in leaves of differing age/position on the plant. Nonetheless, this gene functionally complements the Arabidopsis max2 mutant indicating that the biochemical activity of the PhMAX2A protein is not significantly different from MAX2. The expression of the petunia strigolactone pathway genes (PhCCD7, PhCCD8, PhMAX1, PhMAX2A, and PhMAX2B) was then further investigated throughout the development of wild-type petunia plants. Three of these genes showed changes in mRNA levels over a development series. Alterations to the expression patterns of these genes may influence the branching growth habit of plants by changing strigolactone production and/or sensitivity. These changes could allow both subtle and dramatic changes to branching within and between species.
Affiliation(s)
- Hester Sheehan
- The New Zealand Institute for Plant and Food Research Ltd, Auckland, New Zealand
- Plant Molecular Sciences, School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Joanne L. Simons
- The New Zealand Institute for Plant and Food Research Ltd, Auckland, New Zealand
- Rebecca M. Turner
- The New Zealand Institute for Plant and Food Research Ltd, Auckland, New Zealand
- Joanna Putterill
- Plant Molecular Sciences, School of Biological Sciences, University of Auckland, Auckland, New Zealand
|
15
|
Jorgensen RA, Dorantes-Acosta AE. Conserved Peptide Upstream Open Reading Frames are Associated with Regulatory Genes in Angiosperms. FRONTIERS IN PLANT SCIENCE 2012; 3:191. [PMID: 22936940 PMCID: PMC3426882 DOI: 10.3389/fpls.2012.00191] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 06/09/2012] [Accepted: 08/04/2012] [Indexed: 05/20/2023]
Abstract
Upstream open reading frames (uORFs) are common in eukaryotic transcripts, but those that encode conserved peptides occur in less than 1% of transcripts. The peptides encoded by three plant conserved peptide uORF (CPuORF) families are known to control translation of the downstream ORF in response to a small signal molecule (sucrose, polyamines, and phosphocholine). In flowering plants, transcription factors are statistically over-represented among genes that possess CPuORFs, and in general it appeared that many CPuORF genes also had other regulatory functions, though the significance of this suggestion was uncertain (Hayden and Jorgensen, 2007). Five years later the literature provides much more information on the functions of many CPuORF genes. Here we reassess the functions of 27 known CPuORF gene families and find that 22 of these families play a variety of different regulatory roles, from transcriptional control to protein turnover, and from small signal molecules to signal transduction kinases. Clearly then, there is indeed a strong association of CPuORFs with regulatory genes. In addition, 16 of these families play key roles in a variety of different biological processes. Most strikingly, the core sucrose response network includes three different CPuORFs, creating the potential for sophisticated balancing of the network in response to three different molecular inputs. We propose that the function of most CPuORFs is to modulate translation of a downstream major ORF (mORF) in response to a signal molecule recognized by the conserved peptide and that because the mORFs of CPuORF genes generally encode regulatory proteins, many of them centrally important in the biology of plants, CPuORFs play key roles in balancing such regulatory networks.
Affiliation(s)
- Richard A. Jorgensen
- Laboratorio Nacional de Genómica para la Biodiversidad, Centro de Investigación y Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Guanajuato, México
- *Correspondence: Richard A. Jorgensen, Laboratorio Nacional de Genómica para la Biodiversidad, Centro de Investigación y Estudios Avanzados del Instituto Politécnico Nacional, Km 9.6 Libramiento Norte Carretera León, 36821 Irapuato, Guanajuato, México. e-mail:
- Ana E. Dorantes-Acosta
- Instituto de Biotecnología y Ecología Aplicada, Universidad Veracruzana, Xalapa, Veracruz, México
|
16
|
Li W, Liu CC, Zhang T, Li H, Waterman MS, Zhou XJ. Integrative analysis of many weighted co-expression networks using tensor computation. PLoS Comput Biol 2011; 7:e1001106. [PMID: 21698123 PMCID: PMC3116899 DOI: 10.1371/journal.pcbi.1001106] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 07/16/2010] [Accepted: 02/08/2011] [Indexed: 11/18/2022]
Abstract
The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominately under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
Affiliation(s)
- Wenyuan Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Chun-Chi Liu
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Tong Zhang
- Department of Statistics, Rutgers University, New Brunswick, New Jersey, United States of America
- Haifeng Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Michael S. Waterman
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Xianghong Jasmine Zhou
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- * E-mail:
|
17
|
Congdon E, Poldrack RA, Freimer NB. Neurocognitive phenotypes and genetic dissection of disorders of brain and behavior. Neuron 2010; 68:218-30. [PMID: 20955930 DOI: 10.1016/j.neuron.2010.10.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Accepted: 10/05/2010] [Indexed: 01/10/2023]
Abstract
Elucidating the molecular mechanisms underlying quantitative neurocognitive phenotypes will further our understanding of the brain's structural and functional architecture and advance the diagnosis and treatment of the psychiatric disorders that these traits underlie. Although many neurocognitive traits are highly heritable, little progress has been made in identifying genetic variants unequivocally associated with these phenotypes. A major obstacle to such progress is the difficulty in identifying heritable neurocognitive measures that are precisely defined and systematically assessed and represent unambiguous mental constructs, yet are also amenable to the high-throughput phenotyping necessary to obtain adequate power for genetic association studies. In this perspective we compare the current status of genetic investigations of neurocognitive phenotypes to that of other categories of biomedically relevant traits and suggest strategies for genetically dissecting traits that may underlie disorders of brain and behavior.
Affiliation(s)
- Eliza Congdon
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA 90095, USA
|
18
|
Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 2010; 11:367. [PMID: 20598126 PMCID: PMC2912887 DOI: 10.1186/1471-2105-11-367] [Citation(s) in RCA: 824] [Impact Index Per Article: 58.9] [Received: 11/16/2009] [Accepted: 07/02/2010] [Indexed: 11/23/2022]
Abstract
Background Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community. Results Our objective is to provide the bioinformatics community with an open-source, easy-to-use and unified interface to standard NMF algorithms, as well as with a simple framework to help implement and test new NMF methods. For that purpose, we have developed a package for the R/BioConductor platform. The package ports public code to R, and is structured to enable users to easily modify and/or add algorithms. It includes a number of published NMF algorithms and initialization methods and facilitates the combination of these to produce new NMF strategies. Commonly used benchmark data and visualization methods are provided to help in the comparison and interpretation of the results. Conclusions The NMF package helps realize the potential of Nonnegative Matrix Factorization, especially in bioinformatics, providing easy access to methods that have already yielded new insights in many applications. Documentation, source code and sample data are available from CRAN.
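For readers unfamiliar with the technique, the sketch below shows one standard NMF algorithm, the multiplicative updates of Lee and Seung for the Frobenius (least-squares) objective, in plain Python. It is an illustration only and is not taken from the R package described in this abstract. The function names (`nmf`, `frobenius_error`), the random initialization, and the parameters `n_iter`, `seed`, and `eps` are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (x - y) ** 2 for row_v, row_wh in zip(v, wh) for x, y in zip(row_v, row_wh)
    ) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative n x m matrix V as W (n x rank) times H (rank x m).

    Uses multiplicative updates, which keep W and H nonnegative as long as
    V and the initial factors are nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization (an assumption of this sketch).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [
            [h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
            for i in range(rank)
        ]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [
            [w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
            for i in range(n)
        ]
    return w, h
```

For an expression matrix with genes as rows and samples as columns, the columns of `w` can be read as metagenes and the columns of `h` as their per-sample weights. The small `eps` only guards against division by zero. Practical use would rely on a maintained implementation rather than this pure-Python loop.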
Affiliation(s)
- Renaud Gaujoux
- Computational Biology Group, Department of Clinical Laboratory Sciences, Faculty of Health Sciences, University of Cape Town, South Africa
|
19
|
Eisenstein M. Reading between the lines. Nat Methods 2009. [DOI: 10.1038/nmeth0909-632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|