1
Hsieh AR, Tsai CY. Biomedical literature mining: graph kernel-based learning for gene-gene interaction extraction. Eur J Med Res 2024; 29:404. [PMID: 39095899 PMCID: PMC11297645 DOI: 10.1186/s40001-024-01983-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 07/17/2024] [Indexed: 08/04/2024] Open
Abstract
Supervised machine learning is often used for biomedical relationship extraction. Its disadvantage is that manually building an annotated dataset requires much time and money. With distant supervision, a knowledge base is combined with a corpus so that the training corpus can be annotated automatically. Because many biomedical databases provide knowledge bases while annotated corpora remain limited in number, this method is practical in biomedicine. The clinical significance of each patient's genetic makeup can be understood based on the healthcare provider's genetic database. Unfortunately, few previous biomedical relationship extraction studies have focused on gene-gene interactions. The main purpose of this study is to develop extraction methods for gene-gene interactions that can help explain the heritability of human complex diseases. This study used the information on gene-gene interactions in the KEGG PATHWAY database, generated the training sample set from PubMed abstracts, and adopted a graph kernel method to extract gene-gene interactions. The best assessment result was an F1-score of 0.79. Our distant supervision method automatically finds sentences in the corpus for extracting gene-gene interactions without manual labeling, which effectively reduces the time cost of manual annotation; moreover, the relationship extraction method based on a graph kernel can be successfully applied to extract gene-gene interactions. The results of this study are therefore expected to help achieve precision medicine.
Affiliation(s)
- Ai-Ru Hsieh
  - Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan
- Chen-Yu Tsai
  - Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan
2
Hinostroza F, Araya-Duran I, Piñeiro A, Lobos I, Pastenes L. Transcription factor roles in the local adaptation to temperature in the Andean Spiny Toad Rhinella spinulosa. Sci Rep 2024; 14:15158. [PMID: 38956427 PMCID: PMC11220030 DOI: 10.1038/s41598-024-66127-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 06/27/2024] [Indexed: 07/04/2024] Open
Abstract
Environmental temperature strongly influences the adaptation dynamics of amphibians, whose limited regulation capabilities render them susceptible to thermal oscillations. A central element of the adaptive strategies is the transcription factors (TFs), which act as master regulators that orchestrate stress responses, enabling species to navigate the fluctuations of their environment skillfully. Our study delves into the intricate relationship between TF expression and thermal adaptation mechanisms in the Rhinella spinulosa populations. We sought to elucidate the dynamic modulations of TF expression in prometamorphic and metamorphic tadpoles that inhabit two thermally contrasting environments (Catarpe and El Tatio Geyser, Chile) and which were exposed to two thermal treatments (25 °C vs. 20 °C). Our findings unravel an intriguing dichotomy in response strategies between these populations. First, results evidence the expression of 1374 transcription factors. Regarding the temperature shift, the Catarpe tadpoles show a multifaceted approach by up-regulating crucial TFs, including fosB, atf7, and the androgen receptor. These dynamic regulatory responses likely underpin the population's ability to navigate thermal fluctuations effectively. In stark contrast, the El Tatio tadpoles exhibit a more targeted response, primarily up-regulating foxc1. This differential expression suggests a distinct focus on specific TFs to mitigate the effects of temperature variations. Our study contributes to understanding the molecular mechanisms governing thermal adaptation responses and highlights the resilience and adaptability of amphibians in the face of ever-changing environmental conditions.
Affiliation(s)
- Fernando Hinostroza
  - Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado, Universidad Católica del Maule, Talca, Chile
  - Centro de Investigación en Neuropsicología y Neurociencias Cognitivas, Facultad de Ciencias de la Salud, Universidad Católica del Maule, Talca, Chile
  - Escuela de Química y Farmacia, Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca, Chile
  - Centro Para la Investigación Traslacional en Neurofarmacología, Universidad de Valparaíso, Valparaíso, Chile
- Ingrid Araya-Duran
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile
- Alejandro Piñeiro
  - Laboratorio de Genética y Microevolución, Departamento de Biología y Química, Facultad de Ciencias Básicas, Universidad Católica del Maule, Talca, Chile
- Isabel Lobos
  - Laboratorio de Genética y Microevolución, Departamento de Biología y Química, Facultad de Ciencias Básicas, Universidad Católica del Maule, Talca, Chile
- Luis Pastenes
  - Laboratorio de Genética y Microevolución, Departamento de Biología y Química, Facultad de Ciencias Básicas, Universidad Católica del Maule, Talca, Chile
3
Mantica F, Iñiguez LP, Marquez Y, Permanyer J, Torres-Mendez A, Cruz J, Franch-Marro X, Tulenko F, Burguera D, Bertrand S, Doyle T, Nouzova M, Currie PD, Noriega FG, Escriva H, Arnone MI, Albertin CB, Wotton KR, Almudi I, Martin D, Irimia M. Evolution of tissue-specific expression of ancestral genes across vertebrates and insects. Nat Ecol Evol 2024; 8:1140-1153. [PMID: 38622362 DOI: 10.1038/s41559-024-02398-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 03/08/2024] [Indexed: 04/17/2024]
Abstract
Regulation of gene expression is arguably the main mechanism underlying the phenotypic diversity of tissues within and between species. Here we assembled an extensive transcriptomic dataset covering 8 tissues across 20 bilaterian species and performed analyses using a symmetric phylogeny that allowed the combined and parallel investigation of gene expression evolution between vertebrates and insects. We specifically focused on widely conserved ancestral genes, identifying strong cores of pan-bilaterian tissue-specific genes and even larger groups that diverged to define vertebrate and insect tissues. Systematic inferences of tissue-specificity gains and losses show that nearly half of all ancestral genes have been recruited into tissue-specific transcriptomes. This occurred during both ancient and, especially, recent bilaterian evolution, with several gains being associated with the emergence of unique phenotypes (for example, novel cell types). Such pervasive evolution of tissue specificity was linked to gene duplication coupled with expression specialization of one of the copies, revealing an unappreciated prolonged effect of whole-genome duplications on recent vertebrate evolution.
Affiliation(s)
- Federica Mantica
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Luis P Iñiguez
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Yamile Marquez
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Jon Permanyer
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Antonio Torres-Mendez
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Josefa Cruz
  - Institute of Evolutionary Biology (IBE, CSIC-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Xavier Franch-Marro
  - Institute of Evolutionary Biology (IBE, CSIC-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Frank Tulenko
  - Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
- Demian Burguera
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Stephanie Bertrand
  - Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins; BIOM, Banyuls-sur-Mer, France
- Toby Doyle
  - Centre for Ecology and Conservation, University of Exeter, Penryn, UK
- Marcela Nouzova
  - Institute of Parasitology, CAS, České Budějovice, Czech Republic
- Peter D Currie
  - Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
  - EMBL Australia; Victorian Node, Monash University, Clayton, Victoria, Australia
- Fernando G Noriega
  - Biology and BSI, Florida International University, Miami, FL, USA
  - Department of Parasitology, University of South Bohemia, České Budějovice, Czech Republic
- Hector Escriva
  - Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins; BIOM, Banyuls-sur-Mer, France
- Caroline B Albertin
  - Eugene Bell Center for Regenerative Biology and Tissue Engineering, Marine Biological Laboratory, Woods Hole, MA, USA
- Karl R Wotton
  - Centre for Ecology and Conservation, University of Exeter, Penryn, UK
- Isabel Almudi
  - Department of Genetics, Microbiology and Statistics and IRBio, Universitat de Barcelona, Barcelona, Spain
- David Martin
  - Institute of Evolutionary Biology (IBE, CSIC-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Manuel Irimia
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - ICREA, Barcelona, Spain
4
Mohapatra S, Banerjee A, Rausseo P, Dragomir MP, Manyam GC, Broom BM, Calin GA. FuncPEP v2.0: An Updated Database of Functional Short Peptides Translated from Non-Coding RNAs. Noncoding RNA 2024; 10:20. [PMID: 38668378 PMCID: PMC11054400 DOI: 10.3390/ncrna10020020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 03/27/2024] [Accepted: 03/28/2024] [Indexed: 04/29/2024] Open
Abstract
Over the past decade, there have been reports of short novel functional peptides (less than 100 aa in length) translated from so-called non-coding RNAs (ncRNAs) that have been characterized using mass spectrometry (MS) and large-scale proteomics studies. Therefore, understanding the bivalent functions of some ncRNAs as transcripts that encode both functional RNAs and short peptides, which we named ncPEPs, will deepen our understanding of biology and disease. In 2020, we published the first database of functional peptides translated from non-coding RNAs, FuncPEP. Herein, we have performed an update including the newly published ncPEPs from the last 3 years along with the categorization of host ncRNAs. FuncPEP v2.0 contains 152 functional ncPEPs, out of which 40 are novel entries. A PubMed search from August 2020 to July 2023 incorporating specific keywords was performed and screened for publications reporting validated functional peptides derived from ncRNAs. We did not observe a significant increase in newly discovered functional ncPEPs, but a steady increase. The novel identified ncPEPs included in the database were characterized by a wide array of molecular and physiological parameters (i.e., types of host ncRNA, species distribution, chromosomal density, distribution of ncRNA length, identification methods, molecular weight, and functional distribution across humans and other species). We consider that, despite the fact that MS can now easily identify ncPEPs, there still are important limitations in proving their functionality.
Affiliation(s)
- Swati Mohapatra
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Anik Banerjee
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA
  - Department of Neurology, University of Texas McGovern Medical School, Houston, TX 77030, USA
- Paola Rausseo
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Scripps College, Claremont, CA 91711, USA
- Mihnea P. Dragomir
  - Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 10117 Berlin, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Berlin Institute of Health at Charité, 10117 Berlin, Germany
- Ganiraju C. Manyam
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Bradley M. Broom
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- George A. Calin
  - Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
5
Jain A, Begum T, Ahmad S. Analysis and Prediction of Pathogen Nucleic Acid Specificity for Toll-like Receptors in Vertebrates. J Mol Biol 2023; 435:168208. [PMID: 37479078 DOI: 10.1016/j.jmb.2023.168208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/20/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023]
Abstract
Identification of key sequence, expression and function related features of nucleic acid-sensing host proteins is of fundamental importance to understand the dynamics of pathogen-specific host responses. To meet this objective, we considered toll-like receptors (TLRs), a representative class of membrane-bound sensor proteins, from 17 vertebrate species covering mammals, birds, reptiles, amphibians, and fishes in this comparative study. We identified the molecular signatures of host TLRs that are responsible for sensing pathogen nucleic acids or other pathogen-associated molecular patterns (PAMPs), and potentially play important roles in host defence mechanism. Interestingly, our findings reveal that such host-specific features are directly related to the strand (single or double) specificity of nucleic acid from pathogens. However, during host-pathogen interactions, such features were unable to explain the pathogenic PAMP (i.e., DNA, RNA or other) selectivity, suggesting a more complex mechanism. Using these features, we developed a number of machine learning models, of which Random Forest achieved a high performance (94.57% accuracy) to predict strand specificity of TLRs from protein-derived features. We applied the trained model to propose strand specificity of some previously uncharacterized distinct fish-specific novel TLRs (TLR18, TLR23, TLR24, TLR25, TLR27).
Affiliation(s)
- Anuja Jain
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India. https://twitter.com/@Anuja334
- Tina Begum
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
6
Piya AA, DeGiorgio M, Assis R. Predicting gene expression divergence between single-copy orthologs in two species. Genome Biol Evol 2023; 15:evad078. [PMID: 37170892 PMCID: PMC10220509 DOI: 10.1093/gbe/evad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 04/21/2023] [Accepted: 05/02/2023] [Indexed: 05/13/2023] Open
Abstract
Predicting gene expression divergence is integral to understanding the emergence of new biological functions and associated traits. Whereas several sophisticated methods have been developed for this task, their applications are either limited to duplicate genes or require expression data from more than two species. Thus, here we present PiXi, the first machine learning framework for predicting gene expression divergence between single-copy orthologs in two species. PiXi models gene expression evolution as an Ornstein-Uhlenbeck process, and overlays this model with multi-layer neural network, random forest, and support vector machine architectures for making predictions. It outputs the predicted class "conserved" or "diverged" for each pair of orthologs, as well as their predicted expression optima in the two species. We show that PiXi has high power and accuracy in predicting gene expression divergence between single-copy orthologs, as well as high accuracy and precision in estimating their expression optima in the two species, across a wide range of evolutionary scenarios, with the globally best performance achieved by a multi-layer neural network. Moreover, application of our best performing PiXi predictor to empirical gene expression data from single-copy orthologs residing at different loci in two species of Drosophila reveals that approximately 23% underwent expression divergence after positional relocation. Further analysis shows that several of these "diverged" genes are involved in the electron transport chain of the mitochondrial membrane, suggesting that new chromatin environments may impact energy production in Drosophila. Thus, by providing a toolkit for predicting gene expression divergence between single-copy orthologs in two species, PiXi can shed light on the origins of novel phenotypes across diverse biological processes and study systems.
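As a rough illustration of the Ornstein-Uhlenbeck (OU) model named in this abstract, the sketch below simulates expression of one ortholog pair drifting from a shared ancestral value toward a per-species optimum, and labels the pair "conserved" when the two optima are equal and "diverged" otherwise. This is not PiXi's implementation: the function names, the parameters (`alpha`, `sigma`, `ancestor`, `t_total`, `n_steps`) and the Euler-Maruyama discretization are all assumptions made for the example.

```python
import math
import random


def simulate_ou(theta, alpha, sigma, x0, t_total, n_steps, rng):
    """Euler-Maruyama simulation of dX = alpha * (theta - X) dt + sigma dW.

    theta is the expression optimum, alpha the pull toward it,
    sigma the strength of random drift (all hypothetical names).
    """
    dt = t_total / n_steps
    x = x0
    for _ in range(n_steps):
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += alpha * (theta - x) * dt + noise
    return x


def simulate_ortholog_pair(theta_a, theta_b, alpha=1.0, sigma=0.5,
                           ancestor=0.0, t_total=5.0, n_steps=500, seed=0):
    """Simulate expression in two species that split from one ancestor.

    Returns the two simulated expression values and the class label
    implied by the optima used to generate them.
    """
    rng = random.Random(seed)
    expr_a = simulate_ou(theta_a, alpha, sigma, ancestor, t_total, n_steps, rng)
    expr_b = simulate_ou(theta_b, alpha, sigma, ancestor, t_total, n_steps, rng)
    label = "conserved" if theta_a == theta_b else "diverged"
    return expr_a, expr_b, label


if __name__ == "__main__":
    print(simulate_ortholog_pair(1.0, 1.0))  # same optimum in both species
    print(simulate_ortholog_pair(1.0, 3.0))  # optimum shifted in species B
```

Labeled pairs simulated this way could serve as training data for a classifier, which is the general role the abstract assigns to the neural network, random forest, and support vector machine layers.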
Affiliation(s)
- Antara Anika Piya
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
- Michael DeGiorgio
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
- Raquel Assis
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, USA
  - Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, Florida, USA
7
Palmer RD. Three Tiers to biological escape velocity: The quest to outwit aging. Aging Med (Milton) 2022; 5:281-286. [PMID: 36606268 PMCID: PMC9805293 DOI: 10.1002/agm2.12231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/14/2022] [Accepted: 11/20/2022] [Indexed: 12/14/2022] Open
Abstract
As longevity companies emerge with new products and the fields of anti-aging research develop new cutting-edge therapies, three distinct classes of longevity methodologies emerge. This discussion finds that there are three clear classes (Tiers) of longevity systems that are currently under development, and all three will be paramount to achieve biological escape velocity (where tissues can be repaired faster than aging can damage them). These classes are referred to as Tier 1, Tier 2, and Tier 3 treatments and are described in detail below. These three Tiers are required for easy identification for pharmaceutical companies and research companies to determine the type of therapy they may choose to deliver being noninvasive, invasive, time consuming, or simple end user products. Specific targets and goals need to be defined clearly from an early perspective in the development of these technologies for future precision medicines. This allows consumers of future anti-aging technologies to consider which Tier a particular therapy may be, delivering a more informed choice.
Affiliation(s)
- Raymond D. Palmer
  - Full Spectrum Biologics, South Perth, Western Australia, Australia
  - School of Aging, Science of Aging, South Perth, Western Australia, Australia
8
Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022] Open
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is a residue-level, where intra-protein differences are analyzed and the second is a protein-level, where inter-protein differences are studied. At a residue level, we know that solvent-accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
9
Hu H, Zhang Q, Hu FF, Liu CJ, Guo AY. A comprehensive survey for human transcription factors on expression, regulation, interaction, phenotype and cancer survival. Brief Bioinform 2021; 22:6124917. [PMID: 33517372 DOI: 10.1093/bib/bbab002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/12/2020] [Revised: 12/30/2020] [Accepted: 01/02/2021] [Indexed: 11/13/2022] Open
Abstract
Transcription factors (TFs) act as key regulators in biological processes through controlling gene expression. Here, we conducted a systematic study for all human TFs on the expression, regulation, interaction, mutation, phenotype and cancer survival. We revealed that the average expression levels of TFs in normal tissues were lower than 50% expression of non-TFs, whereas TF expression was increased in cancers. TFs that are specifically expressed in an individual tissue or cancer may be potential marker genes. For instance, TGIF2LX/Y were preferentially expressed in testis and NEUROG1, PRDM14, SRY, ZNF705A and ZNF716 were specifically highly expressed in germ cell tumors. We found different distributions of target genes and TF co-regulations in different TF families. Some small TF families have huge protein interaction pairs, suggesting their central roles in transcriptional regulation. The bZIP family is a small family involving many signaling pathways. Survival analysis indicated that most TFs significantly affect survival of one or more cancers. Some survival-related TFs were also specifically highly expressed in the corresponding cancer types, which may be potential targets for cancer therapy. Finally, we identified 43 TFs whose mutations were closely correlated to survival, suggesting their cancer-driven roles. The systematic analysis of TFs provides useful clues for further investigation of TF regulatory mechanisms and the role of TFs in diseases.
Affiliation(s)
- Hui Hu
  - Center for Artificial Intelligence Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Qiong Zhang
  - Center for Artificial Intelligence Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Fei-Fei Hu
  - Center for Artificial Intelligence Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Chun-Jie Liu
  - Center for Artificial Intelligence Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- An-Yuan Guo
  - Center for Artificial Intelligence Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
10
Dragomir MP, Manyam GC, Ott LF, Berland L, Knutsen E, Ivan C, Lipovich L, Broom BM, Calin GA. FuncPEP: A Database of Functional Peptides Encoded by Non-Coding RNAs. Noncoding RNA 2020; 6:E41. [PMID: 32977531 PMCID: PMC7712257 DOI: 10.3390/ncrna6040041] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 08/30/2020] [Revised: 09/15/2020] [Accepted: 09/18/2020] [Indexed: 02/06/2023] Open
Abstract
Non-coding RNAs (ncRNAs) are essential players in many cellular processes, from normal development to oncogenic transformation. Initially, ncRNAs were defined as transcripts that lacked an open reading frame (ORF). However, multiple lines of evidence suggest that certain ncRNAs encode small peptides of less than 100 amino acids. The sequences encoding these peptides are known as small open reading frames (smORFs), many initiating with the traditional AUG start codon but terminating with atypical stop codons, suggesting a different biogenesis. The ncRNA-encoded peptides (ncPEPs) are gradually becoming appreciated as a new class of functional molecules that contribute to diverse cellular processes, and are deregulated in different diseases contributing to pathogenesis. As multiple publications have identified unique ncPEPs, we appreciated the need for assembling a new web resource that could gather information about these functional ncPEPs. We developed FuncPEP, a new database of functional ncRNA encoded peptides, containing all experimentally validated and functionally characterized ncPEPs. Currently, FuncPEP includes a comprehensive annotation of 112 functional ncPEPs and specific details regarding the ncRNA transcripts that encode these peptides. We believe that FuncPEP will serve as a platform for further deciphering the biologic significance and medical use of ncPEPs.
Affiliation(s)
- Mihnea P. Dragomir
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Surgery, Fundeni Clinical Hospital, Carol Davila University of Medicine and Pharmacy, 022328 Bucharest, Romania
- Ganiraju C. Manyam
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Leonie Florence Ott
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Institute of Tumor Biology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Léa Berland
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Erik Knutsen
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Medical Biology, Faculty of Health Sciences, UiT The Arctic University of Norway, N-9037 Tromsø, Norway
- Cristina Ivan
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Centre, Houston, TX 77054, USA
- Leonard Lipovich
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
- Bradley M. Broom
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- George A. Calin
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Centre, Houston, TX 77054, USA
11
Ali F, Seshasayee ASN. Dynamics of genetic variation in transcription factors and its implications for the evolution of regulatory networks in Bacteria. Nucleic Acids Res 2020; 48:4100-4114. [PMID: 32182360 PMCID: PMC7192604 DOI: 10.1093/nar/gkaa162] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/02/2019] [Revised: 02/05/2020] [Accepted: 03/03/2020] [Indexed: 11/25/2022] Open
Abstract
The evolution of regulatory networks in Bacteria has largely been explained at macroevolutionary scales through lateral gene transfer and gene duplication. Transcription factors (TFs) have been found to be less conserved across species than their target genes (TGs). This would be expected if TFs accumulate mutations faster than TGs. This hypothesis is supported by several lab evolution studies which found TFs, especially global regulators, to be frequently mutated. Despite these studies, the contribution of point mutations in TFs to the evolution of regulatory networks is poorly understood. We tested whether TFs show greater genetic variation than their TGs using whole-genome sequencing data from a large collection of Escherichia coli isolates. TFs were less diverse than their TGs across natural isolates, with TFs of large regulons being more conserved. In contrast, TFs showed higher mutation frequency in adaptive laboratory evolution experiments. However, over long-term laboratory evolution spanning 60 000 generations, mutation frequency in TFs gradually declined after a rapid initial burst. Extrapolating the dynamics of genetic variation from long-term laboratory evolution to natural populations, we propose that point mutations conferring large-scale gene expression changes may drive the early stages of adaptation, but that gene regulation is subjected to stronger purifying selection post adaptation.
Affiliation(s)
- Farhan Ali
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru, Karnataka 560065, India; Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
| | - Aswin Sai Narain Seshasayee
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru, Karnataka 560065, India
| |
|
12
|
Cambridge SB. Hypothesis: protein and RNA attributes are continuously optimized over time. BMC Genomics 2019; 20:1012. [PMID: 31870287 PMCID: PMC6929361 DOI: 10.1186/s12864-019-6371-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/16/2019] [Accepted: 12/05/2019] [Indexed: 02/01/2023] Open
Abstract
Background: Little is known about why proteins and RNAs exhibit half-lives varying over several orders of magnitude. Despite many efforts, a conclusive link between half-lives and gene function could not be established, suggesting that other determinants may influence these molecular attributes. Results: Here, I find that with increasing gene age there is a gradual and significant increase of protein and RNA half-lives, protein structure, and other molecular attributes that tend to affect protein abundance. These observations are accommodated in a hypothesis which posits that new genes at ‘birth’ are not optimized, so their products exhibit short half-lives and less structure, but continuous mutagenesis eventually improves these attributes. Thus, the protein and RNA products of the oldest genes obtained their high degrees of stability and structure only after billions of years, while the products of younger genes had less time to be optimized and are therefore less stable and structured. Because more stable proteins with lower turnover require less transcription to maintain the same level of abundance, reduced transcription-associated mutagenesis (TAM) would fix the changes by increasing gene conservation. Conclusions: Consequently, the currently observed diversity of molecular attributes is a snapshot of gene products being at different stages along their temporal path of optimization.
Affiliation(s)
- Sidney B Cambridge
- Department of Functional Neuroanatomy, Heidelberg University, Heidelberg, Germany.
| |
|
13
|
Fourati S, Talla A, Mahmoudian M, Burkhart JG, Klén R, Henao R, Yu T, Aydın Z, Yeung KY, Ahsen ME, Almugbel R, Jahandideh S, Liang X, Nordling TEM, Shiga M, Stanescu A, Vogel R, Pandey G, Chiu C, McClain MT, Woods CW, Ginsburg GS, Elo LL, Tsalik EL, Mangravite LM, Sieberts SK. A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection. Nat Commun 2018; 9:4418. [PMID: 30356117 PMCID: PMC6200745 DOI: 10.1038/s41467-018-06735-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/30/2018] [Accepted: 09/12/2018] [Indexed: 01/17/2023] Open
Abstract
The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.
Affiliation(s)
- Slim Fourati
- Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Aarthi Talla
- Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Mehrad Mahmoudian
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Department of Future Technologies, University of Turku, FI-20014 Turku, Finland
| | - Joshua G Burkhart
- Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR, 97239, USA
- Laboratory of Evolutionary Genetics, Institute of Ecology and Evolution, University of Oregon, Eugene, OR, 97403, USA
| | - Riku Klén
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
| | - Ricardo Henao
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
| | - Thomas Yu
- Sage Bionetworks, Seattle, WA, 98121, USA
| | - Zafer Aydın
- Department of Computer Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
| | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
| | - Mehmet Eren Ahsen
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Reem Almugbel
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
| | | | - Xiao Liang
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
| | - Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
| | - Motoki Shiga
- Department of Electrical, Electronic and Computer Engineering, Faculty of Engineering, Gifu University, Gifu, 501-1193, Japan
| | - Ana Stanescu
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Computer Science, University of West Georgia, Carrollton, GA, 30116, USA
| | - Robert Vogel
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
| | - Gaurav Pandey
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Christopher Chiu
- Section of Infectious Diseases and Immunity, Imperial College London, London, W12 0NN, UK
| | - Micah T McClain
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Medical Service, Durham VA Health Care System, Durham, NC, 27705, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Christopher W Woods
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Medical Service, Durham VA Health Care System, Durham, NC, 27705, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Geoffrey S Ginsburg
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
| | - Ephraim L Tsalik
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Emergency Medicine Service, Durham VA Health Care System, Durham, NC, 27705, USA
| | | | | |
|
14
|
Biswas K, Acharya D, Podder S, Ghosh TC. Evolutionary rate heterogeneity between multi- and single-interface hubs across human housekeeping and tissue-specific protein interaction network: Insights from proteins' and its partners' properties. Genomics 2017; 110:283-290. [PMID: 29198610 DOI: 10.1016/j.ygeno.2017.11.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/23/2017] [Revised: 11/10/2017] [Accepted: 11/29/2017] [Indexed: 12/12/2022]
Abstract
Integrating gene expression into the protein-protein interaction network (PPIN) leads to the construction of tissue-specific (TS) and housekeeping (HK) sub-networks, with distinctive TS- and HK-hubs. All such hub proteins are divided into multi-interface (MI) hubs and single-interface (SI) hubs, where MI hubs evolve more slowly than SI hubs. Here we explored the evolutionary rate difference between MI and SI proteins within the TS- and HK-PPIN and observed that this difference is present only in the TS class, not in the HK class. Next, we explored whether a protein's own properties or its partners' properties are more influential in this evolutionary discrepancy. Statistical analyses revealed that evolutionary rate correlates negatively with a protein's own properties, such as expression level, miRNA count, conformational diversity and functional properties, and with its partners' properties, such as protein disorder and tissue expression similarity. Moreover, partial correlation and regression analysis revealed that both a protein's and its partners' properties have independent effects on protein evolutionary rate.
Affiliation(s)
- Kakali Biswas
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
| | - Debarun Acharya
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
| | - Soumita Podder
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India; Department of Microbiology, Raiganj University, Raiganj, Uttar Dinajpur 733134, India
| | - Tapash Chandra Ghosh
- Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.
| |
|
15
|
Modelling the evolution of transcription factor binding preferences in complex eukaryotes. Sci Rep 2017; 7:7596. [PMID: 28790414 PMCID: PMC5548724 DOI: 10.1038/s41598-017-07761-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/08/2017] [Accepted: 06/30/2017] [Indexed: 12/27/2022] Open
Abstract
Transcription factors (TFs) exert their regulatory action by binding to DNA with specific sequence preferences. However, different TFs can partially share their binding sequences due to their common evolutionary origin. This "redundancy" of binding defines a way of organizing TFs in "motif families" by grouping TFs with similar binding preferences. Since these preferences ultimately define the TF target genes, the motif family organization carries information about the structure of transcriptional regulation as it has been shaped by evolution. Focusing on the human TF repertoire, we show that a one-parameter evolutionary model of the Birth-Death-Innovation type can explain the empirical distribution of TFs among motif families, and makes it possible to highlight the relevant evolutionary forces at the origin of this organization. Moreover, the model pinpoints a few deviations from the neutral scenario it assumes: three over-expanded families (including HOX and FOX genes), a set of "singleton" TFs for which duplication seems to be selected against, and a higher-than-average rate of diversification of the binding preferences of TFs with a Zinc Finger DNA binding domain. Finally, a comparison of the TF motif family organization in different eukaryotic species suggests an increase of redundancy of binding with organism complexity.
|
16
|
Ehsani R, Bahrami S, Drabløs F. Feature-based classification of human transcription factors into hypothetical sub-classes related to regulatory function. BMC Bioinformatics 2016; 17:459. [PMID: 27842491 PMCID: PMC5109715 DOI: 10.1186/s12859-016-1349-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/19/2016] [Accepted: 11/10/2016] [Indexed: 12/15/2022] Open
Abstract
Background: Transcription factors are key proteins in the regulation of gene transcription. An important step in this process is the opening of chromatin in order to make genomic regions available for transcription. Data on DNase I hypersensitivity has previously been used to label a subset of transcription factors as Pioneers, Settlers and Migrants to describe their potential role in this process. These labels represent an interesting hypothesis on gene regulation and possibly a useful approach for data analysis, and therefore we wanted to expand the set of labeled transcription factors to include as many known factors as possible. We have used a well-annotated dataset of 1175 transcription factors as input to supervised machine learning methods, using the subset with previously assigned labels as training set. We then used the final classifier to label the additional transcription factors according to their potential role as Pioneers, Settlers and Migrants. The full set of labeled transcription factors was used to investigate associated properties and functions of each class, including an analysis of interaction data for transcription factors based on DNA co-binding and protein-protein interactions. We also used the assigned labels to analyze a previously published set of gene lists associated with a time course experiment on cell differentiation. Results: The analysis showed that the classification of transcription factors with respect to their potential role in chromatin opening was largely determined by how they bind to DNA. Each subclass of transcription factors was enriched for properties that seemed to characterize the subclass relative to its role in gene regulation, with very general functions for Pioneers, whereas Migrants to a larger extent were associated with specific processes. Further analysis showed that the expanded classification is a useful resource for analyzing other datasets on transcription factors with respect to their potential role in gene regulation. The analysis of transcription factor interaction data showed complementary differences between the subclasses, where transcription factors labeled as Pioneers often interact with other transcription factors through DNA co-binding, whereas Migrants to a larger extent use protein-protein interactions. The analysis of time course data on cell differentiation indicated a shift in the regulatory program associated with Pioneer-like transcription factors during differentiation. Conclusions: The expanded classification is an interesting resource for analyzing data on gene regulation, as illustrated here on transcription factor interaction data and data from a time course experiment. The potential regulatory function of transcription factors seems largely to be determined by how they bind DNA, but is also influenced by how they interact with each other through cooperativity and protein-protein interactions. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1349-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rezvan Ehsani
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, PO Box 8905, NO-7491, Trondheim, Norway; Department of Mathematics, University of Zabol, Zabol, Iran
| | - Shahram Bahrami
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, PO Box 8905, NO-7491, Trondheim, Norway; St. Olavs Hospital, Trondheim University Hospital, NO-7006, Trondheim, Norway
| | - Finn Drabløs
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, PO Box 8905, NO-7491, Trondheim, Norway.
| |
|
17
|
Wanggou S, Feng C, Xie Y, Ye L, Wang F, Li X. Sample Level Enrichment Analysis of KEGG Pathways Identifies Clinically Relevant Subtypes of Glioblastoma. J Cancer 2016; 7:1701-1710. [PMID: 27698907 PMCID: PMC5039391 DOI: 10.7150/jca.15486] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/09/2016] [Accepted: 06/02/2016] [Indexed: 12/25/2022] Open
Abstract
Background: Glioblastoma is the most lethal primary brain tumor in adults. Aberrant signal transduction pathways associated with the progression of glioblastoma have been identified recently and may offer a potential gene therapy strategy. Methods and Findings: We first used sample level enrichment analysis to transform the gene expression profiles of the TCGA dataset into a pathway enrichment z-score matrix. Then, we classified glioblastoma into five subtypes (Cluster A to Cluster E) by consensus clustering and silhouette analysis. Principal component analysis showed that the five subtypes could be separated by the first three principal components. Integrative omics data showed that the mesenchymal subtype was enriched in Cluster A, the neural subtype was concentrated in Cluster D and the proneural subtype was concentrated in Cluster E, while Cluster E showed a high percentage of the G-CIMP subtype. Additionally, by analyzing the overall survival and progression-free survival of each subtype with Kaplan-Meier analysis and the Cox proportional hazards model, we found that Cluster D and Cluster E had a better prognosis. Conclusions: We report a clinically relevant classification of glioblastoma based on the sample level KEGG pathway enrichment profile. This novel classification system provides new insights into the heterogeneity of glioblastoma and may be used as a clinical tool to predict prognosis.
Affiliation(s)
- Siyi Wanggou
- Department of Neurosurgery, Xiangya Hospital, Central South University
| | - Chengyuan Feng
- Department of Neurosurgery, Xiangya Hospital, Central South University
| | - Yuanyang Xie
- Department of Neurosurgery, Xiangya Hospital, Central South University
| | - Linrong Ye
- Department of Neurosurgery, Xiangya Hospital, Central South University
| | - Feiyifan Wang
- Department of Neurosurgery, Xiangya Hospital, Central South University
| | - Xuejun Li
- Department of Neurosurgery, Xiangya Hospital, Central South University
| |
|
18
|
Biswas K, Chakraborty S, Podder S, Ghosh TC. Insights into the dN/dS ratio heterogeneity between brain specific genes and widely expressed genes in species of different complexity. Genomics 2016; 108:11-7. [DOI: 10.1016/j.ygeno.2016.04.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/04/2015] [Revised: 04/22/2016] [Accepted: 04/23/2016] [Indexed: 01/07/2023]
|
19
|
Levati E, Sartini S, Ottonello S, Montanini B. Dry and wet approaches for genome-wide functional annotation of conventional and unconventional transcriptional activators. Comput Struct Biotechnol J 2016; 14:262-70. [PMID: 27453771 PMCID: PMC4941109 DOI: 10.1016/j.csbj.2016.06.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/31/2016] [Revised: 06/21/2016] [Accepted: 06/23/2016] [Indexed: 02/06/2023] Open
Abstract
Transcription factors (TFs) are master gene products that regulate gene expression in response to a variety of stimuli. They interact with DNA in a sequence-specific manner using a variety of DNA-binding domain (DBD) modules. This allows them to properly position their second domain, called the "effector domain", to directly or indirectly recruit positively or negatively acting co-regulators, including chromatin modifiers, thus modulating preinitiation complex formation as well as transcription elongation. Unlike the DBDs, which comprise well-defined and easily recognizable DNA binding motifs, effector domains are usually much less conserved and thus considerably more difficult to predict. The DNA-binding sites of TFs are also not easy to identify, especially on a genome-wide basis and in the case of overlapping binding regions. Another emerging issue, with many potential regulatory implications, is that of so-called "moonlighting" transcription factors, i.e., proteins with an annotated function unrelated to transcription and lacking any recognizable DBD or effector domain, that play a role in gene regulation as their second job. Starting from bioinformatic and experimental high-throughput tools for an unbiased, genome-wide identification and functional characterization of TFs (especially transcriptional activators), we describe both established (and usually affordable) and newly developed platforms for DNA-binding site identification. Selected combinations of these search tools, some of which rely on next-generation sequencing approaches, allow delineating the entire repertoire of TFs and unconventional regulators encoded by any sequenced genome.
Affiliation(s)
| | | | - Simone Ottonello
- Corresponding author at: Department of Life Sciences, University of Parma, Parco Area delle Scienze 23/A, 43124 Parma, Italy.
| | | |
|
20
|
Jorquera R, Ortiz R, Ossandon F, Cárdenas JP, Sepúlveda R, González C, Holmes DS. SinEx DB: a database for single exon coding sequences in mammalian genomes. Database (Oxford) 2016; 2016:baw095. [PMID: 27278816 PMCID: PMC4897596 DOI: 10.1093/database/baw095] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/13/2015] [Accepted: 05/11/2016] [Indexed: 12/27/2022]
Abstract
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for download. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl.
Affiliation(s)
- Roddy Jorquera
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| | - Rodrigo Ortiz
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| | - F Ossandon
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| | - Juan Pablo Cárdenas
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| | - Rene Sepúlveda
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| | - Carolina González
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| | - David S Holmes
- Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida and Facultad de Ciencias Biologicas, Universidad Andres Bello, Avda Zañartu 1482, Santiago, Chile
| |
|
21
|
Begum T, Ghosh TC. Elucidating the genotype-phenotype relationships and network perturbations of human shared and specific disease genes from an evolutionary perspective. Genome Biol Evol 2014; 6:2741-53. [PMID: 25287147 PMCID: PMC4224346 DOI: 10.1093/gbe/evu220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
To date, numerous studies have attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In our present study, we considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects: specific disease (SPD) genes (one gene: one defect) and shared disease (SHD) genes (one gene: multiple defects). Here, we compared the evolutionary rates of these two groups of genes, SPD genes and SHD genes, with those of ND genes. We observed that the average evolutionary rates are slow in the SHD group, intermediate in the SPD group, and fast in the ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of gene expression level and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation of the interactome network, show tissue-restricted expression, and are involved in transmembrane transport. Among all the factors, our regression analyses suggest independent effects of 1) drug-induced perturbation and 2) the interaction term of expression breadth and transmembrane transport on protein evolutionary rates. We reasoned that drug-induced network disruption is a combination of several edgetic perturbations and thus has a more severe effect on gene phenotypes.
Affiliation(s)
- Tina Begum
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
| | | |
|
22
|
Effect of duplicate genes on mouse genetic robustness: an update. Biomed Res Int 2014; 2014:758672. [PMID: 25110693 PMCID: PMC4119742 DOI: 10.1155/2014/758672] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/15/2014] [Revised: 06/15/2014] [Accepted: 06/16/2014] [Indexed: 12/28/2022]
Abstract
In contrast to S. cerevisiae and C. elegans, analyses based on the current knockout (KO) mouse phenotypes led to the conclusion that duplicate genes had almost no role in mouse genetic robustness. It has been suggested that the bias of the mouse KO database toward ancient duplicates may cause this knockout duplicate puzzle, that is, a very similar proportion of essential genes (PE) between duplicate genes and singletons. In this paper, we conducted an extensive and careful analysis of the mouse KO phenotype data and corroborated a strong effect of duplicate genes on mouse genetic robustness. Moreover, the effect of duplicate genes on mouse genetic robustness is duplication-age dependent, which holds after ruling out the potential confounding effect from coding-sequence conservation, protein-protein connectivity, functional bias, or the bias of duplicates generated by whole genome duplication (WGD). Our findings suggest that two factors, the sampling bias toward ancient duplicates and very ancient duplicates with a proportion of essential genes higher than that of singletons, have caused the mouse knockout duplicate puzzle; meanwhile, the effect of genetic buffering may be correlated with sequence conservation as well as protein-protein interactivity.
|
23
|
Witztum J, Persi E, Horn D, Pasmanik-Chor M, Chor B. Hierarchical partitioning of metazoan protein conservation profiles provides new functional insights. PLoS One 2014; 9:e90282. [PMID: 24594619 PMCID: PMC3942430 DOI: 10.1371/journal.pone.0090282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/16/2013] [Accepted: 01/29/2014] [Indexed: 12/23/2022] Open
Abstract
The availability of many complete, annotated proteomes enables the systematic study of the relationships between protein conservation and functionality. We explore this question based solely on the presence or absence of protein homologues (a.k.a. conservation profiles). We study 18 metazoans, from two distinct points of view: the human's and the fly's. Using the GOrilla gene ontology (GO) analysis tool, we explore functional enrichment of the “universal proteins”, those with homologues in all 17 other species, and of the “non-universal proteins”. A large number of GO terms are strongly enriched in both human and fly universal proteins. Most of these functions are known to be essential. A smaller number of GO terms, exhibiting markedly different properties, are enriched in both human and fly non-universal proteins. We further explore the non-universal proteins, whose conservation profiles are consistent with the “tree of life” (TOL consistent), as well as the TOL inconsistent proteins. Finally, we applied Quantum Clustering to the conservation profiles of the TOL consistent proteins. Each cluster is strongly associated with one or a small number of specific monophyletic clades in the tree of life. The proteins in many of these clusters exhibit strong functional enrichment associated with the “life style” of the related clades. Most previous approaches for studying function and conservation are “bottom up”, studying protein families one by one, and separately assessing the conservation of each. By way of contrast, our approach is “top down”. We globally partition the set of all proteins hierarchically, as described above, and then identify protein families enriched within different subdivisions. While supporting previous findings, our approach also provides a tool for discovering novel relations between protein conservation profiles, functionality, and evolutionary history as represented by the tree of life.
Affiliation(s)
- Erez Persi
- School of Physics and Astronomy, Tel-Aviv University, Tel-Aviv, Israel
- David Horn
- School of Physics and Astronomy, Tel-Aviv University, Tel-Aviv, Israel
- Benny Chor
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
|
24
|
Jene-Sanz A, Váraljai R, Vilkova AV, Khramtsova GF, Khramtsov AI, Olopade OI, Lopez-Bigas N, Benevolenskaya EV. Expression of polycomb targets predicts breast cancer prognosis. Mol Cell Biol 2013; 33:3951-61. [PMID: 23918806 PMCID: PMC3811872 DOI: 10.1128/mcb.00426-13] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 04/08/2013] [Accepted: 07/28/2013] [Indexed: 11/20/2022] Open
Abstract
Global changes in the epigenome are increasingly being appreciated as key events in cancer progression. The pathogenic role of enhancer of zeste homolog 2 (EZH2) has been connected to its histone 3 lysine 27 (H3K27) methyltransferase activity and gene repression; however, little is known about the relationship of changes in expression of EZH2 target genes to cancer characteristics and patient prognosis. Here we show that through expression analysis of genomic regions with H3K27 trimethylation (H3K27me3) and EZH2 binding, breast cancer patients can be stratified into good and poor prognostic groups independent of known cancer gene signatures. The EZH2-bound regions were downregulated in tumors characterized by aggressive behavior, high expression of cell cycle genes, and low expression of developmental and cell adhesion genes. Depletion of EZH2 in breast cancer cells significantly increased expression of the top altered genes, decreased proliferation, and improved cell adhesion, indicating a critical role played by EZH2 in determining the cancer phenotype.
Affiliation(s)
- Alba Jene-Sanz
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, Illinois, USA
- Renáta Váraljai
- Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, Illinois, USA
- Alexandra V. Vilkova
- Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, Illinois, USA
- Nuria Lopez-Bigas
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Elizaveta V. Benevolenskaya
- Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, Illinois, USA
|
25
|
Gonzalez-Perez A, Jene-Sanz A, Lopez-Bigas N. The mutational landscape of chromatin regulatory factors across 4,623 tumor samples. Genome Biol 2013; 14:r106. [PMID: 24063517 PMCID: PMC4054018 DOI: 10.1186/gb-2013-14-9-r106] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 07/06/2013] [Accepted: 09/24/2013] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Chromatin regulatory factors are emerging as important genes in cancer development and are regarded as interesting candidates for novel targets for cancer treatment. However, we lack a comprehensive understanding of the role of this group of genes in different cancer types. RESULTS We have analyzed 4,623 tumor samples from thirteen anatomical sites to determine which chromatin regulatory factors are candidate drivers in these different sites. We identify 34 chromatin regulatory factors that are likely drivers in tumors from at least one site, all with relatively low mutational frequency. We also analyze the relative importance of mutations in this group of genes for the development of tumorigenesis in each site, and in different tumor types from the same site. CONCLUSIONS We find that, although tumors from all thirteen sites show mutations in likely driver chromatin regulatory factors, these are more prevalent in tumors arising from certain tissues. With the exception of hematopoietic, liver and kidney tumors, as a median, the mutated factors are less than one fifth of all mutated drivers across all sites analyzed. We also show that mutations in two of these genes, MLL and EP300, correlate with broad expression changes across cancer cell lines, thus presenting at least one mechanism through which these mutations could contribute to tumorigenesis in cells of the corresponding tissues.
Affiliation(s)
- Abel Gonzalez-Perez
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona, Spain
- Alba Jene-Sanz
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona, Spain
- Nuria Lopez-Bigas
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
26
|
Xin YN, Zhao Y, Lin ZH, Jiang X, Xuan SY, Huang J. Molecular dynamics simulation of PNPLA3 I148M polymorphism reveals reduced substrate access to the catalytic cavity. Proteins 2012; 81:406-14. [PMID: 23042597 DOI: 10.1002/prot.24199] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/22/2012] [Revised: 09/20/2012] [Accepted: 09/20/2012] [Indexed: 12/13/2022]
Abstract
A missense mutation I148M in PNPLA3 (patatin-like phospholipase domain-containing 3 protein) is significantly correlated with nonalcoholic fatty liver disease (NAFLD). To glean insights into the mutation's effect on enzymatic activity, we performed molecular dynamics simulation and flexible docking studies. Our data show that the size of the substrate-access entry site is significantly reduced in mutants, which limits the access of palmitic acid to the catalytic dyad. In addition, the binding free energy calculations suggest a low affinity of the substrate for the mutant enzyme. The substrate-bound system simulations reveal that the spatial arrangement of palmitic acid in the wild type is distinct from that in the mutant. The substrate recognition specificity is lost due to the loop where the I148M mutation was located. Our results provide strong evidence for the mechanism by which I148M affects the enzyme activity and suggest that mediating the dynamics may offer a potential avenue for NAFLD.
Affiliation(s)
- Yong-Ning Xin
- College of Medicine and Pharmaceutics, Ocean University of China, Qingdao 266003, Shandong Province, China
|
27
|
Gundem G, Lopez-Bigas N. Sample-level enrichment analysis unravels shared stress phenotypes among multiple cancer types. Genome Med 2012; 4:28. [PMID: 22458606 PMCID: PMC3446278 DOI: 10.1186/gm327] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 06/10/2011] [Revised: 02/24/2012] [Accepted: 03/29/2012] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Adaptation to stress signals in the tumor microenvironment is a crucial step towards a carcinogenic phenotype. The adaptive alterations attained by cells to withstand different types of insults are collectively referred to as the stress phenotypes of cancers. In this manuscript we explore the interrelation of different stress phenotypes in multiple cancer types and ask if these phenotypes could be used to explain prognostic differences among tumor samples. METHODS We propose a new approach based on enrichment analysis at the level of samples (sample-level enrichment analysis - SLEA) in expression profiling datasets. Without using a priori phenotypic information about samples, SLEA calculates an enrichment score per sample per gene set using a z-test. This score is used to determine the relative importance of the corresponding pathway or module in different patient groups. RESULTS Our analysis shows that tumors significantly upregulating genes related to chromosome instability strongly correlate with worse prognosis in breast cancer. Moreover, in multiple tumor types, these tumors upregulate a senescence-bypass transcriptional program and exhibit similar stress phenotypes. CONCLUSIONS Using SLEA we are able to find relationships between stress phenotype pathways across multiple cancer types. Moreover, we show that SLEA enables the identification of gene sets in correlation with clinical characteristics such as survival, as well as the identification of biological pathways/processes that underlie the pathology of different cancer subgroups.
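A per-sample, per-gene-set enrichment score of the kind described in this abstract might be sketched as follows. The abstract states only that a z-test is used, so the exact statistic here is an assumption: the mean expression of the gene set within one sample is compared with the mean and standard deviation of all genes measured in that same sample. The function name `slea_z_scores` and the toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def slea_z_scores(expression, gene_set):
    """Return one z-score per sample for a single gene set.

    expression: dict mapping sample name -> dict of gene -> expression value.
    gene_set:   iterable of gene names (a hypothetical pathway or module).
    """
    gene_set = set(gene_set)
    scores = {}
    for sample, profile in expression.items():
        all_values = list(profile.values())
        members = [value for gene, value in profile.items() if gene in gene_set]
        if not members:
            raise ValueError(f"no gene of the set is measured in {sample}")
        sigma = pstdev(all_values)
        if sigma == 0:
            raise ValueError(f"expression values in {sample} have zero variance")
        # Assumed z-test: gene-set mean against the mean of all genes in the
        # sample, scaled by the standard error for a set of this size.
        standard_error = sigma / sqrt(len(members))
        scores[sample] = (mean(members) - mean(all_values)) / standard_error
    return scores


expression = {
    "tumor_1": {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0},
    "tumor_2": {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0},
}
scores = slea_z_scores(expression, {"c", "d"})
print({sample: round(z, 3) for sample, z in scores.items()})
# prints {'tumor_1': 1.265, 'tumor_2': -1.265}
```

Because each score depends only on the sample's own expression values, no phenotype labels are needed, which matches the "without a priori phenotypic information" property named in the abstract.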
Affiliation(s)
- Gunes Gundem
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona, Spain.
|
28
|
Identifying highly conserved and highly differentiated gene ontology categories in human populations. PLoS One 2011; 6:e27871. [PMID: 22140477 PMCID: PMC3227580 DOI: 10.1371/journal.pone.0027871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2010] [Accepted: 10/27/2011] [Indexed: 11/19/2022] Open
Abstract
Detecting and interpreting certain system-level characteristics associated with human population genetic differences is a challenge for human geneticists. In this study, we conducted a population genetic study using the HapMap genotype data to identify certain special Gene Ontology (GO) categories associated with high/low genetic difference among 11 HapMap populations. Initially, the genetic differences in each gene region among these populations were measured using allele frequency, linkage disequilibrium (LD) pattern, and transferability of tagSNPs. The associations between each GO term and these genetic differences were then identified. The results showed that cellular process, catalytic activity, binding, and some of their sub-terms were associated with high levels of genetic difference, and genes involved in these functional categories displayed, on average, high genetic diversity among different populations. By contrast, multicellular organismal processes, molecular transducer activity, and some of their sub-terms were associated with low levels of genetic difference. In particular, the neurological system process under the multicellular organismal process category had low levels of genetic difference; the neurological function also showed high evolutionary conservation between species in some previous studies. These results may provide a new insight into the understanding of human evolutionary history at the system level.
|
29
|
Perez-Llamas C, Lopez-Bigas N. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One 2011; 6:e19541. [PMID: 21602921 PMCID: PMC3094337 DOI: 10.1371/journal.pone.0019541] [Citation(s) in RCA: 217] [Impact Index Per Article: 16.7] [Received: 12/23/2010] [Accepted: 03/31/2011] [Indexed: 12/30/2022] Open
Abstract
Intuitive visualization of data and results is very important in genomics, especially when many conditions are to be analyzed and compared. Heat-maps have proven very useful for the representation of biological data. Here we present Gitools (http://www.gitools.org), an open-source tool to perform analyses and visualize data and results as interactive heat-maps. Gitools contains data import systems from several sources (i.e. IntOGen, Biomart, KEGG, Gene Ontology), which facilitate the integration of novel data with previous knowledge.
Affiliation(s)
- Christian Perez-Llamas
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, University Pompeu Fabra, Barcelona, Spain
- Nuria Lopez-Bigas
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, University Pompeu Fabra, Barcelona, Spain
|
30
|
Hudson CM, Conant GC. Expression level, cellular compartment and metabolic network position all influence the average selective constraint on mammalian enzymes. BMC Evol Biol 2011; 11:89. [PMID: 21470417 PMCID: PMC3082228 DOI: 10.1186/1471-2148-11-89] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 11/12/2010] [Accepted: 04/06/2011] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND A gene's position in regulatory, protein interaction or metabolic networks can be predictive of the strength of purifying selection acting on it, but these relationships are neither universal nor invariably strong. Following work in bacteria, fungi and invertebrate animals, we explore the relationship between selective constraint and metabolic function in mammals. RESULTS We measure the association between selective constraint, estimated by the ratio of nonsynonymous (Ka) to synonymous (Ks) substitutions, and several, primarily metabolic, measures of gene function. We find significant differences between the selective constraints acting on enzyme-coding genes from different cellular compartments, with the nucleus showing higher constraint than genes from either the cytoplasm or the mitochondria. Among metabolic genes, the centrality of an enzyme in the metabolic network is significantly correlated with Ka/Ks. In contrast to yeasts, gene expression magnitude does not appear to be the primary predictor of selective constraint in these organisms. CONCLUSIONS Our results imply that the relationship between selective constraint and enzyme centrality is complex: the strength of selective constraint acting on mammalian genes is quite variable and does not appear to exclusively follow patterns seen in other organisms.
Affiliation(s)
- Corey M Hudson
- Informatics Institute, University of Missouri, Columbia, MO, USA.
|
31
|
Wang D, Liu F, Wang L, Huang S, Yu J. Nonsynonymous substitution rate (Ka) is a relatively consistent parameter for defining fast-evolving and slow-evolving protein-coding genes. Biol Direct 2011; 6:13. [PMID: 21342519 PMCID: PMC3055854 DOI: 10.1186/1745-6150-6-13] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 09/21/2010] [Accepted: 02/22/2011] [Indexed: 12/30/2022] Open
Abstract
Background Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. We now have a tremendous opportunity to better understand which genes are the most variable or conserved, and what their particular functions and evolutionary dynamics are, through comparative genomics. Results We chose human and eleven other high-coverage mammalian genome data–as well as an avian genome as an outgroup–to analyze orthologous protein-coding genes using nonsynonymous (Ka) and synonymous (Ks) substitution rates. After evaluating eight commonly-used methods of Ka and Ks calculation, we observed that these methods yielded a nearly uniform result when estimating Ka, but not Ks (or Ka/Ks). When sorting genes based on Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes, with respect to species-specificity and lineage-specificity. In particular, we identified two functional classes of genes in the acquired immune system. Fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs (cluster of differentiation, mostly surface proteins), whereas the slow-evolving genes were for function-modulating proteins, such as kinases and adaptor proteins. In addition, among slow-evolving genes that had functions related to the central nervous system, neurodegenerative disease-related pathways were enriched significantly in most mammalian species. We also confirmed that gene expression was negatively correlated with evolution rate, i.e. slow-evolving genes were expressed at higher levels than fast-evolving genes. Our results indicated that the functional specializations of the three major mammalian clades were: sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents. 
Conclusion Our study suggests that Ka calculation, which is less biased compared to Ks and Ka/Ks, can be used as a parameter to sort genes by evolution rate and can also provide a way to categorize common protein functions and define their interaction networks, either pair-wise or in defined lineages or subgroups. Evaluating gene evolution based on Ka and Ks calculations can be done with large datasets, such as mammalian genomes. Reviewers This article has been reviewed by Drs. Anamaria Necsulea (nominated by Nicolas Galtier), Subhajyoti De (nominated by Sarah Teichmann) and Claus O. Wilke.
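Sorting genes by Ka to separate slow-evolving from fast-evolving genes, as this abstract describes, can be sketched in a few lines of Python. The abstract does not state the cutoff used, so the `fraction` parameter (top and bottom share of the ranking), the function name `classify_by_ka`, and the Ka values below are all illustrative assumptions.

```python
def classify_by_ka(ka_by_gene, fraction=0.1):
    """Rank genes by Ka and return (slow_evolving, fast_evolving) gene lists.

    ka_by_gene: dict mapping gene id -> nonsynonymous substitution rate (Ka).
    fraction:   assumed share of the ranking taken from each end (0 < f <= 0.5).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if not ka_by_gene:
        raise ValueError("no genes supplied")
    ranked = sorted(ka_by_gene, key=ka_by_gene.get)  # ascending Ka
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Hypothetical Ka estimates for four genes.
ka = {"g1": 0.01, "g2": 0.50, "g3": 0.20, "g4": 0.05}
slow, fast = classify_by_ka(ka, fraction=0.25)
print(slow, fast)  # prints ['g1'] ['g2']
```

The resulting two lists are the kind of input one would pass to a functional enrichment tool to compare the classes, which is the comparison the abstract reports.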
Affiliation(s)
- Dapeng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, PR China
|
32
|
Waterhouse RM, Zdobnov EM, Kriventseva EV. Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi. Genome Biol Evol 2010; 3:75-86. [PMID: 21148284 PMCID: PMC3030422 DOI: 10.1093/gbe/evq083] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Indexed: 12/24/2022] Open
Abstract
Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multicopy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage, and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features; it also confirms that essential genes are highly retained and conclusively supports the "knockout-rate prediction" of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multicopy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes. The previously underappreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of "single-copy control" versus "multicopy license" may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space.
Affiliation(s)
- Robert M Waterhouse
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.
|
33
|
Prifti E, Zucker JD, Clément K, Henegar C. Interactional and functional centrality in transcriptional co-expression networks. Bioinformatics 2010; 26:3083-9. [PMID: 20959383 DOI: 10.1093/bioinformatics/btq591] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 01/05/2023]
Abstract
MOTIVATION The noisy nature of transcriptomic data hinders the biological relevance of conventional network centrality measures, often used to select gene candidates in co-expression networks. Therefore, new tools and methods are required to improve the prediction of mechanistically important transcriptional targets. RESULTS We propose an original network centrality measure, called annotation transcriptional centrality (ATC) computed by integrating gene expression profiles from microarray experiments with biological knowledge extracted from public genomic databases. ATC computation algorithm delimits representative functional domains in the co-expression network and then relies on this information to find key nodes that modulate propagation of functional influences within the network. We demonstrate ATC ability to predict important genes in several experimental models and provide improved biological relevance over conventional topological network centrality measures. AVAILABILITY ATC computational routine is implemented in a publicly available tool named FunNet (www.funnet.info).
Affiliation(s)
- Edi Prifti
- INSERM, UMR-S 872, Les Cordeliers, Eq. 7 Nutriomique, Paris, France.
|
34
|
Carvalho-Santos Z, Machado P, Branco P, Tavares-Cadete F, Rodrigues-Martins A, Pereira-Leal JB, Bettencourt-Dias M. Stepwise evolution of the centriole-assembly pathway. J Cell Sci 2010; 123:1414-26. [PMID: 20392737 DOI: 10.1242/jcs.064931] [Citation(s) in RCA: 173] [Impact Index Per Article: 12.4] [Indexed: 12/24/2022] Open
Abstract
The centriole and basal body (CBB) structure nucleates cilia and flagella, and is an essential component of the centrosome, underlying eukaryotic microtubule-based motility, cell division and polarity. In recent years, components of the CBB-assembly machinery have been identified, but little is known about their regulation and evolution. Given the diversity of cellular contexts encountered in eukaryotes, but the remarkable conservation of CBB morphology, we asked whether general mechanistic principles could explain CBB assembly. We analysed the distribution of each component of the human CBB-assembly machinery across eukaryotes as a strategy to generate testable hypotheses. We found an evolutionarily cohesive and ancestral module, which we term UNIMOD and is defined by three components (SAS6, SAS4/CPAP and BLD10/CEP135), that correlates with the occurrence of CBBs. Unexpectedly, other players (SAK/PLK4, SPD2/CEP192 and CP110) emerged in a taxon-specific manner. We report that gene duplication plays an important role in the evolution of CBB components and show that, in the case of BLD10/CEP135, this is a source of tissue specificity in CBB and flagella biogenesis. Moreover, we observe extreme protein divergence amongst CBB components and show experimentally that there is loss of cross-species complementation among SAK/PLK4 family members, suggesting species-specific adaptations in CBB assembly. We propose that the UNIMOD theory explains the conservation of CBB architecture and that taxon- and tissue-specific molecular innovations, gained through emergence, duplication and divergence, play important roles in coordinating CBB biogenesis and function in different cellular contexts.
Affiliation(s)
- Zita Carvalho-Santos
- Instituto Gulbenkian de Ciência, Rua da Quinta Grande 6, P-2780-156 Oeiras, Portugal
|
35
|
Essien K, Stoeckert CJ. Conservation and divergence of known apicomplexan transcriptional regulons. BMC Genomics 2010; 11:147. [PMID: 20199665 PMCID: PMC2841118 DOI: 10.1186/1471-2164-11-147] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/25/2009] [Accepted: 03/03/2010] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND The apicomplexans are a diverse phylum of parasites causing an assortment of diseases including malaria in a wide variety of animals and lymphoproliferation in cattle. Little is known about how these varied parasites regulate their transcriptional regulons. Even less is known about how regulon systems, consisting of transcription factors and target genes together with their associated biological process, evolve in these diverse parasites. RESULTS In order to obtain insights into the differences in transcriptional regulation between these parasites we compared the orthology profiles of putative malaria transcription factors across species and examined the enrichment patterns of four binding sites across eleven apicomplexans. About three-fifths of the factors are broadly conserved in several phylogenetic orders of sequenced apicomplexans. This observation suggests the existence of regulons whose regulation is conserved across this ancient phylum. Transcription factors not broadly conserved across the phylum are possibly involved in regulon systems that have diverged between species. Examining binding site enrichment patterns in light of transcription factor conservation patterns suggests a second mode via which regulon systems may diverge - rewiring of existing transcription factors and their associated binding sites in specific ways. Integrating binding sites with transcription factor conservation patterns also facilitated prediction of putative regulators for one of the binding sites. CONCLUSIONS Even though transcription factors are underrepresented in apicomplexans, the distribution of these factors and their associated regulons reflect common and family-specific transcriptional regulatory processes.
Affiliation(s)
- Kobby Essien
- Department of Bioengineering, University of Pennsylvania, 240 Skirkanich Hall, Philadelphia, Pennsylvania 19104, USA
|
36
|
Weirauch MT, Hughes TR. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet 2010; 26:66-74. [PMID: 20083321 DOI: 10.1016/j.tig.2009.12.002] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Received: 11/23/2009] [Revised: 12/09/2009] [Accepted: 12/09/2009] [Indexed: 12/28/2022]
Abstract
Regulatory regions with similar transcriptional output often have little overt sequence similarity, both within and between genomes. Although cis- and trans-regulatory changes can contribute to sequence divergence without dramatically altering gene expression outputs, heterologous DNA often functions similarly in organisms that share little regulatory sequence similarities (e.g. human DNA in fish), indicating that trans-regulatory mechanisms tend to diverge more slowly and can accommodate a variety of cis-regulatory configurations. This capacity to 'tinker' with regulatory DNA probably relates to the complexity, robustness and evolvability of regulatory systems, but cause-and-effect relationships among evolutionary processes and properties of regulatory systems remain a topic of debate. The challenge of understanding the concrete mechanisms underlying cis-regulatory evolution - including the conservation of function without the conservation of sequence - relates to the challenge of understanding the function of regulatory systems in general. Currently, we are largely unable to recognize functionally similar regulatory DNA.
Affiliation(s)
- Matthew T Weirauch
- Banting and Best Department of Medical Research and Donnelly Centre for Cellular and Biomolecular Research, Ontario, Canada
|
37
|
Exploring the Differences in Evolutionary Rates between Monogenic and Polygenic Disease Genes in Human. Mol Biol Evol 2009; 27:934-41. [DOI: 10.1093/molbev/msp297] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/24/2022] Open
|
38
|
Pan D, Zhang L. An atlas of the speed of copy number changes in animal gene families and its implications. PLoS One 2009; 4:e7342. [PMID: 19851465 PMCID: PMC2761543 DOI: 10.1371/journal.pone.0007342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/23/2009] [Accepted: 08/28/2009] [Indexed: 01/23/2023] Open
Abstract
The notion that gene duplications generate new genes and functions is commonly accepted in evolutionary biology. However, this assumption rests more on theoretical speculation than on evidence from genome-wide studies. Here, we generated an atlas of the rate of copy number changes (CNCs) in all the gene families of ten animal genomes. We grouped the gene families with similar CNC dynamics into rate pattern groups (RPGs) and annotated their function using a novel bottom-up approach. By comparing CNC rate patterns, we showed that most of the species-specific CNC rate groups are formed by gene duplication rather than gene loss, and most of the changes in rates of CNCs may be the result of adaptive evolution. We also found that the functions of many RPGs match their biological significance well. Our work confirmed the role of gene duplication in generating novel phenotypes, and the results can serve as a guide for researchers to connect the phenotypic features to certain gene duplications.
Affiliation(s)
- Deng Pan
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Program in Genetics, Bioinformatics, and Computational Biology, Blacksburg, Virginia, United States of America
|
39
|
Peregrín-Alvarez JM, Sanford C, Parkinson J. The conservation and evolutionary modularity of metabolism. Genome Biol 2009; 10:R63. [PMID: 19523219 PMCID: PMC2718497 DOI: 10.1186/gb-2009-10-6-r63] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Received: 02/05/2009] [Revised: 05/27/2009] [Accepted: 06/12/2009] [Indexed: 01/09/2023] Open
Abstract
A novel evolutionary analysis of metabolic networks across 26 taxa reveals a highly-conserved but flexible core of metabolic enzymes. Background Cellular metabolism is a fundamental biological system consisting of myriads of enzymatic reactions that together fulfill the basic requirements of life. The recent availability of vast amounts of sequence data from diverse sets of organisms provides an opportunity to systematically examine metabolism from a comparative perspective. Here we supplement existing genome and protein resources with partial genome datasets derived from 193 eukaryotes to present a comprehensive survey of the conservation of metabolism across 26 taxa representing the three domains of life. Results In general, metabolic enzymes are highly conserved. However, organizing these enzymes within the context of functional pathways revealed a spectrum of conservation from those that are highly conserved (for example, carbohydrate, energy, amino acid and nucleotide metabolism enzymes) to those specific to individual taxa (for example, those involved in glycan metabolism and secondary metabolite pathways). Applying a novel co-conservation analysis, KEGG defined pathways did not generally display evolutionary coherence. Instead, such modularity appears restricted to smaller subsets of enzymes. Expanding analyses to a global metabolic network revealed a highly conserved, but nonetheless flexible, 'core' of enzymes largely involved in multiple reactions across different pathways. Enzymes and pathways associated with the periphery of this network were less well conserved and associated with taxon-specific innovations. Conclusions These findings point to an emerging picture in which a core of enzyme activities involving amino acid, energy, carbohydrate and lipid metabolism have evolved to provide the basic functions required for life. However, the precise complement of enzymes associated within this core for each species is flexible.
Affiliation(s)
- José M Peregrín-Alvarez
  - Program in Molecular Structure and Function, Hospital for Sick Children, College Street, Toronto, ON M5G1L7, Canada.
|
40
|
Tuller T, Kupiec M, Ruppin E. Co-evolutionary networks of genes and cellular processes across fungal species. Genome Biol 2009; 10:R48. [PMID: 19416514 PMCID: PMC2718514 DOI: 10.1186/gb-2009-10-5-r48] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/24/2009] [Revised: 02/24/2009] [Accepted: 05/05/2009] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND The introduction of measures such as evolutionary rate and propensity for gene loss has significantly advanced our knowledge of the evolutionary history and selection forces acting upon individual genes and cellular processes. RESULTS We present two new measures, the 'relative evolutionary rate pattern' (rERP), which records the relative evolutionary rates of conserved genes across the different branches of a species' phylogenetic tree, and the 'copy number pattern' (CNP), which quantifies the rate of gene loss of less conserved genes. Together, these measures yield a high-resolution study of the co-evolution of genes in 9 fungal species, spanning 3,540 sets of orthologs. We find that the evolutionary tempo of conserved genes varies in different evolutionary periods. The co-evolution of genes' Gene Ontology categories exhibits a significant correlation with their functional distance in the Gene Ontology hierarchy, but not with their location on chromosomes, showing that cellular functions are a more important driving force in gene co-evolution than their chromosomal proximity. Two fundamental patterns of co-evolution of conserved genes, cooperative and reciprocal, are identified; only genes co-evolving cooperatively functionally back each other up. The co-evolution of conserved and less conserved genes exhibits both commonalities and differences; DNA metabolism is positively correlated with nuclear traffic, transcription processes and vacuolar biology in both analyses. CONCLUSIONS Overall, this study charts the first global network view of gene co-evolution in fungi. The future application of the approach presented here to other phylogenetic trees holds much promise in characterizing the forces that shape cellular co-evolution.
Affiliation(s)
- Tamir Tuller
  - School of Computer Sciences, Tel Aviv University, Ramat Aviv 69978, Israel
  - Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Ramat Aviv 69978, Israel
  - School of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
- Martin Kupiec
  - Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Ramat Aviv 69978, Israel
- Eytan Ruppin
  - School of Computer Sciences, Tel Aviv University, Ramat Aviv 69978, Israel
  - School of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
|
41
|
Di Pietro C, Ragusa M, Barbagallo D, Duro LR, Guglielmino MR, Majorana A, Angelica R, Scalia M, Statello L, Salito L, Tomasello L, Pernagallo S, Valenti S, D'Agostino V, Triberio P, Tandurella I, Palumbo GA, La Cava P, Cafiso V, Bertuccio T, Santagati M, Li Destri G, Lanzafame S, Di Raimondo F, Stefani S, Mishra B, Purrello M. The apoptotic machinery as a biological complex system: analysis of its omics and evolution, identification of candidate genes for fourteen major types of cancer, and experimental validation in CML and neuroblastoma. BMC Med Genomics 2009; 2:20. [PMID: 19402918 PMCID: PMC2683874 DOI: 10.1186/1755-8794-2-20] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 10/15/2008] [Accepted: 04/30/2009] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Apoptosis is a critical biological phenomenon, executed under the guidance of the Apoptotic Machinery (AM), which allows the physiologic elimination of terminally differentiated, senescent or diseased cells. Because of its relevance to BioMedicine, we have sought to obtain a detailed characterization of AM Omics in Homo sapiens, namely its Genomics and Evolution, Transcriptomics, Proteomics, Interactomics, Oncogenomics, and Pharmacogenomics. METHODS This project exploited the methodology commonly used in Computational Biology (i.e., mining of many omics databases of the web) as well as the High Throughput biomolecular analytical techniques. RESULTS In Homo sapiens AM is comprised of 342 protein-encoding genes (possessing either anti- or pro-apoptotic activity, or a regulatory function) and 110 MIR-encoding genes targeting them: some have a critical role within the system (core AM nodes), others perform tissue-, pathway-, or disease-specific functions (peripheral AM nodes). By overlapping the cancer type-specific AM mutation map in the fourteen most frequent cancers in western societies (breast, colon, kidney, leukaemia, liver, lung, neuroblastoma, ovary, pancreas, prostate, skin, stomach, thyroid, and uterus) to their transcriptome, proteome and interactome in the same tumour type, we have identified the most prominent AM molecular alterations within each class. The comparison of the fourteen mutated AM networks (both protein- and MIR-based) has allowed us to pinpoint the hubs with a general and critical role in tumour development and, conversely, in cell physiology: in particular, we found that some of these had already been used as targets for pharmacological anticancer therapy. For a better understanding of the relationship between AM molecular alterations and pharmacological induction of apoptosis in cancer, we examined the expression of AM genes in K562 and SH-SY5Y after anticancer treatment.
CONCLUSION We believe that our data on the Apoptotic Machinery will lead to the identification of new cancer genes and to the discovery of new biomarkers, which could then be used to profile cancers for diagnostic purposes and to pinpoint new targets for pharmacological therapy. This approach could pave the way for future studies and applications in molecular and clinical Medicine with important perspectives both for Oncology and for Regenerative Medicine.
Affiliation(s)
- Cinzia Di Pietro
  - Dipartimento di Scienze BioMediche, Sezione di Biologia Generale, Biologia Cellulare, Genetica Molecolare G Sichel, Unità di Biologia Genomica e dei Sistemi Complessi, Genetica, Bioinformatica, Università di Catania, 95123 Catania, Italy.
|
42
|
Chan ET, Quon GT, Chua G, Babak T, Trochesset M, Zirngibl RA, Aubin J, Ratcliffe MJH, Wilde A, Brudno M, Morris QD, Hughes TR. Conservation of core gene expression in vertebrate tissues. J Biol 2009; 8:33. [PMID: 19371447 PMCID: PMC2689434 DOI: 10.1186/jbiol130] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.2] [Received: 01/23/2009] [Revised: 03/12/2009] [Accepted: 03/18/2009] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Vertebrates share the same general body plan and organs, possess related sets of genes, and rely on similar physiological mechanisms, yet show great diversity in morphology, habitat and behavior. Alteration of gene regulation is thought to be a major mechanism in phenotypic variation and evolution, but relatively little is known about the broad patterns of conservation in gene expression in non-mammalian vertebrates. RESULTS We measured expression of all known and predicted genes across twenty tissues in chicken, frog and pufferfish. By combining the results with human and mouse data and considering only ten common tissues, we have found evidence of conserved expression for more than a third of unique orthologous genes. We find that, on average, transcription factor gene expression is neither more nor less conserved than that of other genes. Strikingly, conservation of expression correlates poorly with the amount of conserved nonexonic sequence, even using a sequence alignment technique that accounts for non-collinearity in conserved elements. Many genes show conserved human/fish expression despite having almost no nonexonic conserved primary sequence. CONCLUSIONS There are clearly strong evolutionary constraints on tissue-specific gene expression. A major challenge will be to understand the precise mechanisms by which many gene expression patterns remain similar despite extensive cis-regulatory restructuring.
Affiliation(s)
- Esther T Chan
  - Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Gerald T Quon
  - Department of Computer Science, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Gordon Chua
  - Banting and Best Department of Medical Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Current address: Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, Alberta, T2N 1N4 Canada
- Tomas Babak
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Current address: Rosetta Inpharmatics, 401 Terry Avenue North, Seattle, WA 98109, USA
- Miles Trochesset
  - Department of Computer Science, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Banting and Best Department of Medical Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Ralph A Zirngibl
  - Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Jane Aubin
  - Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Michael JH Ratcliffe
  - Department of Immunology and Sunnybrook Research Institute, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Andrew Wilde
  - Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Michael Brudno
  - Department of Computer Science, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Banting and Best Department of Medical Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Quaid D Morris
  - Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Banting and Best Department of Medical Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Banting and Best Department of Medical Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
|
43
|
A census of human transcription factors: function, expression and evolution. Nat Rev Genet 2009; 10:252-63. [PMID: 19274049 DOI: 10.1038/nrg2538] [Citation(s) in RCA: 1095] [Impact Index Per Article: 73.0] [Indexed: 12/18/2022]
Abstract
Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
|
44
|
Perez JC, Shin D, Zwir I, Latifi T, Hadley TJ, Groisman EA. Evolution of a bacterial regulon controlling virulence and Mg(2+) homeostasis. PLoS Genet 2009; 5:e1000428. [PMID: 19300486 PMCID: PMC2650801 DOI: 10.1371/journal.pgen.1000428] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Received: 09/04/2008] [Accepted: 02/17/2009] [Indexed: 12/25/2022] Open
Abstract
Related organisms typically rely on orthologous regulatory proteins to respond to a given signal. However, the extent to which (or even if) the targets of shared regulatory proteins are maintained across species has remained largely unknown. This question is of particular significance in bacteria due to the widespread effects of horizontal gene transfer. Here, we address this question by investigating the regulons controlled by the DNA-binding PhoP protein, which governs virulence and Mg(2+) homeostasis in several bacterial species. We establish that the ancestral PhoP protein directs largely different gene sets in ten analyzed species of the family Enterobacteriaceae, reflecting both regulation of species-specific targets and transcriptional rewiring of shared genes. The two targets directly activated by PhoP in all ten species (the most distant of which diverged >200 million years ago), and coding for the most conserved proteins are the phoPQ operon itself and the lipoprotein-encoding slyB gene, which decreases PhoP protein activity. The Mg(2+)-responsive PhoP protein dictates expression of Mg(2+) transporters and of enzymes that modify Mg(2+)-binding sites in the cell envelope in most analyzed species. In contrast to the core PhoP regulon, which determines the amount of active PhoP and copes with the low Mg(2+) stress, the variable members of the regulon contribute species-specific traits, a property shared with regulons controlled by dissimilar regulatory proteins and responding to different signals.
Affiliation(s)
- J. Christian Perez
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Dongwoo Shin
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Igor Zwir
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Tammy Latifi
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Tricia J. Hadley
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Eduardo A. Groisman
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
  - Howard Hughes Medical Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
45
|
De S, Lopez-Bigas N, Teichmann SA. Patterns of evolutionary constraints on genes in humans. BMC Evol Biol 2008; 8:275. [PMID: 18840274 PMCID: PMC2587479 DOI: 10.1186/1471-2148-8-275] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 05/05/2008] [Accepted: 10/07/2008] [Indexed: 12/27/2022] Open
Abstract
Background Different regions in a genome evolve at different rates depending on structural and functional constraints. Some genomic regions are highly conserved during metazoan evolution, while other regions may evolve rapidly, either in all species or in a lineage-specific manner. A strong or even moderate change in constraints in functional regions, for example in coding regions, can have significant evolutionary consequences. Results Here we discuss a novel framework, 'BaseDiver', to classify groups of genes in humans based on the patterns of evolutionary constraints on polymorphic positions in their coding regions. Comparing the nucleotide-level divergence among mammals with the extent of deviation from the ancestral base in the human lineage, we identify patterns of evolutionary pressure on nonsynonymous base-positions in groups of genes belonging to the same functional category. Focussing on groups of genes in functional categories, we find that transcription factors contain a significant excess of nonsynonymous base-positions that are conserved in other mammals but changed in human, while immunity related genes harbour mutations at base-positions that evolve rapidly in all mammals including humans due to strong preference for advantageous alleles. Genes involved in olfaction also evolve rapidly in all mammals, and in humans this appears to be due to weak negative selection. Conclusion While recent studies have identified genes under positive selection in humans, our approach identifies evolutionary constraints on Gene Ontology groups identifying changes in humans relative to some of the other mammals.
Affiliation(s)
- Subhajyoti De
  - MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK.
|
46
|
Seshasayee ASN, Fraser GM, Babu MM, Luscombe NM. Principles of transcriptional regulation and evolution of the metabolic system in E. coli. Genome Res 2008; 19:79-91. [PMID: 18836036 DOI: 10.1101/gr.079715.108] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
Abstract
Organisms must adapt to make optimal use of the metabolic system in response to environmental changes. In the long term, this involves evolution of the genomic repertoire of enzymes; in the short term, transcriptional control ensures that appropriate enzymes are expressed in response to transitory extracellular conditions. Unicellular organisms are particularly susceptible to environmental changes; however, the genome-scale impact of these modulatory effects has not been explored so far in bacteria. Here, we integrate genome-scale data to investigate the evolutionary trends and transcriptional control of metabolism in Escherichia coli K12. Globally, the regulatory system is organized in a clear hierarchy of general and specific transcription factors (TFs) that control differing ranges of metabolic functions. Further, catabolic, anabolic, and central metabolic pathways are targeted by distinct combinations of these TFs. Locally, enzymes catalyzing sequential reactions in a metabolic pathway are co-regulated by the same TFs. Regulation is more complex at junctions: General TFs control the overall activity of all connecting reactions, whereas specific TFs control individual enzymes. Divergent junctions play a special role in delineating metabolic pathways and decouple the regulation of incoming and outgoing reactions. We find little evidence for differential usage of isozymes, which are generally co-expressed in similar conditions, and thus are likely to reinforce the metabolic system through redundancy. Finally, we show that enzymes controlled by the same TFs have a strong tendency to co-evolve, suggesting a significant constraint to maintain similar regulatory regimes during evolution. Catabolic, anabolic, and central energy pathways evolve differently, emphasizing the role of the environment in shaping the metabolic system. Many of the observations also occur in yeast, and our findings may apply across large evolutionary distances.
Affiliation(s)
- Aswin S N Seshasayee
  - EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom.
|
47
|
Computational analysis of constraints on noncoding regions, coding regions and gene expression in relation to Plasmodium phenotypic diversity. PLoS One 2008; 3:e3122. [PMID: 18769675 PMCID: PMC2518851 DOI: 10.1371/journal.pone.0003122] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/28/2008] [Accepted: 08/02/2008] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Malaria-causing Plasmodium species exhibit marked differences including host choice and preference for invading particular cell types. The genetic bases of phenotypic differences between parasites can be understood, in part, by investigating constraints on gene expression and genic sequences, both coding and regulatory. METHODOLOGY/PRINCIPAL FINDINGS We investigated the evolutionary constraints on sequence and expression of parasitic genes by applying comparative genomics approaches to 6 Plasmodium genomes and 2 genome-wide expression studies. We found that the coding regions of Plasmodium transcription factor and sexual development genes are relatively less constrained, as are those of genes encoding CCCH zinc fingers and invasion proteins, which all play important roles in these parasites. Transcription factors and genes with stage-restricted expression have conserved upstream regions and so do several gene classes critical to the parasite's lifestyle, namely, ion transport, invasion, chromatin assembly and CCCH zinc fingers. Additionally, a cross-species comparison of expression patterns revealed that Plasmodium-specific genes exhibit significant expression divergence. CONCLUSIONS/SIGNIFICANCE Overall, constraints on Plasmodium's protein coding regions confirm observations from other eukaryotes in that transcription factors are under relatively lower constraint. Proteins relevant to the parasite's unique lifestyle also have lower constraint on their coding regions. Greater conservation between Plasmodium species in terms of promoter motifs suggests tight regulatory control of lifestyle genes. However, an interspecies divergence in expression patterns of these genes suggests that either expression is controlled via genomic or epigenomic features not encoded in the proximal promoter sequence, or alternatively, the combinatorial interactions between motifs confer species-specific expression patterns.
|