1
Méndez-Cruz CF, Blanchet A, Godínez A, Arroyo-Fernández I, Gama-Castro S, Martínez-Luna SB, González-Colín C, Collado-Vides J. Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties. Database (Oxford) 2020; 2020:6029376. [PMID: 33306798 PMCID: PMC7731926 DOI: 10.1093/database/baaa109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2020] [Revised: 11/18/2020] [Accepted: 11/26/2020] [Indexed: 11/21/2022]
Abstract
Transcription factors (TFs) play a central role in bacterial transcriptional regulation, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort due to the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we present an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties, based on an automatic text summarization strategy. We automatically recovered a median of 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these Salmonella Typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility of assisting manual curation, both to expand manual summaries with automatically extracted knowledge and to create new summaries for bacteria for which such curation efforts do not exist. Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available on GitHub (https://github.com/laigen-unam/tf-properties-summarizer.git).
Affiliation(s)
- Carlos-Francisco Méndez-Cruz
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Antonio Blanchet
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Alan Godínez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Ignacio Arroyo-Fernández
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico; División de Posgrado, Universidad Tecnológica de la Mixteca, Carretera a Acatlima Km. 2.5, Huajuapan de León, 69000, Oaxaca, Mexico
- Socorro Gama-Castro
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Sara Berenice Martínez-Luna
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Cristian González-Colín
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico
- Julio Collado-Vides
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico; Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Room 403, Boston, 02215 MA, USA
2
Martínez-Pacheco M, Tenorio M, Almonte L, Fajardo V, Godínez A, Fernández D, Cornejo-Páramo P, Díaz-Barba K, Halbert J, Liechti A, Székely T, Urrutia AO, Cortez D. Expression Evolution of Ancestral XY Gametologs across All Major Groups of Placental Mammals. Genome Biol Evol 2020; 12:2015-2028. [PMID: 32790864 PMCID: PMC7674692 DOI: 10.1093/gbe/evaa173] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 08/10/2020] [Indexed: 12/14/2022]
Abstract
Placental mammals present 180 million-year-old Y chromosomes that have retained a handful of dosage-sensitive genes. However, the expression evolution of Y-linked genes across placental groups has remained largely unexplored. Here, we expanded the number of Y gametolog sequences by analyzing ten additional species from previously unexplored groups. We detected seven remarkably conserved genes across 25 placental species with known Y repertoires. We then used RNA-seq data from 17 placental mammals to unveil the expression evolution of XY gametologs. We found that Y gametologs followed, on average, a 3-fold expression loss and that X gametologs also experienced some expression reduction, particularly in primates. Y gametologs gained testis specificity through an accelerated expression decay in somatic tissues. Moreover, despite the substantial expression decay of Y genes, the combined expression of XY gametologs in males is higher than that of both X gametologs in females. Finally, our work describes several features of the Y chromosome in the last common mammalian ancestor.
Affiliation(s)
- Laura Almonte
  - Center for Genome Sciences, UNAM, Cuernavaca, Mexico
- Alan Godínez
  - Center for Genome Sciences, UNAM, Cuernavaca, Mexico
- Jean Halbert
  - Center for Integrative Genomics, University of Lausanne, Switzerland
- Angelica Liechti
  - Center for Integrative Genomics, University of Lausanne, Switzerland
- Tamas Székely
  - Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- Araxi O Urrutia
  - Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom; Ecology Institute, UNAM, Mexico
- Diego Cortez
  - Center for Genome Sciences, UNAM, Cuernavaca, Mexico