1
Minaeva M, Domingo J, Rentzsch P, Lappalainen T. Specifying cellular context of transcription factor regulons for exploring context-specific gene regulation programs. NAR Genom Bioinform 2025; 7:lqae178. [PMID: 39781510 PMCID: PMC11704787 DOI: 10.1093/nargab/lqae178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Revised: 11/19/2024] [Accepted: 12/20/2024] [Indexed: 01/12/2025]
Abstract
Understanding the role of transcription and transcription factors (TFs) in cellular identity and disease, such as cancer, is essential. However, comprehensive data resources for cell line-specific TF-to-target gene annotations are currently limited. To address this, we employed a straightforward method to define regulons that capture the cell-specific aspects of TF binding and transcript expression levels. By integrating cellular transcriptome and TF binding data, we generated regulons for 40 common cell lines comprising both proximal and distal cell line-specific regulatory events. Through systematic benchmarking involving TF knockout experiments, we demonstrated performance on par with state-of-the-art methods, with our method being easily applicable to other cell types of interest. We present case studies using three cancer single-cell datasets to showcase the utility of these cell-type-specific regulons in exploring transcriptional dysregulation. In summary, this study provides a valuable pipeline and a resource for systematically exploring cell line-specific transcriptional regulations, emphasizing the utility of network analysis in deciphering disease mechanisms.
Affiliation(s)
- Mariia Minaeva
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
- Júlia Domingo
  - New York Genome Center, 101 Avenue of the Americas, New York, NY 10013, USA
- Philipp Rentzsch
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
- Tuuli Lappalainen
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Tomtebodavägen 23A, 17165 Solna, Sweden
  - New York Genome Center, 101 Avenue of the Americas, New York, NY 10013, USA
2
Minaeva M, Domingo J, Rentzsch P, Lappalainen T. Specifying cellular context of transcription factor regulons for exploring context-specific gene regulation programs. bioRxiv 2024:2023.12.31.573765. [PMID: 38260658 PMCID: PMC10802353 DOI: 10.1101/2023.12.31.573765] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/24/2024]
Abstract
Understanding the role of transcription and transcription factors in cellular identity and disease, such as cancer and autoimmunity, is essential. However, comprehensive data resources for cell line-specific transcription factor-to-target gene annotations are currently limited. To address this, we developed a straightforward method to define regulons that capture the cell-specific aspects of TF binding and transcript expression levels. By integrating cellular transcriptome and transcription factor binding data, we generated regulons for four common cell lines comprising both proximal and distal cell line-specific regulatory events. Through systematic benchmarking involving transcription factor knockout experiments, we demonstrated performance on par with state-of-the-art methods, with our method being easily applicable to other cell types of interest. We present case studies using three cancer single-cell datasets to showcase the utility of these cell-type-specific regulons in exploring transcriptional dysregulation. In summary, this study provides a valuable tool and a resource for systematically exploring cell line-specific transcriptional regulations, emphasizing the utility of network analysis in deciphering disease mechanisms.
Affiliation(s)
- Mariia Minaeva
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Solna, 17165, Sweden
- Philipp Rentzsch
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Solna, 17165, Sweden
- Tuuli Lappalainen
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Solna, 17165, Sweden
  - New York Genome Center, New York, NY 10013, USA
3
Miranda-Escalada A, Mehryary F, Luoma J, Estrada-Zavala D, Gasco L, Pyysalo S, Valencia A, Krallinger M. Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical-protein relations. Database (Oxford) 2023; 2023:baad080. [PMID: 38015956 PMCID: PMC10683943 DOI: 10.1093/database/baad080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Revised: 09/22/2023] [Accepted: 10/30/2023] [Indexed: 11/30/2023]
Abstract
It is getting increasingly challenging to efficiently exploit drug-related information described in the growing amount of scientific literature. Indeed, for drug-gene/protein interactions, the challenge is even bigger, considering the scattered information sources and types of interactions. However, their systematic, large-scale exploitation is key for developing tools, impacting knowledge fields as diverse as drug design or metabolic pathway research. Previous efforts in the extraction of drug-gene/protein interactions from the literature did not address these scalability and granularity issues. To tackle them, we have organized the DrugProt track at BioCreative VII. In the context of the track, we have released the DrugProt Gold Standard corpus, a collection of 5000 PubMed abstracts, manually annotated with granular drug-gene/protein interactions. We have proposed a novel large-scale track to evaluate the capacity of natural language processing systems to scale to the range of millions of documents, and generate with their predictions a silver standard knowledge graph of 53 993 602 nodes and 19 367 406 edges. Its use exceeds the shared task and points toward pharmacological and biological applications such as drug discovery or continuous database curation. Finally, we have created a persistent evaluation scenario on CodaLab to continuously evaluate new relation extraction systems that may arise. Thirty teams from four continents, which involved 110 people, sent 107 submission runs for the Main DrugProt track, and nine teams submitted 21 runs for the Large Scale DrugProt track. Most participants implemented deep learning approaches based on pretrained transformer-like language models (LMs) such as BERT or BioBERT, reaching precision and recall values as high as 0.9167 and 0.9542 for some relation types. 
Finally, some initial explorations of the applicability of the knowledge graph have shown its potential to explore the chemical-protein relations described in the literature, or chemical compound-enzyme interactions. Database URL: https://doi.org/10.5281/zenodo.4955410.
Affiliation(s)
- Farrokh Mehryary
  - TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Jouni Luoma
  - TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Luis Gasco
  - Life Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Sampo Pyysalo
  - TurkuNLP Group, Department of Computing, University of Turku, Turku 20014, Finland
- Alfonso Valencia
  - Life Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Martin Krallinger
  - Life Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
4
Müller-Dott S, Tsirvouli E, Vazquez M, Ramirez Flores R, Badia-i-Mompel P, Fallegger R, Türei D, Lægreid A, Saez-Rodriguez J. Expanding the coverage of regulons from high-confidence prior knowledge for accurate estimation of transcription factor activities. Nucleic Acids Res 2023; 51:10934-10949. [PMID: 37843125 PMCID: PMC10639077 DOI: 10.1093/nar/gkad841] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 03/31/2023] [Revised: 08/08/2023] [Accepted: 09/22/2023] [Indexed: 10/17/2023]
Abstract
Gene regulation plays a critical role in the cellular processes that underlie human health and disease. The regulatory relationship between transcription factors (TFs), key regulators of gene expression, and their target genes, the so called TF regulons, can be coupled with computational algorithms to estimate the activity of TFs. However, to interpret these findings accurately, regulons of high reliability and coverage are needed. In this study, we present and evaluate a collection of regulons created using the CollecTRI meta-resource containing signed TF-gene interactions for 1186 TFs. In this context, we introduce a workflow to integrate information from multiple resources and assign the sign of regulation to TF-gene interactions that could be applied to other comprehensive knowledge bases. We find that the signed CollecTRI-derived regulons outperform other public collections of regulatory interactions in accurately inferring changes in TF activities in perturbation experiments. Furthermore, we showcase the value of the regulons by examining TF activity profiles in three different cancer types and exploring TF activities at the level of single-cells. Overall, the CollecTRI-derived TF regulons enable the accurate and comprehensive estimation of TF activities and thereby help to interpret transcriptomics data.
Affiliation(s)
- Sophia Müller-Dott
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Eirini Tsirvouli
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Ricardo O Ramirez Flores
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Pau Badia-i-Mompel
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Robin Fallegger
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Dénes Türei
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Astrid Lægreid
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
5
Lo Surdo P, Iannuccelli M, Contino S, Castagnoli L, Licata L, Cesareni G, Perfetto L. SIGNOR 3.0, the SIGnaling network open resource 3.0: 2022 update. Nucleic Acids Res 2022; 51:D631-D637. [PMID: 36243968 PMCID: PMC9825604 DOI: 10.1093/nar/gkac883] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/30/2022] [Indexed: 01/29/2023]
Abstract
The SIGnaling Network Open Resource (SIGNOR 3.0, https://signor.uniroma2.it) is a public repository that captures causal information and represents it according to an 'activity-flow' model. SIGNOR provides freely-accessible static maps of causal interactions that can be tailored, pruned and refined to build dynamic and predictive models. Each signaling relationship is annotated with an effect (up/down-regulation) and with the mechanism (e.g. binding, phosphorylation, transcriptional activation, etc.) causing the regulation of the target entity. Since its latest release, SIGNOR has undergone a significant upgrade including: (i) a new website that offers an improved user experience and novel advanced search and graph tools; (ii) a significant content growth adding up to a total of approx. 33,000 manually-annotated causal relationships between more than 8900 biological entities; (iii) an increase in the number of manually annotated pathways, currently including pathways deregulated by SARS-CoV-2 infection or involved in neurodevelopment synaptic transmission and metabolism, among others; (iv) additional features such as new model to represent metabolic reactions and a new confidence score assigned to each interaction.
Affiliation(s)
- Marta Iannuccelli
  - Department of Biology, University of Rome ‘Tor Vergata’, Rome 00133, Italy
- Silvia Contino
  - Department of Biology, University of Rome ‘Tor Vergata’, Rome 00133, Italy
- Luisa Castagnoli
  - Department of Biology, University of Rome ‘Tor Vergata’, Rome 00133, Italy
- Livia Perfetto
  - To whom correspondence should be addressed. Tel: +39 0672594315;
6
Liska O, Bohár B, Hidas A, Korcsmáros T, Papp B, Fazekas D, Ari E. TFLink: an integrated gateway to access transcription factor-target gene interactions for multiple species. Database (Oxford) 2022; 2022:baac083. [PMID: 36124642 PMCID: PMC9480832 DOI: 10.1093/database/baac083] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/30/2022] [Revised: 08/06/2022] [Accepted: 09/06/2022] [Indexed: 12/01/2022]
Abstract
Analysis of transcriptional regulatory interactions and their comparisons across multiple species are crucial for progress in various fields in biology, from functional genomics to the evolution of signal transduction pathways. However, despite the rapidly growing body of data on regulatory interactions in several eukaryotes, no databases exist to provide curated high-quality information on transcription factor-target gene interactions for multiple species. Here, we address this gap by introducing the TFLink gateway, which uniquely provides experimentally explored and highly accurate information on transcription factor-target gene interactions (∼12 million), nucleotide sequences and genomic locations of transcription factor binding sites (∼9 million) for human and six model organisms: mouse, rat, zebrafish, fruit fly, worm and yeast by integrating 10 resources. TFLink provides user-friendly access to data on transcription factor-target gene interactions, interactive network visualizations and transcription factor binding sites, with cross-links to several other databases. Besides containing accurate information on transcription factors, with a clear labelling of the type/volume of the experiments (small-scale or high-throughput), the source database and the original publications, TFLink also provides a wealth of standardized regulatory data available for download in multiple formats. The database offers easy access to high-quality data for wet-lab researchers, supplies data for gene set enrichment analyses and facilitates systems biology and comparative gene regulation studies. Database URL https://tflink.net/.
Affiliation(s)
- Orsolya Liska
  - HCEMM-BRC Metabolic Systems Biology Research Group, Temesvári krt. 62, Szeged 6726, Hungary
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network (ELKH), Temesvári krt. 62, Szeged 6726, Hungary
  - Department of Genetics, ELTE Eötvös Loránd University, Pázmány P. stny. 1/C, Budapest 1117, Hungary
  - Doctoral School of Biology, University of Szeged, Közép fasor 52, Szeged 6726, Hungary
- Balázs Bohár
  - Department of Genetics, ELTE Eötvös Loránd University, Pázmány P. stny. 1/C, Budapest 1117, Hungary
  - Earlham Institute, Colney Ln, Norwich NR4 7UZ, UK
- András Hidas
  - Department of Genetics, ELTE Eötvös Loránd University, Pázmány P. stny. 1/C, Budapest 1117, Hungary
  - Institute of Aquatic Ecology, Centre for Ecological Research, Eötvös Loránd Research Network (ELKH), Karolina út 29, Budapest 1113, Hungary
- Tamás Korcsmáros
  - Earlham Institute, Colney Ln, Norwich NR4 7UZ, UK
  - Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK
  - Faculty of Medicine, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Balázs Papp
  - HCEMM-BRC Metabolic Systems Biology Research Group, Temesvári krt. 62, Szeged 6726, Hungary
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network (ELKH), Temesvári krt. 62, Szeged 6726, Hungary
- Dávid Fazekas
  - Department of Genetics, ELTE Eötvös Loránd University, Pázmány P. stny. 1/C, Budapest 1117, Hungary
  - Earlham Institute, Colney Ln, Norwich NR4 7UZ, UK
- Eszter Ari
  - HCEMM-BRC Metabolic Systems Biology Research Group, Temesvári krt. 62, Szeged 6726, Hungary
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Eötvös Loránd Research Network (ELKH), Temesvári krt. 62, Szeged 6726, Hungary
  - Department of Genetics, ELTE Eötvös Loránd University, Pázmány P. stny. 1/C, Budapest 1117, Hungary