1
Müller-Dott S, Tsirvouli E, Vazquez M, Ramirez Flores R, Badia-i-Mompel P, Fallegger R, Türei D, Lægreid A, Saez-Rodriguez J. Expanding the coverage of regulons from high-confidence prior knowledge for accurate estimation of transcription factor activities. Nucleic Acids Res 2023; 51:10934-10949. [PMID: 37843125 PMCID: PMC10639077 DOI: 10.1093/nar/gkad841] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/31/2023] [Revised: 08/08/2023] [Accepted: 09/22/2023] [Indexed: 10/17/2023]
Abstract
Gene regulation plays a critical role in the cellular processes that underlie human health and disease. The regulatory relationships between transcription factors (TFs), key regulators of gene expression, and their target genes, the so-called TF regulons, can be coupled with computational algorithms to estimate the activity of TFs. However, to interpret these findings accurately, regulons of high reliability and coverage are needed. In this study, we present and evaluate a collection of regulons created using the CollecTRI meta-resource containing signed TF-gene interactions for 1186 TFs. In this context, we introduce a workflow to integrate information from multiple resources and assign the sign of regulation to TF-gene interactions that could be applied to other comprehensive knowledge bases. We find that the signed CollecTRI-derived regulons outperform other public collections of regulatory interactions in accurately inferring changes in TF activities in perturbation experiments. Furthermore, we showcase the value of the regulons by examining TF activity profiles in three different cancer types and exploring TF activities at the level of single cells. Overall, the CollecTRI-derived TF regulons enable the accurate and comprehensive estimation of TF activities and thereby help to interpret transcriptomics data.
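To make the idea of coupling a signed regulon with expression data concrete, the following is a minimal sketch that scores one TF as the sign-weighted mean of its targets' expression changes. It is an illustrative assumption, not the method evaluated in the paper: the function name `weighted_mean_activity`, the gene identifiers, and all numeric values are hypothetical.

```python
from typing import Dict


def weighted_mean_activity(regulon: Dict[str, int],
                           expression: Dict[str, float]) -> float:
    """Score one TF as the sign-weighted mean of its targets' expression changes.

    regulon maps target gene -> sign of regulation (+1 activation, -1 repression).
    expression maps gene -> expression change (for example a log fold change).
    """
    # Keep only the targets that were actually measured.
    shared = [gene for gene in regulon if gene in expression]
    if not shared:
        raise ValueError("no regulon targets found in the expression data")

    # Repressed targets contribute with flipped sign, so a drop in a
    # repressed gene counts as evidence for higher TF activity.
    total = sum(regulon[gene] * expression[gene] for gene in shared)
    norm = sum(abs(regulon[gene]) for gene in shared)
    if norm == 0:
        raise ValueError("all shared targets have zero weight")
    return total / norm


# Hypothetical signed regulon and expression changes (toy values).
toy_regulon = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
toy_expression = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -3.0, "GENE_D": 0.5}

activity = weighted_mean_activity(toy_regulon, toy_expression)
print(activity)
```

In this toy case the two activated targets go up and the repressed target goes down, so all three agree and the score is positive. Without the sign information the repressed target would pull the score toward zero, which is one intuition for why signed interactions matter.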
Affiliation(s)
- Sophia Müller-Dott
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Eirini Tsirvouli
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Ricardo O Ramirez Flores
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Pau Badia-i-Mompel
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Robin Fallegger
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Dénes Türei
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Astrid Lægreid
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
2
Roy S, Kumar R, Mittal V, Gupta D. Classification models for Invasive Ductal Carcinoma Progression, based on gene expression data-trained supervised machine learning. Sci Rep 2020; 10:4113. [PMID: 32139710 PMCID: PMC7057992 DOI: 10.1038/s41598-020-60740-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/08/2019] [Accepted: 02/12/2020] [Indexed: 12/20/2022]
Abstract
Early detection of breast cancer and correct determination of its stage are important for prognosis and for providing appropriate personalized clinical treatment to breast cancer patients. However, despite considerable efforts and progress, there is a need to identify the specific genomic factors responsible for, or accompanying, the progression stages of Invasive Ductal Carcinoma (IDC), which can aid the determination of the correct cancer stage. We have developed two-class machine-learning classification models to differentiate the early and late stages of IDC. The prediction models are trained with RNA-seq gene expression profiles representing different IDC stages of 610 patients, obtained from The Cancer Genome Atlas (TCGA). Different supervised learning algorithms were trained and evaluated, with model learning enriched by different feature selection methods. We also developed a machine-learning classifier trained on the same datasets, with the training sets reduced to data corresponding to IDC driver genes. Based on these two classifiers, we have developed a web server, Duct-BRCA-CSP, to distinguish early from late stages of IDC based on input RNA-seq gene expression profiles. Our analysis also enables deeper insights into the stage-dependent molecular events accompanying IDC progression. The server is publicly available at http://bioinfo.icgeb.res.in/duct-BRCA-CSP.
Affiliation(s)
- Shikha Roy
  - International Centre for Genetic Engineering and Biotechnology, New Delhi, India
- Rakesh Kumar
  - International Centre for Genetic Engineering and Biotechnology, New Delhi, India
- Vaibhav Mittal
  - International Centre for Genetic Engineering and Biotechnology, New Delhi, India
- Dinesh Gupta
  - International Centre for Genetic Engineering and Biotechnology, New Delhi, India
4
Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res 2019; 29:1363-1375. [PMID: 31340985 PMCID: PMC6673718 DOI: 10.1101/gr.240663.118] [Citation(s) in RCA: 411] [Impact Index Per Article: 82.2] [Received: 06/18/2018] [Accepted: 05/28/2019] [Indexed: 12/25/2022]
Abstract
The prediction of transcription factor (TF) activities from the gene expression of their targets (i.e., TF regulon) is becoming a widely used approach to characterize the functional status of transcriptional regulatory circuits. Several strategies and data sets have been proposed to link the target genes likely regulated by a TF, each one providing a different level of evidence. The most established ones are (1) manually curated repositories, (2) interactions derived from ChIP-seq binding data, (3) in silico prediction of TF binding on gene promoters, and (4) reverse-engineered regulons from large gene expression data sets. However, it is not known how these different sources of regulons affect the TF activity estimations and, thereby, downstream analysis and interpretation. Here we compared the accuracy and biases of these strategies to define human TF regulons by means of their ability to predict changes in TF activities in three reference benchmark data sets. We assembled a collection of TF-target interactions for 1541 human TFs and evaluated how different molecular and regulatory properties of the TFs, such as the DNA-binding domain, specificities, or mode of interaction with the chromatin, affect the predictions of TF activity. We assessed their coverage and found little overlap on the regulons derived from each strategy and better performance by literature-curated information followed by ChIP-seq data. We provide an integrated resource of all TF-target interactions derived through these strategies, with confidence scores, as a resource for enhanced prediction of TF activities.
Affiliation(s)
- Luz Garcia-Alonso
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
  - Open Targets, Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
- Christian H Holland
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
- Mahmoud M Ibrahim
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Department of Nephrology, RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
- Denes Turei
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
- Julio Saez-Rodriguez
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
  - Open Targets, Wellcome Genome Campus, CB10 1SD Cambridge, United Kingdom
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), RWTH Aachen University, Faculty of Medicine, 52074 Aachen, Germany
  - Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
5
Dugnani E, Sordi V, Pellegrini S, Chimienti R, Marzinotto I, Pasquale V, Liberati D, Balzano G, Doglioni C, Reni M, Gandolfi A, Falconi M, Lampasona V, Piemonti L. Gene expression analysis of embryonic pancreas development master regulators and terminal cell fate markers in resected pancreatic cancer: A correlation with clinical outcome. Pancreatology 2018; 18:945-953. [PMID: 30293872 DOI: 10.1016/j.pan.2018.09.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/15/2017] [Revised: 09/03/2018] [Accepted: 09/24/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Despite the recent introduction of new drugs and the development of innovative multi-target treatments, the prognosis of pancreatic ductal adenocarcinoma (PDAC) remains very poor. Even when PDAC is resectable, the rate of local or widespread disease recurrence remains particularly high. Currently, reliable prognostic biomarkers of recurrence are lacking. We decided to explore the potential usefulness of pancreatic developmental regulators as biomarkers of PDAC relapse.
METHODS: Using quantitative real-time PCR, we analyzed the mRNA of selected factors involved either in pancreatic organogenesis (ISL1, NEUROD1, NGN3, NKX2.2, NKX6.1, PAX4, PAX6, PDX1 and PTF1α) or associated with terminally committed pancreatic cells (CHGA, CHGB, GAD2, GCG, HNF6α, INS, KRT19, SYP) in 17 PDAC cell lines and in frozen tumor samples from 41 PDAC patients.
RESULTS: High baseline levels of the ISL1, KRT19, PAX6 and PDX1 mRNAs in PDAC cell lines were risk factors for time-dependent xenograft appearance after subcutaneous injection in CD1-Nude mice. Consistently, in human PDAC samples, high levels of KRT19 mRNA were associated with reduced overall survival and earlier recurrence. Higher levels of PDX1 or PAX6 mRNAs were instead associated with a higher frequency of local recurrence.
CONCLUSIONS: Our findings suggest that selected factors associated with pancreas development or its terminal differentiation might be implicated in mechanisms of PDAC progression and/or metastatic spread, and that measuring their mRNA in tumors could potentially be used to improve patient prognostic stratification and prediction of the relapse site.
Affiliation(s)
- Erica Dugnani
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Valeria Sordi
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Silvia Pellegrini
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Raniero Chimienti
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Ilaria Marzinotto
  - Division of Genetics and Cell Biology, Genomic Unit for the Diagnosis of Human Pathologies, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Valentina Pasquale
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Daniela Liberati
  - Division of Genetics and Cell Biology, Genomic Unit for the Diagnosis of Human Pathologies, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Gianpaolo Balzano
  - Pancreatic Surgery Unit, Pancreas Translational & Clinical Research Center, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy
- Claudio Doglioni
  - Department of Pathology, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
- Michele Reni
  - Department of Medical Oncology, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Alessandra Gandolfi
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Massimo Falconi
  - Pancreatic Surgery Unit, Pancreas Translational & Clinical Research Center, IRCCS San Raffaele Scientific Institute, Via Olgettina 60, 20132, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
- Vito Lampasona
  - Division of Genetics and Cell Biology, Genomic Unit for the Diagnosis of Human Pathologies, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Lorenzo Piemonti
  - Diabetes Research Institute, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy