1
Kernfeld E, Keener R, Cahan P, Battle A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst 2024; 15:709-724.e13. [PMID: 39173585 DOI: 10.1016/j.cels.2024.07.006] [Received: 05/24/2023] [Revised: 05/31/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring "causal sufficiency," meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
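For readers unfamiliar with knockoffs, the final selection step of a knockoff filter can be sketched in Python. This is a minimal illustration of the generic knockoff+ threshold of Barber and Candès. It assumes the feature statistics `W` have already been computed, one per candidate regulator, with positive values when the real variable outperforms its knockoff copy. Generating model-X knockoffs, computing `W`, and the authors' adjustment for incomplete gold standards are not shown. The function names are hypothetical.

```python
def knockoff_threshold(W, q, offset=1):
    """Smallest t > 0 whose estimated false discovery proportion is <= q.

    offset=1 gives the knockoff+ filter; offset=0 the original knockoff filter.
    Returns infinity when no threshold qualifies (nothing is selected).
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        false_est = offset + sum(1 for w in W if w <= -t)  # knockoffs "winning"
        selected = max(1, sum(1 for w in W if w >= t))     # originals "winning"
        if false_est / selected <= q:
            return t
    return float("inf")


def knockoff_select(W, q, offset=1):
    """Indices of variables whose statistic clears the knockoff threshold."""
    T = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= T]


# Hypothetical statistics for six candidate regulators of one target gene.
W = [5.0, 4.0, 3.0, 2.0, 1.0, -0.5]
print(knockoff_select(W, q=0.2))  # [0, 1, 2, 3, 4]
print(knockoff_select(W, q=0.1))  # []
```

The `+1` in the numerator is what lets knockoff+ control the usual FDR. The original filter (`offset=0`) controls only a modified FDR.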
Affiliation(s)
- Eric Kernfeld
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Rebecca Keener
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Patrick Cahan
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Institute for Cell Engineering, Johns Hopkins Medicine, Baltimore, MD, USA; Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA.
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Genetic Medicine, Johns Hopkins Medicine, Baltimore, MD, USA; Malone Center for Engineering and Healthcare, Johns Hopkins University, Baltimore, MD, USA; Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA.
2
Charon C, Wuillemin PH, Havreng-Théry C, Belmin J. One Month Prediction of Pressure Ulcers in Nursing Home Residents with Bayesian Networks. J Am Med Dir Assoc 2024; 25:104945. [PMID: 38431264 DOI: 10.1016/j.jamda.2024.01.014] [Received: 11/03/2023] [Revised: 01/12/2024] [Accepted: 01/17/2024] [Indexed: 03/05/2024]
Abstract
OBJECTIVES Pressure ulcers (PUs) are a common and avoidable condition among residents of nursing homes, and their consequences are severe. Reliable and simple identification of high-risk residents is a major challenge for prevention. Available tools like the Braden and Norton scales have imperfect predictive performance. The objective is to predict the occurrence of PUs in nursing home residents from electronic health record (EHR) data. DESIGN Longitudinal retrospective nested case-control study. SETTING AND PARTICIPANTS EHR database of French nursing homes from 2013 to 2022. METHODS Residents who suffered from PUs were cases and those who did not were controls. For cases, we analyzed the data available in their EHR 1 month before the occurrence of the first PU. For controls, we used available data 1 month before an index date adjusted for the delays of PU onset. We conducted a Bayesian network (BN) analysis, an explainable machine learning method, using 136 input variables of potential medical interest determined with experts. To validate the model, we used scores, feature selection, and explainability tools such as Shapley values. RESULTS Among 58,368 residents analyzed, 29% suffered from PUs during their stay. The obtained BN model predicts the occurrence of a PU at a 1-month horizon with a sensitivity of 0.94 (±0.01), a precision of 0.32 (±0.01) and an area under the curve of 0.69 (±0.02). It selects 3 variables: length of stay, delay since last hospitalization, and dependence for transfer. This BN model is suitable and is simpler than models provided by other machine learning methods. CONCLUSIONS AND IMPLICATIONS One-month prediction for incident PU is possible in nursing home residents from their EHR data. The study paves the way for the development of a predictive tool fueled by routinely collected data that do not require additional work from health care professionals, thereby opening a new preventive strategy for PUs.
Affiliation(s)
- Clara Charon
- LIP6 (UMR 7606), Sorbonne Université, Paris, France; Teranga Software, Paris, France
- Joël Belmin
- LIMICS (UMR 1142), Sorbonne Université, Paris, France; AP-HP, Hôpital Charles-Foix, Ivry-sur-Seine, France.
3
Ribeiro-Dantas MDC, Li H, Cabeli V, Dupuis L, Simon F, Hettal L, Hamy AS, Isambert H. Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients. iScience 2024; 27:109736. [PMID: 38711452 PMCID: PMC11070693 DOI: 10.1016/j.isci.2024.109736] [Received: 04/25/2023] [Revised: 10/26/2023] [Accepted: 04/10/2024] [Indexed: 05/08/2024]
Abstract
Discovering causal effects is at the core of scientific investigation but remains challenging when only observational data are available. In practice, causal networks are difficult to learn and interpret, and limited to relatively small datasets. We report a more reliable and scalable causal discovery method (iMIIC), based on a general mutual information supremum principle, which greatly improves the precision of inferred causal relations while distinguishing genuine causes from putative and latent causal effects. We showcase iMIIC on synthetic and real-world healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program. More than 90% of predicted causal effects appear correct, while the remaining unexpected direct and indirect causal effects can be interpreted in terms of diagnostic procedures, therapeutic timing, patient preference or socio-economic disparity. iMIIC's unique capabilities open up new avenues to discover reliable and interpretable causal networks across a range of research fields.
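Information-theoretic causal discovery methods such as (i)MIIC build on mutual information and conditional mutual information. As background only, the Python sketch below computes plug-in estimates of I(X;Y) and I(X;Y|Z), in nats, from discrete samples. It then shows how conditioning on an intermediate variable removes an indirect dependence.

This is not the iMIIC supremum principle. It also omits any finite-sample corrections that such methods apply. The function names are illustrative.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    c_xy = Counter(zip(xs, ys))
    c_x, c_y = Counter(xs), Counter(ys)
    # p(x,y) * log[p(x,y) / (p(x) p(y))], written with raw counts
    return sum(c / n * log(c * n / (c_x[x] * c_y[y])) for (x, y), c in c_xy.items())


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from triples of discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz, c_yz, c_z = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    # p(x,y,z) * log[p(z) p(x,y,z) / (p(x,z) p(y,z))], written with raw counts
    return sum(
        c / n * log(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
        for (x, y, z), c in c_xyz.items()
    )


# Toy chain X -> Z -> Y in which each step copies its parent.
xs = [0, 1] * 50
zs = list(xs)
ys = list(zs)
print(round(mutual_information(xs, ys), 4))                  # 0.6931 (= ln 2)
print(round(conditional_mutual_information(xs, ys, zs), 4))  # 0.0
```

X and Y look strongly dependent on their own, but the dependence vanishes once Z is known. This is the kind of indirect link that conditioning is meant to remove.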
Affiliation(s)
- Honghao Li
- CNRS UMR168, Institut Curie, Université PSL, Sorbonne Université, Paris, France
- Vincent Cabeli
- CNRS UMR168, Institut Curie, Université PSL, Sorbonne Université, Paris, France
- Louise Dupuis
- CNRS UMR168, Institut Curie, Université PSL, Sorbonne Université, Paris, France
- Franck Simon
- CNRS UMR168, Institut Curie, Université PSL, Sorbonne Université, Paris, France
- Liza Hettal
- CNRS UMR168, Institut Curie, Université PSL, Sorbonne Université, Paris, France
- Anne-Sophie Hamy
- INSERM U932, Institut Curie, Paris, France
- Department of Medical Oncology, Institut Curie, Saint-Cloud, France
- Department of Surgery, Institut Curie, Université Paris, Paris, France
- Hervé Isambert
- CNRS UMR168, Institut Curie, Université PSL, Sorbonne Université, Paris, France
4
Miladinovic O, Canto PY, Pouget C, Piau O, Radic N, Freschu P, Megherbi A, Brujas Prats C, Jacques S, Hirsinger E, Geeverding A, Dufour S, Petit L, Souyri M, North T, Isambert H, Traver D, Jaffredo T, Charbord P, Durand C. A multistep computational approach reveals a neuro-mesenchymal cell population in the embryonic hematopoietic stem cell niche. Development 2024; 151:dev202614. [PMID: 38451068 PMCID: PMC11057820 DOI: 10.1242/dev.202614] [Received: 12/15/2023] [Accepted: 02/23/2024] [Indexed: 03/08/2024]
Abstract
The first hematopoietic stem and progenitor cells (HSPCs) emerge in the Aorta-Gonad-Mesonephros (AGM) region of the mid-gestation mouse embryo. However, the precise nature of their supportive mesenchymal microenvironment remains largely unexplored. Here, we profiled transcriptomes of laser micro-dissected aortic tissues at three developmental stages and individual AGM cells. Computational analyses allowed the identification of several cell subpopulations within the E11.5 AGM mesenchyme, with the presence of a yet unidentified subpopulation characterized by the dual expression of genes implicated in adhesive or neuronal functions. We confirmed the identity of this cell subset as a neuro-mesenchymal population, through morphological and lineage tracing assays. Loss of function in the zebrafish confirmed that Decorin, a characteristic extracellular matrix component of the neuro-mesenchyme, is essential for HSPC development. We further demonstrated that this cell population is not merely derived from the neural crest, and hence, is a bona fide novel subpopulation of the AGM mesenchyme.
Affiliation(s)
- Olivera Miladinovic
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Pierre-Yves Canto
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Claire Pouget
- Department of Cell and Developmental Biology, University of California San Diego, La Jolla, CA 92093-0380, USA
- Olivier Piau
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Centre de Recherche Saint-Antoine-Team Proliferation and Differentiation of Stem Cells, Institut Universitaire de Cancérologie, Sorbonne Université, Inserm, UMR-S 938, F-75012 Paris, France
- Nevenka Radic
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Priscilla Freschu
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Alexandre Megherbi
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Carla Brujas Prats
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Sebastien Jacques
- Plateforme de génomique, Université de Paris, Institut Cochin, Inserm, CNRS, F-75014 Paris, France
- Estelle Hirsinger
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Audrey Geeverding
- Service de microscopie électronique, Fr3631 Institut de Biologie Paris Seine, Sorbonne Université, CNRS, 7-9 Quai St-Bernard, 75005 Paris, France
- Sylvie Dufour
- Université Paris-Est Créteil, Inserm, IMRB, F94010 Créteil, France
- Laurence Petit
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Michele Souyri
- Université de Paris, Inserm UMR 1131, Institut de Recherche Saint Louis, Hôpital Saint Louis, 1 Avenue Claude Vellefaux, 75010 Paris, France
- Trista North
- Stem Cell Program, Department of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
- Developmental and Regenerative Biology Program, Harvard Medical School, Boston, MA 02115, USA
- Hervé Isambert
- Institut Curie, PSL Research University, CNRS UMR168, Paris, France
- David Traver
- Department of Cell and Developmental Biology, University of California San Diego, La Jolla, CA 92093-0380, USA
- Thierry Jaffredo
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Pierre Charbord
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
- Charles Durand
- Laboratoire de Biologie du Développement/UMR7622, Institut de Biologie Paris Seine, Sorbonne Université, CNRS, Inserm U1156, 9 Quai St-Bernard, 75005 Paris, France
5
Chen X, Ginoux F, Carbo-Tano M, Mora T, Walczak AM, Wyart C. Granger causality analysis for calcium transients in neuronal networks, challenges and improvements. eLife 2023; 12:81279. [PMID: 36749019 PMCID: PMC10017105 DOI: 10.7554/elife.81279] [Received: 06/22/2022] [Accepted: 02/06/2023] [Indexed: 02/08/2023]
Abstract
One challenge in neuroscience is to understand how information flows between neurons in vivo to trigger specific behaviors. Granger causality (GC) has been proposed as a simple and effective measure for identifying dynamical interactions. At single-cell resolution, however, GC analysis is rarely used compared to directionless correlation analysis. Here, we study the applicability of GC analysis for calcium imaging data in diverse contexts. We first show that despite underlying linearity assumptions, GC analysis successfully retrieves non-linear interactions in a synthetic network simulating intracellular calcium fluctuations of spiking neurons. We highlight the potential pitfalls of applying GC analysis to real in vivo calcium signals, and offer solutions regarding the choice of GC analysis parameters. We take advantage of calcium imaging datasets from motoneurons in embryonic zebrafish to show how the improved GC can retrieve true underlying information flow. Applied to the network of brainstem neurons of larval zebrafish, our pipeline reveals strong driver neurons in the locus of the mesencephalic locomotor region (MLR), driving target neurons matching expectations from anatomical and physiological studies. Altogether, this practical toolbox can be applied to in vivo population calcium signals to increase the selectivity of GC to infer flow of information across neurons.
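The linear Granger test that this work starts from can be sketched in Python with NumPy. A signal `x` is said to Granger-cause `y` if adding past values of `x` to an autoregressive model of `y` reduces the residual error by more than chance, as measured by an F statistic.

This is the textbook bivariate version under an assumption of stationary signals. It is not the authors' improved pipeline for calcium transients. The function names and the lag order `p` are illustrative choices.

```python
import numpy as np


def lagged(series, p):
    """Columns series[t-1], ..., series[t-p] for t = p, ..., T-1."""
    T = len(series)
    return np.column_stack([series[p - k : T - k] for k in range(1, p + 1)])


def granger_f(x, y, p=2):
    """F statistic for 'x Granger-causes y' using p lags and OLS fits."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    target = y[p:]
    n = len(target)
    restricted = np.hstack([np.ones((n, 1)), lagged(y, p)])  # intercept + past of y
    full = np.hstack([restricted, lagged(x, p)])             # ... + past of x

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ beta
        return float(residuals @ residuals)

    rss_r, rss_f = rss(restricted), rss(full)
    df_num, df_den = p, n - full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)


# Simulated pair in which x drives y with a one-step delay.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.normal()

f_xy, f_yx = granger_f(x, y), granger_f(y, x)
print(f"x -> y: F = {f_xy:.1f}; y -> x: F = {f_yx:.1f}")
```

In practice the F value is compared with an F(p, n - 2p - 1) distribution. The abstract's point is that calcium-specific choices, such as lag order and preprocessing, strongly affect how trustworthy such comparisons are.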
Affiliation(s)
- Xiaowen Chen
- Laboratoire de physique de l'École normale supérieure, CNRS, PSL University, Paris, France
- Faustine Ginoux
- Spinal Sensory Signaling team, Sorbonne Université, Paris Brain Institute (Institut du Cerveau, ICM), Paris, France
- Martin Carbo-Tano
- Spinal Sensory Signaling team, Sorbonne Université, Paris Brain Institute (Institut du Cerveau, ICM), Paris, France
- Thierry Mora
- Laboratoire de physique de l'École normale supérieure, CNRS, PSL University, Paris, France
- Aleksandra M Walczak
- Laboratoire de physique de l'École normale supérieure, CNRS, PSL University, Paris, France
- Claire Wyart
- Spinal Sensory Signaling team, Sorbonne Université, Paris Brain Institute (Institut du Cerveau, ICM), Paris, France
6
Hérault L, Poplineau M, Remy E, Duprez E. Single Cell Transcriptomics to Understand HSC Heterogeneity and Its Evolution upon Aging. Cells 2022; 11:3125. [PMID: 36231086 PMCID: PMC9563410 DOI: 10.3390/cells11193125] [Received: 08/17/2022] [Revised: 09/15/2022] [Accepted: 09/27/2022] [Indexed: 11/16/2022]
Abstract
Single-cell transcriptomic technologies enable the uncovering and characterization of cellular heterogeneity and pave the way for studies aiming at understanding its origin and consequences. The hematopoietic system is an especially well-suited model system to benefit from this technological advance because it is characterized by different cellular states. Each cellular state, and its interconnections, may be defined by a specific location in the global transcriptional landscape sustained by a complex regulatory network. This transcriptomic signature is not fixed and evolves over time to give rise to less efficient hematopoietic stem cells (HSCs), leading to well-documented hematopoietic aging. Here, we review advances in single-cell transcriptomic approaches for understanding HSC heterogeneity and HSC deregulation upon aging. We also discuss the new bioinformatics tools developed for the analysis of the resulting large and complex datasets. Finally, since hematopoiesis is driven by fine-tuned and complex networks that must be interconnected, we highlight how mathematical modeling helps integrate this multilayered information and predict how HSCs behave during aging.
Affiliation(s)
- Léonard Hérault
- I2M, CNRS, Aix Marseille University, 13009 Marseille, France
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Mathilde Poplineau
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Equipe Labellisée Ligue Nationale Contre le Cancer, 75013 Paris, France
- Elisabeth Remy
- I2M, CNRS, Aix Marseille University, 13009 Marseille, France
- Estelle Duprez
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Equipe Labellisée Ligue Nationale Contre le Cancer, 75013 Paris, France
7
Interactive exploration of a global clinical network from a large breast cancer cohort. NPJ Digit Med 2022; 5:113. [PMID: 35948579 PMCID: PMC9365762 DOI: 10.1038/s41746-022-00647-0] [Received: 02/24/2022] [Accepted: 06/27/2022] [Indexed: 11/08/2022]
Abstract
Despite the unprecedented amount of information now available in medical records, health data remain underexploited due to their heterogeneity and complexity. Simple charts and hypothesis-driven statistics can no longer capture the content of information-rich clinical data. There is, therefore, a clear need for powerful interactive visualization tools enabling medical practitioners to perceive the patterns and insights gained by state-of-the-art machine learning algorithms. Here, we report an interactive graphical interface for use as the front end of a machine learning causal inference server (MIIC), to facilitate the visualization and comprehension by clinicians of relationships between clinically relevant variables. The widespread use of such tools, facilitating the interactive exploration of datasets, is crucial both for data visualization and for the generation of research hypotheses. We demonstrate the utility of the MIIC interactive interface by exploring the clinical network of a large cohort of breast cancer patients treated with neoadjuvant chemotherapy (NAC). This example highlights, in particular, the direct and indirect links between post-NAC clinical responses and patient survival. The MIIC interactive graphical interface has the potential to help clinicians identify actionable nodes and edges in clinical networks, thereby ultimately improving the patient care pathway.
8
Yao J, Zhang Y, Li M, Sun Z, Liu T, Zhao M, Li Z. Single-Cell RNA-Seq Reveals the Promoting Role of Ferroptosis Tendency During Lung Adenocarcinoma EMT Progression. Front Cell Dev Biol 2022; 9:822315. [PMID: 35127731 PMCID: PMC8810644 DOI: 10.3389/fcell.2021.822315] [Received: 11/25/2021] [Accepted: 12/30/2021] [Indexed: 01/31/2023]
Abstract
Epithelial-mesenchymal transition (EMT) and ferroptosis are two important processes in biology. In tumor cells, they are intimately linked. We used single-cell RNA sequencing to investigate the regulatory connection between EMT and ferroptosis tendency in lung adenocarcinoma (LUAD) epithelial cells. We used Seurat to construct the expression matrix from the GEO dataset GSE131907 and to extract epithelial cells. We found a positive correlation between the trends of EMT and ferroptosis tendency. We then used SCENIC to analyze differentially activated transcription factors and constructed a directed molecular regulatory network by causal inference. Some ferroptosis markers (GPX4, SCP2, CAV1) were found to have strong regulatory effects on EMT. Cell communication networks were constructed with iTALK and suggested that Ferro_High_EMT_High cells have a higher expression of SDC1 and SDC4, and activation of LGALS9-HAVCR2 pathways. By deconvolution of bulk sequencing, the results of CIBERSORTx showed that the co-occurrence of ferroptosis tendency and EMT may lead to tumor metastasis and non-response to immunotherapy. Our findings show a strong correlation between ferroptosis tendency and EMT. Ferroptosis may have a promotive effect on EMT. High propensities of ferroptosis and EMT may lead to poor prognosis and non-response to immunotherapy.
Affiliation(s)
- Jiaxi Yao
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Department of Urology, The First Hospital of China Medical University, Shenyang, China
- Yuchong Zhang
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Mengling Li
- Department of Clinical Epidemiology and Center of Evidence-Based Medicine, The First Hospital of China Medical University, Shenyang, China
- Zuyu Sun
- Department of Urology, The First Hospital of China Medical University, Shenyang, China
- Tao Liu
- Department of Urology, The First Hospital of China Medical University, Shenyang, China
- Correspondence: Tao Liu, Mingfang Zhao, Zhi Li
- Mingfang Zhao
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
- Zhi Li
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, China
9
Deutschmann IM, Lima-Mendez G, Krabberød AK, Raes J, Vallina SM, Faust K, Logares R. Disentangling environmental effects in microbial association networks. Microbiome 2021; 9:232. [PMID: 34823593 PMCID: PMC8620190 DOI: 10.1186/s40168-021-01141-7] [Received: 08/11/2020] [Accepted: 07/20/2021] [Indexed: 05/05/2023]
Abstract
BACKGROUND Ecological interactions among microorganisms are fundamental for ecosystem function, yet they are mostly unknown or poorly understood. High-throughput omics can indicate microbial interactions through associations across time and space, which can be represented as association networks. Associations can result either from ecological interactions between microorganisms or from environmental selection, where the association is environmentally driven. Therefore, before downstream analysis and interpretation, we need to distinguish the nature of the association, particularly whether or not it is due to environmental selection. RESULTS We present EnDED (environmentally driven edge detection), an implementation of four approaches as well as their combination to predict which links between microorganisms in an association network are environmentally driven. The four approaches are sign pattern, overlap, interaction information, and data processing inequality. We tested EnDED on networks from simulated data of 50 microorganisms. The networks contained on average 50 nodes and 1087 edges, of which 60 were true interactions and 1026 were false associations (i.e., environmentally driven or due to chance). Applying each method individually, we detected a moderate to high number of environmentally driven edges: 87% with sign pattern and overlap, 67% with interaction information, and 44% with data processing inequality. Combining these methods in an intersection approach resulted in retaining more interactions, both true and false (32% of environmentally driven associations). After validation with the simulated datasets, we applied EnDED to a marine microbial network inferred from 10 years of monthly observations of microbial-plankton abundance.
The intersection combination predicted that 8.3% of the associations were environmentally driven, while individual methods predicted 24.8% (data processing inequality), 25.7% (interaction information), and up to 84.6% (sign pattern as well as overlap). The fraction of environmentally driven edges among negative microbial associations in the real network increased rapidly with the number of environmental factors. CONCLUSIONS To reach accurate hypotheses about ecological interactions, it is important to determine, quantify, and remove environmentally driven associations in marine microbial association networks. For that, EnDED offers up to four individual methods as well as their combination. However, especially for the intersection combination, we suggest using EnDED with other strategies to reduce the number of false associations and consequently the number of potential interaction hypotheses.
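Two of the four EnDED tests reduce to simple rules on a triplet formed by microbes A and B and an environmental factor E. The Python sketch below encodes these two rules as they are commonly described. It assumes association signs are given as +1 or -1 and mutual information values are precomputed.

- The sign-pattern test flags A-B when its sign equals the product of the A-E and B-E signs.
- The data-processing-inequality test flags A-B when it is the weakest link of the triplet.

Significance thresholds, the overlap and interaction-information tests, and EnDED's actual code are not reproduced. The function names are hypothetical.

```python
def sign_pattern_flag(sign_ab, sign_ae, sign_be):
    """True if the A-B sign is explained by both microbes tracking E."""
    return sign_ab == sign_ae * sign_be


def dpi_flag(mi_ab, mi_ae, mi_be):
    """True if A-B has the lowest mutual information in the triplet,
    i.e. it may be an indirect link passing through E."""
    return mi_ab < min(mi_ae, mi_be)


def intersection_flag(*flags):
    """Intersection combination: flag only when every chosen test agrees."""
    return all(flags)


# Hypothetical triplet: A and B both increase with temperature (E).
sp = sign_pattern_flag(+1, +1, +1)    # True
dpi = dpi_flag(0.10, 0.45, 0.38)      # True
print(intersection_flag(sp, dpi))     # True
print(sign_pattern_flag(-1, +1, +1))  # False
```

Requiring every test to agree flags fewer edges. This is consistent with the abstract's lower 8.3% for the intersection combination than for any single method.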
Affiliation(s)
- Ina Maria Deutschmann
- Institute of Marine Sciences, CSIC, Passeig Marítim de la Barceloneta, 37-49, 08003 Barcelona, Spain
- Gipsi Lima-Mendez
- Research Unit in Biology of Microorganisms (URBM), University of Namur, 61 Rue de Bruxelles, 5000 Namur, Belgium
- Anders K. Krabberød
- Department of Biosciences/Section for Genetics and Evolutionary Biology (EVOGENE), University of Oslo, p.b. 1066 Blindern, N-0316 Oslo, Norway
- Jeroen Raes
- VIB Center for Microbiology, Herestraat 49-1028, 3000 Leuven, Belgium
- KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Molecular Bacteriology, Herestraat 49, 3000 Leuven, Belgium
- Sergio M. Vallina
- Spanish Institute of Oceanography (IEO - CSIC), Ave Principe de Asturias 70 Bis, 33212 Gijon, Spain
- Karoline Faust
- KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Molecular Bacteriology, Herestraat 49, 3000 Leuven, Belgium
- Ramiro Logares
- Institute of Marine Sciences, CSIC, Passeig Marítim de la Barceloneta, 37-49, 08003 Barcelona, Spain
10
Kang Y, Thieffry D, Cantini L. Evaluating the Reproducibility of Single-Cell Gene Regulatory Network Inference Algorithms. Front Genet 2021; 12:617282. [PMID: 33828580 PMCID: PMC8019823 DOI: 10.3389/fgene.2021.617282] [Received: 10/14/2020] [Accepted: 02/24/2021] [Indexed: 12/13/2022]
Abstract
Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq data (scRNA-seq), numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. When applied to real data, these benchmarks consider only a small set of genes and only compare the inferred networks with an imposed ground truth. Here, we benchmark six single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. When networks with up to 100,000 links are considered, GENIE3 is the most reproducible algorithm and, together with GRNBoost2, shows higher intersection with ground-truth biological interactions. These results are independent of the single-cell sequencing platform, the cell type annotation system, and the number of cells constituting the dataset. Finally, GRNBoost2 and CLR show more reproducible performance when a more stringent threshold is applied to the networks (1,000–100 links). To ensure the reproducibility and ease extensions of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
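One simple way to score this kind of reproducibility is to keep the top-k edges of each inferred network and measure their overlap. The Python sketch below uses the Jaccard index for that purpose. The choice of metric, the edge-score dictionaries and the function names are assumptions for illustration, and may differ from what scNET computes.

```python
def top_k_edges(scores, k):
    """Set of the k highest-scoring (regulator, target) edges."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {edge for edge, _ in ranked[:k]}


def jaccard_reproducibility(scores_a, scores_b, k):
    """Jaccard index between the top-k edge sets of two inferred networks."""
    a, b = top_k_edges(scores_a, k), top_k_edges(scores_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical edge importances inferred from two independent datasets.
net_1 = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.8, ("TF2", "G2"): 0.1}
net_2 = {("TF1", "G1"): 0.7, ("TF2", "G2"): 0.6, ("TF1", "G2"): 0.05}
print(jaccard_reproducibility(net_1, net_2, k=2))  # 1 shared edge out of 3 -> 1/3
```

Sweeping `k` from large to small mirrors the abstract's comparison between networks of up to 100,000 links and more stringent cut-offs of 1,000 to 100 links.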
Affiliation(s)
- Yoonjee Kang
- Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
- Denis Thieffry
- Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
- Laura Cantini
- Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
11
Béal J, Pantolini L, Noël V, Barillot E, Calzone L. Personalized logical models to investigate cancer response to BRAF treatments in melanomas and colorectal cancers. PLoS Comput Biol 2021; 17:e1007900. [PMID: 33507915 PMCID: PMC7872233 DOI: 10.1371/journal.pcbi.1007900] [Received: 04/16/2020] [Revised: 02/09/2021] [Accepted: 12/21/2020] [Indexed: 11/19/2022]
Abstract
The study of response to cancer treatments has benefited greatly from the contribution of different omics data, but their interpretation is sometimes difficult. Some mathematical models based on prior biological knowledge of signaling pathways facilitate this interpretation but often require fitting of their parameters using perturbation data. We propose a more qualitative mechanistic approach, based on a logical formalism and solely on the mapping and interpretation of omics data, that is able to recover differences in sensitivity to gene inhibition without model training. This approach is showcased by the study of BRAF inhibition in patients with melanomas and colorectal cancers who experience significant differences in sensitivity despite similar omics profiles. We first gather information from the literature and build a logical model summarizing the regulatory network of the mitogen-activated protein kinase (MAPK) pathway surrounding BRAF, with factors involved in the BRAF inhibition resistance mechanisms. The relevance of this model is verified by automatically assessing that it qualitatively reproduces response or resistance behaviors identified in the literature. Data from over 100 melanoma and colorectal cancer cell lines are then used to validate the model's ability to explain differences in sensitivity. This generic model is transformed into personalized cell line-specific logical models by integrating the omics information of the cell lines as constraints of the model. The use of mutations alone allows personalized models to correlate significantly with experimental sensitivities to BRAF inhibition, both from drug and CRISPR targeting, and even better with the joint use of mutations and RNA, supporting multi-omics mechanistic models. A comparison of these untrained models with learning approaches highlights similarities in interpretation and complementarity depending on the size of the datasets.
This parsimonious pipeline, which can easily be extended to other biological questions, makes it possible to explore the mechanistic causes of the response to treatment, on an individualized basis.
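To make the idea of "personalizing" a logical model concrete, here is a minimal sketch, assuming a deterministic synchronous Boolean update. It is not the published model or pipeline: the node names, the rules, and the CRAF bypass route are hypothetical toy choices. Personalization is represented by clamping nodes (a BRAF mutation clamps BRAF ON; a BRAF inhibitor clamps it OFF), and the model output is read at the fixed point.

```python
from typing import Callable, Dict, Mapping

Rule = Callable[[Mapping[str, bool]], bool]

# Hypothetical toy rules loosely shaped like a MAPK cascade; NOT the published model.
RULES: Dict[str, Rule] = {
    "EGFR": lambda s: s["EGFR"],  # input node: keeps its initial value
    "RAS": lambda s: s["EGFR"],
    "BRAF": lambda s: s["RAS"],
    "CRAF": lambda s: s["RAS"],   # hypothetical bypass route around BRAF
    "MEK": lambda s: s["BRAF"] or s["CRAF"],
    "ERK": lambda s: s["MEK"],
    "Proliferation": lambda s: s["ERK"],
}


def simulate(initial: Mapping[str, bool], clamped: Mapping[str, bool],
             max_steps: int = 50) -> Dict[str, bool]:
    """Apply all rules synchronously until a fixed point is reached.

    Clamped nodes (e.g. an activating mutation or a drug inhibition) ignore
    their logical rule and keep the imposed value at every step.
    """
    state = {**initial, **clamped}
    for _ in range(max_steps):
        nxt = {node: rule(state) for node, rule in RULES.items()}
        nxt.update(clamped)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached; the dynamics may cycle")


def proliferates(egfr_active: bool, braf_inhibited: bool) -> bool:
    """Toy 'personalized' model of a hypothetical BRAF-mutant cell line."""
    initial = {node: False for node in RULES}
    initial["EGFR"] = egfr_active
    # Mutation data clamp BRAF ON; the inhibitor overrides it to OFF.
    clamped = {"BRAF": not braf_inhibited}
    return simulate(initial, clamped)["Proliferation"]


for egfr in (False, True):
    for inhibited in (False, True):
        print(f"EGFR={egfr}, BRAF inhibited={inhibited}: "
              f"proliferation={proliferates(egfr, inhibited)}")
```

In this toy network, BRAF inhibition blocks proliferation only when EGFR is inactive, because the hypothetical CRAF route otherwise restores MEK signalling; a richer model would encode such resistance factors from the literature rather than from this illustration.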
Affiliation(s)
- Jonas Béal
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Lorenzo Pantolini
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Vincent Noël
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Emmanuel Barillot
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Laurence Calzone
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
12
Desterke C, Petit L, Sella N, Chevallier N, Cabeli V, Coquelin L, Durand C, Oostendorp RAJ, Isambert H, Jaffredo T, Charbord P. Inferring Gene Networks in Bone Marrow Hematopoietic Stem Cell-Supporting Stromal Niche Populations. iScience 2020; 23:101222. [PMID: 32535025 PMCID: PMC7300160 DOI: 10.1016/j.isci.2020.101222] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/25/2019] [Revised: 03/19/2020] [Accepted: 05/27/2020] [Indexed: 02/07/2023]
Abstract
The cardinal property of bone marrow (BM) stromal cells is their capacity to contribute to hematopoietic stem cell (HSC) niches by providing mediators that assist HSC functions. In this study, we first contrasted transcriptomes of stromal cells at different developmental stages and then included a large number of HSC-supportive and non-supportive samples. Applying a combination of algorithms, including one that identifies reliable paths and potential causative relationships in complex systems, revealed gene networks characteristic of the BM stromal HSC-supportive capacity and of defined niche populations of perivascular cells, osteoblasts, and mesenchymal stromal cells. Including single-cell transcriptomes made it possible to establish, for the perivascular cell subset, a partially oriented graph of direct gene-to-gene interactions. As proof of concept, we showed that R-spondin-2, expressed by the perivascular subset, synergized with Kit ligand to amplify hematopoietic precursors ex vivo. By identifying classifiers and hubs, this study constitutes a resource for unraveling candidate BM stromal mediators.
Highlights
- A correlation network with predictor genes for the BM HSPC-supportive stromal niche
- An information theoretic network for the supportive perivascular stromal niche
- Wnt facilitator Rspo2 together with SCF to amplify ex vivo hematopoietic precursors
- Resource combining bioinformatics algorithms to search for novel stromal mediators
Affiliation(s)
- Laurence Petit
- Sorbonne Université, UPMC Université Paris 06, IBPS, CNRS UMR7622, Inserm U 1156, Laboratoire de Biologie du Développement; Paris 75005, France
- Nadir Sella
- Institut Curie, PSL Research University, CNRS UMR168, Paris, France
- Nathalie Chevallier
- IMRB U955-E10, INSERM, Unité d'Ingenierie et de Thérapie Cellulaire- EFS, Université Paris-EST, Créteil, France
- Vincent Cabeli
- Institut Curie, PSL Research University, CNRS UMR168, Paris, France
- Laura Coquelin
- IMRB U955-E10, INSERM, Unité d'Ingenierie et de Thérapie Cellulaire- EFS, Université Paris-EST, Créteil, France
- Charles Durand
- Sorbonne Université, UPMC Université Paris 06, IBPS, CNRS UMR7622, Inserm U 1156, Laboratoire de Biologie du Développement; Paris 75005, France
- Robert A J Oostendorp
- Clinic and Polyclinic for Internal Medicine III, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Hervé Isambert
- Institut Curie, PSL Research University, CNRS UMR168, Paris, France
- Thierry Jaffredo
- Sorbonne Université, UPMC Université Paris 06, IBPS, CNRS UMR7622, Inserm U 1156, Laboratoire de Biologie du Développement; Paris 75005, France
- Pierre Charbord
- Sorbonne Université, UPMC Université Paris 06, IBPS, CNRS UMR7622, Inserm U 1156, Laboratoire de Biologie du Développement; Paris 75005, France
13
Singh PP, Isambert H. OHNOLOGS v2: a comprehensive resource for the genes retained from whole genome duplication in vertebrates. Nucleic Acids Res 2020; 48:D724-D730. [PMID: 31612943 PMCID: PMC7145513 DOI: 10.1093/nar/gkz909] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 07/30/2019] [Revised: 09/20/2019] [Accepted: 10/10/2019] [Indexed: 12/12/2022]
Abstract
All vertebrates, including humans, have evolved from an ancestor that underwent two rounds of whole genome duplication (2R-WGD). In addition, teleost fish underwent a third round of genome duplication (3R-WGD). The genes retained from these genome duplications, so-called ohnologs, have been instrumental in the evolution of vertebrate complexity, development and susceptibility to genetic diseases. However, the identification of vertebrate ohnologs has been challenging, due to lineage-specific genome rearrangements since 2R- and 3R-WGD. We previously identified vertebrate ohnologs using a novel synteny comparison across multiple genomes. Here, we refine and apply this approach to 27 vertebrate genomes to identify ohnologs from both 2R- and 3R-WGD, while taking into account the phylogenetically biased sampling of available species. We assemble vertebrate ohnolog pairs and families in an expanded OHNOLOGS v2 database. We find that teleost fish have retained more 2R-WGD ohnologs than mammals and sauropsids, and that these 2R-ohnologs have retained significantly more ohnologs from the subsequent 3R-WGD than genes without 2R-ohnologs. Interestingly, species with fewer extant genes, such as sauropsids, have retained similar or higher proportions of ohnologs. OHNOLOGS v2 should allow deeper evolutionary genomic analysis of the impact of WGD on vertebrates and can be freely accessed at http://ohnologs.curie.fr.
Affiliation(s)
- Param Priya Singh
- Institut Curie, Research Center, CNRS UMR168, PSL Research University, 26 rue d'Ulm, 75005, Paris, France
- Hervé Isambert
- Institut Curie, Research Center, CNRS UMR168, PSL Research University, 26 rue d'Ulm, 75005, Paris, France
14
Cabeli V, Verny L, Sella N, Uguzzoni G, Verny M, Isambert H. Learning clinical networks from medical records based on information estimates in mixed-type data. PLoS Comput Biol 2020; 16:e1007866. [PMID: 32421707 PMCID: PMC7259796 DOI: 10.1371/journal.pcbi.1007866] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/09/2019] [Revised: 05/29/2020] [Accepted: 04/10/2020] [Indexed: 12/13/2022]
Abstract
Precise diagnosis of complex diseases requires integrating a large amount of information from heterogeneous clinical and biomedical data, whose direct and indirect interdependences are notoriously difficult to assess. To this end, we propose an efficient computational approach to simultaneously compute and assess the significance of multivariate information between any combination of mixed-type (continuous/categorical) variables. The method is then used to uncover direct, indirect and possibly causal relationships between mixed-type data from medical records, by extending a recent machine learning method to reconstruct graphical models beyond simple categorical datasets. The method is shown to outperform existing tools on benchmark mixed-type datasets, before being applied to analyze the medical records of elderly patients with cognitive disorders from La Pitié-Salpêtrière Hospital, Paris. The resulting clinical network visually captures the global interdependences in these medical records and some facets of clinical diagnosis practice, without any specific hypothesis or prior knowledge of clinically relevant information. In particular, it provides some physiological insights linking the consequences of cerebrovascular accidents to the atrophy of important brain structures associated with cognitive impairment.
We developed a machine learning approach to analyze medical records and help clinicians visualize the direct and indirect interrelations between clinical examinations and the variety of syndromes implicated in complex diseases. The reconstruction of such clinical networks is illustrated on the spectrum of cognitive disorders originating from neurodegenerative, cerebrovascular or psychiatric dementias. This global network analysis is also shown to uncover novel direct associations and possible cause-effect relationships between clinically relevant information, such as medical examinations, diagnoses, treatments and personal data from patients’ medical records.
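As a rough illustration of what "assessing the significance of information between mixed-type variables" involves, the sketch below estimates mutual information between a continuous and a categorical variable and attaches a permutation p-value. It is an assumption-laden stand-in, not the authors' method: equal-frequency binning with a fixed number of bins (`n_bins`) replaces their optimized discretization, and the data are synthetic.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def quantile_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-frequency discretization: rank the values, then cut the ranks into n_bins groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats from two paired discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mixed_mi_test(continuous: Sequence[float], categorical: Sequence[Hashable],
                  n_bins: int = 4, n_perm: int = 200,
                  seed: int = 0) -> Tuple[float, float]:
    """MI between a continuous and a categorical variable, plus a permutation p-value."""
    xb = quantile_bins(continuous, n_bins)
    observed = mutual_information(xb, categorical)
    rng = random.Random(seed)
    shuffled = list(categorical)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any dependence while keeping both marginals
        if mutual_information(xb, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic, made-up data: a continuous score and two categorical variables.
rng = random.Random(1)
score = [rng.gauss(0, 1) for _ in range(300)]
diagnosis = ["A" if s + rng.gauss(0, 0.5) > 0 else "B" for s in score]
unrelated = [rng.choice("AB") for _ in range(300)]

mi_dep, p_dep = mixed_mi_test(score, diagnosis)
mi_ind, p_ind = mixed_mi_test(score, unrelated)
print(f"dependent:   MI={mi_dep:.3f} nats, p={p_dep:.3f}")
print(f"independent: MI={mi_ind:.3f} nats, p={p_ind:.3f}")
```

With a fixed bin count, the estimate depends on `n_bins`; choosing the discretization adaptively, as the abstract's approach implies, is what this simplification leaves out.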
Affiliation(s)
- Vincent Cabeli
- Institut Curie, PSL Research University, CNRS, UMR168, 26 rue d’Ulm, 75005 Paris, France
- Sorbonne Université, 4, place Jussieu, 75005 Paris, France
- Louis Verny
- Institut Curie, PSL Research University, CNRS, UMR168, 26 rue d’Ulm, 75005 Paris, France
- Sorbonne Université, 4, place Jussieu, 75005 Paris, France
- Nadir Sella
- Institut Curie, PSL Research University, CNRS, UMR168, 26 rue d’Ulm, 75005 Paris, France
- Sorbonne Université, 4, place Jussieu, 75005 Paris, France
- LIMICS, UMRS 1142, 15 rue de l’école de médecine, 75006 Paris, France
- Guido Uguzzoni
- Institut Curie, PSL Research University, CNRS, UMR168, 26 rue d’Ulm, 75005 Paris, France
- Sorbonne Université, 4, place Jussieu, 75005 Paris, France
- Marc Verny
- Sorbonne Université, 4, place Jussieu, 75005 Paris, France
- Hôpital La Pitié-Salpêtrière, 47-83 boulevard de l’Hôpital, 75013 Paris, France
- Hervé Isambert
- Institut Curie, PSL Research University, CNRS, UMR168, 26 rue d’Ulm, 75005 Paris, France
- Sorbonne Université, 4, place Jussieu, 75005 Paris, France
15
Saurty-Seerunghen MS, Bellenger L, El-Habr EA, Delaunay V, Garnier D, Chneiweiss H, Antoniewski C, Morvan-Dubois G, Junier MP. Capture at the single cell level of metabolic modules distinguishing aggressive and indolent glioblastoma cells. Acta Neuropathol Commun 2019; 7:155. [PMID: 31619292 PMCID: PMC6796454 DOI: 10.1186/s40478-019-0819-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/06/2019] [Accepted: 09/29/2019] [Indexed: 02/01/2023]
Abstract
The ability of glioblastoma cells to adapt their functioning to changes in the microenvironment is a source of the extensive intra-tumor heterogeneity characteristic of this devastating malignant brain tumor. A systemic view of the metabolic pathways underlying glioblastoma cell functioning states is lacking. We analyzed public single-cell RNA-sequencing data from glioblastoma surgical resections, which offer the closest available view of tumor cell heterogeneity as encountered at the time of patients’ diagnosis. Unsupervised analyses revealed that information dispersed throughout the cell transcript repertoires encoded the identity of each tumor and masked information related to cell functioning states. Data reduction based on an experimentally defined signature of transcription factors overcame this hurdle. It allowed cells to be grouped according to their tumorigenic potential, regardless of their tumor of origin. The relevance of the approach was validated using independent datasets of glioblastoma cell and tissue transcriptomes, patient-derived cell lines and orthotopic xenografts. Cells with high tumorigenic potential were characterized by overexpression of genes coding for amino acid and lipid metabolism enzymes involved in anti-oxidative, energetic and cell membrane processes. Modeling of their expression network highlighted the very long chain polyunsaturated fatty acid synthesis pathway at the core of the network. Expression of its most downstream enzymatic component, ELOVL2, was associated with worse patient survival and was required for cell tumorigenic properties in vivo. Our results demonstrate the power of signature-driven analyses of single-cell transcriptomes to obtain an integrated view of metabolic pathways at play within the heterogeneous cell landscape of patient tumors.
16
Executable pathway analysis using ensemble discrete-state modeling for large-scale data. PLoS Comput Biol 2019; 15:e1007317. [PMID: 31479446 PMCID: PMC6743792 DOI: 10.1371/journal.pcbi.1007317] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/12/2019] [Revised: 09/13/2019] [Accepted: 08/01/2019] [Indexed: 12/15/2022]
Abstract
Pathway analysis is widely used to gain mechanistic insights from high-throughput omics data. However, most existing methods do not consider signal integration represented by pathway topology, resulting in enrichment of convergent pathways when downstream genes are modulated. Incorporating signal flow and integration into pathway analysis could rank pathways based on modulation of key regulatory genes. For large-scale data, this can be facilitated by discrete-state network modeling, which is simple to parameterize. Here, we model cellular heterogeneity using discrete-state dynamics and measure pathway activities in cross-sectional data. We introduce a new algorithm, Boolean Omics Network Invariant-Time Analysis (BONITA), for signal propagation, signal integration, and pathway analysis. Our signal propagation approach models heterogeneity in transcriptomic data as arising from intercellular heterogeneity rather than intracellular stochasticity, and propagates binary signals repeatedly across networks. Logic rules defining signal integration are inferred by a genetic algorithm and refined by local search. The rules determine the impact of each node in a pathway, which is used to score the probability of the pathway's modulation by chance. We have comprehensively tested BONITA for application to transcriptomic data from translational studies. Comparison with state-of-the-art pathway analysis methods shows that BONITA has higher sensitivity at lower levels of source node modulation and similar sensitivity at higher levels. Application of BONITA pathway analysis to previously validated RNA-sequencing studies identifies additional relevant pathways in in vitro human cell line experiments and in vivo infant studies. Additionally, BONITA successfully detected modulation of disease-specific pathways when comparing relevant RNA-sequencing data with healthy controls.
Most interestingly, the two nodes with the highest impact scores identified by BONITA included known drug targets. Thus, compared with existing methods, BONITA is a powerful approach to prioritize not only pathways but also the specific mechanistic roles of genes. BONITA is available at: https://github.com/thakar-lab/BONITA.
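The notion of propagating binary signals over a population of heterogeneous cells, and scoring how much each node matters, can be sketched as follows. This is a hypothetical simplification and not BONITA's implementation: the three-rule pathway is invented, each normalized expression level is read as the probability that a gene is ON in a sampled cell, rule inference by genetic algorithm and local search is omitted, and the "impact" here (total change in the other nodes' mean activity under a knockout) is an illustrative stand-in for BONITA's own score.

```python
import random
from typing import Callable, Dict, Mapping, Optional

Rule = Callable[[Mapping[str, bool]], bool]

# Hypothetical pathway: A activates B; D requires both B and C (AND gate).
RULES: Dict[str, Rule] = {
    "B": lambda s: s["A"],
    "D": lambda s: s["B"] and s["C"],
}


def propagate(cell: Dict[str, bool], rules: Mapping[str, Rule],
              clamp: Mapping[str, bool], steps: int = 5) -> Dict[str, bool]:
    """Synchronous Boolean updates; genes without a rule keep their sampled value."""
    state = {**cell, **clamp}
    for _ in range(steps):
        updated = {gene: rule(state) for gene, rule in rules.items()}
        state = {**state, **updated, **clamp}
    return state


def mean_activity(expression: Mapping[str, float], rules: Mapping[str, Rule],
                  clamp: Optional[Mapping[str, bool]] = None,
                  n_cells: int = 2000, seed: int = 0) -> Dict[str, float]:
    """Read each expression level in [0, 1] as P(gene is ON) in one cell,
    sample many binary cells, propagate each, and average the final states."""
    rng = random.Random(seed)
    totals = dict.fromkeys(expression, 0)
    for _ in range(n_cells):
        cell = {gene: rng.random() < p for gene, p in expression.items()}
        final = propagate(cell, rules, clamp or {})
        for gene in totals:
            totals[gene] += final[gene]
    return {gene: t / n_cells for gene, t in totals.items()}


def node_impacts(expression: Mapping[str, float],
                 rules: Mapping[str, Rule]) -> Dict[str, float]:
    """Impact of a node = total change in the other nodes' mean activity when it is knocked out."""
    baseline = mean_activity(expression, rules)
    impacts = {}
    for node in expression:
        knocked = mean_activity(expression, rules, clamp={node: False})
        impacts[node] = sum(abs(baseline[g] - knocked[g])
                            for g in expression if g != node)
    return impacts


# Made-up normalized expression levels, for illustration only.
expression = {"A": 0.9, "B": 0.1, "C": 0.8, "D": 0.2}
for node, impact in sorted(node_impacts(expression, RULES).items(),
                           key=lambda kv: -kv[1]):
    print(f"{node}: impact {impact:.2f}")
```

Because baseline and knockout runs reuse the same seed, they see identical sampled cells, so differences come only from the knockout; a terminal node such as D therefore scores exactly zero in this sketch.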