1
Babin É, Vigneau E, Antignac JP, Le Bizec B, Cano-Sancho G. Opportunities offered by latent-based multiblock strategies to integrate biomarkers of chemical exposure and biomarkers of effect in environmental health studies. Chemosphere 2024; 361:142465. [PMID: 38810805 DOI: 10.1016/j.chemosphere.2024.142465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/07/2024] [Accepted: 05/26/2024] [Indexed: 05/31/2024]
Abstract
Modern environmental epidemiology benefits from a new generation of technologies that enable comprehensive profiling of biomarkers, including environmental chemical exposure and omic datasets. The integration and analysis of large and structured datasets to identify functional associations is constrained by computational challenges that cannot be overcome using conventional regression methods. Some extensions of Partial Least Squares (PLS) regression have been developed to efficiently integrate multiple datasets, including Multiblock PLS (MB-PLS) and Sequential and Orthogonalized PLS; however, these approaches remain seldom applied in environmental epidemiology. To address that research gap, this study aimed to assess and compare the applicability of PLS-based multiblock models in an observational case study, where biomarkers of exposure to environmental chemicals and endogenous biomarkers of effect were simultaneously integrated to highlight biological links related to a health outcome. The methods were compared with and without sparsity, coupling two metrics to support the variable selection: Variable Importance in Projection (VIP) and Selectivity Ratio (SR). The framework was applied to a case-study dataset mimicking the structure of 36 environmental exposure biomarkers (E-block), 61 inflammation biomarkers (M-block), and their relationships with the gestational age at delivery of 161 mother-infant pairs. The results showed an overall consistency in the selected variables across models, although some specific selection patterns were identified. The block-scaled concatenation-based approaches (e.g. MB-PLS) tended to select more variables from the E-block, while these methods were unable to identify certain variables in the M-block. Overall, the number of variables selected using the SR criterion was higher than using the VIP criterion, with lower predictive performances. The multiblock models coupled to VIP appeared to be the methods of choice for identifying relevant variables with similar statistical performances. Overall, the use of multiblock PLS-based methods appears to be a good strategy to efficiently support the variable selection process in modern environmental epidemiology.
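For readers unfamiliar with the VIP criterion named in this abstract, the following is a minimal sketch of how VIP scores are commonly computed for a single-block, single-response PLS model. It is not the authors' multiblock implementation: the function name `pls1_vip`, the NIPALS fitting loop, the choice of two components, and the synthetic data (sized like the 36-variable, 161-sample E-block only for illustration) are all assumptions. The formula used is the usual one, VIP_j = sqrt(p · Σ_a SS_a · w_ja² / Σ_a SS_a), where w_a are unit-norm weight vectors and SS_a is the response variance explained by component a.

```python
import numpy as np


def pls1_vip(X, y, n_components=2):
    """Fit a single-response PLS model (NIPALS) and return VIP scores.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)          # centered predictors, deflated in place
    yr = y - y.mean()                # centered response, deflated in place
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # unit-norm weight vectors
    ss = np.zeros(n_components)                # y-variance explained per component
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                   # scores for component a
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - q * t              # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    # VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a))
    return np.sqrt(n_features * (W ** 2 @ ss) / ss.sum())


# Synthetic illustration only: variable 0 drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(161, 36))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=161)
vip = pls1_vip(X, y, n_components=2)
top_variable = int(np.argmax(vip))
```

Because the weight vectors have unit norm, the squared VIP scores always sum to the number of variables, which is why a threshold of 1 (the average squared importance) is a common, though not universal, selection cutoff.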
2
Eshawu AB, Ghalsasi VV. Metabolomics of natural samples: A tutorial review on the latest technologies. J Sep Sci 2024; 47:e2300588. [PMID: 37942863 DOI: 10.1002/jssc.202300588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 10/29/2023] [Accepted: 11/06/2023] [Indexed: 11/10/2023]
Abstract
Metabolomics is the study of metabolites present in a living system. It is a rapidly growing field aimed at discovering novel compounds, studying biological processes, diagnosing diseases, and ensuring the quality of food products. Recently, the analysis of natural samples has become important to explore novel bioactive compounds and to study how environment and genetics affect living systems. Various metabolomics techniques, databases, and data analysis tools are available for natural sample metabolomics. However, choosing the right method can be a daunting exercise because natural samples are heterogeneous and require untargeted approaches. This tutorial review aims to compile the latest technologies to guide an early-career scientist on natural sample metabolomics. First, different extraction methods and their pros and cons are reviewed. Second, currently available metabolomics databases and data analysis tools are summarized. Next, recent research on metabolomics of milk, honey, and microbial samples is reviewed. Finally, after reviewing the latest trends in technologies, a checklist is presented to guide an early-career researcher on how to design a metabolomics project. In conclusion, this review is a comprehensive resource for a researcher planning to conduct their first metabolomics analysis. It is also useful for experienced researchers to update themselves on the latest trends in metabolomics.
Affiliation(s)
- Ali Baba Eshawu
- School of Biotechnology, Faculty of Applied Sciences and Biotechnology, Shoolini University of Biotechnology and Management Sciences, Solan, India
- Vihang Vivek Ghalsasi
- School of Biotechnology, Faculty of Applied Sciences and Biotechnology, Shoolini University of Biotechnology and Management Sciences, Solan, India
3
Tasci E, Jagasia S, Zhuge Y, Sproull M, Cooley Zgela T, Mackey M, Camphausen K, Krauze AV. RadWise: A Rank-Based Hybrid Feature Weighting and Selection Method for Proteomic Categorization of Chemoirradiation in Patients with Glioblastoma. Cancers (Basel) 2023; 15:2672. [PMID: 37345009 DOI: 10.3390/cancers15102672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 05/03/2023] [Accepted: 05/06/2023] [Indexed: 06/23/2023] Open
Abstract
Glioblastomas (GBM) are rapidly growing, aggressive, nearly uniformly fatal, and the most common primary type of brain cancer. They exhibit significant heterogeneity and resistance to treatment, limiting the ability to analyze dynamic biological behavior that drives response and resistance, which are central to advancing outcomes in glioblastoma. Analysis of the proteome aimed at signal change over time provides a potential opportunity for non-invasive classification and examination of the response to treatment by identifying protein biomarkers associated with interventions. However, data acquired using large proteomic panels must be more intuitively interpretable, requiring computational analysis to identify trends. Machine learning is increasingly employed; however, it requires feature selection, which has a critical effect when applied to large-scale data by reducing the number of parameters, improving generalization, and finding essential predictors. In this study, using 7k proteomic data generated from the analysis of serum obtained from 82 patients with GBM pre- and post-completion of concurrent chemoirradiation (CRT), we aimed to select the most discriminative proteomic features that define the proteomic alteration resulting from CRT. Thus, we present a novel rank-based feature weighting method (RadWise) to identify relevant proteomic parameters using two popular feature selection methods, least absolute shrinkage and selection operator (LASSO) and minimum redundancy maximum relevance (mRMR). The computational results show that the proposed method yields outstanding results with very few selected proteomic features, with higher accuracy than methods that do not employ a feature selection process. While the computational method identified several proteomic signals identical to those from the clinically intuitive (heuristic) approach, several heuristically identified proteomic signals were not selected, and other novel proteomic biomarkers with biological prognostic relevance in GBM, not selected with the heuristic approach, emerged only with the novel method. The computational results show that the proposed method yields promising results, reducing 7k proteomic data to 7 selected proteomic features with a performance value of 93.921%, comparing favorably with techniques that do not employ feature selection.
Affiliation(s)
- Erdal Tasci
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Sarisha Jagasia
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Ying Zhuge
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Mary Sproull
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Theresa Cooley Zgela
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Megan Mackey
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Kevin Camphausen
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
- Andra Valentina Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA
4
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
5
Hiort P, Hugo J, Zeinert J, Müller N, Kashyap S, Rajapakse JC, Azuaje F, Renard BY, Baum K. DrDimont: explainable drug response prediction from differential analysis of multi-omics networks. Bioinformatics 2022; 38:ii113-ii119. [PMID: 36124784 PMCID: PMC9486584 DOI: 10.1093/bioinformatics/btac477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging. Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning. However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem.
RESULTS: We present DrDimont, Drug response prediction from Differential analysis of multi-omics networks. It allows for comparative conclusions between two conditions and translates them into differential drug response predictions. DrDimont focuses on molecular interactions. It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks. DrDimont's predictions are explainable, i.e. molecular differences that are the source of high differential drug scores can be retrieved. We predict differential drug response in breast cancer using transcriptomics, proteomics, phosphosite and metabolomics measurements and contrast estrogen receptor positive and receptor negative patients. DrDimont performs better than drug prediction based on differential protein expression or PageRank when evaluating it on ground truth data from cancer cell lines. We find proteomic and phosphosite layers to carry most information for distinguishing drug response.
AVAILABILITY AND IMPLEMENTATION: DrDimont is available on CRAN: https://cran.r-project.org/package=DrDimont.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pauline Hiort
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Julian Hugo
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Justus Zeinert
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Nataniel Müller
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Spoorthi Kashyap
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Jagath C Rajapakse
- School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
6
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein–protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022] Open
Abstract
Protein–protein interactions (PPIs) play a significant role at the heart of the cellular machinery through the regulation of cellular functions. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions; together, all PPIs form a network. Different biases, such as lack of data, recurrence of information, and false interactions, make the network unstable. Integrated strategies help address these challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. To weight an interaction, it is evaluated with different confidence scores. These scores allow the network to be filtered and thus facilitate its representation, essential steps toward identifying and understanding molecular mechanisms. In this review, we discuss the main computational methods for predicting PPIs, including those that confirm an interaction, as well as the integration of PPIs into a network and the visualization of these complex data.
Affiliation(s)
- Vivian Robin
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Arnaud Droit,
7
Mannens MMAM, Lombardi MP, Alders M, Henneman P, Bliek J. Further Introduction of DNA Methylation (DNAm) Arrays in Regular Diagnostics. Front Genet 2022; 13:831452. [PMID: 35860466 PMCID: PMC9289263 DOI: 10.3389/fgene.2022.831452] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/08/2021] [Accepted: 06/08/2022] [Indexed: 12/01/2022] Open
Abstract
Methylation tests have been used for decades in regular DNA diagnostics, focusing primarily on imprinting disorders or specific loci annotated to specific disease-associated gene promoters. With the introduction of DNA methylation (DNAm) arrays such as the Illumina Infinium HumanMethylation450 Beadchip array or the Illumina Infinium Methylation EPIC Beadchip array (850k), it has become feasible to study the epigenome in a timely and cost-effective way. This has led to new insights regarding the complexity of well-studied imprinting disorders such as Beckwith-Wiedemann syndrome, but it has also led to the introduction of tests such as EpiSign, implemented as a diagnostic test in which a single array experiment can be compared to databases with known episignatures of multiple genetic disorders, especially neurodevelopmental disorders. The successful use of such DNAm tests is rapidly expanding. More and more disorders are found to be associated with discrete episignatures, which enables fast and definite diagnoses, as we have shown. The first examples of environmentally induced clinical disorders characterized by discrete aberrant DNAm are discussed, underlining the broad application of DNAm testing in regular diagnostics. Here we discuss exemplary findings in our laboratory covering this broad range of applications, and we discuss further use of DNAm tests in the near future.