1
Chilimoniuk J, Erol A, Rödiger S, Burdukiewicz M. Challenges and opportunities in processing NanoString nCounter data. Comput Struct Biotechnol J 2024; 23:1951-1958. [PMID: 38736697 PMCID: PMC11087919 DOI: 10.1016/j.csbj.2024.04.061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/14/2024] Open
Abstract
NanoString nCounter is a medium-throughput technology used in mRNA and miRNA differential expression studies. It offers several advantages, including the absence of an amplification step and the ability to analyze low-grade samples. Despite its considerable strengths, the popularity of the nCounter platform in experimental research stabilized in 2022 and 2023, and this trend may continue in the upcoming years. Such stagnation could be attributed to the absence of a standardized analytical pipeline or of guidance on optimal processing methods for nCounter data analysis. To standardize the description of the nCounter data analysis workflow, we divided it into five distinct steps: data pre-processing, quality control, background correction, normalization, and differential expression analysis. Next, we evaluated eleven R packages dedicated to nCounter data processing to identify functionalities belonging to these steps and to comment on their applications in studies of mRNA and miRNA samples.
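Two of the five steps named in this abstract, background correction and normalization, can be illustrated with a minimal Python sketch. It rests on assumptions not taken from the paper: the function names are hypothetical, the background threshold is the mean plus two standard deviations of the negative-control probes, and normalization scales each sample by the geometric mean of its reference probes. These are common conventions and are not claimed to be what any of the eleven evaluated R packages implement.

```python
import numpy as np

def background_correct(counts, negative_counts):
    """Subtract a per-sample background estimate and floor the result at 1.

    Both arrays are (probes x samples). The mean + 2 SD threshold is an
    assumed convention, not a rule stated in the abstract.
    """
    threshold = negative_counts.mean(axis=0) + 2 * negative_counts.std(axis=0, ddof=1)
    return np.maximum(counts - threshold, 1.0)

def scale_by_reference(counts, reference_counts):
    """Scale each sample so that the geometric mean of its reference probes
    equals the average geometric mean across samples."""
    geo_mean = np.exp(np.log(reference_counts).mean(axis=0))
    factors = geo_mean.mean() / geo_mean
    return counts * factors, factors

# Toy data: two probes (rows) measured in two samples (columns).
counts = np.array([[100.0, 200.0], [50.0, 100.0]])
negatives = np.array([[2.0, 4.0], [4.0, 8.0]])
references = np.array([[10.0, 20.0], [40.0, 80.0]])

# Each step is shown on its own; a real pipeline would chain them.
corrected = background_correct(counts, negatives)
normalized, factors = scale_by_reference(counts, references)
```

In the toy data the second sample has twice the counts of the first for every probe, so the scaling factors bring both samples onto the same level.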
Affiliation(s)
- Anna Erol
- Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Stefan Rödiger
- Institute of Biotechnology, Faculty Environment and Natural Sciences, Brandenburg University of Technology Cottbus - Senftenberg, Senftenberg, Germany
- Michał Burdukiewicz
- Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Institute of Biotechnology and Biomedicine, Autonomous University of Barcelona, Barcelona, Spain
2
Niu Z, Kozminsky M, Day KC, Broses LJ, Henderson ML, Patsalis C, Tagett R, Qin Z, Blumberg S, Reichert ZR, Merajver SD, Udager AM, Palmbos PL, Nagrath S, Day ML. Characterization of circulating tumor cells in patients with metastatic bladder cancer utilizing functionalized microfluidics. Neoplasia 2024; 57:101036. [PMID: 39173508 PMCID: PMC11387905 DOI: 10.1016/j.neo.2024.101036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 07/19/2024] [Accepted: 07/28/2024] [Indexed: 08/24/2024]
Abstract
Assessing the molecular profiles of bladder cancer (BC) from patients with locally advanced or metastatic disease provides valuable insights, such as identification of invasive markers, to guide personalized treatment. Currently, most molecular profiling of BC is based on highly invasive biopsy or transurethral tumor resection. Liquid biopsy takes advantage of less-invasive procedures to longitudinally profile disease. Circulating tumor cells (CTCs) isolated from blood are one of the key analytes of liquid biopsy. In this study, we developed a protein and mRNA co-analysis workflow for BC CTCs utilizing the graphene oxide (GO) microfluidic chip. The GO chip was conjugated with antibodies against both EpCAM and EGFR to isolate CTCs from 1 mL of blood drawn from BC patients. Following CTC capture, protein and mRNA were analyzed using immunofluorescent staining and ion-torrent-based whole transcriptome sequencing, respectively. Elevated CTC counts were significantly associated with patient disease status at the time of blood draw. We found a count greater than 2.5 CTCs per mL was associated with shorter overall survival. The invasive markers EGFR, HER2, CD31, and ADAM15 were detected in CTC subpopulations. Whole transcriptome sequencing showed distinct RNA expression profiles from patients with or without tumor burden at the time of blood draw. In patients with advanced metastatic disease, we found significant upregulation of metastasis-related and chemotherapy-resistant genes. This methodology demonstrates the capability of GO chip-based assays to identify tumor-related RNA signatures, highlighting the prognostic potential of CTCs in metastatic BC patients.
Affiliation(s)
- Zeqi Niu
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109, USA; Biointerface Institute, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Molly Kozminsky
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109, USA; Biointerface Institute, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Kathleen C Day
- Department of Urology, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Luke J Broses
- Department of Urology, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Marian L Henderson
- Department of Internal Medicine, Hematology Oncology Division, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Christopher Patsalis
- Department of Internal Medicine, Hematology Oncology Division, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Rebecca Tagett
- Bioinformatics Core, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Zhaoping Qin
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Sarah Blumberg
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Zachery R Reichert
- Department of Internal Medicine, Hematology Oncology Division, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Sofia D Merajver
- Department of Internal Medicine, Hematology Oncology Division, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Aaron M Udager
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Phillip L Palmbos
- Department of Internal Medicine, Hematology Oncology Division, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Sunitha Nagrath
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109, USA; Biointerface Institute, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
- Mark L Day
- Department of Urology, University of Michigan, Ann Arbor, MI 48109, USA; Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI 48109, USA
3
Yu Y, Mai Y, Zheng Y, Shi L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol 2024; 25:254. [PMID: 39363244 PMCID: PMC11447944 DOI: 10.1186/s13059-024-03401-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/23/2024] [Indexed: 10/05/2024] Open
Abstract
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Cancer Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes (Shanghai), Shanghai, China
4
Eissa T, Huber M, Obermayer-Pietsch B, Linkohr B, Peters A, Fleischmann F, Žigman M. CODI: Enhancing machine learning-based molecular profiling through contextual out-of-distribution integration. PNAS Nexus 2024; 3:pgae449. [PMID: 39440022 PMCID: PMC11495219 DOI: 10.1093/pnasnexus/pgae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 09/07/2024] [Indexed: 10/25/2024]
Abstract
Molecular analytics increasingly utilize machine learning (ML) for predictive modeling based on data acquired through molecular profiling technologies. However, developing robust models that accurately capture physiological phenotypes is challenged by the dynamics inherent to biological systems, variability stemming from analytical procedures, and the resource-intensive nature of obtaining sufficiently representative datasets. Here, we propose and evaluate a new method: Contextual Out-of-Distribution Integration (CODI). Based on experimental observations, CODI generates synthetic data that integrate unrepresented sources of variation encountered in real-world applications into a given molecular fingerprint dataset. By augmenting a dataset with out-of-distribution variance, CODI enables an ML model to better generalize to samples beyond the seed training data, reducing the need for extensive experimental data collection. Using three independent longitudinal clinical studies and a case-control study, we demonstrate CODI's application to several classification tasks involving vibrational spectroscopy of human blood. We showcase our approach's ability to enable personalized fingerprinting for multiyear longitudinal molecular monitoring and enhance the robustness of trained ML models for improved disease detection. Our comparative analyses reveal that incorporating CODI into the classification workflow consistently leads to increased robustness against data variability and improved predictive accuracy.
Affiliation(s)
- Tarek Eissa
- Chair of Experimental Physics - Laser Physics, Ludwig-Maximilians-Universität München, Bavaria 85748, Germany
- Laboratory for Attosecond Physics, Max Planck Institute of Quantum Optics, Bavaria 85748, Germany
- School of Computation, Information and Technology, Technical University of Munich, Bavaria 85748, Germany
- Marinus Huber
- Chair of Experimental Physics - Laser Physics, Ludwig-Maximilians-Universität München, Bavaria 85748, Germany
- Laboratory for Attosecond Physics, Max Planck Institute of Quantum Optics, Bavaria 85748, Germany
- Barbara Obermayer-Pietsch
- Department of Internal Medicine, Division of Endocrinology and Diabetology, Medical University, Styria 8010, Austria
- Birgit Linkohr
- Institute of Epidemiology, Helmholtz Zentrum München, Bavaria 85764, Germany
- Annette Peters
- Institute of Epidemiology, Helmholtz Zentrum München, Bavaria 85764, Germany
- Chair of Epidemiology, Institute for Medical Information Processing, Biometry and Epidemiology, Medical Faculty, Ludwig-Maximilians-Universität München, Bavaria 81377, Germany
- Frank Fleischmann
- Chair of Experimental Physics - Laser Physics, Ludwig-Maximilians-Universität München, Bavaria 85748, Germany
- Laboratory for Attosecond Physics, Max Planck Institute of Quantum Optics, Bavaria 85748, Germany
- Mihaela Žigman
- Chair of Experimental Physics - Laser Physics, Ludwig-Maximilians-Universität München, Bavaria 85748, Germany
- Laboratory for Attosecond Physics, Max Planck Institute of Quantum Optics, Bavaria 85748, Germany
5
Hastings J, Lee D, O’Connell MJ. Batch-effect correction in single-cell RNA sequencing data using JIVE. Bioinformatics Advances 2024; 4:vbae134. [PMID: 39387061 PMCID: PMC11461915 DOI: 10.1093/bioadv/vbae134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 07/17/2024] [Accepted: 09/11/2024] [Indexed: 10/12/2024]
Abstract
Motivation: In single-cell RNA sequencing analysis, addressing batch effects (technical artifacts stemming from factors such as varying sequencing technologies, equipment, and capture times) is crucial. These factors can cause unwanted variation and obscure the underlying biological signal of interest. The joint and individual variation explained (JIVE) method can be used to extract shared biological patterns from multi-source sequencing data while adjusting for individual non-biological variations (i.e. batch effects). However, its current implementation was originally designed for bulk sequencing data, making it computationally infeasible for large-scale single-cell sequencing datasets. Results: In this study, we enhance JIVE for large-scale single-cell data by boosting its computational efficiency. Additionally, we introduce a novel application of JIVE for batch-effect correction on multiple single-cell sequencing datasets. Our enhanced method aims to decompose single-cell sequencing datasets into a joint structure, which captures the true biological variability, and individual structures, which capture technical variability within each batch. This joint structure is then suitable for use in downstream analyses. We benchmarked the results against four popular tools developed for this purpose: Seurat v5, Harmony, LIGER, and ComBat-seq. JIVE performed best in terms of preserving cell-type effects and in scenarios in which the batch sizes are balanced. Availability and implementation: The JIVE implementation used for this analysis can be found at https://github.com/oconnell-statistics-lab/scJIVE.
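The decomposition into a joint structure and per-batch individual structures described in this abstract can be illustrated with a simplified, single-pass Python sketch. Several things here are assumptions rather than details from the paper: the function names are hypothetical, the data blocks are taken to share their columns, the ranks are supplied by the caller, and centering and scaling are omitted. The published JIVE procedure alternates between the two estimates until convergence, and this sketch is not the scJIVE implementation.

```python
import numpy as np

def truncated_svd(matrix, rank):
    """Return the leading `rank` singular triplets of `matrix`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

def jive_one_pass(blocks, joint_rank, individual_ranks):
    """Split each block into joint and individual low-rank parts.

    blocks: list of (rows_k x n) arrays that share their n columns.
    One pass only (assumption); full JIVE iterates these two steps.
    """
    stacked = np.vstack(blocks)
    u, s, vt = truncated_svd(stacked, joint_rank)
    joint_stacked = (u * s) @ vt
    # Projector onto the orthogonal complement of the joint row space.
    complement = np.eye(stacked.shape[1]) - vt.T @ vt
    joint, individual = [], []
    start = 0
    for block, rank in zip(blocks, individual_ranks):
        stop = start + block.shape[0]
        joint_k = joint_stacked[start:stop]
        # Individual structure is estimated from what the joint part leaves
        # behind, restricted to directions orthogonal to the joint structure.
        residual = (block - joint_k) @ complement
        ui, si, vti = truncated_svd(residual, rank)
        joint.append(joint_k)
        individual.append((ui * si) @ vti)
        start = stop
    return joint, individual

# Toy example with two random blocks sharing 20 columns.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(5, 20)), rng.normal(size=(6, 20))]
joint, individual = jive_one_pass(blocks, joint_rank=2, individual_ranks=[1, 1])
```

In a batch-correction setting the joint parts would be the quantity carried into downstream analysis, while the individual parts would be treated as batch-specific variation.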
Affiliation(s)
- Joseph Hastings
- Department of Statistics, Miami University, Oxford, OH 45056, United States
- Donghyung Lee
- Department of Statistics, Miami University, Oxford, OH 45056, United States
6
Grassi M, Tarantino B. SEMbap: Bow-free covariance search and data de-correlation. PLoS Comput Biol 2024; 20:e1012448. [PMID: 39259748 PMCID: PMC11419354 DOI: 10.1371/journal.pcbi.1012448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 09/23/2024] [Accepted: 08/31/2024] [Indexed: 09/13/2024] Open
Abstract
Large-scale studies of gene expression are commonly influenced by biological and technical sources of expression variation, including batch effects, sample characteristics, and environmental impacts. Learning the causal relationships between observable variables may be challenging in the presence of unobserved confounders, and many high-dimensional regression techniques may perform worse under such confounding. Controlling for unobserved confounding variables is therefore essential, and many deconfounding methods have been suggested for application in a variety of situations. The main contribution of this article is the development of a two-stage deconfounding procedure based on Bow-free Acyclic Paths (BAP) search, developed within the framework of Structural Equation Models (SEM) and called SEMbap(). In the first stage, an exhaustive search of missing edges with significant covariance is performed via Shipley d-separation tests; then, in the second stage, a Constrained Gaussian Graphical Model (CGGM) is fitted or a low-dimensional representation of the bow-free edge structure is obtained via Graph Laplacian Principal Component Analysis (gLPCA). We compare four popular deconfounding methods to the BAP search approach with applications on simulated and observed expression data. In the former, different structures of the hidden covariance matrix have been replicated. Compared to existing methods, the BAP search algorithm is able to correctly identify hidden confounding whilst controlling the false positive rate and achieving good fitting and perturbation metrics.
Affiliation(s)
- Mario Grassi
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Barbara Tarantino
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
7
Jiang Y, Rex DA, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Hegeman AD, Mayta M, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics Using Mass Spectrometry. ACS Measurement Science Au 2024; 4:338-417. [PMID: 39193565 PMCID: PMC11348894 DOI: 10.1021/acsmeasuresciau.3c00068] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/15/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] [Indexed: 08/29/2024]
Abstract
Proteomics is the large-scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods. We cover topics from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this Review will serve as a handbook for researchers who are new to the field of bottom-up proteomics.
Affiliation(s)
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
- Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland
- Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
- Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, NIST, Charleston, South Carolina 29412, United States
- Germán L. Rosano
- Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, 2000 Argentina
- Norbert Volkmar
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
- Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Trenton M. Peters-Clarke
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, 94158, United States
- Susan B. Egbert
- Department of Chemistry, University of Manitoba, Winnipeg, Manitoba, R3T 2N2 Canada
- Simion Kreimer
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Emma H. Doud
- Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, 46202-3082, United States
- Oliver M. Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster 3rd Milestone Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Adrian D. Hegeman
- Departments of Horticultural Science and Plant and Microbial Biology, University of Minnesota, Twin Cities, Minnesota 55108, United States
- Martín L. Mayta
- School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martin 3103, Argentina
- Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
- Anna G. Duboff
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- Nicholas M. Riley
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
8
Johnson OD, Paul S, Gutierrez JA, Russell WK, Ward MC. DNA damage-associated protein co-expression network in cardiomyocytes informs on tolerance to genetic variation and disease. bioRxiv 2024:2024.08.14.607863. [PMID: 39185220 PMCID: PMC11343126 DOI: 10.1101/2024.08.14.607863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Cardiovascular disease (CVD) is associated with both genetic variants and environmental factors. One unifying consequence of the molecular risk factors in CVD is DNA damage, which must be repaired by DNA damage response proteins. However, the impact of DNA damage on global cardiomyocyte protein abundance and its relationship to CVD risk remain unclear. We therefore treated induced pluripotent stem cell-derived cardiomyocytes with the DNA-damaging agent Doxorubicin (DOX) and a vehicle control, and identified 4,178 proteins that contribute to a network comprising 12 co-expressed modules and 403 hub proteins with high intramodular connectivity. Five modules correlate with DOX and represent distinct biological processes including RNA processing, chromatin regulation, and metabolism. DOX-correlated hub proteins are depleted for proteins that vary in expression across individuals due to genetic variation but are enriched for proteins encoded by loss-of-function intolerant genes. While proteins associated with genetic risk for CVD, such as arrhythmia, are enriched in specific DOX-correlated modules, DOX-correlated hub proteins are not enriched for known CVD risk proteins. Instead, they are enriched among proteins that physically interact with CVD risk proteins. Our data demonstrate that DNA damage in cardiomyocytes induces diverse effects on biological processes through protein co-expression modules that are relevant for CVD, and that the level of protein connectivity in DNA damage-associated modules influences the tolerance to genetic variation.
Affiliation(s)
- Omar D. Johnson
- Biochemistry, Cellular and Molecular Biology Graduate Program, University of Texas Medical Branch, Galveston, Texas, USA
- MD-PhD Combined Degree Program, University of Texas Medical Branch, Galveston, Texas, USA
- Sayan Paul
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, USA
- Jose A. Gutierrez
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, USA
- William K. Russell
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, USA
- Michelle C. Ward
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, USA
9
Montoto-Louzao J, Gómez-Carballa A, Bello X, Pardo-Seco J, Camino-Mera A, Viz-Lasheras S, Martín MJ, Martinón-Torres F, Salas A. GUANIN: an all-in-one GUi-driven analyzer for NanoString interactive normalization. Bioinformatics 2024; 40:btae462. [PMID: 39051707 PMCID: PMC11333565 DOI: 10.1093/bioinformatics/btae462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/20/2024] [Accepted: 07/24/2024] [Indexed: 07/27/2024] Open
Abstract
Summary: Most tools for normalizing NanoString gene expression data, apart from the default NanoString nCounter software, are R packages that focus on technical normalization and lack configurable parameters. However, content normalization is the most sensitive, experiment-specific, and relevant step to preprocess NanoString data. Currently this step requires the use of multiple tools and a deep understanding of data management by the researcher. We present GUANIN, a comprehensive normalization tool that integrates both new and well-established methods, offering a wide variety of options to introduce, filter, choose, and evaluate reference genes for content normalization. GUANIN allows the introduction of genes from an endogenous subset as reference genes, addressing housekeeping-related selection problems. It performs a specific and straightforward normalization approach for each experiment, using a wide variety of parameters with suggested default values. GUANIN provides a large number of informative output files that enable the iterative refinement of the normalization process. In terms of normalization, GUANIN matches or outperforms other available methods. Importantly, it allows researchers to interact comprehensively with the data preprocessing step without programming knowledge, thanks to its easy-to-use Graphical User Interface (GUI). Availability and implementation: GUANIN can be installed with pip install GUANIN and it is available at https://pypi.org/project/guanin/. Source code, documentation, and case studies are available at https://github.com/julimontoto/guanin under the GPLv3 license.
Collapse
Affiliation(s)
- Julián Montoto-Louzao
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santigo de Compostela, 15706, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, 28029, Spain
| | - Alberto Gómez-Carballa
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santigo de Compostela, 15706, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, 28029, Spain
| | - Xabier Bello
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santigo de Compostela, 15706, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, 28029, Spain
| | - Jacobo Pardo-Seco
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santigo de Compostela, 15706, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, 28029, Spain
| | - Alba Camino-Mera
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santigo de Compostela, 15706, Spain
| | - Sandra Viz-Lasheras
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santigo de Compostela, 15706, Spain
| | - María J Martín
- CITIC, Computer Architecture Group, Universidade da Coruña, Facultad de Informática, 15071, A Coruña, Spain
| | - Federico Martinón-Torres
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santiago de Compostela, 15706, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, 28029, Spain
- Translational Pediatrics and Infectious Diseases, Department of Pediatrics, Hospital Clínico Universitario de Santiago de Compostela, Santiago de Compostela, Choupana s/n, Santiago de Compostela, 15706, Spain
| | - Antonio Salas
- Unidade de Xenética, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and Genética de Poblaciones en Biomedicina (GenPoB) Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706, Santiago de Compostela, Spain
- Genetics, Vaccines and Infections Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago, Santiago de Compostela, 15706, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBER-ES), Madrid, 28029, Spain
|
10
|
Tsang E, Han VX, Flutter C, Alshammery S, Keating BA, Williams T, Gloss BS, Graham ME, Aryamanesh N, Pang I, Wong M, Winlaw D, Cardamone M, Mohammad S, Gold W, Patel S, Dale RC. Ketogenic diet modifies ribosomal protein dysregulation in KMT2D Kabuki syndrome. EBioMedicine 2024; 104:105156. [PMID: 38768529 PMCID: PMC11134553 DOI: 10.1016/j.ebiom.2024.105156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 04/29/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
BACKGROUND: Kabuki syndrome (KS) is a genetic disorder caused by DNA mutations in KMT2D, a lysine methyltransferase that methylates histones and other proteins, and therefore modifies chromatin structure and subsequent gene expression. Ketones, derived from the ketogenic diet, are histone deacetylase inhibitors that can 'open' chromatin and encourage gene expression. Preclinical studies have shown that the ketogenic diet rescues hippocampal memory neurogenesis in mice with KS via the epigenetic effects of ketones. METHODS: Single-cell RNA sequencing and mass spectrometry-based proteomics were used to explore molecular mechanisms of disease in individuals with KS (n = 4) versus controls (n = 4). FINDINGS: Pathway enrichment analysis indicated that loss-of-function mutations in KMT2D are associated with ribosomal protein dysregulation at the RNA and protein levels in individuals with KS (FDR <0.05). Cellular proteomics also identified immune dysregulation and increased abundance of other lysine modification and histone binding proteins, representing a potential compensatory mechanism. A 12-year-old boy with KS, suffering from recurrent episodes of cognitive decline, exhibited improved cognitive function and neuropsychological assessment performance after 12 months on the ketogenic diet, with concomitant improvement in transcriptomic ribosomal protein dysregulation. INTERPRETATION: Our data reveal that lysine methyltransferase deficiency is associated with ribosomal protein dysfunction, with secondary immune dysregulation. Diet and the production of bioactive molecules such as ketone bodies serve as a significant environmental factor that can induce epigenetic changes and improve clinical outcomes. Integrating transcriptomic, proteomic, and clinical data can define mechanisms of disease and treatment effects in individuals with neurodevelopmental disorders.
FUNDING This study was supported by the Dale NHMRC Investigator Grant (APP1193648) (R.D), Petre Foundation (R.D), and The Sydney Children's Hospital Foundation/Kids Research Early and Mid-Career Researcher Grant (E.T).
Affiliation(s)
- Erica Tsang
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia; The Children's Hospital at Westmead Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | - Velda X Han
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia; Khoo Teck Puat-National University Children's Medical Institute, National University Health System, Singapore, Singapore; Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Chloe Flutter
- The Kabuki Syndrome Foundation - Volunteer, Northbrook, IL, USA
| | - Sarah Alshammery
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia; The Children's Hospital at Westmead Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | - Brooke A Keating
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia
| | - Tracey Williams
- Kids Rehab, The Children's Hospital at Westmead, Sydney, NSW, Australia
| | - Brian S Gloss
- Westmead Research Hub, Westmead Institute for Medical Research, Westmead, NSW, Australia
| | - Mark E Graham
- Biomedical Proteomics, Children's Medical Research Institute, The University of Sydney, Australia
| | - Nader Aryamanesh
- Bioinformatics Group, Children's Medical Research Institute, Westmead, Sydney, NSW, Australia; School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
| | - Ignatius Pang
- Bioinformatics Group, Children's Medical Research Institute, Westmead, Sydney, NSW, Australia
| | - Melanie Wong
- The Children's Hospital at Westmead, Westmead, NSW, Australia
| | - David Winlaw
- Heart Centre, Ann and Robert H. Lurie Children's Hospital of Chicago and Feinberg School of Medicine, Northwestern University, USA
| | - Michael Cardamone
- Sydney Children's Hospital, Randwick, NSW, Australia; School of Clinical Medicine, University of New South Wales, NSW, Australia
| | - Shekeeb Mohammad
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia; The Children's Hospital at Westmead Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | - Wendy Gold
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia; Molecular Neurobiology Research Laboratory, Kids Research, The Children's Hospital at Westmead & the Children's Medical Research Institute, NSW, Australia
| | - Shrujna Patel
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia; The Children's Hospital at Westmead Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | - Russell C Dale
- Kids Neuroscience Centre, The Children's Hospital at Westmead, Faculty of Medicine and Health, University of Sydney, NSW, Australia; The Children's Hospital at Westmead Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia; The Brain and Mind Centre, The University of Sydney, Sydney, NSW, Australia.
|
11
|
Khodabakhshi Z, Gabrys H, Wallimann P, Guckenberger M, Andratschke N, Tanadini-Lang S. Magnetic resonance imaging radiomic features stability in brain metastases: Impact of image preprocessing, image-, and feature-level harmonization. Phys Imaging Radiat Oncol 2024; 30:100585. [PMID: 38799810 PMCID: PMC11127267 DOI: 10.1016/j.phro.2024.100585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 04/23/2024] [Accepted: 05/02/2024] [Indexed: 05/29/2024] Open
Abstract
Background and purpose: Magnetic resonance imaging (MRI) scans are highly sensitive to acquisition and reconstruction parameters, which affect feature stability and model generalizability in radiomic research. This work aims to investigate the effect of image pre-processing and harmonization methods on the stability of brain MRI radiomic features and the prediction performance of radiomic models in patients with brain metastases (BMs). Materials and methods: Two T1 contrast-enhanced brain MRI data-sets were used in this study. The first contained 25 BM patients with scans at two different time points and was used for feature stability analysis. The effect of gray level discretization (GLD), intensity normalization (Z-score, Nyul, WhiteStripe, and an in-house-developed method named N-Peaks), and ComBat harmonization on feature stability was investigated, and features with an intraclass correlation coefficient >0.8 were considered stable. The second data-set, containing 64 BM patients, was used for a classification task to investigate the informativeness of stable features and the effects of harmonization methods on radiomic model performance. Results: Applying fixed bin number (FBN) GLD resulted in a higher number of stable features compared to fixed bin size (FBS) discretization (10 ± 5.5 % higher). Harmonization in the feature domain improved the stability for non-normalized images and for images normalized with the Z-score and WhiteStripe methods. For the classification task, keeping the stable features resulted in good performance only for images normalized with N-Peaks along with FBS discretization. Conclusions: To develop a robust MRI-based radiomic model, we recommend using an intensity normalization method based on a reference tissue (e.g., N-Peaks) and then using FBS discretization.
Affiliation(s)
- Zahra Khodabakhshi
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Hubert Gabrys
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Philipp Wallimann
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Matthias Guckenberger
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Nicolaus Andratschke
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Stephanie Tanadini-Lang
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
|
12
|
Ferro dos Santos MR, Giuili E, De Koker A, Everaert C, De Preter K. Computational deconvolution of DNA methylation data from mixed DNA samples. Brief Bioinform 2024; 25:bbae234. [PMID: 38762790 PMCID: PMC11102637 DOI: 10.1093/bib/bbae234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 03/30/2024] [Accepted: 04/30/2024] [Indexed: 05/20/2024] Open
Abstract
In this review, we provide a comprehensive overview of the different computational tools that have been published for the deconvolution of bulk DNA methylation (DNAm) data. Here, deconvolution refers to the estimation of the cell-type proportions that constitute a mixed sample. The paper reviews 25 deconvolution methods (supervised, unsupervised or hybrid) developed between 2012 and 2023 and compares the strengths and limitations of each approach. Moreover, we describe the impact on deconvolution performance of the platform used to generate the methylation data (including microarrays and sequencing), the data pre-processing steps applied and the reference dataset used. In addition to reference-based methods, we also examine methods that require only partial reference datasets or no reference set at all. Finally, we provide guidelines for the use of specific methods depending on the DNA methylation data type and data availability.
Affiliation(s)
- Maísa R Ferro dos Santos
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
| | - Edoardo Giuili
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
| | - Andries De Koker
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
| | - Celine Everaert
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
| | - Katleen De Preter
- VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
|
13
|
Fletez-Brant K, Qiu Y, Gorkin DU, Hu M, Hansen KD. Removing unwanted variation between samples in Hi-C experiments. Brief Bioinform 2024; 25:bbae217. [PMID: 38711367 PMCID: PMC11074651 DOI: 10.1093/bib/bbae217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 01/26/2024] [Accepted: 04/24/2024] [Indexed: 05/08/2024] Open
Abstract
Hi-C data are commonly normalized using single sample processing methods, with focus on comparisons between regions within a given contact map. Here, we aim to compare contact maps across different samples. We demonstrate that unwanted variation, of likely technical origin, is present in Hi-C data with replicates from different individuals, and that properties of this unwanted variation change across the contact map. We present band-wise normalization and batch correction, a method for normalization and batch correction of Hi-C data and show that it substantially improves comparisons across samples, including in a quantitative trait loci analysis as well as differential enrichment across cell types.
Affiliation(s)
- Kipper Fletez-Brant
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Yunjiang Qiu
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Ludwig Institute for Cancer Research, New York, NY 10016, USA
| | - David U Gorkin
- Ludwig Institute for Cancer Research, New York, NY 10016, USA
- Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, CA 92093, USA
- Currently: Department of Biology, Emory University, Atlanta, GA 30322, USA
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44196, USA
| | - Kasper D Hansen
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
|
14
|
Jiang Y, Rex DAB, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Mayta ML, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics using Mass Spectrometry. ARXIV 2023:arXiv:2311.07791v1. [PMID: 38013887 PMCID: PMC10680866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Proteomics is the large-scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods to aid the novice and experienced researcher. We cover topics from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.
Affiliation(s)
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center
| | - Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland; Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland; Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
| | - Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, NIST Charleston · Funded by NIST
| | - Germán L. Rosano
- Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, Argentina · Funded by Grant PICT 2019-02971 (Agencia I+D+i)
| | - Norbert Volkmar
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
| | - Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA
| | - Susan B. Egbert
- Department of Chemistry, University of Manitoba, Winnipeg, Canada
| | - Simion Kreimer
- Smidt Heart Institute, Cedars Sinai Medical Center; Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center
| | - Emma H. Doud
- Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Oliver M. Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute · Funded by Grant BT/PR16456/BID/7/624/2016 (Department of Biotechnology, India); Grant Translational Research Program (TRP) at THSTI funded by DBT
| | - Muralidharan Vanuopadath
- School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam-690 525, Kerala, India · Funded by Department of Health Research, Indian Council of Medical Research, Government of India (File No.R.12014/31/2022-HR)
| | - Martín L. Mayta
- School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martín 3103, Argentina; Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
| | - Anna G. Duboff
- Department of Chemistry, University of Washington · Funded by Summer Research Acceleration Fellowship, Department of Chemistry, University of Washington
| | - Nicholas M. Riley
- Department of Chemistry, University of Washington · Funded by National Institutes of Health Grant R00 GM147304
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, WA, USA, 98109 · Funded by National Institutes of Health Grants R01GM087221, R24GM127667, U19AG023122, S10OD026936; National Science Foundation Award 1920268
| | - Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center · Funded by National Institutes of Health Grant R21 AG074234; National Institutes of Health Grant R35 GM142502
|
15
|
Khodabakhshi Z, Amini M, Hajianfar G, Oveisi M, Shiri I, Zaidi H. Dual-Centre Harmonised Multimodal Positron Emission Tomography/Computed Tomography Image Radiomic Features and Machine Learning Algorithms for Non-small Cell Lung Cancer Histopathological Subtype Phenotype Decoding. Clin Oncol (R Coll Radiol) 2023; 35:713-725. [PMID: 37599160 DOI: 10.1016/j.clon.2023.08.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 06/10/2023] [Accepted: 08/05/2023] [Indexed: 08/22/2023]
Abstract
AIMS: We aimed to build radiomic models for classifying non-small cell lung cancer (NSCLC) histopathological subtypes through a dual-centre dataset and comprehensively evaluate the effect of ComBat harmonisation on the performance of single- and multimodality radiomic models. MATERIALS AND METHODS: A public dataset of NSCLC patients from two independent centres was used. Two image fusion methods, namely guided filtering-based fusion and image fusion based on visual saliency map and weighted least square optimisation, were used. Radiomic features were extracted from each scan, including first-order, texture and moment-invariant features. Subsequently, ComBat harmonisation was applied to the extracted features from computed tomography (CT), positron emission tomography (PET) and fused images to correct the centre effect. For feature selection, least absolute shrinkage and selection operator (Lasso) and recursive feature elimination (RFE) were investigated. For machine learning, logistic regression (LR), support vector machine (SVM) and AdaBoost were evaluated for classifying NSCLC subtypes. Training and evaluation of the models were carried out in a robust framework to offset plausible errors and performance was reported using area under the curve, balanced accuracy, sensitivity and specificity before and after harmonisation. N-way ANOVA was used to assess the effect of different factors on the performance of the models. RESULTS: A support vector machine fed with features selected by recursive feature elimination from a harmonised PET feature set achieved the highest performance (area under the curve = 0.82) in classifying NSCLC histopathological subtypes. Although the performance of the models did not significantly improve for CT images after harmonisation, the performance of PET and guided filtering-based fusion feature signatures significantly improved for almost all models. Although the choice of image modality and feature selection method significantly affected the performance of the model (ANOVA P-values <0.001), machine learning and harmonisation did not change the performance significantly (ANOVA P-values = 0.839 and 0.292, respectively). CONCLUSION: This study confirmed the potential of radiomic analysis on PET, CT and hybrid images for histopathological classification of NSCLC subtypes.
Affiliation(s)
- Z Khodabakhshi
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Science, Tehran, Iran
| | - M Amini
- Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
| | - G Hajianfar
- Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
| | - M Oveisi
- Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Science, Tehran, Iran; Comprehensive Cancer Centre, School of Cancer & Pharmaceutical Sciences, Faculty of Life Sciences & Medicine, Kings College London, London, UK; Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
| | - I Shiri
- Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
| | - H Zaidi
- Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland; Geneva University Neurocenter, Geneva University, Geneva, Switzerland; Department of Nuclear Medicine and Molecular Imaging, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Department of Nuclear Medicine, University of Southern Denmark, Odense, Denmark.
|
16
|
Lupancu TJ, Lee KM, Eivazitork M, Hor C, Fleetwood AJ, Cook AD, Olshansky M, Turner SJ, de Steiger R, Lim K, Hamilton JA, Achuthan AA. Epigenetic and transcriptional regulation of CCL17 production by glucocorticoids in arthritis. iScience 2023; 26:108079. [PMID: 37860753 PMCID: PMC10583050 DOI: 10.1016/j.isci.2023.108079] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/08/2023] [Revised: 08/17/2023] [Accepted: 09/25/2023] [Indexed: 10/21/2023] Open
Abstract
Glucocorticoids (GCs) are potent anti-inflammatory agents and are broadly used in treating rheumatoid arthritis (RA) patients, albeit with adverse side effects associated with long-term usage. The negative consequences of GC therapy provide an impetus for research into gaining insights into the molecular mechanisms of GC action. We have previously reported that granulocyte-macrophage colony-stimulating factor (GM-CSF)-induced CCL17 has a non-redundant role in inflammatory arthritis. Here, we provide molecular evidence that GCs can suppress GM-CSF-mediated upregulation of IRF4 and CCL17 expression via downregulating JMJD3 expression and activity. In mouse models of inflammatory arthritis, GC treatment inhibited CCL17 expression and ameliorated arthritic pain-like behavior and disease. Significantly, GC treatment of RA patient peripheral blood mononuclear cells ex vivo resulted in decreased CCL17 production. This delineated pathway potentially provides new therapeutic options for the treatment of many inflammatory conditions, where GCs are used as an anti-inflammatory drug but without the associated adverse side effects.
Affiliation(s)
- Tanya J. Lupancu
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Kevin M.C. Lee
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Mahtab Eivazitork
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Cecil Hor
- Department of Medicine, Western Health, The University of Melbourne, St Albans, VIC 3021, Australia
| | - Andrew J. Fleetwood
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Haematopoiesis and Leukocyte Biology, Baker IDI Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Andrew D. Cook
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Moshe Olshansky
- Department of Microbiology, Monash University, Clayton, VIC 3800, Australia
| | - Stephen J. Turner
- Department of Microbiology, Monash University, Clayton, VIC 3800, Australia
| | - Richard de Steiger
- Department of Surgery, Epworth HealthCare, The University of Melbourne, Richmond, VIC 3121, Australia
| | - Keith Lim
- Department of Medicine, Western Health, The University of Melbourne, St Albans, VIC 3021, Australia
| | - John A. Hamilton
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Adrian A. Achuthan
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
|
17
|
Downing T, Angelopoulos N. A primer on correlation-based dimension reduction methods for multi-omics analysis. J R Soc Interface 2023; 20:20230344. [PMID: 37817584 PMCID: PMC10565429 DOI: 10.1098/rsif.2023.0344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/19/2023] [Indexed: 10/12/2023] Open
Abstract
The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate emerging methods for multi-omics and integrating diverse omic datasets appropriately. This raises the opportunity of implementing population multi-omics with large sample sizes as omics technologies and our understanding improve.
Affiliation(s)
- Tim Downing
- Pirbright Institute, Pirbright, Surrey, UK
- Department of Biotechnology, Dublin City University, Dublin, Ireland
|
18
|
Zheng J, Wu J, D'Amour A, Franks A. Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes. J Am Stat Assoc 2023; 119:2026-2037. [PMID: 39493289 PMCID: PMC11528154 DOI: 10.1080/01621459.2023.2240053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 04/17/2023] [Accepted: 07/13/2023] [Indexed: 11/05/2024]
Abstract
In this work, we propose an approach for assessing sensitivity to unobserved confounding in studies with multiple outcomes. We demonstrate how prior knowledge unique to the multi-outcome setting can be leveraged to strengthen causal conclusions beyond what can be achieved from analyzing individual outcomes in isolation. We argue that it is often reasonable to make a shared confounding assumption, under which residual dependence amongst outcomes can be used to simplify and sharpen sensitivity analyses. We focus on a class of factor models for which we can bound the causal effects for all outcomes conditional on a single sensitivity parameter that represents the fraction of treatment variance explained by unobserved confounders. We characterize how causal ignorance regions shrink under additional prior assumptions about the presence of null control outcomes, and provide new approaches for quantifying the robustness of causal effect estimates. Finally, we illustrate our sensitivity analysis workflow in practice, in an analysis of both simulated data and a case study with data from the National Health and Nutrition Examination Survey (NHANES).
|
19
|
Leach DT, Stratton KG, Irvahn J, Richardson R, Webb-Robertson BJM, Bramer LM. malbacR: A Package for Standardized Implementation of Batch Correction Methods for Omics Data. Anal Chem 2023; 95:12195-12199. [PMID: 37551970 DOI: 10.1021/acs.analchem.3c01289] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/09/2023]
Abstract
Mass spectrometry is a powerful tool for identifying and analyzing biomolecules such as metabolites and lipids in complex biological samples. Liquid chromatography and gas chromatography mass spectrometry studies quite commonly involve large numbers of samples, which can require significant time for sample preparation and analyses. To accommodate such studies, the samples are commonly split into batches. Inevitably, variations in sample handling, temperature fluctuation, imprecise timing, column degradation, and other factors result in systematic errors or biases of the measured abundances between the batches. Numerous methods are available via R packages to assist with batch correction for omics data; however, since these methods were developed by different research teams, the algorithms are available in separate R packages, each with different data input and output formats. We introduce the malbacR package, which consolidates 11 common batch effect correction methods for omics data into one place so users can easily implement and compare the following: pareto scaling, power scaling, range scaling, ComBat, EigenMS, NOMIS, RUV-random, QC-RLSC, WaveICA2.0, TIGER, and SERRF. The malbacR package standardizes data input and output formats across these batch correction methods. The package works in conjunction with the pmartR package, allowing users to seamlessly include the batch effect correction in a pmartR workflow without needing any additional data manipulation.
Affiliation(s)
- Damon T Leach
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354, United States
| | - Kelly G Stratton
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354, United States
| | - Jan Irvahn
- Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354, United States
| | - Rachel Richardson
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354, United States
| | - Bobbie-Jo M Webb-Robertson
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354, United States
| | - Lisa M Bramer
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99354, United States
| |
|
20
|
Ye H, Zhang X, Wang C, Goode EL, Chen J. Batch-effect correction with sample remeasurement in highly confounded case-control studies. Nat Comput Sci 2023; 3:709-719. [PMID: 38177326 PMCID: PMC10993308 DOI: 10.1038/s43588-023-00500-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/06/2022] [Accepted: 07/11/2023] [Indexed: 01/06/2024]
Abstract
Batch effects are pervasive in biomedical studies. One approach to address the batch effects is repeatedly measuring a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch-effect correction with remeasured samples are severely underdeveloped. Here we developed a framework for batch-effect correction using remeasured samples in highly confounded case-control studies. We provided theoretical analyses of the proposed procedure, evaluated its power characteristics and provided a power calculation tool to aid in the study design. We found that the number of samples that need to be remeasured depends strongly on the between-batch correlation. When the correlation is high, remeasuring a small subset of samples can rescue most of the power.
Affiliation(s)
- Hanxuan Ye
- Department of Statistics, Texas A&M University, College Station, TX, USA
| | - Xianyang Zhang
- Department of Statistics, Texas A&M University, College Station, TX, USA.
| | - Chen Wang
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Ellen L Goode
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Jun Chen
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA.
| |
|
21
|
Li T, Zhang Y, Patil P, Johnson WE. Overcoming the impacts of two-step batch effect correction on gene expression estimation and inference. Biostatistics 2023; 24:635-652. [PMID: 34893807 PMCID: PMC10449015 DOI: 10.1093/biostatistics/kxab039] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/22/2021] [Revised: 08/08/2021] [Accepted: 10/18/2021] [Indexed: 12/13/2022] Open
Abstract
Nonignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can lead to difficulty in merging data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by accommodating batch variables into the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction compared to existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery controls and power of detection across a variety of batch effect scenarios. Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html).
Affiliation(s)
- Tenglong Li
- Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, 111 Ren’ai Road, Dushu Lake Higher Education Town, Suzhou Industrial Park, Suzhou 215123, Jiangsu Province, PRC
| | - Yuqing Zhang
- Clinical Bioinformatics, Gilead Sciences, Inc., 333 Lakeside Dr, Foster City, CA 94404
| | - Prasad Patil
- Department of Biostatistics, School of Public Health, 801 Massachusetts Ave. Boston, MA 02118, USA
| | - W Evan Johnson
- Division of Computational Biomedicine, School of Medicine, 72 E. Concord Street, Boston, MA 02118, USA and Department of Biostatistics, School of Public Health, 801 Massachusetts Ave. Boston, MA 02118, USA
| |
|
22
|
Ni A, Liu M, Qin LX. BatMan: Mitigating Batch Effects Via Stratification for Survival Outcome Prediction. JCO Clin Cancer Inform 2023; 7:e2200138. [PMID: 37335961 PMCID: PMC10530623 DOI: 10.1200/cci.22.00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 01/31/2023] [Indexed: 06/21/2023] Open
Abstract
Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed in the setting of sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batches by including batch as a covariate alongside sample groups in a linear regression. In survival prediction, however, ComBat is used without definable groups for survival outcome and is done sequentially with survival regression for a potentially batch-confounded outcome. To address these issues, we propose a new method called BATch MitigAtion via stratificatioN (BatMan). It adjusts batches as strata in survival regression and uses variable selection methods such as regularized regression to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a resampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data and (2) their performance can be worsened by the addition of data normalization. We further evaluate them using microRNA data for ovarian cancer from the Cancer Genome Atlas and find that BatMan outperforms ComBat while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the use of data normalization in the context of developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available on GitHub at LXQin/PRECISION.survival.
Affiliation(s)
- Ai Ni
- Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH
| | - Mengling Liu
- Department of Population Health, New York University, New York, NY
| | - Li-Xuan Qin
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
| |
|
23
|
Olbrich M, Künstner A, Busch H. MBECS: Microbiome Batch Effects Correction Suite. BMC Bioinformatics 2023; 24:182. [PMID: 37138207 PMCID: PMC10155362 DOI: 10.1186/s12859-023-05252-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 03/20/2023] [Indexed: 05/05/2023] Open
Abstract
Despite the availability of batch effect correcting algorithms (BECA), no comprehensive tool that combines batch correction and evaluation of the results exists for microbiome datasets. This work outlines the Microbiome Batch Effects Correction Suite development that integrates several BECAs and evaluation metrics into a software package for the statistical computation framework R.
Affiliation(s)
- Michael Olbrich
- Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany.
- Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany.
- Center for Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates.
| | - Axel Künstner
- Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
- Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
| | - Hauke Busch
- Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany.
| |
|
24
|
Fishov H, Muchtar E, Salmon‐Divon M, Dispenzieri A, Zvida T, Schneider C, Bender B, Duek A, Leiba M, Shpilberg O, Hershkovitz‐Rokah O. AL amyloidosis clonal plasma cells are regulated by microRNAs and dependent on anti-apoptotic BCL2 family members. Cancer Med 2023; 12:8199-8210. [PMID: 36694297 PMCID: PMC10134277 DOI: 10.1002/cam4.5621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Noncoding RNAs such as microRNAs (miRNAs) have attracted attention as biological pathway regulators, which differ from chromosomal translocations and gene point mutations. Their involvement in the molecular mechanisms underlying light chain (AL) amyloidosis pathogenesis is yet to be elucidated. AIMS To decipher the specific miRNA expression profile in AL-amyloidosis and to examine how miRNAs are involved in AL pathogenesis. METHODS The expression profiles of miRNAs and mRNA from bone marrow (BM)-derived CD138+ cells were determined using the NanoString nCounter assay and RNA-Seq, respectively. The effect of aberrantly expressed miRNAs on potential molecular targets was analyzed by qRT-PCR, Western blot, Mito-potential assay, and Annexin-PI staining. RESULTS Genes that were significantly differentially expressed between AL-amyloidosis and multiple myeloma (MM) were found to be involved in cell growth and apoptotic mechanisms. Specifically, BCL2L1, MCL1, and BCL2 were upregulated in AL-amyloidosis compared with MM and controls. The levels of miR-181a-5p and miR-9-5p, which regulate the above-mentioned genes, were lower in BM samples from AL-amyloidosis compared with controls, providing a mechanism for BCL2 family gene upregulation. When miR-9-5p and miR-181a-5p were overexpressed in ALMC-1 cells, BCL2L1, MCL1, and BCL2 were downregulated and apoptosis was induced. Treatment of ALMC-1 cells with venetoclax (a BCL-2 inhibitor) resulted in the upregulation of those miRNAs, the downregulation of BCL2, MCL1, and BCL2L1 mRNA and protein levels, and subsequent apoptosis. CONCLUSION Our findings suggest that miR-9-5p and miR-181a-5p act as tumor-suppressors whose downregulation induces anti-apoptotic mechanisms underlying the pathogenesis of AL-amyloidosis. The study highlights the post-transcriptional regulation in AL-amyloidosis and provides pathogenetic evidence for the potential use of BCL-2 inhibitors in this disease.
Affiliation(s)
- Hila Fishov
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel, Israel
- Translational Research Lab, Assuta Medical Centers, Tel‐Aviv, Israel
| | - Eli Muchtar
- Division of Hematology, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Mali Salmon‐Divon
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel, Israel
- Adelson School of Medicine, Ariel University, Ariel, Israel
| | - Angela Dispenzieri
- Division of Hematology, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Tal Zvida
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel, Israel
- Translational Research Lab, Assuta Medical Centers, Tel‐Aviv, Israel
| | | | | | - Adrian Duek
- Institute of Hematology, Assuta Ashdod University Hospital, Faculty of Health Science, Ben‐Gurion University of the Negev, Beer Sheva, Israel
| | - Merav Leiba
- Institute of Hematology, Assuta Ashdod University Hospital, Faculty of Health Science, Ben‐Gurion University of the Negev, Beer Sheva, Israel
| | - Ofer Shpilberg
- Translational Research Lab, Assuta Medical Centers, Tel‐Aviv, Israel
- Adelson School of Medicine, Ariel University, Ariel, Israel
- Institute of Hematology, Assuta Medical Centers, Tel‐Aviv, Israel
| | - Oshrat Hershkovitz‐Rokah
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel, Israel
- Translational Research Lab, Assuta Medical Centers, Tel‐Aviv, Israel
| |
|
25
|
Yosef A, Shnaider E, Schneider M, Gurevich M. Heuristic normalization procedure for batch effect correction. Soft Comput 2023. [DOI: 10.1007/s00500-023-08049-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
|
26
|
Cui Y, Pu H, Shi X, Miao W, Tchetgen Tchetgen E. Semiparametric proximal causal inference. J Am Stat Assoc 2023. [DOI: 10.1080/01621459.2023.2191817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Affiliation(s)
- Yifan Cui
- Center for Data Science, Zhejiang University
| | - Hongming Pu
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
| | - Xu Shi
- Department of Biostatistics, University of Michigan
| | - Wang Miao
- Department of Probability and Statistics, Peking University
| | - Eric Tchetgen Tchetgen
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
| |
|
27
|
Carry PM, Vigers T, Vanderlinden LA, Keeter C, Dong F, Buckner T, Litkowski E, Yang I, Norris JM, Kechris K. Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments. BMC Bioinformatics 2023; 24:86. [PMID: 36882691 PMCID: PMC9990331 DOI: 10.1186/s12859-023-05202-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 02/22/2023] [Indexed: 03/09/2023] Open
Abstract
BACKGROUND We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case-control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data obtained from pancreas islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between observed betas under the batch allocation strategies and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. In order to understand performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the 'true' dataset (CAPN13 gene). RESULTS Pre-batch correction, under the null hypothesis (β1), maximum absolute bias and root mean square (RMS) of maximum absolute bias, were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. 
Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses. CONCLUSIONS Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation.
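The allocation idea in this abstract, choosing the assignment whose batches have the most similar average propensity scores, can be illustrated with a small exhaustive search. The sketch below is not the authors' implementation: it assumes the propensity scores have already been estimated, it handles only two equal-size batches, and the function name `best_two_batch_split` and the example scores are hypothetical.

```python
from itertools import combinations
from statistics import mean


def best_two_batch_split(scores):
    """Search all equal-size two-batch splits and return the split with the
    smallest absolute difference in mean propensity score.

    Returns a tuple (difference, batch_a_indices, batch_b_indices).
    """
    n = len(scores)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of samples")
    indices = range(n)
    best = None
    for batch_a in combinations(indices, n // 2):
        # Keep sample 0 in batch A so mirrored splits are not evaluated twice.
        if 0 not in batch_a:
            continue
        batch_b = tuple(i for i in indices if i not in batch_a)
        diff = abs(
            mean(scores[i] for i in batch_a) - mean(scores[i] for i in batch_b)
        )
        if best is None or diff < best[0]:
            best = (diff, batch_a, batch_b)
    return best


# Hypothetical, pre-computed propensity scores for six samples.
scores = [0.10, 0.20, 0.30, 0.70, 0.80, 0.90]
diff, batch_a, batch_b = best_two_batch_split(scores)
```

Exhaustive enumeration is only practical for small studies; with more samples or more batches, a randomized or heuristic search over candidate allocations would likely be needed.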
Affiliation(s)
- Patrick M Carry
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, 12631 E. 17th Ave, Room 4602, Mail Stop B202, Aurora, CO, 80045, USA; Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA.
| | - Tim Vigers
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA; Barbara Davis Center for Diabetes, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Lauren A Vanderlinden
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA; Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
| | - Carson Keeter
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, 12631 E. 17th Ave, Room 4602, Mail Stop B202, Aurora, CO, 80045, USA
| | - Fran Dong
- Barbara Davis Center for Diabetes, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Teresa Buckner
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
| | - Elizabeth Litkowski
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
| | - Ivana Yang
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Jill M Norris
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
| |
|
28
|
Foltz SM, Greene CS, Taroni JN. Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously. Commun Biol 2023; 6:222. [PMID: 36841852 PMCID: PMC9968332 DOI: 10.1038/s42003-023-04588-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/02/2017] [Accepted: 02/13/2023] [Indexed: 02/27/2023] Open
Abstract
Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, most available RNA assays were run on microarray, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions between the platforms differ, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.
Affiliation(s)
- Steven M Foltz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA.
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA.
| | - Jaclyn N Taroni
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Wynnewood, PA, USA.
| |
|
29
|
Yasa J, Reed CE, Bournazos AM, Evesson FJ, Pang I, Graham ME, Wark JR, Nijagal B, Kwan KH, Kwiatkowski T, Jung R, Weisleder N, Cooper ST, Lemckert FA. Minimal expression of dysferlin prevents development of dysferlinopathy in dysferlin exon 40a knockout mice. Acta Neuropathol Commun 2023; 11:15. [PMID: 36653852 PMCID: PMC9847081 DOI: 10.1186/s40478-022-01473-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/17/2022] [Accepted: 11/03/2022] [Indexed: 01/19/2023] Open
Abstract
Dysferlin is a Ca2+-activated lipid binding protein implicated in muscle membrane repair. Recessive variants in DYSF result in dysferlinopathy, a progressive muscular dystrophy. We showed previously that calpain cleavage within a motif encoded by alternatively spliced exon 40a releases a 72 kDa C-terminal minidysferlin recruited to injured sarcolemma. Herein we use CRISPR/Cas9 gene editing to knock out murine Dysf exon 40a, to specifically assess its role in membrane repair and development of dysferlinopathy. We created three Dysf exon 40a knockout (40aKO) mouse lines that express dysferlin protein at ~90%, ~50% and ~10-20% of wild-type levels. Histopathological analysis of skeletal muscles from all 12-month-old 40aKO lines showed virtual absence of dystrophic features and normal membrane repair capacity for all three 40aKO lines, as compared with dysferlin-null BLAJ mice. Further, lipidomic and proteomic analyses on 18-week-old quadriceps show all three 40aKO lines are spared the profound lipidomic/proteomic imbalance that characterises dysferlin-deficient BLAJ muscles. Collective results indicate that membrane repair does not depend upon calpain cleavage within exon 40a and that ~10-20% of WT dysferlin protein expression is sufficient to maintain the muscle lipidome, proteome and membrane repair capacity to crucially prevent development of dysferlinopathy.
Affiliation(s)
- Joe Yasa
- Kids Neuroscience Centre, The Children’s Hospital at Westmead, Cnr Hawkesbury Road, Hainsworth Street, Westmead, Sydney, NSW 2145 Australia; Functional Neuromics, Children’s Medical Research Institute, Westmead, Sydney, NSW Australia
| | - Claudia E. Reed
- Kids Neuroscience Centre, The Children’s Hospital at Westmead, Cnr Hawkesbury Road, Hainsworth Street, Westmead, Sydney, NSW 2145 Australia; Discipline of Child and Adolescent Health, Faculty of Medicine, University of Sydney, Sydney, NSW Australia
| | - Adam M. Bournazos
- Kids Neuroscience Centre, The Children’s Hospital at Westmead, Cnr Hawkesbury Road, Hainsworth Street, Westmead, Sydney, NSW 2145 Australia; Discipline of Child and Adolescent Health, Faculty of Medicine, University of Sydney, Sydney, NSW Australia
| | - Frances J. Evesson
- Kids Neuroscience Centre, The Children’s Hospital at Westmead, Cnr Hawkesbury Road, Hainsworth Street, Westmead, Sydney, NSW 2145 Australia; Functional Neuromics, Children’s Medical Research Institute, Westmead, Sydney, NSW Australia; Discipline of Child and Adolescent Health, Faculty of Medicine, University of Sydney, Sydney, NSW Australia
| | - Ignatius Pang
- Synapse Proteomics, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW Australia
| | - Mark E. Graham
- Synapse Proteomics, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW Australia
| | - Jesse R. Wark
- Operations, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW Australia
| | - Brunda Nijagal
- Metabolomics Australia, Bio21 Institute, The University of Melbourne, Victoria, Australia
| | - Kim H. Kwan
- Metabolomics Australia, Bio21 Institute, The University of Melbourne, Victoria, Australia
| | - Thomas Kwiatkowski
- West Chester University, West Chester, PA 19383 USA
| | - Rachel Jung
- Department of Physiology and Cell Biology, Dorothy M. Davis Heart and Lung Research Institute, The Ohio State University Wexner Medical Center, Columbus, OH 43210-1252 USA
| | - Noah Weisleder
- Department of Physiology and Cell Biology, Dorothy M. Davis Heart and Lung Research Institute, The Ohio State University Wexner Medical Center, Columbus, OH 43210-1252 USA
| | - Sandra T. Cooper
- Kids Neuroscience Centre, The Children’s Hospital at Westmead, Cnr Hawkesbury Road, Hainsworth Street, Westmead, Sydney, NSW 2145 Australia; Functional Neuromics, Children’s Medical Research Institute, Westmead, Sydney, NSW Australia; Discipline of Child and Adolescent Health, Faculty of Medicine, University of Sydney, Sydney, NSW Australia
| | - Frances A. Lemckert
- Kids Neuroscience Centre, The Children’s Hospital at Westmead, Cnr Hawkesbury Road, Hainsworth Street, Westmead, Sydney, NSW 2145 Australia; Functional Neuromics, Children’s Medical Research Institute, Westmead, Sydney, NSW Australia; Discipline of Child and Adolescent Health, Faculty of Medicine, University of Sydney, Sydney, NSW Australia
| |
|
30
|
Yao Y, Zhang H, Tu L, Yu T, Chen B, Huang P, Hu Y, Luan T. Normalization Approach by a Reference Material to Improve LC-MS-Based Metabolomic Data Comparability of Multibatch Samples. Anal Chem 2023; 95:1309-1317. [PMID: 36538611 DOI: 10.1021/acs.analchem.2c04188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Large cohorts of samples from multiple batches are usually required for global metabolomic studies to characterize the metabolic state of human disease. As such, it is critical to eliminate systematic variation and truly reveal the biologically associated alterations. In this study, we proposed a reference material-based approach (Ref-M) for correcting liquid chromatography-mass spectrometry data and demonstrated it in an analysis of multibatch human serum samples. The reference material was generated by mixing serum from healthy donors and distributed to each extraction batch of subject samples. Pooled quality control samples and isotopic internal standards were then applied in each acquisition batch for data quality control. Finally, each metabolite in subject samples was normalized by its counterpart in the reference serum. We demonstrated that Ref-M significantly enhanced the numbers of efficient features and effectively eliminated the batch variation of 522 serum samples of healthy individuals, benign pulmonary nodules, and lung cancer patients. Twenty differential metabolites were identified to distinguish lung cancer from healthy controls in the training set. The discriminant model was validated in an independent data set with an area under the receiver operating characteristics (ROC) curve (AUC) of 0.853. Another 40 serum samples further tested with Ref-M achieved an AUC of 0.843 with the established model. Our results showed that the reference material-based approach presents the potential to improve the data comparability and precision for biomarker discovery in large-scale metabolomic studies.
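The normalization step named in this abstract, in which each metabolite in a subject sample is normalized by its counterpart in the reference serum of the same batch, might look roughly like the sketch below. It is an illustration under stated assumptions, not the published Ref-M procedure: it assumes normalization means a simple intensity ratio, and the function name `normalize_by_reference`, the metabolite names, and all intensity values are hypothetical.

```python
def normalize_by_reference(samples, reference):
    """Divide each metabolite intensity by the intensity of the same
    metabolite in the batch's reference material.

    Metabolites that are missing or zero in the reference are dropped,
    because no ratio can be formed for them.
    """
    return {
        sample_name: {
            metabolite: value / reference[metabolite]
            for metabolite, value in intensities.items()
            if reference.get(metabolite)
        }
        for sample_name, intensities in samples.items()
    }


# Hypothetical intensities: batch 2 reads twice as high as batch 1 overall.
batch_1 = {"s1": {"glucose": 200.0, "lactate": 50.0}}
reference_1 = {"glucose": 100.0, "lactate": 25.0}
batch_2 = {"s2": {"glucose": 400.0, "lactate": 100.0}}
reference_2 = {"glucose": 200.0, "lactate": 50.0}

normalized_1 = normalize_by_reference(batch_1, reference_1)
normalized_2 = normalize_by_reference(batch_2, reference_2)
```

In this toy case the two samples become identical after normalization, because the batch-level scaling affects the reference material and the subject samples equally.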
Affiliation(s)
- Yao Yao
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
| | - Hui Zhang
- Metabolic Innovation Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China.,School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Guangzhou 510006, China.,Platform of Metabolomics, Center for Precision Medicine, Sun Yat-Sen University, Guangzhou 510080, China
| | - Lanyin Tu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
| | - Tiantian Yu
- Metabolic Innovation Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
| | - Baowei Chen
- Southern Marine Science and Engineering Guangdong Laboratory, School of Marine Sciences, Sun Yat-Sen University, Zhuhai 519082, China
| | - Peng Huang
- State Key Laboratory of Oncology in South China, Cancer Metabolism and Intervention Research Center, Sun Yat-Sen University Cancer Center, Guangzhou 510060, China.,Metabolic Innovation Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
| | - Yumin Hu
- State Key Laboratory of Oncology in South China, Cancer Metabolism and Intervention Research Center, Sun Yat-Sen University Cancer Center, Guangzhou 510060, China.,Metabolic Innovation Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
| | - Tiangang Luan
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China.,Institute of Environmental and Ecological Engineering, Guangdong University of Technology, Guangzhou 510006, China
| |
|
31
|
Zhang D, Zheng C, Zhu T, Yang F, Zhou Y. Identification of key module and hub genes in pulpitis using weighted gene co-expression network analysis. BMC Oral Health 2023; 23:2. [PMID: 36593446 PMCID: PMC9808982 DOI: 10.1186/s12903-022-02638-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 11/30/2022] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Pulpitis is a common disease mainly caused by bacteria. Conventional approaches to diagnosing the state of dental pulp are mainly based on clinical symptoms and therefore have shortcomings. Accurate and rapid diagnosis of pulpitis is important for choosing a suitable therapy. The study aimed to identify pulpitis-related key genes by integrating micro-array data analysis and systems biology network-based methods such as weighted gene co-expression network analysis (WGCNA). METHODS The micro-array data of 13 inflamed pulp and 11 normal pulp samples were acquired from Gene Expression Omnibus (GEO). WGCNA was utilized to establish a genetic network and categorize genes into diverse modules. Hub genes in the module most associated with pulpitis were screened out using the high module membership (MM) method. A rat pulpitis model was constructed and iRoot BP Plus was applied for pulp capping. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) was used for validation of hub genes. RESULTS A WGCNA network was established and genes were categorized into 22 modules. The darkgrey module had the highest correlation with pulpitis among them. A total of 5 hub genes (HMOX1, LOX, ACTG1, STAT3, GNB5) were identified. RT-qPCR confirmed the differences in expression levels of HMOX1, LOX, ACTG1, STAT3, GNB5 in inflamed dental pulp. Pulp capping reversed the expression level of HMOX1, LOX, ACTG1. CONCLUSION The study was the first to produce a holistic view of pulpitis and to screen out and validate hub genes involved in pulpitis using the WGCNA method. Pulp capping using iRoot BP Plus could reverse the expression changes of some hub genes.
Affiliation(s)
- Denghui Zhang
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Hangzhou, 310006, China
| | - Chen Zheng
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Hangzhou, 310006, China
| | - Tianer Zhu
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Hangzhou, 310006, China
| | - Fan Yang
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Hangzhou, 310006, China
| | - Yiqun Zhou
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Hangzhou, 310006, China.
| |
|
32
|
Molania R, Foroutan M, Gagnon-Bartsch JA, Gandolfo LC, Jain A, Sinha A, Olshansky G, Dobrovic A, Papenfuss AT, Speed TP. Removing unwanted variation from large-scale RNA sequencing data with PRPS. Nat Biotechnol 2023; 41:82-95. [PMID: 36109686 PMCID: PMC9849124 DOI: 10.1038/s41587-022-01440-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/19/2022] [Accepted: 06/30/2022] [Indexed: 01/22/2023]
Abstract
Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
Affiliation(s)
- Ramyar Molania
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
- Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia.
| | - Momeneh Foroutan
- Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria, Australia
| | | | - Luke C Gandolfo
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia
- School of Mathematics and Statistics, The University of Melbourne, Melbourne, Victoria, Australia
| | - Aryan Jain
- Department of Economics and Statistics, Monash University, Melbourne, Victoria, Australia
| | - Abhishek Sinha
- Department of Economics and Statistics, Monash University, Melbourne, Victoria, Australia
| | - Gavriel Olshansky
- Metabolomics Laboratory, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Alexander Dobrovic
- Department of Surgery, The University of Melbourne, Austin Health, Heidelberg, Victoria, Australia
| | - Anthony T Papenfuss
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
- Department of Medical Biology, The University of Melbourne, Melbourne, Victoria, Australia.
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, Victoria, Australia.
| | - Terence P Speed
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
- School of Mathematics and Statistics, The University of Melbourne, Melbourne, Victoria, Australia.
| |
|
33
|
Yosef A, Shnaider E, Schneider M, Gurevich M. Normalization of Large-Scale Transcriptome Data Using Heuristic Methods. Bioinform Biol Insights 2023; 17:11779322231160397. [PMID: 37020503 PMCID: PMC10068970 DOI: 10.1177/11779322231160397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 02/09/2023] [Indexed: 04/03/2023] Open
Abstract
In this study, we introduce an artificial intelligence method for addressing batch effects in transcriptome data. The method has several clear advantages in comparison with the alternative methods presently in use. Batch effect refers to the discrepancy between gene expression data series measured under different conditions. While the data from the same batch (measurements performed under the same conditions) are compatible, combining various batches into one data set is problematic because of incompatible measurements. Therefore, it is necessary to correct the combined data (normalization) before performing biological analysis. There are numerous methods attempting to correct a data set for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. Forcing the data elements into a pre-supposed distribution can severely distort biological signals, thus leading to incorrect results and conclusions. The wider the discrepancy between the assumptions regarding the data distribution and the actual distribution, the greater the biases introduced by such “correction methods”. We introduce a heuristic method to reduce batch effect. The method does not rely on any assumptions regarding the distribution and the behavior of data elements. Hence, it does not introduce any new biases in the process of correcting the batch effect. It strictly maintains the integrity of measurements within the original batches.
|
34
|
Oberhofer A, Bronkhorst AJ, Uhlig C, Ungerer V, Holdenrieder S. Tracing the Origin of Cell-Free DNA Molecules through Tissue-Specific Epigenetic Signatures. Diagnostics (Basel) 2022; 12:diagnostics12081834. [PMID: 36010184 PMCID: PMC9406971 DOI: 10.3390/diagnostics12081834] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/14/2022] [Revised: 07/15/2022] [Accepted: 07/25/2022] [Indexed: 12/11/2022] Open
Abstract
All cell and tissue types constantly release DNA fragments into human body fluids by various mechanisms including programmed cell death, accidental cell degradation and active extrusion. In particular, cell-free DNA (cfDNA) in plasma or serum has been utilized for minimally invasive molecular diagnostics. Disease onset or pathological conditions that lead to increased cell death alter the contribution of different tissues to the total pool of cfDNA. Because cfDNA molecules retain cell-type specific epigenetic features, it is possible to infer tissue-of-origin from epigenetic characteristics. Recent research efforts demonstrated that analysis of, e.g., methylation patterns, nucleosome occupancy, and fragmentomics determined the cell- or tissue-of-origin of individual cfDNA molecules. This novel tissue-of-origin analysis makes it possible to estimate the contributions of different tissues to the total cfDNA pool in body fluids and to find tissues with increased cell death (pathologic condition), expanding the portfolio of liquid biopsies towards a wide range of pathologies and early diagnosis. In this review, we summarize the currently available tissue-of-origin approaches and point out the next steps towards clinical implementation.
|
35
|
Wang Y, Sun F, Lin W, Zhang S. AC-PCoA: Adjustment for confounding factors using principal coordinate analysis. PLoS Comput Biol 2022; 18:e1010184. [PMID: 35830390 PMCID: PMC9278763 DOI: 10.1371/journal.pcbi.1010184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Accepted: 05/08/2022] [Indexed: 12/01/2022] Open
Abstract
Confounding factors exist widely in various biological data owing to technical variations, population structures and experimental conditions. Such factors may mask the true signals and lead to spurious associations in the respective biological data, making it necessary to adjust for confounding factors accordingly. However, existing confounder correction methods were mainly developed based on the original data or the pairwise Euclidean distance, either of which is inadequate for analyzing different types of data, such as sequencing data. In this work, we proposed a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, which reduces data dimension and extracts the information from different distance measures using principal coordinate analysis, and adjusts confounding factors across multiple datasets by minimizing the associations between lower-dimensional representations and confounding variables. Application of the proposed method was further extended to classification and prediction. We demonstrated the efficacy of AC-PCoA on three simulated datasets and five real datasets. Compared to the existing methods, AC-PCoA shows better results in visualization, statistical testing, clustering, and classification.
With today’s unprecedented amount of data, researchers are challenged by the need to enhance meaningful signals without the interference of unwanted confounders hidden inside the data. Data visualization is an important step toward exploring and explaining data in order to intuitively identify the dominant patterns. Principal coordinate analysis (PCoA), as a visualization tool, allows flexible ways to define pairwise distances and project the samples into lower dimensions without changing the distances. However, when visualizing large-scale biological datasets, the true patterns are often hindered by unwanted confounding variations, either biological or technical in origin. To eliminate these confounding factors and recover underlying signals, we proposed a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, and showed that it significantly outperforms existing methods in visualization through three simulation studies and five real datasets. We further showed that the low-dimensional representations given by AC-PCoA provide promising results in statistical testing, clustering, and classification as well.
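The principal coordinate analysis that AC-PCoA builds on can be sketched as follows. This is plain classical PCoA restricted to the first axis, not AC-PCoA itself: no confounder adjustment is performed. The function names and the toy distance matrix are hypothetical, and power iteration stands in for a full eigendecomposition so that the sketch needs only the standard library.

```python
import math


def double_center(dist):
    """Return B = -1/2 * J * D^2 * J, the Gower-centred matrix of squared distances."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]


def pcoa_first_axis(dist, iterations=200):
    """Estimate the leading eigenpair of B by power iteration and scale it to coordinates."""
    b = double_center(dist)
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient for unit v
    scale = math.sqrt(max(eigenvalue, 0.0))
    return eigenvalue, [scale * x for x in v]


# Toy example: three samples lying on a line at positions 0, 1 and 3.
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
eigenvalue, coords = pcoa_first_axis(dist)
print(eigenvalue, coords)
```

For this one-dimensional toy input the single axis reproduces the pairwise distances (up to sign and floating-point error). With non-Euclidean distance measures B can have negative eigenvalues, and power iteration converges to the eigenvalue of largest magnitude, so a library eigendecomposition would be the safer choice in practice.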
Affiliation(s)
- Yu Wang
- School of Mathematical Sciences, Fudan University, Shanghai, China
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
| | - Fengzhu Sun
- Quantitative and Computational Biology Department, University of Southern California, Los Angeles, California, United States of America
| | - Wei Lin
- School of Mathematical Sciences, Fudan University, Shanghai, China
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science, and Institutes of Brain Science, Fudan University, Shanghai, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Key Laboratory of Mathematics for Nonlinear Science (Fudan University), Ministry of Education, Shanghai, China
- Shanghai Key Laboratory for Contemporary Applied Mathematics (Fudan University), Shanghai, China
| | - Shuqin Zhang
- School of Mathematical Sciences, Fudan University, Shanghai, China
- Key Laboratory of Mathematics for Nonlinear Science (Fudan University), Ministry of Education, Shanghai, China
- Shanghai Key Laboratory for Contemporary Applied Mathematics (Fudan University), Shanghai, China
| |
|
36
|
Salim A, Molania R, Wang J, De Livera A, Thijssen R, Speed TP. RUV-III-NB: normalization of single cell RNA-seq data. Nucleic Acids Res 2022; 50:e96. [PMID: 35758618 PMCID: PMC9458465 DOI: 10.1093/nar/gkac486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 05/18/2022] [Accepted: 05/27/2022] [Indexed: 12/01/2022] Open
Abstract
Normalization of single cell RNA-seq data remains a challenging task. The performance of different methods can vary greatly between datasets when unwanted factors and biology are associated. Most normalization methods also remove the effects of unwanted variation only from the cell embedding and not from the gene-level data typically used for differential expression (DE) analysis to identify marker genes. We propose RUV-III-NB, a method that can be used to remove unwanted variation from both the cell embedding and gene-level counts. Using pseudo-replicates, RUV-III-NB explicitly takes into account potential association with biology when removing unwanted variation. The method can be used for either UMI or read counts and returns adjusted counts that can be used for downstream analyses such as clustering, DE and pseudotime analyses. Using published datasets with different technological platforms, kinds of biology and levels of association between biology and unwanted variation, we show that RUV-III-NB manages to remove library size and batch effects, strengthen biological signals, improve DE analyses, and lead to results exhibiting greater concordance with independent datasets of the same kind. The performance of RUV-III-NB is consistent and is not sensitive to the number of factors assumed to contribute to the unwanted variation.
Affiliation(s)
- Agus Salim
- Melbourne School of Population and Global Health, University of Melbourne, VIC 3053, Australia.,Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research Parkville, VIC 3052, Australia.,School of Mathematics and Statistics, University of Melbourne, VIC 3010, Australia.,Baker Heart and Diabetes Institute Melbourne, VIC 3004, Australia.,Department of Mathematics and Statistics, La Trobe University, VIC 3086, Australia
| | - Ramyar Molania
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research Parkville, VIC 3052, Australia
| | - Jianan Wang
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research Parkville, VIC 3052, Australia.,Department of Medical Biology, University of Melbourne, VIC 3010, Australia
| | - Alysha De Livera
- Melbourne School of Population and Global Health, University of Melbourne, VIC 3053, Australia.,Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research Parkville, VIC 3052, Australia.,Baker Heart and Diabetes Institute Melbourne, VIC 3004, Australia.,Department of Mathematics and Statistics, La Trobe University, VIC 3086, Australia.,School of Science, RMIT University, Melbourne VIC 3000, Australia
| | - Rachel Thijssen
- Blood Cells and Blood Cancer Division, Walter and Eliza Hall Institute of Medical Research, Parkville VIC 3052, Australia
| | - Terence P Speed
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research Parkville, VIC 3052, Australia.,School of Mathematics and Statistics, University of Melbourne, VIC 3010, Australia
| |
|
37
|
O'Neill AC, Uzbas F, Antognolli G, Merino F, Draganova K, Jäck A, Zhang S, Pedini G, Schessner JP, Cramer K, Schepers A, Metzger F, Esgleas M, Smialowski P, Guerrini R, Falk S, Feederle R, Freytag S, Wang Z, Bahlo M, Jungmann R, Bagni C, Borner GHH, Robertson SP, Hauck SM, Götz M. Spatial centrosome proteome of human neural cells uncovers disease-relevant heterogeneity. Science 2022; 376:eabf9088. [PMID: 35709258 DOI: 10.1126/science.abf9088] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 12/18/2022]
Abstract
The centrosome provides an intracellular anchor for the cytoskeleton, regulating cell division, cell migration, and cilia formation. We used spatial proteomics to elucidate protein interaction networks at the centrosome of human induced pluripotent stem cell-derived neural stem cells (NSCs) and neurons. Centrosome-associated proteins were largely cell type-specific, with protein hubs involved in RNA dynamics. Analysis of neurodevelopmental disease cohorts identified a significant overrepresentation of NSC centrosome proteins with variants in patients with periventricular heterotopia (PH). Expressing the PH-associated mutant pre-mRNA-processing factor 6 (PRPF6) reproduced the periventricular misplacement in the developing mouse brain, highlighting missplicing of transcripts of a microtubule-associated kinase with centrosomal location as essential for the phenotype. Collectively, cell type-specific centrosome interactomes explain how genetic variants in ubiquitous proteins may convey brain-specific phenotypes.
Affiliation(s)
- Adam C O'Neill
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Fatma Uzbas
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Giulia Antognolli
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Florencia Merino
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Kalina Draganova
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Alex Jäck
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Sirui Zhang
- CAS Key Laboratory of Computational Biology, Biomedical Big Data Center, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.,University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.,CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
| | - Giorgia Pedini
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
| | | | - Kimberly Cramer
- Max Planck Institute of Biochemistry, Martinsried, Germany.,Faculty of Physics and Center for Nanoscience, LMU, Munich, Germany
| | - Aloys Schepers
- Monoclonal Antibody Core Facility, Institute for Diabetes and Obesity, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | - Fabian Metzger
- Research Unit Protein Science and Metabolomics and Proteomics Core, Helmholtz Centre Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | - Miriam Esgleas
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Pawel Smialowski
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Renzo Guerrini
- Neuroscience Department, Children's Hospital Meyer-University of Florence, Florence, Italy
| | - Sven Falk
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany
| | - Regina Feederle
- Monoclonal Antibody Core Facility, Institute for Diabetes and Obesity, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany.,SYNERGY, Excellence Cluster of Systems Neurology, Biomedical Center, LMU, Planegg-Martinsried, Germany
| | - Saskia Freytag
- Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia.,Department of Medical Biology, University of Melbourne, Melbourne, VIC 3010, Australia
| | - Zefeng Wang
- CAS Key Laboratory of Computational Biology, Biomedical Big Data Center, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.,University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.,CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
| | - Melanie Bahlo
- Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia.,Department of Medical Biology, University of Melbourne, Melbourne, VIC 3010, Australia
| | - Ralf Jungmann
- Max Planck Institute of Biochemistry, Martinsried, Germany.,Faculty of Physics and Center for Nanoscience, LMU, Munich, Germany
| | - Claudia Bagni
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy.,Department of Fundamental Neurosciences, University of Lausanne, Rue du Bugnon 9, 1005 Lausanne, Switzerland
| | | | - Stephen P Robertson
- Department of Women's and Children's Health, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
| | - Stefanie M Hauck
- Research Unit Protein Science and Metabolomics and Proteomics Core, Helmholtz Centre Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | - Magdalena Götz
- Physiological Genomics, Biomedical Center (BMC), Ludwig-Maximilians-Universitaet (LMU), Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,Institute of Stem Cell Research, Helmholtz Center Munich, German Research Center for Environmental Health, Großhaderner Straße 9, 82152 Planegg-Martinsried, Germany.,SYNERGY, Excellence Cluster of Systems Neurology, Biomedical Center, LMU, Planegg-Martinsried, Germany
| |
|
38
|
Cadby G, Giles C, Melton PE, Huynh K, Mellett NA, Duong T, Nguyen A, Cinel M, Smith A, Olshansky G, Wang T, Brozynska M, Inouye M, McCarthy NS, Ariff A, Hung J, Hui J, Beilby J, Dubé MP, Watts GF, Shah S, Wray NR, Lim WLF, Chatterjee P, Martins I, Laws SM, Porter T, Vacher M, Bush AI, Rowe CC, Villemagne VL, Ames D, Masters CL, Taddei K, Arnold M, Kastenmüller G, Nho K, Saykin AJ, Han X, Kaddurah-Daouk R, Martins RN, Blangero J, Meikle PJ, Moses EK. Comprehensive genetic analysis of the human lipidome identifies loci associated with lipid homeostasis with links to coronary artery disease. Nat Commun 2022; 13:3124. [PMID: 35668104 PMCID: PMC9170690 DOI: 10.1038/s41467-022-30875-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 08/31/2021] [Accepted: 05/17/2022] [Indexed: 12/26/2022] Open
Abstract
We integrated lipidomics and genomics to unravel the genetic architecture of lipid metabolism and identify genetic variants associated with lipid species putatively in the mechanistic pathway for coronary artery disease (CAD). We quantified 596 lipid species in serum from 4,492 individuals from the Busselton Health Study. The discovery GWAS identified 3,361 independent lipid-loci associations, involving 667 genomic regions (479 previously unreported), with validation in two independent cohorts. A meta-analysis revealed an additional 70 independent genomic regions associated with lipid species. We identified 134 lipid endophenotypes for CAD associated with 186 genomic loci. Associations between independent lipid-loci with coronary atherosclerosis were assessed in ∼456,000 individuals from the UK Biobank. Of the 53 lipid-loci that showed evidence of association (P < 1 × 10⁻³), 43 loci were associated with at least one lipid endophenotype. These findings illustrate the value of integrative biology to investigate the aetiology of atherosclerosis and CAD, with implications for other complex diseases.
Affiliation(s)
- Gemma Cadby
- School of Population and Global Health, University of Western Australia, Crawley, WA, Australia
| | - Corey Giles
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, VIC, Australia
| | - Phillip E Melton
- School of Population and Global Health, University of Western Australia, Crawley, WA, Australia
- Menzies Research Institute, University of Tasmania, Hobart, TAS, Australia
| | - Kevin Huynh
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, VIC, Australia
| | | | - Thy Duong
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Anh Nguyen
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Michelle Cinel
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Alex Smith
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Gavriel Olshansky
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, VIC, Australia
| | - Tingting Wang
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, VIC, Australia
| | - Marta Brozynska
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Mike Inouye
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Nina S McCarthy
- School of Biomedical Sciences, University of Western Australia, Crawley, WA, Australia
| | - Amir Ariff
- School of Women's and Children's Health, University of New South Wales, Sydney, NSW, Australia
| | - Joseph Hung
- School of Medicine, The University of Western Australia, Crawley, WA, Australia
- Department of Cardiovascular Medicine, Sir Charles Gairdner Hospital, Perth, WA, Australia
- Busselton Population Medical Research Institute Inc., Perth, WA, Australia
| | - Jennie Hui
- Busselton Population Medical Research Institute Inc., Perth, WA, Australia
- PathWest Laboratory Medicine WA, Perth, WA, Australia
| | - John Beilby
- Busselton Population Medical Research Institute Inc., Perth, WA, Australia
- PathWest Laboratory Medicine WA, Perth, WA, Australia
| | - Marie-Pierre Dubé
- Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, QC, Canada
| | - Gerald F Watts
- School of Medicine, The University of Western Australia, Crawley, WA, Australia
- Lipid Disorders Clinic, Department of Cardiology, Royal Perth Hospital, Perth, WA, Australia
| | - Sonia Shah
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Naomi R Wray
- Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
| | - Wei Ling Florence Lim
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Cooperative Research Centre (CRC) for Mental Health, Joondalup, WA, Australia
| | - Pratishtha Chatterjee
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, Australia
- KaRa Institute of Neurological Disease, Sydney, Macquarie Park, NSW, Australia
| | - Ian Martins
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
| | - Simon M Laws
- Centre for Precision Health, Edith Cowan University, Joondalup, WA, Australia
- Collaborative Genomics Group, School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Curtin Health Innovation Research Institute, Curtin University, Perth, WA, Australia
| | - Tenielle Porter
- Centre for Precision Health, Edith Cowan University, Joondalup, WA, Australia
- Collaborative Genomics Group, School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Curtin Health Innovation Research Institute, Curtin University, Perth, WA, Australia
| | - Michael Vacher
- Centre for Precision Health, Edith Cowan University, Joondalup, WA, Australia
- Collaborative Genomics Group, School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- The Australian e-Health Research Centre, Health and Biosecurity, CSIRO, Floreat, WA, Australia
| | - Ashley I Bush
- The Florey Department of Neuroscience and Mental Health, The University of Melbourne, Melbourne, VIC, Australia
| | - Christopher C Rowe
- The Florey Department of Neuroscience and Mental Health, The University of Melbourne, Melbourne, VIC, Australia
- Department of Molecular Imaging and Therapy, Austin Health, Heidelberg, VIC, Australia
| | - Victor L Villemagne
- Department of Molecular Imaging and Therapy, Austin Health, Heidelberg, VIC, Australia
- Department of Medicine, Austin Health, The University of Melbourne, Heidelberg, VIC, Australia
| | - David Ames
- National Ageing Research Institute, Parkville, VIC, Australia
- University of Melbourne Academic Unit for Psychiatry of Old Age, St George's Hospital, Kew, VIC, Australia
| | - Colin L Masters
- The Florey Department of Neuroscience and Mental Health, The University of Melbourne, Melbourne, VIC, Australia
| | - Kevin Taddei
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
| | - Matthias Arnold
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Gabi Kastenmüller
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Kwangsik Nho
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Xianlin Han
- Barshop Institute for Longevity and Aging Studies, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
| | - Rima Kaddurah-Daouk
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
- Duke Institute of Brain Sciences, Duke University, Durham, NC, USA
- Department of Medicine, Duke University, Durham, NC, USA
| | - Ralph N Martins
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Cooperative Research Centre (CRC) for Mental Health, Joondalup, WA, Australia
- Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, Australia
- KaRa Institute of Neurological Disease, Sydney, Macquarie Park, NSW, Australia
| | - John Blangero
- South Texas Diabetes and Obesity Institute, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Peter J Meikle
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, VIC, Australia
- Monash University, Melbourne, VIC, Australia
| | - Eric K Moses
- Menzies Research Institute, University of Tasmania, Hobart, TAS, Australia
- School of Biomedical Sciences, University of Western Australia, Crawley, WA, Australia
| |
|
39
|
Manuck TA, Eaves LA, Rager JE, Sheffield-Abdullah K, Fry RC. Nitric oxide-related gene and microRNA expression in peripheral blood in pregnancy vary by self-reported race. Epigenetics 2022; 17:731-745. [PMID: 34308756 PMCID: PMC9336489 DOI: 10.1080/15592294.2021.1957576] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/23/2021] [Revised: 07/13/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022] Open
Abstract
Adverse pregnancy outcomes disproportionately affect non-Hispanic (NH) Black patients in the United States. Structural racism has been associated with increased psychosocial distress and inflammation and may trigger oxidative stress. Thus, the nitric oxide (NO) pathway (involved in the regulation of inflammation and oxidative stress) may partly explain the underlying disparities in obstetric outcomes. Cohort study of 154 pregnant patients with high-risk obstetric histories; n = 212 mRNAs and n = 108 microRNAs (miRNAs) in the NO pathway were evaluated in circulating white blood cells. NO pathway mRNA and miRNA transcript counts were compared by self-reported race; NH Black patients were compared with women of other races/ethnicities. Finally, miRNA-mRNA expression levels were correlated. Twenty-two genes (q < 0.10) were differentially expressed in self-identified NH Black individuals. Superoxide dismutase 1 (SOD1), interleukin-8 (IL-8), dynein light chain LC8-type 1 (DYNLL1), glutathione peroxidase 4 (GPX4), and glutathione peroxidase 1 (GPX1) were the five most differentially expressed genes among NH Black patients compared to other patients. There were 63 significantly correlated miRNA-mRNA pairs (q < 0.10) demonstrating potential miRNA regulation of associated target mRNA expression. Ten miRNAs that were identified as members of significant miRNA-mRNA pairs were also differentially expressed among NH Black patients (q < 0.10). These findings support an association between NO pathway and inflammation and infection-related mRNA and miRNA expression in blood drawn during pregnancy and patient race/ethnicity. These findings may reflect key differences in the biology of inflammatory gene dysregulation that occurs in response to the stress of systemic racism and that underlies disparities in pregnancy outcomes.
Affiliation(s)
- Tracy A. Manuck
- Department of Obstetrics and Gynecology, Division of Maternal Fetal Medicine, University of North Carolina-Chapel Hill, Chapel Hill, NC, United States
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
| | - Lauren A. Eaves
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
| | - Julia E Rager
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
| | | | - Rebecca C. Fry
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC
| |
|
40
|
Guo Z, Ćevid D, Bühlmann P. Doubly debiased lasso: High-dimensional inference under hidden confounding. Ann Stat 2022; 50:1320-1347. [DOI: 10.1214/21-aos2152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Zijian Guo
- Department of Statistics, Rutgers University
| | | | | |
|
41
|
Nan Y, Ser JD, Walsh S, Schönlieb C, Roberts M, Selby I, Howard K, Owen J, Neville J, Guiot J, Ernst B, Pastor A, Alberich-Bayarri A, Menzel MI, Walsh S, Vos W, Flerin N, Charbonnier JP, van Rikxoort E, Chatterjee A, Woodruff H, Lambin P, Cerdá-Alberich L, Martí-Bonmatí L, Herrera F, Yang G. Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2022; 82:99-122. [PMID: 35664012 PMCID: PMC8878813 DOI: 10.1016/j.inffus.2022.01.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 10/24/2021] [Revised: 12/22/2021] [Accepted: 01/07/2022] [Indexed: 05/13/2023]
Abstract
Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed and the limitations of different methods have been surveyed for future research.
Affiliation(s)
- Yang Nan
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Javier Del Ser
- Department of Communications Engineering, University of the Basque Country UPV/EHU, Bilbao 48013, Spain
- TECNALIA, Basque Research and Technology Alliance (BRTA), Derio 48160, Spain
| | - Simon Walsh
- National Heart and Lung Institute, Imperial College London, London, UK
| | - Carola Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | - Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Oncology R&D, AstraZeneca, Cambridge, UK
| | - Ian Selby
- Department of Radiology, University of Cambridge, Cambridge, UK
| | - Kit Howard
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
| | - John Owen
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
| | - Jon Neville
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
| | - Julien Guiot
- University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
- University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
| | - Benoit Ernst
- University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
- University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
| | | | | | - Marion I. Menzel
- Technische Hochschule Ingolstadt, Ingolstadt, Germany
- GE Healthcare GmbH, Munich, Germany
| | - Sean Walsh
- Radiomics (Oncoradiomics SA), Liège, Belgium
| | - Wim Vos
- Radiomics (Oncoradiomics SA), Liège, Belgium
| | - Nina Flerin
- Radiomics (Oncoradiomics SA), Liège, Belgium
| | | | | | - Avishek Chatterjee
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
| | - Henry Woodruff
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
| | - Philippe Lambin
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
| | - Leonor Cerdá-Alberich
- Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
| | - Luis Martí-Bonmatí
- Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
| | - Francisco Herrera
- Department of Computer Sciences and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI) University of Granada, Granada, Spain
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Guang Yang
- National Heart and Lung Institute, Imperial College London, London, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
| |
|
42
|
Epigenome-wide association analyses of active injection drug use. Drug Alcohol Depend 2022; 235:109431. [PMID: 35395503 DOI: 10.1016/j.drugalcdep.2022.109431] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Revised: 02/28/2022] [Accepted: 03/21/2022] [Indexed: 11/20/2022]
Abstract
BACKGROUND Injection drug use (IDU) is prevalent in the US and is associated with substantial risk of blood-borne infections, morbidity, and mortality. However, the spectrum of its biologic effects on DNA methylation in blood is not well characterized. METHODS 401 participants (mean age = 47.9; 68% male; 90% African American) over several timepoints (1054 visits) were drawn from a longitudinal cohort of people who inject drugs. DNA methylation was measured among buffy coat samples from the 1054 visits. Compared to samples collected after ≥ 6 months of abstinence, separate EWAS were conducted for active injecting of any drug, quantitative injection frequency, injecting of heroin and injecting of cocaine. Linear mixed effect models were used and analyses were adjusted for repeated measurements and key technical, biological, and sociodemographic characteristics. RESULTS We found epigenome-wide significant CpG sites associated with active injection (cg10636246, AIM2, p = 2.33 × 10⁻⁸) and injection intensity (cg13117953, p = 4.30 × 10⁻⁸). We found converging evidence that cg10636246 (AIM2), cg23110600 (PRKCH), cg03546163 (FKBP5), cg04590956 (GMCL1), and cg16317961 (MAPRE2) were among the top 0.1% significantly differentially methylated CpG sites shared across the five EWAS. Top ranked CpGs among the five EWAS were enriched (p < 0.0001) in AIM2 inflammasome complex, T cell migration, insulin regulation and epinephrine synthesis pathways. During periods of active injection, samples had 0.46 years of epigenetic age acceleration relative to the abstinence period, within the same subject (p = 0.03). CONCLUSIONS Findings from this study demonstrate modest, common, and specific effects on DNA methylation during a relatively short time between periods of active drug injection and abstinence.
|
43
|
Controlling Batch Effect in Epigenome-Wide Association Study. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2432:73-84. [PMID: 35505208 DOI: 10.1007/978-1-0716-1994-0_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Methylation data, similar to other omics data, is susceptible to various technical issues that are potentially associated with unexplained or unrelated factors. Any difference in the measurement of DNA methylation, such as laboratory operation and sequencing platform, may lead to batch effects. With the accumulation of large-scale omics data, scientists are making joint efforts to generate and analyze omics data to answer various scientific questions. However, batch effects are inevitable in practice, and careful adjustment is needed. Multiple statistical methods for controlling bias and inflation between batches have been developed either by correcting based on known batch factors or by estimating directly from the output data. In this chapter, we will review and demonstrate several popular methods for batch effect correction and make practical recommendations in epigenome-wide association studies (EWAS).
|
44
|
Bing X, Ning Y, Xu Y. Adaptive estimation in multivariate response regression with hidden variables. Ann Stat 2022. [DOI: 10.1214/21-aos2059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Xin Bing
- Department of Statistics and Data Science, Cornell University
| | - Yang Ning
- Department of Statistics and Data Science, Cornell University
| | - Yaosheng Xu
- Department of Statistics and Data Science, Cornell University
| |
|
45
|
C Monte-Rubio G, Segura B, P Strafella A, van Eimeren T, Ibarretxe-Bilbao N, Diez-Cirarda M, Eggers C, Lucas-Jiménez O, Ojeda N, Peña J, Ruppert MC, Sala-Llonch R, Theis H, Uribe C, Junque C. Parameters from site classification to harmonize MRI clinical studies: Application to a multi-site Parkinson's disease dataset. Hum Brain Mapp 2022; 43:3130-3142. [PMID: 35305545 PMCID: PMC9188966 DOI: 10.1002/hbm.25838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2021] [Revised: 02/10/2022] [Accepted: 03/07/2022] [Indexed: 11/10/2022] Open
Abstract
Multi‐site MRI datasets are crucial for big data research. However, neuroimaging studies must face the batch effect. Here, we propose an approach that uses the predictive probabilities provided by Gaussian processes (GPs) to harmonize clinical‐based studies. A multi‐site dataset of 216 Parkinson's disease (PD) patients and 87 healthy subjects (HS) was used. We performed a site GP classification using MRI data. The outcomes estimated from this classification, redefined as Weighted HARMonization PArameters (WHARMPA), were used as regressors in two different clinical studies: a PD versus HS machine learning classification using GP, and a VBM comparison (FWE‐p < .05, k = 100). The same studies were also conducted using conventional Boolean site covariates, and without information about site membership. The site GP classification provided high scores: balanced accuracy (BAC) was 98.39% for grey matter images. PD versus HS classification performed better when the WHARMPA were used to harmonize (BAC = 78.60%; AUC = 0.90) than when using the Boolean site information (BAC = 56.31%; AUC = 0.71) and without it (BAC = 57.22%; AUC = 0.73). The VBM analysis harmonized using WHARMPA provided larger and more statistically robust clusters in regions previously reported in PD than when the Boolean site covariates or no corrections were added to the model. In conclusion, WHARMPA might encode global site‐effects quantitatively and allow the harmonization of data. This method is user‐friendly and provides a powerful solution, without complex implementations, to clean the analyses by removing variability associated with the differences between sites.
Affiliation(s)
- Gemma C Monte-Rubio
- Institute of Neurosciences, University of Barcelona, Barcelona, Catalonia, Spain
- Medical Psychology Unit, Department of Medicine, University of Barcelona, Barcelona, Catalonia, Spain
| | - Barbara Segura
- Institute of Neurosciences, University of Barcelona, Barcelona, Catalonia, Spain
- Medical Psychology Unit, Department of Medicine, University of Barcelona, Barcelona, Catalonia, Spain
- Institute of Biomedical Research August Pi i Sunyer (IDIBAPS), Barcelona, Catalonia, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED: CB06/05/0018-ISCIII) Barcelona, Barcelona, Catalonia, Spain
| | - Antonio P Strafella
- Edmond J. Safra Parkinson Disease Program & Morton and Gloria Shulman Movement Disorder Unit, Neurology Division, University Health Network, University of Toronto, Toronto, Ontario, Canada
- Krembil Brain Institute, University Health Network, University of Toronto, Toronto, Ontario, Canada
- Brain Health Imaging Centre, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health University of Toronto, Toronto, Ontario, Canada
| | - Thilo van Eimeren
- Department of Nuclear Medicine, University of Cologne, Cologne, Germany
- Department of Neurology, University of Cologne, Cologne, Germany
| | - Naroa Ibarretxe-Bilbao
- Department of Psychology, Faculty of Health Sciences, University of Deusto, Bilbao, Spain
| | - Maria Diez-Cirarda
- Brain Health Imaging Centre, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health University of Toronto, Toronto, Ontario, Canada
| | - Carsten Eggers
- Department of Neurology, University Hospital Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior - CMBB, Universities Marburg and Gießen, Marburg and Gießen, Germany
- Department of Neurology, Knappschaftskrankenhaus Bottrop, Bottrop, Germany
| | - Olaia Lucas-Jiménez
- Department of Psychology, Faculty of Health Sciences, University of Deusto, Bilbao, Spain
| | - Natalia Ojeda
- Department of Psychology, Faculty of Health Sciences, University of Deusto, Bilbao, Spain
| | - Javier Peña
- Department of Psychology, Faculty of Health Sciences, University of Deusto, Bilbao, Spain
| | - Marina C Ruppert
- Department of Neurology, University Hospital Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior - CMBB, Universities Marburg and Gießen, Marburg and Gießen, Germany
| | - Roser Sala-Llonch
- Institute of Neurosciences, University of Barcelona, Barcelona, Catalonia, Spain
- Institute of Biomedical Research August Pi i Sunyer (IDIBAPS), Barcelona, Catalonia, Spain
- Department of Biomedicine, University of Barcelona, Barcelona, Catalonia, Spain
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Barcelona, Catalonia, Spain
| | - Hendrik Theis
- Department of Neurology, University of Cologne, Cologne, Germany
| | - Carme Uribe
- Institute of Neurosciences, University of Barcelona, Barcelona, Catalonia, Spain
- Medical Psychology Unit, Department of Medicine, University of Barcelona, Barcelona, Catalonia, Spain
- Institute of Biomedical Research August Pi i Sunyer (IDIBAPS), Barcelona, Catalonia, Spain
- Brain Health Imaging Centre, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health University of Toronto, Toronto, Ontario, Canada
| | - Carme Junque
- Institute of Neurosciences, University of Barcelona, Barcelona, Catalonia, Spain
- Medical Psychology Unit, Department of Medicine, University of Barcelona, Barcelona, Catalonia, Spain
- Institute of Biomedical Research August Pi i Sunyer (IDIBAPS), Barcelona, Catalonia, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED: CB06/05/0018-ISCIII) Barcelona, Barcelona, Catalonia, Spain
| |
|
46
|
Bhattacharya A, Freedman AN, Avula V, Harris R, Liu W, Pan C, Lusis AJ, Joseph RM, Smeester L, Hartwell HJ, Kuban KCK, Marsit CJ, Li Y, O'Shea TM, Fry RC, Santos HP. Placental genomics mediates genetic associations with complex health traits and disease. Nat Commun 2022; 13:706. [PMID: 35121757 PMCID: PMC8817049 DOI: 10.1038/s41467-022-28365-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/29/2021] [Accepted: 12/15/2021] [Indexed: 01/09/2023] Open
Abstract
As the master regulator in utero, the placenta is core to the Developmental Origins of Health and Disease (DOHaD) hypothesis but is historically understudied. To identify placental gene-trait associations (GTAs) across the life course, we perform distal mediator-enriched transcriptome-wide association studies (TWAS) for 40 traits, integrating placental multi-omics from the Extremely Low Gestational Age Newborn Study. At [Formula: see text], we detect 248 GTAs, mostly for neonatal and metabolic traits, across 176 genes, enriched for cell growth and immunological pathways. In aggregate, genetic effects mediated by placental expression significantly explain 4 early-life traits but no later-in-life traits. 89 GTAs show significant mediation through distal genetic variants, identifying hypotheses for distal regulation of GTAs. Investigation of one hypothesis in human placenta-derived choriocarcinoma cells reveals that knockdown of mediator gene EPS15 upregulates predicted targets SPATA13 and FAM214A, both associated with waist-hip ratio in TWAS, and multiple genes involved in metabolic pathways. These results suggest profound health impacts of placental genomic regulation in developmental programming across the life course.
Affiliation(s)
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Anastasia N Freedman
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Vennela Avula
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Rebeca Harris
- Biobehavioral Laboratory, School of Nursing, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Weifang Liu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Calvin Pan
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Aldons J Lusis
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
| | - Robert M Joseph
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA, 02118, USA
| | - Lisa Smeester
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Hadley J Hartwell
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Karl C K Kuban
- Department of Pediatrics, Division of Pediatric Neurology, Boston University Medical Center, Boston, MA, 02118, USA
| | - Carmen J Marsit
- Gangarosa Department of Environmental Health, Rollins School of Public Health Emory University, Atlanta, GA, 30322, USA
| | - Yun Li
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27514, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - T Michael O'Shea
- Department of Pediatrics, School of Medicine, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Rebecca C Fry
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina, Chapel Hill, NC, 27514, USA
| | - Hudson P Santos
- Biobehavioral Laboratory, School of Nursing, University of North Carolina, Chapel Hill, NC, 27514, USA
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
| |
|
47
|
Payne NY, Gagnon-Bartsch JA. Separating and reintegrating latent variables to improve classification of genomic data. Biostatistics 2022; 23:1133-1149. [DOI: 10.1093/biostatistics/kxab046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2021] [Revised: 11/09/2021] [Accepted: 11/24/2021] [Indexed: 11/12/2022] Open
Abstract
Summary
Genomic data sets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes), giving rise to dense latent variation. This latent variation presents both challenges and opportunities for classification. While some of these latent variables may be partially correlated with the phenotype of interest and thus helpful, others may be uncorrelated and merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. To address these challenges, we propose the cross-residualization classifier (CRC). Through an adjustment and ensemble procedure, the CRC estimates and residualizes out the latent variation, trains a classifier on the residuals, and then reintegrates the latent variation in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information. We apply the method to simulated data and a variety of genomic data sets from multiple platforms. In general, we find that the CRC performs well relative to existing classifiers and sometimes offers substantial gains.
Affiliation(s)
- Nora Yujia Payne
- Department of Statistics, University of Michigan, 1085 S. University Ave., Ann Arbor, MI 48109, USA
| | - Johann A Gagnon-Bartsch
- Department of Statistics, University of Michigan, 1085 S. University Ave., Ann Arbor, MI 48109, USA
| |
|
48
|
Miao W, Hu W, Ogburn EL, Zhou X. Identifying effects of multiple treatments in the presence of unmeasured confounding. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2021.2023551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Wang Miao
- Department of Probability and Statistics, Peking University, Beijing, PRC
| | - Wenjie Hu
- Department of Probability and Statistics, Peking University, Beijing, PRC
| | - Elizabeth L. Ogburn
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Xiaohua Zhou
- Department of Biostatistics and Beijing International Center for Mathematical Research, Peking University, Beijing, PRC
| |
|
49
|
McKennan C, Nicolae D. Estimating and accounting for unobserved covariates in high-dimensional correlated data. J Am Stat Assoc 2022; 117:225-236. [PMID: 35615339 PMCID: PMC9126075 DOI: 10.1080/01621459.2020.1769635] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/03/2023]
Abstract
Many high-dimensional and high-throughput biological datasets have complex sample correlation structures, which include longitudinal and multiple tissue data, as well as data with multiple treatment conditions or related individuals. These data, as well as nearly all high-throughput 'omic' data, are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely obfuscate estimation of and inference on the effects of interest. We therefore developed CBCV and CorrConf: provably accurate and computationally efficient methods to choose the number of and estimate latent confounding factors present in high-dimensional data with correlated or nonexchangeable residuals. We demonstrate each method's superior performance compared to other state-of-the-art methods by analyzing simulated multi-tissue gene expression data and identifying sex-associated DNA methylation sites in a real, longitudinal twin study.
Affiliation(s)
| | - Dan Nicolae
- Department of Statistics, University of Chicago
| |
|
50
|
Cho-Clark MJ, Sukumar G, Vidal NM, Raiciulescu S, Oyola MG, Olsen C, Mariño-Ramírez L, Dalgard CL, Wu TJ. Comparative transcriptome analysis between patient and endometrial cancer cell lines to determine common signaling pathways and markers linked to cancer progression. Oncotarget 2021; 12:2500-2513. [PMID: 34966482 PMCID: PMC8711572 DOI: 10.18632/oncotarget.28161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2021] [Accepted: 12/10/2021] [Indexed: 01/08/2023] Open
Abstract
The rising incidence and mortality of endometrial cancer (EC) in the United States call for an improved understanding of the disease's progression. Current methodologies for diagnosis and treatment rely on the use of cell lines as models for tumor biology. However, due to inherent heterogeneity and differential growing environments between cell lines and tumors, these comparative studies have found few parallels in molecular signatures. As a consequence, the development and discovery of preclinical models and reliable drug targets are delayed. In this study, we established transcriptome parallels between cell lines and tumors from The Cancer Genome Atlas (TCGA) with the use of optimized normalization methods. We identified genes and signaling pathways associated with regulating the transformation and progression of EC. Specifically, the LXR/RXR activation, neuroprotective role for THOP1 in Alzheimer's disease, and glutamate receptor signaling pathways were observed to be mostly downregulated in advanced cancer stages. While some of these highlighted markers and signaling pathways are commonly found in the central nervous system (CNS), our results suggest a novel function of these genes in the periphery. Finally, our study underscores the value of implementing appropriate normalization methods in comparative studies to improve the identification of accurate and reliable markers.
Affiliation(s)
- Madelaine J. Cho-Clark
- Department of Gynecologic Surgery & Obstetrics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - Gauthaman Sukumar
- Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - Newton Medeiros Vidal
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Sorana Raiciulescu
- Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - Mario G. Oyola
- Department of Gynecologic Surgery & Obstetrics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - Cara Olsen
- Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - Leonardo Mariño-Ramírez
- National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, MD 20814, USA
| | - Clifton L. Dalgard
- Collaborative Health Initiative Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
- Department of Anatomy, Physiology and Genetics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - T. John Wu
- Department of Gynecologic Surgery & Obstetrics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| |
|