1
Oliveira DVNP, Biskup E, O'Rourke CJ, Hentze JL, Andersen JB, Høgdall C, Høgdall EV. Developing a DNA Methylation Signature to Differentiate High-Grade Serous Ovarian Carcinomas from Benign Ovarian Tumors. Mol Diagn Ther 2024; 28:821-834. [PMID: 39414761 PMCID: PMC11512855 DOI: 10.1007/s40291-024-00740-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/08/2024] [Indexed: 10/18/2024]
Abstract
INTRODUCTION: Epithelial ovarian cancer (EOC) represents a significant health challenge, with high-grade serous ovarian cancer (HGSOC) being the most common subtype. Early detection is hindered by nonspecific symptoms, leading to late-stage diagnoses and poor survival rates. Biomarkers are crucial for early diagnosis and personalized treatment. OBJECTIVE: Our goal was to develop a robust statistical procedure to identify a set of differentially methylated probes (DMPs) that would allow differentiation between HGSOC and benign ovarian tumors. METHODOLOGY: Using the Infinium EPIC Methylation array, we analyzed the methylation profiles of 48 ovarian samples diagnosed with HGSOC, borderline ovarian tumors, or benign ovarian disease. Through a multi-step statistical procedure combining univariate and multivariate logistic regression models, we aimed to identify CpG sites of interest. RESULTS AND CONCLUSIONS: We discovered 21 DMPs and developed a predictive model that was validated in two independent cohorts. Our model, using a distance-to-centroid approach, accurately distinguished between benign and malignant disease. This model can potentially be used in other types of sample material. Moreover, the model development and validation strategy can also be used in other disease contexts for diagnostic purposes.
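The abstract names a distance-to-centroid approach without giving details. A minimal sketch of one common form of that idea (nearest-centroid classification with Euclidean distance) follows. The probe values, the labels, and the function names `fit_centroids` and `predict` are hypothetical illustrations, not the authors' model or data.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_centroids(samples, labels):
    """Compute one centroid per class label."""
    grouped = {}
    for vec, lab in zip(samples, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: centroid(rows) for lab, rows in grouped.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda lab: euclidean(centroids[lab], vec))

# Toy methylation beta values for three made-up probes (assumption, not study data).
train = [
    [0.10, 0.20, 0.15],
    [0.12, 0.18, 0.20],
    [0.80, 0.75, 0.90],
    [0.85, 0.70, 0.88],
]
labels = ["benign", "benign", "malignant", "malignant"]

centroids = fit_centroids(train, labels)
prediction = predict(centroids, [0.78, 0.72, 0.80])
print(prediction)
```

A nearest-centroid rule has no tuning parameters beyond the choice of distance, which may be one reason such models transfer reasonably well to small validation cohorts; whether the published model uses Euclidean distance is not stated in the abstract.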
Affiliation(s)
- Edyta Biskup
- Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark
- Colm J O'Rourke
- Biotech Research & Innovation Centre, Department of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Julie L Hentze
- Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark
- Jesper B Andersen
- Biotech Research & Innovation Centre, Department of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Claus Høgdall
- Department of Gynecology, Juliane Marie Centre, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Estrid V Høgdall
- Department of Pathology, Herlev Hospital, University of Copenhagen, Herlev, Denmark.
2
Yu Y, Mai Y, Zheng Y, Shi L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol 2024; 25:254. [PMID: 39363244 PMCID: PMC11447944 DOI: 10.1186/s13059-024-03401-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/23/2024] [Indexed: 10/05/2024] Open
Abstract
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Cancer Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
3
Hanna MG, Olson NH, Zarella M, Dash RC, Herrmann MD, Furtado LV, Stram MN, Raciti PM, Hassell L, Mays A, Pantanowitz L, Sirintrapun JS, Krishnamurthy S, Parwani A, Lujan G, Evans A, Glassy EF, Bui MM, Singh R, Souers RJ, de Baca ME, Seheult JN. Recommendations for Performance Evaluation of Machine Learning in Pathology: A Concept Paper From the College of American Pathologists. Arch Pathol Lab Med 2024; 148:e335-e361. [PMID: 38041522 DOI: 10.5858/arpa.2023-0042-cp] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/11/2023] [Indexed: 12/03/2023]
Abstract
CONTEXT.— Machine learning applications in the pathology clinical domain are emerging rapidly. As decision support systems continue to mature, laboratories will increasingly need guidance to evaluate their performance in clinical practice. Currently there are no formal guidelines to assist pathology laboratories in verification and/or validation of such systems. These recommendations are being proposed for the evaluation of machine learning systems in the clinical practice of pathology. OBJECTIVE.— To propose recommendations for performance evaluation of in vitro diagnostic tests on patient samples that incorporate machine learning as part of the preanalytical, analytical, or postanalytical phases of the laboratory workflow. Topics described include considerations for machine learning model evaluation including risk assessment, predeployment requirements, data sourcing and curation, verification and validation, change control management, human-computer interaction, practitioner training, and competency evaluation. DATA SOURCES.— An expert panel performed a review of the literature, Clinical and Laboratory Standards Institute guidance, and laboratory and government regulatory frameworks. CONCLUSIONS.— Review of the literature and existing documents enabled the development of proposed recommendations. This white paper pertains to performance evaluation of machine learning systems intended to be implemented for clinical patient testing. Further studies with real-world clinical data are encouraged to support these proposed recommendations. Performance evaluation of machine learning models is critical to verification and/or validation of in vitro diagnostic tests using machine learning intended for clinical practice.
Affiliation(s)
- Matthew G Hanna
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna, Sirintrapun)
- Niels H Olson
- The Defense Innovation Unit, Mountain View, California (Olson)
- The Department of Pathology, Uniformed Services University, Bethesda, Maryland (Olson)
- Mark Zarella
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Zarella, Seheult)
- Rajesh C Dash
- Department of Pathology, Duke University Health System, Durham, North Carolina (Dash)
- Markus D Herrmann
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston (Herrmann)
- Larissa V Furtado
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee (Furtado)
- Michelle N Stram
- The Department of Forensic Medicine, New York University, and Office of Chief Medical Examiner, New York (Stram)
- Lewis Hassell
- Department of Pathology, Oklahoma University Health Sciences Center, Oklahoma City (Hassell)
- Alex Mays
- The MITRE Corporation, McLean, Virginia (Mays)
- Liron Pantanowitz
- Department of Pathology & Clinical Labs, University of Michigan, Ann Arbor (Pantanowitz)
- Joseph S Sirintrapun
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna, Sirintrapun)
- Anil Parwani
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus (Parwani, Lujan)
- Giovanni Lujan
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus (Parwani, Lujan)
- Andrew Evans
- Laboratory Medicine, Mackenzie Health, Toronto, Ontario, Canada (Evans)
- Eric F Glassy
- Affiliated Pathologists Medical Group, Rancho Dominguez, California (Glassy)
- Marilyn M Bui
- Departments of Pathology and Machine Learning, Moffitt Cancer Center, Tampa, Florida (Bui)
- Rajendra Singh
- Department of Dermatopathology, Summit Health, Summit Woodland Park, New Jersey (Singh)
- Rhona J Souers
- Department of Biostatistics, College of American Pathologists, Northfield, Illinois (Souers)
- Jansen N Seheult
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Zarella, Seheult)
4
Marín-Miret J, Pérez-Cobas AE, Domínguez-Santos R, Pérez-Rocher B, Latorre A, Moya A. Adaptability of the gut microbiota of the German cockroach Blattella germanica to a periodic antibiotic treatment. Microbiol Res 2024; 287:127863. [PMID: 39106785 DOI: 10.1016/j.micres.2024.127863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 07/20/2024] [Accepted: 07/29/2024] [Indexed: 08/09/2024]
Abstract
High-throughput sequencing studies have shown that diet or antimicrobial treatments impact animal gut microbiota equilibrium. However, properties related to the gut microbial ecosystem stability, such as resilience, resistance, or functional redundancy, must be better understood. To shed light on these ecological processes, we combined advanced statistical methods with 16S rRNA gene sequencing, functional prediction, and fitness analyses in the gut microbiota of the cockroach Blattella germanica subject to three periodic pulses of the antibiotic (AB) kanamycin (n=512). We first confirmed that AB did not significantly affect cockroaches' biological fitness, and gut microbiota changes were not caused by insect physiology alterations. The sex variable was examined for the first time in this species, and no statistical differences in the gut microbiota diversity or composition were found. The comparison of the gut microbiota dynamics in control and treated populations revealed that (1) AB treatment decreases diversity and completely disrupts the co-occurrence networks between bacteria, significantly altering the gut community structure. (2) Although AB also affected the genetic composition, functional redundancy would explain a smaller effect on the functional potential than on the taxonomic composition. (3) As predicted by Taylor's law, AB generally affected the most abundant taxa to a lesser extent than the less abundant taxa. (4) Taxa follow different trends in response to ABs, highlighting "resistant taxa," which could be critical for community restoration. (5) The gut microbiota recovered faster after the three AB pulses, suggesting that gut microbiota adapts to repeated treatments.
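Point (3) above invokes Taylor's law, the empirical power-law relation between the variance and the mean of a taxon's abundance, V = a·M^b. A minimal sketch of how the exponent can be estimated by least squares on log-transformed values is shown below. The abundance figures are synthetic and the function name `fit_taylor` is hypothetical; the abstract does not describe the authors' fitting procedure.

```python
import math

def fit_taylor(means, variances):
    """Fit V = a * M**b by ordinary least squares on log(M), log(V)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope of the log-log regression is the Taylor exponent b.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    # Intercept gives log(a).
    a = math.exp(my - b * mx)
    return a, b

# Synthetic per-taxon mean abundances and variances generated with a=2, b=1.5.
means = [1.0, 10.0, 100.0, 1000.0]
variances = [2.0 * m ** 1.5 for m in means]

a, b = fit_taylor(means, variances)
print(round(a, 3), round(b, 3))
```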
Affiliation(s)
- Jesús Marín-Miret
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish Research Council, Paterna, Valencia 46980, Spain; Genomic and Health Area, Foundation for the Promotion of Sanitary and Biomedical Research of the Valencia Region, Valencia 46020, Spain
- Ana Elena Pérez-Cobas
- Department of Microbiology, Ramón y Cajal Institute for Health Research (IRYCIS), Ramón y Cajal University Hospital, Madrid, Spain; CIBER in Infectious Diseases (CIBERINFEC), Madrid, Spain
- Rebeca Domínguez-Santos
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish Research Council, Paterna, Valencia 46980, Spain
- Benjamí Pérez-Rocher
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish Research Council, Paterna, Valencia 46980, Spain
- Amparo Latorre
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish Research Council, Paterna, Valencia 46980, Spain; Genomic and Health Area, Foundation for the Promotion of Sanitary and Biomedical Research of the Valencia Region, Valencia 46020, Spain
- Andrés Moya
- Institute for Integrative Systems Biology (I2SysBio), University of Valencia and Spanish Research Council, Paterna, Valencia 46980, Spain; Genomic and Health Area, Foundation for the Promotion of Sanitary and Biomedical Research of the Valencia Region, Valencia 46020, Spain.
5
Shabani M, Eghbali M, Abiri A, Abiri M. Comprehensive microarray analysis of severe preeclampsia placenta to identify differentially expressed genes, biological pathways, hub genes, and their related non-coding RNAs. Placenta 2024; 155:22-31. [PMID: 39121584 DOI: 10.1016/j.placenta.2024.08.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/03/2024] [Accepted: 08/05/2024] [Indexed: 08/12/2024]
Abstract
INTRODUCTION: Preeclampsia (PE) is a serious pregnancy-related complication caused by high blood pressure in pregnant women. The severe form has more devastating effects. According to the growing evidence, the placenta is a crucial component in the pathogenesis of PE, and eliminating it will alleviate symptoms. METHODS: Four severe preeclampsia placenta microarray datasets from GEO (GSE147776, GSE66273, GSE102897, and GSE10588) were chosen to identify differentially expressed genes (DEGs) in different biological pathways. Hub genes and related non-coding RNAs were also analyzed. RESULTS: A total of 347 DEGs with adjusted p-value < 0.05 and |log2FoldChange| > 0.5 were discovered between severe PE cases and healthy pregnancies, including 204 over-expressed genes and 143 under-expressed genes. The MCC method identified ISG15, IFI44L, MX2, OAS2, MX1, FN1, LDHA, ITGB3, TKT, and HK2 as the top ten hub genes. Interactions between hub genes and noncoding RNAs were also examined. The most enriched pathways were as follows: HIF-1 signaling pathway; Pathways in cancer; Alanine, aspartate and glutamate metabolism; Arginine biosynthesis; Human papillomavirus infection; Glycolysis/Gluconeogenesis; Central carbon metabolism in cancer; Valine, leucine and isoleucine degradation; Cysteine and methionine metabolism; and Galactose metabolism. DISCUSSION: This is a secondary data analysis conducted on severe preeclampsia placenta to identify differentially expressed genes, biological pathways, hub genes, and related noncoding RNAs. Functional studies are crucial to understanding the precise role of these genes in the pathogenesis of PE. Also, accepting a gene as a diagnostic or prognostic marker for early diagnosis and management of PE requires multiple lines of evidence.
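The DEG selection rule stated in the results (adjusted p-value below 0.05 and absolute log2 fold change above 0.5) can be sketched as a simple filter. The gene names and values below are invented placeholders, and `filter_degs` is a hypothetical helper, not part of any tool the authors name.

```python
def filter_degs(results, padj_cutoff=0.05, lfc_cutoff=0.5):
    """Split genes into over- and under-expressed sets.

    `results` maps gene name -> (log2 fold change, adjusted p-value).
    A gene passes when padj < padj_cutoff and |log2FC| > lfc_cutoff.
    """
    up, down = [], []
    for gene, (lfc, padj) in results.items():
        if padj < padj_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return sorted(up), sorted(down)

# Placeholder statistics (assumption): not taken from the GEO datasets.
toy = {
    "GENE_A": (1.2, 0.001),   # passes, over-expressed
    "GENE_B": (-0.9, 0.01),   # passes, under-expressed
    "GENE_C": (0.3, 0.0001),  # fails the fold-change cutoff
    "GENE_D": (2.0, 0.2),     # fails the adjusted p-value cutoff
}

up, down = filter_degs(toy)
print(up, down)
```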
Affiliation(s)
- Maedeh Shabani
- Department of Medical Genetics, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Maryam Eghbali
- Endocrine Research Center, Institute of Endocrinology and Metabolism, Iran University of Medical Sciences, Tehran, Iran
- Ameneh Abiri
- Perinatology Department, Arash Women's Hospital, Tehran University of Medical Sciences, Tehran, Iran.
- Maryam Abiri
- Department of Medical Genetics, School of Medicine, Iran University of Medical Sciences, Tehran, Iran; Shahid Akbarabadi Clinical Research Development Unit (ShACRDU), Iran University of Medical Sciences, Tehran, Iran.
6
Hui HWH, Kong W, Goh WWB. Thinking points for effective batch correction on biomedical data. Brief Bioinform 2024; 25:bbae515. [PMID: 39397427 PMCID: PMC11471903 DOI: 10.1093/bib/bbae515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 09/11/2024] [Accepted: 10/01/2024] [Indexed: 10/15/2024] Open
Abstract
Batch effects introduce significant variability into high-dimensional data, complicating accurate analysis and leading to potentially misleading conclusions if not adequately addressed. Despite technological and algorithmic advancements in biomedical research, effectively managing batch effects remains a complex challenge requiring comprehensive considerations. This paper underscores the necessity of a flexible and holistic approach for selecting batch effect correction algorithms (BECAs), advocating for proper BECA evaluations and consideration of artificial intelligence-based strategies. We also discuss key challenges in batch effect correction, including the importance of uncovering hidden batch factors and understanding the impact of design imbalance, missing values, and aggressive correction. Our aim is to provide researchers with a robust framework for effective batch effects management and enhancing the reliability of high-dimensional data analyses.
Affiliation(s)
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 59 Nanyang Dr, Singapore 636921, Singapore
- Center of AI in Medicine, Nanyang Technological University, 59 Nanyang Dr, Singapore 636921, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, Burlington Danes, The Hammersmith Hospital, Du Cane Road, London W12 0NN, United Kingdom
7
Li Z, Katz S, Saccenti E, Fardo DW, Claes P, Martins dos Santos VAP, Van Steen K, Roshchupkin GV. Novel multi-omics deconfounding variational autoencoders can obtain meaningful disease subtyping. Brief Bioinform 2024; 25:bbae512. [PMID: 39413796 PMCID: PMC11483139 DOI: 10.1093/bib/bbae512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 09/20/2024] [Accepted: 09/25/2024] [Indexed: 10/18/2024] Open
Abstract
Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially with the abundance of large-scale multi-omics data. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders (external factors unrelated to the condition, e.g. batch effect or age) on clustering is often overlooked, introducing bias and spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies employed include (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, prove insufficient in confounder removal. In summary, our study contributes by proposing innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine.
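Strategy (i) in the abstract, removing latent features correlated with confounders, can be illustrated with a small post-hoc filter. This is only a sketch under stated assumptions: the latent matrix and the age confounder are toy values, the 0.5 Pearson threshold is arbitrary, and `drop_confounded_dims` is a hypothetical name rather than the authors' implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_confounded_dims(latent, confounder, threshold=0.5):
    """Keep latent dimensions whose |correlation| with the confounder is below threshold.

    `latent` is a list of per-sample embedding vectors; returns the kept column
    indices and the reduced embedding.
    """
    columns = list(zip(*latent))
    keep = [j for j, col in enumerate(columns)
            if abs(pearson(col, confounder)) < threshold]
    reduced = [[row[j] for j in keep] for row in latent]
    return keep, reduced

# Toy embedding for four samples (assumption): dimension 1 tracks age exactly.
latent = [
    [0.5, 1.0, 0.2],
    [-0.5, 2.0, 0.1],
    [-0.5, 3.0, 0.1],
    [0.5, 4.0, 0.2],
]
age = [30, 40, 50, 60]

keep, reduced = drop_confounded_dims(latent, age)
print(keep)
```

Clustering would then run on `reduced` instead of the full embedding. A linear correlation filter of this kind cannot detect nonlinear confounding, which is consistent with the abstract evaluating several alternative strategies.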
Affiliation(s)
- Zuqi Li
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium
- BIO3 - Laboratory for Systems Genetics, GIGA Molecular & Computational Biology, University of Liège, Avenue de l'Hôpital 11, 4000 Liège, Belgium
- Sonja Katz
- Department of Radiology and Nuclear Medicine, Erasmus MC, Dr. Molewaterplein 40, 3015 GD Rotterdam, Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, PO Box 8033, 6700 EJ Wageningen, Netherlands
- LifeGlimmer GmbH, Markelstraße 38, 12163 Berlin, Germany
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, PO Box 8033, 6700 EJ Wageningen, Netherlands
- David W Fardo
- Department of Biostatistics, University of Kentucky, 111 Washington Avenue, Lexington, KY 40536, United States
- Sanders-Brown Center on Aging, University of Kentucky, 789 S Limestone, Lexington, KY 40536, United States
- Peter Claes
- Medical Imaging Research Center, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium
- Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Department of Electrical Engineering, ESAT-PSI, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Vitor A P Martins dos Santos
- LifeGlimmer GmbH, Markelstraße 38, 12163 Berlin, Germany
- Laboratory of Bioprocess Engineering, Wageningen University & Research, PO Box 16, 6700 AA Wageningen, the Netherlands
- Kristel Van Steen
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- BIO3 - Laboratory for Systems Genetics, GIGA Molecular & Computational Biology, University of Liège, Avenue de l'Hôpital 11, 4000 Liège, Belgium
- Gennady V Roshchupkin
- Medical Imaging Research Center, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium
- Department of Epidemiology, Erasmus MC, Dr. Molewaterplein 40, 3015 GD Rotterdam, Netherlands
8
Weißbach S, Milkovits J, Pastore S, Heine M, Gerber S, Todorov H. Cortexa: a comprehensive resource for studying gene expression and alternative splicing in the murine brain. BMC Bioinformatics 2024; 25:293. [PMID: 39237879 PMCID: PMC11378610 DOI: 10.1186/s12859-024-05919-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 08/28/2024] [Indexed: 09/07/2024] Open
Abstract
BACKGROUND: Gene expression and alternative splicing are strictly regulated processes that shape brain development and determine the cellular identity of differentiated neural cell populations. Despite the availability of multiple valuable datasets, many functional implications, especially those related to alternative splicing, remain poorly understood. Moreover, neuroscientists working primarily experimentally often lack the bioinformatics expertise required to process alternative splicing data and produce meaningful and interpretable results. Notably, re-analyzing publicly available datasets and integrating them with in-house data can provide substantial novel insights. However, such analyses necessitate developing harmonized data handling and processing pipelines, which in turn require considerable computational resources and in-depth bioinformatics expertise. RESULTS: Here, we present Cortexa, a comprehensive web portal that incorporates RNA-sequencing datasets from the mouse cerebral cortex (longitudinal or cell-specific) and the hippocampus. Cortexa facilitates understandable visualization of the expression and alternative splicing patterns of individual genes. Our platform provides SplicePCA, a tool that allows users to integrate their alternative splicing dataset and compare it to cell-specific or developmental neocortical splicing patterns. All standardized gene expression and alternative splicing datasets can be downloaded for further in-depth downstream analysis without the need for extensive preprocessing. CONCLUSIONS: Cortexa provides a robust and readily available resource for unraveling the complexity of gene expression and alternative splicing regulatory processes in the mouse brain. The data portal is available at https://cortexa-rna.com/.
Affiliation(s)
- Stephan Weißbach
- Institute of Developmental Biology and Neurobiology (iDN), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany
- Jonas Milkovits
- Institute of Developmental Biology and Neurobiology (iDN), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Stefan Pastore
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Martin Heine
- Institute of Developmental Biology and Neurobiology (iDN), Johannes Gutenberg University Mainz, 55128, Mainz, Germany
- Susanne Gerber
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany.
- Hristo Todorov
- Institute of Human Genetics, University Medical Center, Johannes Gutenberg University Mainz, 55131, Mainz, Germany.
9
Jiang P, Zhang Z, Yu Q, Wang Z, Diao L, Li D. ToxDAR: A Workflow Software for Analyzing Toxicologically Relevant Proteomic and Transcriptomic Data, from Data Preparation to Toxicological Mechanism Elucidation. Int J Mol Sci 2024; 25:9544. [PMID: 39273492 PMCID: PMC11394870 DOI: 10.3390/ijms25179544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2024] [Revised: 08/26/2024] [Accepted: 08/30/2024] [Indexed: 09/15/2024] Open
Abstract
Exploration of toxicological mechanisms is imperative for the assessment of potential adverse reactions to chemicals and pharmaceutical agents, the engineering of safer compounds, and the preservation of public health. It forms the foundation of drug development and disease treatment. High-throughput proteomics and transcriptomics can accurately capture the body's response to toxins and have become key tools for revealing complex toxicological mechanisms. Recently, a vast amount of omics data related to toxicological mechanisms have been accumulated. However, analyzing and utilizing these data remains a major challenge for researchers, especially as there is a lack of a knowledge-based analysis system to identify relevant biological pathways associated with toxicity from the data and to establish connections between omics data and existing toxicological knowledge. To address this, we have developed ToxDAR, a workflow-oriented R package for preprocessing and analyzing toxicological multi-omics data. ToxDAR integrates packages like NormExpression, DESeq2, and igraph, and utilizes R functions such as prcomp and phyper. It supports data preparation, quality control, differential expression analysis, functional analysis, and network analysis. ToxDAR's architecture also includes a knowledge graph with five major categories of mechanism-related biological entities and details fifteen types of interactions among them, providing comprehensive knowledge annotation for omics data analysis results. As a case study, we used ToxDAR to analyze a transcriptomic dataset on the toxicology of triphenyl phosphate (TPP). The results indicate that TPP may impair thyroid function by activating thyroid hormone receptor β (THRB), impacting pathways related to programmed cell death and inflammation. As a workflow-oriented data analysis tool, ToxDAR is expected to be crucial for understanding toxic mechanisms from omics data, discovering new therapeutic targets, and evaluating chemical safety.
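The abstract mentions R's `phyper`, the hypergeometric distribution function, which is commonly used for pathway over-representation tests. How ToxDAR calls it internally is not stated, so the following is only a sketch of the usual upper-tail calculation, written in Python with the standard library; `enrichment_pvalue` and the example counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(overlap, pathway_size, background_size, n_selected):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes among `n_selected` genes drawn without
    replacement from `background_size` genes, of which `pathway_size` belong
    to the pathway. In R this corresponds to
    phyper(overlap - 1, pathway_size, background_size - pathway_size,
           n_selected, lower.tail = FALSE).
    """
    total = comb(background_size, n_selected)
    upper = min(pathway_size, n_selected)
    tail = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Made-up counts: 4 genes in the background, 2 in the pathway, 2 selected, both hits.
p = enrichment_pvalue(overlap=2, pathway_size=2, background_size=4, n_selected=2)
print(p)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for the counts, at the cost of speed on very large backgrounds.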
Affiliation(s)
- Peng Jiang
- School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
- Zuzhen Zhang
- School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
- Qing Yu
- College of Life Sciences, Hebei University, Baoding 071002, China
- Ze Wang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Lihong Diao
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Dong Li
- School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
- College of Life Sciences, Hebei University, Baoding 071002, China
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
10
Goh WWB, Kabir MN, Yoo S, Wong L. Ten quick tips for ensuring machine learning model validity. PLoS Comput Biol 2024; 20:e1012402. [PMID: 39298376 DOI: 10.1371/journal.pcbi.1012402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2024] Open
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) models are increasingly deployed on biomedical and health data to gain insights into biological mechanisms, predict disease outcomes, and support clinical decision-making. However, ensuring model validity is challenging. The 10 quick tips described here discuss useful practices for checking AI/ML models from two perspectives: the user and the developer.
Affiliation(s)
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
- Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Mohammad Neamul Kabir
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
| | - Sehwan Yoo
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
| | - Limsoon Wong
- School of Computing, National University of Singapore, Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| |
|
11
|
Jiang Y, Rex DA, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Hegeman AD, Mayta M, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics Using Mass Spectrometry. ACS Measurement Science Au 2024; 4:338-417. [PMID: 39193565 PMCID: PMC11348894 DOI: 10.1021/acsmeasuresciau.3c00068] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/15/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] [Indexed: 08/29/2024]
Abstract
Proteomics is the large-scale study of protein structure and function in biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging for new practitioners to understand. Here, we provide a comprehensive overview of different proteomics methods, covering everything from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this Review will serve as a handbook for researchers who are new to the field of bottom-up proteomics.
Affiliation(s)
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
- Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland
- Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
| | - Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, NIST, Charleston, South Carolina 29412, United States
| | - Germán L. Rosano
- Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, 2000 Argentina
| | - Norbert Volkmar
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
| | - Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Trenton M. Peters-Clarke
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, 94158, United States
| | - Susan B. Egbert
- Department of Chemistry, University of Manitoba, Winnipeg, Manitoba, R3T 2N2 Canada
| | - Simion Kreimer
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Emma H. Doud
- Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, 46202-3082, United States
| | - Oliver M. Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster 3rd Milestone Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
| | | | - Adrian D. Hegeman
- Departments of Horticultural Science and Plant and Microbial Biology, University of Minnesota, Twin Cities, Minnesota 55108, United States
| | - Martín L. Mayta
- School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martin 3103, Argentina
- Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
| | - Anna G. Duboff
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Nicholas M. Riley
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| |
|
12
|
Moral-Turón C, Asencio-Cortés G, Rodriguez-Diaz F, Rubio A, Navarro AG, Brokate-Llanos AM, Garzón A, Muñoz MJ, Pérez-Pulido AJ. ASACO: Automatic and Serial Analysis of CO-expression to discover gene modifiers with potential use in drug repurposing. Brief Funct Genomics 2024; 23:484-494. [PMID: 38422352 DOI: 10.1093/bfgp/elae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 01/21/2024] [Accepted: 01/31/2024] [Indexed: 03/02/2024] Open
Abstract
Massive gene expression analyses are widely used to find differentially expressed genes under specific conditions. The results of these experiments are often available in public databases, which are growing much as molecular sequence databases did in the past. This growth now allows novel secondary computational tools to emerge that use such information to gain new knowledge. If several genes have a similar expression profile across heterogeneous transcriptomics experiments, they could be functionally related. These associations are usually useful for the annotation of uncharacterized genes. In addition, the search for genes with opposite expression profiles is useful for finding negative regulators and proposing inhibitory compounds in drug repurposing projects. Here we present a new web application, Automatic and Serial Analysis of CO-expression (ASACO), which can discover genes that correlate positively or negatively with a given query gene, based on thousands of public transcriptomics experiments. In addition, examples of use are presented and compared against previously validated knowledge. The results suggest that ASACO is a useful tool for improving knowledge about genes associated with human diseases and noncoding genes. ASACO is available at http://www.bioinfocabd.upo.es/asaco/.
Affiliation(s)
- Cristina Moral-Turón
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| | | | | | - Alejandro Rubio
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| | - Alberto G Navarro
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| | - Ana M Brokate-Llanos
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| | - Andrés Garzón
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| | - Manuel J Muñoz
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| | - Antonio J Pérez-Pulido
- Andalusian Centre for Developmental Biology (CABD, UPO-CSIC-JA). Faculty of Experimental Sciences (Genetics Dept.), University Pablo de Olavide, 41013, Seville, Spain
| |
|
13
|
Regueira-Iglesias A, Suárez-Rodríguez B, Blanco-Pintos T, Relvas M, Alonso-Sampedro M, Balsa-Castro C, Tomás I. The salivary microbiome as a diagnostic biomarker of periodontitis: a 16S multi-batch study before and after the removal of batch effects. Front Cell Infect Microbiol 2024; 14:1405699. [PMID: 39071165 PMCID: PMC11272481 DOI: 10.3389/fcimb.2024.1405699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Accepted: 06/17/2024] [Indexed: 07/30/2024] Open
Abstract
Introduction Microbiome-based clinical applications that improve diagnosis related to oral health are of great interest to precision dentistry. Predictive studies on the salivary microbiome are scarce and of low methodological quality (low sample sizes, lack of biological heterogeneity, and absence of a validation process). None of them evaluates the impact of confounding factors such as batch effects (BEs). This is the first 16S multi-batch study to analyze the salivary microbiome at the amplicon sequence variant (ASV) level in terms of differential abundance and machine learning models. This is done in periodontally healthy and periodontitis patients before and after removing BEs. Methods Saliva was collected from 124 patients (50 healthy, 74 periodontitis) in our setting. Sequencing of the V3-V4 16S rRNA gene region was performed on an Illumina MiSeq. In parallel, searches were conducted on four databases to identify previous Illumina V3-V4 sequencing studies on the salivary microbiome. Investigations that met predefined criteria were included in the analysis, and our own and the external sequences were processed using the same bioinformatics protocol. The statistical analysis was performed in the R-Bioconductor environment. Results The elimination of BEs reduced the number of ASVs with differential abundance between the groups by approximately one-third (Before=265; After=190). Before removing BEs, the model constructed using all study samples (796) comprised 16 ASVs (0.16%) and had an area under the curve (AUC) of 0.944, sensitivity of 90.73%, and specificity of 87.16%. The model built using two-thirds of the specimens (training=531) comprised 35 ASVs (0.36%) and had an AUC of 0.955, sensitivity of 86.54%, and specificity of 90.06% after being validated in the remaining one-third (test=265). After removing BEs, the models required more ASVs (all samples=200, 2.03%; training=100, 1.01%) to obtain slightly lower AUC (all=0.935; test=0.947), lower sensitivity (all=81.79%; test=78.85%), and similar specificity (all=91.51%; test=90.68%). Conclusions The removal of BEs controls false positive ASVs in the differential abundance analysis. However, their elimination implies a significantly larger number of predictor taxa to achieve optimal performance, creating less robust classifiers. As all the provided models can accurately discriminate health from periodontitis, implying good/excellent sensitivities/specificities, the salivary microbiome demonstrates potential clinical applicability as a precision diagnostic tool for periodontitis.
Affiliation(s)
- Alba Regueira-Iglesias
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Berta Suárez-Rodríguez
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Triana Blanco-Pintos
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Marta Relvas
- Instituto Universitário de Ciências da Saúde, Cooperativa de Ensino Superior Politécnico e Universitário (IUCS-CESPU), Unidade de Investigação em Patologia e Reabilitação Oral (UNIPRO), Gandra, Portugal
| | - Manuela Alonso-Sampedro
- Department of Internal Medicine and Clinical Epidemiology, Instituto de Investigación Sanitaria de Santiago (IDIS), Complejo Hospitalario Universitario, Santiago de Compostela, Spain
| | - Carlos Balsa-Castro
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Inmaculada Tomás
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Instituto de Investigación Sanitaria de Santiago (IDIS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| |
|
14
|
Nitz AA, Giraldez Chavez JH, Eliason ZG, Payne SH. Are We There Yet? Assessing the Readiness of Single-Cell Proteomics to Answer Biological Hypotheses. J Proteome Res 2024. [PMID: 38981598 DOI: 10.1021/acs.jproteome.4c00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
Single-cell analysis is an active area of research in many fields of biology. Measurements at single-cell resolution allow researchers to study diverse populations without losing biologically meaningful information to sample averages. Many technologies have been used to study single cells, including mass spectrometry-based single-cell proteomics (SCP). SCP has seen a lot of growth over the past couple of years through improvements in data acquisition and analysis, leading to greater proteomic depth. Because method development has been the main focus in SCP, biological applications have been sprinkled in only as proof-of-concept. However, SCP methods now provide significant coverage of the proteome and have been implemented in many laboratories. Thus, a primary question to address in our community is whether the current state of technology is ready for widespread adoption for biological inquiry. In this Perspective, we examine the potential for SCP in three thematic areas of biological investigation: cell annotation, developmental trajectories, and spatial mapping. We identify that the primary limitation of SCP is sample throughput. As proteome depth has been the primary target for method development to date, we advocate for a change in focus to facilitate measuring tens of thousands of single-cell proteomes to enable biological applications beyond proof-of-concept.
Affiliation(s)
- Alyssa A Nitz
- Biology Department, Brigham Young University, Provo, Utah 84602, United States
| | | | - Zachary G Eliason
- Biology Department, Brigham Young University, Provo, Utah 84602, United States
| | - Samuel H Payne
- Biology Department, Brigham Young University, Provo, Utah 84602, United States
| |
|
15
|
Yu Y, Hou W, Liu Y, Wang H, Dong L, Mai Y, Chen Q, Li Z, Sun S, Yang J, Cao Z, Zhang P, Zi Y, Liu R, Gao J, Zhang N, Li J, Ren L, Jiang H, Shang J, Zhu S, Wang X, Qing T, Bao D, Li B, Li B, Suo C, Pi Y, Wang X, Dai F, Scherer A, Mattila P, Han J, Zhang L, Jiang H, Thierry-Mieg D, Thierry-Mieg J, Xiao W, Hong H, Tong W, Wang J, Li J, Fang X, Jin L, Xu J, Qian F, Zhang R, Shi L, Zheng Y. Quartet RNA reference materials improve the quality of transcriptomic data through ratio-based profiling. Nat Biotechnol 2024; 42:1118-1132. [PMID: 37679545 PMCID: PMC11251996 DOI: 10.1038/s41587-023-01867-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/28/2022] [Accepted: 06/15/2023] [Indexed: 09/09/2023]
Abstract
Certified RNA reference materials are indispensable for assessing the reliability of RNA sequencing to detect intrinsically small biological differences in clinical settings, such as molecular subtyping of diseases. As part of the Quartet Project for quality control and data integration of multi-omics profiling, we established four RNA reference materials derived from immortalized B-lymphoblastoid cell lines from four members of a monozygotic twin family. Additionally, we constructed ratio-based transcriptome-wide reference datasets between two samples, providing cross-platform and cross-laboratory 'ground truth'. Investigation of the intrinsically subtle biological differences among the Quartet samples enables sensitive assessment of cross-batch integration of transcriptomic measurements at the ratio level. The Quartet RNA reference materials, combined with the ratio-based reference datasets, can serve as unique resources for assessing and improving the quality of transcriptomic data in clinical and biological settings.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | | | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Zhihui Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
| | - Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yi Zi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jian Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Chen Suo
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yan Pi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xia Wang
- National Institute of Metrology, Beijing, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
| | - Pirkko Mattila
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
| | | | - Lijun Zhang
- Nanjing Vazyme Biotech Co. Ltd., Nanjing, China
| | | | - Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- National Center of Gerontology, Beijing, China
| | - Xiang Fang
- National Institute of Metrology, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Feng Qian
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
| | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- National Center of Gerontology, Beijing, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
| |
|
16
|
Zheng Y, Liu Y, Yang J, Dong L, Zhang R, Tian S, Yu Y, Ren L, Hou W, Zhu F, Mai Y, Han J, Zhang L, Jiang H, Lin L, Lou J, Li R, Lin J, Liu H, Kong Z, Wang D, Dai F, Bao D, Cao Z, Chen Q, Chen Q, Chen X, Gao Y, Jiang H, Li B, Li B, Li J, Liu R, Qing T, Shang E, Shang J, Sun S, Wang H, Wang X, Zhang N, Zhang P, Zhang R, Zhu S, Scherer A, Wang J, Wang J, Huo Y, Liu G, Cao C, Shao L, Xu J, Hong H, Xiao W, Liang X, Lu D, Jin L, Tong W, Ding C, Li J, Fang X, Shi L. Multi-omics data integration using ratio-based quantitative profiling with Quartet reference materials. Nat Biotechnol 2024; 42:1133-1149. [PMID: 37679543 PMCID: PMC11252085 DOI: 10.1038/s41587-023-01934-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/25/2022] [Accepted: 07/31/2023] [Indexed: 09/09/2023]
Abstract
Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome of different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how using a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration and establishes the advantages of ratio-based multi-omics profiling with common reference materials.
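The ratio-based profiling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, beyond what the abstract states, that a batch effect acts as a multiplicative factor on every feature and that ratios are reported on a log2 scale. The function name `ratio_profile` and all numeric values are hypothetical.

```python
import numpy as np


def ratio_profile(study, reference):
    """Scale study-sample feature values by a reference sample measured
    in the same batch, returning log2 ratios (one value per feature).

    Hypothetical helper for illustration only.
    """
    study = np.asarray(study, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.log2(study / reference)


# Made-up "true" feature values for a study sample and a common reference.
true_study = np.array([10.0, 20.0, 40.0])
true_reference = np.array([10.0, 10.0, 10.0])

# Assumption: each batch multiplies every measurement by its own factor.
factor_a, factor_b = 1.0, 3.0
study_a, reference_a = factor_a * true_study, factor_a * true_reference
study_b, reference_b = factor_b * true_study, factor_b * true_reference

# Absolute values differ between batches, but the ratios agree because
# the batch factor cancels between study and reference.
ratio_a = ratio_profile(study_a, reference_a)
ratio_b = ratio_profile(study_b, reference_b)

print("absolute, batch A:", study_a)
print("absolute, batch B:", study_b)
print("log2 ratio, batch A:", ratio_a)
print("log2 ratio, batch B:", ratio_b)
```

Under this multiplicative assumption the batch factor cancels exactly; with real data the cancellation would only be approximate, which is why the reference sample has to be measured concurrently with the study samples.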
Affiliation(s)
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Sha Tian
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Feng Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | | | | | | | - Ling Lin
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
| | - Jingwei Lou
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
| | - Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing, China
| | - Jingchao Lin
- Metabo-Profile Biotechnology (Shanghai) Co. Ltd., Shanghai, China
| | | | | | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, China
| | | | - Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xingdong Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
| | - Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Erfei Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ruolan Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jiucun Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Jing Wang
- National Institute of Metrology, Beijing, China
| | - Yinbo Huo
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Gang Liu
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Chengming Cao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Li Shao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Xiaozhen Liang
- Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
| | - Daru Lu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Weida Tong
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
| | - Chen Ding
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
17
Takemoto Y, Ito D, Komori S, Kishimoto Y, Yamada S, Hashizume A, Katsuno M, Nakatochi M. Comparing preprocessing strategies for 3D-Gene microarray data of extracellular vesicle-derived miRNAs. BMC Bioinformatics 2024; 25:221. [PMID: 38902629 PMCID: PMC11188187 DOI: 10.1186/s12859-024-05840-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 06/12/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND: Extracellular vesicle-derived (EV)-miRNAs have potential to serve as biomarkers for the diagnosis of various diseases. miRNA microarrays are widely used to quantify circulating EV-miRNA levels, and the preprocessing of miRNA microarray data is critical for analytical accuracy and reliability. However, although microarray data have been used in various studies, the effects of preprocessing have not been studied for Toray's 3D-Gene chip, a widely used measurement method. We aimed to evaluate batch effect, missing value imputation accuracy, and the influence of preprocessing on measured values in 18 different preprocessing pipelines for EV-miRNA microarray data from two cohorts with amyotrophic lateral sclerosis using 3D-Gene technology. RESULTS: Eighteen different pipelines with different types and orders of missing value completion and normalization were used to preprocess the 3D-Gene microarray EV-miRNA data. Notably, batch effects were suppressed in all pipelines using the batch effect correction method ComBat. Furthermore, pipelines utilizing missForest for missing value imputation showed high agreement with measured values. In contrast, imputation using constant values for missing data exhibited low agreement. CONCLUSIONS: This study highlights the importance of selecting the appropriate preprocessing strategy for EV-miRNA microarray data when using 3D-Gene technology. These findings emphasize the importance of validating preprocessing approaches, particularly in the context of batch effect correction and missing value imputation, for reliably analyzing data in biomarker discovery and disease research.
Affiliation(s)
- Yuto Takemoto
- Public Health Informatics Unit, Department of Integrated Health Sciences, Nagoya University Graduate School of Medicine, 1-1-20 Daiko-Minami, Higashi-Ku, Nagoya, 461-8673, Japan
| | - Daisuke Ito
- Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
| | - Shota Komori
- Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
| | - Yoshiyuki Kishimoto
- Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
| | - Shinichiro Yamada
- Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
| | - Atsushi Hashizume
- Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Department of Clinical Research Education, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
| | - Masahisa Katsuno
- Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Department of Clinical Research Education, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
| | - Masahiro Nakatochi
- Public Health Informatics Unit, Department of Integrated Health Sciences, Nagoya University Graduate School of Medicine, 1-1-20 Daiko-Minami, Higashi-Ku, Nagoya, 461-8673, Japan.
18
Xue Y, Friedl V, Ding H, Wong CK, Stuart JM. Single-cell signatures identify microenvironment factors in tumors associated with patient outcomes. Cell Rep Methods 2024; 4:100799. [PMID: 38889686 PMCID: PMC11228369 DOI: 10.1016/j.crmeth.2024.100799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 04/30/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024]
Abstract
The cellular components of tumors and their microenvironment play pivotal roles in tumor progression, patient survival, and the response to cancer treatments. Building a comprehensive cellular profile of bulk tumors from single-cell RNA sequencing (scRNA-seq) data is crucial, as it reveals intrinsic tumor cellular traits that elude identification through conventional cancer subtyping methods. Our contribution, scBeacon, is a tool that derives cell-type signatures by integrating and clustering multiple scRNA-seq datasets to extract signatures for deconvolving unrelated tumor datasets on bulk samples. By applying scBeacon to The Cancer Genome Atlas (TCGA) cohort, we find cellular and molecular attributes within specific tumor categories, many with patient outcome relevance. We developed a tumor cell-type map to visually depict the relationships among TCGA samples based on the cell-type inferences.
Affiliation(s)
- Yuanqing Xue
- UC Santa Cruz Department, Biomolecular Engineering, Genomics Institute, Santa Cruz, CA, USA
| | - Verena Friedl
- UC Santa Cruz Department, Biomolecular Engineering, Genomics Institute, Santa Cruz, CA, USA
| | - Hongxu Ding
- UC Santa Cruz Department, Biomolecular Engineering, Genomics Institute, Santa Cruz, CA, USA
| | - Christopher K Wong
- UC Santa Cruz Department, Biomolecular Engineering, Genomics Institute, Santa Cruz, CA, USA
| | - Joshua M Stuart
- UC Santa Cruz Department, Biomolecular Engineering, Genomics Institute, Santa Cruz, CA, USA.
19
Yang J, Wang DF, Huang JH, Zhu QH, Luo LY, Lu R, Xie XL, Salehian-Dehkordi H, Esmailizadeh A, Liu GE, Li MH. Structural variant landscapes reveal convergent signatures of evolution in sheep and goats. Genome Biol 2024; 25:148. [PMID: 38845023 PMCID: PMC11155191 DOI: 10.1186/s13059-024-03288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 05/21/2024] [Indexed: 06/10/2024] Open
Abstract
BACKGROUND: Sheep and goats have undergone domestication and improvement to produce similar phenotypes, which have been greatly impacted by structural variants (SVs). Here, we report a high-quality chromosome-level reference genome of Asiatic mouflon, and implement a comprehensive analysis of SVs in 897 genomes of worldwide wild and domestic populations of sheep and goats to reveal genetic signatures underlying convergent evolution. RESULTS: We characterize the SV landscapes in terms of genetic diversity, chromosomal distribution and their links with genes, QTLs and transposable elements, and examine their impacts on regulatory elements. We identify several novel SVs and annotate corresponding genes (e.g., BMPR1B, BMPR2, RALYL, COL21A1, and LRP1B) associated with important production traits such as fertility, meat and milk production, and wool/hair fineness. We detect signatures of selection involving the parallel evolution of orthologous SV-associated genes during domestication, local environmental adaptation, and improvement. In particular, we find that fecundity traits experienced convergent selection targeting the gene BMPR1B, with the DEL00067921 deletion explaining ~10.4% of the phenotypic variation observed in goats. CONCLUSIONS: Our results provide new insights into the convergent evolution of SVs and serve as a rich resource for the future improvement of sheep, goats, and related livestock.
Affiliation(s)
- Ji Yang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Dong-Feng Wang
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Jia-Hui Huang
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Qiang-Hui Zhu
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Ling-Yun Luo
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Ran Lu
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Xing-Long Xie
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Hosein Salehian-Dehkordi
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
| | - Ali Esmailizadeh
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, 76169-133, Iran
| | - George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD, 20705, USA
| | - Meng-Hua Li
- State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China.
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
20
Ma Y, Pei Y. NDMNN: A novel deep residual network based MNN method to remove batch effects from scRNA-seq data. J Bioinform Comput Biol 2024; 22:2450015. [PMID: 39036845 DOI: 10.1142/s021972002450015x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects due to various factors such as different time points, experimental personnel, and instruments used, which can obscure the biological differences in the data itself. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. Subsequently, we combined the NDnetwork model with the MNN method to correct batch effects in scRNA-seq data, and named it the NDMNN method. Comprehensive experimental results demonstrate that the NDMNN method outperforms existing commonly used methods for correcting batch effects in scRNA-seq data. As the scale of single-cell sequencing continues to expand, we believe that NDMNN will be a valuable tool for researchers in the biological community for correcting batch effects in their studies. The source code and experimental results of the NDMNN method can be found at https://github.com/mustang-hub/NDMNN.
Affiliation(s)
- Yupeng Ma
- Software Engineering, Tiangong University, Tianjin, P. R. China
| | - Yongzhen Pei
- School of Mathematical Sciences, Tiangong University, Tianjin, P. R. China
21
Liu L, Jia R, Hou R, Huang C. Prediction of cell-type-specific cohesin-mediated chromatin loops based on chromatin state. Methods 2024; 226:151-160. [PMID: 38670416 DOI: 10.1016/j.ymeth.2024.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/02/2024] [Accepted: 04/18/2024] [Indexed: 04/28/2024] Open
Abstract
Chromatin loops are of crucial importance for the regulation of gene transcription. Cohesin is a type of chromatin-associated protein that mediates chromatin interactions through loop extrusion. Cohesin-mediated chromatin interactions have strong cell-type specificity, posing a challenge for predicting chromatin loops. Existing computational methods perform poorly in predicting cell-type-specific chromatin loops. To address this issue, we propose a random forest model to predict cell-type-specific cohesin-mediated chromatin loops based on chromatin states identified by ChromHMM and the occupancy of related factors. Our results show that chromatin state is responsible for the cell-type specificity of loops. Using only chromatin states as features, the model achieved high accuracy in predicting cell-type-specific loops between two cell types and can be applied to different cell types. Furthermore, when chromatin states are combined with the occurrence frequency of CTCF, RAD21, YY1, and H3K27ac ChIP-seq peaks, more accurate prediction can be achieved. Our feature extraction method provides novel insights into predicting cell-type-specific chromatin loops and reveals the relationship between chromatin state and chromatin loop formation.
Affiliation(s)
- Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China.
| | - Ranran Jia
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China.
| | - Rui Hou
- College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010051, China.
| | - Chengbing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba 623002, China.
22
Goldstein Y, Cohen OT, Wald O, Bavli D, Kaplan T, Benny O. Particle uptake in cancer cells can predict malignancy and drug resistance using machine learning. Sci Adv 2024; 10:eadj4370. [PMID: 38809990 PMCID: PMC11314625 DOI: 10.1126/sciadv.adj4370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 04/23/2024] [Indexed: 05/31/2024]
Abstract
Tumor heterogeneity is a primary factor that contributes to treatment failure. Predictive tools, capable of classifying cancer cells based on their functions, may substantially enhance therapy and extend patient life span. The connection between cell biomechanics and cancer cell functions is used here to classify cells through mechanical measurements, via particle uptake. Machine learning (ML) was used to classify cells based on single-cell patterns of uptake of particles with diverse sizes. Three pairs of human cancer cell subpopulations, varied in their level of drug resistance or malignancy, were studied. Cells were allowed to interact with fluorescently labeled polystyrene particles ranging in size from 0.04 to 3.36 μm and analyzed for their uptake patterns using flow cytometry. ML algorithms accurately classified cancer cell subtypes with accuracy rates exceeding 95%. The uptake data were especially advantageous for morphologically similar cell subpopulations. Moreover, the uptake data were found to serve as a form of "normalization" that could reduce variation in repeated experiments.
Affiliation(s)
- Yoel Goldstein
- Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
| | - Ora T. Cohen
- Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
| | - Ori Wald
- Department of Cardiothoracic Surgery, Hadassah Medical Center, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Danny Bavli
- Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA
| | - Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Department of Developmental Biology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
| | - Ofra Benny
- Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
23
Ma C, Zhang Y, Ding R, Chen H, Wu X, Xu L, Yu C. In search of the ratio of miRNA expression as robust biomarkers for constructing stable diagnostic models among multi-center data. Front Genet 2024; 15:1381917. [PMID: 38746057 PMCID: PMC11091382 DOI: 10.3389/fgene.2024.1381917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024] Open
Abstract
MicroRNAs (miRNAs) are promising biomarkers for the early detection of disease, and many miRNA-based diagnostic models have been constructed to distinguish patients from healthy individuals. To make full use of miRNA-profiling data across different sequencing platforms or multiple centers, models that account for batch effects are needed for generalizable medical application. We conducted transcription factor (TF)-mediated miRNA-miRNA interaction network analysis and adopted the within-sample expression ratios of miRNA pairs as predictive markers. The ratio of the expression values between each miRNA pair turned out to be stable across multiple data sources. A genetic algorithm-based classifier was constructed to quantify risk scores of the probability of disease and discriminate disease states from normal states in discovery and validation datasets for COVID-19, renal cell carcinoma, and lung adenocarcinoma. The predictive models based on the expression ratio of interacting miRNA pairs demonstrated good performance in the discovery and validation datasets, and the classifier may be used accurately for the early detection of disease.
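As a rough illustration of why within-sample ratios can be stable across data sources, the sketch below shows that a sample-wide multiplicative factor cancels when two miRNA values from the same sample are divided. It is not the authors' pipeline (which also involves the TF-mediated network and a genetic algorithm-based classifier); the miRNA names, expression values, scaling factor, and the function `pair_log_ratios` are all hypothetical.

```python
import math
from itertools import combinations


def pair_log_ratios(sample):
    """Return log2 expression ratios for every miRNA pair within one sample."""
    names = sorted(sample)
    return {
        (a, b): math.log2(sample[a] / sample[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical expression values for one sample (arbitrary units).
sample = {"miR-A": 120.0, "miR-B": 30.0, "miR-C": 60.0}

# The same sample as it might look on another platform, assuming the batch
# effect is a single multiplicative factor applied to every miRNA.
shifted = {name: value * 3.5 for name, value in sample.items()}

ratios = pair_log_ratios(sample)
ratios_shifted = pair_log_ratios(shifted)

for pair in ratios:
    print(pair, round(ratios[pair], 3), round(ratios_shifted[pair], 3))
```

Real batch effects are rarely a single per-sample factor, so this only shows the idealized case; miRNA-specific biases would not cancel in the same way.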
Affiliation(s)
- Cuidie Ma
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Yonghao Zhang
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Rui Ding
- State Key Laboratory of Complex Severe and Rare Diseases, Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
| | - Han Chen
- Shenyang Medical College, Shenyang, China
| | - Xudong Wu
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Lida Xu
- Beijing Hotgen Biotech Co., Ltd., Beijing, China
| | - Changyuan Yu
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
24
Zhang T, Zhang Z, Li L, Ren J, Wu Z, Gao B, Wang G. GTADC: A Graph-Based Method for Inferring Cell Spatial Distribution in Cancer Tissues. Biomolecules 2024; 14:436. [PMID: 38672453 PMCID: PMC11048052 DOI: 10.3390/biom14040436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 03/23/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
The heterogeneity of tumors poses a challenge for understanding cell interactions and constructing complex ecosystems within cancer tissues. Current research strategies integrate spatial transcriptomics (ST) and single-cell sequencing (scRNA-seq) data to thoroughly analyze this intricate system. However, traditional deep learning methods using scRNA-seq data tend to filter differentially expressed genes through statistical methods. In the context of cancer tissues, where cancer cells exhibit significant differences in gene expression compared to normal cells, this heterogeneity renders traditional analysis methods incapable of accurately capturing differences between cell types. Therefore, we propose a graph-based deep learning method, GTADC, which utilizes Silhouette scores to precisely capture genes with significant expression differences within each cell type, enhancing the accuracy of gene selection. Compared to traditional methods, GTADC not only considers the expression similarity of genes within their respective clusters but also comprehensively leverages information from the overall clustering structure. The introduction of graph structure effectively captures spatial relationships and topological structures between the two types of data, enabling GTADC to more accurately and comprehensively resolve the spatial composition of different cell types within tissues. This refinement allows GTADC to intricately reconstruct the cellular spatial composition, offering a precise solution for inferring cell spatial composition. This method allows for early detection of potential cancer cell regions within tissues, assessing their quantity and spatial information in cell populations. We aim to achieve a preliminary estimation of cancer occurrence and development, contributing to a deeper understanding of early-stage cancer and providing potential support for early cancer diagnosis.
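The abstract does not say how GTADC computes its Silhouette scores, so the following is only a minimal sketch of the standard silhouette coefficient applied to one gene at a time. It assumes each gene is scored from its 1-D expression values and the cell-type labels. The function name `mean_silhouette`, the labels, and the expression values are hypothetical.

```python
from collections import defaultdict


def mean_silhouette(values, labels):
    """Mean silhouette coefficient of 1-D values grouped by labels.

    Assumes at least two distinct labels are present.
    """
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [abs(values[i] - values[j]) for j in groups[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(own) / len(own)  # mean distance within the cell's own type
        b = min(                 # mean distance to the nearest other type
            sum(abs(values[i] - values[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical cell-type labels and expression values for two genes.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
marker_gene = [9.0, 10.0, 11.0, 1.0, 2.0, 3.0]       # separates the types
uninformative_gene = [5.0, 1.0, 9.0, 5.0, 1.0, 9.0]  # same spread in both

marker_score = mean_silhouette(marker_gene, labels)
flat_score = mean_silhouette(uninformative_gene, labels)
print(round(marker_score, 3), round(flat_score, 3))
```

Under this assumption, genes with scores near 1 separate the cell types cleanly, while scores near or below 0 indicate expression that does not follow the clustering.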
Affiliation(s)
- Tianjiao Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Ziheng Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Liangyu Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Jixiang Ren
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Zhenao Wu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Bo Gao
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150040, China
| | - Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
25
Marmarelis MG, Littman R, Battaglin F, Niedzwiecki D, Venook A, Ambite JL, Galstyan A, Lenz HJ, Ver Steeg G. q-Diffusion leverages the full dimensionality of gene coexpression in single-cell transcriptomics. Commun Biol 2024; 7:400. [PMID: 38565955 PMCID: PMC11255321 DOI: 10.1038/s42003-024-06104-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 03/25/2024] [Indexed: 04/04/2024] Open
Abstract
Unlocking the full dimensionality of single-cell RNA sequencing data (scRNAseq) is the next frontier to a richer, fuller understanding of cell biology. We introduce q-diffusion, a framework for capturing the coexpression structure of an entire library of genes, improving on state-of-the-art analysis tools. The method is demonstrated via three case studies. In the first, q-diffusion helps gain statistical significance for differential effects on patient outcomes when analyzing the CALGB/SWOG 80405 randomized phase III clinical trial, suggesting precision guidance for the treatment of metastatic colorectal cancer. Secondly, q-diffusion is benchmarked against existing scRNAseq classification methods using an in vitro PBMC dataset, in which the proposed method discriminates IFN-γ stimulation more accurately. The same case study demonstrates improvements in unsupervised cell clustering with the recent Tabula Sapiens human atlas. Finally, a local distributional segmentation approach for spatial scRNAseq, driven by q-diffusion, yields interpretable structures of human cortical tissue.
Affiliation(s)
- Myrl G Marmarelis
- Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292, USA.
| | - Russell Littman
- University of California Los Angeles, Los Angeles, CA, 90095, USA
| | - Francesca Battaglin
- Keck School of Medicine, University of Southern California, 1975 Zonal Ave., Los Angeles, CA, 90033, USA
| | | | - Alan Venook
- University of California San Francisco, San Francisco, CA, 94143, USA
| | - Jose-Luis Ambite
- Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292, USA
| | - Aram Galstyan
- Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292, USA
| | - Heinz-Josef Lenz
- Keck School of Medicine, University of Southern California, 1975 Zonal Ave., Los Angeles, CA, 90033, USA
| | - Greg Ver Steeg
- Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292, USA
- University of California Riverside, Riverside, CA, 92521, USA
26
Orcel E, Hage H, Taha M, Boucher N, Chautard E, Courtois V, Saliou A. A single workflow for multi-species blood transcriptomics. BMC Genomics 2024; 25:282. [PMID: 38493105 PMCID: PMC10944614 DOI: 10.1186/s12864-024-10208-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 03/11/2024] [Indexed: 03/18/2024] Open
Abstract
BACKGROUND Blood transcriptomic analysis is widely used to provide a detailed picture of a physiological state with potential outcomes for applications in diagnostics and monitoring of the immune response to vaccines. However, multi-species transcriptomic analysis is still a challenge from a technological point of view and a standardized workflow is urgently needed to allow interspecies comparisons. RESULTS Here, we propose a single and complete total RNA-Seq workflow to generate reliable transcriptomic data from blood samples from humans and from animals typically used in preclinical models. Blood samples from a maximum of six individuals and four different species (rabbit, non-human primate, mouse and human) were extracted and sequenced in triplicate. The workflow was evaluated using different wet-lab and dry-lab criteria, including RNA quality and quantity, the library molarity, the number of raw sequencing reads, the Phred-score quality, the GC content, the performance of ribosomal-RNA and globin depletion, the presence of residual DNA, the strandedness, the percentage of coding genes, the number of genes expressed, and the presence of a saturation plateau in rarefaction curves. We identified key criteria and their associated thresholds to be achieved for validating the transcriptomic workflow. In this study, we also generated an automated analysis of the transcriptomic data that streamlines the validation of the dataset generated. CONCLUSIONS Our study has developed an end-to-end workflow that should improve the standardization and the inter-species comparison in blood transcriptomics studies. In the context of vaccines and drug development, RNA sequencing data from preclinical models can be directly compared with clinical data and used to identify potential biomarkers of value to monitor safety and efficacy.
Affiliation(s)
- Elody Orcel
- BIOASTER, 40 Avenue Tony Garnier, Lyon, 69007, France
- Hayat Hage
- BIOASTER, 40 Avenue Tony Garnier, Lyon, 69007, France
- May Taha
- BIOASTER, 40 Avenue Tony Garnier, Lyon, 69007, France
- Emilie Chautard
- SANOFI, 1541 Av. Marcel Mérieux, Marcy-L'Étoile, 69280, France
- Adrien Saliou
- BIOASTER, 40 Avenue Tony Garnier, Lyon, 69007, France.
27
Lin H, Zhang M, Hu M, Zhang Y, Jiang W, Tang W, Ouyang Y, Jiang L, Mi Y, Chen Z, He P, Zhao G, Ouyang X. Emerging applications of single-cell profiling in precision medicine of atherosclerosis. J Transl Med 2024; 22:97. [PMID: 38263066 PMCID: PMC10804726 DOI: 10.1186/s12967-023-04629-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 10/14/2023] [Indexed: 01/25/2024] [Open Access]
Abstract
Atherosclerosis is a chronic, progressive, inflammatory disease that occurs in the arterial wall. Despite recent advancements in treatment aimed at improving efficacy and prolonging survival, atherosclerosis remains largely incurable. In this review, we discuss emerging single-cell sequencing techniques and their novel insights into atherosclerosis. We provide examples of single-cell profiling studies that reveal phenotypic characteristics of atherosclerotic plaques, blood, liver, and the intestinal tract. Additionally, we highlight the potential clinical applications of single-cell analysis and propose that combining this approach with other techniques can facilitate early diagnosis and treatment, leading to more accurate medical interventions.
Affiliation(s)
- Huiling Lin
- Department of Physiology, Medical College, Institute of Neuroscience Research, Hengyang Key Laboratory of Neurodegeneration and Cognitive Impairment, University of South China, Hengyang, 421001, Hunan, China
- Department of Physiology, School of Medicine, Hunan Normal University, Changsha, 410081, Hunan, China
- Ming Zhang
- Affiliated Qingyuan Hospital, Guangzhou Medical University (Qingyuan People's Hospital), Qingyuan, 511518, Guangdong, China
- Mi Hu
- Department of Physiology, Medical College, Institute of Neuroscience Research, Hengyang Key Laboratory of Neurodegeneration and Cognitive Impairment, University of South China, Hengyang, 421001, Hunan, China
- Yangkai Zhang
- Department of Physiology, Medical College, Institute of Neuroscience Research, Hengyang Key Laboratory of Neurodegeneration and Cognitive Impairment, University of South China, Hengyang, 421001, Hunan, China
- WeiWei Jiang
- Department of Organ Transplantation, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, China
- Wanying Tang
- Department of Physiology, Medical College, Institute of Neuroscience Research, Hengyang Key Laboratory of Neurodegeneration and Cognitive Impairment, University of South China, Hengyang, 421001, Hunan, China
- Yuxin Ouyang
- Department of Physiology, Medical College, Institute of Neuroscience Research, Hengyang Key Laboratory of Neurodegeneration and Cognitive Impairment, University of South China, Hengyang, 421001, Hunan, China
- Liping Jiang
- Department of Clinical Pharmacology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Yali Mi
- Affiliated Qingyuan Hospital, Guangzhou Medical University (Qingyuan People's Hospital), Qingyuan, 511518, Guangdong, China
- Zhi Chen
- College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen, China
- Pingping He
- Department of Nursing, School of Medicine, Hunan Normal University, Changsha, 410081, Hunan, China.
- Guojun Zhao
- Affiliated Qingyuan Hospital, Guangzhou Medical University (Qingyuan People's Hospital), Qingyuan, 511518, Guangdong, China.
- Xinping Ouyang
- Department of Physiology, Medical College, Institute of Neuroscience Research, Hengyang Key Laboratory of Neurodegeneration and Cognitive Impairment, University of South China, Hengyang, 421001, Hunan, China.
- Department of Physiology, School of Medicine, Hunan Normal University, Changsha, 410081, Hunan, China.
- The Key Laboratory of Model Animals and Stem Cell Biology in Hunan Province, School of Medicine, Hunan Normal University, 410081, Hunan, Changsha, China.
- The Engineering Research Center of Reproduction and Translational Medicine of Hunan Province, School of Medicine, Hunan Normal University, 410081, Hunan, Changsha, China.
28
Bai L, Wu Y, Li G, Zhang W, Zhang H, Su J. AI-enabled organoids: Construction, analysis, and application. Bioact Mater 2024; 31:525-548. [PMID: 37746662 PMCID: PMC10511344 DOI: 10.1016/j.bioactmat.2023.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/30/2023] [Revised: 09/09/2023] [Accepted: 09/09/2023] [Indexed: 09/26/2023] [Open Access]
Abstract
Organoids, miniature and simplified in vitro model systems that mimic the structure and function of organs, have attracted considerable interest due to their promising applications in disease modeling, drug screening, personalized medicine, and tissue engineering. Despite the substantial success in cultivating physiologically relevant organoids, challenges remain concerning the complexities of their assembly and the difficulties associated with data analysis. The advent of AI-Enabled Organoids, which interface with artificial intelligence (AI), holds the potential to revolutionize the field by offering novel insights and methodologies that can expedite the development and clinical application of organoids. This review succinctly delineates the fundamental concepts and mechanisms underlying AI-Enabled Organoids, summarizing the prospective applications in rapid screening of construction strategies, cost-effective extraction of multiscale image features, streamlined analysis of multi-omics data, and precise preclinical evaluation and application. We also explore the challenges and limitations of interfacing organoids with AI, and discuss the future direction of the field. Taken together, AI-Enabled Organoids hold significant promise for advancing our understanding of organ development and disease progression, ultimately laying the groundwork for clinical application.
Affiliation(s)
- Long Bai
- Department of Orthopedics, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Organoid Research Center, Institute of Translational Medicine, Shanghai University, Shanghai, 200444, China
- National Center for Translational Medicine (Shanghai) SHU Branch, Shanghai University, Shanghai, 200444, China
- Wenzhou Institute of Shanghai University, Wenzhou, 325000, China
- Yan Wu
- Department of Orthopedics, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Organoid Research Center, Institute of Translational Medicine, Shanghai University, Shanghai, 200444, China
- National Center for Translational Medicine (Shanghai) SHU Branch, Shanghai University, Shanghai, 200444, China
- Guangfeng Li
- Organoid Research Center, Institute of Translational Medicine, Shanghai University, Shanghai, 200444, China
- National Center for Translational Medicine (Shanghai) SHU Branch, Shanghai University, Shanghai, 200444, China
- Department of Orthopedics, Shanghai Zhongye Hospital, Shanghai, 201941, China
- Wencai Zhang
- Department of Orthopedics, First Affiliated Hospital, Jinan University, Guangzhou, 510632, China
- Hao Zhang
- Department of Orthopedics, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Organoid Research Center, Institute of Translational Medicine, Shanghai University, Shanghai, 200444, China
- National Center for Translational Medicine (Shanghai) SHU Branch, Shanghai University, Shanghai, 200444, China
- Jiacan Su
- Department of Orthopedics, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Organoid Research Center, Institute of Translational Medicine, Shanghai University, Shanghai, 200444, China
- National Center for Translational Medicine (Shanghai) SHU Branch, Shanghai University, Shanghai, 200444, China
29
Gruntenko NE, Deryuzhenko MA, Andreenkova OV, Shishkina OD, Bobrovskikh MA, Shatskaya NV, Vasiliev GV. Drosophila melanogaster Transcriptome Response to Different Wolbachia Strains. Int J Mol Sci 2023; 24:17411. [PMID: 38139239 PMCID: PMC10743526 DOI: 10.3390/ijms242417411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 11/26/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023] [Open Access]
Abstract
Wolbachia is a maternally inherited, intracellular bacterial symbiont of insects and some other invertebrates. Here, we investigated the effect of two different Wolbachia strains, differing in a large chromosomal inversion, on the differential expression of genes in D. melanogaster females. We revealed significant changes in the transcriptome of the infected flies compared to the uninfected ones, as well as in the transcriptome of flies infected with the Wolbachia strain wMelPlus compared to flies infected with the wMelCS112 strain. We linked differentially expressed genes (DEGs) from two pairwise comparisons, "uninfected-wMelPlus-infected" and "uninfected-wMelCS112-infected", into two gene networks, in which the following functional groups were designated: "Proteolysis", "Carbohydrate transport and metabolism", "Oxidation-reduction process", "Embryogenesis", "Transmembrane transport", "Response to stress" and "Alkaline phosphatases". Our data emphasized similarities and differences between infections by the different strains under study: a wMelPlus infection results in more than double the number of upregulated DEGs and half the number of downregulated DEGs compared to a wMelCS112 infection. Thus, we demonstrated that Wolbachia made a significant contribution to differential expression of host genes and that the bacterial genotype plays a vital role in establishing the character of this contribution.
Affiliation(s)
- Nataly E. Gruntenko
- Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia; (M.A.D.); (O.V.A.); (O.D.S.); (M.A.B.); (N.V.S.); (G.V.V.)
30
Wang H, Lim KP, Kong W, Gao H, Wong BJH, Phua SX, Guo T, Goh WWB. MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects. Sci Data 2023; 10:858. [PMID: 38042886 PMCID: PMC10693559 DOI: 10.1038/s41597-023-02779-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 11/23/2023] [Indexed: 12/04/2023] [Open Access]
Abstract
Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data-Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, and for the development of approaches to compare and integrate data from DDA and DIA platforms.
Affiliation(s)
- He Wang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Kai Peng Lim
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Huanhuan Gao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, 310030, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, 310030, China
- Research Center for Industries of the Future, Westlake University, 600 Dunyu Road, Hangzhou, Zhejiang, 310030, China
- Bertrand Jern Han Wong
- School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Ser Xian Phua
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Tiannan Guo
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, 310030, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, 310030, China
- Research Center for Industries of the Future, Westlake University, 600 Dunyu Road, Hangzhou, Zhejiang, 310030, China
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore.
- School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore.
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, 636921, Singapore.
31
Zhang T, Zhang Z, Li L, Dong B, Wang G, Zhang D. GTAD: a graph-based approach for cell spatial composition inference from integrated scRNA-seq and ST-seq data. Brief Bioinform 2023; 25:bbad469. [PMID: 38127088 PMCID: PMC10734610 DOI: 10.1093/bib/bbad469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
With the emergence of spatial transcriptome sequencing (ST-seq), research now heavily relies on the joint analysis of ST-seq and single-cell RNA sequencing (scRNA-seq) data to precisely identify cell spatial composition in tissues. However, common methods for combining these datasets often merge data from multiple cells to generate pseudo-ST data, overlooking topological relationships and failing to represent spatial arrangements accurately. We introduce GTAD, a method utilizing the Graph Attention Network for deconvolution of integrated scRNA-seq and ST-seq data. GTAD effectively captures cell spatial relationships and topological structures within tissues using a graph-based approach, enhancing cell-type identification and our understanding of complex tissue cellular landscapes. By integrating scRNA-seq and ST data into a unified graph structure, GTAD outperforms traditional 'pseudo-ST' methods, providing robust and information-rich results. GTAD performs exceptionally well with synthesized spatial data and accurately identifies cell spatial composition in tissues like the mouse cerebral cortex, cerebellum, developing human heart and pancreatic ductal carcinoma. GTAD holds the potential to enhance our understanding of tissue microenvironments and cellular diversity in complex biological systems. The source code is available at https://github.com/zzhjs/GTAD.
Affiliation(s)
- Tianjiao Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Ziheng Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Liangyu Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Benzhi Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Dandan Zhang
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
32
Zhou R, Ng SK, Sung JJY, Goh WWB, Wong SH. Data pre-processing for analyzing microbiome data - A mini review. Comput Struct Biotechnol J 2023; 21:4804-4815. [PMID: 37841330 PMCID: PMC10569954 DOI: 10.1016/j.csbj.2023.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/01/2023] [Accepted: 10/01/2023] [Indexed: 10/17/2023] [Open Access]
Abstract
The human microbiome is an emerging research frontier due to its profound impacts on health. High-throughput microbiome sequencing enables studying microbial communities but suffers from analytical challenges. In particular, the lack of dedicated preprocessing methods to improve data quality impedes effective minimization of biases prior to downstream analysis. This review aims to address this gap by providing a comprehensive overview of preprocessing techniques relevant to microbiome research. We outline a typical workflow for microbiome data analysis. Preprocessing methods discussed include quality filtering, batch effect correction, imputation of missing values, normalization, and data transformation. We highlight strengths and limitations of each technique to serve as a practical guide for researchers and identify areas needing further methodological development. Establishing robust, standardized preprocessing will be essential for drawing valid biological conclusions from microbiome studies.
Affiliation(s)
- Ruwen Zhou
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Siu Kin Ng
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Joseph Jao Yiu Sung
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Department of Gastroenterology and Hepatology, Tan Tock Seng Hospital, National Healthcare Group, 11 Jalan Tan Tock Seng, 308433, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 59 Nanyang Drive, 636921, Singapore
- Sunny Hei Wong
- Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, 308232, Singapore
- Department of Gastroenterology and Hepatology, Tan Tock Seng Hospital, National Healthcare Group, 11 Jalan Tan Tock Seng, 308433, Singapore
33
Ng S, Masarone S, Watson D, Barnes MR. The benefits and pitfalls of machine learning for biomarker discovery. Cell Tissue Res 2023; 394:17-31. [PMID: 37498390 PMCID: PMC10558383 DOI: 10.1007/s00441-023-03816-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/30/2023] [Accepted: 07/12/2023] [Indexed: 07/28/2023]
Abstract
Prospects for the discovery of robust and reproducible biomarkers have improved considerably with the development of sensitive omics platforms that can enable measurement of biological molecules at an unprecedented scale. With technical barriers to success lowering, the challenge is now moving into the analytical domain. Genome-wide discovery presents a problem of scale and multiple testing as standard statistical methods struggle to distinguish signal from noise in increasingly complex biological systems. Machine learning and AI methods are good at finding answers in large datasets, but they have a tendency to overfit solutions. It may be possible to find a local answer or mechanism in a specific patient sample or small group of samples, but this may not generalise to wider patient populations due to the high likelihood of false discovery. The rise of explainable AI offers to improve the opportunity for true discovery by providing explanations for predictions that can be explored mechanistically before proceeding to costly and time-consuming validation studies. This review aims to introduce some of the basic concepts of machine learning and AI for biomarker discovery with a focus on post hoc explanation of predictions. To illustrate this, we consider how explainable AI has already been used successfully, and we explore a case study that applies AI to biomarker discovery in rheumatoid arthritis, demonstrating the accessibility of tools for AI and machine learning. We use this to illustrate and discuss some of the potential challenges and solutions that may enable AI to critically interrogate disease and response mechanisms.
Affiliation(s)
- Sandra Ng
- Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK
- Sara Masarone
- Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK
- Alan Turing Institute, London, UK
- David Watson
- Department of Informatics, King's College London, London, UK
- Michael R Barnes
- Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK.
- Alan Turing Institute, London, UK.
34
Regueira-Iglesias A, Balsa-Castro C, Blanco-Pintos T, Tomás I. Critical review of 16S rRNA gene sequencing workflow in microbiome studies: From primer selection to advanced data analysis. Mol Oral Microbiol 2023; 38:347-399. [PMID: 37804481 DOI: 10.1111/omi.12434] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/26/2023] [Revised: 09/01/2023] [Accepted: 09/14/2023] [Indexed: 10/09/2023]
Abstract
The multi-batch reanalysis approach of jointly reevaluating gene/genome sequences from different works has gained particular relevance in the literature in recent years. The large amount of 16S ribosomal ribonucleic acid (rRNA) gene sequence data stored in public repositories and information in taxonomic databases of the same gene far exceeds that related to complete genomes. This review is intended to guide researchers new to studying microbiota, particularly the oral microbiota, using 16S rRNA gene sequencing and those who want to expand and update their knowledge to optimise their decision-making and improve their research results. First, we describe the advantages and disadvantages of using the 16S rRNA gene as a phylogenetic marker and the latest findings on the impact of primer pair selection on diversity and taxonomic assignment outcomes in oral microbiome studies. Strategies for primer selection based on these results are introduced. Second, we identify the key factors to consider in selecting the sequencing technology and platform. The process and particularities of the main steps for processing 16S rRNA gene-derived data are described in detail to enable researchers to choose the most appropriate bioinformatics pipeline and analysis methods based on the available evidence. We then provide an overview of the different types of advanced analyses, both the most widely used in the literature and the most recent approaches. Several indices, metrics and software for studying microbial communities are included, highlighting their advantages and disadvantages. Considering the principles of clinical metagenomics, we conclude that future research should focus on rigorous analytical approaches, such as developing predictive models to identify microbiome-based biomarkers to classify health and disease states. Finally, we address the concept of batch effects and the microbiome-specific methods for accounting for or correcting them.
Affiliation(s)
- Alba Regueira-Iglesias
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
- Carlos Balsa-Castro
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
- Triana Blanco-Pintos
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
- Inmaculada Tomás
- Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
35
Alchahin AM, Tsea I, Baryawno N. Recent Advances in Single-Cell RNA-Sequencing of Primary and Metastatic Clear Cell Renal Cell Carcinoma. Cancers (Basel) 2023; 15:4734. [PMID: 37835428 PMCID: PMC10571653 DOI: 10.3390/cancers15194734] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/18/2023] [Revised: 09/22/2023] [Accepted: 09/22/2023] [Indexed: 10/15/2023] [Open Access]
Abstract
Over the past two decades, significant progress has been made in the treatment of clear cell renal cell carcinoma (ccRCC), with a shift towards adopting new treatment approaches ranging from monotherapy to triple-combination therapy. This progress has been spearheaded by fundamental technological advancements that have allowed a deeper understanding of the various biological components of this cancer. In particular, the rapid commercialization of transcriptomics technologies, such as single-cell RNA-sequencing (scRNA-seq) methodologies, has played a crucial role in accelerating this understanding. Through precise measurements facilitated by these technologies, the research community has successfully identified and characterized diverse tumor, immune, and stromal cell populations, uncovering their interactions and pathways involved in disease progression. In localized ccRCC, patients have shown impressive response rates to treatment. However, despite the emerging findings and new knowledge provided in the field, there are still patients that do not respond to treatment, especially in advanced disease stages. One of the key challenges lies in the limited study of ccRCC metastases compared to localized cases. This knowledge gap may contribute to the relatively low survival rates and response rates observed in patients with metastatic ccRCC. To bridge this gap, we here delve into recent research utilizing scRNA-seq technologies in both primary and metastatic ccRCC. The goal of this review is to shed light on the current state of knowledge in the field, present existing treatment options, and emphasize the crucial steps needed to improve survival rates, particularly in cases of metastatic ccRCC.
Affiliation(s)
- Ninib Baryawno
- Childhood Cancer Research Unit, Department of Women’s and Children’s Health, Karolinska Institutet, 10000-19999 Stockholm, Sweden; (A.M.A.); (I.T.)
36
Zhang Z, Mathew D, Lim T, Mason K, Martinez CM, Huang S, Wherry EJ, Susztak K, Minn AJ, Ma Z, Zhang NR. Signal recovery in single cell batch integration. bioRxiv 2023:2023.05.05.539614. [PMID: 37215021 PMCID: PMC10197537 DOI: 10.1101/2023.05.05.539614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: Cells across batches should be aligned to be "appropriately" mixed, while preserving "main cell type clusters". We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a "pool-of-controls" design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals that are recovered by CellANOVA are shown to be validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study.
Affiliation(s)
- Zhaojun Zhang
  - Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States
- Divij Mathew
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Tristan Lim
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Kaishu Mason
  - Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States
- Clara Morral Martinez
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Sijia Huang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA, United States
- E John Wherry
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Katalin Susztak
  - Renal, Electrolyte, and Hypertension Division, Department of Medicine, University of Pennsylvania, Perelman School of Medicine, PA, United States
  - Institute for Diabetes, Obesity, and Metabolism, University of Pennsylvania, PA, United States
  - Department of Genetics, University of Pennsylvania, PA, United States
- Andy J Minn
  - Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States
  - Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Zongming Ma
  - Department of Statistics and Data Science, Yale University, CT, United States
- Nancy R Zhang
  - Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States

37
Papoutsoglou G, Tarazona S, Lopes MB, Klammsteiner T, Ibrahimi E, Eckenberger J, Novielli P, Tonda A, Simeon A, Shigdel R, Béreux S, Vitali G, Tangaro S, Lahti L, Temko A, Claesson MJ, Berland M. Machine learning approaches in microbiome research: challenges and best practices. Front Microbiol 2023; 14:1261889. [PMID: 37808286 PMCID: PMC10556866 DOI: 10.3389/fmicb.2023.1261889] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/19/2023] [Accepted: 09/04/2023] [Indexed: 10/10/2023] Open
Abstract
Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. It is demonstrated that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
Affiliation(s)
- Georgios Papoutsoglou
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece
- Sonia Tarazona
  - Department of Applied Statistics and Operations Research and Quality, Polytechnic University of Valencia, Valencia, Spain
- Marta B. Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Research and Development Unit for Mechanical and Industrial Engineering (UNIDEMI), Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Thomas Klammsteiner
  - Department of Ecology, Universität Innsbruck, Innsbruck, Austria
  - Department of Microbiology, Universität Innsbruck, Innsbruck, Austria
- Eliana Ibrahimi
  - Department of Biology, University of Tirana, Tirana, Albania
- Julia Eckenberger
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Pierfrancesco Novielli
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Alberto Tonda
  - UMR 518 MIA-PS, INRAE, Paris-Saclay University, Palaiseau, France
  - Complex Systems Institute of Paris Ile-de-France (ISC-PIF) - UAR 3611 CNRS, Paris, France
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Stéphane Béreux
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
  - MaIAGE, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Giacomo Vitali
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Sabina Tangaro
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Andriy Temko
  - Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Marcus J. Claesson
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Magali Berland
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France

38
Wang P, Paquet ÉR, Robert C. Comprehensive transcriptomic analysis of long non-coding RNAs in bovine ovarian follicles and early embryos. PLoS One 2023; 18:e0291761. [PMID: 37725621 PMCID: PMC10508637 DOI: 10.1371/journal.pone.0291761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) have been the subject of numerous studies over the past decade. First thought to come from aberrant transcriptional events, lncRNAs are now considered a crucial component of the genome with roles in multiple cellular functions. However, the functional annotation and characterization of bovine lncRNAs during early development remain limited. In this comprehensive analysis, we review lncRNA expression in bovine ovarian follicles and early embryos, based on a unique database comprising 468 microarray hybridizations from a single platform designed to target 7,724 lncRNA transcripts, of which 5,272 are intergenic (lincRNA), 958 are intronic, and 1,524 are antisense (lncNAT). Compared to translated mRNA, lncRNAs have been shown to be more tissue-specific and expressed in low copy numbers. This analysis revealed that protein-coding genes and lncRNAs are both expressed more in oocytes. Differences between the oocyte and the 2-cell embryo are also more apparent in terms of lncRNAs than mRNAs. Co-expression network analysis using WGCNA generated 25 modules with differing proportions of lncRNAs. The modules exhibiting a higher proportion of lncRNAs were found to be associated with fewer annotated mRNAs and housekeeping functions. Functional annotation of co-expressed mRNAs allowed attribution of lncRNAs to a wide array of key cellular events such as meiosis, translation initiation, immune response, and mitochondrial related functions. We thus provide evidence that lncRNAs play diverse physiological roles that are tissue-specific and associated with key cellular functions alongside mRNAs in bovine ovarian follicles and early embryos. This adds lncRNAs to the active molecules in the complex regulatory networks driving folliculogenesis, oogenesis and early embryogenesis, all of which are necessary for reproductive success.
Affiliation(s)
- Pengmin Wang
  - Département des sciences animales, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec City, Québec, Canada
- Éric R. Paquet
  - Département des sciences animales, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec City, Québec, Canada
- Claude Robert
  - Département des sciences animales, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec City, Québec, Canada

39
Edwards JM, Andrews MC, Burridge H, Smith R, Owens C, Edinger M, Pilkington K, Desfrancois J, Shackleton M, Senthi S, van Zelm MC. Design, optimisation and standardisation of a high-dimensional spectral flow cytometry workflow assessing T-cell immunophenotype in patients with melanoma. Clin Transl Immunology 2023; 12:e1466. [PMID: 37692904 PMCID: PMC10484688 DOI: 10.1002/cti2.1466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/26/2023] [Accepted: 08/18/2023] [Indexed: 09/12/2023] Open
Abstract
Objectives: Despite the success of immune checkpoint blockade, most metastatic melanoma patients fail to respond to therapy or experience severe toxicity. Assessment of biomarkers and immunophenotypes before or early into treatment will help to understand favourable responses and improve therapeutic outcomes. Methods: We present a high-dimensional approach for blood T-cell profiling using three multi-parameter cytometry panels: (1) a TruCount panel for absolute cell counts, (2) a 27-colour spectral panel assessing T-cell markers and (3) a 20-colour spectral panel evaluating intracellular cytokine expression. Pre-treatment blood mononuclear cells from patients and healthy controls were cryopreserved before staining across 11 batches. Batch effects were tracked using a single-donor control and the suitability of normalisation was assessed. The data were analysed using manual gating and high-dimensional strategies. Results: Batch-to-batch variation was minimal, as demonstrated by the dimensionality reduction of batch-control samples, and normalisation did not improve manual or high-dimensional analysis. Application of the workflow demonstrated the capacity of the panels and showed that patients had fewer lymphocytes than controls (P = 0.0027), due to lower naive CD4+ (P = 0.015) and CD8+ (P = 0.011) T cells and follicular helper T cells (P = 0.00076). Patients showed trends for higher proportions of Ki67 and IL-2-expressing cells within CD4+ and CD8+ memory subsets, and increased CD57 and EOMES expression within TCRγδ+ T cells. Conclusion: Our optimised high-parameter spectral cytometry approach provided in-depth profiling of blood T cells and found differences in patient immunophenotype at baseline. The robustness of our workflow, as demonstrated by minimal batch effects, makes this approach highly suitable for the longitudinal evaluation of immunotherapy effects.
Affiliation(s)
- Jack M Edwards
  - Alfred Health Radiation Oncology, The Alfred Hospital, Melbourne, VIC, Australia
  - Department of Immunology, Central Clinical School, Monash University and Alfred Hospital, Melbourne, VIC, Australia
- Miles C Andrews
  - Department of Medicine, Central Clinical School, Monash University, Melbourne, VIC, Australia
  - Department of Medical Oncology, The Alfred Hospital, Melbourne, VIC, Australia
- Hayley Burridge
  - Department of Medical Oncology, The Alfred Hospital, Melbourne, VIC, Australia
- Robin Smith
  - Alfred Health Radiation Oncology, The Alfred Hospital, Melbourne, VIC, Australia
- Carole Owens
  - Alfred Health Radiation Oncology, The Alfred Hospital, Melbourne, VIC, Australia
- Mark Shackleton
  - Department of Medicine, Central Clinical School, Monash University, Melbourne, VIC, Australia
  - Department of Medical Oncology, The Alfred Hospital, Melbourne, VIC, Australia
- Sashendra Senthi
  - Alfred Health Radiation Oncology, The Alfred Hospital, Melbourne, VIC, Australia
- Menno C van Zelm
  - Department of Immunology, Central Clinical School, Monash University and Alfred Hospital, Melbourne, VIC, Australia

40
Yu Y, Zhang N, Mai Y, Ren L, Chen Q, Cao Z, Chen Q, Liu Y, Hou W, Yang J, Hong H, Xu J, Tong W, Dong L, Shi L, Fang X, Zheng Y. Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method. Genome Biol 2023; 24:201. [PMID: 37674217 PMCID: PMC10483871 DOI: 10.1186/s13059-023-03047-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/08/2022] [Accepted: 05/18/2023] [Indexed: 09/08/2023] Open
Abstract
BACKGROUND: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios. RESULTS: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. CONCLUSIONS: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
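The ratio-based scaling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ratio_correct`, the use of a per-batch mean over reference measurements, and the toy data are all assumptions made here for illustration. The only idea taken from the abstract is that each study sample's feature values are expressed relative to reference material profiled in the same batch.

```python
from statistics import mean


def ratio_correct(values, batches, is_reference):
    """Scale each sample's features by its batch's reference-material mean.

    values: list of per-sample feature lists (all the same length)
    batches: batch label for each sample
    is_reference: True where the sample is the reference material
    (Hypothetical helper; names and layout are illustrative assumptions.)
    """
    n_features = len(values[0])
    # Per-batch, per-feature mean of the reference-material measurements.
    ref_means = {}
    for batch in set(batches):
        ref_rows = [v for v, b, r in zip(values, batches, is_reference)
                    if b == batch and r]
        if not ref_rows:
            raise ValueError(f"batch {batch!r} has no reference sample")
        ref_means[batch] = [mean(row[j] for row in ref_rows)
                            for j in range(n_features)]
    # Express every sample as a ratio to its own batch's reference.
    return [[x / ref_means[b][j] for j, x in enumerate(v)]
            for v, b in zip(values, batches)]


# Toy data (invented): batch "B" reads every feature twice as high as "A".
values = [[10.0, 4.0], [20.0, 2.0],   # batch A: reference, study sample
          [20.0, 8.0], [40.0, 4.0]]   # batch B: reference, study sample
batches = ["A", "A", "B", "B"]
is_reference = [True, False, True, False]
corrected = ratio_correct(values, batches, is_reference)
```

In this toy case the two study samples end up with identical ratio profiles, because the multiplicative batch shift cancels against the reference measured in the same batch. Real data would typically be handled on a log scale and with replicate reference measurements; those details are left out of the sketch.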
Affiliation(s)
- Ying Yu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuanbang Mai
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Zehui Cao
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yaqing Liu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Wanwan Hou
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingcheng Yang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
  - Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
  - International Human Phenome Institutes, Shanghai, China
- Xiang Fang
  - National Institute of Metrology, Beijing, China
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China

41
Goh WWB, Hui HWH, Wong L. How missing value imputation is confounded with batch effects and what you can do about it. Drug Discov Today 2023; 28:103661. [PMID: 37301250 DOI: 10.1016/j.drudis.2023.103661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 05/31/2023] [Accepted: 06/05/2023] [Indexed: 06/12/2023]
Abstract
In data-processing pipelines, upstream steps can influence downstream processes because of their sequential nature. Among these data-processing steps, batch effect (BE) correction (BEC) and missing value imputation (MVI) are crucial for ensuring data suitability for advanced modeling and reducing the likelihood of false discoveries. Although BEC-MVI interactions are not well studied, they are ultimately interdependent. Batch sensitization can improve the quality of MVI. Conversely, accounting for missingness also improves proper BE estimation in BEC. Here, we discuss how BEC and MVI are interconnected and interdependent. We show how batch sensitization can improve any MVI and bring attention to the idea of BE-associated missing values (BEAMs). Finally, we discuss how batch-class imbalance problems can be mitigated by borrowing ideas from machine learning.
Affiliation(s)
- Wilson Wen Bin Goh
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore; Center for Biomedical Informatics, Nanyang Technological University, Singapore
- Harvard Wai Hann Hui
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore; Department of Pathology, National University of Singapore, Singapore

42
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the crossover application of multiple high-throughput screening technologies, represented by genomics, transcriptomics, single-cell transcriptomics, proteomics, metabolomics, spatial transcriptomics, and so on, which play a great role in promoting the study of human diseases. Most current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction to multi-omics. This review outlines the existing technical categories of multi-omics and cautions for experimental design; focuses on integrated analysis methods for multi-omics, especially machine learning and deep learning approaches to multi-omics data integration and the corresponding tools; covers the application of multi-omics in medical research (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) as well as the corresponding open-source analysis tools and databases; and finally discusses the challenges and future directions of multi-omics integration and application in precision medicine. With the development of high-throughput technologies and data integration algorithms, single-cell multi-omics and spatial multi-omics, as important directions of multi-omics for future disease research, are also introduced in detail. This review will provide important guidance for researchers, especially those who are just entering multi-omics medical research.
Affiliation(s)
- Chongyang Chen
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
  - Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Jing Wang
  - Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Donghui Pan
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xinyu Wang
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Yuping Xu
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Junjie Yan
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Lizhen Wang
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xifei Yang
  - Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Min Yang
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Gong‐Ping Liu
  - Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
  - Department of Pathophysiology, School of Basic Medicine, Key Laboratory of Ministry of Education of China and Hubei Province for Neurological Disorders, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

43
Ye H, Zhang X, Wang C, Goode EL, Chen J. Batch-effect correction with sample remeasurement in highly confounded case-control studies. Nat Comput Sci 2023; 3:709-719. [PMID: 38177326 PMCID: PMC10993308 DOI: 10.1038/s43588-023-00500-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/06/2022] [Accepted: 07/11/2023] [Indexed: 01/06/2024]
Abstract
Batch effects are pervasive in biomedical studies. One approach to addressing batch effects is to repeatedly measure a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch-effect correction with remeasured samples are severely underdeveloped. Here we developed a framework for batch-effect correction using remeasured samples in highly confounded case-control studies. We provided theoretical analyses of the proposed procedure, evaluated its power characteristics and provided a power calculation tool to aid in the study design. We found that the number of samples that need to be remeasured depends strongly on the between-batch correlation. When the correlation is high, remeasuring a small subset of samples can rescue most of the power.
Affiliation(s)
- Hanxuan Ye
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Xianyang Zhang
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Chen Wang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Ellen L Goode
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Jun Chen
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA

44
Zhong L, Bacher R. Leveraging remeasured samples in biomedical studies. Nat Comput Sci 2023; 3:669-670. [PMID: 38177325 DOI: 10.1038/s43588-023-00491-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Affiliation(s)
- Luer Zhong
  - Department of Biostatistics, University of Florida, Gainesville, FL, USA
- Rhonda Bacher
  - Department of Biostatistics, University of Florida, Gainesville, FL, USA

45
Shapiro JA, Gaonkar KS, Spielman SJ, Savonen CL, Bethell CJ, Jin R, Rathi KS, Zhu Y, Egolf LE, Farrow BK, Miller DP, Yang Y, Koganti T, Noureen N, Koptyra MP, Duong N, Santi M, Kim J, Robins S, Storm PB, Mack SC, Lilly JV, Xie HM, Jain P, Raman P, Rood BR, Lulla RR, Nazarian J, Kraya AA, Vaksman Z, Heath AP, Kline C, Scolaro L, Viaene AN, Huang X, Way GP, Foltz SM, Zhang B, Poetsch AR, Mueller S, Ennis BM, Prados M, Diskin SJ, Zheng S, Guo Y, Kannan S, Waanders AJ, Margol AS, Kim MC, Hanson D, Van Kuren N, Wong J, Kaufman RS, Coleman N, Blackden C, Cole KA, Mason JL, Madsen PJ, Koschmann CJ, Stewart DR, Wafula E, Brown MA, Resnick AC, Greene CS, Rokita JL, Taroni JN. OpenPBTA: The Open Pediatric Brain Tumor Atlas. Cell Genom 2023; 3:100340. [PMID: 37492101 PMCID: PMC10363844 DOI: 10.1016/j.xgen.2023.100340] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/12/2022] [Revised: 02/28/2023] [Accepted: 05/04/2023] [Indexed: 07/27/2023]
Abstract
Pediatric brain and spinal cancers are collectively the leading disease-related cause of death in children; thus, we urgently need curative therapeutic strategies for these tumors. To accelerate such discoveries, the Children's Brain Tumor Network (CBTN) and Pacific Pediatric Neuro-Oncology Consortium (PNOC) created a systematic process for tumor biobanking, model generation, and sequencing with immediate access to harmonized data. We leverage these data to establish OpenPBTA, an open collaborative project with over 40 scalable analysis modules that genomically characterize 1,074 pediatric brain tumors. Transcriptomic classification reveals universal TP53 dysregulation in mismatch repair-deficient hypermutant high-grade gliomas and TP53 loss as a significant marker for poor overall survival in ependymomas and H3 K28-mutant diffuse midline gliomas. Already being actively applied to other pediatric cancers and PNOC molecular tumor board decision-making, OpenPBTA is an invaluable resource to the pediatric oncology community.
Affiliation(s)
- Joshua A. Shapiro
  - Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Krutika S. Gaonkar
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Stephanie J. Spielman
  - Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
  - Rowan University, Glassboro, NJ 08028, USA
- Candace L. Savonen
  - Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Chante J. Bethell
  - Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Run Jin
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Komal S. Rathi
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yuankun Zhu
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Laura E. Egolf
  - Cell and Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
  - Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Bailey K. Farrow
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Daniel P. Miller
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yang Yang
  - Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
- Tejaswi Koganti
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Nighat Noureen
  - Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
- Mateusz P. Koptyra
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Nhat Duong
  - Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Mariarita Santi
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Jung Kim
  - Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
- Shannon Robins
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Phillip B. Storm
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Stephen C. Mack
  - Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Jena V. Lilly
  - Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Hongbo M. Xie
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Payal Jain
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Pichai Raman
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Brian R. Rood
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
| | - Rishi R. Lulla
- Division of Hematology/Oncology, Hasbro Children’s Hospital, Providence, RI 02903, USA
- Department of Pediatrics, The Warren Alpert School of Brown University, Providence, RI 02912, USA
| | - Javad Nazarian
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
- Department of Pediatrics, University of Zurich, Zurich, Switzerland
| | - Adam A. Kraya
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Zalman Vaksman
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Allison P. Heath
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Cassie Kline
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Laura Scolaro
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Angela N. Viaene
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Xiaoyan Huang
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Gregory P. Way
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Steven M. Foltz
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Bo Zhang
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Anna R. Poetsch
- Biotechnology Center, Technical University Dresden, Dresden, Germany
- National Center for Tumor Diseases, Dresden, Germany
| | - Sabine Mueller
- Department of Neurology, Neurosurgery and Pediatrics, University of California, San Francisco, San Francisco, CA 94115, USA
| | - Brian M. Ennis
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Michael Prados
- University of California, San Francisco, San Francisco, CA 94115, USA
| | - Sharon J. Diskin
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Siyuan Zheng
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
| | - Yiran Guo
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Shrivats Kannan
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Angela J. Waanders
- Division of Hematology, Oncology, Neuro-Oncology, and Stem Cell Transplant, Ann & Robert H Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Ashley S. Margol
- Division of Hematology and Oncology, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
- Department of Pediatrics, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
| | - Meen Chul Kim
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Derek Hanson
- Hackensack Meridian School of Medicine, Nutley, NJ 07110, USA
- Hackensack University Medical Center, Hackensack, NJ 07601, USA
| | - Nicholas Van Kuren
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Jessica Wong
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Rebecca S. Kaufman
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Noel Coleman
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Christopher Blackden
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Kristina A. Cole
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jennifer L. Mason
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Peter J. Madsen
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Carl J. Koschmann
- Department of Pediatrics, University of Michigan Health, Ann Arbor, MI 48105, USA
- Pediatric Hematology Oncology, Mott Children’s Hospital, Ann Arbor, MI 48109, USA
| | - Douglas R. Stewart
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
| | - Eric Wafula
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Miguel A. Brown
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Adam C. Resnick
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Casey S. Greene
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jo Lynne Rokita
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Jaclyn N. Taroni
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
| | - Children’s Brain Tumor Network
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Rowan University, Glassboro, NJ 08028, USA
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
- Division of Hematology/Oncology, Hasbro Children’s Hospital, Providence, RI 02903, USA
- Department of Pediatrics, The Warren Alpert School of Brown University, Providence, RI 02912, USA
- Department of Pediatrics, University of Zurich, Zurich, Switzerland
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Biotechnology Center, Technical University Dresden, Dresden, Germany
- National Center for Tumor Diseases, Dresden, Germany
- Department of Neurology, Neurosurgery and Pediatrics, University of California, San Francisco, San Francisco, CA 94115, USA
- University of California, San Francisco, San Francisco, CA 94115, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Hematology, Oncology, Neuro-Oncology, and Stem Cell Transplant, Ann & Robert H Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Division of Hematology and Oncology, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
- Department of Pediatrics, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
- Hackensack Meridian School of Medicine, Nutley, NJ 07110, USA
- Hackensack University Medical Center, Hackensack, NJ 07601, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Michigan Health, Ann Arbor, MI 48105, USA
- Pediatric Hematology Oncology, Mott Children’s Hospital, Ann Arbor, MI 48109, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Pacific Pediatric Neuro-Oncology Consortium
- Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA 19004, USA
- Center for Data-Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurosurgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Bioinformatics and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Rowan University, Glassboro, NJ 08028, USA
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Oncology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL 60637, USA
- Greehey Children’s Cancer Research Institute, UT Health San Antonio, San Antonio, TX 78229, USA
- Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Children’s National Research Institute, Washington, DC 20012, USA
- George Washington University School of Medicine and Health Sciences, Washington, DC 20052, USA
- Division of Hematology/Oncology, Hasbro Children’s Hospital, Providence, RI 02903, USA
- Department of Pediatrics, The Warren Alpert School of Brown University, Providence, RI 02912, USA
- Department of Pediatrics, University of Zurich, Zurich, Switzerland
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Biotechnology Center, Technical University Dresden, Dresden, Germany
- National Center for Tumor Diseases, Dresden, Germany
- Department of Neurology, Neurosurgery and Pediatrics, University of California, San Francisco, San Francisco, CA 94115, USA
- University of California, San Francisco, San Francisco, CA 94115, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Hematology, Oncology, Neuro-Oncology, and Stem Cell Transplant, Ann & Robert H Lurie Children’s Hospital of Chicago, Chicago, IL 60611, USA
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Division of Hematology and Oncology, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
- Department of Pediatrics, Keck School of Medicine of University of Southern California, Los Angeles, CA 90033, USA
- Hackensack Meridian School of Medicine, Nutley, NJ 07110, USA
- Hackensack University Medical Center, Hackensack, NJ 07601, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Michigan Health, Ann Arbor, MI 48105, USA
- Pediatric Hematology Oncology, Mott Children’s Hospital, Ann Arbor, MI 48109, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
46
|
Zhao Y, Wang X, Sun T, Shan P, Zhan Z, Zhao Z, Jiang Y, Qu M, Lv Q, Wang Y, Liu P, Chen S. Artificial intelligence-driven electrochemical immunosensing biochips in multi-component detection. Biomicrofluidics 2023; 17:041301. [PMID: 37614678 PMCID: PMC10444200 DOI: 10.1063/5.0160808] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/05/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023]
Abstract
Electrochemical Immunosensing (EI) combines electrochemical analysis and immunology principles and is characterized by its simplicity, rapid detection, high sensitivity, and specificity. EI has become an important approach in various fields, such as clinical diagnosis, disease prevention and treatment, environmental monitoring, and food safety. However, EI multi-component detection still faces two major bottlenecks: first, the lack of cost-effective and portable detection platforms; second, the difficulty in eliminating batch differences and accurately decoupling signals from multiple analytes. With the gradual maturation of biochip technology, high-throughput analysis and portable detection utilizing the advantages of miniaturized chips, high sensitivity, and low cost have become possible. Meanwhile, Artificial Intelligence (AI) enables accurate decoupling of signals and enhances the sensitivity and specificity of multi-component detection. We believe that by evaluating and analyzing the characteristics, benefits, and linkages of EI, biochip, and AI technologies, we may considerably accelerate the development of EI multi-component detection. Therefore, we propose three specific prospects: first, AI can enhance and optimize the performance of the EI biochips, addressing the issue of multi-component detection for portable platforms. Second, the AI-enhanced EI biochips can be widely applied in home care, medical healthcare, and other areas. Third, the cross-fusion and innovation of EI, biochip, and AI technologies will effectively solve key bottlenecks in biochip detection, promoting interdisciplinary development. However, challenges may arise from AI algorithms that are difficult to explain and limited data access. Nevertheless, we believe that with technological advances and further research, there will be more methods and technologies to overcome these challenges.
Affiliation(s)
- Yuliang Zhao
- School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066000, Hebei, China
| | - Xiaoai Wang
- School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066000, Hebei, China
| | - Tingting Sun
- School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066000, Hebei, China
| | - Peng Shan
- School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066000, Hebei, China
| | - Zhikun Zhan
- School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066000, Hebei, China
| | - Zhongpeng Zhao
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing 100071, China
| | - Yongqiang Jiang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing 100071, China
| | - Mingyue Qu
- The PLA Rocket Force Characteristic Medical Center, Beijing 100088, China
| | - Qingyu Lv
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing 100071, China
| | - Ying Wang
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Peng Liu
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing 100071, China
| | - Shaolong Chen
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing 100071, China
|
47
|
Maqueda JJ, Giovanazzi A, Rocha AM, Rocha S, Silva I, Saraiva N, Bonito N, Carvalho J, Maia L, Wauben MHM, Oliveira C. Adapter dimer contamination in sRNA-sequencing datasets predicts sequencing failure and batch effects and hampers extracellular vesicle-sRNA analysis. Journal of Extracellular Biology 2023; 2:e91. [PMID: 38938917 PMCID: PMC11080836 DOI: 10.1002/jex2.91] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 06/29/2024]
Abstract
Small RNA (sRNA) profiling of Extracellular Vesicles (EVs) by Next-Generation Sequencing (NGS) often delivers poor outcomes, independently of reagents, platforms or pipelines used, which contributes to poor reproducibility of studies. Here we analysed pre/post-sequencing quality controls (QC) to predict issues potentially biasing biological sRNA-sequencing results from purified human milk EVs, human and mouse EV-enriched plasma and human paraffin-embedded tissues. Although different RNA isolation protocols and NGS platforms were used in these experiments, all datasets had samples characterized by a marked removal of reads after pre-processing. The extent of read loss between individual samples within a dataset did not correlate with isolated RNA quantity or sequenced base quality. Rather, cDNA electropherograms revealed the presence of a constant peak whose intensity correlated with the degree of read loss and, remarkably, with the percentage of adapter dimers, which were found to be overrepresented sequences in high read-loss samples. The analysis through a QC pipeline, which allowed us to monitor quality parameters in a step-by-step manner, provided compelling evidence that adapter dimer contamination was the main factor causing batch effects. We concluded this study by summarising peer-reviewed published workflows that perform consistently well in avoiding adapter dimer contamination towards a greater likelihood of sequencing success.
Affiliation(s)
- Joaquín J. Maqueda
- BIOINF2BIO, LDA, Porto, Portugal
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Ipatimup – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
| | - Alberta Giovanazzi
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Ana Mafalda Rocha
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Ipatimup – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
| | - Sara Rocha
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Ipatimup – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
| | - Isabel Silva
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- IBMC – Instituto de Biologia Molecular e Celular, University of Porto, Porto, Portugal
| | - Nadine Saraiva
- IPOC – Instituto Português de Oncologia Francisco Gentil, Coimbra, Portugal
| | - Nuno Bonito
- IPOC – Instituto Português de Oncologia Francisco Gentil, Coimbra, Portugal
| | - Joana Carvalho
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Ipatimup – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
| | - Luis Maia
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- ICBAS-UP – Instituto de Ciências Biomédicas Abel Salazar, University of Porto, Porto, Portugal
- CHUPorto – Department of Neurology, Centro Hospitalar Universitário do Porto, Porto, Portugal
| | - Marca H. M. Wauben
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Carla Oliveira
- BIOINF2BIO, LDA, Porto, Portugal
- i3S – Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Ipatimup – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- FMUP – Faculty of Medicine, University of Porto, Porto, Portugal
|
48
|
Maxwell CB, Sandhu JK, Cao TH, McCann GP, Ng LL, Jones DJL. The Edge Effect in High-Throughput Proteomics: A Cautionary Tale. Journal of the American Society for Mass Spectrometry 2023. [PMID: 37155737 DOI: 10.1021/jasms.3c00035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
For mass spectrometry to continue to grow as a platform for high-throughput clinical and translational research, careful consideration must be given to quality control, ensuring that the assay performs reproducibly, accurately, and precisely. In particular, the throughput required for large cohort clinical validation in biomarker discovery and diagnostic screening has driven the growth of multiplexed targeted liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) assays paired with sample preparation and analysis in multiwell plates. However, large-scale MS-based proteomics studies are often plagued by batch effects: technical variation in the data that can arise from diverse sources such as sample preparation batches, different reagent lots, or MS signal drift. These batch effects can confound the detection of true signal differences, resulting in incorrect conclusions being drawn about significant biological effects or the lack thereof. Here, we present an intraplate batch effect termed the edge effect, which arises from temperature gradients in multiwell plates and is commonly reported in preclinical cell culture studies but has not yet been reported in a clinical proteomics setting. We present methods to ameliorate the phenomenon, including proper assessment of heating techniques for multiwell plates and incorporation of surrogate standards, which can normalize for intraplate variation.
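The surrogate-standard normalization mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives no formula, so the ratio-to-surrogate scaling, the function name `normalize_by_surrogate`, and the well signals below are all assumptions chosen for illustration, not the authors' procedure.

```python
from statistics import median


def normalize_by_surrogate(analyte, surrogate):
    """Scale each well's analyte signal by its surrogate-standard signal.

    Both arguments map a well ID (e.g. "A1") to a raw signal. The same
    amount of surrogate standard is assumed to have been spiked into every
    well, so well-to-well differences in its signal are treated as technical.
    """
    # Plate-wide reference level of the surrogate standard (hypothetical choice).
    reference = median(surrogate.values())
    return {well: analyte[well] * reference / surrogate[well] for well in analyte}


# Hypothetical plate: edge well A1 reads 20% high for both analyte and surrogate.
analyte = {"A1": 120.0, "B2": 100.0}
surrogate = {"A1": 60.0, "B2": 50.0}
print(normalize_by_surrogate(analyte, surrogate))
```

In this toy case the edge well and the interior well end up with the same normalized value, because the inflation affects analyte and surrogate equally.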
Affiliation(s)
- Colleen B Maxwell
- The Leicester van Geest MultiOmics Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
- Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
| | - Jatinderpal K Sandhu
- The Leicester van Geest MultiOmics Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
- Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
| | - Thong H Cao
- The Leicester van Geest MultiOmics Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
- Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
| | - Gerry P McCann
- Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
| | - Leong L Ng
- The Leicester van Geest MultiOmics Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
- Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
| | - Donald J L Jones
- The Leicester van Geest MultiOmics Facility, Hodgkin Building, University of Leicester, Leicester LE1 9HN, United Kingdom
- Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, United Kingdom
- Leicester Cancer Research Centre, RKCSB, University of Leicester, Leicester LE2 7LX, United Kingdom
|
49
|
Olbrich M, Künstner A, Busch H. MBECS: Microbiome Batch Effects Correction Suite. BMC Bioinformatics 2023; 24:182. [PMID: 37138207 PMCID: PMC10155362 DOI: 10.1186/s12859-023-05252-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 03/20/2023] [Indexed: 05/05/2023]
Abstract
Despite the availability of batch effect correcting algorithms (BECA), no comprehensive tool that combines batch correction and evaluation of the results exists for microbiome datasets. This work outlines the Microbiome Batch Effects Correction Suite development that integrates several BECAs and evaluation metrics into a software package for the statistical computation framework R.
Affiliation(s)
- Michael Olbrich
- Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
- Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
- Center for Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates
| | - Axel Künstner
- Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
- Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
| | - Hauke Busch
- Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
|
50
|
Guo F, Lin G, Dong L, Cheng KK, Deng L, Xu X, Raftery D, Dong J. Concordance-Based Batch Effect Correction for Large-Scale Metabolomics. Anal Chem 2023; 95:7220-7228. [PMID: 37115661 DOI: 10.1021/acs.analchem.2c05748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
For a large-scale metabolomics study, sample collection, preparation, and analysis may last several days, months, or even (intermittently) years. This may lead to apparent batch effects in the acquired metabolomics data due to variability in instrument status, environmental conditions, or experimental operators. Batch effects may confound the true biological relationships among metabolites and thus obscure real metabolic changes. At present, most of the commonly used batch effect correction (BEC) methods are based on quality control (QC) samples and therefore require sufficient and stable QC samples. However, the quality of the QC samples may deteriorate if the experiment lasts for a long time. Alternatively, isotope-labeled internal standards have been used, but they generally do not provide good coverage of the metabolome. On the other hand, BEC can also be conducted through a data-driven method, in which no QC sample is needed. Here, we propose a novel data-driven BEC method, namely, CordBat, to achieve concordance between each batch of samples. In the proposed CordBat method, a reference batch is first selected from all batches of data, and the remaining batches are referred to as "other batches." The reference batch serves as the baseline for the batch adjustment by providing a coordinate of correlation between metabolites. Next, a Gaussian graphical model is built on the combined dataset of reference and other batches, and finally, BEC is achieved by optimizing the correction coefficients in the other batches so that the correlations between metabolites in each batch, and in their combinations, are in concordance with those of the reference batch. Three real-world metabolomics datasets are used to evaluate the performance of CordBat by comparing it with five commonly used BEC methods. The experimental results showed the effectiveness of CordBat in batch effect removal and the concordance of correlation between metabolites after BEC. CordBat was found to be comparable to the QC-based methods and achieved better performance in the preservation of biological effects. The proposed CordBat method may serve as an alternative BEC method for large-scale metabolomics studies that lack proper QC samples.
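The idea of concordance with a reference batch can be illustrated with a small sketch. This is not the CordBat algorithm: the Gaussian graphical model and the optimization of correction coefficients are replaced here by a simple per-metabolite location-scale alignment, and the data are simulated. The function names `correlation_discrepancy` and `align_to_reference` are hypothetical and chosen only for this example. The sketch measures how far the metabolite correlation matrix of the combined data (reference plus other batch) lies from that of the reference batch, before and after the stand-in correction.

```python
import numpy as np


def correlation_discrepancy(reference, combined):
    """Frobenius norm of the difference between two Pearson correlation matrices.

    Rows are samples, columns are metabolites.
    """
    r_ref = np.corrcoef(reference, rowvar=False)
    r_comb = np.corrcoef(combined, rowvar=False)
    return float(np.linalg.norm(r_ref - r_comb, ord="fro"))


def align_to_reference(reference, other):
    """Stand-in correction (an assumption, not CordBat's optimization):
    rescale each metabolite in `other` to the reference mean and standard deviation.
    """
    ref_mean, ref_sd = reference.mean(axis=0), reference.std(axis=0, ddof=1)
    oth_mean, oth_sd = other.mean(axis=0), other.std(axis=0, ddof=1)
    return (other - oth_mean) / oth_sd * ref_sd + ref_mean


# Simulated data: both batches share the same underlying correlation structure.
rng = np.random.default_rng(0)
n_samples, n_metabolites = 200, 5
cov = 0.6 * np.ones((n_metabolites, n_metabolites)) + 0.4 * np.eye(n_metabolites)
reference = rng.multivariate_normal(np.zeros(n_metabolites), cov, size=n_samples)
biology = rng.multivariate_normal(np.zeros(n_metabolites), cov, size=n_samples)

# Simulated batch effect: a per-metabolite scale and shift applied to the other batch.
shift = np.array([3.0, -2.0, 0.0, 4.0, -1.0])
scale = np.array([1.5, 0.5, 1.0, 2.0, 0.8])
other = biology * scale + shift

before = correlation_discrepancy(reference, np.vstack([reference, other]))
corrected = align_to_reference(reference, other)
after = correlation_discrepancy(reference, np.vstack([reference, corrected]))

print(f"discrepancy before correction: {before:.3f}")
print(f"discrepancy after correction:  {after:.3f}")
```

In this simulation the batch shift distorts the correlations of the combined data, and the alignment brings them back toward the reference structure. A within-batch correlation matrix alone would not reveal the problem, because Pearson correlation is unchanged by per-metabolite shifts and positive rescaling, which is why the comparison uses the combined data.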
Affiliation(s)
- Fanjing Guo
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Genjin Lin
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Liheng Dong
- School of Computer Science and Technology, Xiamen University Malaysia, Sepang 43600, Malaysia
| | - Kian-Kai Cheng
- Faculty of Chemical and Energy Engineering, Universiti Teknologi Malaysia, Johor 81310, Malaysia
| | - Lingli Deng
- Department of Information Engineering, East China University of Technology, Nanchang 330013, China
| | - Xiangnan Xu
- School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales 2006, Australia
| | - Daniel Raftery
- Northwest Metabolomics Research Center, University of Washington, Seattle, Washington 98109, United States
| | - Jiyang Dong
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| |
|