1
Zhang X, Zhang P, Ren Q, Li J, Lin H, Huang Y, Wang W. Integrative multi-omic and machine learning approach for prognostic stratification and therapeutic targeting in lung squamous cell carcinoma. Biofactors 2024. [PMID: 39391958 DOI: 10.1002/biof.2128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Accepted: 09/25/2024] [Indexed: 10/12/2024]
Abstract
The proliferation, metastasis, and drug resistance of cancer cells pose significant challenges to the treatment of lung squamous cell carcinoma (LUSC). However, there is a lack of optimal predictive models that can accurately forecast patient prognosis and guide the selection of targeted therapies. The extensive multi-omic data obtained from multi-level molecular biology provides a unique perspective for understanding the underlying biological characteristics of cancer, offering potential prognostic indicators and drug sensitivity biomarkers for LUSC patients. We integrated diverse datasets encompassing gene expression, DNA methylation, genomic mutations, and clinical data from LUSC patients to achieve consensus clustering using a suite of 10 multi-omics integration algorithms. Subsequently, we employed 10 commonly used machine learning algorithms, combining them into 101 unique configurations to design an optimal performing model. We then explored the characteristics of high- and low-risk LUSC patient groups in terms of the tumor microenvironment and response to immunotherapy, ultimately validating the functional roles of the model genes through in vitro experiments. Through the application of 10 clustering algorithms, we identified two prognostically relevant subtypes, with CS1 exhibiting a more favorable prognosis. We then constructed a subtype-specific machine learning model, LUSC multi-omics signature (LMS) based on seven key hub genes. Compared to previously published LUSC biomarkers, our LMS score demonstrated superior predictive performance. Patients with lower LMS scores had higher overall survival rates and better responses to immunotherapy. Notably, the high LMS group was more inclined toward "cold" tumors, characterized by immune suppression and exclusion, but drugs like dasatinib may represent promising therapeutic options for these patients. 
We also validated the model gene SERPINB13 through cell experiments, confirming its role as a potential oncogene influencing the progression of LUSC and as a promising therapeutic target. Our research provides new insights into refining the molecular classification of LUSC and further optimizing immunotherapy strategies.
Affiliation(s)
- Xiao Zhang
  - Department of Thoracic Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Pengpeng Zhang
  - Department of Lung Cancer, Tianjin Lung Cancer Center, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
- Qianhe Ren
  - Department of Thoracic Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Jun Li
  - Department of Thoracic Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Haoran Lin
  - Department of Thoracic Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yuming Huang
  - Department of Thoracic Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Wei Wang
  - Department of Thoracic Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
2
Davis JT, Obermayer AN, Soupir AC, Hesterberg RS, Duong T, Yang CY, Dao KP, Manley BJ, Grass GD, Avram D, Rodriguez PC, Fridley BL, Yu X, Teng M, Wang X, Shaw TI. BatchFLEX: feature-level equalization of X-batch. Bioinformatics 2024; 40:btae587. [PMID: 39360977 PMCID: PMC11486499 DOI: 10.1093/bioinformatics/btae587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 08/15/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024] Open
Abstract
MOTIVATION Integrative analysis of heterogeneous expression data remains challenging due to variations in platform, RNA quality, sample processing, and other unknown technical effects. Selecting the approach for removing unwanted batch effects can be a time-consuming and tedious process, especially for more biologically focused investigators. RESULTS Here, we present BatchFLEX, a Shiny app that can facilitate visualization and correction of batch effects using several established methods. BatchFLEX can visualize the variance contribution of a factor before and after correction. As an example, we have analyzed ImmGen microarray data and enhanced the expression signals that distinguish each immune cell type. Moreover, our analysis revealed the impact of batch correction in altering the gene expression rank and single-sample GSEA pathway scores in immune cell types, highlighting the importance of real-time assessment of batch correction for optimal downstream analysis. AVAILABILITY AND IMPLEMENTATION Our tool is available through GitHub https://github.com/shawlab-moffitt/BATCH-FLEX-ShinyApp with an online example on Shiny.io https://shawlab-moffitt.shinyapps.io/batch_flex/.
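As a rough illustration of what "variance contribution of a factor before and after correction" can mean for a single feature, the sketch below computes the share of total variance explained by batch (between-batch sum of squares divided by total sum of squares) and then applies per-batch mean-centering. This is not BatchFLEX code and not one of the established methods the abstract refers to; the function names and the toy values are hypothetical.

```python
from collections import defaultdict


def batch_variance_fraction(values, batches):
    """Fraction of total variance explained by batch (between-SS / total-SS)."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


def center_by_batch(values, batches):
    """Simple correction: shift every batch so its mean equals the grand mean."""
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    batch_means = {b: sum(g) / len(g) for b, g in groups.items()}
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical expression values for one gene measured in two batches
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["A", "A", "A", "B", "B", "B"]

before = batch_variance_fraction(expression, batch)
corrected = center_by_batch(expression, batch)
after = batch_variance_fraction(corrected, batch)
print(f"batch share of variance: before={before:.3f}, after={after:.3f}")
```

Mean-centering removes only additive shifts between batches; it can also remove real biology when batch and the factor of interest are confounded, which is one reason to inspect the result after any correction.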
Affiliation(s)
- Joshua T Davis
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Alyssa N Obermayer
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Alex C Soupir
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Rebecca S Hesterberg
  - Department of Tumor Microenvironment and Metastasis, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Thac Duong
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Ching-Yao Yang
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Ken Phong Dao
  - Department of Malignant Hematology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Brandon J Manley
  - Department of Genitourinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- G Daniel Grass
  - Department of Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Dorina Avram
  - Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Paulo C Rodriguez
  - Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Brooke L Fridley
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
  - Department of Malignant Hematology, Children’s Mercy, Kansas City, MO 64108, United States
- Xiaoqing Yu
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Mingxiang Teng
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Xuefeng Wang
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Timothy I Shaw
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
3
Kanwal A, Zhang Z. Exploring common pathogenic association between Epstein Barr virus infection and long-COVID by integrating RNA-Seq and molecular dynamics simulations. Front Immunol 2024; 15:1435170. [PMID: 39391317 PMCID: PMC11464307 DOI: 10.3389/fimmu.2024.1435170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Accepted: 08/27/2024] [Indexed: 10/12/2024] Open
Abstract
The term "Long-COVID" (LC) is characterized by the aftereffects of COVID-19 infection. Various studies have suggested that Epstein-Barr virus (EBV) reactivation is among the significant reported causes of LC. However, there is a lack of in-depth research that could largely explore the pathogenic mechanism and pinpoint the key genes in the EBV and LC context. This study mainly aimed to predict the potential disease-associated common genes between EBV reactivation and LC condition using next-generation sequencing (NGS) data and reported naturally occurring biomolecules as inhibitors. We applied the bulk RNA-Seq from LC and EBV-infected peripheral blood mononuclear cells (PBMCs), identified the differentially expressed genes (DEGs) and the Protein-Protein interaction (PPI) network using the STRING database, identified hub genes using the cytoscape plugins CytoHubba and MCODE, and performed enrichment analysis using ClueGO. The interaction analysis of a hub gene was performed against naturally occurring bioflavonoid molecules using molecular docking and the molecular dynamics (MD) simulation method. Out of 357 common genes, 22 genes (CCL2, CCL20, CDCA2, CEP55, CHI3L1, CKAP2L, DEPDC1, DIAPH3, DLGAP5, E2F8, FGF1, NEK2, PBK, TOP2A, CCL3, CXCL8, DEPDC1, IL6, RETN, MMP2, LCN2, and OLR1) were classified as hub genes, and the remaining ones were classified as neighboring genes. Enrichment analysis showed the role of hub genes in various pathways such as immune-signaling pathways, including JAK-STAT signaling, interleukin signaling, protein kinase signaling, and toll-like receptor pathways associated with the symptoms reported in the LC condition. ZNF and MYBL TF-family were predicted as abundant TFs controlling hub genes' transcriptional machinery. Furthermore, OLR1 (PDB: 7XMP) showed stable interactions with the five shortlisted refined naturally occurring bioflavonoids, i.e., apigenin, amentoflavone, ilexgenin A, myricetin, and orientin compounds. 
The total binding energy pattern was observed, with amentoflavone being the top docked molecule (with a binding affinity of -8.3 kcal/mol) with the lowest total binding energy of -18.48 kcal/mol. In conclusion, our research has predicted the hub genes, their molecular pathways, and potential inhibitors relevant to the possible pathogenic association between EBV and LC. In vivo or in vitro experiments could be used to functionally validate our findings, which may help in treating LC or preventing EBV reactivation.
Affiliation(s)
- Ayesha Kanwal
  - MOE Key Laboratory for Cellular Dynamics and Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Zhiyong Zhang
  - MOE Key Laboratory for Cellular Dynamics and Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui, China
4
Maciejewski K, Czerwinska P. Scoping Review: Methods and Applications of Spatial Transcriptomics in Tumor Research. Cancers (Basel) 2024; 16:3100. [PMID: 39272958 PMCID: PMC11394603 DOI: 10.3390/cancers16173100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/30/2024] [Accepted: 08/30/2024] [Indexed: 09/15/2024] Open
Abstract
Spatial transcriptomics (ST) examines gene expression within its spatial context on tissue, linking morphology and function. Advances in ST resolution and throughput have led to an increase in scientific interest, notably in cancer research. This scoping study reviews the challenges and practical applications of ST, summarizing current methods, trends, and data analysis techniques for ST in neoplasm research. We analyzed 41 articles published by the end of 2023 alongside public data repositories. The findings indicate cancer biology is an important focus of ST research, with a rising number of studies each year. Visium (10x Genomics, Pleasanton, CA, USA) is the leading ST platform, and SCTransform from Seurat R library is the preferred method for data normalization and integration. Many studies incorporate additional data types like single-cell sequencing and immunohistochemistry. Common ST applications include discovering the composition and function of tumor tissues in the context of their heterogeneity, characterizing the tumor microenvironment, or identifying interactions between cells, including spatial patterns of expression and co-occurrence. However, nearly half of the studies lacked comprehensive data processing protocols, hindering their reproducibility. By recommending greater transparency in sharing analysis methods and adapting single-cell analysis techniques with caution, this review aims to improve the reproducibility and reliability of future studies in cancer research.
Affiliation(s)
- Kacper Maciejewski
  - Undergraduate Research Group "Biobase", Poznan University of Medical Sciences, 61-701 Poznan, Poland
- Patrycja Czerwinska
  - Undergraduate Research Group "Biobase", Poznan University of Medical Sciences, 61-701 Poznan, Poland
  - Department of Cancer Immunology, Poznan University of Medical Sciences, 61-866 Poznan, Poland
  - Department of Diagnostics and Cancer Immunology, Greater Poland Cancer Centre, 61-866 Poznan, Poland
5
Takemoto Y, Ito D, Komori S, Kishimoto Y, Yamada S, Hashizume A, Katsuno M, Nakatochi M. Comparing preprocessing strategies for 3D-Gene microarray data of extracellular vesicle-derived miRNAs. BMC Bioinformatics 2024; 25:221. [PMID: 38902629 PMCID: PMC11188187 DOI: 10.1186/s12859-024-05840-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 06/12/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND Extracellular vesicle-derived (EV)-miRNAs have potential to serve as biomarkers for the diagnosis of various diseases. miRNA microarrays are widely used to quantify circulating EV-miRNA levels, and the preprocessing of miRNA microarray data is critical for analytical accuracy and reliability. However, although microarray data have been used in various studies, the effects of preprocessing have not been studied for Toray's 3D-Gene chip, a widely used measurement method. We aimed to evaluate batch effect, missing value imputation accuracy, and the influence of preprocessing on measured values in 18 different preprocessing pipelines for EV-miRNA microarray data from two cohorts with amyotrophic lateral sclerosis using 3D-Gene technology. RESULTS Eighteen different pipelines with different types and orders of missing value completion and normalization were used to preprocess the 3D-Gene microarray EV-miRNA data. Notably, batch effects were suppressed in all pipelines using the batch effect correction method ComBat. Furthermore, pipelines utilizing missForest for missing value imputation showed high agreement with measured values. In contrast, imputation using constant values for missing data exhibited low agreement. CONCLUSIONS This study highlights the importance of selecting the appropriate preprocessing strategy for EV-miRNA microarray data when using 3D-Gene technology. These findings emphasize the importance of validating preprocessing approaches, particularly in the context of batch effect correction and missing value imputation, for reliably analyzing data in biomarker discovery and disease research.
Affiliation(s)
- Yuto Takemoto
  - Public Health Informatics Unit, Department of Integrated Health Sciences, Nagoya University Graduate School of Medicine, 1-1-20 Daiko-Minami, Higashi-Ku, Nagoya, 461-8673, Japan
- Daisuke Ito
  - Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Shota Komori
  - Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Yoshiyuki Kishimoto
  - Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Shinichiro Yamada
  - Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Atsushi Hashizume
  - Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
  - Department of Clinical Research Education, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Masahisa Katsuno
  - Department of Neurology, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
  - Department of Clinical Research Education, Nagoya University Graduate School of Medicine, 65 Tsurumai-Cho, Showa-Ku, Nagoya, 466-8550, Japan
- Masahiro Nakatochi
  - Public Health Informatics Unit, Department of Integrated Health Sciences, Nagoya University Graduate School of Medicine, 1-1-20 Daiko-Minami, Higashi-Ku, Nagoya, 461-8673, Japan
6
Wang J, Nuray U, Yan H, Xu Y, Fang L, Li R, Zhou X, Zhang H. Pyroptosis is involved in the immune microenvironment regulation of unexplained recurrent miscarriage. Mamm Genome 2024; 35:256-279. [PMID: 38538990 DOI: 10.1007/s00335-024-10038-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 03/11/2024] [Indexed: 05/29/2024]
Abstract
Unexplained recurrent miscarriage (URM) is a common pregnancy complication with few effective therapies. Moreover, little is known regarding the role of pyroptosis in the regulation of the URM immune microenvironment. To address this issue, gene expression profiles of publicly available placental datasets GSE22490 and GSE76862 were downloaded from the Gene Expression Omnibus database. Pyroptosis-related differentially expressed genes were identified and a total of 16 differentially expressed genes associated with pyroptosis were detected, among which 1 was upregulated and 15 were downregulated. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses indicated that the functionally enriched modules and pathways of these genes are closely related to immune and inflammatory responses. Four hub genes were identified: BTK, TLR8, NLRC4, and TNFSF13B. BTK, TLR8, and TNFSF13B were highly connected with immune cells, according to the correlation analysis of four hub genes and 20 different types of immune cells (p < 0.05). The four hub genes were used as research objects to construct the interaction networks. Chorionic villus tissue was used for quantitative real-time polymerase chain reaction and western blot to confirm the expression levels of hub genes, and the results showed that the expression of the four hub genes was significantly decreased in the chorionic villus tissue in the URM group. Collectively, the present study indicates that perhaps pyroptosis is essential to the diversity and complexity of the URM immune microenvironment, and provides a theoretical basis and research ideas for subsequent target gene verification and mechanism research.
Affiliation(s)
- Jing Wang
  - Department of Obstetrics and Gynecology, Second Affiliated Hospital of Soochow University, Suzhou, China
  - Department of Obstetrics and Gynecology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Hongchao Yan
  - Department of Obstetrics and Gynecology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Yang Xu
  - Department of Obstetrics and Gynecology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Lisha Fang
  - Department of Obstetrics and Gynecology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Ranran Li
  - First clinical medical college of Xuzhou Medical University, Xuzhou, China
- Xin Zhou
  - First clinical medical college of Xuzhou Medical University, Xuzhou, China
- Hong Zhang
  - Department of Obstetrics and Gynecology, Second Affiliated Hospital of Soochow University, Suzhou, China
7
Goldstein Y, Cohen OT, Wald O, Bavli D, Kaplan T, Benny O. Particle uptake in cancer cells can predict malignancy and drug resistance using machine learning. Science Advances 2024; 10:eadj4370. [PMID: 38809990 PMCID: PMC11314625 DOI: 10.1126/sciadv.adj4370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 04/23/2024] [Indexed: 05/31/2024]
Abstract
Tumor heterogeneity is a primary factor that contributes to treatment failure. Predictive tools, capable of classifying cancer cells based on their functions, may substantially enhance therapy and extend patient life span. The connection between cell biomechanics and cancer cell functions is used here to classify cells through mechanical measurements, via particle uptake. Machine learning (ML) was used to classify cells based on single-cell patterns of uptake of particles with diverse sizes. Three pairs of human cancer cell subpopulations, varied in their level of drug resistance or malignancy, were studied. Cells were allowed to interact with fluorescently labeled polystyrene particles ranging in size from 0.04 to 3.36 μm and analyzed for their uptake patterns using flow cytometry. ML algorithms accurately classified cancer cell subtypes with accuracy rates exceeding 95%. The uptake data were especially advantageous for morphologically similar cell subpopulations. Moreover, the uptake data were found to serve as a form of "normalization" that could reduce variation in repeated experiments.
Affiliation(s)
- Yoel Goldstein
  - Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ora T. Cohen
  - Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ori Wald
  - Department of Cardiothoracic Surgery, Hadassah Medical Center, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Danny Bavli
  - Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA
- Tommy Kaplan
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
  - Department of Developmental Biology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ofra Benny
  - Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
8
Ma R, Sun ED, Donoho D, Zou J. Principled and interpretable alignability testing and integration of single-cell data. Proc Natl Acad Sci U S A 2024; 121:e2313719121. [PMID: 38416677 DOI: 10.1073/pnas.2313719121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 01/23/2024] [Indexed: 03/01/2024] Open
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
Affiliation(s)
- Rong Ma
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Eric D Sun
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
- David Donoho
  - Department of Statistics, Stanford University, Stanford, CA 94305
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
9
Lin Z, Wu Z, Luo W. Bulk and single-cell sequencing identified a prognostic model based on the macrophage and lipid metabolism related signatures for osteosarcoma patients. Heliyon 2024; 10:e26091. [PMID: 38404899 PMCID: PMC10884844 DOI: 10.1016/j.heliyon.2024.e26091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 02/07/2024] [Accepted: 02/07/2024] [Indexed: 02/27/2024] Open
Abstract
The introduction of multidrug combination chemotherapy has significantly advanced the long-term survival prospects for osteosarcoma (OS) patients over the past decades. However, the escalating prevalence of chemoresistance has emerged as a substantial impediment to further advancements, necessitating the formulation of innovative strategies. Our present study leveraged bulk and single-cell sequencing techniques to scrutinize the OS immune microenvironment, unveiling a potential association between the differentiation state of macrophages and the efficacy of OS chemotherapy. Notably, we observed a heightened presence of lipid metabolism genes and pathways in predifferentiated macrophages, which constituted the major cluster in OS patients exhibiting a less favorable response to chemotherapy. Subsequently, we developed a Macrophage and Lipid Metabolism (MLMR) risk model and a nomogram, both of which demonstrated commendable prognostic predictive performance. Furthermore, an investigation into the underlying mechanisms of the risk model revealed associations with variations in the immune response among OS patients. Finally, our drug sensitivity analysis identified a spectrum of potential therapeutic agents for OS, including AZD2014, Sapitinib, Buparlisib, Afuresertib, MIRA-1, and BIBR-1532. These findings expand the therapeutic options available to clinicians managing OS and suggest a promising avenue for improving treatment outcomes.
Affiliation(s)
- Zili Lin
  - Department of Orthopaedics, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, PR China
- Ziyi Wu
  - Department of Orthopaedics, the Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, PR China
- Wei Luo
  - Department of Orthopaedics, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, PR China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Changsha, Hunan, 410008, PR China
10
Fukutani KF, Hampton TH, Bobak CA, MacKenzie TA, Stanton BA. Application of quantile discretization and Bayesian network analysis to publicly available cystic fibrosis data sets. Pacific Symposium on Biocomputing 2024; 29:534-548. [PMID: 38160305 PMCID: PMC10783867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
The availability of multiple publicly-available datasets studying the same phenomenon has the promise of accelerating scientific discovery. Meta-analysis can address issues of reproducibility and often increase power. The promise of meta-analysis is especially germane to rarer diseases like cystic fibrosis (CF), which affects roughly 100,000 people worldwide. A recent search of the National Institutes of Health's Gene Expression Omnibus revealed 1.3 million data sets related to cancer compared to about 2,000 related to CF. These studies are highly diverse, involving different tissues, animal models, treatments, and clinical covariates. In our search for gene expression studies of primary human airway epithelial cells, we identified three studies with compatible methodologies and sufficient metadata: GSE139078, Sala Study, and PRJEB9292. Even so, experimental designs were not identical, and we identified significant batch effects that would have complicated functional analysis. Here we present quantile discretization and Bayesian network construction using the hill-climbing method as a powerful tool to overcome experimental differences and reveal biologically relevant responses to the CF genotype itself, exposure to virus, bacteria, and drugs used to treat CF. Functional patterns revealed by clusterProfiler included interferon signaling, interferon gamma signaling, interleukins 4 and 13 signaling, interleukin 6 signaling, interleukin 21 signaling, and inactivation of CSF3/G-CSF signaling pathways, all showing significant alterations. These pathways were consistently associated with higher gene expression in CF epithelial cells compared to non-CF cells, suggesting that targeting these pathways could improve clinical outcomes. The success of quantile discretization and Bayesian network analysis in the context of CF suggests that these approaches might be applicable to other contexts where exactly comparable data sets are hard to find.
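To make the idea of quantile discretization concrete, the following minimal sketch bins each study's values by that study's own quantiles, so that measurements on different scales map onto the same small set of levels. It is an illustration under stated assumptions, not the authors' pipeline: the number of bins (three), the function name, and the toy values are hypothetical, and the Bayesian network step is not shown.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_discretize(values, n_bins=3):
    """Replace each value with the index (0..n_bins-1) of its quantile bin.

    Cut points are computed within the given dataset only, so the result
    depends on rank within the study rather than on the measurement scale.
    """
    cuts = quantiles(values, n=n_bins)  # n_bins - 1 interior cut points
    return [bisect_right(cuts, v) for v in values]


# Hypothetical measurements of one gene in two studies with different scales
study_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
study_b = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]

bins_a = quantile_discretize(study_a)
bins_b = quantile_discretize(study_b)
print(bins_a, bins_b)
```

Because both studies end up on the same discrete levels, a scale difference between them no longer separates the samples, at the cost of discarding within-bin differences.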
11
Downing T, Angelopoulos N. A primer on correlation-based dimension reduction methods for multi-omics analysis. J R Soc Interface 2023; 20:20230344. [PMID: 37817584 PMCID: PMC10565429 DOI: 10.1098/rsif.2023.0344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/19/2023] [Indexed: 10/12/2023] Open
Abstract
The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate emerging methods for multi-omics and integrating diverse omic datasets appropriately. This raises the opportunity of implementing population multi-omics with large sample sizes as omics technologies and our understanding improve.
Affiliation(s)
- Tim Downing
- Pirbright Institute, Pirbright, Surrey, UK
- Department of Biotechnology, Dublin City University, Dublin, Ireland
|
12
|
Selvitella AM, Foster KL. On the variability and dependence of human leg stiffness across strides during running and some consequences for the analysis of locomotion data. ROYAL SOCIETY OPEN SCIENCE 2023; 10:230597. [PMID: 37621665 PMCID: PMC10445019 DOI: 10.1098/rsos.230597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 07/27/2023] [Indexed: 08/26/2023]
Abstract
Typically, animal locomotion studies involve consecutive strides, which are frequently assumed to be independent with parameters that do not vary across strides. This assumption is often not tested. However, failing in particular to account for dependence across strides may cause an incorrect estimate of the uncertainty of the measurements and thereby lead to either missing (overestimating variance) or over-evaluating (underestimating variance) biological signals. In turn, this impacts replicability of the results because variability is accounted for differently across experiments. In this paper, we analyse the changes of a couple of measures of human leg stiffness across strides during running experiments, using a publicly available dataset. A major finding of this analysis is that the time series of these measurements of stiffness show autocorrelation even at large lags and so there is dependence between individual strides, even when separated by many intervening strides. Our results question the practice in biomechanics research of using each stride as an independent observation or of sub-selecting strides at small lags. Following the outcome of our analysis, we strongly recommend caution in doing so without first confirming the independence of the measurements across strides and without confirming that sub-selection does not produce spurious results.
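The dependence check described in this abstract can be sketched with a plain sample autocorrelation. The code below is a minimal illustration, not the authors' analysis: the function name `autocorrelation` and the stiffness values are hypothetical, and a real analysis would also need confidence bounds for each lag.

```python
from statistics import fmean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`,
    normalised by the lag-0 sum of squares so that lag 0 gives 1.0."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Hypothetical per-stride leg stiffness values (arbitrary units) with a slow drift.
stiffness = [20.0, 20.4, 20.9, 21.1, 21.6, 22.0, 22.3, 22.9, 23.2, 23.8]

for lag in (1, 2, 3):
    print(lag, round(autocorrelation(stiffness, lag), 3))
```

Values that stay far from zero at several lags would suggest that strides should not be treated as independent observations, which is the caution the abstract raises.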
Affiliation(s)
- Alessandro Maria Selvitella
- Department of Mathematical Sciences, Purdue University Fort Wayne, 2101 East Coliseum Boulevard, Fort Wayne, IN 46805, USA
- eScience Institute, University of Washington, 3910 15th Avenue Northeast, Seattle, WA 98195, USA
- NSF-Simons Center for Quantitative Biology, Northwestern University, 2200 Campus Drive Evanston, IL 60208, USA
| | - Kathleen Lois Foster
- NSF-Simons Center for Quantitative Biology, Northwestern University, 2200 Campus Drive Evanston, IL 60208, USA
- Department of Biology, Ball State University, 2000 West University Avenue, Muncie, IN 47306, USA
|
13
|
Singh PP, Benayoun BA. Considerations for reproducible omics in aging research. NATURE AGING 2023; 3:921-930. [PMID: 37386258 PMCID: PMC10527412 DOI: 10.1038/s43587-023-00448-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 06/01/2023] [Indexed: 07/01/2023]
Abstract
Technical advancements over the past two decades have enabled the measurement of the panoply of molecules of cells and tissues including transcriptomes, epigenomes, metabolomes and proteomes at unprecedented resolution. Unbiased profiling of these molecular landscapes in the context of aging can reveal important details about mechanisms underlying age-related functional decline and age-related diseases. However, the high-throughput nature of these experiments creates unique analytical and design demands for robustness and reproducibility. In addition, 'omic' experiments are generally onerous, making it crucial to effectively design them to eliminate as many spurious sources of variation as possible as well as account for any biological or technical parameter that may influence such measures. In this Perspective, we provide general guidelines on best practices in the design and analysis of omic experiments in aging research from experimental design to data analysis and considerations for long-term reproducibility and validation of such studies.
Affiliation(s)
- Param Priya Singh
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA.
- Bakar Aging Research Institute, University of California, San Francisco, San Francisco, CA, USA.
| | - Bérénice A Benayoun
- Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, USA.
- Molecular and Computational Biology Department, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA, USA.
- Biochemistry and Molecular Medicine Department, USC Keck School of Medicine, Los Angeles, CA, USA.
- Epigenetics and Gene Regulation, USC Norris Comprehensive Cancer Center, Los Angeles, CA, USA.
- USC Stem Cell Initiative, Los Angeles, CA, USA.
|
14
|
Guo F, Lin G, Dong L, Cheng KK, Deng L, Xu X, Raftery D, Dong J. Concordance-Based Batch Effect Correction for Large-Scale Metabolomics. Anal Chem 2023; 95:7220-7228. [PMID: 37115661 DOI: 10.1021/acs.analchem.2c05748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
For a large-scale metabolomics study, sample collection, preparation, and analysis may last several days, months, or even (intermittently) over years. This may lead to apparent batch effects in the acquired metabolomics data due to variability in instrument status, environmental conditions, or experimental operators. Batch effects may confound the true biological relationships among metabolites and thus obscure real metabolic changes. At present, most of the commonly used batch effect correction (BEC) methods are based on quality control (QC) samples, which require sufficient and stable QC samples. However, the quality of the QC samples may deteriorate if the experiment lasts for a long time. Alternatively, isotope-labeled internal standards have been used, but they generally do not provide good coverage of the metabolome. On the other hand, BEC can also be conducted through a data-driven method, in which no QC sample is needed. Here, we propose a novel data-driven BEC method, namely, CordBat, to achieve concordance between each batch of samples. In the proposed CordBat method, a reference batch is first selected from all batches of data, and the remaining batches are referred to as "other batches." The reference batch serves as the baseline for the batch adjustment by providing a coordinate of correlation between metabolites. Next, a Gaussian graphical model is built on the combined dataset of reference and other batches, and finally, BEC is achieved by optimizing the correction coefficients in the other batches so that the correlation between metabolites of each batch and their combinations are in concordance with that of the reference batch. Three real-world metabolomics datasets are used to evaluate the performance of CordBat by comparing it with five commonly used BEC methods. The present experimental results showed the effectiveness of CordBat in batch effect removal and the concordance of correlation between metabolites after BEC. 
CordBat was found to be comparable to the QC-based methods and achieved better performance in the preservation of biological effects. The proposed CordBat method may serve as an alternative BEC method for large-scale metabolomics that lack proper QC samples.
Affiliation(s)
- Fanjing Guo
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Genjin Lin
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Liheng Dong
- School of Computer Science and Technology, Xiamen University Malaysia, Sepang 43600, Malaysia
| | - Kian-Kai Cheng
- Faculty of Chemical and Energy Engineering, Universiti Teknologi Malaysia, Johor 81310, Malaysia
| | - Lingli Deng
- Department of Information Engineering, East China University of Technology, Nanchang 330013, China
| | - Xiangnan Xu
- School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales 2006, Australia
| | - Daniel Raftery
- Northwest Metabolomics Research Center, University of Washington, Seattle, Washington 98109, United States
| | - Jiyang Dong
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
|
15
|
Hattaway ME, Black GP, Young TM. Batch correction methods for nontarget chemical analysis data: application to a municipal wastewater collection system. Anal Bioanal Chem 2023; 415:1321-1331. [PMID: 36627378 PMCID: PMC9928919 DOI: 10.1007/s00216-023-04511-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 12/08/2022] [Accepted: 01/02/2023] [Indexed: 01/12/2023]
Abstract
Nontarget chemical analysis using high-resolution mass spectrometry has increasingly been used to discern spatial patterns and temporal trends in anthropogenic chemical abundance in natural and engineered systems. A critical experimental design consideration in such applications, especially those monitoring complex matrices over long time periods, is a choice between analyzing samples in multiple batches as they are collected, or in one batch after all samples have been processed. While datasets acquired in multiple analytical batches can include the effects of instrumental variability over time, datasets acquired in a single batch risk compound degradation during sample storage. To assess the influence of batch effects on the analysis and interpretation of nontarget data, this study examined a set of 56 samples collected from a municipal wastewater system over 7 months. Each month's samples included 6 from sites within the collection system, one combined influent, and one treated effluent sample. Samples were analyzed using liquid chromatography high-resolution mass spectrometry in positive electrospray ionization mode in multiple batches as the samples were collected and in a single batch at the conclusion of the study. Data were aligned and normalized using internal standard scaling and ComBat, an empirical Bayes method developed for estimating and removing batch effects in microarrays. As judged by multiple lines of evidence, including comparing principal variance component analysis between single and multi-batch datasets and through patterns in principal components and hierarchical clustering analyses, ComBat appeared to significantly reduce the influence of batch effects. For this reason, we recommend the use of more, small batches with an appropriate batch correction step rather than acquisition in one large batch.
Affiliation(s)
- Madison E Hattaway
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA, 95616, USA
| | - Gabrielle P Black
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA, 95616, USA
| | - Thomas M Young
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA, 95616, USA.
|
16
|
Korlimarla A, Ps H, Prabhu J, Ragulan C, Patil Y, Vp S, Desai K, Mathews A, Appachu S, Diwakar RB, Bs S, Melcher A, Cheang M, Sadanandam A. Comprehensive characterization of immune landscape of Indian and Western triple negative breast cancers. Transl Oncol 2022; 25:101511. [PMID: 35964339 PMCID: PMC9386467 DOI: 10.1016/j.tranon.2022.101511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 07/22/2022] [Accepted: 08/02/2022] [Indexed: 11/01/2022] Open
Abstract
PURPOSE Triple-negative breast cancer (TNBC) is a heterogeneous disease that remains a significant challenge to manage effectively in clinics worldwide. Immunotherapy may be beneficial to TNBC patients if responders can be effectively identified. Here we sought to elucidate the immune landscape of TNBCs by stratifying patients into immune-specific subtypes (immunotypes) to decipher the molecular and cellular presentations and signaling events of this heterogeneous disease and to associate them with clinical outcomes and potential treatment options. EXPERIMENTAL DESIGN We profiled 730 immune genes in 88 retrospective Indian TNBC samples using the NanoString platform, established immunotypes using a non-negative matrix factorization-based machine learning approach, and validated them using Western TNBCs (n=422; public datasets). Immunotype-specific gene signatures were associated with clinicopathological features, immune cell types, biological pathways, acute/chronic inflammatory responses, and immunogenic cell death processes. Responses to different immunotherapies associated with TNBC immunotypes were assessed using cross-cancer comparison to melanoma (n=504). Tumor-infiltrating lymphocytes (TILs) and pan-macrophage spatial marker expression were evaluated. RESULTS We identified three robust transcriptome-based immunotypes in both Indian and Western TNBCs in similar proportions. Immunotype-1 tumors, mainly representing well-known claudin-low and immunomodulatory subgroups, harbored dense TIL infiltrates and T-helper-1 (Th1) response profiles associated with smaller tumors, pre-menopausal status, and a better prognosis. They displayed a cascade of events, including acute inflammation, damage-associated molecular patterns, T-cell receptor-related and chemokine-specific signaling, antigen presentation, and viral-mimicry pathways. 
On the other hand, immunotype-2 was enriched for Th2/Th17 responses, CD4+ regulatory cells, basal-like/mesenchymal immunotypes, and an intermediate prognosis. In contrast to the two T-cell enriched immunotypes, immunotype-3 patients expressed innate immune genes/proteins, including those representing myeloid infiltrations (validated by spatial immunohistochemistry), and had poor survival. Remarkably, a cross-cancer comparison analysis revealed the association of immunotype-1 with responses to anti-PD-L1 and MAGEA3 immunotherapies. CONCLUSION Overall, the TNBC immunotypes identified in TNBCs reveal different prognoses, immune infiltrations, signaling, acute/chronic inflammation leading to immunogenic cell death of cancer cells, and potentially distinct responses to immunotherapies. The overlap in immune characteristics in Indian and Western TNBCs suggests similar efficiency of immunotherapy in both populations if strategies to select patients according to immunotypes can be further optimized and implemented.
Affiliation(s)
- Aruna Korlimarla
- St. John's Research Institute, St. John's National Academy of Health Sciences, Bangalore, India; Sri Shankara Cancer Hospital and Research Centre, Bangalore, India
| | - Hari Ps
- St. John's Research Institute, St. John's National Academy of Health Sciences, Bangalore, India; Sri Shankara Cancer Hospital and Research Centre, Bangalore, India; Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Jyoti Prabhu
- St. John's Research Institute, St. John's National Academy of Health Sciences, Bangalore, India
| | - Chanthirika Ragulan
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Yatish Patil
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Snijesh Vp
- St. John's Research Institute, St. John's National Academy of Health Sciences, Bangalore, India
| | - Krisha Desai
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Aju Mathews
- MOSC Medical College, Kolenchery, Kerala, India
| | - Sandhya Appachu
- Sri Shankara Cancer Hospital and Research Centre, Bangalore, India
| | - Ravi B Diwakar
- Sri Shankara Cancer Hospital and Research Centre, Bangalore, India
| | - Srinath Bs
- Sri Shankara Cancer Hospital and Research Centre, Bangalore, India
| | - Alan Melcher
- Centre for Translational Immunotherapy, Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK
| | - Maggie Cheang
- Clinical Trials and Statistical Unit, The Institute of Cancer Research, London, UK
| | - Anguraj Sadanandam
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK; Centre for Translational Immunotherapy, Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK; Centre for Global Oncology, Division of Molecular Pathology, The Institute of Cancer Research, 15 Cotswold Road, Sutton, London SM2 5NG, UK.
|
17
|
Zheng X, Ma Y, Bai Y, Huang T, Lv X, Deng J, Wang Z, Lian W, Tong Y, Zhang X, Yue M, Zhang Y, Li L, Peng M. Identification and validation of immunotherapy for four novel clusters of colorectal cancer based on the tumor microenvironment. Front Immunol 2022; 13:984480. [PMID: 36389763 PMCID: PMC9650243 DOI: 10.3389/fimmu.2022.984480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Accepted: 10/07/2022] [Indexed: 12/24/2022] Open
Abstract
The incidence and mortality of colorectal cancer (CRC) are increasing year by year. Accurate classification of CRC can enable personalized and precise treatment for patients. The tumor microenvironment (TME) plays an important role in the malignant progression and immunotherapy of CRC. An in-depth understanding of the clusters based on the TME is of great significance for the discovery of new therapeutic targets for CRC. We extracted data on CRC, including gene expression profile, DNA methylation array, somatic mutations, clinicopathological information, and copy number variation (CNV), from The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO) (four datasets: GSE14333, GSE17538, GSE38832, and GSE39582), cBioPortal, and FireBrowse. MCPcounter was used to quantify the abundance of 10 TME cells for CRC samples. Cluster repetitive analysis was based on the Hcluster function of the Pheatmap package in R. The ESTIMATE package was applied to compute immune and stromal scores for CRC patients. Principal component analysis (PCA) was used to remove batch effects among different datasets and transform genome-wide DNA methylation profiling into methylation of tumor-infiltrating lymphocyte (MeTIL). We evaluated the mutation differences of the clusters using MOVICS, DeconstructSigs, and GISTIC packages. As for therapy, TIDE and SubMap analyses were carried out to forecast the immunotherapy response of the clusters, and chemotherapeutic sensitivity was estimated based on the pRRophetic package. All results were verified in the TCGA and GEO data. Four immune clusters (ImmClust-CS1, ImmClust-CS2, ImmClust-CS3, and ImmClust-CS4) were identified for CRC. The four ImmClusts exhibited distinct TME compositions, cancer-associated fibroblasts (CAFs), functional orientation, and immune checkpoints. The highest immune, stromal, and MeTIL scores were observed in CS2, in contrast to the lowest scores in CS4. 
CS1 may respond to immunotherapy, while CS2 may respond to immunotherapy after anti-CAF treatment. Among the four ImmClusts, the top 15 markers with the highest mutation frequency were acquired, and CS1 had significantly lower CNA on the focal level than other subtypes. In addition, CS1 and CS2 patients had more stable chromosomes than CS3 and CS4. The most sensitive chemotherapeutic agents in these four ImmClusts were also found. IHC results revealed that CD29 stained significantly darker in the cancer samples, indicating that CD29 was highly expressed in colon cancer. This work revealed novel TME-based clusters for CRC, which could guide the prediction of prognosis, biological features, and appropriate treatment for patients with CRC.
Affiliation(s)
- Xiaoyong Zheng
- Department of Digestion, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Yajie Ma
- Department of Medical Affair, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Yan Bai
- Department of Digestion, Zhengzhou First People’s Hospital, Zhengzhou, China
| | - Tao Huang
- Medical School, Huanghe Science and Technology University, Zhengzhou, China
| | - Xuefeng Lv
- Department of Clinical Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Jinhai Deng
- Richard Dimbleby Department of Cancer Research, Comprehensive Cancer Centre, Kings College London, London, United Kingdom
| | - Zhongquan Wang
- Department of Clinical Laboratory, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Wenping Lian
- Department of Clinical Laboratory, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Yalin Tong
- Department of Digestion, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Xinyu Zhang
- Department of Medical Affair, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Miaomiao Yue
- Department of Digestion, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Yan Zhang
- Department of Digestion, Henan Provincial Third People’s Hospital, Zhengzhou, China
| | - Lifeng Li
- Medical School, Huanghe Science and Technology University, Zhengzhou, China
- Cancer Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Internet Medical and System Applications of National Engineering Laboratory, Zhengzhou, China
| | - Mengle Peng
- Department of Clinical Laboratory, Henan Provincial Third People’s Hospital, Zhengzhou, China
|
18
|
Jeong JC, Hands I, Kolesar JM, Rao M, Davis B, Dobyns Y, Hurt-Mueller J, Levens J, Gregory J, Williams J, Witt L, Kim EM, Burton C, Elbiheary AA, Chang M, Durbin EB. Local data commons: the sleeping beauty in the community of data commons. BMC Bioinformatics 2022; 23:386. [PMID: 36151511 PMCID: PMC9502580 DOI: 10.1186/s12859-022-04922-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/04/2022] [Accepted: 09/12/2022] [Indexed: 12/03/2022] Open
Abstract
Background Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs can collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face problems of data harmonization stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons exclusively focus on PDC and provide very little information on LDC. Results This article focuses on four important observations. First, there are three different types of LDC service models that are defined based on their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed, including cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines both divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. Conclusions Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. 
Indeed, many LDCs limit their functions to only conducting routine data storage and transmission tasks due to a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be the first small step in raising awareness among the LDCs of their expanded utility and in publicizing the importance of LDC to a wider audience.
Affiliation(s)
- Jong Cheol Jeong
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA; Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Isaac Hands
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Jill M Kolesar
- Department of Pharmacy Practice and Science, College of Pharmacy, University of Kentucky, Lexington, KY, USA
| | - Mahadev Rao
- Department of Pharmacy Practice, Center for Translational Research, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Bront Davis
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - York Dobyns
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Joseph Hurt-Mueller
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Justin Levens
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Jenny Gregory
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - John Williams
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Lisa Witt
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Eun Mi Kim
- Department of Computer Science, Eastern Kentucky University, Richmond, KY, USA
| | - Carlee Burton
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Amir A Elbiheary
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Mingguang Chang
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Eric B Durbin
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA; Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
|
19
|
Liu H, Xing K, Jiang Y, Liu Y, Wang C, Ding X. Using Machine Learning to Identify Biomarkers Affecting Fat Deposition in Pigs by Integrating Multisource Transcriptome Information. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2022; 70:10359-10370. [PMID: 35953074 PMCID: PMC9413214 DOI: 10.1021/acs.jafc.2c03339] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/16/2022] [Revised: 07/27/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Fat deposition in pigs is not only closely related to pig production efficiency and pork quality but also an ideal model for human obesity. Transcriptome sequencing is widely used to study fat deposition. However, due to small sample sizes, high false positive rates, and poor consistency of results from different studies, new strategies are urgently needed. Machine learning, a new analysis method, can effectively fit complex data and accurately identify samples and genes. In this study, 36 samples of adipose tissue, muscle tissue, and liver tissue were collected from Songliao black pigs and Landrace pigs, and the mRNA of all the samples was sequenced. In addition, we collected transcriptome data for 64 samples in the GEO database from four different sources. After standardization and imputation of missing values in the data set comprising 100 samples, traditional differential expression analysis was carried out, and different numbers of expressed genes were selected as features for the training model of eight machine learning methods. In the 1000 replications of fourfold cross validation with 100 samples, AdaBoost performed best, with an average prediction accuracy greater than 93% and the highest mean area under the curve in predicting the high- and low-fat content groups among the eight ML methods. According to their performance-based ranks inferred by AdaBoost, 12 genes related to fat deposition were identified; among them, FASN and APOD were specifically expressed in adipose tissue, and APOA1 was specifically expressed in the liver, which could be important candidate biomarkers affecting fat deposition.
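The evaluation scheme mentioned in this abstract, repeated fourfold cross-validation, can be sketched as follows. This is only an outline of the resampling loop under stated assumptions: the nearest-class-mean rule is a deliberately simple stand-in and is not AdaBoost, and the function names and the one-feature toy data are hypothetical.

```python
import random
from statistics import fmean


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_mean_predict(train_x, train_y, x):
    """Stand-in classifier (NOT AdaBoost): pick the class whose training mean is closest."""
    means = {
        label: fmean([v for v, y in zip(train_x, train_y) if y == label])
        for label in set(train_y)
    }
    return min(means, key=lambda label: abs(x - means[label]))


def repeated_cv_accuracy(xs, ys, k=4, repeats=10, seed=0):
    """Mean held-out accuracy over `repeats` independent k-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in kfold_indices(len(xs), k, rng):
            held_out = set(test_idx)
            train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
            train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
            correct = sum(
                nearest_mean_predict(train_x, train_y, xs[i]) == ys[i] for i in test_idx
            )
            scores.append(correct / len(test_idx))
    return fmean(scores)


# Hypothetical one-feature data: 0 = low-fat group, 1 = high-fat group.
xs = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(repeated_cv_accuracy(xs, ys, k=4, repeats=5))
```

Averaging over many reshuffled splits, as the study does with 1000 replications, reduces the dependence of the accuracy estimate on any single partition of a small sample.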
|
20
|
Sprang M, Andrade-Navarro MA, Fontaine JF. Batch effect detection and correction in RNA-seq data using machine-learning-based automated assessment of quality. BMC Bioinformatics 2022; 23:279. [PMID: 35836114 PMCID: PMC9284682 DOI: 10.1186/s12859-022-04775-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/01/2022] [Accepted: 06/08/2022] [Indexed: 11/26/2022] Open
Abstract
Background The constant evolution and development of next-generation sequencing techniques leads to high-throughput data composed of datasets that include a large number of biological samples. Although a large number of samples are usually experimentally processed in batches, scientific publications are often elusive about this information, which can greatly impact the quality of the samples and confound further statistical analyses. Because dedicated bioinformatics methods developed to detect unwanted sources of variance in the data can wrongly detect real biological signals, such methods could benefit from using a quality-aware approach. Results We recently developed statistical guidelines and a machine learning tool to automatically evaluate the quality of a next-generation-sequencing sample. We leveraged this quality assessment to detect and correct batch effects in 12 publicly available RNA-seq datasets with available batch information. We were able to distinguish batches by our quality score and used it to correct for some batch effects in sample clustering. Overall, the correction was evaluated as comparable to or better than the reference method that uses a priori knowledge of the batches (in 10 and 1 datasets of 12, respectively; total = 92%). When coupled to outlier removal, the correction was more often evaluated as better than the reference (comparable or better in 5 and 6 datasets of 12, respectively; total = 92%). Conclusions In this work, we show the capabilities of our software to detect batches in public RNA-seq datasets from differences in the predicted quality of their samples. We also use these insights to correct the batch effect and observe the relation of sample quality and batch effect. These observations reinforce our expectation that while batch effects do correlate with differences in quality, batch effects also arise from other artifacts and are more suitably corrected statistically in well-designed experiments. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04775-y.
Affiliation(s)
- Maximilian Sprang
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
| | - Jean-Fred Fontaine
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
| |
|
21
|
Young K, Lawlor RT, Ragulan C, Patil Y, Mafficini A, Bersani S, Antonello D, Mansfield D, Cingarlini S, Landoni L, Pea A, Luchini C, Piredda L, Kannan N, Nyamundanda G, Morganstein D, Chau I, Wiedenmann B, Milella M, Melcher A, Cunningham D, Starling N, Scarpa A, Sadanandam A. Immune landscape, evolution, hypoxia-mediated viral mimicry pathways and therapeutic potential in molecular subtypes of pancreatic neuroendocrine tumours. Gut 2021; 70:1904-1913. [PMID: 32883872 PMCID: PMC8458094 DOI: 10.1136/gutjnl-2020-321016] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/02/2020] [Revised: 08/11/2020] [Accepted: 08/12/2020] [Indexed: 12/12/2022]
Abstract
OBJECTIVE A comprehensive analysis of the immune landscape of pancreatic neuroendocrine tumours (PanNETs) was performed according to clinicopathological parameters and previously defined molecular subtypes to identify potential therapeutic vulnerabilities in this disease. DESIGN Differential expression analysis of 600 immune-related genes was performed on 207 PanNET samples, comprising a training cohort (n=72) and two validation cohorts (n=135) from multiple transcriptome profiling platforms. Different immune-related and subtype-related phenotypes, cell types and pathways were investigated using different in silico methods and were further validated using spatial multiplex immunofluorescence. RESULTS The study identified an immune signature of 132 genes segregating PanNETs (n=207) according to four previously defined molecular subtypes: metastasis-like primary (MLP)-1 and MLP-2, insulinoma-like and intermediate. The MLP-1 subtype (26%-31% samples across three cohorts) was strongly associated with elevated levels of immune-related genes, poor prognosis and a cascade of tumour evolutionary events: larger hypoxic and necroptotic tumours leading to increased damage-associated molecular patterns (viral mimicry), stimulator of interferon gene pathway, T cell-inflamed genes, immune checkpoint targets, and T cell-mediated and M1 macrophage-mediated immune escape mechanisms. Multiplex spatial profiling validated significantly increased macrophages in the MLP-1 subtype. CONCLUSION This study provides novel data on the immune microenvironment of PanNETs and identifies MLP-1 subtype as an immune-high phenotype featuring a broad and robust activation of immune-related genes. This study, with further refinement, paves the way for future precision immunotherapy studies in PanNETs to potentially select a subset of MLP-1 patients who may be more likely to respond.
Affiliation(s)
- Kate Young
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- Department of Medicine, Royal Marsden Hospital, London and Surrey, UK
| | - Rita T Lawlor
- ARC-Net Research Centre, University of Verona, Verona, Italy
| | - Chanthirika Ragulan
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- Centre for Molecular Pathology, Royal Marsden Hospital, London, UK
| | - Yatish Patil
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
| | - Andrea Mafficini
- ARC-Net Research Centre, University of Verona, Verona, Italy
- Department of Diagnostics and Public Health, University and Hospital Trust of Verona, Verona, Italy
| | - Samantha Bersani
- ARC-Net Research Centre, University of Verona, Verona, Italy
- Department of Diagnostics and Public Health, University and Hospital Trust of Verona, Verona, Italy
| | - Davide Antonello
- General and Pancreatic Surgery Department, Pancreas Institute, University and Hospital Trust of Verona, Verona, Italy
| | - David Mansfield
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | - Sara Cingarlini
- Department of Medicine, Medical Oncology, University and Hospital Trust of Verona, Verona, Italy
| | - Luca Landoni
- General and Pancreatic Surgery Department, Pancreas Institute, University and Hospital Trust of Verona, Verona, Italy
| | - Antonio Pea
- General and Pancreatic Surgery Department, Pancreas Institute, University and Hospital Trust of Verona, Verona, Italy
| | - Claudio Luchini
- ARC-Net Research Centre, University of Verona, Verona, Italy
- Department of Diagnostics and Public Health, University and Hospital Trust of Verona, Verona, Italy
| | - Liliana Piredda
- ARC-Net Research Centre, University of Verona, Verona, Italy
| | - Nagarajan Kannan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
| | - Gift Nyamundanda
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
| | | | - Ian Chau
- Department of Medicine, Royal Marsden Hospital, London and Surrey, UK
| | - Bertram Wiedenmann
- Institut für Pathologie, Charite, Campus Virchow-Klinikum, University Medicine, Berlin, Germany
| | - Michele Milella
- Department of Medicine, Medical Oncology, University and Hospital Trust of Verona, Verona, Italy
| | - Alan Melcher
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | - David Cunningham
- Department of Medicine, Royal Marsden Hospital, London and Surrey, UK
| | - Naureen Starling
- Department of Medicine, Royal Marsden Hospital, London and Surrey, UK
| | - Aldo Scarpa
- ARC-Net Research Centre, University of Verona, Verona, Italy
- Department of Diagnostics and Public Health, University and Hospital Trust of Verona, Verona, Italy
| | - Anguraj Sadanandam
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- Centre for Molecular Pathology, Royal Marsden Hospital, London, UK
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA
| |
|
22
|
Comparative evaluation of pathways and gene expression profile similarity in differentiated stem cells versus normal adult cells in seven human tissues. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
23
|
Wilkins A, Fontana E, Nyamundanda G, Ragulan C, Patil Y, Mansfield D, Kingston J, Errington-Mais F, Bottomley D, von Loga K, Bye H, Carter P, Tinkler-Hundal E, Noshirwani A, Downs J, Dillon M, Demaria S, Sebag-Montefiore D, Harrington K, West N, Melcher A, Sadanandam A. Differential and longitudinal immune gene patterns associated with reprogrammed microenvironment and viral mimicry in response to neoadjuvant radiotherapy in rectal cancer. J Immunother Cancer 2021; 9:e001717. [PMID: 33678606 PMCID: PMC7939016 DOI: 10.1136/jitc-2020-001717] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Accepted: 01/14/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Rectal cancers show a highly varied response to neoadjuvant radiotherapy/chemoradiation (RT/CRT) and the impact of the tumor immune microenvironment on this response is poorly understood. Current clinical tumor regression grading systems attempt to measure radiotherapy response but are subject to interobserver variation. An unbiased and unique histopathological quantification method (change in tumor cell density (ΔTCD)) may improve classification of RT/CRT response. Furthermore, immune gene expression profiling (GEP) may identify differences in expression levels of genes relevant to different radiotherapy responses: (1) at baseline between poor and good responders, and (2) longitudinally from preradiotherapy to postradiotherapy samples. Overall, this may inform novel therapeutic RT/CRT combination strategies in rectal cancer. METHODS We generated GEPs for 53 patients from biopsies taken prior to preoperative radiotherapy. TCD was used to assess rectal tumor response to neoadjuvant RT/CRT and ΔTCD was subjected to k-means clustering to classify patients into different response categories. Differential gene expression analysis was performed using statistical analysis of microarrays, pathway enrichment analysis and immune cell type analysis using single sample gene set enrichment analysis. Immunohistochemistry was performed to validate specific results. The results were validated using 220 pretreatment samples from publicly available datasets at metalevel of pathway and survival analyses. RESULTS ΔTCD scores ranged from 12.4% to -47.7% and stratified patients into three response categories. At baseline, 40 genes were significantly upregulated in poor (n=12) versus good responders (n=21), including myeloid and stromal cell genes. Of several pathways showing significant enrichment at baseline in poor responders, epithelial to mesenchymal transition, coagulation, complement activation and apical junction pathways were validated in external cohorts. 
Unlike poor responders, good responders showed longitudinal (preradiotherapy vs postradiotherapy samples) upregulation of 198 immune genes, reflecting an increased T-cell-inflamed GEP, type-I interferon and macrophage populations. Longitudinal pathway analysis suggested viral-like pathogen responses occurred in post-treatment resected samples compared with pretreatment biopsies in good responders. CONCLUSION This study suggests potentially druggable immune targets in poor responders at baseline and indicates that tumors with a good RT/CRT response reprogrammed from immune "cold" towards an immunologically "hot" phenotype on treatment with radiotherapy.
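The METHODS of this abstract classify patients by k-means clustering of ΔTCD. A minimal one-dimensional sketch of that idea follows, assuming plain Lloyd iterations with a deterministic start; the function name `kmeans_1d`, the initialization rule, and the sample values are hypothetical illustrations, not the authors' implementation or data.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalar values into k groups with Lloyd's algorithm.

    Hypothetical helper: the abstract only states that k-means was applied
    to the change in tumor cell density; every detail here is an assumption.
    """
    if len(values) < k:
        raise ValueError("need at least k values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(points, centers):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]

    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


# Made-up percentage changes in tumor cell density for nine patients.
example_delta_tcd = [10.0, 12.0, 8.0, -5.0, -7.0, -6.0, -40.0, -47.0, -45.0]
example_labels, example_centroids = kmeans_1d(example_delta_tcd, k=3)
print(example_labels, example_centroids)
```

Cluster indices are arbitrary, so response categories would be named by ordering the returned centroids rather than by reading the label numbers directly.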
Affiliation(s)
- Anna Wilkins
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
- The Francis Crick Institute, London, UK
| | - Elisa Fontana
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- Current Affiliation: Sarah Cannon Research Institute, London, UK
| | - Gift Nyamundanda
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
| | | | - Yatish Patil
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
| | - David Mansfield
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | - Jennifer Kingston
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
| | - Fiona Errington-Mais
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
| | - Daniel Bottomley
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
| | - Katharina von Loga
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- The Royal Marsden Hospital, London, UK
| | - Hannah Bye
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- The Royal Marsden Hospital, London, UK
| | - Paul Carter
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
- The Royal Marsden Hospital, London, UK
| | - Emma Tinkler-Hundal
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
| | - Amir Noshirwani
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
| | - Jessica Downs
- Division of Cancer Biology, Institute of Cancer Research, London, UK
| | - Magnus Dillon
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | | | | | - Kevin Harrington
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | - Nick West
- Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
| | - Alan Melcher
- Division of Radiotherapy and Imaging, Institute of Cancer Research, London, UK
| | - Anguraj Sadanandam
- Division of Molecular Pathology, Institute of Cancer Research, London, UK
| |
|
24
|
Hettegger P, Vierlinger K, Weinhaeusel A. Random rotation for identifying differentially expressed genes with linear models following batch effect correction. Bioinformatics 2021; 37:2142-2149. [PMID: 33523104 DOI: 10.1093/bioinformatics/btab063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2020] [Revised: 01/11/2021] [Accepted: 01/27/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Data generated from high-throughput technologies such as sequencing, microarray and bead-chip technologies are unavoidably affected by batch effects. Large effort has been put into developing methods for correcting these effects. Often, batch effect correction and hypothesis testing cannot be done with one single model, but are done successively with separate models in data analysis pipelines. This potentially leads to biased p-values or false discovery rates due to the influence of batch effect correction on the data. RESULTS We present a novel approach for estimating null distributions of test statistics in data analysis pipelines where batch effect correction is followed by linear model analysis. The approach is based on generating simulated datasets by random rotation and thereby retains the dependence structure of genes adequately. This allows estimating null distributions of dependent test statistics and thus the calculation of resampling based p-values and false discovery rates following batch effect correction while maintaining the alpha level. AVAILABILITY The described methods are implemented as randRotation package on Bioconductor: https://bioconductor.org/packages/randRotation/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter Hettegger
- Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology, Vienna, 1220, Austria
| | - Klemens Vierlinger
- Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology, Vienna, 1220, Austria
| | - Andreas Weinhaeusel
- Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology, Vienna, 1220, Austria
| |
|
25
|
Zhu T, Sun R, Zhang F, Chen GB, Yi X, Ruan G, Yuan C, Zhou S, Guo T. BatchServer: A Web Server for Batch Effect Evaluation, Visualization, and Correction. J Proteome Res 2020; 20:1079-1086. [PMID: 33338382 DOI: 10.1021/acs.jproteome.0c00488] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Batch effects are unwanted data variations that may obscure biological signals, leading to bias or errors in subsequent data analyses. Effective evaluation and elimination of batch effects are necessary for omics data analysis. In order to facilitate the evaluation and correction of batch effects, here we present BatchServer, an open-source, R/Shiny-based, user-friendly interactive graphical web platform for batch effects analysis. In BatchServer, we introduced autoComBat, a modified version of ComBat, which is the most widely adopted tool for batch effect correction. BatchServer uses PVCA (Principal Variance Component Analysis) and UMAP (Uniform Manifold Approximation and Projection) for evaluation and visualization of batch effects. We demonstrate its applications in multiple proteomics and transcriptomic data sets. BatchServer is provided at https://lifeinfor.shinyapps.io/batchserver/ as a web server. The source code is freely available at https://github.com/guomics-lab/batch_server.
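To make the notion of batch effect correction concrete, the sketch below removes a purely additive batch shift from one feature by per-batch mean centering. This is a deliberately simpler stand-in and is not ComBat or autoComBat, which use empirical Bayes estimates of batch parameters; the function name `center_by_batch` and the sample values are hypothetical.

```python
from collections import defaultdict


def center_by_batch(values, batches):
    """Remove an additive batch shift from one feature.

    Each batch is shifted so that its mean equals the overall mean.
    Simplified illustration only; it does not model batch-specific
    variance and does not protect biological covariates.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grand_mean = sum(values) / len(values)
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in grouped.items()}
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]


# Made-up intensities for one protein measured in two batches.
example_values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
example_batches = ["A", "A", "A", "B", "B", "B"]
print(center_by_batch(example_values, example_batches))
```

Centering of this kind is only safe when biological groups are balanced across batches; otherwise it can remove real signal along with the batch shift.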
Affiliation(s)
- Tiansheng Zhu
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China; Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| | - Rui Sun
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| | - Fangfei Zhang
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| | - Guo-Bo Chen
- Clinical Research Institute, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, 310014, Hangzhou, Zhejiang, China
| | - Xiao Yi
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| | - Guan Ruan
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| | - Chunhui Yuan
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| | - Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China
| | - Tiannan Guo
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 201203 Hangzhou, Zhejiang, China; Westlake Laboratory of Life Sciences and Biomedicine, 201203 Hangzhou, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 201203 Hangzhou, Zhejiang, China
| |
|
26
|
Pös Z, Pös O, Styk J, Mocova A, Strieskova L, Budis J, Kadasi L, Radvanszky J, Szemes T. Technical and Methodological Aspects of Cell-Free Nucleic Acids Analyzes. Int J Mol Sci 2020; 21:ijms21228634. [PMID: 33207777 PMCID: PMC7697251 DOI: 10.3390/ijms21228634] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/09/2020] [Revised: 11/12/2020] [Accepted: 11/13/2020] [Indexed: 02/07/2023] Open
Abstract
Analyses of cell-free nucleic acids (cfNAs) have shown huge potential in many biomedical applications, gradually entering several fields of research and everyday clinical care. Many biological properties of cfNAs can be informative to gain deeper insights into the function of the organism, such as their different types (DNA, RNAs) and subtypes (gDNA, mtDNA, bacterial DNA, miRNAs, etc.), forms (naked or vesicle-bound NAs), fragmentation profiles, sequence composition, epigenetic modifications, and many others. On the other hand, the workflows of their analyses comprise many important steps, from sample collection, storage and transportation, through extraction and laboratory analysis, up to bioinformatic analyses and statistical evaluations, where each of these steps has the potential to affect the outcome and informational value of the performed analyses. There are, however, to the best of our knowledge, no universal or standard protocols on how exactly to proceed when analyzing different cfNAs for different applications. We therefore decided to prepare an overview of the available literature and products commercialized for cfNAs processing, in an attempt to summarize the benefits and limitations of the currently available approaches, devices, consumables, and protocols, together with various factors influencing the workflow, its processes, and outcomes.
Affiliation(s)
- Zuzana Pös
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia; (Z.P.); (A.M.); (L.K.)
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia;
- Geneton Ltd., 841 04 Bratislava, Slovakia; (L.S.); (J.B.)
| | - Ondrej Pös
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia;
- Geneton Ltd., 841 04 Bratislava, Slovakia; (L.S.); (J.B.)
- Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia;
| | - Jakub Styk
- Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia;
- Faculty of Medicine, Institute of Medical Biology, Genetics and Clinical Genetics, 811 08 Bratislava, Slovakia
| | - Angelika Mocova
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia; (Z.P.); (A.M.); (L.K.)
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia;
| | | | - Jaroslav Budis
- Geneton Ltd., 841 04 Bratislava, Slovakia; (L.S.); (J.B.)
- Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia;
- Slovak Center of Scientific and Technical Information, 811 04 Bratislava, Slovakia
| | - Ludevit Kadasi
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia; (Z.P.); (A.M.); (L.K.)
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia;
| | - Jan Radvanszky
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia; (Z.P.); (A.M.); (L.K.)
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia;
- Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia;
- Correspondence: (J.R.); (T.S.); Tel.: +421-2-60296637 (J.R.); +421-2-9026-8807 (T.S.)
| | - Tomas Szemes
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia;
- Geneton Ltd., 841 04 Bratislava, Slovakia; (L.S.); (J.B.)
- Comenius University Science Park, Comenius University, 841 04 Bratislava, Slovakia;
- Correspondence: (J.R.); (T.S.); Tel.: +421-2-60296637 (J.R.); +421-2-9026-8807 (T.S.)
| |
|
27
|
Li L, Chang L, Zhang X, Ning Z, Mayne J, Ye Y, Stintzi A, Liu J, Figeys D. Berberine and its structural analogs have differing effects on functional profiles of individual gut microbiomes. Gut Microbes 2020; 11:1348-1361. [PMID: 32372706 PMCID: PMC7524264 DOI: 10.1080/19490976.2020.1755413] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 02/03/2023] Open
Abstract
The understanding of the effects of compounds on the gut microbiome is limited. In particular, it is unclear whether structurally similar compounds would have similar or distinct effects on the gut microbiome. Here, we selected berberine (BBR), an isoquinoline quaternary alkaloid, and 16 structural analogs and evaluated their effects on seven individual gut microbiomes cultured in vitro. The responses of the individual microbiomes were evaluated by metaproteomic profiles and by assessing butyrate production. We show that both interindividual differences and compound treatments significantly contributed to the variance of metaproteomic profiles. BBR and eight analogs led to changes in proteins involved in microbial defense and stress responses and enrichment of proteins from Verrucomicrobia, Proteobacteria, and Bacteroidetes phyla. They also led to a decrease in proteins from the Firmicutes phylum and its Clostridiales order, which correlated with a decrease in proteins involved in the butyrate production pathway and in butyrate concentration. Three of the compounds, sanguinarine, chelerythrine, and ethoxysanguinarine, activated bacterial protective mechanisms, enriched Proteobacteria, increased opacity proteins, and markedly reduced butyrate production. Dihydroberberine had a similar function to BBR in enriching the Akkermansia genus. In addition, it showed fewer overall adverse impacts on the functionality of the gut microbiome, including a better maintenance of the butyrate level. Our study shows that an ex vivo microbiome assay can assess differential regulating effects of compounds with subtle differences and reveals that compound analogs can have distinct effects on the microbiome.
Affiliation(s)
- Leyuan Li
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada
| | - Lu Chang
- Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Xu Zhang
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada
| | - Zhibin Ning
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada
| | - Janice Mayne
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada
| | - Yang Ye
- State Key Laboratory of Drug Research & Natural Products Chemistry Department, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China; Shanghai Institute of Materia Medica, University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, Shanghai, China
| | - Alain Stintzi
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada; Shanghai Institute of Materia Medica, University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, Shanghai, China
| | - Jia Liu
- Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China; Shanghai Institute of Materia Medica, University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, Shanghai, China
| | - Daniel Figeys
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada; Shanghai Institute of Materia Medica, University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, Shanghai, China; Canadian Institute for Advanced Research, Toronto, Canada
| |
|
28
|
Joshi SR, Jagtap S, Basu B, Deobagkar DD, Ghosh P. Construction, analysis and validation of co-expression network to understand stress adaptation in Deinococcus radiodurans R1. PLoS One 2020; 15:e0234721. [PMID: 32579573 PMCID: PMC7314050 DOI: 10.1371/journal.pone.0234721] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/04/2019] [Accepted: 06/02/2020] [Indexed: 01/12/2023] Open
Abstract
Systems biology-based approaches have been effectively utilized to mine high-throughput data. In the current study, we have performed system-level analysis for Deinococcus radiodurans R1 by constructing a gene co-expression network based on several microarray datasets available in the public domain. This condition-independent network was constructed by Weighted Gene Co-expression Network Analysis (WGCNA) with 61 microarray samples from 9 different experimental conditions. We identified 13 co-expressed modules, of which 11 showed functional enrichment of one or more pathways or biological processes. Comparative analysis of differentially expressed genes and proteins from radiation and desiccation stress studies with our co-expressed modules revealed the association of the cyan module with radiation response. Interestingly, two modules, viz. darkgreen and tan, were associated with radiation as well as desiccation stress responses. The functional analysis of these modules showed enrichment of pathways important for adaptation to radiation or desiccation stress. To decipher the regulatory roles of these stress-responsive modules, we identified transcription factors (TFs) and then calculated a biweight midcorrelation between each module's hub gene and the identified TFs. We obtained 7 TFs for radiation- and desiccation-responsive modules. The expression of 3 TFs was validated in response to gamma radiation using qRT-PCR. Along with the TFs, selected close neighbor genes of two important TFs, viz., DR_0997 (CRP) and DR_2287 (AsnC family transcriptional regulator) in the darkgreen module were also validated. In our network, among 13 hub genes associated with 13 modules, the functionality of 5 hub genes which are annotated as hypothetical proteins (hypothetical hub genes) in the D. radiodurans genome has been revealed. Overall, the study provided better insight into pathways and regulators associated with relevant DNA-damaging stress response in D. radiodurans.
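The correlation measure named in this abstract, the biweight midcorrelation, can be sketched in a few lines. The version below follows the standard definition (median-centered deviations, Tukey biweight weights scaled by 9 times the median absolute deviation); the function names and the expression values are hypothetical, and this is not the WGCNA code the authors used.

```python
from math import sqrt
from statistics import median


def _biweight_terms(values):
    """Return median-centered, biweight-weighted deviations scaled to unit norm."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("median absolute deviation is zero; bicor is undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Observations far from the median (|u| >= 1) receive zero weight.
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * weight)
    norm = sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))


# Made-up expression profiles for a module hub gene and a candidate TF.
hub_gene = [1.0, 2.0, 3.0, 4.0, 50.0]
candidate_tf = [2.1, 3.9, 6.2, 8.0, 9.5]
print(bicor(hub_gene, candidate_tf))
```

Because the extreme value 50.0 receives zero weight, the result depends on the bulk of the samples rather than on a single outlier, which is the usual reason for preferring this measure over the Pearson correlation in co-expression analysis.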
Affiliation(s)
- Suraj R. Joshi
- Bioinformatics Centre, Savitribai Phule Pune University, Pune, India
- Molecular Biology Research Laboratory, Department of Zoology, Savitribai Phule Pune University, Pune, India
- Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai, India
| | - Surabhi Jagtap
- Bioinformatics Centre, Savitribai Phule Pune University, Pune, India
| | - Bhakti Basu
- Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai, India
| | - Deepti D. Deobagkar
- Molecular Biology Research Laboratory, Department of Zoology, Savitribai Phule Pune University, Pune, India
| | - Payel Ghosh
- Bioinformatics Centre, Savitribai Phule Pune University, Pune, India
| |
|
29
|
Li L, Ning Z, Zhang X, Mayne J, Cheng K, Stintzi A, Figeys D. RapidAIM: a culture- and metaproteomics-based Rapid Assay of Individual Microbiome responses to drugs. Microbiome 2020; 8:33. [PMID: 32160905 PMCID: PMC7066843 DOI: 10.1186/s40168-020-00806-z] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 06/05/2019] [Accepted: 02/12/2020] [Indexed: 05/20/2023]
Abstract
BACKGROUND Human-targeted drugs may exert off-target effects or can be repurposed to modulate the gut microbiota. However, our understanding of such effects is limited due to a lack of rapid and scalable assay to comprehensively assess microbiome responses to drugs. Drugs and other compounds can drastically change the overall abundance, taxonomic composition, and functions of a gut microbiome. RESULTS Here, we developed an approach to screen compounds against individual microbiomes in vitro, using metaproteomics to both measure absolute bacterial abundances and to functionally profile the microbiome. Our approach was evaluated by testing 43 compounds (including 4 antibiotics) against 5 individual microbiomes. The method generated technically highly reproducible readouts, including changes of overall microbiome abundance, microbiome composition, and functional pathways. Results show that besides the antibiotics, the compounds berberine and ibuprofen inhibited the accumulation of biomass during in vitro growth of the microbiota. By comparing genus and species level-biomass contributions, selective antibacterial-like activities were found with 35 of the 39 non-antibiotic compounds. Seven of the compounds led to a global alteration of the metaproteome, with apparent compound-specific patterns of functional responses. The taxonomic distributions of altered proteins varied among drugs, i.e., different drugs affect functions of different members of the microbiome. We also showed that bacterial function can shift in response to drugs without a change in the abundance of the bacteria. CONCLUSIONS Current drug-microbiome interaction studies largely focus on relative microbiome composition and microbial drug metabolism. In contrast, our workflow enables multiple insights into microbiome absolute abundance and functional responses to drugs. The workflow is robust, reproducible, and quantitative and is scalable for personalized high-throughput drug screening applications.
Affiliation(s)
- Leyuan Li
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Zhibin Ning
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Xu Zhang
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Janice Mayne
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Kai Cheng
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Alain Stintzi
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada.
| | - Daniel Figeys
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Canada.
- Canadian Institute for Advanced Research, Toronto, Canada.
| |
|
30
|
Ugidos M, Tarazona S, Prats-Montalbán JM, Ferrer A, Conesa A. MultiBaC: A strategy to remove batch effects between different omic data types. Stat Methods Med Res 2020; 29:2851-2864. [PMID: 32131696 DOI: 10.1177/0962280220907365] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022]
Abstract
The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford projects in which many different omic data types are generated, at least at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or moments in time are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform (i.e., gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects.
Affiliation(s)
- Manuel Ugidos
- Gene expression and RNA Metabolism Laboratory, Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (CSIC), Valencia, Spain
| | - Sonia Tarazona
- Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - José M Prats-Montalbán
- Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Alberto Ferrer
- Multivariate Statistical Engineering Group, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
| | - Ana Conesa
- Microbiology and Cell Science Department, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, USA
| |
|
31
|
Schmidt F, List M, Cukuroglu E, Köhler S, Göke J, Schulz MH. An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets. Bioinformatics 2019; 34:i908-i916. [PMID: 30423059 PMCID: PMC6129283 DOI: 10.1093/bioinformatics/bty553] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/04/2022] Open
Abstract
Motivation International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenetics Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, utilizing all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study whether batch effects across datasets can be observed and adjusted for, we analyze RNA-seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non-trivial to determine whether batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers. Results We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish whether batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state-of-the-art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here. Availability and implementation Our method is available online at https://github.com/SchulzLab/OntologyEval. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florian Schmidt
- Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.,Cluster of Excellence MMCI, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany.,Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany.,Genome Institute of Singapore, Computational Genomics and Transcriptomics, Singapore
| | - Markus List
- Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.,Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Engin Cukuroglu
- Genome Institute of Singapore, Computational Genomics and Transcriptomics, Singapore
| | | | - Jonathan Göke
- Genome Institute of Singapore, Computational Genomics and Transcriptomics, Singapore
| | - Marcel H Schulz
- Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.,Cluster of Excellence MMCI, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany.,Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main, Germany.,German Center for Cardiovascular Research, Partner Site Rhein-Main, Frankfurt am Main, Germany
| |
|
32
|
Somekh J, Shen-Orr SS, Kohane IS. Batch correction evaluation framework using a-priori gene-gene associations: applied to the GTEx dataset. BMC Bioinformatics 2019; 20:268. [PMID: 31138121 PMCID: PMC6537327 DOI: 10.1186/s12859-019-2855-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/10/2019] [Accepted: 04/26/2019] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Correcting a heterogeneous dataset that presents artefacts from several confounders is often an essential bioinformatics task. Attempting to remove these batch effects will result in some biologically meaningful signals being lost. Thus, a central challenge is assessing whether the removal of unwanted technical variation harms the biological signal that is of interest to the researcher. RESULTS We describe a novel framework, B-CeF, to evaluate the effectiveness of batch correction methods and their tendency toward over- or under-correction. The approach is based on comparing co-expression of adjusted gene-gene pairs to a-priori knowledge of highly confident gene-gene associations based on thousands of unrelated experiments derived from an external reference. Our framework includes three steps: (1) data adjustment with the desired methods; (2) calculating gene-gene co-expression measurements for adjusted datasets; and (3) evaluating the performance of the co-expression measurements against a gold standard. Using the framework, we evaluated five batch correction methods applied to RNA-seq data of six representative tissue datasets derived from the GTEx project. CONCLUSIONS Our framework enables the evaluation of batch correction methods to better preserve the original biological signal. We show that using a multiple linear regression model to correct for known confounders outperforms factor analysis-based methods that estimate hidden confounders. The code is publicly available as an R package.
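The three steps named in this abstract can be illustrated with a small Python sketch. It is not the B-CeF R package and makes several assumptions of its own: adjustment is done by ordinary least squares on known confounders, co-expression is the Pearson correlation of the residuals, and agreement with the gold standard is summarized as a rank-based AUC. All function names are hypothetical.

```python
import numpy as np

def regress_out(expr, covariates):
    """Step 1: remove known confounders (samples x k) from expr (samples x genes)."""
    X = np.column_stack([np.ones(len(covariates)), covariates])  # add intercept
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta  # residuals keep the variation not explained by X

def coexpression(expr):
    """Step 2: gene-gene Pearson correlation matrix of the adjusted data."""
    return np.corrcoef(expr, rowvar=False)

def auc_against_gold(corr, gold_pairs):
    """Step 3: chance that a gold-standard pair has higher |r| than another pair."""
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    scores = np.abs(corr[rows, cols])
    gold = {tuple(sorted(p)) for p in gold_pairs}
    labels = np.array([(int(i), int(j)) in gold for i, j in zip(rows, cols)])
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

Comparing the AUC obtained after different adjustment methods gives a single number per method; a method that over-corrects would be expected to lower the score by erasing true co-expression along with the confounder.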
Affiliation(s)
- Judith Somekh
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Faculty of Medicine, Technion – Israel Institute of Technology, Haifa, Israel
- Department of Information Systems, University of Haifa, Haifa, Israel
| | - Shai S Shen-Orr
- Faculty of Medicine, Technion – Israel Institute of Technology, Haifa, Israel
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
| |
|
33
|
Suprun M, Suárez-Fariñas M. PlateDesigner: a web-based application for the design of microplate experiments. Bioinformatics 2019; 35:1605-1607. [PMID: 30304481 PMCID: PMC6821189 DOI: 10.1093/bioinformatics/bty853] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/24/2018] [Revised: 09/09/2018] [Accepted: 10/08/2018] [Indexed: 11/12/2022] Open
Abstract
SUMMARY In biological assays, systematic variability, known as a batch effect, can often confound the effects of true biological conditions and has been well documented for a variety of high-throughput technologies. In microplate-based multiplex experiments, such as Luminex or OLINK assays, researchers need to consider both position and plate effects. Those effects can be easily accounted for if the experiments are properly designed, which includes randomization of the samples across multiple experimental runs. However, doing the ad hoc randomization becomes challenging when handling multiple samples. PlateDesigner is the first web-based application that provides randomization for microplate experiments, ensuring that the main principles of the experimental design, such as grouping samples from the same biological units and balancing the distribution of experimental conditions, are applied. Creating randomizations with PlateDesigner is simple and the results can be exported in a variety of formats, and easily integrated with microplate readers and statistical analysis software. AVAILABILITY AND IMPLEMENTATION PlateDesigner is written in R/Shiny and is hosted online by the Center of Biostatistics at the Icahn School of Medicine at Mount Sinai. This application is freely available at platedesigner.net.
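The two design principles mentioned in this abstract, keeping samples from the same biological unit together and balancing conditions across plates, can be sketched in a few lines of Python. This is an illustrative assumption about how such a randomization might work, not PlateDesigner's actual algorithm; the function name and the input format are hypothetical, each unit is assumed to belong to one condition, and plate capacity is not checked.

```python
import random
from collections import defaultdict

def randomize_plates(samples, n_plates, seed=0):
    """samples: list of (sample_id, unit_id, condition); returns one list per plate."""
    rng = random.Random(seed)  # fixed seed makes the layout reproducible
    units = defaultdict(list)
    unit_condition = {}
    for sample_id, unit_id, condition in samples:
        units[unit_id].append(sample_id)
        unit_condition[unit_id] = condition

    by_condition = defaultdict(list)
    for unit_id, condition in unit_condition.items():
        by_condition[condition].append(unit_id)

    plates = [[] for _ in range(n_plates)]
    next_plate = 0
    for condition in sorted(by_condition):
        unit_ids = by_condition[condition]
        rng.shuffle(unit_ids)  # random order of units within a condition
        for unit_id in unit_ids:
            # Deal whole units round-robin so each condition is spread evenly.
            plates[next_plate % n_plates].extend(units[unit_id])
            next_plate += 1

    for plate in plates:
        rng.shuffle(plate)  # randomize well order within each plate
    return plates
```

Dealing whole units rather than single samples is what keeps repeated measurements of one subject on the same plate, so plate effects do not get confounded with within-subject comparisons.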
Affiliation(s)
- Maria Suprun
- Department of Pediatrics, Allergy and Immunology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Mayte Suárez-Fariñas
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
34
|
DEBrowser: interactive differential expression analysis and visualization tool for count data. BMC Genomics 2019; 20:6. [PMID: 30611200 PMCID: PMC6321710 DOI: 10.1186/s12864-018-5362-x] [Citation(s) in RCA: 143] [Impact Index Per Article: 28.6] [Received: 08/24/2018] [Accepted: 12/11/2018] [Indexed: 01/09/2023] Open
Abstract
Background Sequencing data has become a standard measure of diverse cellular activities. For example, gene expression is accurately measured by RNA sequencing (RNA-Seq) libraries, protein-DNA interactions are captured by chromatin immunoprecipitation sequencing (ChIP-Seq), protein-RNA interactions by crosslinking immunoprecipitation sequencing (CLIP-Seq) or RNA immunoprecipitation sequencing (RIP-Seq), and DNA accessibility by assay for transposase-accessible chromatin (ATAC-Seq), DNase or MNase sequencing libraries. The processing of these sequencing techniques involves library-specific approaches. However, in all cases, once the sequencing libraries are processed, the result is a count table specifying the estimated number of reads originating from each genomic locus. Differential analysis to determine which loci have different cellular activity under different conditions starts with the count table and iterates through a cycle of data assessment, preparation and analysis. Such complex analysis often relies on multiple programs and is therefore a challenge for those without programming skills. Results We developed DEBrowser as an R Bioconductor project to interactively visualize every step of the differential analysis, without programming. The application provides a rich and interactive web-based graphical user interface built on R’s shiny infrastructure. DEBrowser allows users to visualize data with various types of graphs that can be explored further by selecting and re-plotting any desired subset of data. Using the visualization approaches provided, users can identify and correct technical variations such as batch effects and sequencing depth that affect differential analysis. We show DEBrowser’s ease of use by reproducing the analysis of two previously published data sets. Conclusions DEBrowser is a flexible, intuitive, web-based analysis platform that enables an iterative and interactive analysis of count data without any requirement of programming knowledge.
Electronic supplementary material The online version of this article (10.1186/s12864-018-5362-x) contains supplementary material, which is available to authorized users.
|
35
|
Jardillier R, Chatelain F, Guyon L. Bioinformatics Methods to Select Prognostic Biomarker Genes from Large Scale Datasets: A Review. Biotechnol J 2018; 13:e1800103. [DOI: 10.1002/biot.201800103] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/18/2018] [Revised: 10/15/2018] [Indexed: 12/28/2022]
Affiliation(s)
- Rémy Jardillier
- University Grenoble Alpes, CEA, INSERM, Biology of Cancer Infection UMR_S 1036, 38000 Grenoble, France
- University Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France
| | - Florent Chatelain
- University Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France
| | - Laurent Guyon
- University Grenoble Alpes, CEA, INSERM, Biology of Cancer Infection UMR_S 1036, 38000 Grenoble, France
| |
|