1
|
Yu Y, Mai Y, Zheng Y, Shi L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol 2024; 25:254. [PMID: 39363244; PMCID: PMC11447944; DOI: 10.1186/s13059-024-03401-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/23/2024] [Indexed: 10/05/2024] Open
Abstract
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Cancer Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
| |
|
2
|
Hoang N, Sardaripour N, Ramey GD, Schilling K, Liao E, Chen Y, Park JH, Bledsoe X, Landman BA, Gamazon ER, Benton ML, Capra JA, Rubinov M. Integration of estimated regional gene expression with neuroimaging and clinical phenotypes at biobank scale. PLoS Biol 2024; 22:e3002782. [PMID: 39269986; PMCID: PMC11424006; DOI: 10.1371/journal.pbio.3002782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 09/25/2024] [Accepted: 08/01/2024] [Indexed: 09/15/2024] Open
Abstract
An understanding of human brain individuality requires the integration of data on brain organization across people and brain regions, molecular and systems scales, as well as healthy and clinical states. Here, we help advance this understanding by leveraging methods from computational genomics to integrate large-scale genomic, transcriptomic, neuroimaging, and electronic-health record data sets. We estimated genetically regulated gene expression (gr-expression) of 18,647 genes, across 10 cortical and subcortical regions of 45,549 people from the UK Biobank. First, we showed that patterns of estimated gr-expression reflect known genetic-ancestry relationships, regional identities, as well as inter-regional correlation structure of directly assayed gene expression. Second, we performed transcriptome-wide association studies (TWAS) to discover 1,065 associations between individual variation in gr-expression and gray-matter volumes across people and brain regions. We benchmarked these associations against results from genome-wide association studies (GWAS) of the same sample and found hundreds of novel associations relative to these GWAS. Third, we integrated our results with clinical associations of gr-expression from the Vanderbilt Biobank. This integration allowed us to link genes, via gr-expression, to neuroimaging and clinical phenotypes. Fourth, we identified associations of polygenic gr-expression with structural and functional MRI phenotypes in the Human Connectome Project (HCP), a small neuroimaging-genomic data set with high-quality functional imaging data. Finally, we showed that estimates of gr-expression and magnitudes of TWAS were generally replicable and that the p-values of TWAS were replicable in large samples. Collectively, our results provide a powerful new resource for integrating gr-expression with population genetics of brain organization and disease.
Affiliation(s)
- Nhung Hoang
- Department of Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Neda Sardaripour
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Grace D. Ramey
- Biological and Medical Informatics Division, University of California, San Francisco, California, United States of America
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
| | - Kurt Schilling
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Emily Liao
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Yiting Chen
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Jee Hyun Park
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Xavier Bledsoe
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Bennett A. Landman
- Department of Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Eric R. Gamazon
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Mary Lauren Benton
- Department of Computer Science, Baylor University, Waco, Texas, United States of America
| | - John A. Capra
- Department of Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, United States of America
| | - Mikail Rubinov
- Department of Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States of America
- Howard Hughes Medical Institute Janelia Research Campus, Ashburn, Virginia, United States of America
| |
|
3
|
Takahashi M, Chong HB, Zhang S, Yang TY, Lazarov MJ, Harry S, Maynard M, Hilbert B, White RD, Murrey HE, Tsou CC, Vordermark K, Assaad J, Gohar M, Dürr BR, Richter M, Patel H, Kryukov G, Brooijmans N, Alghali ASO, Rubio K, Villanueva A, Zhang J, Ge M, Makram F, Griesshaber H, Harrison D, Koglin AS, Ojeda S, Karakyriakou B, Healy A, Popoola G, Rachmin I, Khandelwal N, Neil JR, Tien PC, Chen N, Hosp T, van den Ouweland S, Hara T, Bussema L, Dong R, Shi L, Rasmussen MQ, Domingues AC, Lawless A, Fang J, Yoda S, Nguyen LP, Reeves SM, Wakefield FN, Acker A, Clark SE, Dubash T, Kastanos J, Oh E, Fisher DE, Maheswaran S, Haber DA, Boland GM, Sade-Feldman M, Jenkins RW, Hata AN, Bardeesy NM, Suvà ML, Martin BR, Liau BB, Ott CJ, Rivera MN, Lawrence MS, Bar-Peled L. DrugMap: A quantitative pan-cancer analysis of cysteine ligandability. Cell 2024; 187:2536-2556.e30. [PMID: 38653237; PMCID: PMC11143475; DOI: 10.1016/j.cell.2024.03.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 01/15/2024] [Accepted: 03/19/2024] [Indexed: 04/25/2024]
Abstract
Cysteine-focused chemical proteomic platforms have accelerated the clinical development of covalent inhibitors for a wide range of targets in cancer. However, how different oncogenic contexts influence cysteine targeting remains unknown. To address this question, we have developed "DrugMap," an atlas of cysteine ligandability compiled across 416 cancer cell lines. We unexpectedly find that cysteine ligandability varies across cancer cell lines, and we attribute this to differences in cellular redox states, protein conformational changes, and genetic mutations. Leveraging these findings, we identify actionable cysteines in NF-κB1 and SOX10 and develop corresponding covalent ligands that block the activity of these transcription factors. We demonstrate that the NF-κB1 probe blocks DNA binding, whereas the SOX10 ligand increases SOX10-SOX10 interactions and disrupts melanoma transcriptional signaling. Our findings reveal heterogeneity in cysteine ligandability across cancers, pinpoint cell-intrinsic features driving cysteine targeting, and illustrate the use of covalent probes to disrupt oncogenic transcription-factor activity.
Affiliation(s)
- Mariko Takahashi
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA.
| | - Harrison B Chong
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Siwen Zhang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Tzu-Yi Yang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Matthew J Lazarov
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Stefan Harry
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
| | | | | | | | | | | | - Kira Vordermark
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Jonathan Assaad
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Magdy Gohar
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Benedikt R Dürr
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Marianne Richter
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Himani Patel
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | | | | | | | - Karla Rubio
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Antonio Villanueva
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Junbing Zhang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Maolin Ge
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Farah Makram
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Hanna Griesshaber
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Drew Harrison
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Ann-Sophie Koglin
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Samuel Ojeda
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Barbara Karakyriakou
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Alexander Healy
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - George Popoola
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Inbal Rachmin
- Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Neha Khandelwal
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | | | - Pei-Chieh Tien
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Nicholas Chen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
| | - Tobias Hosp
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Sanne van den Ouweland
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Toshiro Hara
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lillian Bussema
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Rui Dong
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lei Shi
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Martin Q Rasmussen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Ana Carolina Domingues
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Aleigha Lawless
- Department of Surgery, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Jacy Fang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Satoshi Yoda
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Linh Phuong Nguyen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Sarah Marie Reeves
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Farrah Nicole Wakefield
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Adam Acker
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Sarah Elizabeth Clark
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Taronish Dubash
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - John Kastanos
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Eugene Oh
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - David E Fisher
- Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Shyamala Maheswaran
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Daniel A Haber
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| | - Genevieve M Boland
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Surgery, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Surgery, Harvard Medical School, Boston, MA 02114, USA
| | - Moshe Sade-Feldman
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Russell W Jenkins
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Aaron N Hata
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Nabeel M Bardeesy
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Mario L Suvà
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
| | | | - Brian B Liau
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
| | - Christopher J Ott
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Miguel N Rivera
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
| | - Michael S Lawrence
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA.
| | - Liron Bar-Peled
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA.
| |
|
4
|
Ma C, Zhang Y, Ding R, Chen H, Wu X, Xu L, Yu C. In search of the ratio of miRNA expression as robust biomarkers for constructing stable diagnostic models among multi-center data. Front Genet 2024; 15:1381917. [PMID: 38746057; PMCID: PMC11091382; DOI: 10.3389/fgene.2024.1381917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024] Open
Abstract
MicroRNAs (miRNAs) are promising biomarkers for the early detection of disease, and many miRNA-based diagnostic models have been constructed to distinguish patients from healthy individuals. To make full use of miRNA-profiling data from different sequencing platforms or multiple centers, models that account for batch effects are needed so that they generalize to medical applications. We conducted transcription factor (TF)-mediated miRNA-miRNA interaction network analysis and adopted the within-sample expression ratios of miRNA pairs as predictive markers. The ratio of the expression values between each miRNA pair turned out to be stable across multiple data sources. A genetic algorithm-based classifier was constructed to quantify risk scores of the probability of disease and to discriminate disease states from normal states in discovery and validation datasets for COVID-19, renal cell carcinoma, and lung adenocarcinoma. The predictive models based on the expression ratio of interacting miRNA pairs performed well in the discovery and validation datasets, and the classifier may be used accurately for the early detection of disease.
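One intuition for why within-sample ratios can be stable across data sources is that a scaling factor shared by all miRNAs in a sample cancels out of every pairwise ratio. The sketch below illustrates only that arithmetic. It assumes a purely multiplicative, sample-wide batch effect, which real batch effects need not follow, and it does not reproduce the TF-mediated network analysis or the genetic algorithm-based classifier described in the abstract. The miRNA names, expression values, and the `pairwise_log_ratios` helper are hypothetical.

```python
import math


def pairwise_log_ratios(sample):
    """Return log2(expr_a / expr_b) for every miRNA pair within one sample."""
    names = sorted(sample)
    return {
        (a, b): math.log2(sample[a] / sample[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical expression values for one sample (arbitrary units).
sample = {"miR-A": 120.0, "miR-B": 30.0, "miR-C": 60.0}

# Assumption: the batch effect multiplies every miRNA in the sample
# by the same factor (for example, a platform-specific scale).
batch_scale = 3.7
shifted = {name: value * batch_scale for name, value in sample.items()}

original = pairwise_log_ratios(sample)
after_batch = pairwise_log_ratios(shifted)

# The shared factor cancels in each ratio, so the features barely move.
max_diff = max(abs(original[pair] - after_batch[pair]) for pair in original)
print(f"largest change in any log-ratio: {max_diff:.2e}")
```

If the batch effect differs from one miRNA to another, the factor no longer cancels, so ratio features would reduce but not necessarily remove the technical variation.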
Affiliation(s)
- Cuidie Ma
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Yonghao Zhang
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Rui Ding
- State Key Laboratory of Complex Severe and Rare Diseases, Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
| | - Han Chen
- Shenyang Medical College, Shenyang, China
| | - Xudong Wu
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Lida Xu
- Beijing Hotgen Biotech Co., Ltd., Beijing, China
| | - Changyuan Yu
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| |
|
5
|
Wu P, Li D, Zhang C, Dai B, Tang X, Liu J, Wu Y, Wang X, Shen A, Zhao J, Zi X, Li R, Sun N, He J. A unique circulating microRNA pairs signature serves as a superior tool for early diagnosis of pan-cancer. Cancer Lett 2024; 588:216655. [PMID: 38460724; DOI: 10.1016/j.canlet.2024.216655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 11/18/2023] [Accepted: 01/16/2024] [Indexed: 03/11/2024]
Abstract
Cancer remains a major burden globally and the critical role of early diagnosis is self-evident. Although various miRNA-based signatures have been developed in past decades, clinical utilization is limited due to the lack of a precise cutoff value. Here, we developed a signature based on pairwise expression of miRNAs (miRPs) for pan-cancer diagnosis using a machine learning approach. We analyzed the miRNA spectrum of 15,832 patients, who were divided into training, validation, test, and external test sets, with 13 different cancers from 10 cohorts. Five different machine-learning (ML) algorithms (XGBoost, SVM, RandomForest, LASSO, and Logistic) were adopted for signature construction. The best ML algorithm and the optimal number of miRPs included were identified using the area under the curve (AUC) and the Youden index in the validation set. The AUC of the best model was compared to those of 25 previously published signatures. Overall, a Random Forest model including 31 miRPs (31-miRP) was developed, proving highly efficient in cancer diagnosis across different datasets and cancer types (AUC range: 0.980-1.000). Regarding diagnosis of early-stage cancers, 31-miRP also exhibited high capacity, with AUC ranging from 0.961 to 0.998. Moreover, 31-miRP exhibited advantages in differentiating cancers from normal tissues (AUC range: 0.976-0.998) as well as in differentiating cancers from corresponding benign lesions. Encouragingly, compared to the 25 previously published signatures, 31-miRP also demonstrated clear advantages. In conclusion, 31-miRP acts as a powerful model for cancer diagnosis, characterized by high specificity and sensitivity as well as a clear cutoff value, thereby holding potential as a reliable tool for early-stage cancer diagnosis.
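The abstract names the Youden index as one criterion for model selection and emphasizes a clear cutoff value. As a minimal sketch of how such a cutoff is commonly derived, the code below scans candidate thresholds and keeps the one that maximizes J = sensitivity + specificity - 1. The scores, labels, and the `youden_cutoff` helper are hypothetical and are not taken from the study. The authors' actual procedure is not described in the abstract.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= the threshold.
    Labels are 1 for cases and 0 for controls.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # on ties, keep the lowest threshold seen so far
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical model scores for a small validation set.
scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.90, 0.70]
labels = [0, 0, 1, 1, 0, 1, 1]

cutoff, j = youden_cutoff(scores, labels)
print(f"cutoff={cutoff}, Youden J={j:.3f}")
```

Selecting the cutoff on a validation set and then reporting performance on separate test sets, as the abstract describes, avoids tuning the threshold on the data used for the final evaluation.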
Affiliation(s)
- Peng Wu
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Dongyu Li
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China; 4+4 Medical Doctor Program, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Chaoqi Zhang
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Bing Dai
- School of Software, Tsinghua University, Beijing, 100084, China
| | - Xiaoya Tang
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Jingjing Liu
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Yue Wu
- Department of Clinical Laboratory, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Xingwu Wang
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Ao Shen
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Jiapeng Zhao
- 4+4 Medical Doctor Program, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Xiaohui Zi
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Ruirui Li
- Department of Pathology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Nan Sun
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
| | - Jie He
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
| |
|
6
|
Khoa LTP, Yang W, Shan M, Zhang L, Mao F, Zhou B, Li Q, Malcore R, Harris C, Zhao L, Rao RC, Iwase S, Kalantry S, Bielas SL, Lyssiotis CA, Dou Y. Quiescence enables unrestricted cell fate in naive embryonic stem cells. Nat Commun 2024; 15:1721. [PMID: 38409226; PMCID: PMC10897426; DOI: 10.1038/s41467-024-46121-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 02/14/2024] [Indexed: 02/28/2024] Open
Abstract
Quiescence in stem cells is traditionally considered a state of inactive dormancy or of poised potential. Naive mouse embryonic stem cells (ESCs) can enter quiescence spontaneously or upon inhibition of MYC or fatty acid oxidation, mimicking embryonic diapause in vivo. The molecular underpinning and developmental potential of quiescent ESCs (qESCs) are relatively unexplored. Here we show that qESCs possess an expanded or unrestricted cell fate, capable of generating both embryonic and extraembryonic cell types (e.g., trophoblast stem cells). These cells have a divergent metabolic landscape compared to cycling ESCs, with a notable decrease of the one-carbon metabolite S-adenosylmethionine. The metabolic changes are accompanied by a global reduction of H3K27me3, an increase of chromatin accessibility, as well as the de-repression of endogenous retrovirus MERVL and trophoblast master regulators. Depletion of methionine adenosyltransferase Mat2a or deletion of Eed in the polycomb repressive complex 2 results in removal of the developmental constraints towards the extraembryonic lineages. Our findings suggest that quiescent ESCs are not dormant but rather undergo an active transition towards an unrestricted cell fate.
Affiliation(s)
- Le Tran Phuc Khoa
- Department of Medicine, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90033, USA
- Department of Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Wentao Yang
- Department of Medicine, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90033, USA
- Mengrou Shan
- Department of Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Li Zhang
- Department of Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Fengbiao Mao
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Bo Zhou
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Qiang Li
- Department of Ophthalmology & Visual Sciences, W.K. Kellogg Eye Center, University of Michigan, 1000 Wall St., Ann Arbor, MI, 48105, USA
- Rebecca Malcore
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Clair Harris
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Lili Zhao
- Beaumont Hospital, Wayne, 33155 Annapolis St., Wayne, MI, 48184, USA
- Rajesh C Rao
- Department of Ophthalmology & Visual Sciences, W.K. Kellogg Eye Center, University of Michigan, 1000 Wall St., Ann Arbor, MI, 48105, USA
- Shigeki Iwase
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Sundeep Kalantry
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Stephanie L Bielas
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Costas A Lyssiotis
- Department of Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Yali Dou
- Department of Medicine, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90033, USA
7
Diao Y, Zhao Y, Li X, Li B, Huo R, Han X. A simplified machine learning model utilizing platelet-related genes for predicting poor prognosis in sepsis. Front Immunol 2023; 14:1286203. [PMID: 38054005 PMCID: PMC10694245 DOI: 10.3389/fimmu.2023.1286203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/03/2023] [Indexed: 12/07/2023]
Abstract
Background: Thrombocytopenia is a known prognostic factor in sepsis, yet the relationship between platelet-related genes and sepsis outcomes remains elusive. We developed a machine learning (ML) model based on platelet-related genes to predict poor prognosis in sepsis. The model was evaluated on six diverse platforms. Methods: A retrospective analysis of platelet data from 365 sepsis patients confirmed the predictive role of platelet count in prognosis. We employed Cox analysis, Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine (SVM) techniques to identify platelet-related genes from the GSE65682 dataset. Subsequently, a model based on these genes was trained and validated on six distinct platforms comprising 719 patients, and compared against the Acute Physiology and Chronic Health Evaluation II (APACHE II) and Sequential Organ-Failure Assessment (SOFA) scores. Results: A PLT count <100 × 10^9/L independently increased the risk of death in sepsis patients (OR = 2.523; 95% CI: 1.084-5.872). The ML model, based on five platelet-related genes, yielded area under the curve (AUC) values ranging from 0.5 to 0.795 across the validation platforms. On the GPL6947 platform, our ML model outperformed the APACHE II score with an AUC of 0.795 compared to 0.761. Additionally, by incorporating age, the model's performance was further improved to an AUC of 0.812. On the GPL4133 platform, the initial AUC of the ML model based on five platelet-related genes was 0.5. However, after including age, the AUC increased to 0.583. In comparison, the AUC of the APACHE II score was 0.604, and the AUC of the SOFA score was 0.542. Conclusion: Our findings highlight the broad applicability of this ML model, based on platelet-related genes, in facilitating early treatment decisions for sepsis patients with poor outcomes. Our study paves the way for advancements in personalized medicine and improved patient care.
Affiliation(s)
- Xiaoxu Han
- National Clinical Research Center for Laboratory Medicine, Department of Laboratory Medicine, The First Hospital of China Medical University, Shenyang, China
8
Abdallah N, Marion JM, Tauber C, Carlier T, Hatt M, Chauvet P. Enhancing histopathological image classification of invasive ductal carcinoma using hybrid harmonization techniques. Sci Rep 2023; 13:20014. [PMID: 37973797 PMCID: PMC10654662 DOI: 10.1038/s41598-023-46239-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 10/30/2023] [Indexed: 11/19/2023]
Abstract
This study aims to develop a robust pipeline for classifying invasive ductal carcinomas and benign tumors in histopathological images, addressing variability within and between centers. We specifically tackle the challenge of detecting atypical data and variability between common clusters within the same database. Our feature engineering-based pipeline comprises a feature extraction step, followed by multiple harmonization techniques to rectify intra- and inter-center batch effects resulting from image acquisition variability and diverse patient clinical characteristics. These harmonization steps facilitate the construction of more robust and efficient models. We assess the proposed pipeline's performance on two public breast cancer databases, BreaKHis and IDCDB, using recall, precision, and accuracy metrics. Our pipeline outperforms recent models, achieving 90-95% accuracy in classifying benign and malignant tumors. We demonstrate the advantage of harmonization for classifying patches from different databases. Our top model scored 94.7% for IDCDB and 95.2% for BreaKHis, surpassing existing feature engineering-based models (92.1% for IDCDB and 87.7% for BreaKHis) and attaining comparable performance to deep learning models. The proposed feature engineering-based pipeline effectively classifies malignant and benign tumors while addressing variability within and between centers through the incorporation of various harmonization techniques. Our findings reveal that harmonizing variability between patches from different batches directly impacts the learning and testing performance of classification models. This pipeline has the potential to enhance breast cancer diagnosis and treatment and may be applicable to other diseases.
Affiliation(s)
- Nassib Abdallah
- LaTIM, INSERM, Université de Bretagne-Occidentale, Brest, France
- LARIS, Université d'Angers, Angers, France
- Mathieu Hatt
- LaTIM, INSERM, Université de Bretagne-Occidentale, Brest, France
9
Maselli F, D’Antona S, Utichi M, Arnaudi M, Castiglioni I, Porro D, Papaleo E, Gandellini P, Cava C. Computational analysis of five neurodegenerative diseases reveals shared and specific genetic loci. Comput Struct Biotechnol J 2023; 21:5395-5407. [PMID: 38022694 PMCID: PMC10651457 DOI: 10.1016/j.csbj.2023.10.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 10/09/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023]
Abstract
Neurodegenerative diseases (ND) are heterogeneous disorders of the central nervous system that share a chronic and selective process of neuronal cell death. A computational approach to investigate shared genetic and specific loci was applied to 5 different ND: Amyotrophic lateral sclerosis (ALS), Alzheimer's disease (AD), Parkinson's disease (PD), Multiple sclerosis (MS), and Lewy body dementia (LBD). The datasets were analyzed separately, and then we compared the obtained results. For this purpose, we applied a genetic correlation analysis to genome-wide association datasets and revealed different genetic correlations with several human traits and diseases. In addition, a clumping analysis was carried out to identify SNPs genetically associated with each disease. We found 27 SNPs in AD, 6 SNPs in ALS, 10 SNPs in PD, 17 SNPs in MS, and 3 SNPs in LBD. Most of them are located in non-coding regions, with the exception of 5 SNPs on which a protein structure and stability prediction was performed to verify their impact on disease. Furthermore, an analysis of the differentially expressed miRNAs of the 5 examined pathologies was performed to reveal regulatory mechanisms that could involve genes associated with selected SNPs. In conclusion, the results obtained constitute an important step toward the discovery of diagnostic biomarkers and a better understanding of the diseases.
Affiliation(s)
- Francesca Maselli
- Institute of Bioimaging and Molecular Physiology, National Research Council, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
- Salvatore D’Antona
- Institute of Bioimaging and Molecular Physiology, National Research Council, Milan, Italy
- Mattia Utichi
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Lyngby, Technical University of Denmark
- Cancer Structural Biology, Danish Cancer Institute, Copenhagen, Denmark
- Matteo Arnaudi
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Lyngby, Technical University of Denmark
- Cancer Structural Biology, Danish Cancer Institute, Copenhagen, Denmark
- Isabella Castiglioni
- Department of Physics “Giuseppe Occhialini”, University of Milan, Bicocca, Italy
- Danilo Porro
- Institute of Bioimaging and Molecular Physiology, National Research Council, Milan, Italy
- Elena Papaleo
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Lyngby, Technical University of Denmark
- Cancer Structural Biology, Danish Cancer Institute, Copenhagen, Denmark
- Claudia Cava
- Institute of Bioimaging and Molecular Physiology, National Research Council, Milan, Italy
- Department of Science, Technology and Society, University School for Advanced Studies IUSS Pavia, Italy
10
He J, Yang H, Liu Z, Chen M, Ye Y, Tao Y, Li S, Fang J, Xu J, Wu X, Qi H. Elevated expression of glycolytic genes as a prominent feature of early-onset preeclampsia: insights from integrative transcriptomic analysis. Front Mol Biosci 2023; 10:1248771. [PMID: 37818100 PMCID: PMC10561389 DOI: 10.3389/fmolb.2023.1248771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/12/2023]
Abstract
Introduction: Preeclampsia (PE), a notable pregnancy-related disorder, leads to more than 40,000 maternal deaths yearly. Recent research shows that PE divides into early-onset (EOPE) and late-onset (LOPE) subtypes, each with distinct clinical features and outcomes. However, the molecular characteristics of these subtypes remain debated and inconsistent. Methods: We integrated transcriptomic expression data from a total of 372 placental samples across 8 publicly available databases using the ComBat algorithm. Then, a variety of strategies including Random Forest Recursive Feature Elimination (RF-RFE), differential analysis, oposSOM, and Weighted Correlation Network Analysis were employed to identify the characteristic genes of the EOPE and LOPE subtypes. Finally, we conducted in vitro experiments on the key gene HK2 in HTR8/SVneo cells to explore its function. Results: Our results revealed a complex classification of PE placental samples, wherein EOPE manifests as a highly homogeneous sample group characterized by hypoxia and HIF1A activation. Among the core features is the upregulation of glycolysis-related genes, particularly HK2, in the placenta, an observation corroborated by independent validation data and single-cell data. Building on the pronounced correlation between HK2 and EOPE, we conducted in vitro experiments to assess the potential functional impact of HK2 on trophoblast cells. Additionally, the LOPE samples exhibit strong heterogeneity and lack distinct features, suggesting a complex molecular makeup for this subtype. Unsupervised clustering analysis indicates that LOPE likely comprises at least two distinct subtypes, linked to cell-environment interaction and to cytokine and protein modification functions. Discussion: In summary, these findings elucidate potential mechanistic differences between the two PE subtypes, lend support to the hypothesis of classifying PE based on gestational weeks, and emphasize the potentially significant role of glycolysis-related genes, especially HK2, in EOPE.
Affiliation(s)
- Jie He
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Huan Yang
- Department of Obstetrics, Chongqing University Three Gorges Hospital, Chongqing, China
- Zheng Liu
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Miaomiao Chen
- Maternal and Child Health Hospital of Hubei Province, Wuhan, China
- Ying Ye
- Department of Cardiothoracic Surgery, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Yuelan Tao
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Shuhong Li
- Department of Oncology, Chengdu Second People’s Hospital, Chengdu, China
- Jie Fang
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Jiacheng Xu
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Xiafei Wu
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Hongbo Qi
- Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development of Chinese Ministry of Education, Chongqing Medical University, Chongqing, China
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing, China
11
Wang P, Paquet ÉR, Robert C. Comprehensive transcriptomic analysis of long non-coding RNAs in bovine ovarian follicles and early embryos. PLoS One 2023; 18:e0291761. [PMID: 37725621 PMCID: PMC10508637 DOI: 10.1371/journal.pone.0291761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been the subject of numerous studies over the past decade. First thought to come from aberrant transcriptional events, lncRNAs are now considered a crucial component of the genome with roles in multiple cellular functions. However, the functional annotation and characterization of bovine lncRNAs during early development remain limited. In this comprehensive analysis, we review lncRNA expression in bovine ovarian follicles and early embryos, based on a unique database comprising 468 microarray hybridizations from a single platform designed to target 7,724 lncRNA transcripts, of which 5,272 are intergenic (lincRNA), 958 are intronic, and 1,524 are antisense (lncNAT). Compared to translated mRNA, lncRNAs have been shown to be more tissue-specific and expressed in low copy numbers. This analysis revealed that protein-coding genes and lncRNAs are both expressed more in oocytes. Differences between the oocyte and the 2-cell embryo are also more apparent in terms of lncRNAs than mRNAs. Co-expression network analysis using WGCNA generated 25 modules with differing proportions of lncRNAs. The modules exhibiting a higher proportion of lncRNAs were found to be associated with fewer annotated mRNAs and housekeeping functions. Functional annotation of co-expressed mRNAs allowed attribution of lncRNAs to a wide array of key cellular events such as meiosis, translation initiation, immune response, and mitochondria-related functions. We thus provide evidence that lncRNAs play diverse physiological roles that are tissue-specific and associated with key cellular functions alongside mRNAs in bovine ovarian follicles and early embryos. This work adds lncRNAs as active molecules in the complex regulatory networks driving folliculogenesis, oogenesis and early embryogenesis, all of which are necessary for reproductive success.
Affiliation(s)
- Pengmin Wang
- Département des sciences animales, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec City, Québec, Canada
- Éric R. Paquet
- Département des sciences animales, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec City, Québec, Canada
- Claude Robert
- Département des sciences animales, Faculté des sciences de l’agriculture et de l’alimentation, Université Laval, Québec City, Québec, Canada
12
Samadishadlou M, Rahbarghazi R, Piryaei Z, Esmaeili M, Avcı ÇB, Bani F, Kavousi K. Unlocking the potential of microRNAs: machine learning identifies key biomarkers for myocardial infarction diagnosis. Cardiovasc Diabetol 2023; 22:247. [PMID: 37697288 PMCID: PMC10496209 DOI: 10.1186/s12933-023-01957-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 08/10/2023] [Indexed: 09/13/2023]
Abstract
Background: MicroRNAs (miRNAs) play a crucial role in regulating adaptive and maladaptive responses in cardiovascular diseases, making them attractive targets for potential biomarkers. However, their potential as novel biomarkers for diagnosing cardiovascular diseases requires systematic evaluation. Methods: In this study, we aimed to identify a key set of miRNA biomarkers using integrated bioinformatics and machine learning analysis. We combined and analyzed three gene expression datasets from the Gene Expression Omnibus (GEO) database, which contains peripheral blood mononuclear cell (PBMC) samples from individuals with myocardial infarction (MI), stable coronary artery disease (CAD), and healthy individuals. Additionally, we selected a set of miRNAs based on their area under the receiver operating characteristic curve (AUC-ROC) for separating the CAD and MI samples. We designed a two-layer architecture for sample classification, in which the first layer isolates healthy samples from unhealthy samples, and the second layer classifies stable CAD and MI samples. We trained different machine learning models using both biomarker sets and evaluated their performance on a test set. Results: We identified hsa-miR-21-3p, hsa-miR-186-5p, and hsa-miR-32-3p as the differentially expressed miRNAs, and a set including hsa-miR-186-5p, hsa-miR-21-3p, hsa-miR-197-5p, hsa-miR-29a-5p, and hsa-miR-296-5p as the optimum set of miRNAs selected by their AUC-ROC. Both biomarker sets could distinguish healthy from not-healthy samples with complete accuracy. The best performance for the classification of CAD and MI was achieved with an SVM model trained using the biomarker set selected by AUC-ROC, with an AUC-ROC of 0.96 and an accuracy of 0.94 on the test data. Conclusions: Our study demonstrated that miRNA signatures derived from PBMCs could serve as valuable novel biomarkers for cardiovascular diseases.
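The AUC-ROC ranking and the two-layer classification described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `auc_roc` and `two_layer_predict`, the callable classifiers, and the toy values are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any specific library.

```python
from typing import Callable, Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def two_layer_predict(sample, is_healthy: Callable, is_mi: Callable) -> str:
    """Layer 1 separates healthy from unhealthy samples; layer 2 is only
    consulted for unhealthy samples and splits them into CAD vs MI."""
    if is_healthy(sample):
        return "healthy"
    return "MI" if is_mi(sample) else "CAD"


# Toy values for one hypothetical miRNA feature (not data from the study).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 1, 0, 0]  # 1 = MI, 0 = CAD
toy_auc = auc_roc(toy_scores, toy_labels)

# Hypothetical threshold rules standing in for trained models.
layer1 = lambda s: s["marker_a"] < 0.5
layer2 = lambda s: s["marker_b"] > 0.5
toy_call = two_layer_predict({"marker_a": 0.7, "marker_b": 0.9}, layer1, layer2)
```

In practice each layer would be a trained model (the abstract reports an SVM for the second layer); the cascade structure is the only part this sketch is meant to show.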
Affiliation(s)
- Mehrdad Samadishadlou
- Department of Medical Nanotechnology, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Rahbarghazi
- Stem Cell Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Department of Applied Cell Sciences, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Zeynab Piryaei
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Mahdad Esmaeili
- Medical Bioengineering Department, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Çığır Biray Avcı
- Medical Biology Department, School of Medicine, Ege University, İzmir, Türkiye
- Farhad Bani
- Department of Medical Nanotechnology, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
13
Yu Y, Zhang N, Mai Y, Ren L, Chen Q, Cao Z, Chen Q, Liu Y, Hou W, Yang J, Hong H, Xu J, Tong W, Dong L, Shi L, Fang X, Zheng Y. Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method. Genome Biol 2023; 24:201. [PMID: 37674217 PMCID: PMC10483871 DOI: 10.1186/s13059-023-03047-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/08/2022] [Accepted: 05/18/2023] [Indexed: 09/08/2023]
Abstract
Background: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios. Results: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. Conclusions: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
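The ratio-based scaling described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Quartet Project's code: the function name `ratio_scale`, the dictionary layout, and the choice to use the plain (non-log) per-feature mean of the reference replicates as the denominator are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def ratio_scale(
    batch: Dict[str, List[float]], reference_ids: List[str]
) -> Dict[str, List[float]]:
    """Express each study sample in one batch as a ratio to the reference
    material(s) profiled in the same batch.

    `batch` maps sample id -> feature values (same feature order for all
    samples). Assumption: the denominator is the per-feature mean of the
    reference samples, which must be non-zero.
    """
    if not reference_ids:
        raise ValueError("at least one reference sample is required")
    n_features = len(batch[reference_ids[0]])
    ref_mean = [
        mean(batch[ref][j] for ref in reference_ids) for j in range(n_features)
    ]
    return {
        sample_id: [value / m for value, m in zip(values, ref_mean)]
        for sample_id, values in batch.items()
        if sample_id not in reference_ids
    }


# Toy data (not from the study): batch 2 carries a 2x multiplicative
# batch effect on every feature, for both the reference and the sample.
batch1 = {"ref": [10.0, 20.0], "s1": [20.0, 10.0]}
batch2 = {"ref": [20.0, 40.0], "s1": [40.0, 20.0]}

scaled1 = ratio_scale(batch1, ["ref"])
scaled2 = ratio_scale(batch2, ["ref"])
```

In this toy case the multiplicative shift cancels because it affects the reference and the study sample equally, so `scaled1` and `scaled2` agree. The scaling is done within each batch separately, which is why it does not depend on how biological groups are distributed across batches.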
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
- Xiang Fang
- National Institute of Metrology, Beijing, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
14
Rokavec M, Özcan E, Neumann J, Hermeking H. Development and Validation of a 15-gene Expression Signature with Superior Prognostic Ability in Stage II Colorectal Cancer. Cancer Res Commun 2023; 3:1689-1700. [PMID: 37654625 PMCID: PMC10467603 DOI: 10.1158/2767-9764.crc-22-0489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 01/24/2023] [Accepted: 07/31/2023] [Indexed: 09/02/2023]
Abstract
Currently, there is no consensus about the use of adjuvant chemotherapy for patients with stage II colorectal cancer. Here, we aimed to identify and validate a prognostic mRNA expression signature for the stratification of patients with stage II colorectal cancer according to their risk for relapse. First, publicly available mRNA expression profiling datasets from 792 primary, stage II colorectal cancers from six different training cohorts were analyzed to identify genes that are consistently associated with patient relapse-free survival (RFS). Second, the identified gene expression signature was experimentally validated using NanoString technology and computationally refined on primary colorectal cancer samples from 205 patients with stage II colorectal cancer. Third, the refined signature was validated in two independent publicly available cohorts of 166 patients with stage II colorectal cancer. Bioinformatics analysis of training cohorts identified a 61-gene signature that was highly significantly associated with RFS (HR = 37.08, P = 2.68 × 10^-106, sensitivity = 89.29%, specificity = 89.61%, and AUC = 0.937). The experimental validation and refinement revealed a 15-gene signature that robustly predicted relapse in three independent cohorts: an in-house cohort (HR = 20.4, P = 8.73 × 10^-23, sensitivity = 90.32%, specificity = 80.99%, AUC = 0.812), GSE161158 (HR = 5.81, P = 3.57 × 10^-4, sensitivity = 64.29%, specificity = 81.67%, AUC = 0.796), and GSE26906 (HR = 7.698, P = 7.26 × 10^-8, sensitivity = 61.54%, specificity = 78.33%, AUC = 0.752). In the pooled training cohort, the 15-gene signature (HR = 4.72, P = 7.76 × 10^-25, sensitivity = 75%, specificity = 67.44%, AUC = 0.784) was superior to the Oncotype DX colon 7-gene signature (HR = 2.698, P = 6.3 × 10^-8, sensitivity = 62.16%, specificity = 55.5%, AUC = 0.633). We report the identification and validation of a novel mRNA expression signature for robust prognostication and stratification of patients with stage II colorectal cancer, with superior performance in the analyzed validation cohorts when compared with clinicopathologic biomarkers and signatures currently used for stage II colorectal cancer prognostication. Significance: We identified and validated a 15-gene expression signature for robust prognostication and stratification of patients with stage II colorectal cancer, with superior performance when compared with currently used biomarkers. Therefore, the 15-gene expression signature has the potential to improve the prognostication and treatment decisions for patients with stage II colorectal cancer.
Affiliation(s)
- Matjaz Rokavec
- Experimental and Molecular Pathology, Institute of Pathology, Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany
- Elif Özcan
- Experimental and Molecular Pathology, Institute of Pathology, Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany
- Jens Neumann
- Institute of Pathology, Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany
- Heiko Hermeking
- Experimental and Molecular Pathology, Institute of Pathology, Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium (DKTK), Partner site Munich, Munich, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
15
Mei T, Li Y, Orduña Dolado A, Li Z, Andersson R, Berliocchi L, Rasmussen LJ. Pooled analysis of frontal lobe transcriptomic data identifies key mitophagy gene changes in Alzheimer's disease brain. Front Aging Neurosci 2023; 15:1101216. [PMID: 37358952 PMCID: PMC10288858 DOI: 10.3389/fnagi.2023.1101216] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/17/2022] [Accepted: 05/18/2023] [Indexed: 06/28/2023]
Abstract
Background The growing prevalence of Alzheimer's disease (AD) is becoming a global health challenge without effective treatments. Defective mitochondrial function and mitophagy have recently been suggested as etiological factors in AD, in association with abnormalities in components of the autophagic machinery like lysosomes and phagosomes. Several large transcriptomic studies have been performed on different brain regions from AD and healthy patients, and their data represent a vast source of important information that can be utilized to understand this condition. However, large integration analyses of these publicly available data, such as AD RNA-Seq data, are still missing. In addition, large-scale focused analysis on mitophagy, which seems to be relevant for the aetiology of the disease, has not yet been performed. Methods In this study, publicly available raw RNA-Seq data generated from healthy control and sporadic AD post-mortem human samples of the brain frontal lobe were collected and integrated. Sex-specific differential expression analysis was performed on the combined data set after batch effect correction. From the resulting set of differentially expressed genes, candidate mitophagy-related genes were identified based on their known functional roles in mitophagy, the lysosome, or the phagosome, followed by Protein-Protein Interaction (PPI) and microRNA-mRNA network analysis. The expression changes of candidate genes were further validated in human skin fibroblast and induced pluripotent stem cells (iPSCs)-derived cortical neurons from AD patients and matching healthy controls. Results From a large dataset (AD: 589; control: 246) based on three different datasets (i.e., ROSMAP, MSBB, & GSE110731), we identified 299 candidate mitophagy-related differentially expressed genes (DEG) in sporadic AD patients (male: 195, female: 188). 
Among these, the AAA ATPase VCP, the GTPase ARF1, the autophagic vesicle forming protein GABARAPL1 and the cytoskeleton protein actin beta (ACTB) were selected based on network degrees and existing literature. Changes in their expression were further validated in AD-relevant human in vitro models, which confirmed their down-regulation in AD conditions. Conclusion Through the joint analysis of multiple publicly available data sets, we identify four differentially expressed key mitophagy-related genes potentially relevant for the pathogenesis of sporadic AD. Changes in expression of these four genes were validated using two AD-relevant human in vitro models, primary human fibroblasts and iPSC-derived neurons. Our results provide a foundation for further investigation of these genes as potential biomarkers or disease-modifying pharmacological targets.
Affiliation(s)
- Taoyu Mei
- Center for Healthy Aging, Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Yuan Li
- Center for Healthy Aging, Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Anna Orduña Dolado
- Center for Healthy Aging, Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Zhiquan Li
- Center for Healthy Aging, Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Robin Andersson
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Laura Berliocchi
- Center for Healthy Aging, Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Health Sciences, University Magna Græcia of Catanzaro, Catanzaro, Italy
| | - Lene Juel Rasmussen
- Center for Healthy Aging, Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
| |
|
16
|
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. ADVANCED GENETICS (HOBOKEN, N.J.) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although RNA-seq is positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal that long noncoding RNAs (lncRNA) are neither sparsely nor lower expressed than protein coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
- Faculty of Science, McMaster University, Hamilton, L8S 4L8, Canada
| | - Haoning Howard Cen
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
| | | | - Iain J Gallagher
- School of Applied Sciences, Edinburgh Napier University, Edinburgh, EH11 4BN, UK
| | | | | | | | - James D. Johnson
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
| | | | | | - James A. Timmons
- Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- William Harvey Research Institute, Queen Mary University London, London, EC1M 6BQ, UK
- Augur Precision Medicine LTD, Stirling, FK9 5NF, UK
| |
|
17
|
Ni A, Liu M, Qin LX. BatMan: Mitigating Batch Effects Via Stratification for Survival Outcome Prediction. JCO Clin Cancer Inform 2023; 7:e2200138. [PMID: 37335961 PMCID: PMC10530623 DOI: 10.1200/cci.22.00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 01/31/2023] [Indexed: 06/21/2023] Open
Abstract
Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed in the setting of sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batches by including them as a covariate alongside sample groups in a linear regression. In survival prediction, however, ComBat is used without definable groups for survival outcome and is done sequentially with survival regression for a potentially batch-confounded outcome. To address these issues, we propose a new method called BATch MitigAtion via stratificatioN (BatMan). It adjusts batches as strata in survival regression and uses variable selection methods such as regularized regression to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a resampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data and (2) their performance can be worsened by the addition of data normalization. We further evaluate them using microRNA data for ovarian cancer from the Cancer Genome Atlas and find that BatMan outperforms ComBat while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the use of data normalization in the context of developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available at LXQin/PRECISION.survival on GitHub.
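The stratification idea in this abstract can be illustrated with a minimal sketch. It is not the authors' R implementation: the function name `stratified_cox_loglik`, the single-covariate setup, and the toy data are assumptions made for illustration, and the regularized variable selection mentioned in the abstract is left out. The sketch evaluates a Cox partial log-likelihood in which each batch is its own stratum, so a risk set never mixes samples from different batches (ties are handled Breslow-style).

```python
import math
from collections import defaultdict

def stratified_cox_loglik(beta, times, events, x, batch):
    """Cox partial log-likelihood with one risk set per batch (stratum).

    beta   : coefficient for a single covariate
    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    x      : covariate values (e.g. one expression feature)
    batch  : batch label per sample, used as the stratum
    """
    strata = defaultdict(list)
    for t, d, xi, b in zip(times, events, x, batch):
        strata[b].append((t, d, xi))

    loglik = 0.0
    for rows in strata.values():
        for t_i, d_i, x_i in rows:
            if not d_i:
                continue  # censored samples contribute only through risk sets
            # The risk set is restricted to samples from the same batch.
            risk = sum(math.exp(beta * x_j) for t_j, _, x_j in rows if t_j >= t_i)
            loglik += beta * x_i - math.log(risk)
    return loglik

# Hypothetical toy data: five samples in two batches.
times = [1, 2, 3, 1, 2]
events = [1, 1, 1, 1, 0]
x = [0.5, -0.2, 1.0, 0.3, -0.7]
batch = ["A", "A", "A", "B", "B"]

ll = stratified_cox_loglik(0.8, times, events, x, batch)
# Shift every covariate value in batch B by a constant (a simulated batch effect).
x_shifted = [xi + 5.0 if b == "B" else xi for xi, b in zip(x, batch)]
ll_shifted = stratified_cox_loglik(0.8, times, events, x_shifted, batch)
print(abs(ll - ll_shifted) < 1e-9)
```

Because every risk set stays inside one batch, adding a constant to the covariate of all samples in a batch cancels out of each likelihood term. This is one way to see why stratification can absorb additive batch shifts without a separate correction step.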
Affiliation(s)
- Ai Ni
- Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH
| | - Mengling Liu
- Department of Population Health, New York University, New York, NY
| | - Li-Xuan Qin
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
| |
|
18
|
Liu Y, Wei X, Feng X, Liu Y, Feng G, Du Y. Repeatability of radiomics studies in colorectal cancer: a systematic review. BMC Gastroenterol 2023; 23:125. [PMID: 37059990 PMCID: PMC10105401 DOI: 10.1186/s12876-023-02743-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 03/22/2023] [Indexed: 04/16/2023] Open
Abstract
BACKGROUND Recently, radiomics has been widely used in colorectal cancer, but many variable factors affect the repeatability of radiomics research. This review aims to analyze the repeatability of radiomics studies in colorectal cancer and to evaluate the current status of radiomics in the field of colorectal cancer. METHODS The studies included in this review were identified by searching the PubMed and Embase databases. Each study was then evaluated using the Radiomics Quality Score (RQS). We analyzed the factors that may affect repeatability in the radiomics workflow and discussed the repeatability of the included studies. RESULTS A total of 188 studies were included in this review, of which only two (2/188, 1.06%) controlled for the influence of individual factors. In addition, the median RQS was 11 (out of 36), range -1 to 27. CONCLUSIONS The RQS was moderately low, and most studies did not consider the repeatability of radiomics features, especially in terms of intra-individual factors, scanners, and scanning parameters. To improve the generalization of the radiomics model, it is necessary to further control the variable factors of repeatability.
Affiliation(s)
- Ying Liu
- School of Medical Imaging, North Sichuan Medical College, Sichuan Province, Nanchong City, 637000, China
| | - Xiaoqin Wei
- School of Medical Imaging, North Sichuan Medical College, Sichuan Province, Nanchong City, 637000, China
| | | | - Yan Liu
- Department of Radiology, the Affiliated Hospital of North Sichuan Medical College, 1 Maoyuannan Road, Sichuan Province, 637000, Nanchong City, China
| | - Guiling Feng
- Department of Radiology, the Affiliated Hospital of North Sichuan Medical College, 1 Maoyuannan Road, Sichuan Province, 637000, Nanchong City, China
| | - Yong Du
- Department of Radiology, the Affiliated Hospital of North Sichuan Medical College, 1 Maoyuannan Road, Sichuan Province, 637000, Nanchong City, China.
| |
|
19
|
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal.
| | - João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
| | - Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal.
| | - Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal.
| | | |
|
20
|
Zhang D, Wang Y, Zhao F, Yang Q. Integrated multiomics analyses unveil the implication of a costimulatory molecule score on tumor aggressiveness and immune evasion in breast cancer: A large-scale study through over 8,000 patients. Comput Biol Med 2023; 159:106866. [PMID: 37068318 DOI: 10.1016/j.compbiomed.2023.106866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 02/05/2023] [Accepted: 03/30/2023] [Indexed: 04/08/2023]
Abstract
BACKGROUND Although immunotherapy has revolutionised cancer management, reliable genomic biomarkers for identifying eligible patient subpopulations are lacking. Costimulatory molecules play a crucial role in mounting anti-tumour responses, and clinical trials targeting these novel biomarkers are underway. However, whether these molecules can determine tumour aggressiveness and the risk of tumour evasion in breast cancer (BC) remains largely unknown. METHODS The whole-tissue transcriptomic data of 8236 patients with BC from 15 independent cohorts were extracted. An integrated scoring system named 'costimulatory molecule score' (CMS) was constructed and sufficiently validated using least absolute shrinkage and selection operator regression (1000 iterations) and the random survival forest algorithm (1000 trees). The correlation among CMSs, cancer genotypes and clinicopathological characteristics was examined. Extensive multiomics and immunogenomic analyses were performed to investigate and verify the association among CMSs, enriched pathways, potential intrinsic and extrinsic immune escape mechanisms, immunotherapy response and therapeutic options. RESULTS The predictive role for prognosis of the CMS model, which relies on the expression pattern of merely 5 costimulatory genes, is almost universally applicable to BC patients in a platform-independent manner. Through internal and external in silico validation, high CMS was characterized by favorable genotypes but decreased tumor immunogenicity, activation of stroma, immune-suppressive states and potential immunotherapeutic resistance. Similar results were observed in a real-world immunotherapy cohort and Pan-Cancer analysis. CONCLUSION This comprehensive characterization indicates that the CMS model may be complemented for predicting tumor aggressiveness and immune evasion in BC patients, underlining the future clinical potential for further exploration of resistance mechanisms and optimization of immunotherapeutic strategies.
Affiliation(s)
- Dong Zhang
- Department of Breast Surgery, General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China; Department of Clinical Medicine, The First Clinical College, Shandong University, Jinan, 250012, China
| | - Yingnan Wang
- Department of Breast Surgery, General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China; Department of Clinical Medicine, The First Clinical College, Shandong University, Jinan, 250012, China
| | - Faming Zhao
- Key Laboratory of Environmental Health, Ministry of Education & Ministry of Environmental Protection, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Qifeng Yang
- Department of Breast Surgery, General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China; Pathology Tissue Bank, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China; Research Institute of Breast Cancer, Shandong University, Jinan, 250102, China.
| |
|
21
|
Elingaard-Larsen LO, Villumsen SO, Justesen L, Thuesen ACB, Kim M, Ali M, Danielsen ER, Legido-Quigley C, van Hall G, Hansen T, Ahluwalia TS, Vaag AA, Brøns C. Circulating Metabolomic and Lipidomic Signatures Identify a Type 2 Diabetes Risk Profile in Low-Birth-Weight Men with Non-Alcoholic Fatty Liver Disease. Nutrients 2023; 15:1590. [PMID: 37049431 PMCID: PMC10096690 DOI: 10.3390/nu15071590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 03/09/2023] [Accepted: 03/15/2023] [Indexed: 03/28/2023] Open
Abstract
The extent to which increased liver fat content influences differences in circulating metabolites and/or lipids between low-birth-weight (LBW) individuals, at increased risk of type 2 diabetes (T2D), and normal-birth-weight (NBW) controls is unknown. The objective of the study was to perform untargeted serum metabolomics and lipidomics analyses in 26 healthy, non-obese early-middle-aged LBW men, including five men with screen-detected and previously unrecognized non-alcoholic fatty liver disease (NAFLD), compared with 22 age- and BMI-matched NBW men (controls). While four metabolites (out of 65) and fifteen lipids (out of 279) differentiated the 26 LBW men from the 22 NBW controls (p ≤ 0.05), subgroup analyses of the LBW men with and without NAFLD revealed more pronounced differences, with 11 metabolites and 56 lipids differentiating (p ≤ 0.05) the groups. The differences in the LBW men with NAFLD included increased levels of ornithine and tyrosine (PFDR ≤ 0.1), as well as of triglycerides and phosphatidylcholines with shorter carbon-chain lengths and fewer double bonds. Pathway and network analyses demonstrated downregulation of transfer RNA (tRNA) charging, altered urea cycling, insulin resistance, and an increased risk of T2D in the LBW men with NAFLD. Our findings highlight the importance of increased liver fat in the pathogenesis of T2D in LBW individuals.
|
22
|
Carry PM, Vigers T, Vanderlinden LA, Keeter C, Dong F, Buckner T, Litkowski E, Yang I, Norris JM, Kechris K. Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments. BMC Bioinformatics 2023; 24:86. [PMID: 36882691 PMCID: PMC9990331 DOI: 10.1186/s12859-023-05202-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 02/22/2023] [Indexed: 03/09/2023] Open
Abstract
BACKGROUND We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case-control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data obtained from pancreas islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between observed betas under the batch allocation strategies and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. In order to understand performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the 'true' dataset (CAPN13 gene). RESULTS Pre-batch correction, under the null hypothesis (β1), maximum absolute bias and root mean square (RMS) of maximum absolute bias, were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. 
Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses. CONCLUSIONS Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation.
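The allocation step described in this abstract (choosing, among all possible assignments, the one that minimizes differences in average propensity score between batches) can be sketched for a small case. This is an illustrative toy, not the authors' algorithm: the function name `best_two_batch_split`, the restriction to two equal-sized batches, the absolute difference of batch means as the imbalance measure, and the precomputed propensity scores are all assumptions.

```python
from itertools import combinations
from statistics import mean

def best_two_batch_split(scores):
    """Exhaustively split samples into two equal batches so that the
    batch-mean propensity scores differ as little as possible."""
    n = len(scores)
    if n % 2:
        raise ValueError("this sketch expects an even number of samples")
    indices = range(n)
    best_gap, best_split = None, None
    for batch_a in combinations(indices, n // 2):
        if 0 not in batch_a:
            continue  # keep sample 0 in batch A to skip mirror-image splits
        batch_b = tuple(i for i in indices if i not in batch_a)
        gap = abs(mean(scores[i] for i in batch_a)
                  - mean(scores[i] for i in batch_b))
        if best_gap is None or gap < best_gap:
            best_gap, best_split = gap, (batch_a, batch_b)
    return best_split, best_gap

# Hypothetical propensity scores, assumed to be estimated beforehand
# from covariates such as age and HbA1c.
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.65, 0.80, 0.90]
split, gap = best_two_batch_split(scores)
print(split, gap)
```

Exhaustive enumeration is only practical for a handful of samples, since the number of candidate allocations grows combinatorially. A study of the size described in the abstract would need some form of search or sampling over allocations instead.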
Affiliation(s)
- Patrick M Carry
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, 12631 E. 17th Ave, Room 4602, Mail Stop B202, Aurora, CO, 80045, USA; Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA.
| | - Tim Vigers
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA; Barbara Davis Center for Diabetes, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Lauren A Vanderlinden
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA; Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
| | - Carson Keeter
- Colorado Program for Musculoskeletal Research, Department of Orthopedics, University of Colorado Anschutz Medical Campus, 12631 E. 17th Ave, Room 4602, Mail Stop B202, Aurora, CO, 80045, USA
| | - Fran Dong
- Barbara Davis Center for Diabetes, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Teresa Buckner
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
| | - Elizabeth Litkowski
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
| | - Ivana Yang
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Jill M Norris
- Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
| |
|
23
|
Juan H, Huang H. Quantitative analysis of high‐throughput biological data. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh‐Fen Juan
- Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology, National Taiwan University, Taipei, Taiwan
- Taiwan AI Labs, Taipei, Taiwan
| | - Hsuan‐Cheng Huang
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
| |
|
24
|
Gregori J, Sánchez À, Villanueva J. msmsEDA & msmsTests: Label-Free Differential Expression by Spectral Counts. Methods Mol Biol 2023; 2426:197-242. [PMID: 36308691 DOI: 10.1007/978-1-0716-1967-4_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
msmsTests is an R/Bioconductor package providing functions for statistical tests in label-free LC-MS/MS data by spectral counts. These functions aim at discovering differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. The three models admit blocking factors to control for nuisance variables. To assure a good level of reproducibility, a post-test filter is available, where (1) a minimum effect size considered biologically relevant, and (2) a minimum expression of the most abundant condition, may be set. A companion package, msmsEDA, provides functions to explore datasets based on MS/MS spectral counts. The provided graphics help in identifying outliers, detecting the presence of possible batch factors, and checking the effects of different normalizing strategies. This protocol illustrates the use of both packages on two examples: a purely spike-in experiment of 48 human proteins in a standard yeast cell lysate, and a cancer cell-line secretome dataset requiring a biological normalization.
Affiliation(s)
- Josep Gregori
- Vall Hebron Research Institute (VHIR), Barcelona, Spain.
| | - Àlex Sánchez
- VHIR, Barcelona, Spain
- Department of Genetics Statistics and Microbiology, UB, Barcelona, Spain
| | - Josep Villanueva
- Tumor Biomarkers Lab, Vall Hebron Institute of Oncology, Barcelona, Spain
| |
|
25
|
Abstract
Applying computational statistics or machine learning methods to data is a key component of many scientific studies, in any field, but alone might not be sufficient to generate robust and reliable outcomes and results. Before applying any discovery method, preprocessing steps are necessary to prepare the data for the computational analysis. In this framework, data cleaning and feature engineering are key pillars of any scientific study involving data analysis, and they should be adequately designed and performed from the first phases of the project. We call "feature" a variable describing a particular trait of a person or an observation, usually recorded as a column in a dataset. Even if pivotal, these data cleaning and feature engineering steps are sometimes done poorly or inefficiently, especially by beginners and inexperienced researchers. For this reason, we propose here our quick tips for data cleaning and feature engineering on how to carry out these important preprocessing steps correctly, avoiding common mistakes and pitfalls. Although we designed these guidelines with bioinformatics and health informatics scenarios in mind, we believe they can more generally be applied to any scientific area. We therefore target these guidelines to any researcher or practitioner wanting to perform data cleaning or feature engineering. We believe our simple recommendations can help researchers and scholars perform better computational analyses that can lead, in turn, to more solid outcomes and more reliable discoveries.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
| | - Luca Oneto
- Dipartimento di Informatica Bioingegneria Robotica e Ingegneria dei Sistemi, Università di Genova, Genoa, Italy
- ZenaByte S.r.l., Genoa, Italy
| | - Erica Tavazzi
- Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padua, Italy
| |
|
26
|
Yang Q, Li B, Wang P, Xie J, Feng Y, Liu Z, Zhu F. LargeMetabo: an out-of-the-box tool for processing and analyzing large-scale metabolomic data. Brief Bioinform 2022; 23:6768054. [PMID: 36274234 DOI: 10.1093/bib/bbac455] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 06/04/2022] [Revised: 09/06/2022] [Accepted: 09/24/2022] [Indexed: 12/14/2022] Open
Abstract
Large-scale metabolomics is a powerful technique that has attracted widespread attention in biomedical studies focused on identifying biomarkers and interpreting the mechanisms of complex diseases. Despite a rapid increase in the number of large-scale metabolomic studies, the analysis of metabolomic data remains a key challenge. Specifically, diverse unwanted variations and batch effects in processing many samples have a substantial impact on identifying true biological markers, and it is a daunting challenge to annotate a plethora of peaks as metabolites in untargeted mass spectrometry-based metabolomics. Therefore, the development of an out-of-the-box tool is urgently needed to realize data integration and to accurately annotate metabolites with enhanced functions. In this study, the LargeMetabo package based on R code was developed for processing and analyzing large-scale metabolomic data. This package is unique because it is capable of (1) integrating multiple analytical experiments to effectively boost the power of statistical analysis; (2) selecting the appropriate biomarker identification method by intelligent assessment for large-scale metabolic data and (3) providing metabolite annotation and enrichment analysis based on an enhanced metabolite database. The LargeMetabo package can facilitate flexibility and reproducibility in large-scale metabolomics. The package is freely available from https://github.com/LargeMetabo/LargeMetabo.
Affiliation(s)
- Qingxia Yang
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, Chongqing 401331, China
| | - Panpan Wang
- College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
| | - Jicheng Xie
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Yuhao Feng
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Ziqiang Liu
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
|
27
|
Adamer MF, Brüningk SC, Tejada-Arranz A, Estermann F, Basler M, Borgwardt K. reComBat: batch-effect removal in large-scale multi-source gene-expression data integration. BIOINFORMATICS ADVANCES 2022; 2:vbac071. [PMID: 36699372 PMCID: PMC9710604 DOI: 10.1093/bioadv/vbac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/01/2022] [Accepted: 09/26/2022] [Indexed: 01/28/2023]
Abstract
Motivation With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and biological variation, which can result in design matrix singularity. This problem cannot currently be solved satisfactorily by any common batch-correction algorithm. Results We present reComBat, a regularized version of the empirical Bayes method to overcome this limitation and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch-effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. Availability and implementation The code is available at https://github.com/BorgwardtLab/reComBat, all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Marek Basler
  - Biozentrum, University of Basel, Basel 4056, Switzerland
- Karsten Borgwardt
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
  - Swiss Institute for Bioinformatics (SIB), Lausanne 1015, Switzerland
28
Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022; 10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022] Open
Abstract
(1) Background: The emergence of methods interrogating gene expression at high throughput gave birth to quantitative transcriptomics, but also raised the question of how to compare expression profiles obtained using different equipment and protocols and/or in different series of experiments. Addressing this issue is challenging, because all of the above variables can dramatically influence gene expression signals and, therefore, cause a plethora of peculiar features in the transcriptomic profiles. Millions of transcriptomic profiles have been obtained and deposited in public databases, but their usefulness is strongly limited by these inter-comparison issues; (2) Methods: Dozens of methods and software packages, which can be broadly classified as either flexible or predefined format harmonizers, have been proposed, but none has yet become the gold standard for unification of this type of Big Data; (3) Results: However, recent developments show that platform/protocol/batch bias can be efficiently reduced, and not only for comparisons of limited transcriptomic datasets. Instruments have been proposed for transforming gene expression profiles into a universal, uniformly shaped format that can support multiple inter-comparisons at reasonable computational cost. This forms a basis for universal indexing of all or most types of RNA sequencing and microarray hybridization profiles; (4) Conclusions: In this paper, we attempt to overview the landscape of modern approaches and methods in transcriptomic harmonization and focus on the practical aspects of their application.
Affiliation(s)
- Nicolas Borisov
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Anton Buzdin
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), 1200 Brussels, Belgium
29
Liu H, Xing K, Jiang Y, Liu Y, Wang C, Ding X. Using Machine Learning to Identify Biomarkers Affecting Fat Deposition in Pigs by Integrating Multisource Transcriptome Information. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2022; 70:10359-10370. [PMID: 35953074 PMCID: PMC9413214 DOI: 10.1021/acs.jafc.2c03339] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/16/2022] [Revised: 07/27/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Fat deposition in pigs is not only closely related to pig production efficiency and pork quality but also an ideal model for human obesity. Transcriptome sequencing is widely used to study fat deposition. However, due to small sample sizes, high false positive rates, and poor consistency of results from different studies, new strategies are urgently needed. Machine learning, a new analysis method, can effectively fit complex data and accurately identify samples and genes. In this study, 36 samples of adipose tissue, muscle tissue, and liver tissue were collected from Songliao black pigs and Landrace pigs, and the mRNA of all the samples was sequenced. In addition, we collected transcriptome data for 64 samples in the GEO database from four different sources. After standardization and imputation of missing values in the data set comprising 100 samples, traditional differential expression analysis was carried out, and different numbers of expressed genes were selected as features for the training model of eight machine learning methods. In the 1000 replications of fourfold cross validation with 100 samples, AdaBoost performed best, with an average prediction accuracy greater than 93% and the highest mean area under the curve in predicting the high- and low-fat content groups among the eight ML methods. According to their performance-based ranks inferred by AdaBoost, 12 genes related to fat deposition were identified; among them, FASN and APOD were specifically expressed in adipose tissue, and APOA1 was specifically expressed in the liver, which could be important candidate biomarkers affecting fat deposition.
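The evaluation scheme described here, repeated fourfold cross-validation, can be sketched in a few lines. This is only an illustration of the resampling loop: the classifier is a one-feature nearest-centroid rule used as a stand-in, not AdaBoost or any of the eight methods in the study, and the function names and toy data are hypothetical.

```python
import random


def kfold_splits(n, k, rng):
    """Yield (train, test) index lists for one shuffled k-fold partition."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def centroid_accuracy(x, y, train, test):
    """Stand-in classifier: assign each test sample to the nearest class mean."""
    means = {}
    for label in set(y[i] for i in train):
        vals = [x[i] for i in train if y[i] == label]
        means[label] = sum(vals) / len(vals)
    correct = sum(
        min(means, key=lambda lab: abs(x[i] - means[lab])) == y[i] for i in test
    )
    return correct / len(test)


def repeated_cv(x, y, k=4, repeats=1000, seed=0):
    """Average test accuracy over `repeats` independent k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for train, test in kfold_splits(len(x), k, rng):
            scores.append(centroid_accuracy(x, y, train, test))
    return sum(scores) / len(scores)


# Toy data: 100 samples, one feature, two well-separated groups.
x = [0.0 + 0.01 * i for i in range(50)] + [10.0 + 0.01 * i for i in range(50)]
y = ["low"] * 50 + ["high"] * 50
mean_accuracy = repeated_cv(x, y, k=4, repeats=10)
```

Reshuffling before every repetition is what makes the replications differ; with 100 samples and k = 4, each test fold holds 25 samples, matching the setup in the abstract.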
30
Huang HH, Rao H, Miao R, Liang Y. A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression. BMC Bioinformatics 2022; 23:353. [PMID: 35999505 PMCID: PMC9396780 DOI: 10.1186/s12859-022-04887-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2022] [Accepted: 08/10/2022] [Indexed: 12/22/2022] Open
Abstract
Background Gene expression analysis can provide useful information for analyzing complex biological mechanisms. However, many reported findings are unrepeatable due to small sample sizes relative to a large number of genes and the low signal-to-noise ratios of most gene expression datasets. Results Meta-analysis of multi-data sets is an efficient method for tackling the above problem. To improve the performance of meta-analysis, we propose a novel meta-analysis framework. It consists of two parts: (1) a novel data augmentation strategy. Various cross-platform normalization methods exist, which can preserve original biological information of gene expression datasets from different angles and add different “perturbations” to the dataset. Using such perturbation, we provide a feasible means for gene expression data augmentation; (2) elastic data shared lasso (DSL-$L_2$). The DSL-$L_2$ method spans the continuum between individual models for each dataset and one model for all datasets. It also overcomes the shortcomings of the data shared lasso method when dealing with highly correlated features. Comprehensive simulation experiment results show that the proposed method has high prediction and gene selection performance. We then apply the proposed method to non-small cell lung cancer (NSCLC) blood gene expression data in order to identify key tumor-related genes. The outcomes of our experiment indicate that the method could be used for identifying a set of robust disease-related gene signatures that may be used for NSCLC early diagnosis or prognosis or even targeting. Conclusion We propose a novel and effective meta-analysis method for biological research, extrapolating and integrating information from multiple gene expression datasets.
Affiliation(s)
- Hai-Hui Huang
  - Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
- Hao Rao
  - Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
- Rui Miao
  - Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Yong Liang
  - The Peng Cheng Laboratory, Shenzhen, China
31
Kumar R, Khatri A, Acharya V. Deep learning uncovers distinct behavior of rice network to pathogens response. iScience 2022; 25:104546. [PMID: 35754717 PMCID: PMC9218438 DOI: 10.1016/j.isci.2022.104546] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/20/2022] [Revised: 05/06/2022] [Accepted: 06/02/2022] [Indexed: 12/15/2022] Open
Abstract
Rice, apart from abiotic stress, is prone to attack from multiple pathogens. Two rice pathogens in particular, the bacterium Xanthomonas oryzae (Xoo) and the hemibiotrophic fungus Magnaporthe oryzae, have been extensively studied for more than a decade. However, because holistic studies are lacking, we designed a deep learning-based rice network model (DLNet) that explores the quantitative differences resulting in distinct rice network architectures. Validation studies on rice in response to biotic stresses show that DLNet outperforms other machine learning methods. The current findings indicate the compactness of the rice PTI network and the rise of independent modules in the rice ETI network, resulting in similar patterns of the plant immune response. The results also show more independent network modules and minimal structural disorder in the rice-M. oryzae model as compared to the rice-Xoo model, revealing the different adaptation strategies of the rice plant to evade pathogen effectors.
Affiliation(s)
- Ravi Kumar
  - Functional Genomics and Complex System Lab, Biotechnology Division, The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Abhishek Khatri
  - Functional Genomics and Complex System Lab, Biotechnology Division, The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
- Vishal Acharya
  - Functional Genomics and Complex System Lab, Biotechnology Division, The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
32
Yuan Z, Murakoshi N, Xu D, Tajiri K, Okabe Y, Aonuma K, Murakata Y, Li S, Song Z, Shimoda Y, Mori H, Aonuma K, Ieda M. Identification of potential dilated cardiomyopathy-related targets by meta-analysis and co-expression analysis of human RNA-sequencing datasets. Life Sci 2022; 306:120807. [PMID: 35841977 DOI: 10.1016/j.lfs.2022.120807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 06/27/2022] [Accepted: 07/11/2022] [Indexed: 11/17/2022]
Abstract
AIMS Dilated cardiomyopathy (DCM) remains among the most refractory heart diseases because of its complicated pathogenesis, and the key molecules that cause it remain unclear. MAIN METHODS To elucidate the molecules and upstream pathways critical for DCM pathogenesis, we performed meta-analysis and co-expression analysis of RNA-sequencing (RNA-seq) datasets from publicly available databases. We analyzed three RNA-seq datasets containing comparisons of RNA expression in left ventricles between healthy controls and DCM patients. We extracted differentially expressed genes (DEGs) and clarified upstream regulators of cardiovascular disease-related DEGs by Ingenuity Pathway Analysis (IPA). Weighted Gene Co-expression Network Analysis (WGCNA) and Protein-Protein Interaction (PPI) analysis were also used to identify the hub gene candidates strongly associated with DCM. KEY FINDINGS In total, 406 samples (184 healthy, 222 DCM) were used in this study. Overall, 391 DEGs [absolute fold change (FC) ≥ 1.5; P < 0.01], including 221 upregulated and 170 downregulated ones in DCM, were extracted. Seven common hub genes (LUM, COL1A2, CXCL10, FMOD, COL3A1, ADAMTS4, MRC1) were finally screened. IPA showed several upstream transcriptional regulators, including activating (NFKBIA, TP73, CALR, NFKB1, KLF4) and inhibiting (CEBPA, PPARGC1A) ones. We further validated increased expression of several common hub genes in the transverse aortic constriction-induced heart failure model. SIGNIFICANCE In conclusion, meta-analysis and WGCNA using RNA-seq databases of DCM patients identified seven hub genes and seven upstream transcriptional regulators.
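The DEG criterion quoted above (absolute fold change ≥ 1.5 and P < 0.01) amounts to a simple two-condition filter. The sketch below assumes the fold change is given as a DCM/control ratio, so that downregulation corresponds to a ratio of at most 1/1.5; the abstract does not state this convention. The function name `select_degs` and the gene names are placeholders, not results from the study.

```python
def select_degs(results, fc_cutoff=1.5, p_cutoff=0.01):
    """Split genes into up- and downregulated sets by fold change and P value.

    `results` is a list of (gene, fold_change_ratio, p_value) tuples.
    Assumption: fold change is a case/control ratio, not a log value.
    """
    up, down = [], []
    for gene, fold_change, p_value in results:
        if p_value >= p_cutoff:
            continue  # not significant at the chosen threshold
        if fold_change >= fc_cutoff:
            up.append(gene)
        elif fold_change <= 1.0 / fc_cutoff:
            down.append(gene)
    return up, down


# Hypothetical input; gene names and numbers are placeholders.
example = [
    ("geneA", 2.0, 0.001),   # up: large ratio, significant
    ("geneB", 0.5, 0.005),   # down: ratio below 1/1.5, significant
    ("geneC", 1.2, 0.0001),  # significant but effect too small
    ("geneD", 3.0, 0.2),     # large effect but not significant
]
up_genes, down_genes = select_degs(example)
```

If the pipeline reports log2 fold changes instead, the equivalent thresholds would be ±log2(1.5), roughly ±0.585.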
Affiliation(s)
- Zixun Yuan
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Nobuyuki Murakoshi
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Dongzhu Xu
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Kazuko Tajiri
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Yuta Okabe
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Kazuhiro Aonuma
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Yoshiko Murakata
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Siqi Li
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Zonghu Song
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Yuzuno Shimoda
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Haruka Mori
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Kazutaka Aonuma
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
- Masaki Ieda
  - Department of Cardiology, Faculty of Medicine, University of Tsukuba, Tsukuba City, Japan
33
Sarafidis M, Lambrou GI, Zoumpourlis V, Koutsouris D. An Integrated Bioinformatics Analysis towards the Identification of Diagnostic, Prognostic, and Predictive Key Biomarkers for Urinary Bladder Cancer. Cancers (Basel) 2022; 14:cancers14143358. [PMID: 35884419 PMCID: PMC9319344 DOI: 10.3390/cancers14143358] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/08/2022] [Revised: 07/03/2022] [Accepted: 07/06/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary Bladder cancer is evidently a challenge as far as its prognosis and treatment are concerned. The investigation of potential biomarkers and therapeutic targets is indispensable and still in progress. Most studies attempt to identify differential signatures between distinct molecular tumor subtypes. Therefore, keeping in mind the heterogeneity of urinary bladder tumors, we attempted to identify a consensus gene-related signature between the common expression profile of bladder cancer and control samples. In the quest for substantive features, we were able to identify key hub genes, whose signatures could hold diagnostic, prognostic, or therapeutic significance, but, primarily, could contribute to a better understanding of urinary bladder cancer biology. Abstract Bladder cancer (BCa) is one of the most prevalent cancers worldwide and accounts for high morbidity and mortality. This study intended to elucidate potential key biomarkers related to the occurrence, development, and prognosis of BCa through an integrated bioinformatics analysis. In this context, a systematic meta-analysis, integrating 18 microarray gene expression datasets from the GEO repository into a merged meta-dataset, identified 815 robust differentially expressed genes (DEGs). The key hub genes resulted from DEG-based protein–protein interaction and weighted gene co-expression network analyses were screened for their differential expression in urine and blood plasma samples of BCa patients. Subsequently, they were tested for their prognostic value, and a three-gene signature model, including COL3A1, FOXM1, and PLK4, was built. In addition, they were tested for their predictive value regarding muscle-invasive BCa patients’ response to neoadjuvant chemotherapy. A six-gene signature model, including ANXA5, CD44, NCAM1, SPP1, CDCA8, and KIF14, was developed. 
In conclusion, this study identified nine key biomarker genes, namely ANXA5, CDT1, COL3A1, SPP1, VEGFA, CDCA8, HJURP, TOP2A, and COL6A1, which were differentially expressed in urine or blood of BCa patients, held a prognostic or predictive value, and were immunohistochemically validated. These biomarkers may be of significance as prognostic and therapeutic targets for BCa.
Affiliation(s)
- Michail Sarafidis
  - Biomedical Engineering Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., 15780 Athens, Greece
  - Correspondence: Tel. +30-210-772-2430
- George I. Lambrou
  - Choremeio Research Laboratory, First Department of Pediatrics, National and Kapodistrian University of Athens, 8 Thivon & Levadeias Str., 11527 Athens, Greece
  - University Research Institute of Maternal and Child Health and Precision Medicine, National and Kapodistrian University of Athens, 8 Thivon & Levadeias Str., 11527 Athens, Greece
- Vassilis Zoumpourlis
  - Biomedical Applications Unit, Institute of Chemical Biology, National Hellenic Research Foundation, 48 Vas. Konstantinou Ave., 11635 Athens, Greece
- Dimitrios Koutsouris
  - Biomedical Engineering Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., 15780 Athens, Greece
34
Niu J, Yang J, Guo Y, Qian K, Wang Q. Joint deep learning for batch effect removal and classification toward MALDI MS based metabolomics. BMC Bioinformatics 2022; 23:270. [PMID: 35818047 PMCID: PMC9275160 DOI: 10.1186/s12859-022-04758-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 05/30/2022] [Indexed: 12/02/2022] Open
Abstract
Background Metabolomics is a primary omics topic, which occupies an important position in both clinical applications and basic research on metabolic signatures and biomarkers. Unfortunately, the relevant studies are challenged by the batch effect caused by many external factors. In the last decade, deep learning has become a dominant tool in data science, such that one may train a diagnosis network on a known batch and then generalize it to a new batch. However, the batch effect inevitably hinders such efforts, as the two batches under consideration can be highly mismatched. Results We propose an end-to-end deep learning framework for joint batch effect removal and classification on metabolomics data. We first validate the proposed deep learning framework on a public CyTOF dataset as a simulated experiment. We also visually compare the t-SNE distributions and demonstrate that our method effectively removes the batch effects in latent space. Then, for a private MALDI MS dataset, we achieve the highest diagnostic accuracy, with an average increase of about 5.1–7.9% over state-of-the-art methods. Conclusions Both experiments show that our method performs significantly better in classification than conventional methods, benefitting from the effective removal of the batch effect. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04758-z.
Affiliation(s)
- Jingyang Niu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Jing Yang
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Yuyu Guo
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Kun Qian
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
- Qian Wang
  - School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China
35
Kong J, Ha D, Lee J, Kim I, Park M, Im SH, Shin K, Kim S. Network-based machine learning approach to predict immunotherapy response in cancer patients. Nat Commun 2022; 13:3703. [PMID: 35764641 PMCID: PMC9240063 DOI: 10.1038/s41467-022-31535-6] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Received: 09/27/2021] [Accepted: 06/22/2022] [Indexed: 11/08/2022] Open
Abstract
Immune checkpoint inhibitors (ICIs) have substantially improved the survival of cancer patients over the past several years. However, only a minority of patients respond to ICI treatment (~30% in solid tumors), and current ICI-response-associated biomarkers often fail to predict the ICI treatment response. Here, we present a machine learning (ML) framework that leverages network-based analyses to identify ICI treatment biomarkers (NetBio) that can make robust predictions. We curate more than 700 ICI-treated patient samples with clinical outcomes and transcriptomic data, and observe that NetBio-based predictions accurately predict ICI treatment responses in three different cancer types: melanoma, gastric cancer, and bladder cancer. Moreover, the NetBio-based prediction is superior to predictions based on other conventional ICI treatment biomarkers, such as ICI targets or tumor microenvironment-associated markers. This work presents a network-based method to effectively select immunotherapy-response-associated biomarkers that can make robust ML-based predictions for precision oncology.
Affiliation(s)
- JungHo Kong
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
- Doyeon Ha
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
- Juhun Lee
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
- Inhae Kim
  - ImmunoBiome Inc., Pohang, 37666, Korea
- Minhyuk Park
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
- Sin-Hyeog Im
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
  - ImmunoBiome Inc., Pohang, 37666, Korea
  - Institute of Convergence Science, Yonsei University, Seoul, 03722, Korea
- Kunyoo Shin
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
  - Institute of Convergence Science, Yonsei University, Seoul, 03722, Korea
- Sanguk Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, 37673, Korea
  - Institute of Convergence Science, Yonsei University, Seoul, 03722, Korea
36
Niu J, Xu W, Wei D, Qian K, Wang Q. Deep Learning Framework for Integrating Multibatch Calibration, Classification, and Pathway Activities. Anal Chem 2022; 94:8937-8946. [PMID: 35709357 DOI: 10.1021/acs.analchem.2c00601] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
The amount of available biological data has exploded since the emergence of high-throughput technologies, which is not only revolutionizing the way we recognize molecules and diseases but also bringing novel analytical challenges to bioinformatics analysis. In recent years, deep learning has become a dominant technique in data science. However, classification accuracy is plagued by domain discrepancy. Notably, in the presence of multiple batches, domain discrepancy typically arises between individual batches. Most pairwise adaptation approaches may be suboptimal because they fail to eliminate external factors across multiple batches while simultaneously taking the classification task into account. We propose a joint deep learning framework for integrating batch effect removal, classification, and downstream pathway activities on biological data. We validate it on two MALDI MS-based metabolomics datasets. We achieve the highest diagnostic accuracy (ACC), with a notable ∼10% improvement over other methods. Overall, these results indicate that our approach removes batch effects more effectively than state-of-the-art methods and yields more accurate classification as well as biomarkers for smart diagnosis.
Affiliation(s)
- JingYang Niu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Wei Xu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- DongMing Wei
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Kun Qian
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Qian Wang
  - School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
37
Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma. PLoS One 2022; 17:e0269570. [PMID: 35749395 PMCID: PMC9231717 DOI: 10.1371/journal.pone.0269570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 05/09/2022] [Indexed: 11/30/2022] Open
Abstract
Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model parameters. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. The GCN model showed an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906 in analysis of the classification performance for the test datasets. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
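The reported metrics are internally consistent: the F1 score is the harmonic mean of precision and recall, and recomputing it from the two values given in the abstract reproduces the stated 0.906. A short check, with `f1_score` as a locally defined helper rather than a library call:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f1_score(0.948, 0.868)
print(round(f1, 3))  # prints 0.906
```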
38
Tamposis IA, Manios GA, Charitou T, Vennou KE, Kontou PI, Bagos PG. MAGE: An Open-Source Tool for Meta-Analysis of Gene Expression Studies. BIOLOGY 2022; 11:biology11060895. [PMID: 35741417 PMCID: PMC9220151 DOI: 10.3390/biology11060895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 06/05/2022] [Accepted: 06/08/2022] [Indexed: 11/16/2022]
Abstract
MAGE (Meta-Analysis of Gene Expression) is a Python open-source software package designed to perform meta-analysis and functional enrichment analysis of gene expression data. We incorporate standard methods for the meta-analysis of gene expression studies, bootstrap standard errors, corrections for multiple testing, and meta-analysis of multiple outcomes. Importantly, the MAGE toolkit includes additional features for the conversion of probes to gene identifiers, and for conducting functional enrichment analysis, with annotated results, of statistically significant enriched terms in several formats. Along with the tool itself, a web-based infrastructure was also developed to support the features of this package.
Affiliation(s)
- Ioannis A. Tamposis
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
- Georgios A. Manios
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
- Theodosia Charitou
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
- Konstantina E. Vennou
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
- Pantelis G. Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
39
Reassessment of Reliability and Reproducibility for Triple-Negative Breast Cancer Subtyping. Cancers (Basel) 2022; 14:cancers14112571. [PMID: 35681552 PMCID: PMC9179838 DOI: 10.3390/cancers14112571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 05/05/2022] [Accepted: 05/06/2022] [Indexed: 11/17/2022] Open
Abstract
Simple Summary Triple-negative breast cancer (TNBC) is a heterogeneous disease. A proper classification system is needed to develop targetable biomarkers and guide personalized treatment in clinical practice. However, there has been no consensus on the molecular subtypes of TNBC, probably due to discrepancies in technical and computational methods chosen by different research groups. In this paper, we reassessed each major step for TNBC subtyping and provided suggestions, which promote rational workflow design and ensure reliable and reproducible results for future studies. We presented a recommended pipeline to the existing data, validated established TNBC subtypes with a larger sample size, and revealed two intermediate subtypes with prognostic significance. This work provides perspectives on issues and limitations regarding TNBC subtyping, indicating promising directions for developing targeted therapy based on the molecular characteristics of each TNBC subtype. Abstract Triple-negative breast cancer (TNBC) is a heterogeneous disease with diverse, often poor prognoses and treatment responses. In order to identify targetable biomarkers and guide personalized care, scientists have developed multiple molecular classification systems for TNBC based on transcriptomic profiling. However, there is no consensus on the molecular subtypes of TNBC, likely due to discrepancies in technical and computational methods used by different research groups. Here, we reassessed the major steps for TNBC subtyping, validated the reproducibility of established TNBC subtypes, and identified two more subtypes with a larger sample size. By comparing results from different workflows, we demonstrated the limitations of formalin-fixed, paraffin-embedded samples, as well as batch effect removal across microarray platforms. We also refined the usage of computational tools for TNBC subtyping. 
Furthermore, we integrated high-quality multi-institutional TNBC datasets (discovery set: n = 457; validation set: n = 165). Performing unsupervised clustering on the discovery and validation sets independently, we validated four previously discovered subtypes: luminal androgen receptor, mesenchymal, immunomodulatory, and basal-like immunosuppressed. Additionally, we identified two potential intermediate states of TNBC tumors based on their resemblance with more than one well-characterized subtype. In summary, we addressed the issues and limitations of previous TNBC subtyping through comprehensive analyses. Our results promote the rational design of future subtyping studies and provide new insights into TNBC patient stratification.
|
40
|
Augustine J, Jereesh AS. Blood-based gene-expression biomarkers identification for the non-invasive diagnosis of Parkinson's disease using two-layer hybrid feature selection. Gene X 2022; 823:146366. [PMID: 35202733 DOI: 10.1016/j.gene.2022.146366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 02/15/2022] [Accepted: 02/18/2022] [Indexed: 11/19/2022] Open
Abstract
Parkinson's disease (PD) is one of the most prevalent neurodegenerative diseases. Understanding the molecular mechanism and identifying potential biomarkers of PD promote effective treatments to the patients. Due to less invasiveness and easy accessibility, biomarkers from blood support early detection and diagnosis of PD. This study combined three independent PD microarray gene expression data from blood samples applying the early integration approach. Moderated t-statistics was employed to identify differentially expressed genes (DEGs). Relevant genes were selected using a two-layer embedded wrapper feature selection method with gradient boosting machine (GBM) in the first layer followed by an ensemble of wrappers including Recursive Feature Elimination (RFE), Genetic algorithm (GA) and Bi-directional elimination (Stepwise). All three wrappers were based on logistic regression classifier (LR). The PD-predictability of the generated signature was tested using nine supervised classification models, including eight shallow machine learning and one deep learning. On an independent dataset, GSE72267, Support Vector Machine-Radial (SVMR), and Deep Neural Network (DNN) showed the best performance with AUC 0.821 and 0.82, respectively. Comparison with existing blood-based PD signatures and the biological analysis verified the reliability of the proposed signature.
Affiliation(s)
- Jisha Augustine
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India.
| | - A S Jereesh
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Kerala 682022, India.
| |
|
41
|
Zheng D, Zhu Y, Zhang J, Zhang W, Wang H, Chen H, Wu C, Ni J, Xu X, Nian B, Chen S, Wang B, Li X, Zhang Y, Zhang J, Zhong W, Xiong L, Li F, Zhang D, Xu J, Jiang G. Identification and evaluation of circulating small extracellular vesicle microRNAs as diagnostic biomarkers for patients with indeterminate pulmonary nodules. J Nanobiotechnology 2022; 20:172. [PMID: 35366907 PMCID: PMC8976298 DOI: 10.1186/s12951-022-01366-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/09/2021] [Accepted: 03/10/2022] [Indexed: 12/13/2022] Open
Abstract
Background:
The identification of indeterminate pulmonary nodules (IPNs) following a low-dose computed tomography (LDCT) is a major challenge for early diagnosis of lung cancer. The inadequate assessment of IPNs’ malignancy risk results in a large number of unnecessary surgeries or an increased risk of cancer metastases. However, limited studies on non-invasive diagnosis of IPNs have been reported.
Methods:
In this study, we identified and evaluated the diagnostic value of circulating small extracellular vesicle (sEV) microRNAs (miRNAs) in patients with IPNs that had been newly detected using LDCT scanning and were scheduled for surgery. Out of 459 recruited patients, 109 eligible patients with IPNs were enrolled in the training cohort (n = 47) and the test cohort (n = 62). An external cohort (n = 99) was used for validation. MiRNAs were extracted from plasma sEVs, and assessed using small RNA sequencing. 490 lung adenocarcinoma samples and follow-up data were used to investigate the role of miRNAs in overall survival.
Results:
A circulating sEV miRNA (CirsEV-miR) model was constructed from five differentially expressed miRNAs (DEMs), showing 0.920 AUC in the training cohort (n = 47), and further identified in the test cohort (n = 62) and in an external validation cohort (n = 99). Among five DEMs of the CirsEV-miR model, miR-101-3p and miR-150-5p were significantly associated with better overall survival (p = 0.0001 and p = 0.0069). The CirsEV-miR scores were calculated, which significantly correlated with IPNs diameters (p < 0.05), and were able to discriminate between benign and malignant PNs (diameter ≤ 1 cm). The expression patterns of sEV miRNAs in the benign, adenocarcinoma in situ/minimally invasive adenocarcinoma, and invasive adenocarcinoma subgroups were found to gradually change with the increase in aggressiveness for the first time. Among all DEMs of the three subgroups, five miRNAs (miR-30c-5p, miR-30e-5p, miR-500a-3p, miR-125a-5p, and miR-99a-5p) were also significantly associated with overall survival of lung adenocarcinoma patients.
Conclusions:
Our results indicate that the CirsEV-miR model could help distinguish between benign and malignant PNs, providing insights into the feasibility of circulating sEV miRNAs in diagnostic biomarker development.
Trial registration:
Chinese Clinical Trials: ChiCTR1800019877. Registered 05 December 2018, https://www.chictr.org.cn/showproj.aspx?proj=31346.
Supplementary Information:
The online version contains supplementary material available at 10.1186/s12951-022-01366-0.
|
42
|
Liu YE, Saul S, Rao AM, Robinson ML, Agudelo Rojas OL, Sanz AM, Verghese M, Solis D, Sibai M, Huang CH, Sahoo MK, Gelvez RM, Bueno N, Estupiñan Cardenas MI, Villar Centeno LA, Rojas Garrido EM, Rosso F, Donato M, Pinsky BA, Einav S, Khatri P. An 8-gene machine learning model improves clinical prediction of severe dengue progression. Genome Med 2022; 14:33. [PMID: 35346346 PMCID: PMC8959795 DOI: 10.1186/s13073-022-01034-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/23/2021] [Accepted: 02/24/2022] [Indexed: 02/06/2023] Open
Abstract
Background:
Each year 3-6 million people develop life-threatening severe dengue (SD). Clinical warning signs for SD manifest late in the disease course and are nonspecific, leading to missed cases and excess hospital burden. Better SD prognostics are urgently needed.
Methods:
We integrated 11 public datasets profiling the blood transcriptome of 365 dengue patients of all ages and from seven countries, encompassing biological, clinical, and technical heterogeneity. We performed an iterative multi-cohort analysis to identify differentially expressed genes (DEGs) between non-severe patients and SD progressors. Using only these DEGs, we trained an XGBoost machine learning model on public data to predict progression to SD. All model parameters were "locked" prior to validation in an independent, prospectively enrolled cohort of 377 dengue patients in Colombia. We measured expression of the DEGs in whole blood samples collected upon presentation, prior to SD progression. We then compared the accuracy of the locked XGBoost model and clinical warning signs in predicting SD.
Results:
We identified eight SD-associated DEGs in the public datasets and built an 8-gene XGBoost model that accurately predicted SD progression in the independent validation cohort with 86.4% (95% CI 68.2-100) sensitivity and 79.7% (95% CI 75.5-83.9) specificity. Given the 5.8% proportion of SD cases in this cohort, the 8-gene model had a positive and negative predictive value (PPV and NPV) of 20.9% (95% CI 16.7-25.6) and 99.0% (95% CI 97.7-100.0), respectively. Compared to clinical warning signs at presentation, which had 77.3% (95% CI 58.3-94.1) sensitivity and 39.7% (95% CI 34.7-44.9) specificity, the 8-gene model led to an 80% reduction in the number needed to predict (NNP) from 25.4 to 5.0. Importantly, the 8-gene model accurately predicted subsequent SD in the first three days post-fever onset and up to three days prior to SD progression.
Conclusions:
The 8-gene XGBoost model, trained on heterogeneous public datasets, accurately predicted progression to SD in a large, independent, prospective cohort, including during the early febrile stage when SD prediction remains clinically difficult. The model has potential to be translated to a point-of-care prognostic assay to reduce dengue morbidity and mortality without overwhelming limited healthcare resources.
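The predictive values quoted in this abstract can be roughly reproduced from the reported sensitivity, specificity, and 5.8% prevalence. The sketch below is illustrative only: the function names are hypothetical, the PPV and NPV come from Bayes' rule applied to the rounded percentages rather than the study's patient counts, and the formula NNP = 1 / (PPV + NPV - 1) is one published definition of the number needed to predict that is assumed here, since the abstract does not state which definition the authors used. Small differences from the reported figures are therefore expected.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def number_needed_to_predict(ppv, npv):
    """NNP under the assumed definition 1 / (PPV + NPV - 1)."""
    return 1.0 / (ppv + npv - 1.0)


# Rounded values taken from the abstract; prevalence of severe dengue is 5.8%.
PREVALENCE = 0.058
model_ppv, model_npv = predictive_values(0.864, 0.797, PREVALENCE)
signs_ppv, signs_npv = predictive_values(0.773, 0.397, PREVALENCE)

model_nnp = number_needed_to_predict(model_ppv, model_npv)
signs_nnp = number_needed_to_predict(signs_ppv, signs_npv)

if __name__ == "__main__":
    print(f"8-gene model:  PPV={model_ppv:.3f} NPV={model_npv:.3f} NNP={model_nnp:.1f}")
    print(f"warning signs: PPV={signs_ppv:.3f} NPV={signs_npv:.3f} NNP={signs_nnp:.1f}")
```

Under these assumptions the 8-gene model comes out close to the reported PPV of 20.9%, NPV of 99.0%, and NNP of about 5, while clinical warning signs give an NNP of about 25, consistent with the roughly 80% reduction described in the abstract.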
Affiliation(s)
- Yiran E. Liu
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Stanford, CA, USA
- Cancer Biology Graduate Program, School of Medicine, Stanford University, Stanford, CA, USA
- Division of Infectious Diseases and Geographic Medicine, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Sirle Saul
- Division of Infectious Diseases and Geographic Medicine, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Aditya Manohar Rao
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Stanford, CA, USA
- Immunology Graduate Program, School of Medicine, Stanford University, Stanford, CA, USA
| | - Makeda Lucretia Robinson
- Division of Infectious Diseases and Geographic Medicine, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | | | - Ana Maria Sanz
- Clinical Research Center, Fundación Valle del Lili, Cali, Colombia
| | - Michelle Verghese
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Daniel Solis
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Mamdouh Sibai
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Chun Hong Huang
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Malaya Kumar Sahoo
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Rosa Margarita Gelvez
- Centro de Atención y Diagnóstico de Enfermedades Infecciosas (CDI), Bucaramanga, Colombia
| | - Nathalia Bueno
- Centro de Atención y Diagnóstico de Enfermedades Infecciosas (CDI), Bucaramanga, Colombia
| | | | | | | | - Fernando Rosso
- Clinical Research Center, Fundación Valle del Lili, Cali, Colombia
- Division of Infectious Diseases, Department of Internal Medicine, Fundación Valle del Lili, Cali, Colombia
| | - Michele Donato
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Stanford, CA, USA
- Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - Benjamin A. Pinsky
- Division of Infectious Diseases and Geographic Medicine, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Shirit Einav
- Division of Infectious Diseases and Geographic Medicine, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Department of Microbiology and Immunology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Purvesh Khatri
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, Stanford, CA, USA
- Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| |
|
43
|
Bajo-Morales J, Prieto-Prieto JC, Herrera LJ, Rojas I, Castillo-Secilla D. COVID-19 Biomarkers Recognition & Classification Using Intelligent Systems. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220328125029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2023]
Abstract
Background:
SARS-CoV-2 has paralyzed mankind due to its high transmissibility and its associated mortality, causing millions of infections and deaths worldwide. The search for gene expression biomarkers from the host transcriptional response to infection may help understand the underlying mechanisms by which the virus causes COVID-19. This research proposes a smart methodology integrating different RNA-Seq datasets from SARS-CoV-2, other respiratory diseases, and healthy patients.
Methods:
The proposed pipeline exploits the functionality of the ‘KnowSeq’ R/Bioc package, integrating different data sources and attaining a significantly larger gene expression dataset, thus endowing the results with higher statistical significance and robustness in comparison with previous studies in the literature. A detailed preprocessing step was carried out to homogenize the samples and build a clinical decision system for SARS-CoV-2. It uses machine learning techniques such as a feature selection algorithm and a supervised classification system. This clinical decision system uses the most differentially expressed genes among different diseases (including SARS-CoV-2) to develop a four-class classifier.
Results:
The multiclass classifier designed can discern SARS-CoV-2 samples, reaching an accuracy equal to 91.5%, a mean F1-Score equal to 88.5%, and a SARS-CoV-2 AUC equal to 94% by using only 15 genes as predictors. A biological interpretation of the gene signature extracted reveals relations with processes involved in viral responses.
Conclusion:
This work proposes a COVID-19 gene signature composed of 15 genes, selected after applying the ‘minimum Redundancy Maximum Relevance’ feature selection algorithm. The integration of several RNA-Seq datasets was a success, allowing for a considerably larger number of samples and therefore providing greater statistical significance to the results than previous studies. Biological interpretation of the selected genes was also provided.
Affiliation(s)
- Javier Bajo-Morales
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Juan Carlos Prieto-Prieto
- Nuclear Medicine Department, IMIBIC, University Hospital Reina Sofia, Menéndez Pidal Avenue, 14004, Córdoba, Spain
| | - Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Daniel Castillo-Secilla
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| |
|
44
|
Su L, Xu C, Zeng S, Su L, Joshi T, Stacey G, Xu D. Large-Scale Integrative Analysis of Soybean Transcriptome Using an Unsupervised Autoencoder Model. Front Plant Sci 2022; 13:831204. [PMID: 35310659 PMCID: PMC8927983 DOI: 10.3389/fpls.2022.831204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 02/09/2022] [Indexed: 06/14/2023]
Abstract
Plant tissues are distinguished by their gene expression patterns, which can help identify tissue-specific highly expressed genes and their differential functional modules. For this purpose, large-scale soybean transcriptome samples were collected and processed starting from raw sequencing reads in a uniform analysis pipeline. To address the gene expression heterogeneity in different tissues, we utilized an adversarial deconfounding autoencoder (AD-AE) model to map gene expressions into a latent space and adapted a standard unsupervised autoencoder (AE) model to help effectively extract meaningful biological signals from the noisy data. As a result, four groups of 1,743, 914, 2,107, and 1,451 genes were found highly expressed specifically in leaf, root, seed and nodule tissues, respectively. To obtain key transcription factors (TFs), hub genes and their functional modules in each tissue, we constructed tissue-specific gene regulatory networks (GRNs), and differential correlation networks by using corrected and compressed gene expression data. We validated our results from the literature and gene enrichment analysis, which confirmed many identified tissue-specific genes. Our study represents the largest gene expression analysis in soybean tissues to date. It provides valuable targets for tissue-specific research and helps uncover broader biological patterns. Code is publicly available with open source at https://github.com/LingtaoSu/SoyMeta.
Affiliation(s)
- Lingtao Su
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Chunhui Xu
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Shuai Zeng
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Li Su
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Trupti Joshi
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Department of Health Management and Informatics and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Gary Stacey
- Division of Plant Sciences and Technology and Biochemistry Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Dong Xu
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| |
|
45
|
McCulloch JA, Davar D, Rodrigues RR, Badger JH, Fang JR, Cole AM, Balaji AK, Vetizou M, Prescott SM, Fernandes MR, Costa RGF, Yuan W, Salcedo R, Bahadiroglu E, Roy S, DeBlasio RN, Morrison RM, Chauvin JM, Ding Q, Zidi B, Lowin A, Chakka S, Gao W, Pagliano O, Ernst SJ, Rose A, Newman NK, Morgun A, Zarour HM, Trinchieri G, Dzutsev AK. Intestinal microbiota signatures of clinical response and immune-related adverse events in melanoma patients treated with anti-PD-1. Nat Med 2022; 28:545-556. [PMID: 35228752 PMCID: PMC10246505 DOI: 10.1038/s41591-022-01698-2] [Citation(s) in RCA: 192] [Impact Index Per Article: 96.0] [Received: 03/26/2021] [Accepted: 01/13/2022] [Indexed: 12/12/2022]
Abstract
Ample evidence indicates that the gut microbiome is a tumor-extrinsic factor associated with antitumor response to anti-programmed cell death protein-1 (PD-1) therapy, but inconsistencies exist between published microbial signatures associated with clinical outcomes. To resolve this, we evaluated a new melanoma cohort, along with four published datasets. Time-to-event analysis showed that baseline microbiota composition was optimally associated with clinical outcome at approximately 1 year after initiation of treatment. Meta-analysis and other bioinformatic analyses of the combined data show that bacteria associated with favorable response are confined within the Actinobacteria phylum and the Lachnospiraceae/Ruminococcaceae families of Firmicutes. Conversely, Gram-negative bacteria were associated with an inflammatory host intestinal gene signature, increased blood neutrophil-to-lymphocyte ratio, and unfavorable outcome. Two microbial signatures, enriched for Lachnospiraceae spp. and Streptococcaceae spp., were associated with favorable and unfavorable clinical response, respectively, and with distinct immune-related adverse effects. Despite between-cohort heterogeneity, optimized all-minus-one supervised learning algorithms trained on batch-corrected microbiome data consistently predicted outcomes to programmed cell death protein-1 therapy in all cohorts. Gut microbial communities (microbiotypes) with nonuniform geographical distribution were associated with favorable and unfavorable outcomes, contributing to discrepancies between cohorts. Our findings shed new light on the complex interaction between the gut microbiome and response to cancer immunotherapy, providing a roadmap for future studies.
Affiliation(s)
- John A McCulloch
- Genetics and Microbiome Core, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Diwakar Davar
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Richard R Rodrigues
- Genetics and Microbiome Core, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Jonathan H Badger
- Genetics and Microbiome Core, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Jennifer R Fang
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Alicia M Cole
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Ascharya K Balaji
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Marie Vetizou
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Stephanie M Prescott
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Miriam R Fernandes
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Raquel G F Costa
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Wuxing Yuan
- Genetics and Microbiome Core, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Rosalba Salcedo
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Erol Bahadiroglu
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Soumen Roy
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
| | - Richelle N DeBlasio
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Robert M Morrison
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Joe-Marc Chauvin
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Quanquan Ding
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bochra Zidi
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Ava Lowin
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Saranya Chakka
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Wentao Gao
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Ornella Pagliano
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Scarlett J Ernst
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Amy Rose
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA
| | - Nolan K Newman
- College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Andrey Morgun
- College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Hassane M Zarour
- Department of Medicine and UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, USA.
- Department of Immunology, University of Pittsburgh, Pittsburgh, PA, USA.
| | - Giorgio Trinchieri
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.
| | - Amiran K Dzutsev
- Cancer Immunobiology Section, Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.
| |
|
46
|
Bhavra K, Wilde M, Richardson M, Cordell R, Thomas CLP, Zhao B, Bryant L, Brightling CE, Ibrahim W, Salman D, Siddiqui S, Monks P, Gaillard E. The utility of a standardised breath sampler in school age children within a real-world prospective study. J Breath Res 2022; 16. [PMID: 35168217 DOI: 10.1088/1752-7163/ac5526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Accepted: 02/15/2022] [Indexed: 11/12/2022]
Abstract
Clinical assessment of paediatric asthmatics is problematic, and non-invasive biomarkers are needed urgently. Monitoring exhaled volatile organic compounds (VOCs) is an attractive alternative to invasive tests (blood and sputum), and may be used as frequently as required. Standardised reproducible breath-sampling is essential for exhaled-VOC analysis, and although the ReCIVA (Owlstone Medical Limited) breath-sampler was designed to satisfy this requirement, paediatric use was not in the original design brief. The efficacy of the ReCIVA for sampling paediatric-breath has been studied, and 90 breath-samples from 64 children (5-15 years) with, and without asthma (controls), were collected with two different ReCIVA units. Seventy samples (77.8%) contained the specified 1L of sampled-breath. Median sampling times were longer in children with acute asthma (770.2 s, range: 532.2-900.1 s) compared to stable asthma (690.6 s, range: 477.5-900.1 s; p=0.01). The ReCIVA successfully detected operational faults, in 21 samples. A leak, caused by a poor fit of the face mask seal was the most common (15); the others were USB communication-faults (5); and, a single instance of a file-creation error. Paediatric breath-profiles were reliably monitored, however synchronisation of sampling to breathing-phases was sometimes lost, causing some breaths not to be sampled, and some to be sampled continuously. This occurred in 60 (66.7%) of the samples and was a source of variability. Three samples were lost from a combination of factors, however, and importantly, multi-variate modelling of untargeted VOC analysis indicated the absence of significant batch effects for 8 operational variables. The ReCIVA appears suitable for paediatric breath-sampling. Post-processing of breath-sample meta-data is recommended to assess the quality of sample-acquisition. 
Further, future studies should explore the effect of pump-synchronisation faults on recovered VOC profiles, and mask sizes to fit all ages will reduce the potential for leaks and importantly, provide higher levels of comfort to children with asthma.
Affiliation(s)
- Kirandeep Bhavra
- Department of Respiratory Sciences, Leicester Royal Infirmary, NIHR Leicester Biomedical Research Centre (Respiratory theme), PO Box 65, Robert Kilpatrick Clinical Sciences Building, Leicester, LE2 7LX, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Michael Wilde
- University of Leicester, Department of Chemistry, Leicester, Leicestershire, LE1 7RH, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Matthew Richardson
- Loughborough University School of Science, Department of Chemistry, Loughborough, Leicestershire, LE11 3TU, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Rebecca Cordell
- University of Leicester Department of Chemistry, University of Leicester, Leicester, Leicester, LE1 7RH, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - C L Paul Thomas
- University of Leicester Department of Respiratory Sciences, NIHR Leicester Biomedical Research Centre (Respiratory theme), Glenfield Hospital, Groby Road, Leicester, East Midlands, LE3 9QP, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Bo Zhao
- University of Leicester College of Life Sciences, Leicester NIHR Biomedical Research Centre (Respiratory theme), Glenfield Hospital, Groby Road, Leicester, Leicester, LE3 9QP, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Luke Bryant
- University of Leicester Department of Chemistry, University of Leicester, University Road, Leicester, Leicester, LE1 7RH, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Christopher E Brightling
- Loughborough University School of Science, Department of Chemistry, Loughborough, Leicestershire, LE11 3TU, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Wadah Ibrahim
- Loughborough University School of Science, Department of Chemistry, Loughborough, Leicestershire, LE11 3TU, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Dahlia Salman
- University of Leicester Department of Respiratory Sciences, NIHR Leicester Biomedical Research Centre (Respiratory theme), Glenfield Hospital, Groby Road, Leicester, East Midlands, LE3 9QP, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Salman Siddiqui
- Loughborough University School of Science, Department of Chemistry, Loughborough, Leicestershire, LE11 3TU, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Paul Monks
- University of Leicester, Department of Chemistry, Leicester, Leicestershire, LE1 7RH, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| | - Erol Gaillard
- Department of Respiratory Sciences, University of Leicester, College of Life Sciences, Leicester, Leicestershire, LE1 7RH, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
| |
|
47
|
Noble AJ, Purcell RV, Adams AT, Lam YK, Ring PM, Anderson JR, Osborne AJ. A Final Frontier in Environment-Genome Interactions? Integrated, Multi-Omic Approaches to Predictions of Non-Communicable Disease Risk. Front Genet 2022; 13:831866. [PMID: 35211161 PMCID: PMC8861380 DOI: 10.3389/fgene.2022.831866] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Accepted: 01/19/2022] [Indexed: 12/26/2022]
Abstract
Epidemiological and associative research from humans and animals identifies correlations between the environment and health impacts. The environment-health inter-relationship is effected through an individual's underlying genetic variation and mediated by mechanisms that include the changes to gene regulation that are associated with the diversity of phenotypes we exhibit. However, the causal relationships have yet to be established, in part because the associations are reduced to individual interactions and the combinatorial effects are rarely studied. This problem is exacerbated by the fact that our genomes are highly dynamic; they integrate information across multiple levels (from linear sequence, to structural organisation, to temporal variation) each of which is open to and responds to environmental influence. To unravel the complexities of the genomic basis of human disease, and in particular non-communicable diseases that are also influenced by the environment (e.g., obesity, type II diabetes, cancer, multiple sclerosis, some neurodegenerative diseases, inflammatory bowel disease, rheumatoid arthritis) it is imperative that we fully integrate multiple layers of genomic data. Here we review current progress in integrated genomic data analysis, and discuss cases where data integration would lead to significant advances in our ability to predict how the environment may impact on our health. We also outline limitations which should form the basis of future research questions. In so doing, this review will lay the foundations for future research into the impact of the environment on our health.
Affiliation(s)
- Alexandra J. Noble
- Translational Gastroenterology Unit, Nuffield Department of Experimental Medicine, University of Oxford, Oxford, United Kingdom
| | - Rachel V. Purcell
- Department of Surgery, University of Otago Christchurch, Christchurch, New Zealand
| | - Alex T. Adams
- Translational Gastroenterology Unit, Nuffield Department of Experimental Medicine, University of Oxford, Oxford, United Kingdom
| | - Ying K. Lam
- Translational Gastroenterology Unit, Nuffield Department of Experimental Medicine, University of Oxford, Oxford, United Kingdom
| | - Paulina M. Ring
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Jessica R. Anderson
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Amy J. Osborne
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| |
|
48
|
Wang X, Wang J, Zhang H, Huang S, Yin Y. HDMC: a novel deep learning-based framework for removing batch effects in single-cell RNA-seq data. Bioinformatics 2022; 38:1295-1303. [PMID: 34864918 DOI: 10.1093/bioinformatics/btab821] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/09/2021] [Revised: 11/25/2021] [Accepted: 11/30/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION With the development of single-cell RNA sequencing (scRNA-seq) techniques, an increasing number of large-scale gene expression datasets are becoming available. However, to analyze datasets produced by different experiments, batch effects among the datasets must be considered. Although several methods have recently been published to remove batch effects in scRNA-seq data, two problems remain challenging and not completely solved: (i) how to reduce the distribution differences between batches more accurately; and (ii) how to align samples from different batches to recover the cell type clusters. RESULTS We propose a novel deep-learning approach, a hierarchical distribution-matching framework assisted by contrastive learning, to address these two problems. First, we design a hierarchical framework for distribution matching based on a deep autoencoder. This framework employs an adversarial training strategy to match the global distribution of different batches, which provides an improved foundation for further matching the local distributions with a maximum mean discrepancy-based loss. For local matching, we divide the cells in each batch into clusters and develop a contrastive learning mechanism that simultaneously aligns similar cluster pairs and keeps noisy pairs apart. This makes it possible to obtain clusters in which all cells are of the same type (true positives) and to avoid clusters containing cells of different types (false positives). We demonstrate the effectiveness of our method on both simulated and real datasets. Results show that our new method significantly outperforms state-of-the-art methods and is able to prevent overcorrection. AVAILABILITY AND IMPLEMENTATION The Python code to generate the results and figures in this article is available at https://github.com/zhanglabNKU/HDMC; the data underlying this article are also available in the same GitHub repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiao Wang
- College of Computer Science, Nankai University, 300350 Tianjin, China.,Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China
| | - Jia Wang
- College of Computer Science, Nankai University, 300350 Tianjin, China.,Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China
| | - Han Zhang
- College of Artificial Intelligence, Nankai University, 300350 Tianjin, China
| | - Shenwei Huang
- College of Computer Science, Nankai University, 300350 Tianjin, China.,Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, 300350 Tianjin, China
| | - Yanbin Yin
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
| |
|
49
|
Bajo-Morales J, Galvez JM, Prieto-Prieto JC, Herrera LJ, Rojas I, Castillo-Secilla D. Heterogeneous Gene Expression Cross-Evaluation of Robust Biomarkers Using Machine Learning Techniques Applied to Lung Cancer. Curr Bioinform 2022. [DOI: 10.2174/1574893616666211005114934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Background:
Nowadays, gene expression analysis is one of the most promising pillars for understanding and uncovering the mechanisms underlying the development and spread of cancer. In this sense, Next Generation Sequencing technologies, such as RNA-Seq, are currently leading the market due to their precision and cost. Nevertheless, there is still an enormous amount of unanalyzed data obtained from older technologies, such as Microarray, which could still be useful for extracting relevant knowledge.
Methods:
Throughout this research, a complete machine learning methodology to cross-evaluate the compatibility between the RNA-Seq and Microarray technologies is described and implemented. To show a real application of the designed pipeline, a lung cancer case study is addressed by considering two subtypes: adenocarcinoma and squamous cell carcinoma. The transcriptomic datasets considered for our study were obtained from the public repositories NCBI/GEO, ArrayExpress and GDC-Portal. From them, several gene experiments were carried out with the aim of finding gene signatures for these lung cancer subtypes, linked to both transcriptomic technologies. With these differentially expressed genes (DEGs) selected, intelligent predictive models capable of classifying new samples belonging to these cancer subtypes were developed.
Results:
The predictive models built using one technology are capable of discerning samples from a different technology. The classification results are evaluated in terms of accuracy, F1-score and ROC curves along with AUC. Finally, the biological information of the gene sets obtained and their relationship with lung cancer are reviewed, finding strong biological evidence linking them to the disease.
Conclusion:
Our method is capable of finding strong gene signatures that are also independent of the transcriptomic technology used to develop the analysis. In addition, our article highlights the potential of using heterogeneous transcriptomic data to increase the number of samples available for studies, thereby increasing the statistical significance of the results.
Affiliation(s)
- Javier Bajo-Morales
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Juan Manuel Galvez
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Juan Carlos Prieto-Prieto
- Nuclear Medicine Department, IMIBIC, University Hospital Reina Sofia, Menéndez Pidal Avenue, 14004, Córdoba, Spain
| | - Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| | - Daniel Castillo-Secilla
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
| |
|
50
|
Zhang X, Ye Z, Chen J, Qiao F. AMDBNorm: an approach based on distribution adjustment to eliminate batch effects of gene expression data. Brief Bioinform 2021; 23:6485011. [PMID: 34958674 DOI: 10.1093/bib/bbab528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 10/16/2021] [Accepted: 11/14/2021] [Indexed: 11/14/2022]
Abstract
Batch effects explain a large part of the noise when merging gene expression data. Removing irrelevant variations introduced by batch effects plays an important role in gene expression studies. To obtain reliable differential analysis results, it is necessary to remove the variation caused by technical conditions between different batches while preserving biological variation. Directly merging data that contain batch effects usually leads to a sharp rise in false positives. Although some methods of batch correction have been developed, they have some drawbacks. In this study, we develop a new algorithm, adjustment mean distribution-based normalization (AMDBNorm), which is based on a probability distribution, to correct batch effects while preserving biological variation. AMDBNorm addresses the defects of the existing batch correction methods. We compared several popular methods of batch correction with AMDBNorm using two real gene expression datasets with batch effects and analyzed the results of batch correction from the visual and quantitative perspectives. To ensure that the biological variation was well protected, the effects of the batch correction methods were verified by hierarchical cluster analysis. The results showed that the AMDBNorm algorithm could remove batch effects from gene expression data effectively and retain more biological variation than other methods. Our approach provides researchers with reliable data support for differential gene expression analysis and prognostic biomarker selection.
Affiliation(s)
- Xu Zhang
- School of Mathematics and Statistics, Southwest University, China
| | | | - Jing Chen
- School of Science, Southwest University of Science and Technology, China
| | | |
|