1. Li L, Li L, Wang Y, Wu B, Guan Y, Chen Y, Zhao J. Integration of Machine Learning and Experimental Validation to Identify Anoikis-Related Prognostic Signature for Predicting the Breast Cancer Tumor Microenvironment and Treatment Response. Genes (Basel) 2024; 15:1458. [PMID: 39596658 PMCID: PMC11594124 DOI: 10.3390/genes15111458] [Received: 10/21/2024] [Revised: 11/07/2024] [Accepted: 11/11/2024] [Indexed: 11/29/2024]
Abstract
Background/Objectives: Anoikis-related genes (ANRGs) are crucial in the invasion and metastasis of breast cancer (BC). The underlying role of ANRGs in the prognosis of breast cancer patients warrants further study. Methods: The anoikis-related prognostic signature (ANRS) was generated using a variety of machine learning methods, and the correlation between the ANRS and the tumor microenvironment (TME), drug sensitivity, and immunotherapy was investigated. Moreover, single-cell analysis and spatial transcriptome studies were conducted to investigate the expression of prognostic ANRGs across various cell types. Finally, the expression of ANRGs was verified by RT-PCR and Western blot analysis (WB), and the expression level of PLK1 in the blood was measured by the enzyme-linked immunosorbent assay (ELISA). Results: The ANRS, consisting of five ANRGs, was established. BC patients within the high-ANRS group exhibited poorer prognoses, characterized by elevated levels of immune suppression and stromal scores. The low-ANRS group had a better response to chemotherapy and immunotherapy. Single-cell analysis and spatial transcriptomics revealed variations in ANRGs across cells. The results of RT-PCR and WB were consistent with the differential expression analyses from databases. NU.1025 and imatinib were identified as potential inhibitors for SPIB and PLK1, respectively. Additionally, findings from ELISA demonstrated increased expression levels of PLK1 in the blood of BC patients. Conclusions: The ANRS can act as an independent prognostic indicator for BC patients, providing significant guidance for the implementation of chemotherapy and immunotherapy in these patients. Additionally, PLK1 has emerged as a potential blood-based diagnostic marker for breast cancer patients.
Affiliation(s)
- Longpeng Li
  - Institute of Physical Education and Sport, Shanxi University, Taiyuan 030006, China
- Longhui Li
  - School of Kinesiology and Health, Capital University of Physical Education and Sports, Beijing 100191, China
- Yaxin Wang
  - Institute of Physical Education and Sport, Shanxi University, Taiyuan 030006, China
- Baoai Wu
  - Institute of Physical Education and Sport, Shanxi University, Taiyuan 030006, China
- Yue Guan
  - Institute of Physical Education and Sport, Shanxi University, Taiyuan 030006, China
- Yinghua Chen
  - Institute of Physical Education and Sport, Shanxi University, Taiyuan 030006, China
- Jinfeng Zhao
  - Institute of Physical Education and Sport, Shanxi University, Taiyuan 030006, China
2. Zhao C, Zhu H, Tian Y, Sun Y, Zhang Z. SPINK5 is a key regulator of eosinophil extracellular traps in head and neck squamous cell carcinoma. Discov Oncol 2024; 15:627. [PMID: 39508915 PMCID: PMC11543977 DOI: 10.1007/s12672-024-01513-z] [Received: 05/20/2024] [Accepted: 11/04/2024] [Indexed: 11/15/2024]
Abstract
Enhanced infiltration of eosinophils is observed surrounding solid tumors. Some studies indicate that eosinophil extracellular traps (EETs) play a crucial role in tumor progression and metastasis. However, their specific role in head and neck squamous cell carcinoma (HNSCC) remains unclear. This study established a gene set associated with eosinophil differentiation, chemotaxis, and EET release from previous research. Using bioinformatics techniques, the expression and biological significance of these genes in HNSCC were analyzed. Briefly, unsupervised clustering based on the expression patterns of 133 EETs-related genes was used to classify TCGA-HNSCC patients. Immune cell infiltration patterns were assessed using the "ImmuCellAI" package. A prognostic model was constructed using ten algorithms, with EETs-related gene sets as input features. Unsupervised clustering divided the samples into two types and revealed a worse prognosis for Cluster 1 (C1) patients after the first year. Cluster 2 (C2) exhibited a higher ImmuneScore but an immune cell infiltration pattern distinct from that of C1. Additionally, high eosinophil abundance had a positive prognostic impact only in C2. Serine peptidase inhibitor Kazal type 5 (SPINK5) emerged as a potential key gene mediating the formation of EETs in HNSCC. EETs were not only positively correlated with diverse anti-cancer pathways but also positively associated with processes such as proliferation, migration, and other critical pathways. The random survival forest (RSF) model was identified as the optimal eosinophil-related prognostic model. Collectively, this study elucidates the potential impact and mediating pathways of EETs on tumors, providing a reference for targeted therapy based on EETs-related genes.
Affiliation(s)
- Chifeng Zhao
  - Department of Stomatology, Taizhou Central Hospital (Taizhou University Hospital), No.999, Donghai Avenue, Taizhou, 318000, Zhejiang, People's Republic of China
- Haoran Zhu
  - Health Science Center, Xi'an Jiaotong University, Xi'an, 710000, Shaanxi, China
- Yu Tian
  - Health Science Center, Xi'an Jiaotong University, Xi'an, 710000, Shaanxi, China
- Yuewen Sun
  - Health Science Center, Xi'an Jiaotong University, Xi'an, 710000, Shaanxi, China
- Zhenxing Zhang
  - Department of Stomatology, Taizhou Central Hospital (Taizhou University Hospital), No.999, Donghai Avenue, Taizhou, 318000, Zhejiang, People's Republic of China
3. Zhang S, Ta N, Zhang S, Li S, Zhu X, Kong L, Gong X, Guo M, Liu Y. Unraveling pancreatic ductal adenocarcinoma immune prognostic signature through a naive B cell gene set. Cancer Lett 2024; 594:216981. [PMID: 38795761 DOI: 10.1016/j.canlet.2024.216981] [Received: 02/07/2024] [Revised: 05/14/2024] [Accepted: 05/17/2024] [Indexed: 05/28/2024]
Abstract
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC), a leading cause of cancer mortality, has a complex pathogenesis involving various immune cells, including B cells and their subpopulations. Despite emerging research on the role of these cells within the tumor microenvironment (TME), the detailed molecular interactions with tumor-infiltrating immune cells (TIICs) are not fully understood. METHODS We applied CIBERSORT to quantify TIICs and naive B cells, which are prognostic for PDAC. Marker genes from scRNA-seq and modular genes from weighted gene co-expression network analysis (WGCNA) were integrated to identify naive B cell-related genes. A prognostic signature was constructed utilizing ten machine-learning algorithms, with validation in external cohorts. We further assessed the immune cell diversity, ESTIMATE scores, and immune checkpoint genes (ICGs) between patient groups stratified by risk to clarify the immune landscape in PDAC. RESULTS Our analysis identified 994 naive B cell-related genes across single-cell and bulk transcriptomes, with 247 linked to overall survival. We developed a 12-gene prognostic signature using Lasso and plsRcox algorithms, which was confirmed by 10-fold cross-validation and showed robust predictive power in training and real-world cohorts. Notably, we observed substantial differences in immune infiltration between patients with high and low risk. CONCLUSION Our study presents a robust prognostic signature that effectively maps the complex immune interactions in PDAC, emphasizing the critical function of naive B cells and suggesting new avenues for immunotherapeutic interventions. This signature has potential clinical applications in personalizing PDAC treatment, enhancing the understanding of immune dynamics, and guiding immunotherapy strategies.
Affiliation(s)
- Shichen Zhang
  - Software Engineering Institute, East China Normal University, Shanghai 200062, China
- Na Ta
  - Department of Pathology, Changhai Hospital, Navy Medical University, Shanghai 200433, China
- Shihao Zhang
  - National Key Laboratory of Immunity and Inflammation & Institute of Immunology, Navy Medical University, Shanghai 200433, China
- Senhao Li
  - National Key Laboratory of Immunity and Inflammation & Institute of Immunology, Navy Medical University, Shanghai 200433, China
- Xinyu Zhu
  - National Key Laboratory of Immunity and Inflammation & Institute of Immunology, Navy Medical University, Shanghai 200433, China
- Lingyun Kong
  - National Key Laboratory of Immunity and Inflammation & Institute of Immunology, Navy Medical University, Shanghai 200433, China
- Xueqing Gong
  - Software Engineering Institute, East China Normal University, Shanghai 200062, China
- Meng Guo
  - National Key Laboratory of Immunity and Inflammation & Institute of Immunology, Navy Medical University, Shanghai 200433, China
- Yanfang Liu
  - Department of Pathology, Changhai Hospital, Navy Medical University, Shanghai 200433, China
  - National Key Laboratory of Immunity and Inflammation & Institute of Immunology, Navy Medical University, Shanghai 200433, China
4. Qi Y, Reijneveld SA, Almansa J, Brouwer S, Vrooman JC. Diverging death risks: Mortality as a corollary of economic, social, cultural and person capital. SSM Popul Health 2024; 25:101644. [PMID: 38486801 PMCID: PMC10937154 DOI: 10.1016/j.ssmph.2024.101644] [Received: 10/11/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/17/2024]
Abstract
Introduction Diverging death risks are associated with a wide range of social factors, including not only education and income but also other economic and non-economic resources. The aim of this study was to assess the association of mortality risks with four types of resources: economic, social, cultural and person capital. Methods We used data from 2,952 participants in the Disparities in the Netherlands survey and annual mortality data from Statistics Netherlands for the period 2014 to 2021. Economic capital was measured through education, income, occupation, home equity, and liquid assets; social capital by the strength of social ties, the size of the core discussion network, and access to people in resourceful positions; cultural capital by lifestyle, digital skills, and mastery of English; and person capital by self-rated health, impediments to climbing stairs, self-confidence, self-image, people's appearance, and body mass index. To accommodate the fact that each capital was derived from several indicators, we used Partial Least Squares (PLS) Cox Regression. Results In multiple regression adjusted for all capital measures and sex, higher economic, cultural, and person capital were associated with lower mortality (hazard ratios 0.77 [95% confidence interval (CI) 0.65-0.90], 0.77 [0.64-0.93], and 0.80 [0.70-0.92], respectively). Conclusion The finding that more economic, cultural and person capital is associated with lower mortality provides empirical support for an approach that uses a broad spectrum of capital measures, hitherto rarely included simultaneously in epidemiological research, in order to understand diverging death risks. By integrating sociological concepts, cohort data, and epidemiological research methods, our study highlights the need for further research on the interplay between different forms of resources in shaping health inequalities.
In designing public health interventions, we advocate the adoption of a multidimensional capital-based framework for tackling social disparities in mortality.
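The reported intervals can be sanity-checked with a little arithmetic. Assuming Wald-type 95% intervals computed on the log-hazard scale (the abstract does not say how they were obtained), the bounds are symmetric around log(HR). Their geometric mean should therefore reproduce the point estimate, and their log-width gives the standard error. The helper names below are illustrative, not taken from the paper.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def log_se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(HR) implied by a Wald CI: log-scale width / (2z)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def implied_hr(lower, upper):
    """Point estimate implied by a log-symmetric CI: geometric mean of the bounds."""
    return math.sqrt(lower * upper)


# Hazard ratios and 95% CIs exactly as reported in the abstract
REPORTED = {
    "economic": (0.77, 0.65, 0.90),
    "cultural": (0.77, 0.64, 0.93),
    "person": (0.80, 0.70, 0.92),
}

for name, (hr, lo, hi) in REPORTED.items():
    se = log_se_from_ci(lo, hi)
    z_stat = math.log(hr) / se  # Wald statistic for H0: HR = 1
    print(f"{name:9s} HR={hr:.2f} implied={implied_hr(lo, hi):.2f} "
          f"SE(logHR)={se:.3f} z={z_stat:.2f}")
```

All three implied point estimates agree with the reported hazard ratios to within about 0.01. That is consistent with log-scale intervals rounded to two decimals.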
Affiliation(s)
- Yuwei Qi
  - University of Groningen, University Medical Center Groningen, Department of Health Sciences, Groningen, the Netherlands
- Sijmen A. Reijneveld
  - University of Groningen, University Medical Center Groningen, Department of Health Sciences, Groningen, the Netherlands
- Josué Almansa
  - University of Groningen, University Medical Center Groningen, Department of Health Sciences, Groningen, the Netherlands
- Sandra Brouwer
  - University of Groningen, University Medical Center Groningen, Department of Health Sciences, Groningen, the Netherlands
- J. Cok Vrooman
  - Utrecht University, Department of Sociology/ICS, Utrecht, the Netherlands
  - The Netherlands Institute for Social Research (SCP), the Netherlands
5. Ge X, Xu H, Weng S, Zhang Y, Liu L, Wang L, Xing Z, Ba Y, Liu S, Li L, Wang Y, Han X. Systematic analysis of transcriptome signature for improving outcomes in lung adenocarcinoma. J Cancer Res Clin Oncol 2023; 149:8951-8968. [PMID: 37160628 DOI: 10.1007/s00432-023-04814-y] [Received: 03/13/2023] [Accepted: 04/23/2023] [Indexed: 05/11/2023]
Abstract
PURPOSE Updated guidelines highlight gene expression-based multigene panels as a critical tool to assess overall survival (OS) and improve treatment for lung adenocarcinoma (LUAD) patients. Nevertheless, genome-wide expression signatures still have limited clinical utility because of insufficient data utilization, a lack of critical validation, and inappropriate machine learning algorithms. METHODS A total of 2330 primary LUAD samples were enrolled from 11 independent cohorts. Seventy-six algorithm combinations based on ten machine learning algorithms were applied. A total of 108 published gene expression signatures were collected. Multiple pharmacogenomics databases and resources were utilized to identify precision therapeutic drugs. RESULTS We developed a robust machine learning-derived genome-wide expression signature (RGS) from stably OS-associated RNAs (OSRs). RGS was an independent risk factor and retained robust, reproducible predictive power when compared with general clinical parameters, molecular characteristics, and 108 published signatures. The RGS-based risk groups showed distinct biological behaviors, molecular mechanisms, and immune microenvironment patterns. Integrating multiple databases and previous studies, we found that the high-risk group was sensitive to alisertib and the low-risk group was sensitive to RITA. CONCLUSION Our study offers an appealing platform to identify LUAD patients with a dismal prognosis and improve their clinical outcomes by optimizing precision therapy.
Affiliation(s)
- Xiaoyong Ge
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Hui Xu
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Siyuan Weng
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Yuyuan Zhang
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Long Liu
  - Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Libo Wang
  - Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Zhe Xing
  - Department of Neurosurgery, The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yuhao Ba
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Shutong Liu
  - Department of Clinical Medicine, Zhengzhou University, Zhengzhou, 450052, Henan, China
- Lifeng Li
  - Medical School, Huanghe Science and Technology University, 666 Zi Jing Shan Road, Zhengzhou, 450000, Henan, China
- Yuhui Wang
  - Prenatal Diagnosis Center, The Third Affiliated Hospital of Zhengzhou University, No. 7, Kangfu Front Street, Erqi District, Zhengzhou, 450052, Henan, China
- Xinwei Han
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
6. Jiang S, Cao J, Rosner B, Colditz GA. Supervised two-dimensional functional principal component analysis with time-to-event outcomes and mammogram imaging data. Biometrics 2023; 79:1359-1369. [PMID: 34854477 PMCID: PMC9160217 DOI: 10.1111/biom.13611] [Received: 03/21/2021] [Revised: 11/07/2021] [Accepted: 11/15/2021] [Indexed: 12/24/2022]
Abstract
Screening mammography aims to identify breast cancer early and secondarily measures breast density to classify women at higher or lower than average risk for future breast cancer in the general population. Despite the strong association of individual mammography features with breast cancer risk, the statistical literature on mammogram imaging data is limited. While functional principal component analysis (FPCA) has been studied in the literature for extracting image-based features, it is conducted independently of the time-to-event response variable. With the aim of building a prognostic model for precision prevention, we present a set of flexible methods, supervised FPCA (sFPCA) and functional partial least squares (FPLS), to extract image-based features associated with the failure time while accommodating the added complication of right censoring. Throughout the article, we hope to demonstrate that one method is favored over the other under different clinical setups. The proposed methods are applied to the motivating data set from the Joanne Knight Breast Health cohort at Siteman Cancer Center. Our approaches not only obtain the best prediction performance compared with the benchmark model, but also reveal different risk patterns within the mammograms.
Affiliation(s)
- Shu Jiang
  - Division of Public Health Sciences, Washington University School of Medicine in St. Louis, Missouri
- Jiguo Cao
  - Department of Statistics and Actuarial Science, Simon Fraser University, Canada
- Bernard Rosner
  - Channing Division of Network Medicine, Harvard Medical School, Massachusetts
- Graham A Colditz
  - Division of Public Health Sciences, Washington University School of Medicine in St. Louis, Missouri
7. Chen C, Wang J, Dong C, Lim D, Feng Z. Development of a risk model to predict prognosis in breast cancer based on cGAS-STING-related genes. Front Genet 2023; 14:1121018. [PMID: 37051596 PMCID: PMC10083333 DOI: 10.3389/fgene.2023.1121018] [Received: 12/10/2022] [Accepted: 03/14/2023] [Indexed: 03/29/2023]
Abstract
Background: Breast cancer (BRCA) is regarded as a lethal and aggressive cancer with increasing morbidity and mortality worldwide. cGAS-STING signaling regulates the crosstalk between tumor cells and immune cells in the tumor microenvironment (TME), emerging as an important DNA-damage mechanism. However, the prognostic value of cGAS-STING-related genes (CSRGs) in breast cancer patients has rarely been investigated. Methods: Our study aimed to construct a risk model to predict the survival and prognosis of breast cancer patients. We obtained 1087 breast cancer samples and 179 normal breast tissue samples from the Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) databases and systematically assessed 35 immune-related differentially expressed genes (DEGs) among the cGAS-STING-related genes. Cox regression was applied for further selection, and 11 prognosis-related DEGs were used to develop a machine learning-based risk assessment and prognostic model. Results: We developed a risk model to predict the prognosis of breast cancer patients, and its performance was effectively validated. Kaplan-Meier analysis revealed that patients with low risk scores had better overall survival (OS). A nomogram integrating the risk score and clinical information was established and showed good validity in predicting the overall survival of breast cancer patients. Significant correlations were observed between the risk score and tumor-infiltrating immune cells, immune checkpoints, and the response to immunotherapy. The cGAS-STING-related gene risk score was also associated with a series of clinical prognostic indicators, such as tumor stage, molecular subtype, tumor recurrence, and drug sensitivity, in breast cancer patients. Conclusion: The cGAS-STING-related gene risk model provides a new, credible risk stratification method to improve the clinical prognostic assessment of breast cancer.
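The Kaplan-Meier comparison of low- and high-risk patients can be sketched as a plain product-limit estimator. Splitting at the median risk score is an assumption, because the abstract does not state the cutoff. The toy cohort and the function names are hypothetical.

```python
from collections import Counter
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    Subjects censored at time t are still counted as at risk at t.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events + censorings at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def split_by_median(scores):
    """Label each patient 'high' if the risk score exceeds the median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical toy cohort: follow-up (months), event flag, model risk score
times = [5, 8, 12, 20, 24, 30, 36, 40]
events = [1, 1, 1, 1, 1, 0, 0, 0]
scores = [2.1, 1.8, 1.5, 0.2, 1.2, 0.3, 0.1, 0.4]

groups = split_by_median(scores)
for g in ("low", "high"):
    idx = [i for i, lab in enumerate(groups) if lab == g]
    print(g, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
```

In practice the two curves would also be compared with a log-rank test. That step is omitted here.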
Affiliation(s)
- Chen Chen
  - Department of Occupational Health and Occupational Medicine, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Junxiao Wang
  - Department of Occupational Health and Occupational Medicine, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Chao Dong
  - Department of Occupational Health and Occupational Medicine, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- David Lim
  - Translational Health Research Institute, School of Health Sciences, Western Sydney University, Campbelltown, NSW, Australia
  - College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
- Zhihui Feng
  - Department of Occupational Health and Occupational Medicine, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Corresponding author.
8. Huang RH, Hong YK, Du H, Ke WQ, Lin BB, Li YL. A machine learning framework develops a DNA replication stress model for predicting clinical outcomes and therapeutic vulnerability in primary prostate cancer. J Transl Med 2023; 21:20. [PMID: 36635710 PMCID: PMC9835390 DOI: 10.1186/s12967-023-03872-7] [Received: 10/08/2022] [Accepted: 01/02/2023] [Indexed: 01/13/2023]
Abstract
Recent studies have identified DNA replication stress as an important feature of advanced prostate cancer (PCa). The identification of biomarkers for DNA replication stress could therefore facilitate risk stratification and help inform treatment options for PCa. Here, we designed a robust machine learning-based framework to comprehensively explore the impact of DNA replication stress on prognosis and treatment in 5 PCa bulk transcriptomic cohorts with a total of 905 patients. Bootstrap resampling-based univariate Cox regression and the Boruta algorithm were applied to select a subset of more clinically relevant DNA replication stress genes. Next, we benchmarked 7 survival-related machine-learning algorithms for PCa recurrence using nested cross-validation. Multi-omic and drug sensitivity data were also utilized to characterize PCa with varying levels of DNA replication stress. We found that the hyperparameter-tuned eXtreme Gradient Boosting model outperformed the other tuned models, and we therefore used it to establish a robust replication stress signature (RSS). RSS demonstrated superior performance over most clinical features and other PCa signatures in predicting PCa recurrence across cohorts. Lower RSS was characterized by enriched metabolism pathways, high androgen activity, and a favorable prognosis. In contrast, higher RSS was significantly associated with TP53, RB1, and PTEN deletion, exhibited increased proliferation and DNA replication stress, and was more immune-suppressive with a higher chance of immunotherapy response. In silico screening identified 13 potential targets (e.g., TOP2A, CDK9, and RRM2) from 2249 druggable targets, and 2 therapeutic agents (irinotecan and topotecan) for RSS-high patients. Additionally, RSS-high patients were more responsive to taxane-based chemotherapy and poly(ADP-ribose) polymerase inhibitors, whereas RSS-low patients were more sensitive to androgen deprivation therapy.
In conclusion, a robust machine-learning framework was used to reveal the great potential of RSS for personalized risk stratification and therapeutic implications in PCa.
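Nested cross-validation keeps hyperparameter tuning away from the data used to score each model. The index bookkeeping can be sketched as below. The fold counts, seeds, and function names are illustrative assumptions, since the abstract does not report them. The folds here are also not stratified by event status, which survival studies often do.

```python
import random


def k_folds(indices, k, seed):
    """Shuffle indices reproducibly and deal them round-robin into k folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def nested_cv_splits(n, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    Hyperparameters are tuned only on inner_splits, which are drawn from
    outer_train; outer_test is used once, to score the tuned model.
    """
    for o, test in enumerate(k_folds(range(n), outer_k, seed)):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        inner = []
        for inner_val in k_folds(train, inner_k, seed + o + 1):
            val_set = set(inner_val)
            inner.append(([i for i in train if i not in val_set], inner_val))
        yield train, test, inner


for train, test, inner in nested_cv_splits(23, outer_k=5, inner_k=4, seed=1):
    print(len(train), len(test), [len(v) for _, v in inner])
```

Each candidate model would be tuned on the inner splits and then scored on its outer test fold. The model fitting itself is left out of this sketch.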
Affiliation(s)
- Rong-Hua Huang
  - Department of Anesthesiology, The First Affiliated Hospital of Jinan University, Guangzhou, 510630, Guangdong, China
- Ying-Kai Hong
  - Department of Urology, The First Affiliated Hospital of Shantou University Medical College, Shantou, 515000, Guangdong, China
- Heng Du
  - Department of Secretion, Baoji Central Hospital, Baoji, 721008, Shaanxi, China
- Wei-Qi Ke
  - Department of Anesthesiology, The First Affiliated Hospital of Shantou University Medical College, Shantou, 515000, Guangdong, China
- Bing-Biao Lin
  - Department of Urology, Kidney and Urology Center, Pelvic Floor Disorders Center, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, 518000, Guangdong, China
- Ya-Lan Li
  - Department of Anesthesiology, The First Affiliated Hospital of Jinan University, Guangzhou, 510630, Guangdong, China
9. Chen M, Landré B, Marques-Vidal P, van Hees VT, van Gennip AC, Bloomberg M, Yerramalla MS, Benadjaoud MA, Sabia S. Identification of physical activity and sedentary behaviour dimensions that predict mortality risk in older adults: Development of a machine learning model in the Whitehall II accelerometer sub-study and external validation in the CoLaus study. EClinicalMedicine 2023; 55:101773. [PMID: 36568684 PMCID: PMC9772789 DOI: 10.1016/j.eclinm.2022.101773] [Received: 09/30/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND Identification of new physical activity (PA) and sedentary behaviour (SB) features relevant for health at older age is important to diversify PA targets in guidelines, as older adults rarely adhere to current recommendations focusing on total duration. We aimed to identify accelerometer-derived dimensions of movement behaviours that predict mortality risk in older populations. METHODS We used data on 21 accelerometer-derived features of daily movement behaviours in 3991 participants of the UK-based Whitehall II accelerometer sub-study (25.8% women, 60-83 years, follow-up: 2012-2013 to 2021, mean = 8.3 years). A machine-learning procedure was used to identify core PA and SB features predicting mortality risk and to derive a composite score. We estimated the added predictive value of the score compared with traditional sociodemographic, behavioural, and health-related risk factors. External validation was conducted in the Switzerland-based CoLaus study (N = 1329, 56.7% women, 60-86 years, follow-up: 2014-2017 to 2021, mean = 3.8 years). FINDINGS In total, 11 features related to overall activity level, intensity distribution, bout duration, frequency, and total duration of PA and SB were identified as predictors of mortality in older adults and included in a composite score. In both the derivation and validation cohorts, the score was associated with mortality (hazard ratio = 1.10 (95% confidence interval = 1.05-1.15) and 1.18 (1.10-1.26), respectively) and improved the predictive value of a model including traditional risk factors (increase in C-index = 0.007 (0.002-0.014) and 0.029 (0.002-0.055), respectively). INTERPRETATION The identified accelerometer-derived PA and SB features, beyond the currently recommended total duration, might be useful for screening older adults at higher mortality risk and for diversifying PA and SB targets in older populations whose adherence to current guidelines is low.
FUNDING National Institute on Aging; UK Medical Research Council; British Heart Foundation; Wellcome Trust; French National Research Agency; GlaxoSmithKline; Lausanne Faculty of Biology and Medicine; Swiss National Science Foundation.
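The C-index gains reported above are measures of concordance between predicted risk and observed survival order. The abstract does not name the estimator, so the sketch below assumes Harrell's C. For simplicity, pairs with tied follow-up times are skipped. The function name and the toy data are illustrative.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell's C: share of comparable pairs whose risk order matches survival order.

    A pair is comparable when the subject with the shorter follow-up time had
    the event. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # orient the pair so that i has the shorter time
        if times[i] == times[j] or not events[i]:
            continue  # tied times or earlier subject censored: not comparable
        comparable += 1
        if risk[i] > risk[j]:
            concordant += 1.0
        elif risk[i] == risk[j]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical toy data: a higher score should mean an earlier death
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.8, 0.3, 0.1]
print(harrell_c_index(times, events, risk))
```

An added-value estimate such as the one reported is the difference between two such indices computed on the same patients. One index comes from the traditional-risk-factor model and the other from the model that also includes the score.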
Affiliation(s)
- Mathilde Chen
  - Université Paris Cité, Inserm U1153, CRESS, Epidemiology of Ageing and Neurodegenerative Diseases, 10 Avenue de Verdun, 75010, Paris, France
  - Corresponding author.
- Benjamin Landré
  - Université Paris Cité, Inserm U1153, CRESS, Epidemiology of Ageing and Neurodegenerative Diseases, 10 Avenue de Verdun, 75010, Paris, France
- Pedro Marques-Vidal
  - Department of Medicine, Internal Medicine, Lausanne University Hospital and University of Lausanne, Switzerland
- April C.E. van Gennip
  - Department of Internal Medicine, Maastricht University Medical Centre, the Netherlands
  - School for Cardiovascular Diseases CARIM, Maastricht University, the Netherlands
- Mikaela Bloomberg
  - Department of Epidemiology and Public Health, University College London, UK
- Manasa S. Yerramalla
  - Université Paris Cité, Inserm U1153, CRESS, Epidemiology of Ageing and Neurodegenerative Diseases, 10 Avenue de Verdun, 75010, Paris, France
- Séverine Sabia
  - Université Paris Cité, Inserm U1153, CRESS, Epidemiology of Ageing and Neurodegenerative Diseases, 10 Avenue de Verdun, 75010, Paris, France
  - Department of Epidemiology and Public Health, University College London, UK
10. Liu Z, Chai T, Tang J, Yu W. Heterogeneous selective ensemble learning model for mill load parameters forecasting by using multiscale mechanical frequency spectrum. Soft Comput 2022. [DOI: 10.1007/s00500-022-07449-2] [Indexed: 11/25/2022]
11. Devaux A, Genuer R, Peres K, Proust-Lima C. Individual dynamic prediction of clinical endpoint from large dimensional longitudinal biomarker history: a landmark approach. BMC Med Res Methodol 2022; 22:188. [PMID: 35818025 PMCID: PMC9275051 DOI: 10.1186/s12874-022-01660-3] [Received: 02/09/2021] [Accepted: 06/15/2022] [Indexed: 11/16/2022]
Abstract
Background The individual data collected throughout patient follow-up constitute crucial information for assessing the risk of a clinical event, and eventually for adapting a therapeutic strategy. Joint models and landmark models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they do not easily extend to the case where the patient history includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that can exploit repeated measures of a possibly large number of markers. Methods We combined a landmark approach extended to endogenous marker histories with machine learning methods adapted to survival data. Each marker trajectory is modeled using the information collected up to the landmark time, and summary variables that best capture the individual trajectories are derived. These summaries and additional covariates are then included in different prediction methods adapted to survival data, namely regularized regressions and random survival forests, to predict the event from the landmark time. We also show how predictive tools can be combined into a superlearner. Performance is evaluated by cross-validation using estimators of the Brier score and of the area under the Receiver Operating Characteristic curve adapted to censored data. Results We demonstrate in a simulation study the benefits of machine learning survival methods over standard survival models, especially in the case of numerous and/or nonlinear relationships between the predictors and the event. We then applied the methodology in two prediction contexts: a clinical context with the prediction of death in primary biliary cholangitis, and a public health context with age-specific prediction of death in the general elderly population.
Conclusions Our methodology, implemented in R, enables the prediction of an event using the entire longitudinal patient history, even when the number of repeated markers is large. Although introduced with mixed models for the repeated markers and methods for a single right-censored time-to-event outcome, the technique can be used with any other appropriate modeling technique for the markers and can easily be extended to the competing risks setting. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01660-3.
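The abstract evaluates predictions with a Brier score adapted to censored data. One widely used estimator of this kind is the inverse-probability-of-censoring-weighted (IPCW) Brier score; the sketch below assumes that estimator (the authors' exact estimator may differ), estimates the censoring distribution by Kaplan-Meier, and treats `surv_pred[i]` as a model's predicted probability that subject i is still event-free at the horizon `t`. In the landmark setting, `times` would be counted from the landmark time; the landmarking step itself is not shown.

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(u) = P(C > u).

    Censorings (event == 0) play the role of the 'event'; the risk set at u is
    everyone with an observed time >= u. Returns a right-continuous step function;
    left_limit=True gives G(u-), i.e. it ignores a drop located exactly at u.
    """
    steps = []  # (censoring time, value of G just after that time)
    g = 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= u)
        n_cens = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - n_cens / at_risk
        steps.append((u, g))

    def G(u, left_limit=False):
        value = 1.0
        for s, v in steps:
            if s < u or (s == u and not left_limit):
                value = v
            else:
                break
        return value

    return G


def ipcw_brier(t, times, events, surv_pred):
    """IPCW Brier score at horizon t (assumes the censoring survival stays > 0).

    Subjects censored at or before t contribute 0; the inverse-probability
    weights of the remaining subjects compensate for them.
    """
    G = censoring_km(times, events)
    total = 0.0
    for ti, ei, s in zip(times, events, surv_pred):
        if ti <= t and ei == 1:
            # Event observed by t: the ideal predicted survival was 0.
            total += s ** 2 / G(ti, left_limit=True)
        elif ti > t:
            # Still event-free at t: the ideal predicted survival was 1.
            total += (1.0 - s) ** 2 / G(t)
    return total / len(times)


times = [1, 2, 3, 4]
events = [1, 0, 1, 1]  # the second subject is censored at time 2
surv_at_2_5 = [0.2, 0.4, 0.6, 0.8]
print(ipcw_brier(2.5, times, events, surv_at_2_5))
```

With these toy inputs the score is approximately 0.085; without censoring all weights equal 1 and the estimator reduces to the ordinary mean squared error of the predicted survival probabilities.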
Affiliation(s)
- Robin Genuer
- INSERM, BPH, U1219, Univ. Bordeaux, Bordeaux, France; INRIA Bordeaux Sud-Ouest, Talence, France
- Karine Peres
- INSERM, BPH, U1219, Univ. Bordeaux, Bordeaux, France
12
Li R, Zhu J, Zhong W, Jia Z. Comprehensive evaluation of machine learning models and gene expression signatures for prostate cancer prognosis using large population cohorts. Cancer Res 2022; 82:1832-1843. [PMID: 35358302 DOI: 10.1158/0008-5472.can-21-3074] [Received: 09/11/2021] [Revised: 01/07/2022] [Accepted: 03/07/2022] [Indexed: 11/16/2022]
Abstract
Overtreatment remains a pervasive problem in prostate cancer (PCa) management because of the highly variable and often indolent course of the disease. Molecular signatures derived from gene expression profiling have played critical roles in guiding PCa treatment decisions. Many gene expression signatures have been developed to improve the risk stratification of PCa, and some of them have already been applied in clinical practice. However, no comprehensive evaluation has been performed to compare the performance of these signatures. In this study, we conducted a systematic and unbiased evaluation of 15 machine learning (ML) algorithms and 30 published PCa gene expression-based prognostic signatures, leveraging 10 transcriptomics datasets with 1,558 primary PCa patients from public data repositories. This analysis revealed that survival analysis models outperformed binary classification models for risk assessment, and that two survival analysis methods, the Cox model regularized with a ridge penalty (Cox-Ridge) and partial least squares regression for the Cox model (Cox-PLS), were generally more robust than the other methods. Based on the Cox-Ridge algorithm, several top prognostic signatures displayed comparable or even better performance than commercial panels. These findings will facilitate the identification of existing prognostic signatures that are promising for further validation in prospective studies and promote the development of robust prognostic models to guide clinical decision-making. Moreover, this study provides a valuable data resource from large primary PCa cohorts, which can be used to develop, validate, and evaluate novel statistical methodologies and molecular signatures to improve PCa management.
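Cox-Ridge, as named above, maximizes the Cox log partial likelihood minus a ridge (L2) penalty, (lam / 2) times the squared norm of the coefficients. The sketch below is a minimal illustration under stated assumptions: no tied event times, and plain gradient ascent with a fixed step size (`step`, `n_iter`, and `lam` are hypothetical settings). It is not the software used in the study, which the abstract does not specify.

```python
import math


def linear_predictors(beta, X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def cox_ridge_objective(beta, X, times, events, lam):
    """Ridge-penalized Cox log partial likelihood: l(beta) - (lam / 2) * ||beta||^2."""
    eta = linear_predictors(beta, X)
    loglik = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if ei == 1:
            # Risk set: everyone still under observation at the event time ti.
            denom = sum(math.exp(eta[j]) for j, tj in enumerate(times) if tj >= ti)
            loglik += eta[i] - math.log(denom)
    return loglik - 0.5 * lam * sum(b * b for b in beta)


def cox_ridge_gradient(beta, X, times, events, lam):
    """Gradient: sum over events of (x_i - risk-set weighted mean of x) - lam * beta."""
    eta = linear_predictors(beta, X)
    grad = [-lam * b for b in beta]
    for i, (ti, ei) in enumerate(zip(times, events)):
        if ei == 1:
            risk = [j for j, tj in enumerate(times) if tj >= ti]
            w = [math.exp(eta[j]) for j in risk]
            total = sum(w)
            for k in range(len(beta)):
                weighted_mean = sum(wj * X[j][k] for wj, j in zip(w, risk)) / total
                grad[k] += X[i][k] - weighted_mean
    return grad


def fit_cox_ridge(X, times, events, lam, step=0.05, n_iter=2000):
    """Fixed-step gradient ascent; the penalized objective is strictly concave."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = cox_ridge_gradient(beta, X, times, events, lam)
        beta = [b + step * gk for b, gk in zip(beta, g)]
    return beta


X = [[1.0], [0.8], [0.5], [0.1], [-0.3], [-1.0]]  # one toy covariate
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 0]
for lam in (0.5, 5.0):
    print(lam, fit_cox_ridge(X, times, events, lam))
```

In this toy data higher covariate values precede earlier events, so the fitted coefficient is positive, and raising `lam` shrinks it toward zero. Without the penalty this perfectly ordered data would have no finite maximum, which is one practical reason ridge regularization helps in high-dimensional expression data.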
Affiliation(s)
- Ruidong Li
- University of California, Riverside, Riverside, United States
- Jianguo Zhu
- Guizhou Provincial People's Hospital, Guiyang, Guizhou, China
- Weide Zhong
- Guangzhou First People's Hospital, School of Medicine, South China University of Technology, Guangzhou, China
- Zhenyu Jia
- University of California, Riverside, Riverside, California, United States
13
Bertrand F, Maumy-Bertrand M. Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models. Front Big Data 2021; 4:684794. [PMID: 34790895 PMCID: PMC8591675 DOI: 10.3389/fdata.2021.684794] [Received: 03/24/2021] [Accepted: 10/07/2021] [Indexed: 11/22/2022]
Abstract
Fitting Cox models in a big data context (on a massive scale in terms of volume, intensity, and complexity, exceeding the capacity of usual analytic tools) is often challenging, and even more so when some data are missing. We proposed algorithms that fit Cox models in high-dimensional settings using extensions of partial least squares regression to the Cox model, some of which can cope with missing data. We recently extended our most recent algorithms to big data, which allows Cox models to be fitted to big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial log-likelihood, computed with either a naive scheme or a van Houwelingen scheme, the latter making efficient use of the death times of the left-out data in relation to the death times of all the data. Surprisingly, we show, using an extensive simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, whether straightforward or more involved, of partial least squares regression to the Cox model. This result is interesting for at least two reasons. First, several attractive features of PLS-based models, including regularization, interpretability of the components, support for missing data, data visualization through biplots of individuals and variables, and even parsimony or group parsimony for sparse PLS (SPLS) or sparse group SPLS based models, explain why these extensions are commonly used by statisticians, who usually select their hyperparameters by cross-validation. Second, these extensions are almost always featured in benchmarking studies that assess the performance of a new estimation technique in a high-dimensional or big data context, and they often show poor statistical properties there. We carried out a large simulation study to evaluate more than a dozen potential cross-validation criteria, based either on AUC or on prediction error.
Several of them lead to the selection of a reasonable number of components. Using these newly identified cross-validation criteria to fit extensions of partial least squares regression to the Cox model, we performed a benchmark reanalysis that showed improved performance of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R package used in this article is available on CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on CRAN and, until then, is available on GitHub at https://github.com/fbertran/bigPLS.
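The two cross-validation criteria the abstract contrasts can be sketched for a single covariate. For each fold k, a coefficient is fitted on the training data; the naive criterion then sums each left-out fold's own partial log-likelihood (risk sets built from that fold only), while the van Houwelingen criterion sums l(beta_-k) - l_-k(beta_-k), the full-data minus the training-data partial log-likelihood at the training estimate. Assumptions: one covariate, no tied event times, and a small ridge penalty `lam` (hypothetical, added only so every fold has a finite fit). This is not the plsRcox or bigPLS implementation and fits no PLS components.

```python
import math


def log_partial_lik(beta, x, times, events):
    """Cox log partial likelihood for one covariate (distinct event times assumed)."""
    ll = 0.0
    for xi, ti, ei in zip(x, times, events):
        if ei == 1:
            denom = sum(math.exp(beta * xj) for xj, tj in zip(x, times) if tj >= ti)
            ll += beta * xi - math.log(denom)
    return ll


def penalized_score(beta, x, times, events, lam):
    """Derivative in beta of log_partial_lik minus (lam / 2) * beta**2."""
    u = -lam * beta
    for xi, ti, ei in zip(x, times, events):
        if ei == 1:
            risk = [(math.exp(beta * xj), xj) for xj, tj in zip(x, times) if tj >= ti]
            u += xi - sum(w * xj for w, xj in risk) / sum(w for w, _ in risk)
    return u


def fit_beta(x, times, events, lam=0.1, lo=-20.0, hi=20.0):
    """The penalized score is strictly decreasing in beta, so bisect for its root."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if penalized_score(mid, x, times, events, lam) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cv_partial_lik(x, times, events, folds, lam=0.1):
    """Return (naive, van Houwelingen) cross-validated partial log-likelihoods."""

    def subset(idx):
        return [x[i] for i in idx], [times[i] for i in idx], [events[i] for i in idx]

    naive = vh = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        beta = fit_beta(*subset(train_idx), lam=lam)
        # Naive: the left-out fold scored on its own, with its own small risk sets.
        naive += log_partial_lik(beta, *subset(test_idx))
        # van Houwelingen: full-data minus training-data partial log-likelihood.
        vh += log_partial_lik(beta, x, times, events) - log_partial_lik(beta, *subset(train_idx))
    return naive, vh


x = [1.2, 0.4, -0.5, 0.9, -1.1, 0.1]
times = [2, 5, 7, 3, 9, 6]
events = [1, 1, 0, 1, 0, 1]
print(cv_partial_lik(x, times, events, folds=[[0, 1], [2, 3], [4, 5]]))
```

The van Houwelingen form evaluates each left-out event against the full risk set, which is why it uses the death times more efficiently than the naive form when folds are small.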
Affiliation(s)
- Frédéric Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
- Myriam Maumy-Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
14
Spirko-Burns L, Devarajan K. Supervised Dimension Reduction for Large-Scale "Omics" Data With Censored Survival Outcomes Under Possible Non-Proportional Hazards. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2032-2044. [PMID: 31940547 DOI: 10.1109/tcbb.2020.2965934] [Indexed: 06/10/2023]
Abstract
The past two decades have witnessed significant advances in high-throughput "omics" technologies such as genomics, proteomics, metabolomics, transcriptomics, and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive "omics" profile. Large-scale studies often suffer from a large fraction of censored observations and from potentially time-varying effects of features, and methods for handling these issues have been lacking. In this paper, we propose supervised methods for feature selection and survival prediction that deal with both issues simultaneously. Our approach uses continuum power regression (CPR), a framework that includes a variety of regression methods, in conjunction with the parametric or semi-parametric accelerated failure time (AFT) model. Both CPR and AFT fall within the linear models framework and, unlike black-box models, the proposed prognostic index has a simple yet useful interpretation. We demonstrate the utility of our methods using simulated and publicly available cancer genomics data.
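The interpretability claim for an AFT prognostic index can be illustrated with a parametric AFT model, log T = mu + x'beta + sigma * eps. The sketch assumes a log-normal error distribution (one parametric choice among several) and hypothetical values for `beta`, `mu`, and `sigma`; it covers only the AFT side and does not implement continuum power regression. Under this sign convention a larger index x'beta means longer predicted survival, and exp(beta_k) is the time ratio for a one-unit increase in feature k.

```python
import math
from statistics import NormalDist


def prognostic_index(x, beta):
    """Linear predictor x'beta; here larger values mean longer predicted survival."""
    return sum(b * xi for b, xi in zip(beta, x))


def lognormal_aft_survival(t, x, beta, mu, sigma):
    """S(t | x) = 1 - Phi((log t - mu - x'beta) / sigma) for log T = mu + x'beta + sigma * eps."""
    z = (math.log(t) - mu - prognostic_index(x, beta)) / sigma
    return 1.0 - NormalDist().cdf(z)


def median_survival(x, beta, mu):
    """The median of T is exp(mu + x'beta) because the standard normal median is 0."""
    return math.exp(mu + prognostic_index(x, beta))


beta = [0.5, -0.2]    # hypothetical coefficients
mu, sigma = 2.0, 0.8  # hypothetical intercept and scale
baseline, exposed = [0.0, 0.0], [1.0, 0.0]
# Time ratio for a one-unit increase in the first feature equals exp(0.5).
print(median_survival(exposed, beta, mu) / median_survival(baseline, beta, mu))
print(lognormal_aft_survival(10.0, exposed, beta, mu, sigma))
```

Because covariates act multiplicatively on time rather than on the hazard, this reading of the index does not rely on proportional hazards, which fits the non-proportional-hazards motivation of the abstract.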
15
Vahabi N, McDonough CW, Desai AA, Cavallari LH, Duarte JD, Michailidis G. Cox-sMBPLS: An Algorithm for Disease Survival Prediction and Multi-Omics Module Discovery Incorporating Cis-Regulatory Quantitative Effects. Front Genet 2021; 12:701405. [PMID: 34408773 PMCID: PMC8366414 DOI: 10.3389/fgene.2021.701405] [Received: 04/27/2021] [Accepted: 07/07/2021] [Indexed: 12/03/2022]
Abstract
Background The development of high-throughput techniques has enabled profiling a large number of biomolecules across a number of molecular compartments. The challenge then becomes to integrate such multimodal Omics data to gain insights into biological processes and disease onset and progression mechanisms. Further, given the high dimensionality of such data, incorporating prior biological information on interactions between molecular compartments when developing statistical models for data integration is beneficial, especially in settings involving a small number of samples. Results We develop a supervised model for time-to-event data (e.g., death, biochemical recurrence) that simultaneously accounts for redundant information within Omics profiles and leverages prior biological associations between them through a multi-block PLS framework. The interactions between data from different molecular compartments (e.g., epigenome, transcriptome, methylome) were captured by using cis-regulatory quantitative effects in the proposed model. The model, coined Cox-sMBPLS, exhibits superior prediction performance and improved feature selection based on both simulation studies and analysis of data from heart failure patients. Conclusion The proposed supervised Cox-sMBPLS model can effectively incorporate prior biological information in the survival prediction system, leading to improved prediction performance and feature selection. It also enables the identification of multi-Omics modules of biomolecules that impact the patients’ survival probability and provides insights into potentially relevant risk factors that merit further investigation.
Affiliation(s)
- Nasim Vahabi
- Informatics Institute, University of Florida, Gainesville, FL, United States
- Caitrin W McDonough
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- Ankit A Desai
- Department of Medicine, Indiana University, Indianapolis, IN, United States
- Larisa H Cavallari
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- Julio D Duarte
- Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- George Michailidis
- Informatics Institute, University of Florida, Gainesville, FL, United States
16
Bertrand F, Aouadi I, Jung N, Carapito R, Vallat L, Bahram S, Maumy-Bertrand M. selectBoost: a general algorithm to enhance the performance of variable selection methods. Bioinformatics 2021; 37:659-668. [PMID: 33016991 PMCID: PMC8097688 DOI: 10.1093/bioinformatics/btaa855] [Received: 07/19/2019] [Revised: 09/02/2020] [Accepted: 09/21/2020] [Indexed: 11/13/2022]
Abstract
Motivation With the growth of big data, variable selection has become one of the critical challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall (sensitivity) and precision (positive predictive value) is limited when the number of variables far exceeds the number of observations or when the variables are highly correlated. Results In this article, we propose a general algorithm that improves the precision of any existing variable selection method. This algorithm is based on intensive simulations and takes into account the correlation structure of the data. Our algorithm can either produce a confidence index for variable selection or be used in an experimental design planning perspective. We demonstrate the performance of our algorithm on both simulated and real data. We then apply it in two different ways to improve biological network reverse-engineering. Availability and implementation Code is available as the SelectBoost package on CRAN, https://cran.r-project.org/package=SelectBoost. Some network reverse-engineering functionalities are available in the Patterns CRAN package, https://cran.r-project.org/package=Patterns. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Frédéric Bertrand
- Institut de Recherche Mathématique Avancée, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France; Université de Technologie de Troyes, ICD, ROSAS, M2S, Troyes, France
- Ismaïl Aouadi
- ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France; Laboratoire International Associé (LIA) INSERM, Strasbourg (France) - Nagano (Japan), Strasbourg, France; Fédération Hospitalo-Universitaire (FHU) OMICARE, Laboratoire Central d'Immunologie, Pôle de Biologie, Nouvel Hôpital Civil, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Nicolas Jung
- Institut de Recherche Mathématique Avancée, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France; ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France
- Raphael Carapito
- ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France; Laboratoire International Associé (LIA) INSERM, Strasbourg (France) - Nagano (Japan), Strasbourg, France; Fédération Hospitalo-Universitaire (FHU) OMICARE, Laboratoire Central d'Immunologie, Pôle de Biologie, Nouvel Hôpital Civil, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Laurent Vallat
- ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France; Fédération Hospitalo-Universitaire (FHU) OMICARE, Laboratoire Central d'Immunologie, Pôle de Biologie, Nouvel Hôpital Civil, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Seiamak Bahram
- ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France; Laboratoire International Associé (LIA) INSERM, Strasbourg (France) - Nagano (Japan), Strasbourg, France; Fédération Hospitalo-Universitaire (FHU) OMICARE, Laboratoire Central d'Immunologie, Pôle de Biologie, Nouvel Hôpital Civil, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Myriam Maumy-Bertrand
- Institut de Recherche Mathématique Avancée, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
17
So J, Mamatjan Y, Zadeh G, Aldape K, Moraes FY. Transcription factor networks of oligodendrogliomas treated with adjuvant radiotherapy or observation inform prognosis. Neuro Oncol 2021; 23:795-802. [PMID: 33367753 DOI: 10.1093/neuonc/noaa300] [Indexed: 12/24/2022]
Abstract
BACKGROUND Recent international sequencing efforts have allowed for the molecular taxonomy of lower-grade gliomas (LGG). We sought to analyze The Cancer Genome Atlas (TCGA, 2015) gene expression datasets of patients with molecularly defined oligodendroglioma (IDH-mutated and 1p/19q-codeleted) who were treated with adjuvant radiation or observed, in order to discover prognostic markers and pathways. METHODS mRNA expression and clinical information of patients with oligodendroglioma were taken from the TCGA "Brain Lower Grade Glioma" provisional dataset. Transcription factor network reconstruction and analysis were performed using the R packages "RTN" and "RTNsurvival." Elastic net regularization and survival modeling were performed using the "biospear," "plsRcox," and "survival" packages. RESULTS Of the 137 patients in our cohort, 65 received adjuvant radiation and 72 were observed. In the cohort that received adjuvant radiotherapy, a transcription factor activity signature that correlated with hypoxia was associated with shorter disease-free survival (DFS) (median = 45 months vs 108 months, P < .001). This increased risk was not seen in patients who were observed (P = .2). Within the observation cohort, a transcription factor activity signature was generated that was associated with poor DFS (median = 72 months vs 143 months, P < .01). CONCLUSIONS We identified a transcription factor activity signature associated with poor prognosis in patients with molecular oligodendroglioma treated with adjuvant radiotherapy. These patients would be potential candidates for treatment intensification. A second signature was generated for patients who were more likely to progress on observation. This potentially identifies a cohort who would benefit from upfront adjuvant radiotherapy.
Affiliation(s)
- Jonathan So
- Radiation Medicine Program, Princess Margaret Cancer Center, University of Toronto, Toronto, Ontario, Canada
- Yasin Mamatjan
- Department of Laboratory Medicine and Pathobiology, Princess Margaret Cancer Centre, University of Toronto and MacFeeters-Hamilton Centre for Neuro-Oncology Research, Toronto, Ontario, Canada
- Gelareh Zadeh
- Department of Laboratory Medicine and Pathobiology, Princess Margaret Cancer Centre, University of Toronto and MacFeeters-Hamilton Centre for Neuro-Oncology Research, Toronto, Ontario, Canada; Department of Neurosurgery, Toronto Western Hospital, University of Toronto, Toronto, Ontario, Canada
- Kenneth Aldape
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland
- Fabio Y Moraes
- Department of Radiation Oncology, Kingston General Hospital, Queen's University, Kingston, Ontario, Canada
18
de Leeuw FA, van der Flier WM, Tijms BM, Scheltens P, Mendes VM, Manadas B, Bierau J, van Wijk N, van den Heuvel EG, Mohajeri MH, Teunissen CE, Kester MI. Specific Nutritional Biomarker Profiles in Mild Cognitive Impairment and Subjective Cognitive Decline Are Associated With Clinical Progression: The NUDAD Project. J Am Med Dir Assoc 2020; 21:1513.e1-1513.e17. [DOI: 10.1016/j.jamda.2019.12.009] [Received: 04/03/2019] [Revised: 09/06/2019] [Accepted: 12/10/2019] [Indexed: 02/06/2023]
19
Ajana S, Acar N, Bretillon L, Hejblum BP, Jacqmin-Gadda H, Delcourt C, Acar N, Ajana S, Berdeaux O, Bouton S, Bretillon L, Bron A, Buaud B, Cabaret S, Cougnard-Grégoire A, Creuzot-Garcher C, Delcourt C, Delyfer MN, Féart-Couret C, Febvret V, Grégoire S, He Z, Korobelnik JF, Martine L, Merle B, Vaysse C. Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data: a case study in low sample size. Bioinformatics 2019; 35:3628-3634. [DOI: 10.1093/bioinformatics/btz135] [Received: 03/14/2018] [Revised: 02/08/2019] [Accepted: 02/23/2019] [Indexed: 01/10/2023]
Abstract
Motivation
In some prediction analyses, predictors have a natural grouping structure and selecting predictors accounting for this additional information could be more effective for predicting the outcome accurately. Moreover, in a high dimension low sample size framework, obtaining a good predictive model becomes very challenging. The objective of this work was to investigate the benefits of dimension reduction in penalized regression methods, in terms of prediction performance and variable selection consistency, in high dimension low sample size data. Using two real datasets, we compared the performances of lasso, elastic net, group lasso, sparse group lasso, sparse partial least squares (PLS), group PLS and sparse group PLS.
Results
Incorporating dimension reduction into penalized regression methods improved prediction accuracy. Sparse group PLS reached the lowest prediction error while consistently selecting a few predictors from a single group.
Availability and implementation
R codes for the prediction methods are freely available at https://github.com/SoufianeAjana/Blisar.
Supplementary information
Supplementary data are available at Bioinformatics online.
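The penalized methods compared in this study differ mainly in the penalty added to the loss. The sketch below evaluates those penalty terms in common parameterizations, assuming the glmnet-style elastic net mix and a group lasso in which each group's L2 norm is weighted by the square root of its size; the PLS-based variants are not covered, and the parameterizations in the authors' own code may differ. `groups` is a hypothetical list of index lists describing the grouping structure.

```python
import math


def lasso_penalty(beta, lam):
    """L1 penalty: lam * sum |beta_j|."""
    return lam * sum(abs(b) for b in beta)


def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style mix: alpha = 1 gives the lasso, alpha = 0 gives ridge."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def group_lasso_penalty(beta, groups, lam):
    """Sum over groups of sqrt(group size) times the group's L2 norm."""
    return lam * sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g)) for g in groups
    )


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Convex mix of the group penalty (removes whole groups) and the L1 penalty."""
    return (1.0 - alpha) * group_lasso_penalty(beta, groups, lam) + alpha * lasso_penalty(beta, lam)


beta = [3.0, 4.0, 0.0, 0.0]  # the second group is entirely zero
groups = [[0, 1], [2, 3]]
print(lasso_penalty(beta, 1.0))
print(elastic_net_penalty(beta, 1.0, 0.5))
print(group_lasso_penalty(beta, groups, 1.0))
print(sparse_group_lasso_penalty(beta, groups, 1.0, 0.5))
```

A group whose coefficients are all zero adds nothing to the group term, which is how group-aware penalties can drop entire groups of predictors, while the added L1 term in the sparse group version also allows selection within a retained group.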
Affiliation(s)
- Soufiane Ajana
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Niyazi Acar
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Lionel Bretillon
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Boris P Hejblum
- ISPED, Inserm, Bordeaux Population Health Research Center 1219, Inria SISTM, University of Bordeaux, F-33000 Bordeaux, France
- Vaccine Research Institute (VRI), Hôpital Henri Mondor, Créteil, France
- Hélène Jacqmin-Gadda
- Inserm, Bordeaux Population Health Research Center, Team Biostatistics, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Cécile Delcourt
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Niyazi Acar
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Soufiane Ajana
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Olivier Berdeaux
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Lionel Bretillon
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Alain Bron
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Department of Ophthalmology, University Hospital, Dijon, France
- Benjamin Buaud
- ITERG—Equipe Nutrition Métabolisme & Santé, Bordeaux, France
- Stéphanie Cabaret
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Audrey Cougnard-Grégoire
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Catherine Creuzot-Garcher
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Department of Ophthalmology, University Hospital, Dijon, France
- Cécile Delcourt
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Marie-Noelle Delyfer
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Service d’Ophtalmologie, CHU de Bordeaux, F-33000 Bordeaux, France
- Catherine Féart-Couret
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Valérie Febvret
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Stéphane Grégoire
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Zhiguo He
- Laboratory for Biology, Imaging, and Engineering of Corneal Grafts, EA2521, Faculty of Medicine, University Jean Monnet, Saint-Etienne, France
- Jean-François Korobelnik
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Service d’Ophtalmologie, CHU de Bordeaux, F-33000 Bordeaux, France
- Lucy Martine
- Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France
- Bénédicte Merle
- Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France
- Carole Vaysse
- ITERG—Equipe Nutrition Métabolisme & Santé, Bordeaux, France
20
Wei W, Sun Z, da Silveira WA, Yu Z, Lawson A, Hardiman G, Kelemen LE, Chung D. Semi-supervised identification of cancer subgroups using survival outcomes and overlapping grouping information. Stat Methods Med Res 2018; 28:2137-2149. [PMID: 29336210 DOI: 10.1177/0962280217752980] [Indexed: 01/01/2023]
Abstract
Identification of cancer patient subgroups using high-throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. In spite of tremendous efforts, this problem remains challenging because of the low reproducibility and instability of identified cancer subgroups and molecular features. To address this challenge, we developed Integrative Genomics Robust iDentification of cancer subgroups (InGRiD), a statistical approach that integrates information from biological pathway databases with high-throughput genomic data to improve the robustness of identification and interpretation of molecularly defined subgroups of cancer patients. We applied InGRiD to the gene expression data of high-grade serous ovarian cancer from The Cancer Genome Atlas and the Australian Ovarian Cancer Study. The results indicate clear benefits of the pathway-level approaches over the gene-level approaches. In addition, using the proposed InGRiD framework, we also investigate and address the issue of gene sharing among pathways, which often occurs in practice, to further facilitate biological interpretation of key molecular features associated with cancer progression. The R package "InGRiD" implementing the proposed approach is currently available on our research group's GitHub webpage (https://dongjunchung.github.io/INGRID/).
Affiliation(s)
- Wei Wei
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA; Department of Biostatistics, Yale University, New Haven, USA
- Zequn Sun
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Willian A da Silveira
- Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, USA; Center for Genomic Medicine, Medical University of South Carolina, Charleston, USA
- Zhenning Yu
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Andrew Lawson
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Gary Hardiman
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA; Center for Genomic Medicine, Medical University of South Carolina, Charleston, USA; Department of Medicine, Medical University of South Carolina, Charleston, USA
- Linda E Kelemen
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
- Dongjun Chung
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, USA
21
Tang J, Zhang J, Wu Z, Liu Z, Chai T, Yu W. Modeling collinear data using double-layer GA-based selective ensemble kernel partial least squares algorithm. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.019] [Indexed: 11/26/2022]
22
Wang D, Gu J. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quantitative Biology 2016. [DOI: 10.1007/s40484-016-0063-4] [Indexed: 10/22/2022]
23
Žuvela P, Jay Liu J. On feature selection for supervised learning problems involving high-dimensional analytical information. RSC Adv 2016. [DOI: 10.1039/c6ra09336a] [Indexed: 02/02/2023]
Abstract
Feature selection for supervised learning problems involving analytical information.
Affiliation(s)
- P. Žuvela
- Department of Chemical Engineering, Pukyong National University, Busan, Korea
- J. Jay Liu
- Department of Chemical Engineering, Pukyong National University, Busan, Korea
24
Lehtinen S, Lees J, Bähler J, Shawe-Taylor J, Orengo C. Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression. PLoS One 2015; 10:e0134668. [PMID: 26288239 PMCID: PMC4545790 DOI: 10.1371/journal.pone.0134668] [Received: 04/27/2015] [Accepted: 07/13/2015] [Indexed: 11/18/2022]
Abstract
With the growing availability of large-scale biological datasets, automated methods of extracting functionally meaningful information from this data are becoming increasingly important. Data relating to functional association between genes or proteins, such as co-expression or functional association, is often represented in terms of gene or protein networks. Several methods of predicting gene function from these networks have been proposed. However, evaluating the relative performance of these algorithms may not be trivial: concerns have been raised over biases in different benchmarking methods and datasets, particularly relating to non-independence of functional association data and test data. In this paper we propose a new network-based gene function prediction algorithm using a commute-time kernel and partial least squares regression (Compass). We compare Compass to GeneMANIA, a leading network-based prediction algorithm, using a number of different benchmarks, and find that Compass outperforms GeneMANIA on these benchmarks. We also explicitly explore problems associated with the non-independence of functional association data and test data. We find that a benchmark based on the Gene Ontology database, which, directly or indirectly, incorporates information from other databases, may considerably overestimate the performance of algorithms exploiting functional association data for prediction.
Affiliation(s)
- Sonja Lehtinen
- CoMPLEX, University College London, London, United Kingdom
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jon Lees
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jürg Bähler
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- John Shawe-Taylor
- Department of Computer Science, University College London, London, United Kingdom
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom