1. Yu Z, Ouyang L. Identification of key prognostic genes in ovarian cancer using WGCNA and LASSO analysis. All Life 2022. [DOI: 10.1080/26895293.2022.2087107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] [Open Access]
Affiliation(s)
- Zhong Yu
  - Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
  - Key Laboratory of Obstetrics and Gynecology of Higher Education of Liaoning Province, Shenyang, People’s Republic of China
- Ling Ouyang
  - Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
  - Key Laboratory of Obstetrics and Gynecology of Higher Education of Liaoning Province, Shenyang, People’s Republic of China
2. Wang T, Chen R, Liu W, Yu M. Structure-preserving integrated analysis for risk stratification with application to cancer staging. Biostatistics 2021; 23:990-1006. [PMID: 33738474 DOI: 10.1093/biostatistics/kxab005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2020] [Revised: 01/21/2021] [Accepted: 01/25/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
To provide an appropriate and practical level of health care, it is critical to group patients into relatively few strata that have distinct prognoses. Such grouping or stratification is typically based on well-established risk factors and clinical outcomes. A well-known example is the American Joint Committee on Cancer staging for cancer that uses tumor size, node involvement, and metastasis status. We consider a statistical method for such grouping based on individual patient data from multiple studies. The method encourages a common grouping structure as a basis for borrowing information, but acknowledges data heterogeneity including unbalanced data structures across multiple studies. We build on the "lasso-tree" method that is more versatile than the well-known classification and regression tree method in generating possible grouping patterns. In addition, the parametrization of the lasso-tree method makes it very natural to incorporate the underlying order information in the risk factors. In this article, we also strengthen the lasso-tree method by establishing its theoretical properties, which Lin and others (2013. Lasso tree for cancer staging with survival data. Biostatistics 14, 327-339) did not pursue. We evaluate our method in extensive simulation studies and an analysis of multiple breast cancer data sets.
Affiliation(s)
- Tianjie Wang
  - Department of Statistics, University of Wisconsin, Madison, WI, USA
- Rui Chen
  - Department of Statistics, University of Wisconsin, Madison, WI, USA
- Wenshuo Liu
  - Department of Research & Innovation, Interactions LLC, 31 Hayward Street Suite E, Franklin, MA 02038, USA
- Menggang Yu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
3. Goncalves A, Soper B, Nygård M, Nygård JF, Ray P, Widemann D, Sales AP. Improving five-year survival prediction via multitask learning across HPV-related cancers. PLoS One 2020; 15:e0241225. [PMID: 33196642 PMCID: PMC7668590 DOI: 10.1371/journal.pone.0241225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/26/2020] [Accepted: 10/11/2020] [Indexed: 12/12/2022] [Open Access]
Abstract
Oncology is a highly siloed field of research in which sub-disciplinary specialization has limited the amount of information shared between researchers of distinct cancer types. This can be attributed to legitimate differences in the physiology and carcinogenesis of cancers affecting distinct anatomical sites. However, underlying processes that are shared across seemingly disparate cancers probably affect prognosis. The objective of the current study is to investigate whether multitask learning improves 5-year survival prediction for cancer patients by leveraging information across anatomically distinct HPV-related cancers. Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program database. The study cohort consisted of 29,768 primary cancer cases diagnosed in the United States between 2004 and 2015. Ten different cancer diagnoses were selected, all with a known association with HPV risk. In the analysis, the cancer diagnoses were categorized into three distinct topography groups of varying specificity. The most specific topography grouping consisted of 10 original cancer diagnoses differentiated by the first two digits of the ICD-O-3 topography code. The second topography grouping consisted of cancer diagnoses categorized into six distinct organ groups. Finally, the third topography grouping consisted of just two groups, head-neck cancers and ano-genital cancers. The tasks were to predict 5-year survival for patients within the different topography groups using 14 predictive features, which were selected from the descriptive variables available in the SEER database.
The information from the predictive features was shared between tasks in three different ways, resulting in three distinct predictive models: 1) information was not shared between patients assigned to different tasks (single task learning); 2) information was shared between all patients, regardless of task (pooled model); 3) only relevant information was shared between patients grouped to different tasks (multitask learning). Prediction performance was evaluated with Brier scores. All three models were evaluated against one another on each of the three distinct topography-defined tasks. The results showed that multitask classifiers achieved relative improvement for the majority of the scenarios studied compared to single task learning and pooled baseline methods. In this study, we have demonstrated that sharing information among anatomically distinct cancer types can lead to improved predictive survival models.
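The abstract names the Brier score as its evaluation metric without defining it. As a minimal sketch, assuming binary 5-year survival labels (1 = survived, 0 = did not) and predicted survival probabilities, the score is the mean squared difference between prediction and outcome, with lower values being better. The function name and the toy data below are illustrative only and are not taken from the study.

```python
from typing import Sequence


def brier_score(outcomes: Sequence[int], probabilities: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    if len(outcomes) == 0:
        raise ValueError("at least one observation is required")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(squared_errors)


# Toy data (hypothetical, not from SEER): four patients.
survived = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.6, 0.5]
print(brier_score(survived, predicted))
```

A model that always predicts 0.5 scores 0.25 on any binary data set, which gives a rough reference point when comparing single task, pooled, and multitask models on the same patients.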
Affiliation(s)
- Andre Goncalves
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Braden Soper
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Priyadip Ray
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- David Widemann
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Ana Paula Sales
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
4. Estimating equation for additive hazards model with censored length-biased data. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-019-00006-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
5. Moni MA, Liò P. comoR: a software for disease comorbidity risk assessment. J Clin Bioinforma 2014; 4:8. [PMID: 25045465 PMCID: PMC4081507 DOI: 10.1186/2043-9113-4-8] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 02/14/2014] [Accepted: 04/17/2014] [Indexed: 12/13/2022] [Open Access]
Abstract
Background: The diagnosis of comorbidities, that is, the coexistence of different acute and chronic diseases, is difficult because of the extreme specialisation of modern physicians. We envisage that software dedicated to comorbidity diagnosis could be an effective aid to health practice. Results: We have developed an R software package, comoR, to compute novel estimators of disease comorbidity associations. Starting from an initial diagnosis and the genetic and clinical data of a patient, the software identifies the risk of disease comorbidity. It then provides a pipeline with different causal inference packages (e.g. pcalg, qtlnet etc) to predict the causal relationships between diseases. It also provides a pipeline with network regression and survival analysis tools (e.g. Net-Cox, rbsurv etc) to predict patient survival probability more accurately. The input of this software is the initial diagnosis for a patient, and the output provides evidence of disease comorbidity mapping. Conclusions: The functions of comoR offer flexibility for diagnostic applications to predict disease comorbidities, and can be easily integrated into high-throughput and clinical data analysis pipelines.
Affiliation(s)
- Mohammad Ali Moni
  - Computer Laboratory, University of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
  - Department of Computer Science & Engineering, Pabna University of Science & Technology, Pabna, Bangladesh
- Pietro Liò
  - Computer Laboratory, University of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
6. Lin Y, Yu M, Wang S, Chappell R, Imperiale TF. Advanced colorectal neoplasia risk stratification by penalized logistic regression. Stat Methods Med Res 2013; 25:1677-91. [PMID: 23907780 DOI: 10.1177/0962280213497432] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the L1-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance.
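The penalty described in this abstract can be illustrated with a small sketch. It assumes a single ordered categorical risk factor whose category coefficients are stored in order, an L1 penalty on the coefficients themselves (sparsity), and an L1 penalty on the differences between adjacent categories (fusion, so that equal neighbours form one risk group). The two tuning parameters, the function names, and the use of adjacent differences only are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math
from typing import Sequence


def fused_l1_penalty(beta: Sequence[float], lam_sparsity: float, lam_fusion: float) -> float:
    """L1 penalty on coefficients plus L1 penalty on adjacent differences."""
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(b_next - b) for b, b_next in zip(beta, beta[1:]))
    return lam_sparsity * sparsity + lam_fusion * fusion


def penalized_objective(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    intercept: float,
    beta: Sequence[float],
    lam_sparsity: float,
    lam_fusion: float,
) -> float:
    """Logistic negative log-likelihood plus the fused L1 penalty."""
    nll = 0.0
    for row, label in zip(X, y):
        eta = intercept + sum(x * b for x, b in zip(row, beta))
        # Numerically stable log(1 + exp(eta)).
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        nll += softplus - label * eta
    return nll + fused_l1_penalty(beta, lam_sparsity, lam_fusion)


# Hypothetical coefficients for four ordered categories: the middle two are
# equal, so the fusion term treats them as a single risk group.
coefficients = [0.0, 0.5, 0.5, 1.0]
print(fused_l1_penalty(coefficients, lam_sparsity=1.0, lam_fusion=2.0))
```

Minimizing such an objective over the intercept and coefficients would require a solver suited to non-smooth penalties; the sketch only evaluates the objective to show how the two L1 terms encourage variable selection and category grouping.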
Affiliation(s)
- Yunzhi Lin
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Menggang Yu
  - Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Sijian Wang
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
  - Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Richard Chappell
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
  - Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Thomas F Imperiale
  - Department of Medicine, Indiana University, Indianapolis, Indiana, USA
  - Regenstrief Institute, Inc. and Roudebush VA Medical Center, Indianapolis, Indiana, USA