1
Tang X, Mo Z, Chang C, Qian X. Group-shrinkage feature selection with a spatial network for mining DNA methylation data. Comput Biol Med 2023; 154:106573. [PMID: 36706568 DOI: 10.1016/j.compbiomed.2023.106573] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 01/05/2023] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
Identifying disease-related biomarkers from high-dimensional DNA methylation data helps reduce early screening costs and infer pathogenesis mechanisms. Good discovery results have been achieved through methods based on the spatial correlation of methylation sites, group-based regularization, and network constraints. However, these methods have key limitations: they cannot exclude isolated differential sites, and they consider only the ordering of adjacent sites. Therefore, we propose a group-shrinkage feature selection algorithm that encourages the selection of clustered sites and discourages the selection of isolated differential sites. Specifically, a network-guided group-shrinkage strategy is developed to penalize weakly correlated, isolated methylation sites through a network structure constraint. The spatial network is constructed from spatial correlation information of DNA methylation sites, and this information accounts for the uneven site distribution. Simulation experiments and applications demonstrated that the proposed method outperforms advanced regularization methods, especially in rejecting isolated methylation sites; hence this study provides an efficient and clinically valuable method for discovering biomarker candidates in DNA methylation data. Additionally, the proposed method exhibits enhanced reliability because it introduces biological prior knowledge into a regularization-based feature selection framework, and it could promote more research on integrating biological prior knowledge with classical feature selection methods, thus facilitating their clinical application. Our source code will be released at https://github.com/SJTUBME-QianLab/Group-shrinkage-Spatial-Network once this manuscript is accepted for publication.
Affiliation(s)
- Xinlu Tang
  - Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhanfeng Mo
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore.
- Cheng Chang
  - Department of Nuclear Medicine, Shanghai Chest Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200030, China.
- Xiaohua Qian
  - Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
2
Cai L, Xiao G, Gerber D, Minna JD, Xie Y. Lung Cancer Computational Biology and Resources. Cold Spring Harb Perspect Med 2022; 12:a038273. [PMID: 34751162 PMCID: PMC8805643 DOI: 10.1101/cshperspect.a038273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Comprehensive clinical, pathological, and molecular data, when appropriately integrated with advanced computational approaches, are transforming the way we characterize and study lung cancer. Clinically, cancer registry and publicly available historical clinical trial data enable retrospective analyses to examine how socioeconomic factors, patient demographics, and cancer characteristics affect treatment and outcome. Pathologically, digital pathology and artificial intelligence are revolutionizing histopathological image analyses, not only with improved efficiency and accuracy, but also by extracting additional information for prognostication and tumor microenvironment characterization. Genetically and molecularly, individual patient tumors and preclinical models of lung cancer are profiled by various high-throughput platforms to characterize the molecular properties and functional liabilities. The resulting multi-omics data sets and their interrogation facilitate both basic research mechanistic studies and translation of the findings into the clinic. In this review, we provide a list of resources and tools potentially valuable for lung cancer basic and translational research. Importantly, we point out pitfalls and caveats when performing computational analyses of these data sets and provide a vision of future computational biology developments that will aid lung cancer translational research.
Affiliation(s)
- Ling Cai
  - Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Children's Medical Center Research Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, Texas 75390, USA
- Guanghua Xiao
  - Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Bioinformatics, UT Southwestern Medical Center, Dallas, Texas 75390, USA
- David Gerber
  - Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Hamon Center for Therapeutic Oncology Research, UT Southwestern Medical Center, Dallas, Texas 75390, USA
- John D Minna
  - Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Hamon Center for Therapeutic Oncology Research, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Yang Xie
  - Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
  - Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, Texas 75390, USA
  - Department of Bioinformatics, UT Southwestern Medical Center, Dallas, Texas 75390, USA
3
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics, and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information about complex diseases. Because omics data are usually large and complex, the methods used to analyze the data and extract important information are crucial in omics studies. In this review, we focus on advances over the last decade in biomarker discovery methods based on omics data, and categorize them as individual feature analysis, combinatorial feature analysis, and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Zhenbo Gao
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Benzhe Su
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Guowang Xu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China