1
Lin S, Lin Y, Wu K, Wang Y, Feng Z, Duan M, Liu S, Fan Y, Huang L, Zhou F. FeCO3, constructing the network biomarkers using the inter-feature correlation coefficients and its application in detecting high-order breast cancer biomarkers. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220124123303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aims:
This study aims to formulate inter-feature correlations as engineered features.
Background:
Modern biotechnologies tend to generate a huge number of features for each sample, while an OMIC dataset usually has only a few dozen to a few hundred samples because of the high cost of generating OMIC data. Many bio-OMIC studies have therefore assumed inter-feature independence and selected individual features with a strong phenotype association.
Objective:
However, many features are closely associated with each other due to their physical or functional interactions, and these associations may be utilized as a new view of the features.
Method:
This study proposed a feature engineering algorithm based on correlation coefficients (FeCO3), which utilizes the correlations between a given sample and a few reference samples. The proposed FeCO3 network features were comprehensively evaluated on 24 bio-OMIC datasets.
Result:
The experimental data suggested that the newly calculated FeCO3 network features tended to achieve better classification performance than the original features when the same popular feature selection and classification algorithms were used. The FeCO3 network features were also consistently supported by the literature. FeCO3 was utilized to investigate the high-order engineered biomarkers of breast cancer, and detected the PBX2 gene (Pre-B-Cell Leukemia Transcription Factor 2) as one of the candidate breast cancer biomarkers. Although the two methylated residues cg14851325 (P-value = 8.06e-2) and cg16602460 (P-value = 1.19e-1) within PBX2 did not show a statistically significant association with breast cancer, the high-order inter-feature correlations showed a significant association with breast cancer.
Conclusion:
The proposed FeCO3 network features calculate the high-order inter-feature correlations as novel features, and may facilitate the investigation of complex diseases from this new perspective. The source code is available on FigShare (DOI: 10.6084/m9.figshare.13550051) or at the website http://www.healthinformaticslab.org/supp/ .
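The abstract describes FeCO3 only at a high level, so the following is a minimal sketch of one possible reading and not the authors' implementation (which is in the FigShare archive cited above). It assumes that each engineered feature is the Pearson correlation coefficient between a sample's feature vector and one reference sample's feature vector. The function names `pearson` and `correlation_features`, and the toy values, are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator


def correlation_features(sample, references):
    """Assumed reading of the abstract: one engineered feature per
    reference sample, equal to the correlation between the given sample
    and that reference across all original features."""
    return [pearson(sample, ref) for ref in references]


# Toy data: one sample with four original features, two reference samples.
sample = [1.0, 2.0, 3.0, 4.0]
references = [
    [2.0, 4.0, 6.0, 8.0],  # rises with the sample
    [4.0, 3.0, 2.0, 1.0],  # falls as the sample rises
]
engineered = correlation_features(sample, references)
```

With this toy input the two engineered features are 1 and -1, since the first reference is a scaled copy of the sample and the second is its reverse. How the reference samples are chosen is not stated in the abstract and is left open here.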
Affiliation(s)
- Shenggeng Lin
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yuqi Lin
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Kexin Wu
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yueying Wang
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, Jilin Province, China
- Zixuan Feng
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Meiyu Duan
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Shuai Liu
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yusi Fan
  - College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Lan Huang
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
2
Zhang Y, Chen C, Duan M, Liu S, Huang L, Zhou F. BioDog, biomarker detection for improving identification power of breast cancer histologic grade in methylomics. Epigenomics 2019; 11:1717-1732. [PMID: 31625763 DOI: 10.2217/epi-2019-0230] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Aim: Breast cancer histologic grade (HG) is a well-established prognostic factor. This study aimed to select methylomic biomarkers to predict breast cancer HGs. Materials & methods: The proposed algorithm, BioDog, first used a correlation bias reduction strategy to eliminate redundant features. Incremental feature selection was then applied to find the features with a high HG prediction accuracy. A sequential backward feature elimination strategy was employed to further refine the biomarkers. A comparison with existing algorithms was conducted, and the HG-specific somatic mutations were investigated. Results & conclusions: BioDog achieved an accuracy of 0.9973 using 92 methylomic biomarkers for predicting breast cancer HGs. Many of these biomarkers were within genes and lncRNAs associated with HG development in breast cancer or other cancer types.
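The two selection stages named in the abstract, incremental feature selection followed by sequential backward elimination, can be sketched generically as below. This is an illustration under stated assumptions and not the BioDog implementation: the feature ranking, the classifier, and the correlation bias reduction step are not specified in the abstract, so the sketch takes an already ranked feature list and a caller-supplied `score` function. The names `incremental_feature_selection`, `backward_elimination`, and `toy_score` are hypothetical.

```python
def incremental_feature_selection(ranked, score):
    """Score each top-k prefix of a ranked feature list and keep the best."""
    best, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        candidate = list(ranked[:k])
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def backward_elimination(features, score):
    """Drop one feature at a time as long as the score does not decrease."""
    current = list(features)
    current_score = score(current)
    changed = True
    while changed and len(current) > 1:
        changed = False
        for f in list(current):
            trial = [g for g in current if g != f]
            s = score(trial)
            if s >= current_score:
                current, current_score = trial, s
                changed = True
                break  # restart the scan on the reduced subset
    return current, current_score


# Hypothetical scoring function standing in for a cross-validated
# classifier accuracy: features "a" and "c" are informative, and every
# selected feature carries a small cost.
def toy_score(subset):
    informative = {"a", "c"}
    return len(informative & set(subset)) - 0.1 * len(subset)


ranked = ["a", "b", "c", "d"]
prefix, prefix_score = incremental_feature_selection(ranked, toy_score)
refined, refined_score = backward_elimination(prefix, toy_score)
```

On the toy input, the incremental stage keeps the prefix `["a", "b", "c"]` and the backward stage then removes the uninformative `"b"`, leaving `["a", "c"]`. In practice `score` would wrap a cross-validated classifier, which is an assumption about usage rather than something the abstract specifies.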
Affiliation(s)
- Yexian Zhang
  - College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Chaorong Chen
  - College of Software, Jilin University, Changchun, Jilin 130012, PR China
- Meiyu Duan
  - College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Shuai Liu
  - College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Lan Huang
  - College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Fengfeng Zhou
  - College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
4
Nagpal A, Singh V. Feature selection from high dimensional data based on iterative qualitative mutual information. Journal of Intelligent & Fuzzy Systems 2019. [DOI: 10.3233/jifs-181665] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Affiliation(s)
- Arpita Nagpal
  - Department of Computer Science and Engineering, The NorthCap University, Sector-23A, Gurugram, India
- Vijendra Singh
  - Department of Computer Science and Engineering, The NorthCap University, Sector-23A, Gurugram, India
5
Ventura-Molina E, Alarcón-Paredes A, Aldape-Pérez M, Yáñez-Márquez C, Adolfo Alonso G. Gene selection for enhanced classification on microarray data using a weighted k-NN based algorithm. Intell Data Anal 2019. [DOI: 10.3233/ida-173720] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Affiliation(s)
- Elías Ventura-Molina
  - Centro de Investigación en Computación, Instituto Politécnico Nacional. Av. Juan de Dios Bátiz, Esq. Miguel Othón de Mendizábal. Col. Nueva Industrial Vallejo, Gustavo A. Madero, 07738, Ciudad de México, México
- Antonio Alarcón-Paredes
  - Facultad de Ingeniería, Universidad Autónoma de Guerrero. Av. Lázaro Cárdenas s/n, Ciudad Universitaria Zona Sur, 39087. Chilpancingo Guerrero, México
- Mario Aldape-Pérez
  - Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, México. Av. Juan de Dios Bátiz, Col. Nueva Industrial Vallejo, 07700, Ciudad de México, México
- Cornelio Yáñez-Márquez
  - Centro de Investigación en Computación, Instituto Politécnico Nacional. Av. Juan de Dios Bátiz, Esq. Miguel Othón de Mendizábal. Col. Nueva Industrial Vallejo, Gustavo A. Madero, 07738, Ciudad de México, México
- Gustavo Adolfo Alonso
  - Facultad de Ingeniería, Universidad Autónoma de Guerrero. Av. Lázaro Cárdenas s/n, Ciudad Universitaria Zona Sur, 39087. Chilpancingo Guerrero, México