1. Yang H, Liu J, Yang N, Fu Q, Wang Y, Ye M, Tao S, Liu X, Li Q. Enhancing metastatic colorectal cancer prediction through advanced feature selection and machine learning techniques. Int Immunopharmacol 2024; 142:113033. [PMID: 39226823 DOI: 10.1016/j.intimp.2024.113033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 08/15/2024] [Accepted: 08/25/2024] [Indexed: 09/05/2024]
Abstract
BACKGROUND AND AIMS: Colorectal cancer (CRC) is the third most prevalent cancer globally, posing a significant challenge due to its high rate of metastasis. Approximately 20% of patients with CRC present with distant metastases at diagnosis, and over 50% develop metastases within five years. Accurate prediction of metastasis is crucial for improving survival outcomes in patients with CRC.
METHODS: This study introduces an innovative cost-sensitive fast correlation-based filter (CS-FCBF) algorithm for feature selection, integrated with machine learning techniques to predict metastatic CRC. The CS-FCBF algorithm reduced the number of genomic features from 184 to 9 critical genes: CXCL9, C2CD4B, RGCC, GFI1, BEX2, CXCL3, FOXQ1, PBK, and PLAG1. The methodology combined in vitro and in vivo experiments with analysis of publicly available single-cell RNA-seq datasets to validate the findings.
RESULTS: The application of the CS-FCBF algorithm led to a significant improvement in prediction model performance, with an average 21.16% increase in the area under the precision-recall curve. The nine identified genes hold potential as diagnostic biomarkers and therapeutic targets for metastatic CRC.
CONCLUSIONS: This study highlights the critical role of advanced feature selection methods, combined with machine learning, in addressing the challenge of class imbalance in medical diagnosis, particularly for CRC. Early detection of metastasis is vital, and the nine identified genes appear to play important roles in the metastatic process of CRC. The methodology applied here offers valuable insights and paves the way for future research in other cancers or diseases that face similar diagnostic challenges.
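The abstract does not spell out how CS-FCBF scores features. As a rough illustration of the filter family it builds on, the standard FCBF ranks discretized features by their symmetrical uncertainty with the class label, SU(X, Y) = 2 · IG(X; Y) / (H(X) + H(Y)). The sketch below computes only that quantity on toy data; the cost-sensitive weighting of CS-FCBF is not shown, and the function names and sample values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy     # mutual information
    return 2.0 * info_gain / (h_x + h_y)


# Toy data (not from the study): class labels and two discretized features.
labels = [0, 0, 0, 1, 1, 1]
informative = ["low", "low", "low", "high", "high", "high"]
noisy = ["low", "high", "low", "high", "low", "high"]

su_informative = symmetrical_uncertainty(informative, labels)
su_noisy = symmetrical_uncertainty(noisy, labels)
```

A feature that mirrors the label reaches the maximum score, while a feature that is nearly independent of it scores close to zero, which is the basis on which a correlation-based filter discards irrelevant genes.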
Affiliation(s)
- Hui Yang
- Central Laboratory, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China; Anhui Province Key Laboratory of Non-coding RNA Basic and Clinical Transformation, Wuhu, Anhui, China
- Jun Liu
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China
- Na Yang
- Department of Critical Care Medicine, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China; Clinical Research Center for Critical Respiratory Medicine of Anhui Province, Wuhu, Anhui, China
- Qingsheng Fu
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China
- Yingying Wang
- Department of Nuclear Medicine, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui 241001, China
- Mingquan Ye
- Research Center of Health Big Data Mining and Applications, School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Shaoneng Tao
- Department of Nuclear Medicine, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui 241001, China.
- Xiaocen Liu
- Department of Nuclear Medicine, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui 241001, China.
- Qingqing Li
- Research Center of Health Big Data Mining and Applications, School of Medical Information, Wannan Medical College, Wuhu, Anhui, China.
2. Li N, Liang XR, Bai X, Liang XH, Dang LH, Jin QQ, Cao J, Du QX, Sun JH. Novel ratio-expressions of genes enables estimation of wound age in contused skeletal muscle. Int J Legal Med 2024; 138:197-206. [PMID: 37804331 DOI: 10.1007/s00414-023-03095-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 09/18/2023] [Indexed: 10/09/2023]
Abstract
Because combining multiple biomarkers may raise the predictive value of wound age estimation, it is essential to identify new features at limited cost. To this end, the present study explored whether gene expression ratios provide unique temporal information as an additional indicator for wound age estimation, without requiring the detection of new biomarkers and while making full use of the available data. The expression levels of four wound-healing genes (Arid5a, Ier3, Stom, and Lcp1) were detected by real-time polymerase chain reaction, and a total of six expression ratios were calculated among these four genes. The results showed that the expression levels of the four genes and the six expression ratios changed in a time-dependent manner during wound repair. The six expression ratios provided additional temporal information, distinct from that of the four genes analyzed separately by principal component analysis. The overall performance metrics for cross-validation and external validation of four typical prediction models improved when the six expression ratios were added as additional input variables. Overall, expression ratios among genes provide temporal information and have excellent potential as predictive markers for wound age estimation. Combining the expression levels of genes with the expression ratios of genes may allow for more accurate estimates of the time of injury.
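The six ratios follow from taking every unordered pair of the four genes (C(4, 2) = 6). The sketch below shows one way to derive them and append them to the original expression levels as extra model inputs. The expression values are hypothetical, and which gene goes in the numerator of each ratio is an assumption, since the abstract does not say.

```python
from itertools import combinations

GENES = ["Arid5a", "Ier3", "Stom", "Lcp1"]


def expression_ratios(levels):
    """Return the expression ratio for every unordered pair of genes."""
    return {
        f"{a}/{b}": levels[a] / levels[b]
        for a, b in combinations(GENES, 2)
    }


# Hypothetical relative expression levels for one sample (not study data).
sample = {"Arid5a": 2.0, "Ier3": 4.0, "Stom": 1.0, "Lcp1": 0.5}

ratios = expression_ratios(sample)

# Four original levels plus six ratios give ten input variables per sample.
features = {**sample, **ratios}
```

No new measurements are needed for the additional features, which matches the abstract's point about making full use of data that have already been collected.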
Affiliation(s)
- Na Li
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Xin-Rui Liang
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Xue Bai
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Xin-Hua Liang
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Li-Hong Dang
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Qian-Qian Jin
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Jie Cao
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China
- Qiu-Xiang Du
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China.
- Jun-Hong Sun
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030604, Shanxi, China.
3. Li C, Wang T, Lin X. Analyzing omics data by feature combinations based on kernel functions. J Bioinform Comput Biol 2023; 21:2350021. [PMID: 37852788 DOI: 10.1142/s021972002350021x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2023]
Abstract
Defining meaningful feature (molecule) combinations can enhance the study of disease diagnosis and prognosis. However, feature combinations are complex and various in biosystems, and the existing methods examine the feature cooperation in a single, fixed pattern for all feature pairs, such as linear combination. To identify the appropriate combination between two features and evaluate feature combination more comprehensively, this paper adopts kernel functions to study feature relationships and proposes a new omics data analysis method KF-[Formula: see text]-TSP. Besides linear combination, KF-[Formula: see text]-TSP also explores the nonlinear combination of features, and allows hybridizing multiple kernel functions to evaluate feature interaction from multiple views. KF-[Formula: see text]-TSP selects [Formula: see text] > 0 top-scoring pairs to build an ensemble classifier. Experimental results show that KF-[Formula: see text]-TSP with multiple kernel functions which evaluates feature combinations from multiple views is better than that with only one kernel function. Meanwhile, KF-[Formula: see text]-TSP performs better than TSP family algorithms and the previous methods based on conversion strategy in most cases. It performs similarly to the popular machine learning methods in omics data analysis, but involves fewer feature pairs. In the procedure of physiological and pathological changes, molecular interactions can be both linear and nonlinear. Hence, KF-[Formula: see text]-TSP, which can measure molecular combination from multiple perspectives, can help to mine information closely related to physiological and pathological changes and study disease mechanism.
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
- Tianxiang Wang
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
4. Zhao T, Wu H, Wang X, Zhao Y, Wang L, Pan J, Mei H, Han J, Wang S, Lu K, Li M, Gao M, Cao Z, Zhang H, Wan K, Li J, Fang L, Zhang T, Guan X. Integration of eQTL and machine learning to dissect causal genes with pleiotropic effects in genetic regulation networks of seed cotton yield. Cell Rep 2023; 42:113111. [PMID: 37676770 DOI: 10.1016/j.celrep.2023.113111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/19/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023] [Open access]
Abstract
The dissection of a gene regulatory network (GRN) that complements the genome-wide association study (GWAS) locus and the crosstalk underlying multiple agronomical traits remains a major challenge. In this study, we generate 558 transcriptional profiles of lint-bearing ovules at one day post-anthesis from a selective core cotton germplasm, from which 12,207 expression quantitative trait loci (eQTLs) are identified. Sixty-six known phenotypic GWAS loci are colocalized with 1,090 eQTLs, forming 38 functional GRNs associated predominantly with seed yield. Of the eGenes, 34 exhibit pleiotropic effects. Combining the eQTLs within the seed yield GRNs significantly increases the portion of narrow-sense heritability. The extreme gradient boosting (XGBoost) machine learning approach is applied to predict seed cotton yield phenotypes on the basis of gene expression. Top-ranking eGenes (NF-YB3, FLA2, and GRDP1) derived with pleiotropic effects on yield traits are validated, along with their potential roles by correlation analysis, domestication selection analysis, and transgenic plants.
Affiliation(s)
- Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Hongyu Wu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Xutong Wang
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Yongyan Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Luyao Wang
- Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Jiaying Pan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Jin Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Siyuan Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Kening Lu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Menglin Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Mengtao Gao
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Zeyi Cao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Hailin Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Ke Wan
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Jie Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Xueying Guan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China.
5. Huang X, Su B, Zhu C, He X, Lin X. Dynamic Network Construction for Identifying Early Warning Signals Based On a Data-Driven Approach: Early Diagnosis Biomarker Discovery for Gastric Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:923-931. [PMID: 35594220 DOI: 10.1109/tcbb.2022.3176319] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
During the development of complex diseases, there is a critical transition from one status to another at a tipping point, which can be an early indicator of disease deterioration. To effectively enhance the performance of early risk identification, a novel dynamic network construction algorithm for identifying early warning signals based on a data-driven approach (EWS-DDA) was proposed. In EWS-DDA, the shrunken centroid was introduced to measure dynamic expression changes in assumed pathway reactions during the progression of complex disease for network construction and to define early warning signals by means of a data-driven approach. We applied EWS-DDA to perform a comprehensive analysis of gene expression profiles of gastric cancer (GC) from The Cancer Genome Atlas database and the Gene Expression Omnibus database. Six crucial genes were selected as potential biomarkers for the early diagnosis of GC. The experimental results of statistical analysis and biological analysis suggested that the six genes play important roles in GC occurrence and development. Then, EWS-DDA was compared with other state-of-the-art network methods to validate its performance. The theoretical analysis and comparison results suggested that EWS-DDA has great potential for a more complete presentation of disease deterioration and effective extraction of early warning information.
6. Huang X, Su B, Wang X, Zhou Y, He X, Liu B. A network-based dynamic criterion for identifying prediction and early diagnosis biomarkers of complex diseases. J Bioinform Comput Biol 2022; 20:2250027. [PMID: 36573886 DOI: 10.1142/s0219720022500275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
Lung adenocarcinoma (LUAD) seriously threatens human health and generally results from dysfunction of relevant module molecules, which dynamically change with time and conditions, rather than that of an individual molecule. In this study, a novel network construction algorithm for identifying early warning network signals (IEWNS) is proposed for improving the performance of LUAD early diagnosis. To this end, we theoretically derived a dynamic criterion, namely, the relationship of variation (RV), to construct dynamic networks. RV infers correlation [Formula: see text] statistics to measure dynamic changes in molecular relationships during the process of disease development. Based on the dynamic networks constructed by IEWNS, network warning signals used to represent the occurrence of LUAD deterioration can be defined without human intervention. IEWNS was employed to perform a comprehensive analysis of gene expression profiles of LUAD from The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO) database. The experimental results suggest that the potential biomarkers selected by IEWNS can facilitate a better understanding of pathogenetic mechanisms and help to achieve effective early diagnosis of LUAD. In conclusion, IEWNS provides novel insight into the initiation and progression of LUAD and helps to define prospective biomarkers for assessing disease deterioration.
Affiliation(s)
- Xin Huang
- School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning 114007, P. R. China
- Benzhe Su
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, P. R. China
- Xingyu Wang
- School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning 114007, P. R. China
- Yang Zhou
- Liaoning Clinical Research Center for Lung Cancer, The Second Hospital of Dalian Medical University, Dalian, Liaoning 116023, P. R. China
- Xinyu He
- School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning 116029, P. R. China
- Bing Liu
- School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning 114007, P. R. China
7. WeDIV – An improved k-means clustering algorithm with a weighted distance and a novel internal validation index. Egyptian Informatics Journal 2022. [DOI: 10.1016/j.eij.2022.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
8. Li Q, Wang P, Yuan J, Zhou Y, Mei Y, Ye M. A two-stage hybrid gene selection algorithm combined with machine learning models to predict the rupture status in intracranial aneurysms. Front Neurosci 2022; 16:1034971. [PMID: 36340761 PMCID: PMC9631203 DOI: 10.3389/fnins.2022.1034971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 09/30/2022] [Indexed: 07/31/2023] [Open access]
Abstract
An intracranial aneurysm (IA) is an abnormal swelling of cerebral vessels, and a subset of IAs can rupture, causing aneurysmal subarachnoid hemorrhage (aSAH), often resulting in death or severe disability. Few studies have used an appropriate method of feature selection combined with machine learning to analyze transcriptomic sequencing data and identify new molecular biomarkers. Following gene ontology (GO) and enrichment analysis using all 913 differentially expressed genes, we found that the distinct status of IAs could lead to differential innate immune responses. Considering that there are numerous irrelevant and redundant genes, we propose a mixed filter- and wrapper-based feature selection method. First, we used the Fast Correlation-Based Filter (FCBF) algorithm to filter out a large number of irrelevant and redundant genes in the raw dataset, and then used a wrapper feature selection method based on the Multi-layer Perceptron (MLP) neural network and Particle Swarm Optimization (PSO); accuracy (ACC) and mean square error (MSE) were used as the evaluation criteria. Finally, we constructed a novel 10-gene signature (YIPF1, RAB32, WDR62, ANPEP, LRRCC1, AADAC, GZMK, WBP2NL, PBX1, and TOR1B) with the proposed two-stage hybrid algorithm FCBF-MLP-PSO and used different machine learning models to predict the rupture status of IAs. The highest ACC value increased from 0.817 to 0.919 (12.5% increase), the highest area under ROC curve (AUC) value increased from 0.87 to 0.94 (8.0% increase), and all evaluation metrics improved by approximately 10% after being processed by our proposed gene selection algorithm. Therefore, these 10 informative genes can be used to predict the rupture status of IAs as a complement to imaging examinations in the clinic; this gene signature also provides new targets and approaches for the treatment of ruptured IAs.
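The quoted gains are relative increases, not differences in percentage points. A quick check of the arithmetic, using only the figures given in the abstract (the helper name is hypothetical):

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


acc_gain = relative_increase(0.817, 0.919)  # highest accuracy
auc_gain = relative_increase(0.87, 0.94)    # highest AUC

print(f"ACC: +{acc_gain:.1f}%  AUC: +{auc_gain:.1f}%")
```

In absolute terms the same changes correspond to about 10.2 points of accuracy and 7 points of AUC.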
Affiliation(s)
- Qingqing Li
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Peipei Wang
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Jinlong Yuan
- Department of Neurosurgery, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
- Yunfeng Zhou
- Department of Radiology, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
- Yaxin Mei
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Mingquan Ye
- School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
- Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
9. Chen Y, Ye Z, Zhang Y, Xie W, Chen Q, Lan C, Yang X, Zeng H, Zhu Y, Ma C, Tang H, Wang Q, Guan J, Chen S, Li F, Yang W, Yan H, Yu X, Zhang Z. A Deep Learning Model for Accurate Diagnosis of Infection Using Antibody Repertoires. Journal of Immunology (Baltimore, Md. : 1950) 2022; 208:2675-2685. [PMID: 35606050 DOI: 10.4049/jimmunol.2200063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/20/2022] [Accepted: 04/11/2022] [Indexed: 06/15/2023]
Abstract
The adaptive immune receptor repertoire consists of the entire set of an individual's BCRs and TCRs and is believed to contain a record of prior immune responses and the potential for future immunity. Analyses of TCR repertoires via deep learning (DL) methods have successfully diagnosed cancers and infectious diseases, including coronavirus disease 2019. However, few studies have used DL to analyze BCR repertoires. In this study, we collected IgG H chain Ab repertoires from 276 healthy control subjects and 326 patients with various infections. We then extracted a comprehensive feature set consisting of 10 subsets of repertoire-level features and 160 sequence-level features and tested whether these features can distinguish between infected individuals and healthy control subjects. Finally, we developed an ensemble DL model, namely, DL method for infection diagnosis (https://github.com/chenyuan0510/DeepID), and used this model to differentiate between the infected and healthy individuals. Four subsets of repertoire-level features and four sequence-level features were selected because of their excellent predictive performance. The DL method for infection diagnosis outperformed traditional machine learning methods in distinguishing between healthy and infected samples (area under the curve = 0.9883) and achieved a multiclassification accuracy of 0.9104. We also observed differences between the healthy and infected groups in V genes usage, clonal expansion, the complexity of reads within clone, the physical properties in the α region, and the local flexibility of the CDR3 amino acid sequence. Our results suggest that the Ab repertoire is a promising biomarker for the diagnosis of various infections.
Affiliation(s)
- Yuan Chen
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhiming Ye
- Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yanfang Zhang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Wenxi Xie
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Qingyun Chen
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Chunhong Lan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Xiujia Yang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huikun Zeng
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yan Zhu
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Cuiyu Ma
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Haipei Tang
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Qilong Wang
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Junjie Guan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Sen Chen
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Fenxiang Li
- Department of Infectious Disease Control and Prevention, Center for Disease Control and Prevention of Southern Theatre Command, Guangzhou, China
- Wei Yang
- Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huacheng Yan
- Department of Infectious Disease Control and Prevention, Center for Disease Control and Prevention of Southern Theatre Command, Guangzhou, China
- Xueqing Yu
- Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Division of Nephrology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhenhai Zhang
- Center for Precision Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- State Key Laboratory of Organ Failure Research, Division of Nephrology, Southern Medical University, Guangzhou, China
- Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou, China
10. Li Q, Yang H, Wang P, Liu X, Lv K, Ye M. XGBoost-based and tumor-immune characterized gene signature for the prediction of metastatic status in breast cancer. J Transl Med 2022; 20:177. [PMID: 35436939 PMCID: PMC9014628 DOI: 10.1186/s12967-022-03369-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/10/2022] [Accepted: 03/26/2022] [Indexed: 12/23/2022] [Open access]
Abstract
Background: For a long time, breast cancer has been a leading cancer diagnosed in women worldwide, and approximately 90% of cancer-related deaths are caused by metastasis. For this reason, finding new biomarkers related to metastasis is an urgent task to predict the metastatic status of breast cancer and provide new therapeutic targets.
Methods: In this research, an efficient model of eXtreme Gradient Boosting (XGBoost) optimized by a grid search algorithm is established to realize auxiliary identification of metastatic breast tumors based on gene expression. Estimated by ten-fold cross-validation, the optimized XGBoost classifier achieves a higher overall mean AUC (0.82) than other classifiers such as DT, SVM, KNN, LR, and RF.
Results: A novel 6-gene signature (SQSTM1, GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) was selected by feature importance ranking, and a series of in vitro experiments were conducted to verify the potential role of each biomarker. In general, the effects of SQSTM1 in tumor cells are assigned as a risk factor, while the effects of the other 5 genes (GDF9, LINC01125, PTGS2, GVINP1, and TMEM64) in immune cells are assigned as protective factors.
Conclusions: Our findings will allow for a more accurate prediction of the metastatic status of breast cancer and will benefit the mining of breast cancer metastasis-related biomarkers.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12967-022-03369-9.
|
11
|
Huang X, Liao Z, Liu B, Tao F, Su B, Lin X. A Novel Method for Constructing Classification Models by Combining Different Biomarker Patterns. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:786-794. [PMID: 32894721 DOI: 10.1109/tcbb.2020.3022076] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Different biomarker patterns, such as those of molecular biomarkers and ratio biomarkers, have their own merits in clinical applications. In this study, a novel machine learning method for biomedical data analysis is proposed that constructs classification models by combining different biomarker patterns (CDBP). CDBP uses relative expression reversals to measure the discriminative ability of different biomarker patterns, and selects the pattern with the higher score for classifier construction. The decision boundary of CDBP can be characterized in a simple and biologically meaningful manner. The CDBP method was compared with eight state-of-the-art methods on eight gene expression datasets to test its performance. CDBP, with fewer features or ratio features, had the highest classification performance. Subsequently, CDBP was employed to extract crucial diagnostic information from a rat hepatocarcinogenesis metabolomics dataset. The potential biomarkers selected by CDBP provided better classification of hepatocellular carcinoma (HCC) and non-HCC stages than previous works in the animal model. Statistical analyses of these potential biomarkers in an independent human dataset confirmed their ability to discriminate between different liver diseases. These experimental results highlight the potential of CDBP for biomarker identification from high-dimensional biomedical datasets and demonstrate that it can be a useful tool for disease classification.
|
12
|
Huang X, Wang Z, Su B, He X, Liu B, Kang B. A computational strategy for metabolic network construction based on the overlapping ratio: Study of patients' metabolic responses to different dialysis patterns. Comput Biol Chem 2021; 93:107539. [PMID: 34246891 DOI: 10.1016/j.compbiolchem.2021.107539] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Revised: 06/25/2021] [Accepted: 07/01/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND Uremia is a worldwide epidemic disease and poses a serious threat to human health. Both maintenance hemodialysis (HD) and maintenance high flux hemodialysis (HFD) are common treatments for uremia and are generally used in clinical applications. In-depth exploration of patients' metabolic responses to different dialysis patterns can facilitate the understanding of pathological alterations associated with uremia and the effects of different dialysis methods on uremia, which may be used for future personalized therapy. However, due to variations of multiple factors (i.e., genetic, epigenetic and environment) in the process of disease treatments, identification of the similarities and differences in plasma metabolite changes in uremic patients in response to HD and HFD remains challenging. METHODS In this study, a computational strategy for metabolic network construction based on the overlapping ratio (MNC-OR) was proposed for disease treatment effect research. In MNC-OR, the overlapping ratio was introduced to measure metabolic reactions and to construct metabolic networks for analysis of different treatment options. Then, MNC-OR was employed to analyze HD-pattern-dependent changes in plasma metabolites to explore the pathological alterations associated with uremia and the effectiveness of different dialysis patterns (i.e., HD and HFD) on uremia. Based on the networks constructed by MNC-OR, two network analysis techniques, namely, similarity analysis and difference analysis of network topology, were used to find the similarity and differences in metabolic signals in patients under treatment with either HD or HFD, which can facilitate the understanding of pathological alterations associated with uremia and provide the guidance for personalized dialysis therapy. 
RESULTS Similarity analysis of network topology suggested that abnormal energy metabolism, gut metabolism and pyrimidine metabolism might occur in uremic patients, and that both maintenance HFD and maintenance HD therapies have beneficial effects on uremia. Then, difference analysis of network topology was employed to extract the crucial information related to HD-pattern-dependent changes in plasma metabolites. Experimental results indicated that amino acid metabolism was closer to the normal status in HFD-treated patients; in HD-treated patients, however, antioxidant capacity showed a greater reduction, and the protein O-GlcNAcylation level was higher. Our findings demonstrate the potential of MNC-OR for explaining the metabolic similarities and differences of patients in response to different dialysis methods, thereby contributing to the guidance of personalized dialysis therapy.
Affiliation(s)
- Xin Huang
  - School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning, China
- Zeyu Wang
  - School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning, China
- Benzhe Su
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, China
- Xinyu He
  - School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning, China
- Bing Liu
  - School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning, China
- Baolin Kang
  - School of Mathematics and Information Science, Anshan Normal University, Anshan, Liaoning, China
|
13
|
Differential metabolic network construction for personalized medicine: Study of type 2 diabetes mellitus patients' response to gliclazide-modified-release-treated. J Biomed Inform 2021; 118:103796. [PMID: 33932596 DOI: 10.1016/j.jbi.2021.103796] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/25/2020] [Revised: 02/26/2021] [Accepted: 04/26/2021] [Indexed: 11/21/2022]
Abstract
Individual variation in genetic and environmental factors can cause differences in metabolic phenotypes, which may affect patients' drug responses. Deep exploration of patients' responses to therapeutic agents is a crucial and urgent task in personalized treatment research. Using machine learning methods to discover suitability evaluation biomarkers can provide deep insight into the mechanism of disease therapy and facilitate the development of personalized medicine. To find important metabolic network signals for the prediction of patients' drug responses, a novel method referred to as differential metabolic network construction (DMNC) was proposed. In DMNC, concentration changes in metabolite ratios between different pathological states are measured to construct differential metabolic networks, which can be used to advance clinical decision-making. In this study, DMNC was applied to characterize type 2 diabetes mellitus (T2DM) patients' responses to gliclazide modified-release (MR) therapy. Two T2DM metabolomics datasets from different batches of subjects treated with gliclazide MR were analyzed in depth. A network biomarker was defined to assess the patients' suitability for gliclazide MR. It was effective in distinguishing significant responders from nonsignificant responders, achieving area under the curve values of 0.893 and 1.000 for the discovery and validation sets, respectively. Compared with the metabolites selected by the other methods, the network biomarker selected by DMNC reflected the metabolic responses of patients to gliclazide MR therapy more stably and precisely, thereby contributing to personalized medicine for T2DM patients. The better performance of DMNC validated its potential for identifying network biomarkers that characterize responses to therapeutic treatments and provide valuable information for personalized medicine.
|
14
|
Chen L, Li J, Chang M. Cancer Diagnosis and Disease Gene Identification via Statistical Machine Learning. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200207094947] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/22/2022]
Abstract
Diagnosing cancer and identifying disease genes by using DNA microarray gene expression data are hot topics in current bioinformatics. This paper is devoted to the latest developments in cancer diagnosis and gene selection via statistical machine learning. A support vector machine is first introduced for binary cancer diagnosis. Then, the 1-norm support vector machine, doubly regularized support vector machine, adaptive huberized support vector machine and other extensions are presented to improve the performance of gene selection. Lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and other sparse regression methods are also introduced for performing simultaneous binary cancer classification and gene selection. In addition to introducing three strategies for reducing multiclass to binary, methods that directly consider all classes of data in a learning model (multi-class support vector, sparse multinomial regression, adaptive multinomial regression and so on) are presented for performing multiple cancer diagnosis. Limitations and promising directions are also discussed.
Affiliation(s)
- Liuyuan Chen
  - Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Juntao Li
  - Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
  - Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
|
15
|
A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure. BIOMED RESEARCH INTERNATIONAL 2019; 2019:9864213. [PMID: 31828154 PMCID: PMC6885241 DOI: 10.1155/2019/9864213] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/08/2019] [Revised: 08/10/2019] [Accepted: 08/27/2019] [Indexed: 12/11/2022]
Abstract
The identification of discriminative features from information-rich data with the goal of clinical diagnosis is crucial in the field of biomedical science. In this context, many machine-learning techniques have been widely applied and have achieved remarkable results. However, disease, especially cancer, is often caused by a group of features with complex interactions. Unlike traditional feature selection methods, which focus only on finding single discriminative features, a multilayer feature subset selection method (MLFSSM), which employs randomized search and a multilayer structure to select a discriminative subset, is proposed herein. In each layer of this method, many feature subsets are generated to ensure the diversity of the combinations, and the weights of features are evaluated on the performances of the subsets. The weight of a feature increases if the feature is selected into more subsets with better performances than other features on the current layer. In this manner, the values of feature weights are revised layer by layer; the precision of feature weights is constantly improved; and better subsets are repeatedly constructed from the features with higher weights. Finally, the topmost feature subset of the last layer is returned. The experimental results based on five public gene datasets showed that the subsets selected by MLFSSM were more discriminative than those selected by traditional feature selection methods, including LVW (a feature subset selection method that uses the Las Vegas method as its randomized search strategy), GAANN (a feature subset selection method based on a genetic algorithm (GA)), and support vector machine recursive feature elimination (SVM-RFE). Furthermore, MLFSSM showed higher classification performance than some state-of-the-art methods that select feature pairs or groups, including top scoring pair (TSP), k-top scoring pairs (K-TSP), and relative simplicity-based direct classifier (RS-DC).
|
16
|
A new data analysis method based on feature linear combination. J Biomed Inform 2019; 94:103173. [PMID: 30965135 DOI: 10.1016/j.jbi.2019.103173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/13/2018] [Revised: 04/02/2019] [Accepted: 04/06/2019] [Indexed: 01/15/2023]
Abstract
In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships is helpful for discriminating different conditions and studying disease mechanisms. The popular data analysis method k top scoring pairs (k-TSP) explores feature relationships by focusing on the difference in the relative levels of two features between groups and classifies samples on that basis. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship of each feature pair, scores feature pairs by the discriminative abilities of the corresponding linear combinations and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed the superiority of LC-k-TSP over k-TSP, which evaluates the relationship of every two features in the same way. The experiments also illustrated that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy rate. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to analyze hepatocellular carcinoma (HCC) metabolomics data and define simple classification rules for discriminating different liver diseases. It obtained accuracy rates of 89.76% and 89.13% in distinguishing between small HCC and hepatic cirrhosis (CIR) groups as well as between HCC and CIR groups, superior to the 87.99% and 80.35% obtained by k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
LC-k-TSP, which checks different feature pairs by their corresponding unique best linear relationship, is superior to k-TSP, which checks each pair by the same linear relationship. Availability and implementation: http://www.402.dicp.ac.cn/download_ok_4.htm.
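As a rough illustration of the k-TSP baseline that LC-k-TSP is compared against, the sketch below scores a feature pair by how differently the ordering x_i < x_j occurs in two classes and then greedily keeps disjoint pairs. It is not the authors' implementation: the function names, the binary 0/1 labels and the greedy selection are assumptions made for illustration, and the SVM-fitted linear combination that LC-k-TSP adds is not shown.

```python
from itertools import combinations


def tsp_score(x_i, x_j, labels):
    """Score a pair by |P(x_i < x_j | class 1) - P(x_i < x_j | class 0)|."""
    def freq(cls):
        # Fraction of samples in class `cls` where feature i is below feature j.
        idx = [k for k, y in enumerate(labels) if y == cls]
        return sum(x_i[k] < x_j[k] for k in idx) / len(idx)
    return abs(freq(1) - freq(0))


def top_scoring_pairs(features, labels, k=1):
    """Greedily pick up to k disjoint feature-index pairs with the highest scores.

    `features` is a list of per-feature value lists (hypothetical layout),
    each of the same length as `labels`.
    """
    scored = sorted(
        ((tsp_score(features[a], features[b], labels), a, b)
         for a, b in combinations(range(len(features)), 2)),
        key=lambda t: -t[0],
    )
    chosen, used = [], set()
    for score, a, b in scored:
        if len(chosen) == k:
            break
        # Keep pairs disjoint: skip a pair if either feature is already used.
        if a not in used and b not in used:
            chosen.append((a, b, score))
            used.update((a, b))
    return chosen
```

A pair whose ordering flips completely between the two classes scores 1, while a pair whose ordering is the same in both classes scores 0.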
|
17
|
A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. ADV DATA ANAL CLASSI 2018. [DOI: 10.1007/s11634-018-0334-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 10/28/2022]
|
18
|
Analyzing omics data by pair-wise feature evaluation with horizontal and vertical comparisons. J Pharm Biomed Anal 2018; 157:20-26. [PMID: 29754039 DOI: 10.1016/j.jpba.2018.04.052] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/16/2017] [Revised: 04/29/2018] [Accepted: 04/30/2018] [Indexed: 11/24/2022]
Abstract
Feature relationships are complex and may contain important information. k top scoring pairs (k-TSP) studies feature relationships by horizontal comparison. This study examines feature relationships and proposes vertical and horizontal k-TSP (VH-k-TSP), which identifies discriminative feature pairs by evaluating them with both vertical and horizontal comparisons. Complexity is introduced to compute the discriminative abilities of feature pairs by means of these two comparisons. VH-k-TSP was compared with support vector machine-recursive feature elimination, relative simplicity-support vector machine, k-TSP and M-k-TSP on nine public genomics datasets. For multi-class problems, the one-to-one method was used. The experiments showed that VH-k-TSP outperformed the four methods in most cases. Then, VH-k-TSP was applied to a metabolomics dataset of liver disease. VH-k-TSP obtained an accuracy rate of 88.11 ± 3.30% in discriminating between cirrhosis and hepatocellular carcinoma, better than the 77.39 ± 4.10% and 79.28 ± 3.73% obtained by k-TSP and M-k-TSP, respectively. Hence, combining the vertical and horizontal comparisons could define more discriminative feature pairs.
|
19
|
Xing P, Chen Y, Gao J, Bai L, Yuan Z. A fast approach to detect gene-gene synergy. Sci Rep 2017; 7:16437. [PMID: 29180805 PMCID: PMC5703944 DOI: 10.1038/s41598-017-16748-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/22/2017] [Accepted: 11/16/2017] [Indexed: 11/26/2022] Open
Abstract
Selecting informative genes, including individually discriminant genes and synergic genes, from expression data has been useful for medical diagnosis and prognosis. Detecting synergic genes is more difficult than selecting individually discriminant genes. Several efforts have recently been made to detect gene-gene synergies, such as dendrogram-based I(X1; X2; Y) (mutual information), doublets (gene pairs) and MIC(X1; X2; Y) based on the maximal information coefficient. It is unclear whether dendrogram-based I(X1; X2; Y) and doublets can capture synergies efficiently. Although MIC(X1; X2; Y) can capture a wide range of interactions, it has a high computational cost caused by its 3-D search. In this paper, we developed a simple and fast approach based on the abs conversion type (i.e. Z = |X1 − X2|) and the t-test to detect interactions in simulated and real-world datasets. Our results showed that dendrogram-based I(X1; X2; Y) and doublets fail to discover pair-wise gene interactions, whereas our approach can discover typical pair-wise synergic genes efficiently. These synergic genes can reach accuracy comparable to that of the individually discriminant genes using the same number of genes. A classifier cannot learn well if synergic genes have not been converted properly. Combining individually discriminant and synergic genes can improve the prediction performance.
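The abs conversion described above can be sketched in a few lines: convert a gene pair to Z = |X1 − X2| and compare Z between the two classes with a t statistic. This is only an illustrative sketch, not the authors' code: the abstract does not say which t-test variant is used, so Welch's unequal-variance statistic is an assumption here, as are the function names and the 0/1 label encoding.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def synergy_t(x1, x2, labels):
    """t statistic of the converted feature Z = |x1 - x2|, class 1 vs class 0."""
    z = [abs(u - v) for u, v in zip(x1, x2)]
    z1 = [v for v, y in zip(z, labels) if y == 1]
    z0 = [v for v, y in zip(z, labels) if y == 0]
    return welch_t(z1, z0)
```

A pair in which neither gene separates the classes on its own can still yield a large statistic after conversion, which is the kind of pair-wise synergy the abstract refers to.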
Affiliation(s)
- Pengwei Xing
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, Hunan, 410128, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, 410128, China
- Yuan Chen
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, Hunan, 410128, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, 410128, China
- Jun Gao
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Lianyang Bai
  - Biotechnology Research Center, Hunan Academy of Agricultural Sciences, Changsha, Hunan, 410125, China
- Zheming Yuan
  - Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, Hunan, 410128, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, 410128, China
|
20
|
Huang X, Lin X, Zeng J, Wang L, Yin P, Zhou L, Hu C, Yao W. A Computational Method of Defining Potential Biomarkers based on Differential Sub-Networks. Sci Rep 2017; 7:14339. [PMID: 29085035 PMCID: PMC5662748 DOI: 10.1038/s41598-017-14682-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/24/2017] [Accepted: 10/16/2017] [Indexed: 01/05/2023] Open
Abstract
Analyzing omics data from a network-based perspective can facilitate biomarker discovery. To improve disease diagnosis and identify prospective information indicating the onset of complex disease, a computational method for identifying potential biomarkers based on differential sub-networks (PB-DSN) is developed. In PB-DSN, Pearson correlation coefficient (PCC) is used to measure the relationship between feature ratios and to infer potential networks. A differential sub-network is extracted to identify crucial information for discriminating different groups and indicating the emergence of complex diseases. Subsequently, PB-DSN defines potential biomarkers based on the topological analysis of these differential sub-networks. In this study, PB-DSN is applied to handle a static genomics dataset of small, round blue cell tumors and a time-series metabolomics dataset of hepatocellular carcinoma. PB-DSN is compared with support vector machine-recursive feature elimination, multivariate empirical Bayes statistics, analyzing time-series data based on dynamic networks, molecular networks based on PCC, PinnacleZ, graph-based iterative group analysis, KeyPathwayMiner and BioNet. The better performance of PB-DSN not only demonstrates its effectiveness for the identification of discriminative features that facilitate disease classification, but also shows its potential for the identification of warning signals.
Affiliation(s)
- Xin Huang
  - School of Computer Science & Technology, Dalian University of Technology, 116024, Dalian, China
- Xiaohui Lin
  - School of Computer Science & Technology, Dalian University of Technology, 116024, Dalian, China
- Jun Zeng
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Lichao Wang
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Peiyuan Yin
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Lina Zhou
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Chunxiu Hu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Weihong Yao
  - School of Computer Science & Technology, Dalian University of Technology, 116024, Dalian, China
|
21
|
Golestan Hashemi FS, Razi Ismail M, Rafii Yusop M, Golestan Hashemi MS, Nadimi Shahraki MH, Rastegari H, Miah G, Aslani F. Intelligent mining of large-scale bio-data: Bioinformatics applications. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1364977] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023] Open
Affiliation(s)
- Farahnaz Sadat Golestan Hashemi
  - Plant Genetics, AgroBioChem Department, Gembloux Agro-Bio Tech, University of Liege, Liege, Belgium
  - Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Razi Ismail
  - Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
  - Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Rafii Yusop
  - Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
  - Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mahboobe Sadat Golestan Hashemi
  - Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
  - Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Mohammad Hossein Nadimi Shahraki
  - Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
  - Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Hamid Rastegari
  - Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Gous Miah
  - Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Farzad Aslani
  - Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
|