1
|
Shen C, Cao Y, Qi GQ, Huang J, Liu ZP. Discovering pathway biomarkers of hepatocellular carcinoma occurrence and development by dynamic network entropy analysis. Gene 2023; 873:147467. [PMID: 37164125 DOI: 10.1016/j.gene.2023.147467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 04/26/2023] [Accepted: 05/03/2023] [Indexed: 05/12/2023]
Abstract
OBJECTIVE Gene expression profiling techniques measure the transcription of thousands of genes in parallel. As more hepatocellular carcinoma (HCC) transcriptomic data become available, these high-throughput data provide an unprecedented opportunity to discover HCC diagnostic biomarkers. In this work, we propose a bioinformatics method based on dynamic network entropy analysis, called DNEA, to identify potential pathway biomarkers for HCC occurrence and development by integrating the transcriptome and interactome. METHODS We first collect the pathways documented in different knowledge bases and then overlay genome-wide human transcriptomic data from multistage cancerous tissues sampled during the development and progression of HCC. After linking the gene sets of pathways into individual connected networks, we map the corresponding gene expression information onto these pathways. The dynamic network entropy of each pathway is calculated to evaluate its activity and dysfunction during disease occurrence and development. We use the overall significant difference in the entropic dynamics over the time course to prioritize distinctive pathways during disease progression. Machine learning classification methods are then employed to screen out pathway biomarkers able to distinguish samples from different stages of HCC progression. RESULTS Pathway biomarkers discovered by DNEA demonstrate good classification performance in measuring HCC progression: the DNA replication pathway from KEGG (mean AUC = 0.82, 20 genes), the FMLP pathway from BioCarta (mean AUC = 0.84, 14 genes), and the downstream signaling of activated FGFR pathway from Reactome (mean AUC = 0.80, 15 genes). Previous studies have also shown that the screened genes and pathways are closely related to the occurrence and development of HCC in terms of oncogenic dysfunction.
CONCLUSIONS Our method for cancer biomarker discovery based on dynamic network entropy analysis is effective and efficient in identifying pathway biomarkers related to the progression of complex diseases.
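The abstract above does not give the DNEA entropy formula, so the following is only a minimal sketch of the general idea of scoring a pathway network by entropy at each disease stage. Everything in it is an assumption made for illustration: the interaction weight (product of the two genes' expression values), the normalization by the logarithm of the node degree, the averaging over genes, and the toy names `network`, `uniform` and `skewed`.

```python
import math

def local_entropy(gene, neighbors, expr):
    """Normalized Shannon entropy of one gene's interaction weights.

    Assumption: the weight of an edge is the product of the expression
    values of its two endpoints; this is not taken from the paper.
    """
    weights = [expr[gene] * expr[n] for n in neighbors]
    total = sum(weights)
    if total == 0 or len(neighbors) < 2:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(neighbors))  # scale to the range [0, 1]

def pathway_entropy(network, expr):
    """Mean local entropy over all genes of a pathway network."""
    values = [local_entropy(g, nbrs, expr) for g, nbrs in network.items()]
    return sum(values) / len(values)

# Hypothetical three-gene pathway and two hypothetical disease stages.
network = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
uniform = {"A": 1.0, "B": 1.0, "C": 1.0}
skewed = {"A": 1.0, "B": 1.0, "C": 9.0}

# Entropy trajectory across stages; a large change would flag the pathway.
trajectory = [pathway_entropy(network, stage) for stage in (uniform, skewed)]
print(trajectory)
```

In this toy case the entropy drops from the first stage to the second because gene C dominates its neighbors' interaction weights. A method along the lines described in the abstract would then test whether such changes over the time course are statistically significant.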
Affiliation(s)
- Chen Shen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China; Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310052, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, Zhejiang 310052, China
| | - Yi Cao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China; Center for Biomedical Engineering, University of Science and Technology of China, Hefei, Anhui 230027, China
| | - Guo-Qiang Qi
- Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310052, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, Zhejiang 310052, China
| | - Jian Huang
- Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310052, China; Sino-Finland Joint AI Laboratory for Child Health of Zhejiang Province, Hangzhou, Zhejiang 310052, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
| |
|
2
|
Liu J, Ding D, Zhong J, Liu R. Identifying the critical states and dynamic network biomarkers of cancers based on network entropy. J Transl Med 2022; 20:254. [PMID: 35668489 PMCID: PMC9172070 DOI: 10.1186/s12967-022-03445-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2022] [Accepted: 05/17/2022] [Indexed: 02/07/2023] Open
Abstract
Background There are sudden deterioration phenomena during the progression of many complex diseases, including most cancers; that is, the biological system may go through a critical transition from one stable state (the normal state) to another (the disease state). It is of great importance to predict this critical transition or the so-called pre-disease state so that patients can receive appropriate and timely medical care. In practice, however, this critical transition is usually difficult to identify due to the high nonlinearity and complexity of biological systems. Methods In this study, we employed a model-free computational method, local network entropy (LNE), to identify the critical transition/pre-disease states of complex diseases. From a network perspective, this method effectively explores the key associations among biomolecules and captures their dynamic abnormalities. Results Based on LNE, the pre-disease states of ten cancers were successfully detected. Two types of new prognostic biomarkers, optimistic LNE (O-LNE) and pessimistic LNE (P-LNE) biomarkers, were identified, enabling identification of the pre-disease state and evaluation of prognosis. In addition, LNE helps to find “dark genes” with nondifferential gene expression but differential LNE values. Conclusions The proposed method effectively identified the critical transition states of complex diseases at the single-sample level. Our study not only identified the critical transition states of ten cancers but also provides two types of new prognostic biomarkers, O-LNE and P-LNE biomarkers, for further practical application. The method in this study therefore has great potential in personalized disease diagnosis. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03445-0.
Affiliation(s)
- Juntan Liu
- School of Mathematics, South China University of Technology, Guangzhou, 510640, China
| | - Dandan Ding
- Department of Thoracic Surgery, Affiliated Cancer Hospital & Institute of Guangzhou Medical University, Guangzhou, 510095, China
| | - Jiayuan Zhong
- School of Mathematics, South China University of Technology, Guangzhou, 510640, China; School of Mathematics and Big Data, Foshan University, Foshan, 528000, China.
| | - Rui Liu
- School of Mathematics, South China University of Technology, Guangzhou, 510640, China; Pazhou Lab, Guangzhou, 510330, China.
| |
|
3
|
Qiao X, Zhang X, Chen W, Xu X, Chen YW, Liu ZP. tensorGSEA: Detecting Differential Pathways in Type 2 Diabetes via Tensor-Based Data Reconstruction. Interdiscip Sci 2022; 14:520-531. [PMID: 35195883 DOI: 10.1007/s12539-022-00506-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2021] [Revised: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 06/14/2023]
Abstract
Detecting significant signaling pathways in disease progression highlights the dysfunctions and pathogenic mechanisms of complex disease development. Since tensor decomposition has proven effective for multi-dimensional data representation and reconstruction, the differences between original and tensor-processed data are expected to carry crucial information about differential activity. This paper provides a tensor-based gene set enrichment analysis, called tensorGSEA, which uses a data reconstruction method to identify significant pathways during disease development. As a proof-of-concept study, we identify the differential pathways of diabetes in rats. Specifically, we first arrange the gene expression profiles of each documented pathway as tensors with three dimensions: genes, samples, and periods. We then compress the tensors into core tensors of lower rank. Gene expression profiles in the other state are reconstructed from these cores, and the pathways with lower reconstruction rates are selected. In this way, pathway-level differences are extracted by cross-state data reconstruction between controls and diseases. The experiments reveal several critical pathways with diabetes-specific functions that cannot be identified by alternative methods. tensorGSEA evaluates pathways efficiently by estimating the empirical statistical significance of each one. The classification experiments demonstrate that the selected pathways can serve as biomarkers to identify the diabetic state. The code of tensorGSEA is available at https://github.com/zhxr37/tensorGSEA.
Affiliation(s)
- Xu Qiao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
| | - Xianru Zhang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
| | - Wei Chen
- Shandong Provincial Key Laboratory of Oral Tissue Regeneration, School of Stomatology, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
| | - Xin Xu
- Shandong Provincial Key Laboratory of Oral Tissue Regeneration, School of Stomatology, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China
| | - Yen-Wei Chen
- Graduate School of Information Science and Engineering, Ritsumeikan University, Shiga, 525-8577, Japan
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China.
| |
|
4
|
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information of complex diseases. Since omics data are usually large and complex, the method used to analyze the data and to define important information is crucial in omics study. In this review, we focus on advances in biomarker discovery methods based on omics data in the last decade, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Zhenbo Gao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Benzhe Su
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
| |
|
5
|
Zhang H, Zhao Y, Zhao D, Chen X, Khan NU, Liu X, Zheng Q, Liang Y, Zhu Y, Iqbal J, Lin J, Shen L. Potential biomarkers identified in plasma of patients with gestational diabetes mellitus. Metabolomics 2021; 17:99. [PMID: 34739593 DOI: 10.1007/s11306-021-01851-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/19/2021] [Accepted: 10/29/2021] [Indexed: 12/26/2022]
Abstract
Gestational diabetes mellitus (GDM) is a common complication during pregnancy. Finding reliable markers for early diagnosis can reduce the impact of the disease on the fetus. OBJECTIVE: The present study was designed to find plasma metabolites that can be used as potential biomarkers for GDM, and to clarify GDM-related mechanisms. METHODS: Using untargeted metabolomics, the plasma metabolites of pregnant women with GDM at 12-16 weeks and 24-28 weeks of pregnancy were analyzed and compared with their respective controls. Multiple reaction monitoring (MRM) analysis was performed to verify the potential markers. RESULTS: In total, 172 and 478 metabolites were identified as differential metabolites in the plasma of pregnant women with GDM at 12-16 weeks and 24-28 weeks of pregnancy, respectively. Of these, 40 were shared between the two periods. Most are associated with the mechanism of diabetes and related to short-term and long-term complications in the perinatal period. Among the differential metabolites, 7 and 10 may serve as potential biomarkers at 12-16 weeks and 24-28 weeks of pregnancy, respectively. By MRM analysis, increased levels of 17(S)-HDoHE and sebacic acid relative to controls may serve as early prediction biomarkers of GDM. At 24-28 weeks of pregnancy, elevated levels of 17(S)-HDoHE and L-Serine may be used as auxiliary diagnostic markers for GDM. CONCLUSION: Abnormal amino acid metabolism and lipid metabolism in patients with GDM may be related to GDM pathogenesis. Several differential metabolites identified in this study may serve as potential biomarkers for GDM prediction and diagnosis.
Affiliation(s)
- Huajie Zhang
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Yuxi Zhao
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Danqing Zhao
- Department of Obstetrics and Gynecology, Affiliated Hospital of Guizhou Medical University, Guiyang, 550004, People's Republic of China
| | - Xinqian Chen
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Naseer Ullah Khan
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Xukun Liu
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Qihong Zheng
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Yi Liang
- Department of Obstetrics and Gynecology, Affiliated Hospital of Guizhou Medical University, Guiyang, 550004, People's Republic of China
| | - Yuhua Zhu
- Department of Obstetrics and Gynecology, Affiliated Hospital of Guizhou Medical University, Guiyang, 550004, People's Republic of China
| | - Javed Iqbal
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
| | - Jing Lin
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China
- Shenzhen Key Laboratory of Marine Biotechnology and Ecology, Shenzhen, 518071, People's Republic of China
| | - Liming Shen
- College of Life Science and Oceanography, Shenzhen University, Shenzhen, 518071, People's Republic of China.
- Brain Disease and Big Data Research Institute, Shenzhen University, Shenzhen, 518071, People's Republic of China.
| |
|
6
|
|
7
|
Basu S, Johnson KT, Berkowitz SA. Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes. Curr Diab Rep 2020; 20:80. [PMID: 33270183 DOI: 10.1007/s11892-020-01353-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
PURPOSE OF REVIEW Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Because machine learning is new to the field and emerged outside of biomedical research, its terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an accessible way, drawing from clinical epidemiological research in diabetes published from 1 Jan 2017 to 1 June 2020. RECENT FINDINGS Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networks and "deep learning" can be applied to medical image data, particularly for the identification and staging of diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process. Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by the use of universal open-source datasets for fair comparisons. More work remains in applying strategies to communicate how machine learners generate their predictions.
Affiliation(s)
- Sanjay Basu
- Center for Primary Care, Harvard Medical School, Boston, MA, USA.
- Research and Population Health, Collective Health, San Francisco, CA, USA.
- School of Public Health, Imperial College London, London, SW7, UK.
| | - Karl T Johnson
- General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Seth A Berkowitz
- General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| |
|
8
|
Li J, Zhang L, Li H, Ping Y, Xu Q, Wang R, Tan R, Wang Z, Liu B, Wang Y. Integrated entropy-based approach for analyzing exons and introns in DNA sequences. BMC Bioinformatics 2019; 20:283. [PMID: 31182012 PMCID: PMC6557737 DOI: 10.1186/s12859-019-2772-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Numerous essential algorithms and methods, including entropy-based quantitative methods, have been developed over the last decade to analyze complex DNA sequences. Exons and introns are the most notable components of DNA, and their identification and prediction remain a focus of state-of-the-art research. RESULTS In this study, we designed an integrated entropy-based analysis approach, which combines a modified topological entropy calculation, the genomic signal processing (GSP) method and singular value decomposition (SVD), to investigate exons and introns in DNA sequences. We optimized and implemented the topological entropy and the generalized topological entropy to calculate the complexity of DNA sequences, highlighting the characteristics of repetitive sequences. By comparing the digitized entropy values of exons and introns, we observed that they are significantly different. After converting DNA data to numerical topological entropy values, we applied the SVD method to investigate exon and intron regions on a single gene sequence. Additionally, several genes across five species were used for exon prediction. CONCLUSIONS Our approach not only helps to explore the complexity of DNA sequences and their functional elements, but also provides an entropy-based GSP method to analyze exon and intron regions. Our work is feasible across different species and can be extended to analyze other components in both coding and noncoding regions of DNA sequences.
Affiliation(s)
- Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Li Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Huinian Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Yuan Ping
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Qingzhe Xu
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
| | - Rongjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| | - Renjie Tan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| | - Zhen Wang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
| | - Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
| |
|
9
|
Abstract
Two graph-theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal passes to the next group of genes, which is related to the second stage of the cancer; this hand-off can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values for three cancers are analyzed: breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM). First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The biomarker genes discovered from these network modules are validated as essential genes for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists to design wet lab experiments for further elucidating the complex mechanism of cancer.
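The first two steps described in this abstract (thresholding Pearson correlations at 0.9 and isolating cliques) can be sketched as follows. This is an illustrative sketch only, not the authors' code. It assumes the threshold applies to the signed coefficient, uses the basic Bron-Kerbosch enumeration without pivoting, and runs on a hypothetical four-gene expression table named `expr`.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)

def coexpression_network(expr, threshold=0.9):
    """Adjacency sets linking gene pairs with correlation >= threshold."""
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Hypothetical expression profiles: G1-G3 rise together, G4 falls.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.1],
    "G3": [1.1, 2.0, 3.2, 4.0],
    "G4": [4.0, 3.0, 2.0, 1.0],
}
print(maximal_cliques(coexpression_network(expr)))
```

On this toy table the three co-rising genes form one clique and the anticorrelated gene stays isolated. Enumerating all maximal cliques is exponential in the worst case, so a genome-scale network would likely need a pivoting variant or a size cap.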
|
10
|
Zheng W, Wang D, Zou X. Control of multilayer biological networks and applied to target identification of complex diseases. BMC Bioinformatics 2019; 20:271. [PMID: 31138124 PMCID: PMC6540418 DOI: 10.1186/s12859-019-2841-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/01/2019] [Accepted: 04/22/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Networks have been widely used to model the structures of various biological systems. The ultimate aim of research on biological networks is to steer biological systems to desired states by manipulating signals. Despite great advances in the linear control of single-layer networks, many complex biological systems have a multilayer networked structure and extremely complicated nonlinear processes. RESULT In this study, we propose a general framework for controlling nonlinear dynamical systems with multilayer networked structures by formulating the problem as a minimum union optimization problem. In particular, we offer a novel approach for identifying the minimal driver nodes that can steer a multilayered nonlinear dynamical system toward any desired dynamical attractor. Three disease-related multilayer biological networks are used to demonstrate the effectiveness of our approach. Moreover, in the set of minimum driver nodes identified by the proposed algorithm, we confirmed that some nodes have acted as drug targets in biological experiments. Other nodes have not been reported as drug targets; however, according to the existing literature, they are also involved in important biological processes. CONCLUSIONS The proposed method could be a promising tool for achieving higher drug target enrichment or identifying more meaningful steering nodes when studying complex diseases.
Affiliation(s)
- Wei Zheng
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
| | - Dingjie Wang
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
| | - Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China.
| |
|
11
|
Maniruzzaman M, Jahanur Rahman M, Ahammed B, Abedin MM, Suri HS, Biswas M, El-Baz A, Bangeas P, Tsoulfas G, Suri JS. Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. Computer Methods and Programs in Biomedicine 2019; 176:173-193. [PMID: 31200905 DOI: 10.1016/j.cmpb.2019.04.008] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/09/2018] [Revised: 02/28/2019] [Accepted: 04/08/2019] [Indexed: 02/08/2023]
Abstract
OBJECTIVE Colon microarray data are a repository of thousands of gene expression values with different strengths for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparison of different machine learning (ML) systems, which serves two major purposes: (a) identification of high-risk differential genes using statistical tests and (b) development of an ML strategy for predicting cancer genes. METHODS Four statistical tests, namely Wilcoxon sign rank sum (WCSRS), t-test, Kruskal-Wallis (KW), and F-test, were adapted for cancerous gene identification using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), Adaboost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics, namely accuracy (ACC) and area under the curve (AUC). RESULTS The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system using all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using a combination of the WCSRS test and an RF-based classifier. This is an improvement of 8% over previously published values in the literature. CONCLUSIONS The RF-based model with statistical tests for detection of high-risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.
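The two-stage design in this abstract (rank genes with a statistical test, then classify with cross-validation) can be sketched in a few lines. This is a hedged illustration rather than the study's pipeline. It ranks genes by the absolute Welch t-statistic instead of p-values, and it swaps in a nearest-centroid classifier, which is not one of the ten classifiers listed. The data in `samples` and `labels` are made up, and each class is assumed to keep at least two training samples.

```python
import math

def welch_t(a, b):
    """Welch t-statistic for two groups (each needs >= 2 values)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return 0.0 if denom == 0 else (ma - mb) / denom

def top_genes(samples, labels, k):
    """Indices of the k genes with the largest |t| between classes."""
    scores = []
    for g in range(len(samples[0])):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def predict(sample, train, train_labels, k):
    """Nearest-centroid prediction on genes selected from training data."""
    genes = top_genes(train, train_labels, k)
    dist = {}
    for y in (0, 1):
        rows = [s for s, label in zip(train, train_labels) if label == y]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist[y] = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dist, key=dist.get)

def loo_accuracy(samples, labels, k=1):
    """Leave-one-out accuracy; gene selection is redone in every fold."""
    hits = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += predict(samples[i], train, train_labels, k) == labels[i]
    return hits / len(samples)

# Made-up data: 6 samples x 3 genes; only gene 0 separates the classes.
samples = [
    [5.0, 1.0, 2.0], [5.2, 0.9, 2.1], [4.9, 1.1, 1.9],
    [1.0, 1.0, 2.0], [1.1, 1.05, 2.05], [0.9, 0.95, 1.95],
]
labels = [1, 1, 1, 0, 0, 0]
print(top_genes(samples, labels, 1), loo_accuracy(samples, labels))
```

Selecting genes inside each fold, rather than once on the full dataset, keeps the held-out sample from influencing feature selection. With 2000 genes and 62 patients, selecting on the full data first would tend to inflate accuracy estimates.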
Affiliation(s)
- Md Maniruzzaman
- Statistics Discipline, Khulna University, Khulna, Bangladesh; Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
| | - Md Jahanur Rahman
- Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
| | - Benojir Ahammed
- Statistics Discipline, Khulna University, Khulna, Bangladesh
| | - Mainak Biswas
- Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA
| | - Ayman El-Baz
- Department of Bioengineering, University of Louisville, Louisville, Kentucky, USA
| | - Petros Bangeas
- Department of Surgery, Papageorgiou Hospital, Aristotle University Thessaloniki, Greece
| | - Georgios Tsoulfas
- Department of Surgery, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - Jasjit S Suri
- Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA; AtheroPoint, Roseville, CA, USA.
| |
|
12
|
DeepFog: Fog Computing-Based Deep Neural Architecture for Prediction of Stress Types, Diabetes and Hypertension Attacks. Computation 2018. [DOI: 10.3390/computation6040062] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 01/10/2023]
Abstract
The use of wearables and the Internet of Things (IoT) for smart and affordable healthcare is trending. In traditional setups, the cloud backend receives the healthcare data and performs monitoring and prediction for diseases, diagnosis, and wellness. Fog computing (FC) is a distributed computing paradigm that leverages low-power embedded processors in an intermediary node between the client layer and the cloud layer. The diagnosis for wellness and fitness monitoring could be transferred from the cloud layer to the fog layer. Such a paradigm reduces latency while increasing throughput. This paper proposes a fog-based deep learning model, DeepFog, that collects data from individuals and predicts wellness statistics using a deep neural network model that can handle heterogeneous and multidimensional data. Three important abnormalities in wellness, namely (i) diabetes, (ii) hypertension attacks and (iii) stress type classification, were chosen for experimental studies. We performed a detailed analysis of the proposed models’ accuracy on standard datasets. The results validated the efficacy of the proposed system and architecture for accurate monitoring of these critical wellness and fitness criteria. We used standard datasets and open source software tools for our experiments.
|