1
Xie X, Xia F, Wu Y, Liu S, Yan K, Xu H, Ji Z. A Novel Feature Selection Strategy Based on Salp Swarm Algorithm for Plant Disease Detection. Plant Phenomics (Washington, D.C.) 2023; 5:0039. [PMID: 37228513 PMCID: PMC10204742 DOI: 10.34133/plantphenomics.0039] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/14/2022] [Accepted: 02/28/2023] [Indexed: 05/27/2023]
Abstract
Deep learning has been widely used for plant disease recognition in smart agriculture and has proven to be a powerful tool for image classification and pattern recognition. However, it has limited interpretability for deep features. With the transfer of expert knowledge, handcrafted features provide a new way for personalized diagnosis of plant diseases. However, irrelevant and redundant features lead to high dimensionality. In this study, we proposed a swarm intelligence algorithm for feature selection [salp swarm algorithm for feature selection (SSAFS)] in image-based plant disease detection. SSAFS is employed to determine the ideal combination of handcrafted features to maximize classification success while minimizing the number of features. To verify the effectiveness of the developed SSAFS algorithm, we conducted experimental studies using SSAFS and 5 metaheuristic algorithms. Several evaluation metrics were used to evaluate and analyze the performance of these methods on 4 datasets from the UCI machine learning repository and 6 plant phenomics datasets from PlantVillage. Experimental results and statistical analyses validated the outstanding performance of SSAFS compared to existing state-of-the-art algorithms, confirming the superiority of SSAFS in exploring the feature space and identifying the most valuable features for diseased plant image classification. This computational tool will allow us to explore an optimal combination of handcrafted features to improve plant disease recognition accuracy and processing time.
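The two goals named in this abstract (high classification success, few selected features) are often folded into a single scalar objective in wrapper-based feature selection. The sketch below shows one common weighted-sum form. It is an assumption for illustration, not the objective used by SSAFS: the function name `fitness`, the weight `alpha`, and the example subsets are all hypothetical, and the error rate is taken as given rather than computed by a classifier.

```python
from typing import Sequence


def fitness(mask: Sequence[int], error_rate: float, alpha: float = 0.99) -> float:
    """Weighted-sum objective, lower is better (assumed form, not SSAFS itself)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    total = len(mask)
    selected = sum(1 for bit in mask if bit)
    if total == 0 or selected == 0:
        # An empty subset cannot be classified on, so give it the worst score.
        return 1.0
    # First term rewards accuracy, second term rewards small subsets.
    return alpha * error_rate + (1.0 - alpha) * selected / total


# Two hypothetical candidate subsets over 10 features with the same error rate.
small = fitness([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], error_rate=0.10)
large = fitness([1] * 10, error_rate=0.10)
```

With equal error rates, the smaller subset receives the lower (better) score, so a metaheuristic minimizing this value is pushed toward compact feature sets.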
Affiliation(s)
- Xiaojun Xie
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
  - Center for Data Science and Intelligent Computing, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Fei Xia
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Yufeng Wu
  - State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Shouyang Liu
  - Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Ke Yan
  - Department of the Built Environment, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, Singapore 117566, Singapore
- Huanliang Xu
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhiwei Ji
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
  - Center for Data Science and Intelligent Computing, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
2
Zhang Y, Bao W, Cao Y, Cong H, Chen B, Chen Y. A survey on protein–DNA-binding sites in computational biology. Brief Funct Genomics 2022; 21:357-375. [DOI: 10.1093/bfgp/elac009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/02/2022] [Revised: 04/07/2022] [Accepted: 04/22/2022] [Indexed: 01/08/2023] Open
Abstract
Transcription factors are important cellular components in the control of gene expression. Transcription factor binding sites are locations where transcription factors specifically recognize DNA sequences, targeting gene-specific regions and recruiting transcription factors or chromatin regulators to fine-tune spatiotemporal gene regulation. As common proteins, transcription factors play a meaningful role in life-related activities. As the number of known protein sequences grows, effective prediction of protein structure and function has become urgent. At present, protein–DNA-binding site prediction methods are based on traditional machine learning algorithms and deep learning algorithms. Early work usually relied on traditional machine learning algorithms to predict protein–DNA-binding sites. In recent years, deep learning methods that predict protein–DNA-binding sites from sequence data have achieved remarkable success. Various statistical and machine learning methods for predicting the function of DNA-binding proteins have been proposed and continuously improved. Existing deep learning methods for predicting protein–DNA-binding sites can be roughly divided into three categories: convolutional neural networks (CNN), recurrent neural networks (RNN), and hybrid CNN–RNN networks. The purpose of this review is to provide an overview of the computational and experimental methods applied in the field of protein–DNA-binding site prediction today. This paper introduces traditional machine learning and deep learning methods for protein–DNA-binding site prediction in terms of the data processing characteristics of existing learning frameworks and the differences between basic learning model frameworks. Existing methods in this field are relatively simple compared with those in natural language processing, computer vision, computer graphics, and other fields. Therefore, a summary of existing protein–DNA-binding site prediction methods will help researchers better understand this field.
3
Yang B, Bao W, Wang J. Hypertension-Related Drug Activity Identification Based on Novel Ensemble Method. Front Genet 2021; 12:768747. [PMID: 34721551 PMCID: PMC8554208 DOI: 10.3389/fgene.2021.768747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Accepted: 09/27/2021] [Indexed: 11/21/2022] Open
Abstract
Hypertension is a chronic disease and a major risk factor for cardiovascular and cerebrovascular diseases that often leads to damage to target organs. The prevention and treatment of hypertension are crucially important for human health. In this paper, a novel ensemble method based on a flexible neural tree (FNT) is proposed to identify hypertension-related active compounds. In the ensemble method, the base classifiers are Multi-Grained Cascade Forest (gcForest), support vector machine (SVM), random forest (RF), AdaBoost, decision tree (DT), Gradient Boosting Decision Tree (GBDT), k-nearest neighbors (KNN), logistic regression, and naïve Bayes (NB). The classification results of the nine classifiers are used as the input vector of the FNT, which serves as a nonlinear ensemble method to identify hypertension-related drug compounds. The experimental data are extracted from hypertension-unrelated and hypertension-related compounds collected from the up-to-date literature. The results reveal that the proposed ensemble method performs better than the single classifiers in terms of ROC curve, AUC, TPR, FPR, Precision, Specificity, and F1. The proposed method is also compared with the averaged and voting ensemble methods. The results reveal that it identifies hypertension-related compounds more accurately than these two classical ensemble methods.
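The two classical baselines named in this abstract, averaged and voting ensembles, combine the base classifiers' outputs in different ways, and a short sketch makes the difference concrete. The code below is illustrative only: it does not implement the flexible neural tree, and the function names, the 0.5 threshold, and the nine example probabilities are hypothetical.

```python
from typing import List


def average_ensemble(probs: List[float], threshold: float = 0.5) -> int:
    """Average the base classifiers' positive-class probabilities, then threshold."""
    return int(sum(probs) / len(probs) >= threshold)


def voting_ensemble(probs: List[float], threshold: float = 0.5) -> int:
    """Threshold each base classifier first, then take the majority label."""
    votes = sum(1 for p in probs if p >= threshold)
    return int(votes * 2 > len(probs))


# Hypothetical outputs of nine base classifiers for one compound.
base_probs = [0.95, 0.90, 0.85, 0.80, 0.45, 0.45, 0.40, 0.40, 0.40]

avg_label = average_ensemble(base_probs)   # four confident classifiers lift the mean
vote_label = voting_ensemble(base_probs)   # only four of nine vote positive
```

On this input the two rules disagree: averaging is swayed by a few confident classifiers, while voting counts each classifier equally. A learned nonlinear combiner, such as the FNT described in the abstract, replaces both fixed rules with a fitted function of the nine outputs.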
Affiliation(s)
- Bin Yang
  - School of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Wenzheng Bao
  - School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China
- Jinglong Wang
  - College of Food Science and Pharmaceutical Engineering, Zaozhuang University, Zaozhuang, China
4
Synthesis of bio-based waterborne polyesters as environmentally benign biodegradable material through regulation of unsaturated acid structure. Eur Polym J 2021. [DOI: 10.1016/j.eurpolymj.2021.110632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
5
Jones OT, Calanzani N, Saji S, Duffy SW, Emery J, Hamilton W, Singh H, de Wit NJ, Walter FM. Artificial Intelligence Techniques That May Be Applied to Primary Care Data to Facilitate Earlier Diagnosis of Cancer: Systematic Review. J Med Internet Res 2021; 23:e23483. [PMID: 33656443 PMCID: PMC7970165 DOI: 10.2196/23483] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/13/2020] [Revised: 11/05/2020] [Accepted: 11/30/2020] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: More than 17 million people worldwide, including 360,000 people in the United Kingdom, were diagnosed with cancer in 2018. Cancer prognosis and disease burden are highly dependent on the disease stage at diagnosis. Most people diagnosed with cancer first present in primary care settings, where improved assessment of the (often vague) presenting symptoms of cancer could lead to earlier detection and improved outcomes for patients. There is accumulating evidence that artificial intelligence (AI) can assist clinicians in making better clinical decisions in some areas of health care.
OBJECTIVE: This study aimed to systematically review AI techniques that may facilitate earlier diagnosis of cancer and could be applied to primary care electronic health record (EHR) data. The quality of the evidence, the phase of development the AI techniques have reached, the gaps that exist in the evidence, and the potential for use in primary care were evaluated.
METHODS: We searched MEDLINE, Embase, SCOPUS, and Web of Science databases from January 01, 2000, to June 11, 2019, and included all studies providing evidence for the accuracy or effectiveness of applying AI techniques for the early detection of cancer, which may be applicable to primary care EHRs. We included all study designs in all settings and languages. These searches were extended through a scoping review of AI-based commercial technologies. The main outcomes assessed were measures of diagnostic accuracy for cancer.
RESULTS: We identified 10,456 studies; 16 studies met the inclusion criteria, representing the data of 3,862,910 patients. A total of 13 studies described the initial development and testing of AI algorithms, and 3 studies described the validation of an AI algorithm in independent data sets. One study was based on prospectively collected data; only 3 studies were based on primary care data. We found no data on implementation barriers or cost-effectiveness. Risk of bias assessment highlighted a wide range of study quality. The additional scoping review of commercial AI technologies identified 21 technologies, only 1 meeting our inclusion criteria. Meta-analysis was not undertaken because of the heterogeneity of AI modalities, data set characteristics, and outcome measures.
CONCLUSIONS: AI techniques have been applied to EHR-type data to facilitate early diagnosis of cancer, but their use in primary care settings is still at an early stage of maturity. Further evidence is needed on their performance using primary care data, implementation barriers, and cost-effectiveness before widespread adoption into routine primary care clinical practice can be recommended.
Affiliation(s)
- Owain T Jones
  - Primary Care Unit, Department of Public Health & Primary Care, University of Cambridge, Cambridge, United Kingdom
- Natalia Calanzani
  - Primary Care Unit, Department of Public Health & Primary Care, University of Cambridge, Cambridge, United Kingdom
- Smiji Saji
  - Primary Care Unit, Department of Public Health & Primary Care, University of Cambridge, Cambridge, United Kingdom
- Stephen W Duffy
  - Wolfson Institute for Preventive Medicine, Queen Mary University of London, London, United Kingdom
- Jon Emery
  - Centre for Cancer Research and Department of General Practice, University of Melbourne, Victoria, Australia
- Willie Hamilton
  - College of Medicine and Health, University of Exeter, Exeter, United Kingdom
- Hardeep Singh
  - Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey Veterans Affairs Medical Center and Baylor College of Medicine, Houston, TX, United States
- Niek J de Wit
  - Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht, Netherlands
- Fiona M Walter
  - Primary Care Unit, Department of Public Health & Primary Care, University of Cambridge, Cambridge, United Kingdom
6
Bayesian networks in healthcare: Distribution by medical condition. Artif Intell Med 2020; 107:101912. [DOI: 10.1016/j.artmed.2020.101912] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 01/16/2020] [Revised: 04/27/2020] [Accepted: 06/09/2020] [Indexed: 12/11/2022]
7
Das A, Acharya UR, Panda SS, Sabut S. Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques. Cogn Syst Res 2019. [DOI: 10.1016/j.cogsys.2018.12.009] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Indexed: 02/07/2023]
8
Yun Y, Jung W, Kim H, Jang BH, Kim MH, Noh J, Ko SG, Choi I. Exploring syndrome differentiation using non-negative matrix factorization and cluster analysis in patients with atopic dermatitis. Comput Biol Med 2017; 87:70-76. [PMID: 28550741 DOI: 10.1016/j.compbiomed.2017.05.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/01/2017] [Revised: 05/15/2017] [Accepted: 05/20/2017] [Indexed: 10/19/2022]
Abstract
Syndrome differentiation (SD) results in a diagnostic conclusion based on a cluster of concurrent symptoms and signs, including pulse form and tongue color. In Korea, there is a strong interest in the standardization of Traditional Medicine (TM). In order to standardize TM treatment, standardization of SD should be given priority. The aim of this study was to explore the SD, or symptom clusters, of patients with atopic dermatitis (AD) using non-negative matrix factorization and k-means clustering analysis. We screened 80 patients and enrolled 73 eligible patients. One TM dermatologist evaluated the symptoms/signs using an existing clinical dataset from patients with AD. This dataset was designed to collect 15 dermatologic and 18 systemic symptoms/signs associated with AD. Non-negative matrix factorization was used to decompose the original data into a matrix with three features and a weight matrix. Each patient was placed in three-dimensional space at the point given by that patient's three feature coordinates. With five clusters, the silhouette score reached 0.484, the best score obtained across two to nine clusters. Patients were clustered according to the varying severity of concurrent symptoms/signs. Using a null distribution generated from 10,000 permutations, we identified significant cluster-specific symptoms/signs as those falling in the upper or lower 2.5% of the distribution. Patients in each cluster showed differences in symptoms/signs and severity. In a clinical situation, SD and treatment are based on the practitioners' observations and clinical experience. SD identified through informatics can contribute to the development of standardized, objective, and consistent SD for each disease.
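The silhouette score used in this abstract to choose the number of clusters compares, for each point, the mean distance to its own cluster with the mean distance to the nearest other cluster. The sketch below is a minimal pure-Python version under stated assumptions: Euclidean distance, a hypothetical function name `silhouette_score`, and toy one-dimensional points rather than the study's data. It needs Python 3.8 or later for `math.dist`.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i: int, members: List[int]) -> float:
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        if len(clusters[lab]) == 1:
            continue  # convention: a singleton cluster contributes 0
        a = mean_dist(i, clusters[lab])  # cohesion within own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        if max(a, b) == 0.0:
            continue  # coincident points: treat the coefficient as 0
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data: two well-separated groups on a line.
toy = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(toy, [0, 0, 1, 1])
bad = silhouette_score(toy, [0, 1, 0, 1])
```

Scanning candidate cluster counts, as the abstract does for two to nine clusters, would mean running a clustering step for each count and keeping the count with the highest score; the clustering step itself is not shown here.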
Affiliation(s)
- Younghee Yun
  - Department of Ophthalmology, Otorhinolaryngology, and Dermatology of Korean Medicine, Kyung Hee University Hospital at Gangdong, Seoul, 05278, Republic of Korea
- Wonmo Jung
  - Acupuncture and Meridian Science Research Center, College of Korean Medicine, Kyung Hee University, Seoul, 130-701, Republic of Korea; Department of Science in Korean Medicine, Graduate School, Kyung Hee University Korean Medicine Hospital, Seoul, 130-701, Republic of Korea
- Hyunho Kim
  - Department of Biofunctional Medicine & Diagnostics, Kyung Hee University Korean Medicine Hospital, Seoul, 130-701, Republic of Korea
- Bo-Hyoung Jang
  - Department of Preventive Medicine, College of Korean Medicine, Kyung Hee University, Seoul 130-701, Republic of Korea
- Min-Hee Kim
  - Department of Ophthalmology, Otorhinolaryngology, and Dermatology of Korean Medicine, Kyung Hee University Hospital at Gangdong, Seoul, 05278, Republic of Korea; Department of Clinical Korean Medicine, Graduate School, Kyung Hee University, Seoul, 130-701, Republic of Korea
- Jiseong Noh
  - Department of Anesthesiology and Pain Medicine, Graduate School, Kyung Hee University, Seoul 130-701, Republic of Korea
- Seong-Gyu Ko
  - Department of Preventive Medicine, College of Korean Medicine, Kyung Hee University, Seoul 130-701, Republic of Korea
- Inhwa Choi
  - Department of Ophthalmology, Otorhinolaryngology, and Dermatology of Korean Medicine, Kyung Hee University Hospital at Gangdong, Seoul, 05278, Republic of Korea; Department of Ophthalmology, Otorhinolaryngology and Dermatology of Korean Medicine, Kyung Hee University, Seoul, 130-701, Republic of Korea
9
Yan K, Ji Z, Shen W. Online fault detection methods for chillers combining extended Kalman filter and recursive one-class SVM. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.076] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Indexed: 10/20/2022]
10
Ji Z, Meng G, Huang D, Yue X, Wang B. NMFBFS: A NMF-Based Feature Selection Method in Identifying Pivotal Clinical Symptoms of Hepatocellular Carcinoma. Computational and Mathematical Methods in Medicine 2015; 2015:846942. [PMID: 26579207 PMCID: PMC4633688 DOI: 10.1155/2015/846942] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/22/2015] [Revised: 06/20/2015] [Accepted: 07/02/2015] [Indexed: 01/05/2023]
Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is a highly aggressive malignancy. Traditional Chinese Medicine (TCM), with the characteristics of syndrome differentiation, plays an important role in the comprehensive treatment of HCC. This study aims to develop a nonnegative matrix factorization (NMF)-based feature selection approach (NMFBFS) to identify potential clinical symptoms for HCC patient stratification. METHODS: The NMFBFS approach consisted of three major steps. First, statistics-based preliminary feature screening was designed to detect and remove irrelevant symptoms. Second, NMF was employed to infer redundant symptoms; based on the NMF-derived basis matrix, we defined a novel measure of similarity between symptoms. Finally, we converted each group of redundant symptoms into a single new feature so that the dimension was further reduced. RESULTS: On a clinical dataset consisting of 407 HCC patient samples with 57 symptoms, the NMFBFS approach detected 8 irrelevant symptoms and then identified 16 redundant symptoms within 6 groups. Finally, an optimal feature subset with 39 clinical features was generated after compressing the redundant symptoms by group. Validation of classification performance shows that these 39 features clearly improve the prediction accuracy for HCC patients. CONCLUSIONS: Compared with other methods, NMFBFS has clear advantages in identifying important clinical features of HCC.
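The feature counts in this abstract can be checked directly: dropping 8 irrelevant symptoms from 57 and replacing 16 redundant symptoms with one feature per group (6 groups) leaves 57 - 8 - 16 + 6 = 39 features. The sketch below reproduces that accounting and shows one possible way to merge a redundant group. Averaging the group's values is an assumption made for illustration, since the abstract does not state how the merged feature is built, and all names here are hypothetical.

```python
from typing import Dict, List


def compress_groups(row: Dict[str, float], groups: List[List[str]]) -> Dict[str, float]:
    """Replace each group of redundant features with one averaged feature.

    Averaging is an illustrative assumption, not the method from the paper.
    """
    grouped = {name for group in groups for name in group}
    out = {name: value for name, value in row.items() if name not in grouped}
    for k, group in enumerate(groups):
        out[f"group_{k}"] = sum(row[name] for name in group) / len(group)
    return out


def remaining_features(total: int, irrelevant: int, redundant: int, n_groups: int) -> int:
    """Features left after dropping irrelevant ones and merging redundant groups."""
    return total - irrelevant - redundant + n_groups


# Counts reported in the abstract: 57 symptoms, 8 irrelevant, 16 redundant in 6 groups.
n_final = remaining_features(57, 8, 16, 6)

# Hypothetical patient record with one redundant pair ("a", "b").
merged = compress_groups({"a": 1.0, "b": 3.0, "c": 5.0}, [["a", "b"]])
```

The computed count matches the 39 clinical features reported in the abstract.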
Affiliation(s)
- Zhiwei Ji
  - Machine Learning & Systems Biology Lab, School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, Shanghai 201804, China
  - School of Information Engineering, Zhejiang A&F University, 88 Huancheng North Road, Linan 311300, China
- Guanmin Meng
  - Department of Clinical Laboratory, Tongde Hospital of Zhejiang Province, 234th Gucui Road, Hangzhou 310012, China
- Deshuang Huang
  - Machine Learning & Systems Biology Lab, School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, Shanghai 201804, China
- Xiaoqiang Yue
  - Department of Traditional Chinese Medicine, Changzheng Hospital, Second Military Medical University, 415 Fengyang Road, Shanghai 200003, China
- Bing Wang
  - Machine Learning & Systems Biology Lab, School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, Shanghai 201804, China
  - The Advanced Research Institute of Intelligent Sensing Network, Tongji University, 4800 Caoan Road, Shanghai 201804, China
  - The Key Laboratory of Embedded System and Service Computing, Tongji University, 4800 Caoan Road, Shanghai 201804, China
11
Huang LL, Zhang Y, Zhang JX, He LJ, Lai YR, Liao YJ, Tian XP, Deng HX, Liang YJ, Kung HF, Xie D, Zhu SL. Overexpression of NKX6.1 is closely associated with progressive features and predicts unfavorable prognosis in human primary hepatocellular carcinoma. Tumour Biol 2015; 36:4405-15. [PMID: 25596704 DOI: 10.1007/s13277-015-3080-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/20/2014] [Accepted: 01/08/2015] [Indexed: 12/24/2022] Open
Abstract
The homeobox gene NKX6.1 was recently identified in cervical tumors. This study was designed to further explore the clinical and prognostic significance of NKX6.1 in patients with primary hepatocellular carcinoma (HCC). The expression levels of NKX6.1 were examined using real-time PCR, Western blotting, and immunohistochemistry in HCC cell lines and HCC tissues. The invasion capability of cell lines following silencing or overexpression of NKX6.1 was investigated by Transwell assay. Cell proliferation was tested by MTT assay. Epithelial-mesenchymal transition (EMT) marker expression levels were detected in relation to NKX6.1 expression. Correlations between NKX6.1 immunohistochemical staining, clinicopathologic parameters, and follow-up data of HCC patients were analyzed statistically. NKX6.1 expression was higher in HCC tissues than in the adjacent noncancerous tissue. NKX6.1 overexpression was significantly correlated with tumor size, tumor differentiation, clinical stage, metastasis, and relapse. Kaplan-Meier analysis revealed that NKX6.1 overexpression was related to unfavorable 5-year disease-free survival and overall survival. Importantly, multivariate analysis indicated that NKX6.1 overexpression was an independent unfavorable marker for overall survival. Moreover, a significant relationship was observed between NKX6.1 and EMT marker expression levels; NKX6.1 knockdown inhibited cell invasion, and overexpression of NKX6.1 promoted cell proliferation in vitro. NKX6.1 is upregulated in HCC and is a reliable prognostic marker for patients with HCC.
Affiliation(s)
- Lin-Lin Huang
  - Department of Gastroenterology and Hepatology, The First Affiliated Hospital, Sun Yat-sen University, 58 Zhongshan 2nd Road, Guangzhou, 510080, China