1. Inference of gene regulatory networks based on the Light Gradient Boosting Machine. Comput Biol Chem 2022; 101:107769. [DOI: 10.1016/j.compbiolchem.2022.107769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 08/12/2022] [Accepted: 09/06/2022] [Indexed: 11/23/2022]

2. Wang Z, Zhan XX, Liu C, Zhang ZK. Quantification of network structural dissimilarities based on network embedding. iScience 2022; 25:104446. [PMID: 35677641 PMCID: PMC9168171 DOI: 10.1016/j.isci.2022.104446] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/26/2021] [Revised: 05/01/2022] [Accepted: 05/17/2022] [Indexed: 11/26/2022] Open
Abstract
Quantifying structural dissimilarities between networks is a fundamental and challenging problem in network science. Previous network comparison methods are based on the structural features, such as the length of shortest path and degree, which only contain part of the topological information. Therefore, we propose an efficient network comparison method based on network embedding, which considers the global structural information. In detail, we first construct a distance matrix for each network based on the distances between node embedding vectors derived from DeepWalk. Then, we define the dissimilarity between two networks based on Jensen-Shannon divergence of the distance distributions. Experiments on both synthetic and empirical networks show that our method outperforms the baseline methods and can distinguish networks well. In addition, we show that our method can capture network properties, e.g., average shortest path length and link density. Moreover, the experiment of modularity further implies the functionality of our method.
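The comparison step described in this abstract can be illustrated with a minimal sketch. It assumes node embeddings are already available (the DeepWalk step is not shown) and pools all pairwise Euclidean distances into one histogram per network, which is a simplification chosen for illustration and not necessarily the authors' exact definition. The function names, the bin count, and the choice of Euclidean distance are assumptions.

```python
import math
from itertools import combinations

def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of embedding vectors."""
    return [math.dist(u, v) for u, v in combinations(vectors, 2)]

def histogram(values, bins, lo, hi):
    """Normalised histogram over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values coincide
    for x in values:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def network_dissimilarity(emb_a, emb_b, bins=10):
    """Compare two networks through their embedding-distance distributions."""
    da, db = pairwise_distances(emb_a), pairwise_distances(emb_b)
    lo, hi = min(da + db), max(da + db)  # shared bin range for both networks
    return js_divergence(histogram(da, bins, lo, hi),
                         histogram(db, bins, lo, hi))

# Hypothetical 2-D embeddings for two tiny networks (made-up values).
emb_small = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
emb_large = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
d_same = network_dissimilarity(emb_small, emb_small)
d_diff = network_dissimilarity(emb_small, emb_large)
```

Using a shared bin range for both networks keeps the two histograms comparable; identical embeddings give a dissimilarity of zero.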
Affiliation(s)
- Zhipeng Wang: Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China
- Xiu-Xiu Zhan: Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China
- Chuang Liu: Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China
- Zi-Ke Zhang: College of Media and International Culture, Zhejiang University, Hangzhou 310058, PR China

3. Pio G, Mignone P, Magazzù G, Zampieri G, Ceci M, Angione C. Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction. Bioinformatics 2022; 38:487-493. [PMID: 34499112 DOI: 10.1093/bioinformatics/btab647] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/22/2020] [Revised: 07/23/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the cell functional organization across cell types, as well as to elucidating pathogenic processes and identifying molecular drug targets. Although significant effort has been devoted towards this direction, existing computational methods mainly rely on gene expression levels, possibly ignoring the information conveyed by mechanistic biochemical knowledge. Moreover, except for a few recent attempts, most of the existing approaches only consider the information of the organism under analysis, without exploiting the information of related model organisms. RESULTS We propose a novel method for the reconstruction of the human gene regulatory network, based on a transfer learning strategy that synergically exploits information from human and mouse, conveyed by gene-related metabolic features generated in silico from gene expression data. Specifically, we learn a predictive model from metabolic activity inferred via tissue-specific metabolic modelling of artificial gene knockouts. Our experiments show that the combination of our transfer learning approach with the constructed metabolic features provides a significant advantage in terms of reconstruction accuracy, as well as additional clues on the contribution of each constructed metabolic feature. AVAILABILITY AND IMPLEMENTATION The method, the datasets and all the results obtained in this study are available at: https://doi.org/10.6084/m9.figshare.c.5237687. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gianvito Pio: Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy; Big Data Lab, National Interuniversity Consortium for Informatics (CINI), Rome 00185, Italy
- Paolo Mignone: Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy; Big Data Lab, National Interuniversity Consortium for Informatics (CINI), Rome 00185, Italy
- Giuseppe Magazzù: School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley TS1 3BA, UK
- Guido Zampieri: School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley TS1 3BA, UK; Department of Biology, University of Padova, Padova 35121, Italy
- Michelangelo Ceci: Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy; Big Data Lab, National Interuniversity Consortium for Informatics (CINI), Rome 00185, Italy; Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana 1000, Slovenia
- Claudio Angione: School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley TS1 3BA, UK; Centre for Digital Innovation, Teesside University, Campus Heart, Tees Valley TS1 3BX, UK; Healthcare Innovation Centre, Teesside University, Campus Heart, Tees Valley TS1 3BX, UK

4. Sehhati M, Tabatabaiefar M, Gholami A, Sattari M. Using classification and K-means methods to predict breast cancer recurrence in gene expression data. Journal of Medical Signals & Sensors 2022; 12:122-126. [PMID: 35755980 PMCID: PMC9215834 DOI: 10.4103/jmss.jmss_117_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2021] [Accepted: 12/20/2021] [Indexed: 11/26/2022]
Abstract
Background: Breast cancer is a type of cancer that starts in the breast tissue and affects about 10% of women at different stages of their lives. In this study, we applied a new method to predict recurrence in biological networks made from gene expression data. Method: The method includes data collection, clustering, determination of differentiating genes, and classification. Eight techniques (random forest, support vector machine, neural network, random forest + k-means, hidden Markov model, joint mutual information, neural network + k-means, and support vector machine + k-means) were implemented on 12,172 genes and 200 samples. Results: Thirty genes were considered differentiating genes and were used for the classification. The results showed that random forest + k-means performed better than the other techniques. Two techniques, neural network + k-means and random forest + k-means, performed better than the others in identifying high-risk cases. Conclusion: Thirty of the 12,172 genes were used for classification, and the use of clustering improved the performance of the classification techniques.

5. Park B, Lee W, Han K. GeneCoNet: A web application server for constructing cancer patient-specific gene correlation networks with prognostic gene pairs. Computer Methods and Programs in Biomedicine 2021; 212:106465. [PMID: 34715518 DOI: 10.1016/j.cmpb.2021.106465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2020] [Accepted: 10/06/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE Most prognostic gene signatures that have been known for cancer are either individual genes or combination of genes. Both individual genes and combination of genes do not provide information on gene-gene relations, and often have less prognostic significance than random genes associated with cell proliferation. Several methods for generating sample-specific gene networks have been proposed, but programs implementing the methods are not publicly available. METHODS We have developed a method that builds gene correlation networks specific to individual cancer patients and derives prognostic gene correlations from the networks. A gene correlation network specific to a patient is constructed by identifying gene-gene relations that are significantly different from normal samples. Prognostic gene pairs are obtained by carrying out the Cox proportional hazards regression and the log-rank test for every gene pair. RESULTS We built a web application server called GeneCoNet with thousands of tumor samples in TCGA. Given a tumor sample ID of TCGA, GeneCoNet dynamically constructs a gene correlation network specific to the sample as output. As an additional output, it provides information on prognostic gene correlations in the network. GeneCoNet found several prognostic gene correlations for six types of cancer, but there were no prognostic gene pairs common to multiple cancer types. CONCLUSION Extensive analysis of patient-specific gene correlation networks suggests that patients with a larger subnetwork of prognostic gene pairs have shorter survival time than the others and that patients with a subnetwork that contains more genes participating in prognostic gene pairs have shorter survival time than the others. GeneCoNet can be used as a valuable resource for generating gene correlation networks specific to individual patients and for identifying prognostic gene correlations. It is freely accessible at http://geneconet.inha.ac.kr.
Affiliation(s)
- Byungkyu Park: Department of Computer Engineering, Inha University, Incheon, 22212, South Korea
- Wook Lee: Department of Computer Engineering, Inha University, Incheon, 22212, South Korea
- Kyungsook Han: Department of Computer Engineering, Inha University, Incheon, 22212, South Korea. http://biocomputing.inha.ac.kr

6. Wani MA, Garg P, Roy KK. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 2021; 59:2397-2408. [PMID: 34632545 DOI: 10.1007/s11517-021-02443-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/10/2020] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
The ubiquitous antimicrobial peptides (AMPs), with a broad range of antimicrobial activities, hold great promise for combating multi-drug resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we have explored a variety of machine learning classifiers to build in silico models for AMP prediction, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the various models generated, the RF classifier-based model performed best in both the internal [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and external validations [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted the known AMPs and non-AMPs that were kept aside as an additional external validation set. The performance assessment revealed three features, viz. ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may further be useful in the design and prediction of novel potential AMPs.
Affiliation(s)
- Mushtaq Ahmad Wani: Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India
- Prabha Garg: Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, 160062, Punjab, India
- Kuldeep K Roy: Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India; Department of Pharmaceutical Sciences, School of Health Sciences, University of Petroleum and Energy Studies (UPES), P.O. Bidholi, Dehradun, 248007, Uttarakhand, India

7. Nasiri E, Berahmand K, Rostami M, Dabiri M. A novel link prediction algorithm for protein-protein interaction networks by attributed graph embedding. Comput Biol Med 2021; 137:104772. [PMID: 34450380 DOI: 10.1016/j.compbiomed.2021.104772] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/29/2021] [Revised: 07/29/2021] [Accepted: 08/13/2021] [Indexed: 10/20/2022]
Abstract
The prediction of interactions in protein networks is very critical in various biological processes. In recent years, scientists have focused on computational approaches to predict the interactions of proteins. In protein-protein interaction (PPI) networks, each protein is accompanied by various features, including amino acid sequence, subcellular location, and protein domains. Embedding-based methods have been widely applied for many network analysis tasks, such as link prediction. The Deepwalk algorithm is one of the most popular graph embedding methods that capture the network structure using pure random walking. Here in this paper, we treat the protein-protein interaction prediction problem as a link prediction in attributed networks, and we use an attributed embedding approach to predict the interactions between proteins in the PPI network. In particular, the present paper seeks to present a modified version of Deepwalk based on feature selection for solving link prediction in the protein-protein interaction, which will benefit both network structure and protein features. More specifically the feature selection step consists of two distinct parts. First, a set of relevant features are selected from the original feature set, such that the dimensionality of features is reduced. Second, in the selected set of features, each feature is assigned with a weight based on its significance and therefore the contribution of each feature is distinguished from others. In this method, the new random walk model for link prediction will be introduced by integrating network structure and protein features, based on the assumption that two nodes on the network will be linked since they are nearby in the network. In order to justify the proposal, the authors carry out many experiments on protein-protein interaction networks for comparison with the state-of-the-art network embedding methods. 
The experimental results from the graphs indicate that our proposed approach is more capable compared to other link prediction approaches and increases the accuracy of prediction.
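As a rough illustration of the "pure random walking" that the abstract attributes to Deepwalk, the sketch below generates uniform truncated random walks over a small adjacency list. It does not implement the authors' feature-selection or feature-weighting step, nor the skip-gram training that would turn walks into embeddings. The function name, parameters, and protein identifiers are hypothetical.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node.

    adj maps each node to a list of its neighbours (undirected graph
    stored in both directions). Walks stop early at isolated nodes.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))  # uniform transition
            walks.append(walk)
    return walks

# Toy interaction graph; the protein names are made up for illustration.
adj = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}
walks = random_walks(adj, walk_length=5, walks_per_node=2)
```

An attributed variant would replace the uniform `rng.choice` with a transition probability that also depends on node features; how the paper defines that probability is not stated in the abstract, so it is left out here.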
Affiliation(s)
- Elahe Nasiri: Department of Information Technology and Communications, Azarbaijan Shahid Madani University, Tabriz, Iran
- Kamal Berahmand: School of Computer Sciences, Department of Science and Engineering, Queensland University of Technology, Brisbane, Australia
- Mehrdad Rostami: Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
- Mohammad Dabiri: Department of Plant Biotechnology, University of Kurdistan, Sanandaj, Iran

8. Deshmukh PR, Phalnikar R. Information extraction for prognostic stage prediction from breast cancer medical records using NLP and ML. Med Biol Eng Comput 2021; 59:1751-1772. [PMID: 34297300 DOI: 10.1007/s11517-021-02399-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2020] [Accepted: 07/01/2021] [Indexed: 11/24/2022]
Abstract
For cancer prediction, the prognostic stage is the main factor that helps medical experts to decide the optimal treatment for a patient. Specialists study prognostic stage information from medical reports, often in an unstructured form, and take a larger review time. The main objective of this study is to suggest a generic clinical decision-unifying staging method to extract the most reliable prognostic stage information of breast cancer from medical records of various health institutions. Additional prognostic elements should be extracted from medical reports to identify the cancer stage for getting an exact measure of cancer and improving care quality. This study has collected 465 pathological and clinical reports of breast cancer sufferers from India's reputed medical institutions. The unstructured records were found distinct from each institute. Anatomic and biologic factors are extracted from medical records using the natural language processing, machine learning and rule-based method for prognostic stage detection. This study has extracted anatomic stage, grade, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) from medical reports with high accuracy and predicted prognostic stage for both regions. The prognostic stage prediction's average accuracy is found 92% and 82% in rural and urban areas, respectively. It was essential to combine biological and anatomical elements under a single prognostic staging method. A generic clinical decision-unifying staging method for prognostic stage detection with great accuracy in various institutions of different regional areas suggests that the proposed research improves the prognosis of breast cancer.
Affiliation(s)
- Pratiksha R Deshmukh: School of Computer Engineering and Technology, MIT World Peace University, Pune, India, 411029; Department of Computer Engineering and Information Technology, College of Engineering, Pune, 411005, India
- Rashmi Phalnikar: School of Computer Engineering and Technology, MIT World Peace University, Pune, India, 411029

9. Dogu E, Albayrak YE, Tuncay E. Length of hospital stay prediction with an integrated approach of statistical-based fuzzy cognitive maps and artificial neural networks. Med Biol Eng Comput 2021; 59:483-496. [PMID: 33544271 DOI: 10.1007/s11517-021-02327-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/22/2020] [Accepted: 01/24/2021] [Indexed: 10/22/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a global burden, which is estimated to be the third leading cause of death worldwide by 2030. The economic burden of COPD grows continuously because it is not a curable disease. These conditions make COPD an important research field for artificial intelligence (AI) techniques in medicine. In this study, an integrated approach of statistical-based fuzzy cognitive maps (SBFCM) and artificial neural networks (ANN) is proposed for predicting the length of hospital stay of patients with COPD who were admitted to the hospital with an acute exacerbation. The SBFCM method is developed to determine the input variables of the ANN model. The SBFCM conducts statistical analysis to prepare preliminary information for the experts and then collects expert opinions accordingly, to define a conceptual map of the system. The integration of the SBFCM and ANN methods brings both statistical data and expert opinion into the prediction model. In the numerical application, the proposed approach outperformed the conventional approach and other machine learning algorithms with 79.95% accuracy, revealing the power of expert opinion involvement in medical decisions. A medical decision support framework is constructed for better prediction of length of hospital stay and more effective hospital management.
Affiliation(s)
- Elif Dogu: Industrial Engineering Dept., Galatasaray University, Ciragan Cad. No.: 36, Ortakoy, 34349, Istanbul, Turkey
- Y Esra Albayrak: Industrial Engineering Dept., Galatasaray University, Ciragan Cad. No.: 36, Ortakoy, 34349, Istanbul, Turkey
- Esin Tuncay: Yedikule Chest Diseases & Thoracic Surgery Training & Research Hospital, Belgrad Kapi Yolu Cad. No.: 1 34020 Zeytinburnu, Istanbul, Turkey

10. Das J, Barman Mandal S. Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping. Med Biol Eng Comput 2021; 59:673-691. [PMID: 33595791 DOI: 10.1007/s11517-021-02324-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2020] [Accepted: 01/18/2021] [Indexed: 11/25/2022]
Abstract
Classification of Homo sapiens gene behavior employing computational biology is a recent research trend. But monitoring gene activity profile and genetic behavior from the alphabetic DNA sequence using a non-invasive method is a tremendous challenge in functional genomics. The present paper addresses this issue and attempts to differentiate Homo sapiens genes using the linear discriminant analysis (LDA) method. Annotated protein coding sequences of Homo sapiens genes, collected from NCBI, are taken as test samples. The minimum entropy-based mapping (MEM) technique helps to extract the most information from the numerical DNA sequences. The proposed LDA technique has successfully classified Homo sapiens genes based on the following features: composition of hydrophilic amino acids, dominance of arginine amino acid, and magnitude and size of individual amino acids. The proposed algorithm is successfully tested on 84 Homo sapiens healthy and cancer genes of the prostate and breast cells. Classification performance of the proposed LDA technique is judged by sensitivity (89.12%), specificity (91.9%), accuracy (90.87%), F1 score (92.03%), Matthews' correlation coefficients (81.04%), and miss rate (9.12%), and it outperforms four other existing classifiers. The results are cross-validated through Rayleigh PDF and mutual information technique. Fisher test, 2-sample T-test, and relative entropy test are considered to verify the efficacy of the present classifier.
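For readers unfamiliar with the performance measures listed in this abstract, the following sketch shows the standard definitions computed from a binary confusion matrix. The counts are made up for illustration and are not taken from the paper; the function name and dictionary keys are assumptions.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # true positive rate (recall)
    specificity = tn / (tn + fp)           # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "miss_rate": fn / (tp + fn),       # false negative rate = 1 - sensitivity
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

Under these definitions, miss rate and sensitivity always sum to one, which is a quick consistency check when reading reported figures.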
Affiliation(s)
- Joyshri Das: Institute of Radio Physics & Electronics, University of Calcutta, Kolkata, India
- Soma Barman Mandal: Institute of Radio Physics & Electronics, University of Calcutta, Kolkata, India

11. Zhang Y, Li Y, Deng W, Huang K, Yang C. Complex networks identification using Bayesian model with independent Laplace prior. Chaos (Woodbury, N.Y.) 2021; 31:013107. [PMID: 33754749 DOI: 10.1063/5.0031134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2020] [Accepted: 12/10/2020] [Indexed: 06/12/2023]
Abstract
Identification of complex networks from limited and noise-contaminated data is an important yet challenging task, which has attracted researchers from different disciplines recently. In this paper, the underlying feature of a complex network identification problem was analyzed and translated into a sparse linear programming problem. Then, a general framework based on the Bayesian model with independent Laplace prior was proposed to guarantee the sparseness and accuracy of identification results after analyzing influences of different prior distributions. At the same time, a three-stage hierarchical method was designed to resolve the puzzle that the Laplace distribution is not conjugate to the normal distribution. Last, the variational Bayesian was introduced to improve the efficiency of the network reconstruction task. The high accuracy and robust properties of the proposed method were verified by conducting both general synthetic network and real network identification tasks based on the evolutionary game dynamic. Compared with five other classical algorithms, the numerical experiments indicate that the proposed model can outperform these methods in both accuracy and robustness.
Affiliation(s)
- Yichi Zhang: School of Automation, Central South University, Changsha 410083, China
- Yonggang Li: School of Automation, Central South University, Changsha 410083, China
- Wenfeng Deng: School of Automation, Central South University, Changsha 410083, China
- Keke Huang: School of Automation, Central South University, Changsha 410083, China
- Chunhua Yang: School of Automation, Central South University, Changsha 410083, China

12. Jiang F, Yu X, Zhao H, Gong D, Du J. Ensemble learning based on random super-reduct and resampling. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09922-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/15/2023]

13. Cong H, Liu H, Chen Y, Cao Y. Self-evoluting framework of deep convolutional neural network for multilocus protein subcellular localization. Med Biol Eng Comput 2020; 58:3017-3038. [PMID: 33078303 DOI: 10.1007/s11517-020-02275-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/02/2019] [Accepted: 10/14/2020] [Indexed: 12/12/2022]
Abstract
In the present paper, deep convolutional neural network (DCNN) is applied to multilocus protein subcellular localization as it is more suitable for multi-class classification. There are two main problems with this application. First, the appropriate features for correlation between multiple sites are hard to find. Second, the classifier structure is difficult to determine as it is greatly affected by the distribution of classified data. To solve these problems, a self-evoluting framework using DCNNs for multilocus protein subcellular localization is proposed. It has three characteristics that the previous algorithms do not. The first is that it combines the ant colony algorithm with the DCNN to form a self-evoluting algorithm for multilocus protein subcellular localization. The second is that it randomly groups subcellular sites using a limited random k-labelsets multi-label classification method. It also solves complex problems in a divide-and-conquer approach and proposes a flexible expansion model. The third is that it realizes the random selection feature extraction method in the positioning process and avoids the defects in individual feature extraction methods. The algorithm in the present paper is tested on the human database, and the overall correct rate is 67.17%, which is higher than that for the stacked self-encoder (SAE), support vector machine (SVM), random forest classifier (RF), or single deep convolutional neural network.
Graphical abstract: The algorithm mentioned in the present paper mainly includes four parts: protein sequence data preprocessing, integrated DCNN model construction, finding the optimal DCNN combination by ant colony optimization, and protein subcellular localization for sequences. These parts run sequentially, and the data obtained in each part is the basis for the next. In the data preprocessing part, the limited RAkEL multi-label classification method is used to randomly group subcellular sites. At the same time, the feature fusion of protein sequences is carried out by using multiple feature extraction methods. Each combination of features and site information corresponds to a DCNN model. In the part that finds the optimal DCNN combination by ant colony optimization, the main purpose is to find the best combination of DCNN models through the global optimization ability of the ant colony algorithm. The positioning of sequences mainly obtains the multilocus subcellular localization from the optimal model combination.
Affiliation(s)
- Hanhan Cong: School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
- Hong Liu: School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
- Yuehui Chen: School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yi Cao: School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China

14. Avuçlu E, Elen A. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Med Biol Eng Comput 2020; 58:2775-2788. [PMID: 32920727 DOI: 10.1007/s11517-020-02260-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/05/2020] [Accepted: 08/29/2020] [Indexed: 01/23/2023]
Abstract
Parkinson's disease is a neurological disorder that causes partial or complete loss of motor reflexes and speech and affects thinking, behavior, and other vital functions affecting the nervous system. Parkinson's disease causes impaired speech and motor abilities (writing, balance, etc.) in about 90% of patients and is often seen in older people. Some signs (deterioration of vocal cords) in medical voice recordings from Parkinson's patients are used to diagnose this disease. The database used in this study contains biomedical speech voice from 31 people of different age and sex related to this disease. The performance comparison of the machine learning algorithms k-Nearest Neighborhood (k-NN), Random Forest, Naive Bayes, and Support Vector Machine classifiers was performed with the used database. Moreover, the best classifier was determined for the diagnosis of Parkinson's disease. Eleven different training and test data (45 × 55, 50 × 50, 55 × 45, 60 × 40, 65 × 35, 70 × 30, 75 × 25, 80 × 20, 85 × 15, 90 × 10, 95 × 5) were processed separately. The data obtained from these training and tests were compared with statistical measurements. The training results of the k-NN classification algorithm were generally 100% successful. The best test result was obtained from Random Forest classifier with 85.81%. All statistical results and measured values are given in detail in the experimental studies section.
Collapse
Affiliation(s)
- Emre Avuçlu
- Department of Computer Technology, Aksaray University, Aksaray, Turkey.
| | - Abdullah Elen
- Department of Computer Technology, Karabuk University, Karabuk, Turkey
|
15
|
Aerial Scene Classification through Fine-Tuning with Adaptive Learning Rates and Label Smoothing. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10175792] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 11/17/2022]
Abstract
Remote Sensing (RS) image classification has recently attracted great attention for its application in different tasks, including environmental monitoring, battlefield surveillance, and geospatial object detection. The best practices for these tasks often involve transfer learning from pre-trained Convolutional Neural Networks (CNNs). A common approach in the literature is to employ CNNs for feature extraction and subsequently train classifiers on those features. In this paper, we propose the adoption of transfer learning by fine-tuning pre-trained CNNs for end-to-end aerial image classification. Our approach performs feature extraction from the fine-tuned neural networks and remote sensing image classification with a Support Vector Machine (SVM) model with linear and Radial Basis Function (RBF) kernels. To tune the learning rate hyperparameter, we employ a linear decay learning rate scheduler as well as cyclical learning rates. Moreover, in order to mitigate the overfitting problem of pre-trained models, we apply label smoothing regularization. For the fine-tuning and feature extraction process, we adopt the Inception-v3 and Xception inception-based CNNs, as well as the residual-based networks ResNet50 and DenseNet121. We present extensive experiments on two real-world remote sensing image datasets: AID and NWPU-RESISC45. The results show that the proposed method exhibits classification accuracy of up to 98%, outperforming other state-of-the-art methods.
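The three training techniques named in this abstract (label smoothing, a linear decay learning rate scheduler, and cyclical learning rates) can be sketched in a few lines. This is a minimal illustration using their commonly cited textbook forms; the smoothing factor `epsilon`, the triangular cycle shape, and all function names are assumptions of this sketch, since the abstract does not give the authors' settings.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: move epsilon of the probability mass to a uniform prior.

    epsilon=0.1 is an assumed value, not taken from the abstract.
    """
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def linear_decay_lr(step, total_steps, initial_lr, final_lr=0.0):
    """Learning rate that falls linearly from initial_lr to final_lr."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_lr + frac * (final_lr - initial_lr)


def triangular_cyclical_lr(step, step_size, base_lr, max_lr):
    """Triangular cyclical schedule: rise for step_size steps, then fall back.

    The triangular shape is one common variant, assumed here for illustration.
    """
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    print(smooth_labels([0.0, 0.0, 1.0, 0.0]))
    print([linear_decay_lr(s, 100, 0.01) for s in (0, 50, 100)])
    print([triangular_cyclical_lr(s, 10, 0.001, 0.01) for s in (0, 5, 10, 15, 20)])
```

Smoothed targets still sum to one, so they can replace one-hot targets in a cross-entropy loss without further changes.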
|
16
|
Barracchia EP, Pio G, D’Elia D, Ceci M. Prediction of new associations between ncRNAs and diseases exploiting multi-type hierarchical clustering. BMC Bioinformatics 2020; 21:70. [PMID: 32093606 PMCID: PMC7041288 DOI: 10.1186/s12859-020-3392-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 08/30/2019] [Revised: 01/22/2020] [Accepted: 01/29/2020] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND The study of functional associations between ncRNAs and human diseases is a pivotal task of modern research to develop new and more effective therapeutic approaches. Nevertheless, it is not a trivial task since it involves entities of different types, such as microRNAs, lncRNAs or target genes whose expression also depends on endogenous or exogenous factors. Such complexity can be addressed by representing the involved biological entities and their relationships as a network and by exploiting network-based computational approaches able to identify new associations. However, existing methods are limited to homogeneous networks (i.e., consisting of only one type of objects and relationships) or can exploit only a small subset of the features of biological entities, such as the presence of a particular binding domain, enzymatic properties or their involvement in specific diseases. RESULTS To overcome the limitations of existing approaches, we propose the system LP-HCLUS, which exploits a multi-type hierarchical clustering method to predict possibly unknown ncRNA-disease relationships. In particular, LP-HCLUS analyzes heterogeneous networks consisting of several types of objects and relationships, each possibly described by a set of features, and extracts multi-type clusters that are subsequently exploited to predict new ncRNA-disease associations. The extracted clusters are overlapping, hierarchically organized, involve entities of different types, and allow LP-HCLUS to capture multiple roles of ncRNAs in diseases at different levels of granularity. Our experimental evaluation, performed on heterogeneous attributed networks consisting of microRNAs, lncRNAs, diseases, genes and their known relationships, shows that LP-HCLUS obtains better results than existing approaches. The biological relevance of the obtained results was evaluated according to both quantitative (i.e., TPR@k, Areas Under the TPR@k, ROC and Precision-Recall curves) and qualitative (i.e., according to the consultation of the existing literature) criteria. CONCLUSIONS The obtained results prove the utility of LP-HCLUS to conduct robust predictive studies on the biological role of ncRNAs in human diseases. The produced predictions can therefore be reliably considered as new, previously unknown, relationships among ncRNAs and diseases.
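The TPR@k measure mentioned in this abstract can be illustrated with a short sketch. It assumes the usual reading of the term: the fraction of known positive associations that appear among the k highest-ranked predictions. The averaging used for the area, the function names, and the pair identifiers are hypothetical and are not taken from the paper.

```python
def tpr_at_k(ranked_pairs, known_positives, k):
    """Fraction of known positive pairs found among the top-k ranked predictions."""
    if not known_positives:
        raise ValueError("known_positives must not be empty")
    top_k = set(ranked_pairs[:k])
    return len(top_k & set(known_positives)) / len(known_positives)


def mean_tpr_curve(ranked_pairs, known_positives, max_k):
    """Mean of TPR@k for k = 1..max_k.

    Used here as a simple normalised stand-in for the area under the
    TPR@k curve; this normalisation is an assumption of the sketch.
    """
    curve = [tpr_at_k(ranked_pairs, known_positives, k) for k in range(1, max_k + 1)]
    return sum(curve) / len(curve)


if __name__ == "__main__":
    # Hypothetical (ncRNA, disease) pairs, ranked by predicted score, best first.
    ranked = [("rna_a", "dis_1"), ("rna_b", "dis_1"),
              ("rna_c", "dis_2"), ("rna_b", "dis_2")]
    positives = {("rna_a", "dis_1"), ("rna_c", "dis_2")}
    for k in range(1, 5):
        print(k, tpr_at_k(ranked, positives, k))
    print("mean TPR@k:", mean_tpr_curve(ranked, positives, 4))
```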
Affiliation(s)
- Emanuele Pio Barracchia
- University of Bari Aldo Moro - Department of Computer Science, Via Orabona, 4, Bari, 70125 Italy
- Gianvito Pio
- University of Bari Aldo Moro - Department of Computer Science, Via Orabona, 4, Bari, 70125 Italy
- Domenica D’Elia
- CNR, Institute for Biomedical Technologies, Bari, 70126 Italy
- Michelangelo Ceci
- University of Bari Aldo Moro - Department of Computer Science, Via Orabona, 4, Bari, 70125 Italy
- Big Data Laboratory, National Interuniversity Consortium for Informatics (CINI), Rome, 00185 Italy
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, Ljubljana, 1000 Slovenia
|