1
Zhuo L, Chen Y, Song B, Liu Y, Su Y. A model for predicting ncRNA-protein interactions based on graph neural networks and community detection. Methods 2022; 207:74-80. [PMID: 36108992 DOI: 10.1016/j.ymeth.2022.09.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/24/2022] [Revised: 08/07/2022] [Accepted: 09/03/2022] [Indexed: 10/31/2022]
Abstract
Non-coding RNAs (ncRNAs) play a considerable role in biological processes such as gene transcription and gene expression. Exploring ncRNA-protein interactions (NPIs) is of great significance, but experimental techniques are expensive in both time and labor. This has prompted the development of computational algorithms based on traditional statistics and artificial intelligence. However, these algorithms usually require sequence or structural feature vectors of the molecules. Although graph neural networks (GNNs) have been widely used in recent academic and industrial research, their potential for detecting NPIs remains unexplored. Hence, we present a novel GNN-based model for detecting NPIs, in which NPI detection is transformed into a graph link prediction problem. Specifically, the proposed method uses two groups of labels to distinguish the two types of nodes, ncRNA and protein, which alleviates the problem of over-coupling in the graph network. Subsequently, the ncRNA and protein embeddings are initially optimized based on the cluster membership of nodes in the graph. Moreover, the model applies a self-attention mechanism to preserve the graph topology and reduce information loss during pooling. The experimental results indicate that the proposed model achieves superior performance.
Affiliation(s)
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, Zhejiang 325035, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yifan Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Yansen Su
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
2
3
Dutta S, Mallipeddi R, Das KN. Hybrid selection based multi/many-objective evolutionary algorithm. Sci Rep 2022; 12:6861. [PMID: 35478221 PMCID: PMC9046264 DOI: 10.1038/s41598-022-10997-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Accepted: 04/18/2022] [Indexed: 11/09/2022]
Abstract
In the last decade, numerous multi/many-objective evolutionary algorithms (MOEAs) have been proposed to handle multi/many-objective problems (MOPs) with challenges such as discontinuous and degenerate Pareto fronts (PFs). MOEAs in the literature can be broadly divided into three categories based on the selection strategy employed: dominance-based, decomposition-based, and indicator-based. Each category has its advantages and disadvantages when solving MOPs with diverse characteristics. In this work, we propose a Hybrid Selection based MOEA, referred to as HS-MOEA, which is a simple yet effective hybridization of dominance, decomposition, and indicator-based concepts. In other words, we propose a new environmental selection strategy in which Pareto dominance, reference vectors, and an indicator are combined to effectively balance the diversity and convergence of the MOEA during evolution. The superior performance of HS-MOEA compared with state-of-the-art MOEAs is demonstrated through experimental simulations on the DTLZ and WFG test suites with up to 10 objectives.
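For readers unfamiliar with the dominance concept named in this abstract, the following is a minimal sketch of a Pareto-dominance test and a non-dominated filter, assuming every objective is minimized. It is not the HS-MOEA environmental selection, which also uses reference vectors and an indicator; the function names `dominates` and `non_dominated` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (assumption: all objectives minimized)."""
    # a must be no worse than b in every objective...
    no_worse = all(x <= y for x, y in zip(a, b))
    # ...and strictly better in at least one objective.
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical two-objective values, chosen only for illustration.
    population = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
    print(non_dominated(population))
```

The filter is quadratic in the population size, which is acceptable for a sketch; production MOEAs typically use faster non-dominated sorting.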
Affiliation(s)
- Saykat Dutta
- Department of Mathematics, National Institute of Technology Silchar, Silchar, India
- Rammohan Mallipeddi
- Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University, Daegu, South Korea
- Kedar Nath Das
- Department of Mathematics, National Institute of Technology Silchar, Silchar, India
4
5
Elahi I, Ali H, Asif M, Iqbal K, Ghadi Y, Alabdulkreem E. An evolutionary algorithm for multi-objective optimization of freshwater consumption in textile dyeing industry. PeerJ Comput Sci 2022; 8:e932. [PMID: 35494829 PMCID: PMC9044317 DOI: 10.7717/peerj-cs.932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Accepted: 03/03/2022] [Indexed: 06/14/2023]
Abstract
Optimization remains challenging even though numerous multi-objective evolutionary algorithms have been developed. Most multi-objective evolutionary algorithms fail to find a well-spread set of best solutions and need more fitness evaluations to find the best solution. This article proposes an extended version of a multi-objective group counseling optimizer called MOGCO-II. The proposed algorithm is compared with MOGCO, MOPSO, MOCLPSO, and NSGA-II using well-known benchmark problems such as the Zitzler-Deb-Thiele (ZDT) functions. The experiments show that the proposed algorithm generates better solutions than the other algorithms. The proposed algorithm also needs fewer fitness evaluations to find the optimal Pareto front. Moreover, the textile dyeing industry needs a large amount of fresh water for the dyeing process. After the dyeing process, the industry discharges a massive amount of polluted water, which leads to serious environmental problems. Hence, we propose a MOGCO-II based optimization scheduling model to reduce freshwater consumption in the textile dyeing industry. The results show that the optimization scheduling model reduces freshwater consumption in the textile dyeing industry by up to 35% compared to manual scheduling.
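As an illustration of the benchmark family named in this abstract, here is a minimal sketch of ZDT1, the first function of the ZDT suite, in its standard two-objective minimization form with decision variables in [0, 1]. The abstract does not say which ZDT functions or dimensions were used, so the choice of ZDT1 and the function name `zdt1` are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def zdt1(x: Sequence[float]) -> Tuple[float, float]:
    """Evaluate the two ZDT1 objectives for a decision vector with entries in [0, 1]."""
    n = len(x)
    if n < 2:
        raise ValueError("ZDT1 needs at least two decision variables")
    f1 = x[0]
    # g measures distance from the Pareto-optimal set; g == 1 when x[1:] are all zero.
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


if __name__ == "__main__":
    # On the Pareto-optimal set (x[1:] all zero), f2 = 1 - sqrt(f1).
    print(zdt1([0.25, 0.0, 0.0]))
```

Counting calls to such a function is one common way to measure the number of fitness evaluations an algorithm consumes.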
Affiliation(s)
- Ihsan Elahi
- Department of Computer Science, National Textile University, Faisalabad, Punjab, Pakistan
- Department of Computational Sciences, The University of Faisalabad (TUF), Faisalabad, Punjab, Pakistan
- Hamid Ali
- Department of Computer Science, National Textile University, Faisalabad, Punjab, Pakistan
- Muhammad Asif
- Department of Computer Science, National Textile University, Faisalabad, Punjab, Pakistan
- Kashif Iqbal
- Department of Textile Engineering, National Textile University, Faisalabad, Punjab, Pakistan
- Yazeed Ghadi
- Department of Software Engineering/Computer Science, Al Ain University, Al Ain, UAE
- Eatedal Alabdulkreem
- Computer Sciences Department, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University (PNU), Riyadh, Saudi Arabia
6
Jia Y, Huang S, Zhang T. KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest. Front Genet 2021; 12:811158. [PMID: 34912382 PMCID: PMC8667860 DOI: 10.3389/fgene.2021.811158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2021] [Accepted: 11/15/2021] [Indexed: 02/04/2023]
Abstract
DNA-binding protein (DBP) is a protein with a special DNA binding domain that is associated with many important molecular biological mechanisms. Rapid development of computational methods has made it possible to predict DBP on a large scale; however, existing methods do not fully integrate DBP-related features, resulting in rough prediction results. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy on the independent test dataset PDB186 of 81.22%, which is the highest of all existing methods.
Affiliation(s)
- Yuran Jia
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Shan Huang
- Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
7
Chen X, Lin Y, Qu Q, Ning B, Chen H, Li X. An epistasis and heterogeneity analysis method based on maximum correlation and maximum consistence criteria. Math Biosci Eng 2021; 18:7711-7726. [PMID: 34814271 DOI: 10.3934/mbe.2021382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Tumor heterogeneity significantly increases the difficulty of tumor treatment. The same drugs and treatment methods have different effects on different tumor subtypes. Therefore, tumor heterogeneity is one of the main sources of poor prognosis, recurrence, and metastasis. Some computational methods already study tumor heterogeneity at the level of the genome, transcriptome, and histology, but these methods still have certain limitations. In this study, we propose an epistasis and heterogeneity analysis method based on genomic single nucleotide polymorphism (SNP) data. First, a maximum correlation and maximum consistence criterion is designed, based on the Bayesian network K2 score and information entropy, for evaluating genomic epistasis. As the number of SNPs increases, the space of epistatic combinations grows sharply, resulting in a combinatorial explosion. Therefore, we next use an improved genetic algorithm to search the SNP epistatic combination space for potential feasible epistasis solutions. Multiple epistasis solutions represent different pathogenic gene combinations, which may lead to different tumor subtypes, that is, heterogeneity. Finally, an XGBoost classifier is trained on the feature SNPs that constitute the multiple sets of epistatic solutions to verify that considering tumor heterogeneity improves the accuracy of tumor subtype prediction. To demonstrate the effectiveness of our method, we evaluate the power of multiple epistasis recognition and the accuracy of tumor subtype classification. Extensive simulation results show that our method has better power and prediction accuracy than previous methods.
Affiliation(s)
- Xia Chen
- School of Basic Education, Changsha Aeronautical Vocational and Technical College, Changsha, Hunan 410124, China
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Yexiong Lin
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Qiang Qu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Bin Ning
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Haowen Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Xiong Li
- School of Software, East China Jiaotong University, Nanchang 330013, China
8
Ru X, Ye X, Sakurai T, Zou Q, Xu L, Lin C. Current status and future prospects of drug-target interaction prediction. Brief Funct Genomics 2021; 20:312-322. [PMID: 34189559 DOI: 10.1093/bfgp/elab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/08/2021] [Revised: 06/01/2021] [Accepted: 06/04/2021] [Indexed: 01/09/2023]
Abstract
Drug-target interaction prediction is important for drug development and drug repurposing. Many computational methods have been proposed for drug-target interaction prediction because of their potential to reduce time and cost. In this review, we introduce molecular docking and machine learning-based methods, which have been widely applied to drug-target interaction prediction. In particular, machine learning-based methods are divided into different types according to the data processing form and task type. For each type of method, we provide a specific description and propose some solutions to improve its capability. Knowledge of heterogeneous networks and learning to rank is also summarized in this review. As far as we know, this is the first comprehensive review that summarizes heterogeneous networks and learning to rank in drug-target interaction prediction. Moreover, we propose three aspects that can be explored in depth in future research.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Tetsuya Sakurai
- Department of Computer Science and director of the C-AIR, University of Tsukuba
- Quan Zou
- University of Electronic Science and Technology of China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic
9
Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. Biomed Res Int 2020; 2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]
Abstract
Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
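The two feature encodings named in this abstract can be sketched as follows. This is a minimal illustration of the commonly used definitions of AAC (fraction of each of the 20 standard residues) and CKSAAP (fraction of each of the 400 ordered residue pairs separated by exactly k positions), not the authors' implementation. The function names `aac` and `cksaap`, and the choice to skip non-standard residues, are assumptions.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq: str) -> List[float]:
    """Amino acid composition: a 20-dimensional vector of residue fractions."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [seq.count(a) / total for a in AMINO_ACIDS]


def cksaap(seq: str, k: int) -> Dict[str, float]:
    """Fractions of the 400 ordered residue pairs separated by exactly k positions."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    windows = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # assumption: pairs with non-standard residues are skipped
            counts[pair] += 1
            windows += 1
    if windows == 0:
        raise ValueError("sequence too short for this gap k")
    return {pair: c / windows for pair, c in counts.items()}


if __name__ == "__main__":
    toy = "ACAC"  # hypothetical toy sequence, not taken from the datasets
    print(aac(toy)[:2])
    print(cksaap(toy, 1)["AA"])
```

Concatenating `aac(seq)` with the `cksaap(seq, k)` values for several gaps k gives one fixed-length vector per protein, which is the kind of input an SVM classifier expects.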