1
Ladjal M, Bouamar M, Brik Y, Djerioui M. A decision fusion method based on classification models for water quality monitoring. Environmental Science and Pollution Research International 2023; 30:22532-22549. [PMID: 36301387 DOI: 10.1007/s11356-022-23418-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/21/2021] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Monitoring water quality is a major priority for countries worldwide. Classification techniques based on support vector machines (SVMs) and artificial neural networks (ANNs) have been widely used in several applications of water research. Assessing water quality accurately and efficiently with innovative approaches yields additional knowledge and information for building an intelligent monitoring system. In this paper, we present the use of principal component analysis (PCA) combined with SVM and ANN classifiers, fused by the decision templates combination method. PCA was used for feature selection from the original database. The multi-layer perceptron (MLP) network and the one-against-all strategy for SVM have been widely used. Decision templates are applied to increase the accuracy of the water quality classification. The approach was employed to assess the water quality of the Tilesdit dam in Algeria, the study area, using a dataset of eight physicochemical parameters collected over the period 2009-2018, such as temperature, pH, electrical conductivity, and turbidity. Selecting suitable parameters for the models can improve the performance of the classification process. The results are assessed experimentally on the collected dataset in terms of accuracy, running time of the training and test phases, and robustness to noise. Various scenarios are examined in a comparative study to obtain the best results at the decision step, with and without feature selection of the input data. From the results, we found that the integration of SVM and ANN with PCA yields accuracy higher than 98%. The combination by decision templates of the two classifiers, SVM and ANN, with PCA yields an accuracy of 99.24% using k-fold cross-validation. The data fusion combination markedly improved the results of the proposed monitoring framework, which showed considerable ability in surface water quality assessment.
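The decision templates step described in this abstract can be illustrated with a small sketch. It assumes the common formulation in which each class template is the mean decision profile (one row of class scores per base classifier) of that class's training samples, and a new sample is assigned to the class with the nearest template under squared Euclidean distance. The abstract does not state which similarity measure the authors used, and the class names and scores below are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict

def build_templates(profiles, labels):
    """Average the decision profiles of each class to obtain its template.

    profiles: list of decision profiles; each profile is a list of rows,
              one row of class scores per base classifier.
    labels:   true class label of each profile.
    """
    sums, counts = {}, defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [[0.0] * len(row) for row in profile]
        for i, row in enumerate(profile):
            for j, score in enumerate(row):
                sums[label][i][j] += score
        counts[label] += 1
    return {
        label: [[v / counts[label] for v in row] for row in total]
        for label, total in sums.items()
    }

def fuse(profile, templates):
    """Return the class whose template is nearest (squared Euclidean)."""
    def distance(template):
        return sum(
            (a - b) ** 2
            for row_p, row_t in zip(profile, template)
            for a, b in zip(row_p, row_t)
        )
    return min(templates, key=lambda label: distance(templates[label]))

# Toy data (hypothetical): two base classifiers, e.g. an SVM and an MLP,
# each emitting scores for the classes ("good", "poor").
train_profiles = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.9, 0.1]],
    [[0.2, 0.8], [0.3, 0.7]],
    [[0.1, 0.9], [0.4, 0.6]],
]
train_labels = ["good", "good", "poor", "poor"]

templates = build_templates(train_profiles, train_labels)
# The two classifiers disagree on this sample; the templates arbitrate.
prediction = fuse([[0.6, 0.4], [0.3, 0.7]], templates)
```

In this toy case the first classifier leans towards "good" and the second towards "poor"; the fused decision follows whichever class template lies closer to the whole profile rather than either classifier alone.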
Affiliation(s)
- Mohamed Ladjal: LASS, Laboratory of Analysis of Signals and Systems, Department of Electronics, Faculty of Technology, University of M'sila, M'sila, Algeria
- Mohamed Bouamar: LASS, Laboratory of Analysis of Signals and Systems, Department of Electronics, Faculty of Technology, University of M'sila, M'sila, Algeria
- Youcef Brik: LASS, Laboratory of Analysis of Signals and Systems, Department of Electronics, Faculty of Technology, University of M'sila, M'sila, Algeria
- Mohamed Djerioui: LASS, Laboratory of Analysis of Signals and Systems, Department of Electronics, Faculty of Technology, University of M'sila, M'sila, Algeria
2
Muggia L, Ametrano CG, Sterflinger K, Tesei D. An Overview of Genomics, Phylogenomics and Proteomics Approaches in Ascomycota. Life (Basel) 2020; 10:E356. [PMID: 33348904 PMCID: PMC7765829 DOI: 10.3390/life10120356] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/12/2020] [Revised: 12/10/2020] [Accepted: 12/12/2020] [Indexed: 12/26/2022] Open
Abstract
Fungi are among the most successful eukaryotes on Earth: they have evolved strategies to survive in the most diverse environments and stressful conditions and have been selected and exploited for multiple aims by humans. The characteristic features intrinsic to Fungi have required evolutionary changes and adaptations at deep molecular levels. Omics approaches, nowadays including genomics, metagenomics, phylogenomics, transcriptomics, metabolomics, and proteomics, have enormously advanced the understanding of fungal diversity at diverse taxonomic levels, under changeable conditions and in still under-investigated environments. These approaches can be applied both to environmental communities and to individual organisms, either in nature or in axenic culture, and have led traditional morphology-based fungal systematics to increasingly adopt molecular-based approaches. The advent of next-generation sequencing technologies was key to boosting advances in fungal genomics and proteomics research. Much effort has also been directed towards the development of methodologies for optimal genomic DNA and protein extraction and separation. To date, the number of proteomics investigations in Ascomycetes exceeds those carried out in any other fungal group. This is primarily due to their prominent involvement in plant and animal diseases and in multiple industrial applications, and therefore to the need to understand the biological basis of the infectious process in order to develop mechanisms for biological control, as well as to detect key proteins with roles in stress survival. Here we present as comprehensive an overview as possible of the major advances, mainly of the past decade, in the fields of genomics (including phylogenomics) and proteomics of Ascomycota, focusing particularly on those reporting on opportunistic pathogenic, extremophilic, polyextremotolerant and lichenized fungi. We also review the most widely used genome sequencing technologies and the methods for DNA sequence and protein analyses applied so far to fungi.
Affiliation(s)
- Lucia Muggia: Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
- Claudio G. Ametrano: Grainger Bioinformatics Center, Department of Science and Education, The Field Museum, Chicago, IL 60605, USA
- Katja Sterflinger: Academy of Fine Arts Vienna, Institute of Natural Sciences and Technology in the Arts, 1090 Vienna, Austria
- Donatella Tesei: Department of Biotechnology, University of Natural Resources and Life Sciences, 1190 Vienna, Austria
3
Abstract
Background:
Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for identifying protein subcellular locations efficiently and effectively. In particular, the rapidly increasing number of protein sequences entering the genome databases calls for the development of automated analysis methods.
Methods:
In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers.
Result & Conclusion:
Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.
Affiliation(s)
- Ting-He Zhang: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Shao-Wu Zhang: School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
4
Prediction of drug-target interaction by integrating diverse heterogeneous information source with multiple kernel learning and clustering methods. Comput Biol Chem 2019; 78:460-467. [DOI: 10.1016/j.compbiolchem.2018.11.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/23/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 02/08/2023]
5
Gene Prediction in Metagenomic Fragments with Deep Learning. BioMed Research International 2017; 2017:4740354. [PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/30/2017] [Accepted: 10/08/2017] [Indexed: 01/14/2023]
Abstract
Next-generation sequencing technologies used in metagenomics yield numerous sequencing fragments that come from thousands of different species. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multiple features (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking network learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results of 10-fold cross-validation (10 CV) and independent tests show that Meta-MFDL is a powerful tool for identifying genes in metagenomic fragments.
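To make the idea of feature fusion concrete, the sketch below computes two of the feature groups named in the abstract for a single open reading frame and concatenates them into one vector. It is an illustration under stated assumptions, not Meta-MFDL itself: the Z-curve here is the simple global form computed from base frequencies (the abstract does not say which variant was used), monoamino acid usage and ORF length coverage are omitted, and the function names and the example sequence are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order, so every feature vector is aligned.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

def monocodon_usage(orf):
    """Relative frequency of each of the 64 codons, read in frame."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    return [codons.count(c) / total if total else 0.0 for c in CODONS]

def z_curve(orf):
    """Global Z-curve components computed from base frequencies.

    x: purine (A, G) versus pyrimidine (C, T)
    y: amino (A, C) versus keto (G, T)
    z: weak (A, T) versus strong (G, C) hydrogen bonding
    """
    n = len(orf) or 1
    a, c, g, t = (orf.count(base) / n for base in "ACGT")
    return [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]

def fused_features(orf):
    """Concatenate the feature groups into one vector (feature fusion)."""
    return monocodon_usage(orf) + z_curve(orf)

# Hypothetical 12-nucleotide ORF: ATG GCC AAA TAA.
features = fused_features("ATGGCCAAATAA")
```

The resulting vector has 64 codon frequencies followed by 3 Z-curve components; a classifier would then be trained on such vectors.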
6
Qiao S, Yan B, Li J. Ensemble learning for protein multiplex subcellular localization prediction based on weighted KNN with different features. Appl Intell 2017. [DOI: 10.1007/s10489-017-1029-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
7
Yan XY, Zhang SW, Zhang SY. Prediction of drug–target interaction by label propagation with mutual interaction information derived from heterogeneous network. Molecular BioSystems 2016; 12:520-31. [DOI: 10.1039/c5mb00615e] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 12/29/2022]
Abstract
By implementing label propagation on drug/target similarity network with mutual interaction information derived from drug–target heterogeneous network, LPMIHN algorithm identifies potential drug–target interactions.
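As a rough illustration of label propagation on a similarity network, the sketch below iterates the standard update F ← αWF + (1 − α)Y, where W is a row-normalized similarity matrix and Y holds the known labels. This is a generic formulation chosen for illustration; it does not reproduce LPMIHN's use of mutual interaction information from the heterogeneous network, and the similarity values, the `alpha` setting and the function names are all hypothetical.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (rows of zeros are left as zeros)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized

def propagate(similarity, labels, alpha=0.5, iterations=50):
    """Spread known labels across a similarity network.

    alpha weights information arriving from neighbours against the
    initial labels; alpha < 1 makes the iteration converge.
    """
    weights = row_normalize(similarity)
    scores = list(labels)
    n = len(labels)
    for _ in range(iterations):
        scores = [
            alpha * sum(weights[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * labels[i]
            for i in range(n)
        ]
    return scores

# Toy drug-drug similarity network (hypothetical values).
# Drug 0 is known to interact with some target; drug 1 is very
# similar to drug 0, while drug 2 is only weakly similar to either.
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
known = [1.0, 0.0, 0.0]
scores = propagate(similarity, known)
```

After propagation, the drug most similar to the known interactor receives a higher score than the weakly connected one, which is the ranking signal such methods use to propose candidate interactions.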
Affiliation(s)
- Xiao-Ying Yan: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Shao-Wu Zhang: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Song-Yao Zhang: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
8
mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction. Anal Biochem 2015; 473:14-27. [DOI: 10.1016/j.ab.2014.10.014] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 03/06/2014] [Revised: 09/29/2014] [Accepted: 10/21/2014] [Indexed: 01/16/2023]
9
Li L, Yu S, Xiao W, Li Y, Hu W, Huang L, Zheng X, Zhou S, Yang H. Protein submitochondrial localization from integrated sequence representation and SVM-based backward feature extraction. Molecular BioSystems 2015; 11:170-7. [PMID: 25335193 DOI: 10.1039/c4mb00340c] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
The mitochondrion, a tiny energy factory, plays an important role in various biological processes of most eukaryotic cells.
Affiliation(s)
- Liqi Li: Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Sanjiu Yu: Institute of Cardiovascular Diseases of PLA, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Weidong Xiao: Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Yongsheng Li: Institute of Cancer, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Wenjuan Hu: Department of Pathophysiology and High Altitude Pathology, College of High Altitude Military Medicine, Third Military Medical University, Chongqing 400038, China
- Lan Huang: Institute of Cardiovascular Diseases of PLA, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Xiaoqi Zheng: Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
- Shiwen Zhou: National Drug Clinical Trial Institution, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Hua Yang: Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
10
Fan XN, Zhang SW. lncRNA-MFDL: identification of human long non-coding RNAs by fusing multiple features and using deep learning. Molecular BioSystems 2015; 11:892-7. [DOI: 10.1039/c4mb00650j] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Indexed: 12/31/2022]
Abstract
By fusing multiple features and using deep learning algorithms, the lncRNA-MFDL predictor was developed to identify lncRNAs effectively and robustly.
Affiliation(s)
- Xiao-Nan Fan: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Shao-Wu Zhang: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
11
Dehzangi A, Heffernan R, Sharma A, Lyons J, Paliwal K, Sattar A. Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC. J Theor Biol 2014; 364:284-94. [PMID: 25264267 DOI: 10.1016/j.jtbi.2014.09.029] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Received: 05/15/2014] [Revised: 08/11/2014] [Accepted: 09/17/2014] [Indexed: 11/17/2022]
Abstract
Protein subcellular localization prediction is the task of predicting the location in the cell where a given protein functions. It is considered an important step towards protein function prediction and drug design. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve protein subcellular localization prediction performance. However, the problem remains unsolved when relying on GO alone. At the same time, the impact of other sources of features, especially evolutionary-based features, has not been explored adequately for this task. In this study, we aim to extract discriminative evolutionary features to tackle this problem. To do this, we propose two segmentation-based feature extraction methods to explore potential local evolutionary-based information for Gram-positive and Gram-negative subcellular localizations. We show that by applying a Support Vector Machine (SVM) classifier to our extracted features, we are able to improve Gram-positive and Gram-negative subcellular localization prediction accuracies by up to 6.4% over previous studies, including those that used GO for feature extraction.
Affiliation(s)
- Abdollah Dehzangi: Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National ICT Australia (NICTA), Brisbane, Australia
- Rhys Heffernan: School of Engineering, Griffith University, Brisbane, Australia
- Alok Sharma: Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; School of Engineering and Physics, University of the South Pacific, Fiji
- James Lyons: School of Engineering, Griffith University, Brisbane, Australia
- Kuldip Paliwal: School of Engineering, Griffith University, Brisbane, Australia
- Abdul Sattar: Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National ICT Australia (NICTA), Brisbane, Australia
12
Zhang SW, Zhang TH, Zhang JN, Huang Y. Prediction of Signal Peptide Cleavage Sites with Subsite-Coupled and Template Matching Fusion Algorithm. Mol Inform 2014; 33:230-9. [DOI: 10.1002/minf.201300077] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/28/2013] [Accepted: 01/13/2014] [Indexed: 12/22/2022]
13
Prediction of protein-protein interaction with pairwise kernel support vector machine. Int J Mol Sci 2014; 15:3220-33. [PMID: 24566145 PMCID: PMC3958907 DOI: 10.3390/ijms15023220] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 01/01/2014] [Revised: 01/27/2014] [Accepted: 01/29/2014] [Indexed: 11/17/2022] Open
Abstract
Protein–protein interactions (PPIs) play a key role in many cellular processes. Unfortunately, the experimental methods currently used to identify PPIs are both time-consuming and expensive. These obstacles could be overcome by developing computational approaches to predict PPIs. Here, we report two methods of amino acid feature extraction: (i) distance frequency with PCA dimension reduction (DFPCA) and (ii) amino acid index distribution (AAID) representing the protein sequences. To obtain the most robust and reliable results for PPI prediction, a pairwise kernel function and support vector machines (SVM) were employed so that the result does not depend on the order in which the feature vectors of the two proteins are concatenated. The highest prediction accuracies of AAID and DFPCA were 94% and 93.96%, respectively, using the 10-fold cross-validation (10 CV) test, and the results of the pairwise radial basis kernel function are considerably improved over those based on the radial basis kernel function. Overall, the PPI prediction tool, termed PPI-PKSVM, which is freely available at http://159.226.118.31/PPI/index.html, promises to become useful in such areas as bio-analysis and drug development.
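The order-independence that a pairwise kernel provides can be shown with a short sketch. It assumes the widely used symmetric tensor-product form, K((a, b), (c, d)) = k(a, c)·k(b, d) + k(a, d)·k(b, c), with a radial basis function as the base kernel k; the abstract does not give the exact formula used in PPI-PKSVM, so this is one plausible instance rather than the authors' definition. The feature vectors and the `gamma` value are hypothetical.

```python
import math

def rbf(u, v, gamma=0.5):
    """Radial basis function kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_kernel(pair1, pair2, base=rbf):
    """Symmetric kernel between two protein pairs.

    Summing both ways of matching the proteins makes the value
    independent of the order of the proteins inside each pair.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)

# Hypothetical feature vectors for three proteins.
protein_a = [0.1, 0.2]
protein_b = [0.9, 0.4]
protein_c = [0.3, 0.1]

original = pairwise_kernel((protein_a, protein_b), (protein_c, protein_b))
swapped = pairwise_kernel((protein_b, protein_a), (protein_c, protein_b))
```

Here `original` and `swapped` are equal, whereas a plain kernel applied to concatenated vectors would generally change when the two proteins in a pair are swapped.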