1
|
Fahmy AM, Hammad MS, Mabrouk MS, Al-Atabany WI. On leveraging self-supervised learning for accurate HCV genotyping. Sci Rep 2024; 14:15463. [PMID: 38965254 PMCID: PMC11224313 DOI: 10.1038/s41598-024-64209-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2024] [Accepted: 06/06/2024] [Indexed: 07/06/2024] Open
Abstract
Hepatitis C virus (HCV) is a major global health concern, affecting millions of individuals worldwide. While existing literature predominantly focuses on disease classification using clinical data, there exists a critical research gap concerning HCV genotyping based on genomic sequences. Accurate HCV genotyping is essential for patient management and treatment decisions. While the neural models excel at capturing complex patterns, they still face challenges, such as data scarcity, that exist a lot in computational genomics. To overcome this challenges, this paper introduces an advanced deep learning approach for HCV genotyping based on the graphical representation of nucleotide sequences that outperforms classical approaches. Notably, it is effective for both partial and complete HCV genomes and addresses challenges associated with imbalanced datasets. In this work, ten HCV genotypes: 1a, 1b, 2a, 2b, 2c, 3a, 3b, 4, 5, and 6 were used in the analysis. This study utilizes Chaos Game Representation for 2D mapping of genomic sequences, employing self-supervised learning using convolutional autoencoder for deep feature extraction, resulting in an outstanding performance for HCV genotyping compared to various machine learning and deep learning models. This baseline provides a benchmark against which the performance of the proposed approach and other models can be evaluated. The experimental results showcase a remarkable classification accuracy of over 99%, outperforming traditional deep learning models. This performance demonstrates the capability of the proposed model to accurately identify HCV genotypes in both partial and complete sequences and in dealing with data scarcity for certain genotypes. The results of the proposed model are compared to NCBI genotyping tool.
Collapse
Affiliation(s)
- Ahmed M Fahmy
- Computer Science program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt.
| | - Muhammed S Hammad
- Biomedical Engineering Department, Faculty of Engineering, Helwan University, Cairo, Egypt
| | - Mai S Mabrouk
- Biomedical informatics program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt
| | - Walid I Al-Atabany
- Biomedical informatics program, School of Information Technology and Computer Science (ITCS), Nile University, Sheikh Zayed City, Egypt
| |
Collapse
|
2
|
Liang Y, Ma X. iACP-GE: accurate identification of anticancer peptides by using gradient boosting decision tree and extra tree. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2023; 34:1-19. [PMID: 36562289 DOI: 10.1080/1062936x.2022.2160011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]
Abstract
Cancer is one of the main diseases threatening human life, accounting for millions of deaths around the world each year. Traditional physical and chemical methods for cancer treatment are extremely time-consuming, lab-intensive, expensive, inefficient and difficult to be applied in a high-throughput way. Hence, it is an urgent task to develop automated computational methods to enable fast and accurate identification of anticancer peptides (ACPs). In this paper, we develop a novel model named iACP-GE to identify ACPs. Multi-features are extracted by using binary encoding, enhanced grouped amino acid composition and BLOSUM62 encoding based on the N5C5 sequence, as well as detrended forward moving-average auto-cross correlation analysis based on physicochemical properties of 20 natural amino acids. Thus, 835 features are obtained for each sample, in order to avoid information redundancy, gradient boosting decision tree was adopted as the feature selection strategy. Then, the optimal feature subset is input to the extra tree classifier. The accuracies of ACP740 and ACP240 datasets with the 5-fold cross-validation were 90.54% and 91.25%, respectively. Experimental results indicate that iACP-GE significantly outperforms several existing models on ACP740 and ACP240 datasets and can be used as an effective tool for the identification of ACPs. The datasets and source codes for iACP-GE are available at https://github.com/yunyunliang88/iACP-GE.
Collapse
Affiliation(s)
- Y Liang
- School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
| | - X Ma
- School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
| |
Collapse
|
3
|
Xie XH, Huang YJ, Han GS, Yu ZG, Ma YL. Microbial characterization based on multifractal analysis of metagenomes. Front Cell Infect Microbiol 2023; 13:1117421. [PMID: 36779183 PMCID: PMC9910082 DOI: 10.3389/fcimb.2023.1117421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 01/09/2023] [Indexed: 01/28/2023] Open
Abstract
Introduction The species diversity of microbiomes is a cutting-edge concept in metagenomic research. In this study, we propose a multifractal analysis for metagenomic research. Method and Results Firstly, we visualized the chaotic game representation (CGR) of simulated metagenomes and real metagenomes. We find that metagenomes are visualized with self-similarity. Then we defined and calculated the multifractal dimension for the visualized plot of simulated and real metagenomes, respectively. By analyzing the Pearson correlation coefficients between the multifractal dimension and the traditional species diversity index, we obtain that the correlation coefficients between the multifractal dimension and the species richness index and Shannon diversity index reached the maximum value when q = 0, 1, and the correlation coefficient between the multifractal dimension and the Simpson diversity index reached the maximum value when q = 5. Finally, we apply our method to real metagenomes of the gut microbiota of 100 infants who are newborn and 4 and 12 months old. The results show that the multifractal dimensions of an infant's gut microbiomes can distinguish age differences. Conclusion and Discussion There is self-similarity among the CGRs of WGS of metagenomes, and the multifractal spectrum is an important characteristic for metagenomes. The traditional diversity indicators can be unified under the framework of multifractal analysis. These results coincided with similar results in macrobial ecology. The multifractal spectrum of infants' gut microbiomes are related to the development of the infants.
Collapse
Affiliation(s)
- Xian-hua Xie
- Key Laboratory of Jiangxi Province for Numerical Simulation and Emulation Techniques, Gannan Normal University, Ganzhoiu, China
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- *Correspondence: Xian-hua Xie,
| | - Yu-jie Huang
- Key Laboratory of Jiangxi Province for Numerical Simulation and Emulation Techniques, Gannan Normal University, Ganzhoiu, China
| | - Guo-sheng Han
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
| | - Zu-guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
| | - Yuan-lin Ma
- School of Economics, Zhengzhou University of Aeronautics, Zhengzhou, China
| |
Collapse
|
4
|
ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides. Int J Mol Sci 2022; 23:ijms232012194. [PMID: 36293050 PMCID: PMC9603247 DOI: 10.3390/ijms232012194] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 11/30/2022] Open
Abstract
Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet experiments are expensive and inefficient for identifying novel anticancer peptides; therefore, the development of an effective computational approach is essential to recognize ACP candidates before experimental methods are used. In this study, we proposed an Ada-boosting algorithm with the base learner random forest called ACP-ADA, which integrates binary profile feature, amino acid index, and amino acid composition with a 210-dimensional feature space vector to represent the peptides. Training samples in the feature space were augmented to increase the sample size and further improve the performance of the model in the case of insufficient samples. Furthermore, we used five-fold cross-validation to find model parameters, and the cross-validation results showed that ACP-ADA outperforms existing methods for this feature combination with data augmentation in terms of performance metrics. Specifically, ACP-ADA recorded an average accuracy of 86.4% and a Mathew’s correlation coefficient of 74.01% for dataset ACP740 and 90.83% and 81.65% for dataset ACP240; consequently, it can be a very useful tool in drug development and biomedical research.
Collapse
|
5
|
To Assist Oncologists: An Efficient Machine Learning-Based Approach for Anti-Cancer Peptides Classification. SENSORS 2022; 22:s22114005. [PMID: 35684624 PMCID: PMC9185351 DOI: 10.3390/s22114005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Revised: 05/19/2022] [Accepted: 05/20/2022] [Indexed: 12/10/2022]
Abstract
In the modern technological era, Anti-cancer peptides (ACPs) have been considered a promising cancer treatment. It’s critical to find new ACPs to ensure a better knowledge of their functioning processes and vaccine development. Thus, timely and efficient ACPs using a computational technique are highly needed because of the enormous peptide sequences generated in the post-genomic era. Recently, numerous adaptive statistical algorithms have been developed for separating ACPs and NACPs. Despite great advancements, existing approaches still have insufficient feature descriptors and learning methods, limiting predictive performance. To address this, a trustworthy framework is developed for the precise identification of ACPs. Particularly, the presented approach incorporates four hypothetical feature encoding mechanisms namely: amino acid, dipeptide, tripeptide, and an improved version of pseudo amino acid composition are applied to indicate the motif of the target class. Moreover, principal component analysis (PCA) is employed for feature pruning, while selecting optimal, deep, and highly variated features. Due to the diverse nature of learning, experiments are performed over numerous algorithms to select the optimum operating method. After investigating the empirical outcomes, the support vector machine with hybrid feature space shows better performance. The proposed framework achieved an accuracy of 97.09% and 98.25% over the benchmark and independent datasets, respectively. The comparative analysis demonstrates that our proposed model outperforms as compared to the existing methods and is beneficial in drug development, and oncology.
Collapse
|
6
|
Leysen H, Walter D, Christiaenssen B, Vandoren R, Harputluoğlu İ, Van Loon N, Maudsley S. GPCRs Are Optimal Regulators of Complex Biological Systems and Orchestrate the Interface between Health and Disease. Int J Mol Sci 2021; 22:ijms222413387. [PMID: 34948182 PMCID: PMC8708147 DOI: 10.3390/ijms222413387] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 02/06/2023] Open
Abstract
GPCRs arguably represent the most effective current therapeutic targets for a plethora of diseases. GPCRs also possess a pivotal role in the regulation of the physiological balance between healthy and pathological conditions; thus, their importance in systems biology cannot be underestimated. The molecular diversity of GPCR signaling systems is likely to be closely associated with disease-associated changes in organismal tissue complexity and compartmentalization, thus enabling a nuanced GPCR-based capacity to interdict multiple disease pathomechanisms at a systemic level. GPCRs have been long considered as controllers of communication between tissues and cells. This communication involves the ligand-mediated control of cell surface receptors that then direct their stimuli to impact cell physiology. Given the tremendous success of GPCRs as therapeutic targets, considerable focus has been placed on the ability of these therapeutics to modulate diseases by acting at cell surface receptors. In the past decade, however, attention has focused upon how stable multiprotein GPCR superstructures, termed receptorsomes, both at the cell surface membrane and in the intracellular domain dictate and condition long-term GPCR activities associated with the regulation of protein expression patterns, cellular stress responses and DNA integrity management. The ability of these receptorsomes (often in the absence of typical cell surface ligands) to control complex cellular activities implicates them as key controllers of the functional balance between health and disease. A greater understanding of this function of GPCRs is likely to significantly augment our ability to further employ these proteins in a multitude of diseases.
Collapse
Affiliation(s)
- Hanne Leysen
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
| | - Deborah Walter
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
| | - Bregje Christiaenssen
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
| | - Romi Vandoren
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
| | - İrem Harputluoğlu
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
- Department of Chemistry, Middle East Technical University, Çankaya, Ankara 06800, Turkey
| | - Nore Van Loon
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
| | - Stuart Maudsley
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium; (H.L.); (D.W.); (B.C.); (R.V.); (İ.H.); (N.V.L.)
- Correspondence:
| |
Collapse
|
7
|
Chen XG, Zhang W, Yang X, Li C, Chen H. ACP-DA: Improving the Prediction of Anticancer Peptides Using Data Augmentation. Front Genet 2021; 12:698477. [PMID: 34276801 PMCID: PMC8279753 DOI: 10.3389/fgene.2021.698477] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Accepted: 06/07/2021] [Indexed: 12/09/2022] Open
Abstract
Anticancer peptides (ACPs) have provided a promising perspective for cancer treatment, and the prediction of ACPs is very important for the discovery of new cancer treatment drugs. It is time consuming and expensive to use experimental methods to identify ACPs, so computational methods for ACP identification are urgently needed. There have been many effective computational methods, especially machine learning-based methods, proposed for such predictions. Most of the current machine learning methods try to find suitable features or design effective feature learning techniques to accurately represent ACPs. However, the performance of these methods can be further improved for cases with insufficient numbers of samples. In this article, we propose an ACP prediction model called ACP-DA (Data Augmentation), which uses data augmentation for insufficient samples to improve the prediction performance. In our method, to better exploit the information of peptide sequences, peptide sequences are represented by integrating binary profile features and AAindex features, and then the samples in the training set are augmented in the feature space. After data augmentation, the samples are used to train the machine learning model, which is used to predict ACPs. The performance of ACP-DA exceeds that of existing methods, and ACP-DA achieves better performance in the prediction of ACPs compared with a method without data augmentation. The proposed method is available at http://github.com/chenxgscuec/ACPDA.
Collapse
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, China.,Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
| | - Xiaofei Yang
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| | - Chenhong Li
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| | - Hengling Chen
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China.,Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China.,Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
| |
Collapse
|
8
|
A chaotic viewpoint-based approach to solve haplotype assembly using hypergraph model. PLoS One 2020; 15:e0241291. [PMID: 33120403 PMCID: PMC7595403 DOI: 10.1371/journal.pone.0241291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2020] [Accepted: 10/12/2020] [Indexed: 12/30/2022] Open
Abstract
Decreasing the cost of high-throughput DNA sequencing technologies, provides a huge amount of data that enables researchers to determine haplotypes for diploid and polyploid organisms. Although various methods have been developed to reconstruct haplotypes in diploid form, their accuracy is still a challenging task. Also, most of the current methods cannot be applied to polyploid form. In this paper, an iterative method is proposed, which employs hypergraph to reconstruct haplotype. The proposed method by utilizing chaotic viewpoint can enhance the obtained haplotypes. For this purpose, a haplotype set was randomly generated as an initial estimate, and its consistency with the input fragments was described by constructing a weighted hypergraph. Partitioning the hypergraph specifies those positions in the haplotype set that need to be corrected. This procedure is repeated until no further improvement could be achieved. Each element of the finalized haplotype set is mapped to a line by chaos game representation, and a coordinate series is defined based on the position of mapped points. Then, some positions with low qualities can be assessed by applying a local projection. Experimental results on both simulated and real datasets demonstrate that this method outperforms most other approaches, and is promising to perform the haplotype assembly.
Collapse
|
9
|
Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2018] [Accepted: 02/24/2019] [Indexed: 01/10/2023] Open
Abstract
Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they show effectivity also against multi-resistant pathogens with a high selectivity. This is especially crucial in times, where society is faced with the major threat of an ever-increasing amount of antibiotic resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects, thus a variety of scientific studies dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principal, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, thus an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings is established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence and structure based aggregation. Moreover, albeit a well-chosen encoding is essential, performant classifiers are required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches. Albeit a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Collapse
Affiliation(s)
- Sebastian Spänig
- Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
| | - Dominik Heider
- Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
| |
Collapse
|