1
Otesteanu CF, Caldelari R, Heussler V, Sznitman R. Machine learning for predicting Plasmodium liver stage development in vitro using microscopy imaging. Comput Struct Biotechnol J 2024; 24:334-342. [PMID: 38690550 PMCID: PMC11059334 DOI: 10.1016/j.csbj.2024.04.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/09/2024] [Accepted: 04/10/2024] [Indexed: 05/02/2024] Open
Abstract
Malaria, a significant global health challenge, is caused by Plasmodium parasites. The Plasmodium liver stage plays a pivotal role in the establishment of the infection. This study focuses on the liver stage development of the model organism Plasmodium berghei, employing fluorescent microscopy imaging and convolutional neural networks (CNNs) for analysis. Convolutional neural networks have been recently proposed as a viable option for tasks such as malaria detection, prediction of host-pathogen interactions, or drug discovery. Our research aimed to predict the transition of Plasmodium-infected liver cells to the merozoite stage, a key development phase, 15 hours in advance. We collected and analyzed hourly imaging data over a span of at least 38 hours from 400 sequences, encompassing 502 parasites. Our method was compared to human annotations to validate its efficacy. Performance metrics, including the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, were evaluated on an independent test dataset. The outcomes revealed an AUC of 0.873, a sensitivity of 84.6%, and a specificity of 83.3%, underscoring the potential of our CNN-based framework to predict liver stage development of P. berghei. These findings not only demonstrate the feasibility of our methodology but also could potentially contribute to the broader understanding of parasite biology.
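The three metrics reported above can be computed from per-sample labels and classifier scores. The following is a minimal sketch, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration. The AUC is computed with the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = parasite reaches the merozoite stage.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
sens, spec = sensitivity_specificity(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.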
Affiliation(s)
- Corin F. Otesteanu
- Artificial Intelligence in Medicine group, University of Bern, Switzerland
- Reto Caldelari
- Institute of Cell Biology, University of Bern, Switzerland
- Raphael Sznitman
- Artificial Intelligence in Medicine group, University of Bern, Switzerland
2
Suratanee A, Chutimanukul P, Saelao T, Chadchawan S, Buaboocha T, Plaimas K. Phenolic content discrimination in Thai holy basil using hyperspectral data analysis and machine learning techniques. PLoS One 2024; 19:e0309132. [PMID: 39356698 PMCID: PMC11446419 DOI: 10.1371/journal.pone.0309132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 08/05/2024] [Indexed: 10/04/2024] Open
Abstract
Hyperspectral imaging has emerged as a powerful tool for the non-destructive assessment of plant properties, including the quantification of phytochemical contents. Traditional methods for antioxidant analysis in holy basil (Ocimum tenuiflorum L.) are time-consuming, while hyperspectral imaging has the potential to rapidly observe holy basil. In this study, we employed hyperspectral imaging combined with machine learning techniques to determine the levels of total phenolic contents in Thai holy basil. Spectral data were acquired from 26 holy basil cultivars at different growth stages, and the total phenolic contents of the samples were measured. To extract the characteristics of the spectral data, we used 22 statistical features in both time and frequency domains. Relevant features were selected and combined with the corresponding total phenolic content values to develop a neural network model for classifying the phenolic content levels into 'low' and 'normal-to-high' categories. The neural network model demonstrated high performance, achieving an area under the receiver operating characteristic curve of 0.8113, highlighting its effectiveness in predicting phenolic content levels based on the spectral data. Comparative analysis with other machine learning techniques confirmed the superior performance of the neural network approach. Further investigation revealed that the model exhibited increased confidence in predicting the phenolic content levels of older holy basil samples. This study exhibits the potential of integrating hyperspectral imaging, feature extraction, and machine learning techniques for the rapid and non-destructive assessment of phenolic content levels in holy basil. The demonstrated effectiveness of this approach opens new possibilities for screening antioxidant properties in plants, facilitating efficient decision-making processes for researchers based on comprehensive spectral data.
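To illustrate what statistical features "in both time and frequency domains" can look like for a single spectrum, here is a minimal sketch. It is not the authors' feature set: the abstract does not list the 22 features, so the four shown here (mean, standard deviation, peak-to-peak range, spectral energy) and the function name `spectral_features` are assumptions chosen for illustration. The frequency-domain part uses a naive discrete Fourier transform so that the example needs only the standard library.

```python
import cmath
import math
from typing import Dict, Sequence


def spectral_features(signal: Sequence[float]) -> Dict[str, float]:
    """Compute a few illustrative features of a 1-D reflectance spectrum."""
    n = len(signal)
    # Features computed directly on the sampled values.
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)  # population std
    peak_to_peak = max(signal) - min(signal)
    # Naive O(n^2) discrete Fourier transform of the same values.
    dft = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    # By Parseval's theorem this equals the sum of squared samples.
    spectral_energy = sum(abs(c) ** 2 for c in dft) / n
    return {
        "mean": mean,
        "std": std,
        "peak_to_peak": peak_to_peak,
        "spectral_energy": spectral_energy,
    }


# Hypothetical four-band spectrum, for demonstration only.
features = spectral_features([1.0, 2.0, 3.0, 4.0])
```

A feature vector of this kind, one per sample, would then be passed to feature selection and a classifier as described in the abstract.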
Affiliation(s)
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Intelligent and Nonlinear Dynamic Innovations Research Center, Science and Technology Research Institute, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Panita Chutimanukul
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Klong Luang, Thailand
- Tanapon Saelao
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
- Supachitra Chadchawan
- Center of Excellence in Environment and Plant Physiology (CEEPP), Department of Botany, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Omics Science and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Teerapong Buaboocha
- Omics Science and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Center of Excellence in Molecular Crop, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Kitiporn Plaimas
- Omics Science and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
3
Pan J, Zhang Z, Li Y, Yu J, You Z, Li C, Wang S, Zhu M, Ren F, Zhang X, Sun Y, Wang S. A microbial knowledge graph-based deep learning model for predicting candidate microbes for target hosts. Brief Bioinform 2024; 25:bbae119. [PMID: 38555472 PMCID: PMC10981679 DOI: 10.1093/bib/bbae119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 02/23/2024] [Accepted: 03/02/2024] [Indexed: 04/02/2024] Open
Abstract
Predicting interactions between microbes and hosts plays critical roles in microbiome population genetics and microbial ecology and evolution. How to systematically characterize the sophisticated mechanisms and signal interplay between microbes and hosts is a significant challenge for global health risks. Identifying microbe-host interactions (MHIs) can not only provide helpful insights into their fundamental regulatory mechanisms, but also facilitate the development of targeted therapies for microbial infections. In recent years, computational methods have become an appealing alternative due to the high risk and cost of wet-lab experiments. Therefore, in this study, we utilized rich microbial metagenomic information to construct a novel heterogeneous microbial network (HMN)-based model named KGVHI to predict candidate microbes for target hosts. Specifically, KGVHI first built a HMN by integrating human proteins, viruses and pathogenic bacteria with their biological attributes. Then KGVHI adopted a knowledge graph embedding strategy to capture the global topological structure information of the whole network. A natural language processing algorithm is used to extract the local biological attribute information from the nodes in HMN. Finally, we combined the local and global information and fed it into a blended deep neural network (DNN) for training and prediction. Compared to state-of-the-art methods, the comprehensive experimental results show that our model can obtain excellent results on the corresponding three MHI datasets. Furthermore, we also conducted two pathogenic bacteria case studies to further indicate that KGVHI has excellent predictive capabilities for potential MHI pairs.
Affiliation(s)
- Jie Pan
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Zhen Zhang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Ying Li
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Jiaoyang Yu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
- Chenyu Li
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Shixu Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Minghui Zhu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Fengzhi Ren
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
- Xuexia Zhang
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
- Yanmei Sun
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
- Shiwei Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, College of Life Sciences, Northwest University, Xi’an 710069, China
4
Tangmanussukum P, Kawichai T, Suratanee A, Plaimas K. Heterogeneous network propagation with forward similarity integration to enhance drug-target association prediction. PeerJ Comput Sci 2022; 8:e1124. [PMID: 36262151 PMCID: PMC9575853 DOI: 10.7717/peerj-cs.1124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Accepted: 09/14/2022] [Indexed: 06/16/2023]
Abstract
Identification of drug-target interaction (DTI) is a crucial step to reduce time and cost in the drug discovery and development process. Since various biological data are publicly available, DTIs have been identified computationally. To predict DTIs, most existing methods focus on a single similarity measure of drugs and target proteins, whereas some recent methods integrate a particular set of drug and target similarity measures by a single integration function. Therefore, many DTIs are still missing. In this study, we propose heterogeneous network propagation with the forward similarity integration (FSI) algorithm, which systematically selects the optimal integration of multiple similarity measures of drugs and target proteins. Seven drug-drug and nine target-target similarity measures are applied with four distinct integration methods to finally create an optimal heterogeneous network model. Consequently, the optimal model uses the target similarity based on protein sequences and the fused drug similarity, which combines the similarity measures based on chemical structures, the Jaccard scores of drug-disease associations, and the cosine scores of drug-drug interactions. With an accuracy of 99.8%, this model significantly outperforms others that utilize different similarity measures of drugs and target proteins. In addition, the validation of the DTI predictions of this model demonstrates the ability of our method to discover missing potential DTIs.
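Two of the similarity measures named above, the Jaccard score over drug-disease associations and the cosine score over drug-drug interaction profiles, have standard definitions that are easy to sketch. The code below is illustrative only: the function names, drug and disease identifiers, and interaction vectors are hypothetical, and the final averaging step is a stand-in for an integration function, not the FSI algorithm described in the paper.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard score: shared associations divided by all associations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine score between two equally long interaction profiles."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical data: diseases associated with two drugs.
diseases_drug_a = {"d1", "d2", "d3"}
diseases_drug_b = {"d2", "d3", "d4"}
# Hypothetical binary drug-drug interaction profiles of the same two drugs.
profile_drug_a = [1, 0, 1, 1]
profile_drug_b = [1, 1, 0, 1]

sim_disease = jaccard(diseases_drug_a, diseases_drug_b)
sim_interaction = cosine(profile_drug_a, profile_drug_b)
# Assumption: a plain average as a placeholder integration function.
fused = (sim_disease + sim_interaction) / 2
```

Both scores lie in [0, 1] for non-negative inputs, which makes them straightforward to combine into a single fused similarity.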
Affiliation(s)
- Piyanut Tangmanussukum
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Thitipong Kawichai
- Department of Mathematics and Computer Science, Academic Division, Chulachomklao Royal Military Academy, Nakhon Nayok, Thailand
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Intelligent and Nonlinear Dynamics Innovations Research Center, Science and Technology Research Institute, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Kitiporn Plaimas
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Omics Science and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
5
Kitsiranuwat S, Suratanee A, Plaimas K. Integration of various protein similarities using random forest technique to infer augmented drug-protein matrix for enhancing drug-disease association prediction. Sci Prog 2022; 105:368504221109215. [PMID: 35801312 PMCID: PMC10358641 DOI: 10.1177/00368504221109215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
Identifying new therapeutic indications for existing drugs is a major challenge in drug repositioning. Most computational drug repositioning methods focus on known targets. Analyzing multiple aspects of various protein associations provides an opportunity to discover underlying drug-associated proteins that can be used to improve the performance of the drug repositioning approaches. In this study, machine learning models were developed based on the similarities of diversified biological features, including protein interaction, topological network, sequence alignment, and biological function to predict protein pairs associating with the same drugs. The crucial set of features was identified, and the high performances of protein pair predictions were achieved with an area under the curve (AUC) value of more than 93%. Based on drug chemical structures, the drug similarity levels of the promising protein pairs were used to quantify the inferred drug-associated proteins. Furthermore, these proteins were employed to establish an augmented drug-protein matrix to enhance the efficiency of three existing drug repositioning techniques: a similarity constrained matrix factorization for the drug-disease associations (SCMFDD), an ensemble meta-paths and singular value decomposition (EMP-SVD) model, and a topology similarity and singular value decomposition (TS-SVD) technique. The results showed that the augmented matrix helped to improve the performance up to 4% more in comparison to the original matrix for SCMFDD and EMP-SVD, and about 1% more for TS-SVD. In summary, inferring new protein pairs related to the same drugs increases the opportunity to reveal missing drug-associated proteins that are important for drug development via the drug repositioning technique.
Affiliation(s)
- Satanat Kitsiranuwat
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
- Advanced Virtual and Intelligent Computing (AVIC) center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
- Intelligent and Nonlinear Dynamic Innovations Research Center, Science and Technology Research Institute, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
- Kitiporn Plaimas
- Advanced Virtual and Intelligent Computing (AVIC) center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
6
Sagulkoo P, Chuntakaruk H, Rungrotmongkol T, Suratanee A, Plaimas K. Multi-Level Biological Network Analysis and Drug Repurposing Based on Leukocyte Transcriptomics in Severe COVID-19: In Silico Systems Biology to Precision Medicine. J Pers Med 2022; 12:jpm12071030. [PMID: 35887528 PMCID: PMC9319133 DOI: 10.3390/jpm12071030] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/24/2022] [Revised: 06/16/2022] [Accepted: 06/20/2022] [Indexed: 01/08/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic causes many morbidity and mortality cases. Despite several developed vaccines and antiviral therapies, some patients experience severe conditions that need intensive care units (ICU); therefore, precision medicine is necessary to predict and treat these patients using novel biomarkers and targeted drugs. In this study, we proposed a multi-level biological network analysis framework to identify key genes via protein–protein interaction (PPI) network analysis as well as survival analysis based on differentially expressed genes (DEGs) in leukocyte transcriptomic profiles, discover novel biomarkers using microRNAs (miRNA) from regulatory network analysis, and provide candidate drugs targeting the key genes using drug–gene interaction network and structural analysis. The results show that upregulated DEGs were mainly enriched in cell division, cell cycle, and innate immune signaling pathways. Downregulated DEGs were primarily concentrated in the cellular response to stress, lysosome, glycosaminoglycan catabolic process, and mature B cell differentiation. Regulatory network analysis revealed that hsa-miR-6792-5p, hsa-let-7b-5p, hsa-miR-34a-5p, hsa-miR-92a-3p, and hsa-miR-146a-5p were predicted biomarkers. CDC25A, GUSB, MYBL2, and SDAD1 were identified as key genes in severe COVID-19. In addition, drug repurposing from drug–gene and drug–protein database searching and molecular docking showed that camptothecin and doxorubicin were candidate drugs interacting with the key genes. In conclusion, multi-level systems biology analysis plays an important role in precision medicine by finding novel biomarkers and targeted drugs based on key gene identification.
Affiliation(s)
- Pakorn Sagulkoo
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Center of Biomedical Informatics, Department of Family Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
- Hathaichanok Chuntakaruk
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Center of Excellence in Biocatalyst and Sustainable Biotechnology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Thanyada Rungrotmongkol
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Center of Excellence in Biocatalyst and Sustainable Biotechnology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
- Intelligent and Nonlinear Dynamics Innovations Research Center, Science and Technology Research Institute, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
- Kitiporn Plaimas
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Omics Science and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
7
Sagulkoo P, Suratanee A, Plaimas K. Immune-Related Protein Interaction Network in Severe COVID-19 Patients toward the Identification of Key Proteins and Drug Repurposing. Biomolecules 2022; 12:biom12050690. [PMID: 35625619 PMCID: PMC9138873 DOI: 10.3390/biom12050690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 05/07/2022] [Accepted: 05/09/2022] [Indexed: 02/05/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) is still an active global public health issue. Although vaccines and therapeutic options are available, some patients experience severe conditions and need critical care support. Hence, identifying key genes or proteins involved in immune-related severe COVID-19 is necessary to find or develop the targeted therapies. This study proposed a novel construction of an immune-related protein interaction network (IPIN) in severe cases with the use of a network diffusion technique on a human interactome network and transcriptomic data. Enrichment analysis revealed that the IPIN was mainly associated with antiviral, innate immune, apoptosis, cell division, and cell cycle regulation signaling pathways. Twenty-three proteins were identified as key proteins to find associated drugs. Finally, poly (I:C), mitomycin C, decitabine, gemcitabine, hydroxyurea, tamoxifen, and curcumin were the potential drugs interacting with the key proteins to heal severe COVID-19. In conclusion, IPIN can be a good representative network for the immune system that integrates the protein interaction network and transcriptomic data. Thus, the key proteins and target drugs in IPIN help to find a new treatment with the use of existing drugs to treat the disease apart from vaccination and conventional antiviral therapy.
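The abstract does not say which network diffusion technique was applied to the interactome. One widely used choice is random walk with restart (RWR), sketched below as an assumption rather than as the authors' method. The function name, the restart probability of 0.5, and the three-protein toy network are all hypothetical; in a real analysis the seeds would be proteins selected from the transcriptomic data.

```python
from typing import Dict, Iterable, List


def random_walk_with_restart(
    adjacency: Dict[str, List[str]],
    seeds: Iterable[str],
    restart: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Diffuse seed scores over an undirected network until convergence."""
    nodes = sorted(adjacency)
    seed_set = set(seeds)
    # Initial distribution: probability mass spread evenly over the seeds.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                continue  # isolated nodes pass nothing on in this sketch
            share = (1.0 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical three-protein path network A - B - C with seed A.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
diffusion_scores = random_walk_with_restart(toy_network, seeds=["A"])
```

Proteins close to the seeds in the network receive higher scores, so ranking by diffusion score is one way to select a disease-related subnetwork.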
Affiliation(s)
- Pakorn Sagulkoo
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Center of Biomedical Informatics, Department of Family Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
- Intelligent and Nonlinear Dynamics Innovations Research Center, Science and Technology Research Institute, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand
- Kitiporn Plaimas
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Omics Science and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
8
Hu RS, Hesham AEL, Zou Q. Machine Learning and Its Applications for Protozoal Pathogens and Protozoal Infectious Diseases. Front Cell Infect Microbiol 2022; 12:882995. [PMID: 35573796 PMCID: PMC9097758 DOI: 10.3389/fcimb.2022.882995] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2022] [Accepted: 03/28/2022] [Indexed: 12/24/2022] Open
Abstract
In recent years, massive attention has been attracted to the development and application of machine learning (ML) in the field of infectious diseases, not only serving as a catalyst for academic studies but also as a key means of detecting pathogenic microorganisms, implementing public health surveillance, exploring host-pathogen interactions, discovering drug and vaccine candidates, and so forth. These applications also include the management of infectious diseases caused by protozoal pathogens, such as Plasmodium, Trypanosoma, Toxoplasma, Cryptosporidium, and Giardia, a class of fatal or life-threatening causative agents capable of infecting humans and a wide range of animals. With the reduction of computational cost, availability of effective ML algorithms, popularization of ML tools, and accumulation of high-throughput data, it is possible to implement the integration of ML applications into increasing scientific research related to protozoal infection. Here, we will present a brief overview of important concepts in ML serving as background knowledge, with a focus on basic workflows, popular algorithms (e.g., support vector machine, random forest, and neural networks), feature extraction and selection, and model evaluation metrics. We will then review current ML applications and major advances concerning protozoal pathogens and protozoal infectious diseases through combination with correlative biology expertise and provide forward-looking insights for perspectives and opportunities in future advances in ML techniques in this field.
Affiliation(s)
- Rui-Si Hu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Abd El-Latif Hesham
- Genetics Department, Faculty of Agriculture, Beni-Suef University, Beni-Suef, Egypt
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- *Correspondence: Quan Zou
9
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022] Open
Abstract
Functional annotation of unknown function genes reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of unknown function genes. The results show high performance of the model's predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is generalized and can be applied to other diseases that enhance the field of biomedical science.