1
Zhang H, Jiao J, Zhao T, Zhao E, Li L, Li G, Zhang B, Qin QM. GERWR: Identifying the Key Pathogenicity-Associated sRNAs of Magnaporthe Oryzae Infection in Rice Based on Graph Embedding and Random Walk With Restart. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:227-239. [PMID: 38153818 DOI: 10.1109/tcbb.2023.3348080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2023]
Abstract
Rice blast, caused by Magnaporthe oryzae (M. oryzae), is a destructive rice disease that reduces rice yield by 10% to 30% annually. It also affects other cereal crops such as barley, wheat, rye, millet, sorghum, and maize. Small RNAs (sRNAs) play an essential regulatory role in fungus-plant interaction during fungal invasion, but studies of pathogenic sRNAs during the fungal invasion of plants based on multi-omics data integration are rare. This paper proposes a novel approach called Graph Embedding combined with Random Walk with Restart (GERWR) to identify pathogenic sRNAs based on multi-omics data integration during M. oryzae invasion. By constructing a multi-omics network (MRMO), we identified 29 pathogenic sRNAs of the rice blast fungus. Further analysis revealed that these sRNAs regulate rice genes in a many-to-many relationship, playing a significant regulatory role in the pathogenesis of rice blast disease. This paper explores the pathogenic factors of rice blast disease from the perspective of multi-omics data analysis, revealing the inherent connection between pathogenic factors of different omics. It has essential scientific significance for studying the pathogenic mechanism of the rice blast fungus, the rice blast fungus-rice model system, and pathogen-host interaction in related fields.
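For readers unfamiliar with random walk with restart (RWR), the generic technique named in this abstract can be sketched as follows. This is a minimal illustration on a toy network, not the GERWR implementation: the graph-embedding step is omitted, and the node names, the restart probability of 0.5, and the unweighted adjacency-list representation are all assumptions made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj   : dict mapping each node to a list of its neighbours (unweighted)
    seeds : non-empty set of nodes the walker restarts from
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * p[u]
            nbrs = adj[u]
            if nbrs:
                # Otherwise it moves to a uniformly chosen neighbour.
                for v in nbrs:
                    nxt[v] += walk_mass / len(nbrs)
            else:
                # Node without neighbours: return its mass to the seeds.
                for n in nodes:
                    nxt[n] += walk_mass * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy three-node path (hypothetical names): sRNA_1 - gene_A - gene_B
network = {
    "sRNA_1": ["gene_A"],
    "gene_A": ["sRNA_1", "gene_B"],
    "gene_B": ["gene_A"],
}
scores = random_walk_with_restart(network, seeds={"sRNA_1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed set receive higher scores, which is what makes RWR usable for ranking candidate nodes by network proximity to known ones.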
2
Zhao E, Dong L, Zhao H, Zhang H, Zhang T, Yuan S, Jiao J, Chen K, Sheng J, Yang H, Wang P, Li G, Qin Q. A Relationship Prediction Method for Magnaporthe oryzae-Rice Multi-Omics Data Based on WGCNA and Graph Autoencoder. J Fungi (Basel) 2023; 9:1007. [PMID: 37888263 PMCID: PMC10607591 DOI: 10.3390/jof9101007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/02/2023] [Accepted: 10/07/2023] [Indexed: 10/28/2023]
Abstract
Magnaporthe oryzae Oryzae (MoO) pathotype is a devastating fungal pathogen of rice; however, its pathogenic mechanism remains poorly understood. The current research is primarily focused on single-omics data, which is insufficient to capture the complex cross-kingdom regulatory interactions between MoO and rice. To address this limitation, we proposed a novel method called Weighted Gene Autoencoder Multi-Omics Relationship Prediction (WGAEMRP), which combines weighted gene co-expression network analysis (WGCNA) and graph autoencoder to predict the relationship between MoO-rice multi-omics data. We applied WGAEMRP to construct a MoO-rice multi-omics heterogeneous interaction network, which identified 18 MoO small RNAs (sRNAs), 17 rice genes, 26 rice mRNAs, and 28 rice proteins among the key biomolecules. Most of the mined functional modules and enriched pathways were related to gene expression, protein composition, transportation, and metabolic processes, reflecting the infection mechanism of MoO. Compared to previous studies, WGAEMRP significantly improves the efficiency and accuracy of multi-omics data integration and analysis. This approach lays out a solid data foundation for studying the biological process of MoO infecting rice, refining the regulatory network of pathogenic markers, and providing new insights for developing disease-resistant rice varieties.
Affiliation(s)
- Enshuang Zhao
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Liyan Dong
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Hengyi Zhao
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Hao Zhang
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - College of Software, Jilin University, Changchun 130012, China
- Tianyue Zhang
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Shuai Yuan
  - College of Software, Jilin University, Changchun 130012, China
- Jiao Jiao
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Kang Chen
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Jianhua Sheng
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Hongbo Yang
  - College of Software, Jilin University, Changchun 130012, China
- Pengyu Wang
  - College of Software, Jilin University, Changchun 130012, China
- Guihua Li
  - College of Plant Science, Key Laboratory of Zoonosis Research, Ministry of Education, Jilin University, Changchun 130012, China
- Qingming Qin
  - Department of Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia, MO 65211-7310, USA
3
Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Affiliation(s)
- Le Thi Phan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Changmin Oh
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Tao He
  - Beidahuang Industry Group General Hospital, Harbin, China
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
4
Niu M, Lin Y, Zou Q. sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks. Plant Molecular Biology 2021; 105:483-495. [PMID: 33385273 DOI: 10.1007/s11103-020-01102-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 05/17/2020] [Accepted: 12/01/2020] [Indexed: 06/12/2023]
Abstract
KEY MESSAGE: We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops and we used one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has played an important role in the improvement of agronomic species, but there is a lack of effective bioinformatics tools to identify the activity of sgRNA in agronomic species. Therefore, it is necessary to develop a method based on machine learning to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method to identify sgRNA high on-target activity. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm for constructing the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the four crops in the sgRNACNN model were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN realizes the identification of high on-target activity sgRNA of agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found at the following link: https://github.com/nmt315320/sgRNACNN.git.
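The two sequence encodings and the voting step mentioned in this abstract can be illustrated with a short sketch. It is a generic illustration under stated assumptions, not code from the sgRNACNN repository: the function names are hypothetical, the alphabet is assumed to be DNA-style `ACGT`, k is set to 3 by default, and simple majority voting over 0/1 labels stands in for whatever voting rule the authors use.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; sequences are treated as DNA-style strings


def one_hot(seq):
    """Encode each base as a 4-element indicator row (unknown bases give all zeros)."""
    return [[1 if base == a else 0 for a in ALPHABET] for base in seq.upper()]


def kmer_counts(seq, k=3):
    """Count overlapping k-mers and return a fixed-length vector of 4**k entries."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first."""
    return Counter(predictions).most_common(1)[0][0]


# Toy usage with a made-up sequence and made-up model outputs.
encoded = one_hot("ACGT")
features = kmer_counts("ACGTAC", k=2)
label = majority_vote([1, 0, 1])
```

One-hot encoding preserves base positions, which suits convolutional layers, while k-mer counts summarise local composition in a fixed-length vector regardless of sequence length.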
Affiliation(s)
- Mengting Niu
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yuan Lin
  - Department of System Integration, Sparebanken Vest, Bergen, Norway
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
5
Wang C, Sun K, Wang J, Guo M. Data fusion-based algorithm for predicting miRNA–Disease associations. Comput Biol Chem 2020; 88:107357. [DOI: 10.1016/j.compbiolchem.2020.107357] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/02/2020] [Revised: 07/24/2020] [Accepted: 08/05/2020] [Indexed: 11/30/2022]
6
Affiliation(s)
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China