1
Wang H, Yan S, Wang W, Chen Y, Hong J, He Q, Diao X, Lin Y, Chen Y, Cao Y, Guo W, Fang W. Cropformer: An interpretable deep learning framework for crop genomic prediction. Plant Communications 2025; 6:101223. [PMID: 39690739 PMCID: PMC11956090 DOI: 10.1016/j.xplc.2024.101223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 10/15/2024] [Accepted: 12/12/2024] [Indexed: 12/19/2024]
Abstract
Machine learning and deep learning are extensively employed in genomic selection (GS) to expedite the identification of superior genotypes and accelerate breeding cycles. However, a significant challenge with current data-driven deep learning models in GS lies in their low robustness and poor interpretability. To address these challenges, we developed Cropformer, a deep learning framework for predicting crop phenotypes and exploring downstream tasks. This framework combines convolutional neural networks with multiple self-attention mechanisms to improve accuracy. The ability of Cropformer to predict complex phenotypic traits was extensively evaluated on more than 20 traits across five major crops: maize, rice, wheat, foxtail millet, and tomato. Evaluation results show that Cropformer outperforms other GS methods in both precision and robustness, achieving up to a 7.5% improvement in prediction accuracy compared to the runner-up model. Additionally, Cropformer enhances the analysis and mining of genes associated with traits. We identified numerous single nucleotide polymorphisms (SNPs) with potential effects on maize phenotypic traits and revealed key genetic variations underlying these differences. Cropformer represents a significant advancement in predictive performance and gene identification, providing a powerful general tool for improving genomic design in crop breeding. Cropformer is freely accessible at https://cgris.net/cropformer.
Affiliation(s)
- Hao Wang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wenxi Wang
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yongming Chen
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
  - State Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Shandong 261325, China
- Jingpeng Hong
  - College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
- Qiang He
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xianmin Diao
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yunan Lin
  - School of Engineering and Design, Technical University Munich, 85521 Munich, Germany
- Yanqing Chen
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yongsheng Cao
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Weilong Guo
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Wei Fang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
2
Liu T, Qiao H, Wang Z, Yang X, Pan X, Yang Y, Ye X, Sakurai T, Lin H, Zhang Y. CodLncScape Provides a Self-Enriching Framework for the Systematic Collection and Exploration of Coding LncRNAs. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2400009. [PMID: 38602457 PMCID: PMC11165466 DOI: 10.1002/advs.202400009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/19/2024] [Indexed: 04/12/2024]
Abstract
Recent studies have revealed that numerous lncRNAs can be translated into proteins under specific conditions, performing diverse biological functions, and are thus termed coding lncRNAs. Their comprehensive landscape, however, remains elusive due to this field's preliminary and dispersed nature. This study introduces codLncScape, a framework for coding lncRNA exploration consisting of codLncDB, codLncFlow, codLncWeb, and codLncNLP. Specifically, it contains a manually compiled knowledge base, codLncDB, encompassing 353 coding lncRNA entries validated by experiments. Building upon codLncDB, codLncFlow investigates the expression characteristics of these lncRNAs and their diagnostic potential in the pan-cancer context, alongside their association with spermatogenesis. Furthermore, codLncWeb serves as a platform for storing, browsing, and accessing knowledge concerning coding lncRNAs within various programming environments. Finally, codLncNLP serves as a knowledge-mining tool to support timely content inclusion and updates within codLncDB. In summary, this study offers a well-functioning, content-rich ecosystem for coding lncRNA research, aiming to accelerate systematic studies in this field.
Affiliation(s)
- Tianyuan Liu
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Huiyuan Qiao
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Zixu Wang
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xinyan Yang
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Xianrun Pan
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Yu Yang
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Xiucai Ye
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
3
Zhang S, Zhao Y, Liang Y. AACFlow: an end-to-end model based on attention augmented convolutional neural network and flow-attention mechanism for identification of anticancer peptides. Bioinformatics 2024; 40:btae142. [PMID: 38452348 PMCID: PMC10973939 DOI: 10.1093/bioinformatics/btae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/09/2024] Open
Abstract
MOTIVATION: Anticancer peptides (ACPs) have natural cationic properties and can act on the anionic cell membrane of cancer cells to kill them. Therefore, ACPs have become potential anticancer drugs with good research value and prospects.
RESULTS: In this article, we propose AACFlow, an end-to-end model for identification of ACPs based on deep learning. End-to-end models have more room to automatically adjust according to the data, making the overall fit better and reducing error propagation. The combination of an attention augmented convolutional neural network (AAConv) and a multi-layer convolutional neural network (CNN) forms a deep representation learning module, which is used to obtain global and local information on the sequence. Based on the concept of flow networks, a multi-head flow-attention mechanism is introduced to mine the deep features of the sequence and improve the efficiency of the model. On the independent test dataset, the ACC, Sn, Sp, and AUC values of AACFlow are 83.9%, 83.0%, 84.8%, and 0.892, respectively, which are 4.9%, 1.5%, 8.0%, and 0.016 higher than those of the baseline model. The MCC value is 67.85%. In addition, we visualize the features extracted by each module to enhance the interpretability of the model. Various experiments show that our model is more competitive in predicting ACPs.
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Ya Zhao
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an 710048, China
4
He S, Ye X, Dou L, Sakurai T. FIAMol-AB: A feature fusion and attention-based deep learning method for enhanced antibiotic discovery. Comput Biol Med 2024; 168:107762. [PMID: 38056212 DOI: 10.1016/j.compbiomed.2023.107762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 10/31/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023]
Abstract
Antibiotic resistance continues to be a growing concern for global health, accentuating the need for novel antibiotic discoveries. Traditional methodologies in this field have relied heavily on extensive experimental screening, which is often time-consuming and costly. In contrast, computer-assisted drug screening offers rapid, cost-effective solutions. In this work, we propose FIAMol-AB, a deep learning model that combines graph neural networks, text convolutional networks and molecular fingerprint techniques. This method also uses an attention mechanism to fuse multiple forms of information within the model. The experiments show that FIAMol-AB may offer potential advantages in antibiotic discovery tasks over some existing methods. We conducted some analysis based on our model's results, which helps highlight the potential significance of certain features in the model's predictive performance. Compared to different models, ours demonstrates promising results, indicating potential robustness and versatility. This suggests that by integrating multi-view information and attention mechanisms, FIAMol-AB might better learn complex molecular structures, potentially improving the precision and efficiency of antibiotic discovery. We hope our FIAMol-AB can be used as a useful method in the ongoing fight against antibiotic resistance.
Affiliation(s)
- Shida He
  - Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
- Lijun Dou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH, 44106, USA
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
5
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
Affiliation(s)
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
6
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023] Open
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Affiliation(s)
- Yoojoong Kim
  - School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea