1
Wang Z, Yuan H, Yan J, Liu J. Identification, characterization, and design of plant genome sequences using deep learning. The Plant Journal: for Cell and Molecular Biology 2025; 121:e17190. [PMID: 39666835 DOI: 10.1111/tpj.17190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 11/11/2024] [Accepted: 11/23/2024] [Indexed: 12/14/2024]
Abstract
Due to its excellent performance in processing large amounts of data and capturing complex non-linear relationships, deep learning has been widely applied in many fields of plant biology. Here we first review the application of deep learning in analyzing genome sequences to predict gene expression, chromatin interactions, and epigenetic features (open chromatin, transcription factor binding sites, and methylation sites) in plants. Then, current motif mining and functional component design and synthesis based on generative adversarial networks, large models, and attention mechanisms are elaborated in detail. The progress of protein structure and function prediction, genomic prediction, and large model applications based on deep learning is also discussed. Finally, this work provides prospects for the future development of deep learning in plants with regard to multiple omics data, algorithm optimization, large language models, sequence design, and intelligent breeding.
Affiliation(s)
- Zhenye Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hao Yuan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Hongshan Laboratory, Wuhan, 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Hongshan Laboratory, Wuhan, 430070, China
2
Jyoti, Ritu, Gupta S, Shankar R. Comprehensive analysis of computational approaches in plant transcription factors binding regions discovery. Heliyon 2024; 10:e39140. [PMID: 39640721 PMCID: PMC11620080 DOI: 10.1016/j.heliyon.2024.e39140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 08/23/2024] [Accepted: 10/08/2024] [Indexed: 12/07/2024]
Abstract
Transcription factors (TFs) are regulatory proteins that bind to specific DNA regions, known as transcription factor binding regions (TFBRs), to regulate the rate of transcription. The identification of TFBRs has been made possible by a number of experimental and computational techniques established during the past few years. The process of TFBR identification involves peak identification in the binding data, followed by the identification of motif characteristics. Using the same binding data, attempts have been made to build computational models that identify such binding regions, which could save the time and resources spent on binding experiments. These computational approaches depend heavily on what they learn and how they learn it. Existing approaches are heavily skewed toward human TFBR discovery, while plants have a drastically different genomic setup for regulation, which these approaches have grossly ignored. Here, we provide a comprehensive study of the current state of plant-specific TF discovery algorithms. While doing so, we encountered issues in several software tools that rendered them unusable to researchers. We fixed them and have also provided the corrected scripts for such tools. We expect this study to serve as a guide for better understanding of software tools' approaches for plant-specific TFBR discovery and the care to be taken while applying them, especially during cross-species applications. The corrected scripts of these software tools are made available at https://github.com/SCBB-LAB/Comparative-analysis-of-plant-TFBS-software.
Affiliation(s)
- Jyoti
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Ritu
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Sagar Gupta
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Ravi Shankar
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
3
Wang Z, Peng Y, Li J, Li J, Yuan H, Yang S, Ding X, Xie A, Zhang J, Wang S, Li K, Shi J, Xing G, Shi W, Yan J, Liu J. DeepCBA: A deep learning framework for gene expression prediction in maize based on DNA sequences and chromatin interactions. Plant Communications 2024; 5:100985. [PMID: 38859587 DOI: 10.1016/j.xplc.2024.100985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 05/25/2024] [Accepted: 06/05/2024] [Indexed: 06/12/2024]
Abstract
Chromatin interactions create spatial proximity between distal regulatory elements and target genes in the genome, which has an important impact on gene expression, transcriptional regulation, and phenotypic traits. To date, several methods have been developed for predicting gene expression. However, existing methods do not take into consideration the effect of chromatin interactions on target gene expression, thus potentially reducing the accuracy of gene expression prediction and mining of important regulatory elements. In this study, we developed a highly accurate deep learning-based gene expression prediction model (DeepCBA) based on maize chromatin interaction data. Compared with existing models, DeepCBA exhibits higher accuracy in expression classification and expression value prediction. The average Pearson correlation coefficients (PCCs) for predicting gene expression using gene promoter proximal interactions, proximal-distal interactions, and both proximal and distal interactions were 0.818, 0.625, and 0.929, respectively, representing an increase of 0.357, 0.16, and 0.469 over the PCCs obtained with traditional methods that use only gene proximal sequences. Some important motifs were identified through DeepCBA; they were enriched in open chromatin regions and expression quantitative trait loci and showed clear tissue specificity. Importantly, experimental results for the maize flowering-related gene ZmRap2.7 and the tillering-related gene ZmTb1 demonstrated the feasibility of DeepCBA for exploration of regulatory elements that affect gene expression. Moreover, promoter editing and verification of two reported genes (ZmCLE7 and ZmVTE4) demonstrated the utility of DeepCBA for the precise design of gene expression and even for future intelligent breeding. DeepCBA is available at http://www.deepcba.com/ or http://124.220.197.196/.
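The Pearson correlation coefficient (PCC) quoted in this abstract measures how closely predicted expression values track observed ones. As a minimal sketch, assuming plain Python lists of per-gene values, it could be computed as follows. The function name `pearson_correlation` is illustrative and is not taken from the DeepCBA code.

```python
import math


def pearson_correlation(predicted, observed):
    """Return the Pearson correlation between two equal-length sequences.

    Illustrative helper (an assumption, not DeepCBA's implementation):
    `predicted` and `observed` hold one expression value per gene.
    """
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n

    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)

    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_p * var_o)
```

A PCC of 1 means the predictions rise and fall in perfect linear step with the measurements, 0 means no linear relationship, and -1 means a perfectly inverted one.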
Affiliation(s)
- Zhenye Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yong Peng
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Jie Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jiying Li
- Microsoft Corporation, Redmond, WA 98052, USA
- Hao Yuan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Shangpo Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinru Ding
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ao Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jiangling Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Shouzhe Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China; WIMI Biotechnology Co., Ltd., Changzhou 213000, China
- Keqin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jiaqi Shi
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Guangjie Xing
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weihan Shi
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China.
4
Gupta S, Kesarwani V, Bhati U, Jyoti, Shankar R. PTFSpot: deep co-learning on transcription factors and their binding regions attains impeccable universality in plants. Brief Bioinform 2024; 25:bbae324. [PMID: 39013383 PMCID: PMC11250369 DOI: 10.1093/bib/bbae324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/18/2024]
Abstract
Unlike in animals, variability in transcription factors (TFs) and their binding regions (TFBRs) across plant species is a major problem that most existing TFBR-finding software fails to tackle, rendering it of hardly any use. This limitation has resulted in the underdevelopment of plant regulatory research and the rampant use of Arabidopsis-like model species, generating misleading results. Here, we report a revolutionary transformer-based deep-learning approach, PTFSpot, which learns from the co-variability of TF structures and their binding regions to provide a universal TF-DNA interaction model that detects TFBRs with complete freedom from the limitations of TF- and species-specific models. During a series of extensive benchmarking studies over multiple experimentally validated datasets, it not only outperformed the existing software by a lead of >30% but also delivered consistently >90% accuracy even for those species and TF families that were never encountered during the model-building process. PTFSpot now makes it possible to accurately annotate TFBRs across any plant genome even in the total lack of any TF information, completely free from the bottlenecks of species- and TF-specific models.
Affiliation(s)
- Sagar Gupta
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Veerbhan Kesarwani
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Umesh Bhati
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Jyoti
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Ravi Shankar
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
5
Yaschenko AE, Alonso JM, Stepanova AN. Arabidopsis as a model for translational research. The Plant Cell 2024:koae065. [PMID: 38411602 DOI: 10.1093/plcell/koae065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 01/26/2024] [Accepted: 01/26/2024] [Indexed: 02/28/2024]
Abstract
Arabidopsis thaliana is currently the most-studied plant species on earth, with an unprecedented number of genetic, genomic, and molecular resources having been generated in this plant model. In the era of translating foundational discoveries to crops and beyond, we aimed to highlight the utility and challenges of using Arabidopsis as a reference for applied plant biology research, agricultural innovation, biotechnology, and medicine. We hope that this review will inspire the next generation of plant biologists to continue leveraging Arabidopsis as a robust and convenient experimental system to address fundamental and applied questions in biology. We aim to encourage lab and field scientists alike to take advantage of the vast Arabidopsis datasets, annotations, germplasm, constructs, methods, molecular and computational tools in our pursuit to advance understanding of plant biology and help feed the world's growing population. We envision that the power of Arabidopsis-inspired biotechnologies and foundational discoveries will continue to fuel the development of resilient, high-yielding, nutritious plants for the betterment of plant and animal health and greater environmental sustainability.
Affiliation(s)
- Anna E Yaschenko
- Department of Plant and Microbial Biology, Genetics and Genomics Academy, North Carolina State University, Raleigh, NC 27695, USA
- Jose M Alonso
- Department of Plant and Microbial Biology, Genetics and Genomics Academy, North Carolina State University, Raleigh, NC 27695, USA
- Anna N Stepanova
- Department of Plant and Microbial Biology, Genetics and Genomics Academy, North Carolina State University, Raleigh, NC 27695, USA
6
Cheng H, Liu L, Zhou Y, Deng K, Ge Y, Hu X. TSPTFBS 2.0: trans-species prediction of transcription factor binding sites and identification of their core motifs in plants. Frontiers in Plant Science 2023; 14:1175837. [PMID: 37229121 PMCID: PMC10203575 DOI: 10.3389/fpls.2023.1175837] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Accepted: 04/13/2023] [Indexed: 05/27/2023]
Abstract
Introduction: An emerging approach using promoter tiling deletion via genome editing is becoming popular in plants. Identifying the precise positions of core motifs within plant gene promoters is in great demand, but these positions are still largely unknown. We previously developed TSPTFBS, a set of 265 Arabidopsis transcription factor binding site (TFBS) prediction models, which cannot meet the above demand of identifying core motifs. Methods: Here, we additionally introduced 104 maize and 20 rice TFBS datasets and utilized DenseNet for model construction on a large-scale dataset of a total of 389 plant TFs. More importantly, we combined three biological interpretability methods (DeepLIFT, in-silico tiling deletion, and in-silico mutagenesis) to identify the potential core motifs of any given genomic region. Results: DenseNet not only achieved greater predictability than baseline methods such as LS-GKM and MEME for the above 389 TFs from Arabidopsis, maize and rice, but also performed better on trans-species prediction of a total of 15 TFs from six other plant species. A motif analysis based on TF-MoDISco and global importance analysis (GIA) further provides the biological implications of the core motifs identified by the three interpretability methods. Finally, we developed the TSPTFBS 2.0 pipeline, which integrates 389 DenseNet-based models of TF binding and the above three interpretability methods. Discussion: TSPTFBS 2.0 was implemented as a user-friendly web server (http://www.hzau-hulab.com/TSPTFBS/), which can provide important references for editing targets of any given plant promoter and has great potential to provide reliable editing targets for genetic screen experiments in plants.
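In-silico mutagenesis, one of the interpretability methods named in this abstract, can be illustrated in general terms: substitute every base at every position, re-score the mutated sequence, and record how far the score moves. The sketch below is a generic illustration under stated assumptions, not the TSPTFBS 2.0 implementation. In particular, `toy_score` is a hypothetical stand-in for a trained binding model, and the motif `CACGTG` is chosen purely as an example.

```python
BASES = "ACGT"


def in_silico_mutagenesis(sequence, score_fn):
    """Map (position, alternative base) -> change in score versus the reference.

    `score_fn` is any callable taking a DNA string and returning a float;
    here it stands in for a trained TF-binding model (an assumption).
    """
    reference = score_fn(sequence)
    effects = {}
    for pos, ref_base in enumerate(sequence):
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = sequence[:pos] + alt + sequence[pos + 1:]
            effects[(pos, alt)] = score_fn(mutant) - reference
    return effects


def position_importance(effects, length):
    """Per-position importance: the largest score drop caused by any substitution."""
    importance = [0.0] * length
    for (pos, _alt), delta in effects.items():
        importance[pos] = max(importance[pos], -delta)
    return importance


def toy_score(sequence, motif="CACGTG"):
    """Hypothetical scorer: best fraction of motif bases matched in any window."""
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        best = max(best, sum(a == b for a, b in zip(window, motif)))
    return best / len(motif)
```

With a real model in place of `toy_score`, positions whose substitutions cause the largest score drops are candidates for the core motif. The toy scorer only makes that behaviour easy to inspect on a short sequence.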
7
Deep learning in regulatory genomics: from identification to design. Curr Opin Biotechnol 2023; 79:102887. [PMID: 36640453 DOI: 10.1016/j.copbio.2022.102887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 11/12/2022] [Accepted: 12/14/2022] [Indexed: 01/14/2023]
Abstract
Genomics and deep learning are a natural match since both are data-driven fields. Regulatory genomics refers to functional noncoding DNA regulating gene expression. In recent years, deep learning applications in regulatory genomics have achieved such remarkable advances that they have revolutionized the computational methods in this field. Here, we review two emerging trends: (i) the modeling of very long input sequences (up to 200 kb), which requires self-matched modularization of model architecture; and (ii) the balance between model predictability and model interpretability, because the latter is better able to meet biological demands. Finally, we discuss how to employ these two routes to design synthetic regulatory DNA, as a promising strategy for optimizing crop agronomic properties.
8
Yan W, Li Z, Pian C, Wu Y. PlantBind: an attention-based multi-label neural network for predicting plant transcription factor binding sites. Brief Bioinform 2022; 23:6713513. [PMID: 36155619 DOI: 10.1093/bib/bbac425] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/18/2022] [Revised: 08/29/2022] [Accepted: 08/31/2022] [Indexed: 12/14/2022]
Abstract
Identification of transcription factor binding sites (TFBSs) is essential to understanding of gene regulation. Designing computational models for accurate prediction of TFBSs is crucial because it is not feasible to experimentally assay all transcription factors (TFs) in all sequenced eukaryotic genomes. Although many methods have been proposed for the identification of TFBSs in humans, methods designed for plants are comparatively underdeveloped. Here, we present PlantBind, a method for integrated prediction and interpretation of TFBSs based on DNA sequences and DNA shape profiles. Built on an attention-based multi-label deep learning framework, PlantBind not only simultaneously predicts the potential binding sites of 315 TFs, but also identifies the motifs bound by transcription factors. During the training process, this model revealed a strong similarity among TF family members with respect to target binding sequences. Trans-species prediction performance using four Zea mays TFs demonstrated the suitability of this model for transfer learning. Overall, this study provides an effective solution for identifying plant TFBSs, which will promote greater understanding of transcriptional regulatory mechanisms in plants.
Affiliation(s)
- Zutan Li
- Nanjing Agricultural University
- Cong Pian
- College of Sciences at Nanjing Agricultural University
- Yufeng Wu
- State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, College of Agriculture, Academy for Advanced Interdisciplinary Studies at Nanjing Agricultural University
9
Wang M, Wang J. Non-coding RNA expression analysis revealed the molecular mechanism of flag leaf heterosis in inter-subspecific hybrid rice. Frontiers in Plant Science 2022; 13:990656. [PMID: 36226282 PMCID: PMC9549252 DOI: 10.3389/fpls.2022.990656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/10/2022] [Accepted: 08/31/2022] [Indexed: 06/16/2023]
Abstract
Heterosis has been widely used in agriculture, but its molecular mechanism is inadequately understood. Plants have a large number of non-coding RNAs (ncRNAs). Among them, the widely studied functional ncRNAs include long non-coding RNA (lncRNA) and circular RNA (circRNA), which play roles in varied biological processes, as well as microRNA (miRNA), which can not only regulate the post-transcriptional expression of target genes but also target lncRNA and circRNA and thereby participate in the competing endogenous RNA (ceRNA) regulatory network. However, the influence of these three ncRNAs and their regulatory relationships on heterosis is unknown in rice. In this study, the expression profile of ncRNAs and the ncRNA regulatory network related to heterosis were comprehensively analyzed in inter-subspecific hybrid rice. A total of 867 miRNAs, 3,278 lncRNAs and 2,521 circRNAs were identified in the hybrid and its parents. Analysis of the global profiles of these three types of ncRNAs indicated that significant differences existed in the distribution and sequence characteristics of the corresponding genes. The numbers of miRNAs and lncRNAs in the hybrid were higher than those in its parents. A total of 784 ncRNAs (169 miRNAs, 573 lncRNAs and 42 circRNAs) were differentially expressed in the hybrid, and their target/host genes were vital in stress tolerance, growth and development in rice. These discoveries suggested that the expression plasticity of ncRNAs has an important role in inter-subspecific hybrid rice heterosis. It is worth mentioning that miRNAs exhibited substantially more variation between the hybrid and its parents than lncRNAs and circRNAs. Non-additively expressed ncRNAs and ncRNAs related to allele-specific expression genes in the hybrid were provided in this study, and multiple sets of ncRNA regulatory networks closely related to heterosis were obtained. Meanwhile, heterosis-related regulatory networks of ceRNA (lncRNA and circRNA) and miRNA were also demonstrated.
10
Ruengsrichaiya B, Nukoolkit C, Kalapanulak S, Saithong T. Plant-DTI: Extending the landscape of TF protein and DNA interaction in plants by a machine learning-based approach. Frontiers in Plant Science 2022; 13:970018. [PMID: 36082286 PMCID: PMC9445498 DOI: 10.3389/fpls.2022.970018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/01/2022] [Indexed: 06/15/2023]
Abstract
As sessile organisms, plants have elaborate transcriptional regulatory systems that allow them to adapt to variable surrounding environments. Current understanding of plant regulatory mechanisms is greatly constrained by limited knowledge of transcription factor (TF)-DNA interactions. To mitigate this problem, a Plant-DTI predictor (Plant DBD-TFBS Interaction) was developed here as the first machine-learning model that covered the largest experimental datasets of 30 plant TF families, including 7 plant-specific DNA binding domain (DBD) types, and their transcription factor binding sites (TFBSs). Plant-DTI introduced a novel TFBS feature construction, called TFBS base-preference, which enhanced the specificity of TFBS to DBD types. The proposed model showed better predictive performance with the TFBS base-preference than with the simple binary representation. Plant-DTI was validated with 22 independent ChIP-seq datasets. It accurately predicted the measured DBD-TFBS pairs along with their TFBS motifs, and effectively predicted interactions of other TFs containing similar DBD types. Compared with the existing state-of-the-art methods, Plant-DTI prediction showed a figure of merit in sensitivity and specificity with respect to the position weight matrix (PWM) and TSPTFBS methods. Finally, the proposed Plant-DTI model helped to fill the knowledge gap in the regulatory mechanisms of the cassava sucrose synthase 1 gene (MeSUS1). Plant-DTI predicted MeERF72 as a regulator of MeSUS1, consistent with the yeast one-hybrid (Y1H) experiment. Taken together, Plant-DTI would help facilitate the prediction of TF-TFBS and TF-target gene (TG) interactions, thereby accelerating the study of transcriptional regulatory systems in plant species.
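The position weight matrix (PWM) that this abstract uses as a baseline is a standard way to score candidate binding sites. As a minimal sketch of the general technique, assuming aligned example sites of equal length, a uniform background of 0.25 per base, and sequences containing only A, C, G and T, a log-odds PWM could be built and scanned as follows. The function names and the pseudocount value are illustrative assumptions and do not come from Plant-DTI.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from aligned, equal-length binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    pwm = []
    for pos in range(width):
        # A pseudocount keeps unseen bases from producing log(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def best_pwm_hit(sequence, pwm):
    """Return (start, score) of the highest-scoring window, or None if too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

A PWM treats each position independently, which is one reason learned models that capture dependencies between positions are often compared against it.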
Affiliation(s)
- Bhukrit Ruengsrichaiya
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology and School of Information Technology, King Mongkut’s University of Technology Thonburi (Bang KhunThian), Bangkok, Thailand
- Chakarida Nukoolkit
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology and School of Information Technology, King Mongkut’s University of Technology Thonburi (Bang KhunThian), Bangkok, Thailand
- School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Saowalak Kalapanulak
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology and School of Information Technology, King Mongkut’s University of Technology Thonburi (Bang KhunThian), Bangkok, Thailand
- Center for Agricultural Systems Biology, Systems Biology and Bioinformatics Research Group, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi (Bang KhunThian), Bangkok, Thailand
- Treenut Saithong
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology and School of Information Technology, King Mongkut’s University of Technology Thonburi (Bang KhunThian), Bangkok, Thailand
- Center for Agricultural Systems Biology, Systems Biology and Bioinformatics Research Group, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi (Bang KhunThian), Bangkok, Thailand
11
Katuwawala A, Zhao B, Kurgan L. DisoLipPred: accurate prediction of disordered lipid-binding residues in protein sequences with deep recurrent networks and transfer learning. Bioinformatics 2021; 38:115-124. [PMID: 34487138 DOI: 10.1093/bioinformatics/btab640] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/31/2021] [Revised: 08/05/2021] [Accepted: 09/02/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and the lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of the disordered lipid-binding residues (DLBRs). RESULTS: DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with the protein-lipid interactions. Ablation analysis shows that these features drive the predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred's predictions complement the results generated by predictors of the transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods. AVAILABILITY AND IMPLEMENTATION: DisoLipPred's webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
12
Shen W, Pan J, Wang G, Li X. Deep learning-based prediction of TFBSs in plants. Trends in Plant Science 2021; 26:1301-1302. [PMID: 34312058 DOI: 10.1016/j.tplants.2021.06.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/31/2021] [Revised: 06/18/2021] [Accepted: 06/24/2021] [Indexed: 06/13/2023]
Affiliation(s)
- Wei Shen
- Guangdong Technology Research Center for Marine Algal Bioengineering, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China; School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
- Jian Pan
- School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Guanjie Wang
- College of Life Science, Jilin Agricultural University, Jilin, China
- Xiaozheng Li
- Guangdong Technology Research Center for Marine Algal Bioengineering, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China.