1
Swamidatta SH, Lichman BR. Beyond co-expression: pathway discovery for plant pharmaceuticals. Curr Opin Biotechnol 2024; 88:103147. [PMID: 38833915 DOI: 10.1016/j.copbio.2024.103147] [Received: 03/01/2024] [Revised: 05/07/2024] [Accepted: 05/09/2024] [Indexed: 06/06/2024]
Abstract
Plant natural products have been an important source of medicinal molecules since ancient times. To gain access to the whole diversity of these molecules for pharmaceutical applications, it is important to understand their biosynthetic origins. Whilst co-expression is a reliable tool for identifying gene candidates, a variety of complementary methods can aid in screening or refining candidate selection. Here, we review recently employed plant biosynthetic pathway discovery approaches, and highlight future directions in the field.
Affiliation(s)
- Sandesh H Swamidatta
- Centre for Novel Agricultural Products, Department of Biology, University of York, York YO10 5DD, UK
- Benjamin R Lichman
- Centre for Novel Agricultural Products, Department of Biology, University of York, York YO10 5DD, UK
2
Bai W, Li C, Li W, Wang H, Han X, Wang P, Wang L. Machine learning assists prediction of genes responsible for plant specialized metabolite biosynthesis by integrating multi-omics data. BMC Genomics 2024; 25:418. [PMID: 38679745 PMCID: PMC11057162 DOI: 10.1186/s12864-024-10258-6] [Received: 09/04/2023] [Accepted: 03/26/2024] [Indexed: 05/01/2024]
Abstract
BACKGROUND Plant specialized (or secondary) metabolites (PSM), also known as phytochemicals, natural products, or plant constituents, play essential roles in interactions between plants and their environment. Although many research efforts have focused on discovering novel metabolites and their biosynthetic genes, the resolution of metabolic pathways and the identification of biosynthetic genes have been limited by rudimentary analysis approaches and the enormous number of candidate genes. RESULTS Here we integrated the state-of-the-art automated machine learning (ML) framework AutoGluon-Tabular with multi-omics data from Arabidopsis to predict genes encoding enzymes involved in PSM biosynthesis, focusing on the three main PSM categories: terpenoids, alkaloids, and phenolics. We found that genomics- and proteomics-related features were the two most important feature categories contributing to model performance. Using only these key features, we built a new model in Arabidopsis that performed better than models built with more features, including those related to transcriptomics and epigenomics. Finally, the models were validated in maize and tomato: models tested on maize but trained with data from the two other species performed as well as or better than intraspecies predictions. CONCLUSIONS External validation in grape and poppy suggested that our model is applicable to other species, and showed great potential to improve the prediction of enzymes that synthesize PSM by including valid data from a wider range of species.
Affiliation(s)
- Wenhui Bai
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030024, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Cheng Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Wei Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Hai Wang
- National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, Joint Laboratory for International Cooperation in Crop Molecular Breeding, China Agricultural University, Beijing, 100193, China
- Xiaohong Han
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030024, China
- Peipei Wang
- Kunpeng Institute of Modern Agriculture at Foshan, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, China
- Li Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
3
Wu TY, Li YR, Chang KJ, Fang JC, Urano D, Liu MJ. Modeling alternative translation initiation sites in plants reveals evolutionarily conserved cis-regulatory codes in eukaryotes. Genome Res 2024; 34:272-285. [PMID: 38479836 PMCID: PMC10984385 DOI: 10.1101/gr.278100.123] [Received: 05/15/2023] [Accepted: 02/15/2024] [Indexed: 03/22/2024]
Abstract
mRNA translation relies on identifying translation initiation sites (TISs) in mRNAs. Alternative TISs are prevalent across plant transcriptomes, but the mechanisms for their recognition are unclear. Using ribosome profiling and machine learning, we developed models for predicting alternative TISs in tomato (Solanum lycopersicum). Distinct feature sets were predictive of AUG and non-AUG TISs in 5' untranslated regions and coding sequences, including a novel CU-rich sequence that promoted plant TIS activity; this translational enhancer is found across dicots and monocots, as well as in humans and viruses. Our results elucidate the mechanistic and evolutionary basis of TIS recognition, whereby cis-regulatory RNA signatures affect start site selection. The TIS prediction model provides global estimates of TISs that can be used to discover neglected protein-coding genes across plant genomes. The prevalence of these cis-regulatory signatures across plant species, humans, and viruses suggests that they play broad and critical roles in reprogramming the translational landscape.
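The abstract separates AUG from non-AUG start sites. As a rough illustration only, and not the authors' model, the Python sketch below enumerates candidate TIS codons in an mRNA: AUG itself plus the near-cognate codons, taken here (an assumption) to be the nine codons that differ from AUG at exactly one position. The function names `near_cognate_codons` and `candidate_tis` are hypothetical; a trained predictor such as the one described would additionally score each candidate using sequence-context features.

```python
START = "AUG"


def near_cognate_codons(start: str = START) -> set[str]:
    """Return every codon that differs from `start` at exactly one position."""
    bases = "ACGU"
    codons = set()
    for i, original in enumerate(start):
        for base in bases:
            if base != original:
                codons.add(start[:i] + base + start[i + 1:])
    return codons


# Assumed definition of non-AUG candidates: single-substitution variants of AUG.
NEAR_COGNATES = near_cognate_codons()


def candidate_tis(mrna: str) -> list[tuple[int, str, str]]:
    """Scan all three frames of an mRNA (DNA letters accepted) and return
    (0-based position, codon, class) for AUG and near-cognate non-AUG codons."""
    seq = mrna.upper().replace("T", "U")
    hits = []
    for pos in range(len(seq) - 2):
        codon = seq[pos:pos + 3]
        if codon == START:
            hits.append((pos, codon, "AUG"))
        elif codon in NEAR_COGNATES:
            hits.append((pos, codon, "non-AUG"))
    return hits
```

For example, `candidate_tis("GCUGAAUGC")` returns `[(1, 'CUG', 'non-AUG'), (5, 'AUG', 'AUG')]`; positions are 0-based and all three reading frames are scanned.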
Affiliation(s)
- Ting-Ying Wu
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei 11529, Taiwan
- Ya-Ru Li
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Kai-Jyun Chang
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan
- Jhen-Cheng Fang
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Daisuke Urano
- Temasek Life Sciences Laboratory, Singapore 117604, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
- Ming-Jung Liu
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei 115, Taiwan
4
Loyola-Vargas VM, Ochoa-Alejo N. An Introduction to Plant Cell, Tissue, and Organ Culture: Current Status and Perspectives. Methods Mol Biol 2024; 2827:1-13. [PMID: 38985259 DOI: 10.1007/978-1-0716-3954-2_1] [Indexed: 07/11/2024]
Abstract
Plant cell, tissue, and organ cultures (PCTOC) have been used as experimental systems in basic research, for example to demonstrate gene function through gene overexpression or repression, and to investigate the processes involved in embryogenesis and organogenesis or in the potential production of secondary metabolites, among other applications. PCTOC has also been applied commercially for the vegetative multiplication (micropropagation) of diverse plant species, mainly ornamentals but also horticultural crops such as potato, as well as fruit and tree species, and to produce high-quality disease-free plants. Moreover, PCTOC protocols are important auxiliary systems in crop breeding, used to generate pure (homozygous) lines for hybrid production and to obtain polyploid plants with higher yields or better performance. PCTOC has also been used to conserve the germplasm of different crops and threatened species. Plant genetic improvement through genetic engineering and genome editing has only been possible thanks to the establishment of efficient in vitro plant regeneration protocols. Several companies currently focus on commercializing plant secondary metabolites with interesting biological activities using in vitro PCTOC. The impact of omics on PCTOC is discussed.
Affiliation(s)
- Víctor M Loyola-Vargas
- Unidad de Biología Integrativa, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, Mexico
- Neftalí Ochoa-Alejo
- Departamento de Ingeniería Genética, Unidad Irapuato, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Guanajuato, Mexico
5
Bao H, Zhao J, Zhao X, Zhao C, Lu X, Xu G. Prediction of plant secondary metabolic pathways using deep transfer learning. BMC Bioinformatics 2023; 24:348. [PMID: 37726702 PMCID: PMC10507959 DOI: 10.1186/s12859-023-05485-9] [Received: 06/20/2023] [Accepted: 09/14/2023] [Indexed: 09/21/2023]
Abstract
BACKGROUND Plant secondary metabolites are highly valued for their applications in pharmaceuticals, nutrition, flavors, and aesthetics. Elucidating plant secondary metabolic pathways is of great importance because of their crucial roles in biological processes during plant growth and development. However, understanding plant biosynthesis and degradation pathways remains a challenge due to the lack of sufficient information in current databases. To address this issue, we propose a transfer learning approach that uses a pre-trained hybrid deep learning architecture combining a Graph Transformer and a convolutional neural network (GTC) to predict plant metabolic pathways. RESULTS GTC provides a comprehensive molecular representation by extracting both structural features from the molecular graph and textual information from the SMILES string. GTC is pre-trained on the KEGG datasets to acquire general features and then fine-tuned on plant-derived datasets. Four metrics were chosen to evaluate model performance. The results show that GTC outperforms six other models, including three previously reported machine learning models, on the KEGG dataset, with an accuracy of 96.75%, precision of 85.14%, recall of 83.03%, and F1 score of 84.06%. Furthermore, an ablation study confirms that all components of the hybrid GTC model are indispensable. Transfer learning is then employed to leverage the knowledge acquired from the KEGG metabolic pathways. As a result, the transferred GTC predicts plant secondary metabolic pathways with high accuracy: an average of 98.30% in fivefold cross-validation and 97.82% on the final test. In addition, GTC is employed to classify natural products, achieving a perfect accuracy of 100.00% for alkaloids and its lowest accuracy, 98.42%, for shikimates and phenylpropanoids.
CONCLUSIONS The proposed GTC effectively captures molecular features and achieves high performance in classifying KEGG metabolic pathways and, via transfer learning, in predicting plant secondary metabolic pathways. Furthermore, GTC demonstrates its generalization ability by accurately classifying natural products. A user-friendly executable program has been developed that requires only the SMILES string of the query compound, entered through a graphical interface.
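Of the four metrics, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below (the helper `f1_score` is our own, not part of the GTC program) recomputes F1 from the reported precision and recall. It gives about 84.07%, slightly above the reported 84.06%; a gap of this size can arise when F1 is averaged per class rather than computed from averaged precision and recall, but the abstract does not state which averaging was used.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for GTC on the KEGG dataset, as fractions.
reported_precision = 0.8514
reported_recall = 0.8303

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.2%}")
```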
Affiliation(s)
- Han Bao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Jinhui Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Chunxia Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
6
Wong DCJ, Pichersky E, Peakall R. Many different flowers make a bouquet: Lessons from specialized metabolite diversity in plant-pollinator interactions. Curr Opin Plant Biol 2023; 73:102332. [PMID: 36652780 DOI: 10.1016/j.pbi.2022.102332] [Received: 10/07/2022] [Revised: 12/04/2022] [Accepted: 12/08/2022] [Indexed: 06/10/2023]
Abstract
Flowering plants have evolved extraordinarily diverse metabolites that underpin the floral visual and olfactory signals enabling plant-pollinator interactions. In some cases, these metabolites also provide unusual rewards that specific pollinators depend on. While some metabolites are shared by most flowering plants, many have evolved in restricted lineages in response to the specific selection pressures encountered within different niches. The latter are designated as specialized metabolites. Recent investigations continue to uncover a growing repertoire of unusual specialized metabolites. Increased accessibility to cutting-edge multi-omics technologies (e.g. genome, transcriptome, proteome, metabolome) is now opening new doors to simultaneously uncover the molecular basis of their synthesis and their evolution across diverse plant lineages. Drawing upon the recent literature, this perspective discusses these aspects and, where known, their ecological and evolutionary relevance. A primer on omics-guided approaches to discover the genetic and biochemical basis of functional specialized metabolites is also provided.
Affiliation(s)
- Darren C J Wong
- Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Eran Pichersky
- Department of Molecular, Cellular and Developmental Biology, University of Michigan, Ann Arbor, MI, United States
- Rod Peakall
- Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
7
Kisiel A, Krzemińska A, Cembrowska-Lech D, Miller T. Data Science and Plant Metabolomics. Metabolites 2023; 13:454. [PMID: 36984894 PMCID: PMC10054611 DOI: 10.3390/metabo13030454] [Received: 02/27/2023] [Revised: 03/16/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023]
Abstract
The study of plant metabolism is one of the most complex tasks, mainly because of the huge number and structural diversity of metabolites, and because metabolites respond to changes in the environment and ultimately influence each other. Metabolic profiling is most often carried out using tools that include mass spectrometry (MS), one of the most powerful analytical methods. As a result, even a single sample can yield thousands of data points. Data science has the potential to revolutionize our understanding of plant metabolism. This review shows that machine learning, network analysis, and statistical modeling are among the techniques being used to analyze large quantities of complex data, providing insights into plant development and growth and into how plants interact with their environment. These findings could be key to improving crop yields, developing new forms of plant biotechnology, and understanding the relationship between plants and microbes. It is also necessary to consider the constraints of data science, such as data quality and availability, model complexity, and the need for deep subject knowledge to achieve reliable outcomes.
Affiliation(s)
- Anna Kisiel
- Institute of Marine and Environmental Sciences, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Adrianna Krzemińska
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Danuta Cembrowska-Lech
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Department of Physiology and Biochemistry, Institute of Biology, University of Szczecin, Felczaka 3c, 71-412 Szczecin, Poland
- Tymoteusz Miller
- Institute of Marine and Environmental Sciences, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland