1
Song R, Sutton GJ, Li F, Liu Q, Wong JJL. Variable calling of m6A and associated features in databases: a guide for end-users. Brief Bioinform 2024; 25:bbae434. [PMID: 39258883 PMCID: PMC11388104 DOI: 10.1093/bib/bbae434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 07/01/2024] [Accepted: 08/19/2024] [Indexed: 09/12/2024]
Abstract
N6-methyladenosine (m$^{6}$A) is a widely studied methylation of messenger RNAs that has been linked to diverse cellular processes and human diseases. Numerous databases that collate m$^{6}$A profiles of distinct cell types have been created to facilitate quick and easy mining of m$^{6}$A signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of m$^{6}$A-associated biology by end-users who are unaware of them. Here, we review various m$^{6}$A-related databases and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates, with only moderate reproducibility, and the inclusion of peak calls from early m$^{6}$A sequencing protocols may lead to the reporting of false positives or negatives. Awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize m$^{6}$A databases to derive biologically meaningful results.
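The variability in peak number and coordinates mentioned in this abstract can be quantified in several ways. The sketch below is one minimal option, not the method used by the authors: it computes the base-level Jaccard index between two peak sets. The peak coordinates, the `Peak` tuple layout (half-open intervals), and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def merge(peaks: List[Peak]) -> List[Peak]:
    """Sort peaks and merge those that overlap or touch on the same chromosome."""
    merged: List[Peak] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bases(peaks: List[Peak]) -> int:
    """Number of distinct bases covered by a peak set."""
    return sum(end - start for _, start, end in merge(peaks))


def shared_bases(a: List[Peak], b: List[Peak]) -> int:
    """Bases covered by both sets, via a two-pointer sweep over merged peaks."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        if a[i][0] != b[j][0]:
            # Advance whichever set is on the chromosome that sorts first.
            if a[i][0] < b[j][0]:
                i += 1
            else:
                j += 1
            continue
        lo = max(a[i][1], b[j][1])
        hi = min(a[i][2], b[j][2])
        if hi > lo:
            total += hi - lo
        # Advance the interval that ends first.
        if a[i][2] < b[j][2]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a: List[Peak], b: List[Peak]) -> float:
    """Shared bases divided by bases covered by either set."""
    inter = shared_bases(a, b)
    union = covered_bases(a) + covered_bases(b) - inter
    return inter / union if union else 0.0


# Made-up peak calls for the same sample from two hypothetical pipelines.
pipeline_a = [("chr1", 100, 200), ("chr1", 500, 600)]
pipeline_b = [("chr1", 150, 250), ("chr2", 10, 50)]
print(f"Jaccard index: {jaccard(pipeline_a, pipeline_b):.3f}")
```

A base-level index penalizes both missing peaks and shifted coordinates, whereas counting overlapping peaks alone would hide coordinate differences; which measure is appropriate depends on the downstream question.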
Affiliation(s)
- Renhua Song
  - Epigenetics and RNA Biology Laboratory, School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2050, Australia
- Gavin J Sutton
  - Epigenetics and RNA Biology Laboratory, School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2050, Australia
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), The University of Adelaide, Adelaide, South Australia 5005, Australia
- Qian Liu
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Maryland Pkwy, NV 89154, United States
  - School of Life Sciences, College of Sciences, University of Nevada, Las Vegas, Maryland Pkwy, NV 89154, United States
- Justin J-L Wong
  - Epigenetics and RNA Biology Laboratory, School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2050, Australia
2
Zhang T, Gao S, Zhang SW, Cui XD. m6Aexpress-enet: Predicting the regulatory expression m6A sites by an enet-regularization negative binomial regression model. Methods 2024; 226:61-70. [PMID: 38631404 DOI: 10.1016/j.ymeth.2024.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/04/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024]
Abstract
As the most abundant mRNA modification, m6A controls and influences many aspects of mRNA metabolism, including mRNA stability and degradation. However, the role of specific m6A sites in regulating gene expression remains unclear. In addition, the multicollinearity caused by correlated methylation levels across multiple m6A sites in each gene could degrade prediction performance. To address these challenges, we propose an elastic-net regularized negative binomial regression model (called m6Aexpress-enet) to predict which m6A sites could potentially regulate the expression of their gene. Comprehensive evaluations on simulated datasets demonstrate that m6Aexpress-enet achieves the top prediction performance. Applying m6Aexpress-enet to real MeRIP-seq data from human lymphoblastoid cell lines, we uncovered the complex regulatory pattern of predicted m6A sites and the unique pathway enrichment of the constructed co-methylation modules. m6Aexpress-enet proves itself a powerful tool that enables biologists to discover the mechanisms by which m6A regulates gene expression. The source code and a step-by-step implementation of m6Aexpress-enet are freely accessible at https://github.com/tengzhangs/m6Aexpress-enet.
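For orientation, a generic elastic-net-penalized negative binomial regression can be written as follows. This is the textbook form of the technique named in the abstract, not necessarily the exact parameterization used by m6Aexpress-enet; the symbols (read count $y_j$ of a gene in sample $j$, methylation level $x_{jk}$ of site $k$, size factor $s_j$, dispersion $\phi$, tuning parameters $\lambda$ and $\alpha$) are assumptions introduced here for illustration.

$$
y_j \sim \mathrm{NB}(\mu_j, \phi), \qquad \log \mu_j = \log s_j + \beta_0 + \sum_{k=1}^{K} x_{jk}\,\beta_k
$$

$$
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell_{\mathrm{NB}}(\beta; y, X, \phi) + \lambda \left[ \alpha \sum_{k=1}^{K} |\beta_k| + \frac{1-\alpha}{2} \sum_{k=1}^{K} \beta_k^{2} \right] \right\}
$$

Here $\ell_{\mathrm{NB}}$ is the negative binomial log-likelihood. The $\ell_1$ term shrinks some coefficients exactly to zero, which selects sites, while the $\ell_2$ term stabilizes the estimates when the methylation levels of several sites in the same gene are correlated.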
Affiliation(s)
- Teng Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710027 Shaanxi, China
  - School of Computer, Jiangsu University of Science and Technology, ZhenJiang, 212100 JiangSu, China
- Shang Gao
  - School of Computer, Jiangsu University of Science and Technology, ZhenJiang, 212100 JiangSu, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710027 Shaanxi, China
- Xiao-Dong Cui
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, 710027 Shaanxi, China
3
Song M, Zhao J, Zhang C, Jia C, Yang J, Zhao H, Zhai J, Lei B, Tao S, Chen S, Su R, Ma C. PEA-m6A: an ensemble learning framework for accurately predicting N6-methyladenosine modifications in plants. Plant Physiol 2024; 195:1200-1213. [PMID: 38428981 DOI: 10.1093/plphys/kiae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 01/11/2024] [Accepted: 02/01/2024] [Indexed: 03/03/2024]
Abstract
N6-methyladenosine (m6A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that can streamline m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistic-based and deep learning-driven features, achieving superior performance with an improvement of 6.7% to 23.3% in the area under the precision-recall curve compared with the state-of-the-art regional-scale m6A predictor WeakRM in 12 plant species. Notably, PEA-m6A can leverage knowledge from pretrained models via transfer learning, an innovation that improves the prediction accuracy of m6A modifications in small-sample training tasks. PEA-m6A also has a strong capability for generalization, making it suitable for within- and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding performance in terms of accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
Affiliation(s)
- Minggui Song
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jiawen Zhao
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chujun Zhang
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chengchao Jia
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jing Yang
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Haonan Zhao
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jingjing Zhai
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Beilei Lei
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Shiheng Tao
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Siqi Chen
  - School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Ran Su
  - School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Chuang Ma
  - State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
4
Yang Y, Liu Z, Lu J, Sun Y, Fu Y, Pan M, Xie X, Ge Q. Analysis approaches for the identification and prediction of N6-methyladenosine sites. Epigenetics 2023; 18:2158284. [PMID: 36562485 PMCID: PMC9980620 DOI: 10.1080/15592294.2022.2158284] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
The global dynamics of a variety of biological processes can be revealed by mapping transcriptional m6A sites, in particular across the full transcriptome. Individual m6A sites also contribute to biological function, which can be evaluated using stoichiometric information obtained at single-nucleotide resolution. Currently, m6A sites are identified mainly by experimental and prediction methods, based on high-throughput sequencing and machine learning models, respectively. This review summarizes recent topics and progress in bioinformatics methods for deciphering m6A methylation, including the experimental detection of m6A methylation sites, data analysis techniques, approaches to predicting m6A methylation sites, m6A methylation databases, and the detection of m6A modification in circRNA. The review closes with a brief discussion of development perspectives in this area.
Affiliation(s)
- Yuwei Yang
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Zhiyu Liu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Junru Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yuqing Sun
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yue Fu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Min Pan
  - Department of Pathology and Pathophysiology, School of Medicine, Southeast University, Nanjing, China
- Xueying Xie
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
5
Wang Y, Wei Z, Su J, Coenen F, Meng J. RgnTX: Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity. Comput Struct Biotechnol J 2023; 21:4110-4117. [PMID: 37671241 PMCID: PMC10475473 DOI: 10.1016/j.csbj.2023.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 08/13/2023] [Accepted: 08/23/2023] [Indexed: 09/07/2023]
Abstract
Colocalization analysis of genomic region sets has been widely adopted to unveil potential functional interactions between the corresponding biological attributes, and often serves as the basis for further investigation. A number of methods have been developed for colocalization analysis of genomic elements. However, none of them explicitly considers transcriptome heterogeneity and isoform ambiguity, making them less appropriate for analyzing transcriptome elements. Here, we developed RgnTX, an R/Bioconductor tool for the colocalization analysis of transcriptome elements with permutation tests. Unlike existing approaches, RgnTX directly takes advantage of transcriptome annotation and offers high flexibility in the null model to simulate a realistic transcriptome-wide background, such as complex alternative splicing patterns. Importantly, it supports the testing of transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. The proposed package offers a wide selection of pre-defined functions that users can readily apply to visualize permutation results, calculate shifted z-scores and conduct multiple hypothesis testing under Benjamini-Hochberg correction. Moreover, with synthetic and real datasets, we show that RgnTX's novel testing modes return distinct and more significant results compared to existing genome-based methods. We believe RgnTX should be a useful tool for characterizing the randomness of the transcriptome and for conducting statistical association analysis of genomic region sets within the heterogeneous transcriptome. The package has been accepted by Bioconductor and is freely available at: https://bioconductor.org/packages/RgnTX.
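The general idea of a permutation test with a z-score and Benjamini-Hochberg correction, as named in this abstract, can be sketched in a few lines. The example below is a toy illustration only and does not use or reproduce the RgnTX API (which is an R package): the null model here places sites uniformly on a single made-up transcript and ignores isoform structure, and every name and number in it is hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def permutation_test(observed: float,
                     randomize: Callable[[random.Random], float],
                     n_perm: int = 1000,
                     seed: int = 0) -> Tuple[float, float]:
    """Return (z_score, empirical_p) for an observed overlap statistic.

    `randomize` draws one value of the statistic under the null model.
    """
    rng = random.Random(seed)
    null = [randomize(rng) for _ in range(n_perm)]
    sd = pstdev(null)
    # The z-score is undefined when the null distribution has no spread.
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    # Add-one correction keeps the empirical p-value strictly positive.
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return z, p


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy null model (an assumption, not RgnTX's): 50 sites placed uniformly
# on a 1000-nt transcript; the statistic counts sites in the first 100 nt.
def random_overlap(rng: random.Random) -> int:
    return sum(rng.randrange(1000) < 100 for _ in range(50))


z_score, p_value = permutation_test(observed=20, randomize=random_overlap)
print(f"z = {z_score:.2f}, empirical p = {p_value:.4f}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In a transcriptome-aware setting, the `randomize` callable is where the null model would need to respect transcript annotation and isoform ambiguity; the uniform placement used here is only a stand-in.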
Affiliation(s)
- Yue Wang
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Zhen Wei
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jionglong Su
  - School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Frans Coenen
  - Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
6
Guo Z, Shafik AM, Jin P, Wu Z, Wu H. Analyzing mRNA Epigenetic Sequencing Data with TRESS. Methods Mol Biol 2023; 2624:163-183. [PMID: 36723816 DOI: 10.1007/978-1-0716-2962-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
RNA epigenetics has emerged as an active topic in the study of gene regulation mechanisms. In this regard, MeRIP-seq technology allows transcriptome-wide profiling of mRNA modifications, in particular m6A. The primary goals of MeRIP-seq data analysis are the identification of m6A-methylated regions under each condition and the comparison of such regions across different biological conditions. Here we describe detailed procedures to guide researchers in MeRIP-seq data analyses by providing step-by-step instructions for the dedicated Bioconductor package TRESS.
Affiliation(s)
- Zhenxing Guo
  - Department of Biostatistics and Bioinformatics, Emory University Rollins School of Public Health, Atlanta, GA, USA
- Andrew M Shafik
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Peng Jin
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Zhijin Wu
  - Department of Biostatistics, Brown University, Providence, RI, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Emory University Rollins School of Public Health, Atlanta, GA, USA
7
Guo Z, Shafik AM, Jin P, Wu H. Differential RNA methylation analysis for MeRIP-seq data under general experimental design. Bioinformatics 2022; 38:4705-4712. [PMID: 36063045 PMCID: PMC9563684 DOI: 10.1093/bioinformatics/btac601] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/15/2021] [Revised: 08/03/2022] [Accepted: 09/02/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION: RNA epigenetics is an emerging field to study the post-transcriptional gene regulation. The dynamics of RNA epigenetic modification have been reported to associate with many human diseases. Recently developed high-throughput technology named Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) enables the transcriptome-wide profiling of N6-methyladenosine (m6A) modification and comparison of RNA epigenetic modifications. There are a few computational methods for the comparison of mRNA modifications under different conditions but they all suffer from serious limitations.
RESULTS: In this work, we develop a novel statistical method to detect differentially methylated mRNA regions from MeRIP-seq data. We model the sequence count data by a hierarchical negative binomial model that accounts for various sources of variations and derive parameter estimation and statistical testing procedures for flexible statistical inferences under general experimental designs. Extensive benchmark evaluations in simulation and real data analyses demonstrate that our method is more accurate, robust and flexible compared to existing methods.
AVAILABILITY AND IMPLEMENTATION: Our method TRESS is implemented as an R/Bioconductor package and is available at https://bioconductor.org/packages/devel/TRESS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhenxing Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Andrew M Shafik
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Peng Jin
  - Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
8
Shafik AM, Allen EG, Jin P. Epitranscriptomic dynamics in brain development and disease. Mol Psychiatry 2022; 27:3633-3646. [PMID: 35474104 PMCID: PMC9596619 DOI: 10.1038/s41380-022-01570-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/17/2021] [Revised: 04/06/2022] [Accepted: 04/07/2022] [Indexed: 02/08/2023]
Abstract
Distinct cell types are generated at specific times during brain development and are regulated by epigenetic, transcriptional, and newly emerging epitranscriptomic mechanisms. RNA modifications are known to affect many aspects of RNA metabolism and have been implicated in the regulation of various biological processes and in disease. Recent studies imply that dysregulation of the epitranscriptome may be significantly associated with neuropsychiatric, neurodevelopmental, and neurodegenerative disorders. Here we review the current knowledge surrounding the role of the RNA modifications N6-methyladenosine, 5-methylcytidine, pseudouridine, A-to-I RNA editing, 2'-O-methylation, and their associated machinery, in brain development and human diseases. We also highlight the need for the development of new technologies in the pursuit of directly mapping RNA modifications through both genome- and single-molecule-level approaches.
Affiliation(s)
- Andrew M Shafik
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- Emily G Allen
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- Peng Jin
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, 30322, USA
9
Ma L, He LN, Kang S, Gu B, Gao S, Zuo Z. Advances in detecting N6-methyladenosine modification in circRNAs. Methods 2022; 205:234-246. [PMID: 35878749 DOI: 10.1016/j.ymeth.2022.07.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2022] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022]
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNAs with covalently closed, single-stranded loop structures derived from back-splicing events of linear precursor mRNAs (pre-mRNAs). N6-methyladenosine (m6A), the most abundant epigenetic modification in eukaryotic RNAs, has been shown to play a crucial role in regulating the fate and biological function of circRNAs, thereby affecting various physiological and pathological processes. Accurate identification of m6A modification in circRNAs is an essential step towards fully elucidating the crosstalk between m6A and circRNAs. In recent years, the rapid development of high-throughput sequencing technology and bioinformatic methodology has propelled the establishment of a multitude of approaches to detect circRNAs and m6A modification, including in vitro-based and in silico methods. Building on this, the research community has begun to develop methods for identifying m6A modification in circRNAs. In this review, we provide a comprehensive evaluation of the existing methods for detecting circRNAs, m6A modification, and especially m6A modification in circRNAs, focusing mainly on those based on high-throughput technologies and bioinformatic methodology. This handy reference can help researchers gauge the direction in which the field is heading.
Affiliation(s)
- Lixia Ma
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Li-Na He
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Shiyang Kang
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Bianli Gu
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Shegan Gao
  - State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Zhixiang Zuo
  - Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
10
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, these approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction.
KEY POINTS:
- The key challenges and solutions related to the big data derived from plant system biology have been highlighted.
- Different methods of data integration have been discussed.
- Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
  - Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada