1
Chan YO, Biová J, Mahmood A, Dietz N, Bilyeu K, Škrabišová M, Joshi T. Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences. Front Genet 2023; 14:1251382. [PMID: 37928239 PMCID: PMC10623549 DOI: 10.3389/fgene.2023.1251382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 09/27/2023] [Indexed: 11/07/2023]
Abstract
The rapid growth of sequencing technology and its increasing popularity in biology-related research have made whole genome re-sequencing (WGRS) data widely available. A large amount of WGRS data can help bridge the knowledge gap between genomics and phenomics by revealing the genomic variations that lead to phenotype changes. These genomic variations usually comprise allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation data, SNP and Indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we developed strategies to better utilize and mine the WGRS data using efficient data processing scripts, libraries, tools, and frameworks, producing an interactive, visualization-enhanced toolset that encompasses both promoter region and copy number variation analysis components. The main capabilities of the GenVarX toolset are easy-to-use interfaces through which users can run queries, visualize data, and interact with the data. Users fill in the input fields on the user interface and submit the information as a query. The data returned on the results page is usually displayed in tabular form. In addition, interactive figures are included to facilitate the visualization of statistical results or tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis. Researchers can access the soybean GenVarX toolset from SoyKB at https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets from the KBCommons web portal at https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.
Affiliation(s)
- Yen On Chan
  - MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, United States
- Jana Biová
  - Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia
- Anser Mahmood
  - Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Nicholas Dietz
  - Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Kristin Bilyeu
  - Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
  - Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Columbia, MO, United States
- Mária Škrabišová
  - Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia
- Trupti Joshi
  - MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, United States
  - Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, United States
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, United States
  - Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri-Columbia, Columbia, MO, United States
2
Xu Z, Cheng S, Qiu X, Wang X, Hu Q, Shi Y, Liu Y, Lin J, Tian J, Peng Y, Jiang Y, Yang Y, Ye J, Wang Y, Meng X, Li Z, Li H, Wang Y. A pipeline for sample tagging of whole genome bisulfite sequencing data using genotypes of whole genome sequencing. BMC Genomics 2023; 24:347. [PMID: 37353738 DOI: 10.1186/s12864-023-09413-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 05/27/2023] [Indexed: 06/25/2023]
Abstract
BACKGROUND In large-scale high-throughput sequencing projects and biobank construction, sample tagging is essential to prevent sample mix-ups. Despite the availability of fingerprint panels for DNA data, little research has been conducted on sample tagging of whole genome bisulfite sequencing (WGBS) data. This study aims to construct a pipeline and identify applicable fingerprint panels to address this problem. RESULTS Using autosome-wide A/T polymorphic single nucleotide variants (SNVs) obtained from whole genome sequencing (WGS) and WGBS of individuals from the Third China National Stroke Registry, we designed a fingerprint panel and constructed an optimized pipeline for tagging WGBS data. This pipeline used Bis-SNP to call genotypes from the WGBS data, and optimized genotype comparison by eliminating wildtype homozygous and missing genotypes and retaining variants with identical genomic coordinates and reference/alternative alleles. WGS-based and WGBS-based genotypes called from identical or different samples were extensively compared using hap.py. In the first batch of 94 samples, the genotype consistency rates with the autosome-wide A/T polymorphic SNV panel were 71.01%-84.23% for matched and 51.43%-60.50% for mismatched WGS and WGBS data. This capability to tag WGBS data was validated in the second batch of 240 samples, with genotype consistency rates of 70.61%-84.65% and 49.58%-61.42% for the matched and mismatched data, respectively. By testing six fingerprint panels whose numbers of variants spanned different orders of magnitude, we also determined that the number of genetic variants required to correctly tag WGBS data was on the order of thousands. Additionally, we affirmed this result with two self-designed panels of 1351 and 1278 SNVs, respectively. Furthermore, this study confirmed that neither the number of genetic variants with identical coordinates and ref/alt alleles nor the number with identical genotypes could correctly tag WGBS data.
CONCLUSION This study proposed an optimized pipeline, applicable fingerprint panels, and a lower boundary for the number of fingerprint genetic variants needed for correct sample tagging of WGBS data, which are valuable for tagging WGBS data and integrating multi-omics data for biobanks.
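The genotype comparison step described in the RESULTS above can be sketched as follows. This is a minimal illustration and not the authors' pipeline, which relies on Bis-SNP and hap.py: the dictionary layout, the function names, and the definition of the consistency rate as the share of retained shared variants with identical genotypes are all assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical layout: key is (chrom, pos, ref, alt), value is a genotype
# string such as "0/1". Real pipelines would read these from VCF files.
Variant = Tuple[str, int, str, str]
Genotypes = Dict[Variant, str]

# Genotypes removed before comparison: wildtype homozygous and missing calls.
UNINFORMATIVE = {"0/0", "0|0", "./.", ".|.", "."}


def normalize(gt: str) -> str:
    """Make a genotype phase- and order-insensitive, e.g. "1|0" -> "0/1"."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def informative(calls: Genotypes) -> Genotypes:
    """Drop wildtype-homozygous and missing genotypes, normalize the rest."""
    return {k: normalize(g) for k, g in calls.items() if g not in UNINFORMATIVE}


def consistency_rate(wgs: Genotypes, wgbs: Genotypes) -> float:
    """Fraction of shared variants (same coordinates and ref/alt alleles)
    whose genotypes agree between the two call sets (assumed definition)."""
    a, b = informative(wgs), informative(wgbs)
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    same = sum(1 for k in shared if a[k] == b[k])
    return same / len(shared)
```

Under this sketch, a WGS/WGBS pair from the same individual would be expected to score a higher rate than a pair from different individuals; the abstract reports a rate, rather than a raw count of matching variants, as the quantity that separates the two cases.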
Affiliation(s)
- Zhe Xu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
  - Center of excellence for Omics Research (CORe), Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Si Cheng
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
  - Center of excellence for Omics Research (CORe), Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - Clinical Center for Precision Medicine in Stroke, Capital Medical University, Beijing, 100069, China
  - Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
- Xin Qiu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
- Xiaoqi Wang
  - BioChain (Beijing) Science and Technology, Inc, Economic and Technological Development Area, 100176, Beijing, P. R. China
- Qiuwen Hu
  - BioChain (Beijing) Science and Technology, Inc, Economic and Technological Development Area, 100176, Beijing, P. R. China
- Yanfeng Shi
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
  - Center of excellence for Omics Research (CORe), Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Yang Liu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
  - Center of excellence for Omics Research (CORe), Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Jinxi Lin
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
- Jichao Tian
  - BioChain (Beijing) Science and Technology, Inc, Economic and Technological Development Area, 100176, Beijing, P. R. China
- Yongfei Peng
  - BioChain (Beijing) Science and Technology, Inc, Economic and Technological Development Area, 100176, Beijing, P. R. China
- Yong Jiang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
- Yadong Yang
  - BioChain (Beijing) Science and Technology, Inc, Economic and Technological Development Area, 100176, Beijing, P. R. China
- Jianwei Ye
  - BioChain (Beijing) Science and Technology, Inc, Economic and Technological Development Area, 100176, Beijing, P. R. China
- Yilong Wang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Xia Meng
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
- Zixiao Li
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
- Hao Li
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
  - Center of excellence for Omics Research (CORe), Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
- Yongjun Wang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
  - Center of excellence for Omics Research (CORe), Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
  - Clinical Center for Precision Medicine in Stroke, Capital Medical University, Beijing, 100069, China
  - Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
3
Chan YO, Dietz N, Zeng S, Wang J, Flint-Garcia S, Salazar-Vidal MN, Škrabišová M, Bilyeu K, Joshi T. The Allele Catalog Tool: a web-based interactive tool for allele discovery and analysis. BMC Genomics 2023; 24:107. [PMID: 36899307 PMCID: PMC10007842 DOI: 10.1186/s12864-023-09161-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/07/2022] [Accepted: 01/31/2023] [Indexed: 03/12/2023]
Abstract
BACKGROUND Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, research utilizing the WGRS data without further configuration is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool that enables researchers to explore the coding region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS The Allele Catalog Tool was originally designed with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline processes raw sequencing reads in parallel to generate Variant Call Format (VCF) files, and the Allele Catalog pipeline takes VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), in which the accessions of the WGRS datasets were collected from various sources and currently represent over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool include data query, visualization of results, categorical filtering, and download functions. Queries are performed from user input, and results are presented in tabular form as summary results by categorical description and as genotype results of the alleles for each gene. The categorical information is specific to each species; additionally, available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession. The results can also be downloaded for other research purposes. CONCLUSIONS The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website (https://soykb.org/SoybeanAlleleCatalogTool/), while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website (https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana). Researchers can use this tool to connect variant alleles of genes with meta-information of species.
Affiliation(s)
- Yen On Chan
  - MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA
- Nicholas Dietz
  - Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, USA
- Shuai Zeng
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA
- Juexin Wang
  - Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA
- Sherry Flint-Garcia
  - United States Department of Agriculture-Agricultural Research Service, Plant Genetics Research Unit, Columbia, MO, USA
- M Nancy Salazar-Vidal
  - Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, USA
  - Department of Evolution and Ecology, University of California-Davis, Davis, CA, USA
- Mária Škrabišová
  - Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czech Republic
- Kristin Bilyeu
  - United States Department of Agriculture-Agricultural Research Service, Plant Genetics Research Unit, Columbia, MO, USA
- Trupti Joshi
  - MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA
  - Department of Health Management and Informatics, University of Missouri-Columbia, Columbia, MO, USA
4
Zhang R, Zhang C, Yu C, Dong J, Hu J. Integration of multi-omics technologies for crop improvement: Status and prospects. Frontiers in Bioinformatics 2022; 2:1027457. [PMID: 36438626 PMCID: PMC9689701 DOI: 10.3389/fbinf.2022.1027457] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/25/2022] [Accepted: 09/28/2022] [Indexed: 08/03/2023]
Abstract
With the rapid development of next-generation sequencing (NGS), multi-omics techniques have been emerging as effective approaches for crop improvement. Here, we focus mainly on the current status and future perspectives of omics-related technologies and bioinformatic resources with potential applications in crop breeding. Using a large amount of omics-level data from the functional genome, transcriptome, proteome, epigenome, metabolome, and microbiome, it will become possible to clarify the interaction between genes and phenotype formation. The integration of multi-omics datasets with pan-omics platforms and systems biology could predict the complex traits of crops and elucidate the regulatory networks for genetic improvement. Trait predictions and decision-making models at different scales will make crop breeding more intelligent. Potential challenges in integrating multi-omics data with studies of gene function and gene networks to efficiently select desirable agronomic traits are discussed, along with proposals for some cutting-edge breeding strategies for crop improvement. Multi-omics-integrated approaches, together with other artificial intelligence techniques, will broaden and deepen our knowledge of crop precision breeding and speed up the breeding process.
5
Yang Y, La TC, Gillman JD, Lyu Z, Joshi T, Usovsky M, Song Q, Scaboo A. Linkage analysis and residual heterozygotes derived near isogenic lines reveals a novel protein quantitative trait loci from a Glycine soja accession. Frontiers in Plant Science 2022; 13:938100. [PMID: 35968122 PMCID: PMC9372550 DOI: 10.3389/fpls.2022.938100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Modern soybean [Glycine max (L.) Merr] cultivars have low overall genetic variation due to repeated bottleneck events that arose during domestication and from selection strategies typical of many soybean breeding programs. In both public and private soybean breeding programs, the introgression of wild soybean (Glycine soja Siebold and Zucc.) alleles is a viable option to increase genetic diversity and identify new sources for traits of value. The objective of our study was to examine the genetic architecture responsible for seed protein and oil using a recombinant inbred line (RIL) population derived from hybridizing a G. max line ('Osage') with a G. soja accession (PI 593983). Linkage mapping identified a total of seven significant quantitative trait loci (QTL) on chromosomes 14 and 20 for seed protein and on chromosome 8 for seed oil, with LOD scores ranging from 5.3 to 31.7 for seed protein content and from 9.8 to 25.9 for seed oil content. We analyzed 3,015 single F4:9 soybean plants to develop two residual heterozygote-derived near-isogenic line (RHD-NIL) populations by targeting nine SNP markers from genotyping-by-sequencing, which corresponded to two novel QTL derived from G. soja: a seed oil QTL on chromosome 8 and a protein QTL on chromosome 14. Single marker analysis and linkage analysis using 50 RHD-NILs validated the chromosome 14 protein QTL, and whole genome sequencing of RHD-NILs allowed us to reduce the QTL interval from ∼16.5 to ∼4.6 Mbp. Based on recombination events, we identified two genomic regions with significant increases of 0.65 and 0.72% in seed protein content without a significant decrease in seed oil content. A new Kompetitive allele-specific polymerase chain reaction (KASP) assay, which will be useful for introgression of this trait into modern elite G. max cultivars, was developed in one region. Within the significantly associated genomic regions, a total of eight genes are considered candidate genes, based on gene annotations associated with protein or amino acid metabolism or movement. Our results provide better insights into utilizing wild soybean as a source of genetic diversity for soybean cultivar improvement using native traits.
Affiliation(s)
- Yia Yang
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Thang C. La
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Jason D. Gillman
  - Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Columbia, MO, United States
- Zhen Lyu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
- Trupti Joshi
  - Department of Health Management and Informatics, MU Institute of Data Science and Informatics and Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
- Mariola Usovsky
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Qijian Song
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD, United States
- Andrew Scaboo
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
6
Wang J, Sidharth S, Zeng S, Jiang Y, Chan YO, Lyu Z, McCubbin T, Mertz R, Sharp RE, Joshi T. Bioinformatics for plant and agricultural discoveries in the age of multiomics: A review and case study of maize nodal root growth under water deficit. Physiologia Plantarum 2022; 174:e13672. [PMID: 35297059 DOI: 10.1111/ppl.13672] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Revised: 03/03/2022] [Accepted: 03/11/2022] [Indexed: 06/14/2023]
Abstract
Advances in next-generation sequencing and other high-throughput technologies have facilitated multiomics research, such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, and phenomics. The resultant emerging multiomics data have brought new challenges as well as opportunities, as seen in the plant and agriculture science domains. We reviewed several bioinformatic and computational methods, models, and platforms, and we have highlighted some of our in-house developed efforts aimed at multiomics data analysis, integration, and management issues faced by the research community. A case study using multiomics datasets generated from our studies of maize nodal root growth under water deficit stress demonstrates the power of these datasets and some other publicly available tools. This analysis also sheds light on the landscape of such applied bioinformatic tools currently available for plant and crop science studies and introduces emerging trends and how they may affect the future.
Affiliation(s)
- Juexin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Sen Sidharth
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
  - Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
  - Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Shuai Zeng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Yuexu Jiang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Yen On Chan
  - MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Zhen Lyu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tyler McCubbin
  - Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
  - Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Rachel Mertz
  - Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
  - Division of Biological Sciences, University of Missouri, Columbia, Missouri, USA
- Robert E Sharp
  - Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
  - Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Trupti Joshi
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
  - Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
  - Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
  - MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
  - Department of Health Management and Informatics, University of Missouri, Columbia, Missouri, USA
7
Krassowski M, Das V, Sahu SK, Misra BB. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front Genet 2020; 11:610798. [PMID: 33362867 PMCID: PMC7758509 DOI: 10.3389/fgene.2020.610798] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 09/27/2020] [Accepted: 11/20/2020] [Indexed: 12/24/2022]
Abstract
Multi-omics, variously called integrated omics, pan-omics, and trans-omics, aims to combine two or more omics data sets to aid in data analysis, visualization and interpretation to determine the mechanism of a biological process. Multi-omics efforts have taken center stage in biomedical research leading to the development of new insights into biological events and processes. However, the mushrooming of a myriad of tools, datasets, and approaches tends to inundate the literature and overwhelm researchers new to the field. The aims of this review are to provide an overview of the current state of the field, inform on available reliable resources, discuss the application of statistics and machine/deep learning in multi-omics analyses, discuss findable, accessible, interoperable, reusable (FAIR) research, and point to best practices in benchmarking. Thus, we provide guidance to interested users of the domain by addressing challenges of the underlying biology, giving an overview of the available toolset, addressing common pitfalls, and acknowledging current methods' limitations. We conclude with practical advice and recommendations on software engineering and reproducibility practices to share a comprehensive awareness with new researchers in multi-omics for end-to-end workflow.
Affiliation(s)
- Michal Krassowski
  - Nuffield Department of Women’s & Reproductive Health, University of Oxford, Oxford, United Kingdom
- Vivek Das
  - Novo Nordisk Research Center Seattle, Inc, Seattle, WA, United States
8
Jamil IN, Remali J, Azizan KA, Nor Muhammad NA, Arita M, Goh HH, Aizat WM. Systematic Multi-Omics Integration (MOI) Approach in Plant Systems Biology. Frontiers in Plant Science 2020; 11:944. [PMID: 32754171 PMCID: PMC7371031 DOI: 10.3389/fpls.2020.00944] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 03/05/2020] [Accepted: 06/10/2020] [Indexed: 05/03/2023]
Abstract
Across all facets of biology, the rapid progress in high-throughput data generation has enabled us to perform multi-omics systems biology research. Transcriptomics, proteomics, and metabolomics data can independently answer targeted biological questions regarding the expression of transcripts, proteins, and metabolites, but a systematic multi-omics integration (MOI) can comprehensively assimilate, annotate, and model these large data sets. Previous MOI studies and reviews have detailed its usage and practicality in various organisms, including humans, animals, microbes, and plants. Plants are especially challenging due to their large, poorly annotated genomes, multiple organelles, and diverse secondary metabolites. Hence, constructive and methodological guidelines on how to perform MOI for plants are needed, particularly for researchers newly embarking on this topic. In this review, we thoroughly classify multi-omics studies on plants and verify workflows to ensure successful omics integration with accurate data representation. We also propose three levels of MOI, namely element-based (level 1), pathway-based (level 2), and mathematical-based integration (level 3). These MOI levels are described in relation to recent publications and tools to highlight their practicality and function. The drawbacks and limitations of these MOI levels are also discussed for future improvement toward more amenable strategies in plant systems biology.
Affiliation(s)
- Ili Nadhirah Jamil
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Juwairiah Remali
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Kamalrul Azlan Azizan
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Nor Azlan Nor Muhammad
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Masanori Arita
  - Bioinformation & DDBJ Center, National Institute of Genetics (NIG), Mishima, Japan
  - Metabolome Informatics Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Hoe-Han Goh
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Wan Mohd Aizat
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
9
Naba A, Ricard-Blum S. The Extracellular Matrix Goes -Omics: Resources and Tools. Extracellular Matrix Omics 2020. [DOI: 10.1007/978-3-030-58330-9_1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]