1
Arifuzzaman M, Mamidi S, Sanz-Saez A, Zakeri H, Scaboo A, Fritschi FB. Identification of loci associated with water use efficiency and symbiotic nitrogen fixation in soybean. Front Plant Sci 2023; 14:1271849. [PMID: 38034552 PMCID: PMC10687445 DOI: 10.3389/fpls.2023.1271849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 10/20/2023] [Indexed: 12/02/2023]
Abstract
Soybean (Glycine max) production is greatly affected by persistent and/or intermittent droughts in rainfed soybean-growing regions worldwide. Symbiotic N2 fixation (SNF) in soybean can also be significantly hampered even under moderate drought stress. The objective of this study was to identify genomic regions associated with shoot carbon isotope ratio (δ13C) as a surrogate measure for water use efficiency (WUE), nitrogen isotope ratio (δ15N) to assess relative SNF, N concentration ([N]), and carbon/nitrogen ratio (C/N). Genome-wide association mapping was performed with 105 genotypes and approximately 4 million single-nucleotide polymorphism markers derived from whole-genome resequencing information. A total of 11, 21, 22, and 22 genomic loci associated with δ13C, δ15N, [N], and C/N, respectively, were identified in two environments. Nine of these 76 loci were stable across environments, as they were detected in both environments. In addition to the 62 novel loci identified, 14 loci aligned with previously reported quantitative trait loci for different C and N traits related to drought, WUE, and N2 fixation in soybean. A total of 58 Glyma gene models encoding for different genes related to the four traits were identified in the vicinity of the genomic loci.
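Isotope traits like these are conventionally reported in delta notation relative to a reference standard (VPDB for carbon, atmospheric N2 for nitrogen). The minimal Python sketch below shows that conversion and the C/N ratio. The function names and the reference ratio used in the example are illustrative assumptions, not values from the study. In C3 species such as soybean, a less negative shoot δ13C is generally read as higher water use efficiency, and shoot δ15N closer to that of atmospheric N2 (0 ‰ by definition) is generally read as greater reliance on SNF.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Delta notation in per mil: (R_sample / R_standard - 1) * 1000.

    R is the heavy/light isotope ratio (e.g. 13C/12C or 15N/14N).
    """
    if r_standard <= 0:
        raise ValueError("reference ratio must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


def c_to_n_ratio(c_conc: float, n_conc: float) -> float:
    """C/N ratio from tissue C and N concentrations given in the same units."""
    if n_conc <= 0:
        raise ValueError("N concentration must be positive")
    return c_conc / n_conc


# Hypothetical numbers for illustration only (not data from the study).
r_std = 0.0112                 # hypothetical reference 13C/12C ratio
r_shoot = r_std * 0.972        # a sample 2.8 % depleted in 13C relative to the standard
print(round(delta_permil(r_shoot, r_std), 3))  # -28.0
print(c_to_n_ratio(44.0, 4.0))                 # 11.0
```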
Affiliation(s)
- Muhammad Arifuzzaman
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Sujan Mamidi
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
- Alvaro Sanz-Saez
- Department of Crop, Soil and Environmental Sciences, Auburn University, Auburn, AL, United States
- Hossein Zakeri
- College of Agriculture, California State University-Chico, Chico, CA, United States
- Andrew Scaboo
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Felix B. Fritschi
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
2
Chan YO, Biová J, Mahmood A, Dietz N, Bilyeu K, Škrabišová M, Joshi T. Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences. Front Genet 2023; 14:1251382. [PMID: 37928239 PMCID: PMC10623549 DOI: 10.3389/fgene.2023.1251382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 09/27/2023] [Indexed: 11/07/2023]
Abstract
The rapid growth of sequencing technology and its increasing use in biology-related research over the years have made whole genome re-sequencing (WGRS) data widely available. Large amounts of WGRS data can help close the knowledge gap between genomics and phenomics by revealing the genomic variations that can lead to phenotypic changes. These genomic variations usually consist of allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation data, SNP and Indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we developed strategies to better use and mine the WGRS data with efficient data processing scripts, libraries, tools, and frameworks, producing an interactive, visualization-enhanced toolset that includes both promoter region and copy number variation analysis components. The main capability of GenVarX is to provide easy-to-use interfaces for users to perform queries, visualize data, and interact with the data. Users fill in the fields of the input windows on the user interface and submit them as a query. The results page usually displays the returned data in tabular form, and interactive figures are also included to facilitate visualization of statistical results and tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis.
Researchers can access the soybean GenVarX toolset through SoyKB at https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets through the KBCommons web portal at https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.
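A promoter-region analysis like the one described above relies on pulling out the sequence upstream of each gene and scanning it for transcription factor binding sites. The Python sketch below shows one strand-aware way to do that. The 2,000 bp default window, the 0-based coordinate convention, the helper names, and the example motif (the plant G-box core CACGTG) are assumptions for illustration and are not taken from GenVarX.

```python
import re
from typing import List

# Complement table for uppercase DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_seq(chrom_seq: str, tss: int, strand: str, upstream: int = 2000) -> str:
    """Return the region upstream of a TSS, read 5'->3' on the gene's strand.

    Assumes 0-based coordinates where `tss` is the first transcribed base.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - upstream):tss]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        end = min(len(chrom_seq), tss + 1 + upstream)
        return revcomp(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based offsets of all exact (possibly overlapping) motif matches."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


chrom = "TTCACGTGAAATGCC"  # toy sequence, not real genome data
prom = promoter_seq(chrom, tss=10, strand="+", upstream=10)
print(prom, find_motif(prom, "CACGTG"))  # TTCACGTGAA [2]
```

The lookahead pattern lets overlapping matches be reported, which matters for repetitive or palindromic motifs.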
Affiliation(s)
- Yen On Chan
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, United States
- Jana Biová
- Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia
- Anser Mahmood
- Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Nicholas Dietz
- Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Kristin Bilyeu
- Division of Plant Science and Technology, University of Missouri-Columbia, Columbia, MO, United States
- Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Columbia, MO, United States
- Mária Škrabišová
- Department of Biochemistry, Faculty of Science, Palacky University in Olomouc, Olomouc, Czechia
- Trupti Joshi
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, United States
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, United States
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, United States
- Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri-Columbia, Columbia, MO, United States
3
Zhang Y, Calyam P, Joshi T, Nair S, Xu D. Domain-specific Topic Model for Knowledge Discovery in Computational and Data-Intensive Scientific Communities. IEEE Trans Knowl Data Eng 2023; 35:1402-1420. [PMID: 36798878 PMCID: PMC9928187 DOI: 10.1109/tkde.2021.3093350] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/18/2023]
Abstract
Shortening the time to knowledge discovery and adapting prior domain knowledge are challenges for computational and data-intensive communities such as bioinformatics and neuroscience. For a domain scientist, the challenge lies in obtaining guidance by querying massive amounts of information from a diverse text corpus covering a wide-ranging set of topics when investigating new methods, developing new tools, or integrating datasets. In this paper, we propose a novel "domain-specific topic model" (DSTM) to discover latent knowledge patterns about relationships among research topics, tools and datasets from exemplary scientific domains. Our DSTM is a generative model that extends the Latent Dirichlet Allocation (LDA) model and uses the Markov chain Monte Carlo (MCMC) algorithm to infer latent patterns within a specific domain in an unsupervised manner. We apply our DSTM to large collections of data from the bioinformatics and neuroscience domains, comprising more than 25,000 papers from the last ten years and featuring hundreds of tools and datasets commonly used in relevant studies. Evaluation experiments based on generalization and information retrieval metrics show that our model outperforms state-of-the-art baseline models in discovering highly specific latent topics within a domain. Lastly, we demonstrate applications that benefit from our DSTM to discover intra-domain, cross-domain and trend knowledge patterns.
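DSTM is described as an extension of LDA that is inferred with MCMC. The Python sketch below covers only the standard LDA baseline, fitted with a collapsed Gibbs sampler, which is one common MCMC scheme for LDA. It does not model the tool and dataset relationships that DSTM adds, and the function name and hyperparameter defaults are assumptions.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, n_iter: int = 200,
              seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Collapsed Gibbs sampling for plain LDA (not the DSTM extension).

    docs: documents as lists of integer word ids in [0, V).
    Returns (doc-topic counts, topic-word counts) after the final sweep.
    """
    rng = random.Random(seed)
    vocab = 1 + max(w for doc in docs for w in doc)
    ndk = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    nkw = [[0] * vocab for _ in range(n_topics)]  # tokens per (topic, word)
    nk = [0] * n_topics                           # tokens per topic
    z: List[List[int]] = []                       # topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


# Toy corpus: words 0-2 and 3-5 form two separable vocabularies.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 4, 3], [0, 2, 1, 2], [5, 3, 4, 5]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_iter=100)
for k, row in enumerate(topic_word):
    print("topic", k, row)
```

Normalizing the returned counts (adding alpha or beta) gives point estimates of the document-topic and topic-word distributions.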
Affiliation(s)
- Yuanxun Zhang
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, 65211
- Prasad Calyam
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, 65211
- Trupti Joshi
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, 65211
- Satish Nair
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, 65211
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, 65211
4
Yang Y, La TC, Gillman JD, Lyu Z, Joshi T, Usovsky M, Song Q, Scaboo A. Linkage analysis and residual heterozygotes derived near isogenic lines reveals a novel protein quantitative trait loci from a Glycine soja accession. Front Plant Sci 2022; 13:938100. [PMID: 35968122 PMCID: PMC9372550 DOI: 10.3389/fpls.2022.938100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Modern soybean [Glycine max (L.) Merr] cultivars have low overall genetic variation due to repeated bottleneck events that arose during domestication and from selection strategies typical of many soybean breeding programs. In both public and private soybean breeding programs, the introgression of wild soybean (Glycine soja Siebold and Zucc.) alleles is a viable option to increase genetic diversity and identify new sources for traits of value. The objectives of our study were to examine the genetic architecture responsible for seed protein and oil using a recombinant inbred line (RIL) population derived from hybridizing a G. max line ('Osage') with a G. soja accession (PI 593983). Linkage mapping identified a total of seven significant quantitative trait loci on chromosomes 14 and 20 for seed protein and on chromosome 8 for seed oil, with LOD scores ranging from 5.3 to 31.7 for seed protein content and from 9.8 to 25.9 for seed oil content. We analyzed 3,015 single F4:9 soybean plants to develop two residual heterozygote-derived near-isogenic line (RHD-NIL) populations by targeting nine SNP markers from genotyping-by-sequencing, which corresponded to two novel quantitative trait loci (QTL) derived from G. soja: a seed oil QTL on chromosome 8 and a protein QTL on chromosome 14. Single marker analysis and linkage analysis using 50 RHD-NILs validated the chromosome 14 protein QTL, and whole genome sequencing of RHD-NILs allowed us to reduce the QTL interval from ∼16.5 to ∼4.6 Mbp. We identified two genomic regions, based on recombination events, that showed significant increases of 0.65 and 0.72% in seed protein content without a significant decrease in seed oil content. A new Kompetitive allele-specific polymerase chain reaction (KASP) assay, which will be useful for introgression of this trait into modern elite G. max cultivars, was developed for one region.
Within the significantly associated genomic regions, eight genes are considered candidate genes based on gene annotations associated with protein or amino acid metabolism/movement. Our results provide better insights into using wild soybean as a source of genetic diversity for improving soybean cultivars with native traits.
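The LOD scores reported above measure how much better a model with a QTL explains seed composition than a model without one. For a single marker with normally distributed residuals, the LOD can be computed from residual sums of squares as (n/2)·log10(RSS0/RSS1), where RSS0 comes from the null (grand-mean) model and RSS1 from fitting one mean per marker genotype class. The Python sketch below illustrates only that single-marker case on made-up numbers; it is not the interval-mapping software used in the study, and the function name and data are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence


def single_marker_lod(genotypes: Sequence[str], phenotypes: Sequence[float]) -> float:
    """LOD for one marker: (n / 2) * log10(RSS0 / RSS1).

    RSS0: residual sum of squares around the grand mean (no-QTL model).
    RSS1: residual sum of squares around each genotype-class mean.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have equal length")
    n = len(phenotypes)
    grand = fmean(phenotypes)
    rss0 = sum((y - grand) ** 2 for y in phenotypes)

    groups: Dict[str, List[float]] = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = fmean(ys)
        rss1 += sum((y - m) ** 2 for y in ys)

    if rss1 == 0:
        return math.inf  # marker classes explain all variation
    return (n / 2) * math.log10(rss0 / rss1)


# Hypothetical RIL data: marker allele ("A" or "B") and seed protein (%).
markers = ["A", "A", "A", "B", "B", "B"]
protein = [40.0, 41.0, 42.0, 43.0, 44.0, 45.0]
print(round(single_marker_lod(markers, protein), 3))  # 1.923
```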
Affiliation(s)
- Yia Yang
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Thang C. La
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Jason D. Gillman
- Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Columbia, MO, United States
- Zhen Lyu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
- Trupti Joshi
- Department of Health Management and Informatics, MU Institute of Data Science and Informatics and Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
- Mariola Usovsky
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Qijian Song
- Soybean Genomics and Improvement Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD, United States
- Andrew Scaboo
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
5
Škrabišová M, Dietz N, Zeng S, Chan YO, Wang J, Liu Y, Biová J, Joshi T, Bilyeu KD. A novel Synthetic phenotype association study approach reveals the landscape of association for genomic variants and phenotypes. J Adv Res 2022; 42:117-133. [PMID: 36513408 PMCID: PMC9788956 DOI: 10.1016/j.jare.2022.04.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/04/2021] [Revised: 02/14/2022] [Accepted: 04/08/2022] [Indexed: 12/27/2022]
Abstract
INTRODUCTION Genome-Wide Association Studies (GWAS) identify tagging variants in the genome that are statistically associated with the phenotype because of their linkage disequilibrium (LD) relationship with the causative mutation (CM). When both low-density genotyped accession panels with phenotypes and resequenced data accession panels are available, tagging variants can assist with post-GWAS challenges in CM discovery. OBJECTIVES Our objective was to identify additional GWAS evaluation criteria to assess correspondence between genomic variants and phenotypes, as well as enable deeper analysis of the localized landscape of association. METHODS We used genomic variant positions as Synthetic phenotypes in GWAS, an approach we named "Synthetic phenotype association study" (SPAS). The extreme case of SPAS is what we call an "Inverse GWAS", in which we used CM positions of cloned soybean genes. We developed and validated the Accuracy concept as a measure of the correspondence between variant positions and phenotypes. RESULTS The SPAS approach demonstrated that the genotype status of an associated variant used as a Synthetic phenotype enabled us to explore the relationships between tagging variants and CMs, and further, that utilizing CMs as Synthetic phenotypes in Inverse GWAS illuminated the landscape of association. We implemented the Accuracy calculation for a curated accession panel in an online Accuracy calculation tool (AccuTool) as a resource for gene identification in soybean. We demonstrated our concepts on three examples of soybean cloned genes. As a result of our findings, we devised an enhanced "GWAS to Genes" analysis (Synthetic phenotype to CM strategy, SP2CM). Using SP2CM, we identified a CM for a novel gene. CONCLUSION The SP2CM strategy utilizing Synthetic phenotypes and the Accuracy calculation of correspondence provides crucial information to assist researchers in CM discovery.
The impact of this work is a more effective evaluation of landscapes of GWAS associations.
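One way to picture the Accuracy concept is as an agreement rate between a variant's genotype states and the phenotype classes across a panel. The Python sketch below assumes that reading: Accuracy is the percentage of genotyped accessions whose allele at a position agrees with their phenotype class. The exact definition used in AccuTool (for example, how heterozygous or missing calls are treated) may differ, and the positions and accessions are hypothetical. Scoring every variant in a region this way gives one view of the landscape of association; in the idealized case of a single causative mutation and a clean qualitative phenotype, that mutation would be expected to score near 100%.

```python
from typing import Dict, Optional


def accuracy(genotypes: Dict[str, Optional[str]], phenotypes: Dict[str, str],
             alt_allele: str, mutant_class: str) -> Optional[float]:
    """Percent of accessions whose genotype class agrees with their phenotype class.

    Assumed definition (illustrative): an accession agrees if it carries
    `alt_allele` and is in `mutant_class`, or carries another allele and is not.
    Accessions lacking a genotype call (None or absent) are skipped.
    """
    agree = total = 0
    for acc, pheno in phenotypes.items():
        call = genotypes.get(acc)
        if call is None:
            continue
        total += 1
        if (call == alt_allele) == (pheno == mutant_class):
            agree += 1
    return 100.0 * agree / total if total else None


# Hypothetical panel and variant positions, for illustration only.
phenotypes = {"acc1": "mutant", "acc2": "mutant", "acc3": "wild", "acc4": "wild"}
variants = {
    "chr1:1000": ({"acc1": "T", "acc2": "T", "acc3": "C", "acc4": "C"}, "T"),
    "chr1:2500": ({"acc1": "G", "acc2": "A", "acc3": "A", "acc4": None}, "G"),
}
for pos, (calls, alt) in variants.items():
    print(pos, round(accuracy(calls, phenotypes, alt, "mutant"), 1))
# chr1:1000 100.0
# chr1:2500 66.7
```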
Affiliation(s)
- Mária Škrabišová
- Department of Biochemistry, Faculty of Science, Palacky University Olomouc, Olomouc 78371, Czech Republic
- Nicholas Dietz
- Division of Plant Sciences, University of Missouri, Columbia, MO 65201, USA
- Shuai Zeng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65212, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- Yen On Chan
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA
- Juexin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65212, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- Yang Liu
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA
- Jana Biová
- Department of Biochemistry, Faculty of Science, Palacky University Olomouc, Olomouc 78371, Czech Republic
- Trupti Joshi
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65212, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65212, USA
- MU Data Science and Informatics Institute, University of Missouri, Columbia, MO 65212, USA
- Department of Health Management and Informatics, School of Medicine, University of Missouri, Columbia, MO 65212, USA
- Corresponding authors at: Department of Health Management and Informatics, School of Medicine, 1201 E Rollins St, 271B Life Science Center, Columbia, MO 65201, USA (T. Joshi). Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, 110 Waters Hall, University of Missouri, Columbia, MO 65211, USA (K.D. Bilyeu).
- Kristin D. Bilyeu
- Plant Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, University of Missouri, Columbia, MO 65211, USA
6
Wang J, Sidharth S, Zeng S, Jiang Y, Chan YO, Lyu Z, McCubbin T, Mertz R, Sharp RE, Joshi T. Bioinformatics for plant and agricultural discoveries in the age of multiomics: A review and case study of maize nodal root growth under water deficit. Physiol Plant 2022; 174:e13672. [PMID: 35297059 DOI: 10.1111/ppl.13672] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Revised: 03/03/2022] [Accepted: 03/11/2022] [Indexed: 06/14/2023]
Abstract
Advances in next-generation sequencing and other high-throughput technologies have facilitated multiomics research, such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, and phenomics. The resulting multiomics data have brought new challenges as well as opportunities, as seen in the plant and agriculture science domains. We reviewed several bioinformatic and computational methods, models, and platforms, and highlighted some of our in-house efforts addressing the multiomics data analysis, integration, and management issues faced by the research community. A case study using multiomics datasets generated from our studies of maize nodal root growth under water deficit stress demonstrates the power of these datasets and some other publicly available tools. This analysis also sheds light on the landscape of applied bioinformatic tools currently available for plant and crop science studies and introduces emerging trends and how they may affect the future.
Affiliation(s)
- Juexin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Sen Sidharth
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
- Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Shuai Zeng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Yuexu Jiang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Yen On Chan
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Zhen Lyu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tyler McCubbin
- Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
- Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Rachel Mertz
- Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Division of Biological Sciences, University of Missouri, Columbia, Missouri, USA
- Robert E Sharp
- Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
- Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- Trupti Joshi
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Division of Plant Science and Technology, University of Missouri, Columbia, Missouri, USA
- Interdisciplinary Plant Group, University of Missouri, Columbia, Missouri, USA
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Department of Health Management and Informatics, University of Missouri, Columbia, Missouri, USA
7
Pavlovikj N, Gomes-Neto JC, Deogun JS, Benson AK. ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ 2021; 9:e11376. [PMID: 34055480 PMCID: PMC8142932 DOI: 10.7717/peerj.11376] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Accepted: 04/08/2021] [Indexed: 12/28/2022]
Abstract
Whole Genome Sequence (WGS) data from bacterial species are used for a variety of applications, including basic microbiological research, diagnostics, and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species offers a tremendous opportunity for discovery and hypothesis-generating research into the ecology and evolution of these microorganisms. However, the limited flexibility, scalability, and user-friendliness of existing pipelines for population-scale inquiry restrict the application of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; (3) Use of high-performance and high-throughput computational platforms; (4) Generation of hierarchical-based population structure analysis based on combinations of multi-locus and Bayesian statistical approaches for classification for ecological and epidemiological inquiries; (5) Association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically-related genotypic classifications; and (6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis such as identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes).
Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3-26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates addition or removal of programs from the workflow or modification of options within them. To demonstrate the versatility of the ProkEvo platform, we performed hierarchical-based population structure analyses of available genomes of three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo is a practical, viable option for scalable, automated analyses of bacterial populations with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.
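Workflow managers such as Pegasus represent a pipeline as a directed acyclic graph of jobs, which is what makes adding or removing programs straightforward. The Python sketch below illustrates that idea with the standard library's `graphlib` (Python 3.9+) rather than Pegasus. The step names loosely follow the stages listed in the abstract and are hypothetical, not ProkEvo's actual job names.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical stages: each step maps to the set of steps it depends on.
STEPS: Dict[str, Set[str]] = {
    "trim_reads": set(),
    "assemble": {"trim_reads"},
    "assembly_qc": {"assemble"},
    "annotate": {"assembly_qc"},
    "mlst": {"assembly_qc"},
    "amr_virulence_plasmids": {"assembly_qc"},
    "pan_genome": {"annotate"},
    "bayesian_population_structure": {"pan_genome"},
    "summarize": {"mlst", "amr_virulence_plasmids", "bayesian_population_structure"},
}


def run_order(graph: Dict[str, Set[str]]) -> List[str]:
    """A valid execution order: every step appears after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def remove_step(graph: Dict[str, Set[str]], step: str) -> Dict[str, Set[str]]:
    """Drop a step, wiring its dependents to the step's own dependencies."""
    inherited = graph[step]
    return {
        name: (deps - {step}) | (inherited if step in deps else set())
        for name, deps in graph.items()
        if name != step
    }


print(run_order(STEPS))
print(run_order(remove_step(STEPS, "assembly_qc")))
```

Independent branches (for example `mlst` and `annotate`) have no ordering constraint between them; `TopologicalSorter` also offers `prepare()`, `get_ready()` and `done()` for dispatching such branches in parallel, which is the kind of scheduling a WMS performs across a cluster.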
Affiliation(s)
- Natasha Pavlovikj
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Joao Carlos Gomes-Neto
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Andrew K Benson
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
8
Valliyodan B, Brown AV, Wang J, Patil G, Liu Y, Otyama PI, Nelson RT, Vuong T, Song Q, Musket TA, Wagner R, Marri P, Reddy S, Sessions A, Wu X, Grant D, Bayer PE, Roorkiwal M, Varshney RK, Liu X, Edwards D, Xu D, Joshi T, Cannon SB, Nguyen HT. Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing. Sci Data 2021; 8:50. [PMID: 33558550 PMCID: PMC7870887 DOI: 10.1038/s41597-021-00834-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 04/09/2020] [Accepted: 01/06/2021] [Indexed: 12/28/2022]
Abstract
We report characteristics of soybean genetic diversity and structure from the resequencing of 481 diverse soybean accessions, comprising 52 wild (Glycine soja) selections and 429 cultivated (Glycine max) varieties (landraces and elites). These data were used to identify 7.8 million SNPs, to predict SNP effects relative to genic regions, and to infer genetic structure, relationships, and linkage disequilibrium. We found evidence of distinct, mostly independent selection of lineages by particular geographic location. Among cultivated varieties, we identified numerous highly conserved regions, suggesting selection during domestication. Comparisons of these accessions against the whole U.S. germplasm collection genotyped with the SoySNP50K iSelect BeadChip revealed that over 95% of the re-sequenced accessions have a high similarity to their SoySNP50K counterparts. Probable errors in seed source or genotype tracking were also identified in approximately 5% of the accessions.
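A comparison with SoySNP50K like the one above amounts to checking, for each accession, how often the re-sequencing genotype agrees with the array genotype at SNPs assayed by both. The Python sketch below shows one such concordance check. The accession IDs, SNP IDs, and the 0.9 threshold are hypothetical, and the similarity measure actually used in the study may differ.

```python
from typing import Dict, List, Optional, Tuple

Calls = Dict[str, Optional[str]]  # SNP id -> genotype such as "AG", or None if missing


def _norm(gt: str) -> str:
    """Treat unordered genotypes as equal, e.g. 'GA' == 'AG'."""
    return "".join(sorted(gt))


def concordance(reseq: Calls, chip: Calls) -> Optional[float]:
    """Fraction of SNPs called in both sources that have identical genotypes."""
    shared = [s for s, gt in reseq.items()
              if gt is not None and chip.get(s) is not None]
    if not shared:
        return None
    same = sum(_norm(reseq[s]) == _norm(chip[s]) for s in shared)
    return same / len(shared)


def flag_discordant(panel: Dict[str, Tuple[Calls, Calls]],
                    threshold: float = 0.9) -> List[str]:
    """Accessions whose concordance falls below a (hypothetical) threshold."""
    flagged = []
    for acc, (reseq, chip) in panel.items():
        c = concordance(reseq, chip)
        if c is not None and c < threshold:
            flagged.append(acc)
    return flagged


# Hypothetical accessions: (re-sequencing calls, array calls).
panel = {
    "PI_A": ({"s1": "AA", "s2": "CC", "s3": "AG"},
             {"s1": "AA", "s2": "CC", "s3": "GA"}),
    "PI_B": ({"s1": "AA", "s2": "TT", "s3": "GG", "s4": "CC"},
             {"s1": "GG", "s2": "CC", "s3": "GG", "s4": None}),
}
print(flag_discordant(panel))  # ['PI_B']
```

Flagged accessions are candidates for the kind of seed-source or tracking errors described in the abstract, not confirmed errors.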
Affiliation(s)
- Babu Valliyodan
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
- Department of Agriculture and Environmental Sciences, Lincoln University, Jefferson City, MO, 65101, USA
- Anne V Brown
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA, 50011, USA
- Juexin Wang
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Gunvant Patil
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
- Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, TX, 79409, USA
- Yang Liu
- MU Institute of Data Science and Informatics, University of Missouri, Columbia, MO, 65211, USA
- Paul I Otyama
- Department of Agronomy, Iowa State University, Ames, IA, 50011, USA
- Rex T Nelson
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA, 50011, USA
- Tri Vuong
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
- Qijian Song
- USDA-ARS, Soybean Genomics and Improvement Lab, Beltsville, MD, 20705, USA
- Theresa A Musket
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
- Ruth Wagner
- Bayer CropScience, St. Louis, MO, 63141, USA
- Pradeep Marri
- Corteva Agriscience, Indianapolis, IN, 46268, USA
- Pairwise Plants LLC, Durham, NC, 27709, USA
- Sam Reddy
- Corteva Agriscience, Indianapolis, IN, 46268, USA
- Allen Sessions
- Bayer CropScience, Research Triangle Park, NC, 27709, USA
- Xiaolei Wu
- Bayer CropScience, Research Triangle Park, NC, 27709, USA
- David Grant
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA, 50011, USA
- Department of Agronomy, Iowa State University, Ames, IA, 50011, USA
- Philipp E Bayer
- School of Biological Sciences, The University of Western Australia, Perth, WA, 6009, Australia
- Manish Roorkiwal
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, Telangana, 502324, India
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, Telangana, 502324, India
- Xin Liu
- Beijing Genomics Institute-Shenzhen, Shenzhen, 518083, China
- State Key Laboratory of Agricultural Genomics, China National GeneBank, BGI-Shenzhen, Shenzhen, 518083, China
- David Edwards
- School of Biological Sciences, The University of Western Australia, Perth, WA, 6009, Australia
- Dong Xu
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- MU Institute of Data Science and Informatics, University of Missouri, Columbia, MO, 65211, USA
- Trupti Joshi
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- MU Institute of Data Science and Informatics, University of Missouri, Columbia, MO, 65211, USA
- Department of Health Management and Informatics, University of Missouri, Columbia, MO, 65211, USA
- Steven B Cannon
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA, 50011, USA
- Henry T Nguyen
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO, 65211, USA
9
Zeng S, Lyu Z, Narisetti SRK, Xu D, Joshi T. Knowledge Base Commons (KBCommons) v1.1: a universal framework for multi-omics data integration and biological discoveries. BMC Genomics 2019; 20:947. [PMID: 31856718 PMCID: PMC6923931 DOI: 10.1186/s12864-019-6287-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/15/2023]
Abstract
BACKGROUND: Knowledge Base Commons (KBCommons) v1.1 is a universal, all-inclusive web-based framework that provides generic functionality for storing, sharing, analyzing, exploring, integrating and visualizing the genomic and integrative omics data of multiple organisms. KBCommons is designed to integrate diverse multi-level omics data and to support biological discoveries for all species through a common platform. METHODS: KBCommons has four modules: data storage, data processing, data access, and a web interface for data management and retrieval. It provides a comprehensive framework for creating new plant-, animal-, virus-, bacteria- or human disease-specific knowledge bases (KBs), for adding new genome versions and additional multi-omics data to existing KBs, and for exploring existing datasets within current KBs. RESULTS: KBCommons offers an array of data visualization and analytics tools, such as multiple gene/metabolite search, gene family/Pfam/Panther function annotation search, miRNA/metabolite/trait/SNP search, differential gene expression analysis, and bulk data download. It includes a highly reliable data privilege management system that lets users easily make their data public and securely share private or pre-publication data with members of their collaborative groups. Users can run data analyses with in-house workflow functionality linked to XSEDE high-performance computing resources, and can retrieve genomic data, multi-omics data and workflow results through the KBCommons web interface according to their requirements and interests. CONCLUSIONS: KBCommons addresses the need of many diverse research communities for a comprehensive multi-level omics web resource for data retrieval, sharing, analysis and visualization. KBCommons is publicly accessible for all organisms at http://kbcommons.org/.
Affiliation(s)
- Shuai Zeng
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO USA
- Zhen Lyu
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO USA
- Siva Ratna Kumari Narisetti
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO USA
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO USA
- Trupti Joshi
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO USA
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO USA
- Department of Health Management and Informatics, University of Missouri-Columbia, Columbia, MO USA
10
Tran TM, McCubbin TJ, Bihmidine S, Julius BT, Baker RF, Schauflinger M, Weil C, Springer N, Chomet P, Wagner R, Woessner J, Grote K, Peevers J, Slewinski TL, Braun DM. Maize Carbohydrate Partitioning Defective33 Encodes an MCTP Protein and Functions in Sucrose Export from Leaves. MOLECULAR PLANT 2019; 12:1278-1293. [PMID: 31102785 DOI: 10.1016/j.molp.2019.05.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/10/2018] [Revised: 04/09/2019] [Accepted: 05/03/2019] [Indexed: 05/29/2023]
Abstract
To sustain plant growth, development, and crop yield, sucrose must be transported from leaves to distant parts of the plant, such as seeds and roots. To identify genes that regulate sucrose accumulation and transport in maize (Zea mays), we isolated carbohydrate partitioning defective33 (cpd33), a recessive mutant that accumulated excess starch and soluble sugars in mature leaves. The cpd33 mutants also exhibited chlorosis in the leaf blades, greatly diminished plant growth, and reduced fertility. Cpd33 encodes a protein containing multiple C2 domains and transmembrane regions. Subcellular localization experiments showed the CPD33 protein localized to plasmodesmata (PD), the plasma membrane, and the endoplasmic reticulum. We also found that a loss-of-function mutant of the CPD33 homolog in Arabidopsis, QUIRKY, had a similar carbohydrate hyperaccumulation phenotype. Radioactively labeled sucrose transport assays showed that sucrose export was significantly lower in cpd33 mutant leaves relative to wild-type leaves. However, PD transport in the adaxial-abaxial direction was unaffected in cpd33 mutant leaves. Intriguingly, transmission electron microscopy revealed fewer PD at the companion cell-sieve element interface in mutant phloem tissue, providing a possible explanation for the reduced sucrose export in mutant leaves. Collectively, our results suggest that CPD33 functions to promote symplastic transport into sieve elements.
Affiliation(s)
- Thu M Tran
- Division of Biological Sciences, Interdisciplinary Plant Group, Missouri Maize Center, University of Missouri, Columbia, MO 65211, USA; Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA; National Key Laboratory for Plant Cell Technology, Agricultural Genetics Institute, Hanoi, Vietnam
- Tyler J McCubbin
- Division of Biological Sciences, Interdisciplinary Plant Group, Missouri Maize Center, University of Missouri, Columbia, MO 65211, USA; Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
- Saadia Bihmidine
- Division of Biological Sciences, Interdisciplinary Plant Group, Missouri Maize Center, University of Missouri, Columbia, MO 65211, USA
- Benjamin T Julius
- Division of Biological Sciences, Interdisciplinary Plant Group, Missouri Maize Center, University of Missouri, Columbia, MO 65211, USA
- R Frank Baker
- Division of Biological Sciences, Interdisciplinary Plant Group, Missouri Maize Center, University of Missouri, Columbia, MO 65211, USA
- Martin Schauflinger
- Electron Microscopy Core Facility, University of Missouri, Columbia, MO 65211, USA
- Clifford Weil
- Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
- Nathan Springer
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108, USA
- Paul Chomet
- NRGene Inc., 8910 University Center Lane, Suite 400, San Diego, CA 92122, USA
- Ruth Wagner
- Bayer Crop Science, Chesterfield, MO 63017, USA
- Karen Grote
- Bayer Crop Science, Chesterfield, MO 63017, USA
- David M Braun
- Division of Biological Sciences, Interdisciplinary Plant Group, Missouri Maize Center, University of Missouri, Columbia, MO 65211, USA
11
Deelman E, Vahi K, Rynge M, Mayani R, da Silva RF, Papadimitriou G, Livny M. The Evolution of the Pegasus Workflow Management Software. Comput Sci Eng 2019. [DOI: 10.1109/mcse.2019.2919690] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/06/2022]
Affiliation(s)
- Ewa Deelman
- Information Sciences Institute, University of Southern California
- Karan Vahi
- Information Sciences Institute, University of Southern California
- Mats Rynge
- Information Sciences Institute, University of Southern California
- Rajiv Mayani
- Information Sciences Institute, University of Southern California
12
Zhang Y, Calyam P, Debroy S, Nuguri SS. Social Plane for Recommenders in Network Performance Expectation Management. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2018. [DOI: 10.1109/tnsm.2017.2772905] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
13
Wren JD, Toby I, Hong H, Nanduri B, Kaundal R, Dozmorov MG, Thakkar S. Proceedings of the 2016 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2016; 17:356. [PMID: 27766933 PMCID: PMC5073803 DOI: 10.1186/s12859-016-1213-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Jonathan D Wren
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104-5005, USA; Biochemistry and Molecular Biology Department, University of Oklahoma Health Sciences Center, Oklahoma City, USA; Stephenson Cancer Center, University of Oklahoma Health Sciences Center, Oklahoma City, USA; Department of Geriatric Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, USA
- Inimary Toby
- Department of Clinical Sciences, UT Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX, 75390-9066, USA
- Huxiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Bindu Nanduri
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi, MS, USA
- Rakesh Kaundal
- Bioinformatics Facility, Institute for Integrative Genome Biology, University of California, Riverside, California, USA
- Mikhail G Dozmorov
- Department of Biostatistics, Richmond Academy of Medicine, Virginia Commonwealth University, Virginia, USA
- Shraddha Thakkar
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA