1
Hadish JA, Hargarten HL, Zhang H, Mattheis JP, Honaas LA, Ficklin SP. Towards identification of postharvest fruit quality transcriptomic markers in Malus domestica. PLoS One 2024; 19:e0297015. [PMID: 38446822 PMCID: PMC10917293 DOI: 10.1371/journal.pone.0297015] [Received: 08/03/2023] [Accepted: 12/27/2023] [Indexed: 03/08/2024]
Abstract
Gene expression is highly impacted by the environment and can reflect past events that affected developmental processes. It is therefore expected that gene expression can serve as a signal of current or future phenotypic traits. In this paper we identify sets of genes, which we call Prognostic Transcriptomic Biomarkers (PTBs), that can predict firmness in Malus domestica (apple) fruits. In apples, all individuals of a cultivar are clones, so differences in fruit quality are due to the environment. The apple transcriptome responds to these environmental differences, which makes PTBs an attractive predictor of future fruit quality. PTBs have the potential to enhance supply chain efficiency, reduce crop loss, and provide higher and more consistent quality for consumers. However, several questions must be addressed. In this paper we ask which of two common modeling approaches, Random Forest or ElasticNet, performs better. We also ask whether PTBs with few genes can predict traits efficiently; this matters because qPCR is practical only for a small number of genes. Finally, we ask whether qPCR is a cost-effective input assay for PTBs modeled using high-throughput RNA-seq. To do this, we conducted a pilot study using fruit texture in the 'Gala' variety of apples across several postharvest storage regimens. Fruit texture in 'Gala' apples is highly controllable by postharvest treatments and is therefore a good candidate for exploring the use of PTBs. We find that the Random Forest model is more consistent than an ElasticNet model and is predictive of firmness (r² = 0.78) with as few as 15 genes. We also show in a follow-up experiment that qPCR results are reasonably consistent with RNA-seq. Results are promising for PTBs, yet more work is needed to ensure that PTBs are robust across various environmental conditions and storage treatments.
Affiliation(s)
- John A. Hadish
  - Molecular Plant Science Department, Washington State University, Pullman, Washington, United States of America
  - Department of Horticulture, Washington State University, Pullman, Washington, United States of America
- Heidi L. Hargarten
  - USDA Agricultural Research Service Physiology and Pathology of Tree Fruits Research, Wenatchee, Washington, United States of America
- Huiting Zhang
  - Department of Horticulture, Washington State University, Pullman, Washington, United States of America
- James P. Mattheis
  - USDA Agricultural Research Service Physiology and Pathology of Tree Fruits Research, Wenatchee, Washington, United States of America
- Loren A. Honaas
  - USDA Agricultural Research Service Physiology and Pathology of Tree Fruits Research, Wenatchee, Washington, United States of America
- Stephen P. Ficklin
  - Molecular Plant Science Department, Washington State University, Pullman, Washington, United States of America
  - Department of Horticulture, Washington State University, Pullman, Washington, United States of America
2
Hang Y, Burns J, Shealy BT, Pauly R, Ficklin SP, Feltus FA. Identification of condition-specific regulatory mechanisms in normal and cancerous human lung tissue. BMC Genomics 2022; 23:350. [PMID: 35524179 PMCID: PMC9077899 DOI: 10.1186/s12864-022-08591-9] [Received: 06/28/2021] [Accepted: 04/25/2022] [Indexed: 12/24/2022]
Abstract
Background: Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC), comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma. Specific genetic mutations and epigenetic aberrations play an important role in the developmental transition to a specific tumor subtype. Elucidating normal lung versus lung tumor gene expression patterns and regulatory targets yields biomarker systems that discriminate lung phenotypes and provides a foundation for the discovery of normal and aberrant gene regulatory mechanisms.
Results: We built condition-specific gene co-expression networks (csGCNs) for normal lung, LUAD, and LUSC conditions. We then integrated normal lung tissue-specific gene regulatory networks (tsGRNs) to elucidate control-target biomarker systems for normal and cancerous lung tissue. We characterized co-expressed gene edges, possibly under common regulatory control, for relevance in lung cancer.
Conclusions: Our approach demonstrates the ability to elucidate csGCN:tsGRN merged biomarker systems based on gene expression correlation and regulation. The biomarker systems we describe can be used to classify and further describe lung specimens. Our approach is generalizable and can be used to discover and interpret complex gene expression patterns for any condition or species.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08591-9.
Affiliation(s)
- Yuqing Hang
  - Department of Genetics & Biochemistry, Clemson University, Clemson, 29634, USA
- Josh Burns
  - Department of Horticulture, Washington State University, Pullman, 99164, USA
- Benjamin T Shealy
  - Department of Electrical and Computer Engineering, Clemson University, Clemson, 29634, USA
- Rini Pauly
  - Biomedical Data Science and Informatics Program, Clemson University, Clemson, 29634, USA
- Stephen P Ficklin
  - Department of Horticulture, Washington State University, Pullman, 99164, USA
- Frank A Feltus
  - Department of Genetics & Biochemistry, Clemson University, Clemson, 29634, USA
  - Biomedical Data Science and Informatics Program, Clemson University, Clemson, 29634, USA
  - Center for Human Genetics, Clemson University, Clemson, 29634, USA
  - Biosystems Research Complex, 302C, 105 Collings St, Clemson, SC, 29634, USA
3
Burns JJR, Shealy BT, Greer MS, Hadish JA, McGowan MT, Biggs T, Smith MC, Feltus FA, Ficklin SP. Addressing noise in co-expression network construction. Brief Bioinform 2021; 23:6446269. [PMID: 34850822 PMCID: PMC8769892 DOI: 10.1093/bib/bbab495] [Received: 08/08/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 11/13/2022]
Abstract
Gene co-expression networks (GCNs) provide multiple benefits to molecular research, including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound the discovery of true network edges, exclude edges, and inhibit the discovery of context-specific (or condition-specific) network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring their assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction and create csGCNs, we review the workflow.
Affiliation(s)
- Joshua J R Burns
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Benjamin T Shealy
  - Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- Mitchell S Greer
  - School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
- John A Hadish
  - Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Matthew T McGowan
  - Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Tyler Biggs
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Melissa C Smith
  - Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- F Alex Feltus
  - Department of Genetics and Biochemistry, 130 McGinty Court, Clemson University, Clemson, SC 29634, USA
  - Biomedical Data Science & Informatics Program, 100 McAdams Hall, Clemson University, Clemson, SC 29634, USA
  - Clemson Center for Human Genetics, 114 Gregor Mendel Circle, Greenwood, SC 29646, USA
- Stephen P Ficklin
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
  - School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
4
Spoor S, Wytko C, Soto B, Chen M, Almsaeed A, Condon B, Herndon N, Hough H, Jung S, Staton M, Wegrzyn J, Main D, Feltus FA, Ficklin SP. Tripal and Galaxy: supporting reproducible scientific workflows for community biological databases. Database (Oxford) 2021; 2020:5866148. [PMID: 32621602 PMCID: PMC7334887 DOI: 10.1093/database/baaa032] [Received: 12/23/2019] [Revised: 02/25/2020] [Accepted: 03/31/2020] [Indexed: 12/12/2022]
Abstract
Online biological databases housing genomics, genetic and breeding data can be constructed using the Tripal toolkit. Tripal is an open-source, internationally developed framework that implements FAIR data principles and is meant to ease the burden of constructing such websites for research communities. Use of a common, open framework improves the sustainability and manageability of such a site. Site developers can create extensions for their site and in turn share those extensions with others. One challenge that community databases often face is the need to provide tools for their users that analyze increasingly large datasets using multiple software tools strung together in a scientific workflow on complicated computational resources. The Tripal Galaxy module, a ‘plug-in’ for Tripal, meets this need by integrating Tripal with the Galaxy Project workflow management system. Site developers can create workflows appropriate to the needs of their community using Galaxy and then share those workflows for execution on their Tripal sites, either via automatically constructed but configurable web forms or via an application programming interface that powers web-based analytical applications. The Tripal Galaxy module helps reduce duplication of effort by allowing site developers to spend time constructing workflows and building their applications rather than rebuilding job-management infrastructure for multi-step applications.
Affiliation(s)
- Shawna Spoor
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
- Connor Wytko
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
- Brian Soto
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
- Ming Chen
  - Entomology and Plant Pathology, University of Tennessee, 2505, 370 E J. Chapman Dr Plant Biotechnology Building, Knoxville, TN 37996, USA
- Abdullah Almsaeed
  - Entomology and Plant Pathology, University of Tennessee, 2505, 370 E J. Chapman Dr Plant Biotechnology Building, Knoxville, TN 37996, USA
- Bradford Condon
  - Entomology and Plant Pathology, University of Tennessee, 2505, 370 E J. Chapman Dr Plant Biotechnology Building, Knoxville, TN 37996, USA
- Nic Herndon
  - Dept of Computer Science, East Carolina University, College of Engineering and Technology, East 5th Street, Greenville, NC 27858-4353, USA
- Heidi Hough
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
- Sook Jung
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
- Meg Staton
  - Entomology and Plant Pathology, University of Tennessee, 2505, 370 E J. Chapman Dr Plant Biotechnology Building, Knoxville, TN 37996, USA
- Jill Wegrzyn
  - Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269-3043, USA
- Dorrie Main
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
- F Alex Feltus
  - Dept of Genetics and Biochemistry, Clemson University, 154 Poole Agricultural Center, Clemson, SC 29634, USA
- Stephen P Ficklin
  - Dept of Horticulture, Washington State University, 149 Johnson Hall 646414, Pullman, WA 99164-6414, USA
5
Honaas L, Hargarten H, Hadish J, Ficklin SP, Serra S, Musacchi S, Wafula E, Mattheis J, dePamphilis CW, Rudell D. Transcriptomics of Differential Ripening in 'd'Anjou' Pear (Pyrus communis L.). Front Plant Sci 2021; 12:609684. [PMID: 34220875 PMCID: PMC8243007 DOI: 10.3389/fpls.2021.609684] [Received: 09/24/2020] [Accepted: 04/06/2021] [Indexed: 06/13/2023]
Abstract
Estimating maturity in pome fruits is a critical task that directs virtually all postharvest supply chain decisions. This is especially important for European pear (Pyrus communis) cultivars because losses due to spoilage and senescence must be minimized while ensuring proper ripening capacity is achieved (in part by satisfying a fruit chilling requirement). Reliable methods for accurately estimating pear fruit maturity are lacking, and because ripening is maturity dependent, predicting ripening capacity is a challenge. In this study of the European pear cultivar 'd'Anjou', we sorted fruit at harvest based upon on-tree fruit position to build contrasts of maturity. Our sorting scheme showed clear contrasts of maturity between canopy positions, yet there was substantial overlap in the distribution of values for the index of absorbance difference (I_AD), a non-destructive spectroscopic measurement that has been used as a proxy for pome fruit maturity. This presented an opportunity to explore a contrast of maturity that was more subtle than I_AD could differentiate, and it guided our subsequent transcriptome analysis of tissue samples taken at harvest and during storage. Using a novel approach that tests for condition-specific differences of co-expressed genes, we discovered genes with a phased character that mirrored our sorting scheme. The expression patterns of these genes are associated with fruit quality and ripening differences across the experiment. Functional profiles of these co-expressed genes are concordant with previous findings and also offer new clues, and thus hypotheses, about genes involved in pear fruit quality, maturity, and ripening. This work may lead to new tools for enhanced postharvest management based on the activity of gene co-expression modules rather than individual genes. Further, our results indicate that modules may have utility within specific windows of time during postharvest management of 'd'Anjou' pear.
Affiliation(s)
- Loren Honaas
  - USDA, ARS, Tree Fruit Research Laboratory, Wenatchee, WA, United States
- Heidi Hargarten
  - USDA, ARS, Tree Fruit Research Laboratory, Wenatchee, WA, United States
- John Hadish
  - Molecular Plant Sciences, Washington State University, Pullman, WA, United States
- Stephen P. Ficklin
  - Molecular Plant Sciences, Washington State University, Pullman, WA, United States
  - Department of Horticulture, Washington State University, Pullman, WA, United States
- Sara Serra
  - Department of Horticulture, Washington State University, Pullman, WA, United States
  - Tree Fruit Research and Extension Center, Washington State University, Wenatchee, WA, United States
- Stefano Musacchi
  - Department of Horticulture, Washington State University, Pullman, WA, United States
  - Tree Fruit Research and Extension Center, Washington State University, Wenatchee, WA, United States
- Eric Wafula
  - Department of Biology, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, United States
- James Mattheis
  - USDA, ARS, Tree Fruit Research Laboratory, Wenatchee, WA, United States
- Claude W. dePamphilis
  - Department of Biology, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, United States
- David Rudell
  - USDA, ARS, Tree Fruit Research Laboratory, Wenatchee, WA, United States
6
McGowan MT, Zhang Z, Ficklin SP. Chromosomal characteristics of salt stress heritable gene expression in the rice genome. BMC Genom Data 2021; 22:17. [PMID: 34044788 PMCID: PMC8162008 DOI: 10.1186/s12863-021-00970-7] [Received: 01/11/2021] [Accepted: 05/06/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene expression is potentially an important heritable quantitative trait that mediates between genetic variation and higher-level complex phenotypes through time- and condition-dependent regulatory interactions. Therefore, we sought to explore both the genomic and condition-specific characteristics of gene expression heritability within the context of chromosomal structure.
RESULTS: Heritability was estimated for biological gene expression using a diverse, 84-line Oryza sativa (rice) population under optimal and salt-stressed conditions. Overall, 5936 genes were found to have heritable expression regardless of condition, and 1377 genes were found to have heritable expression only during salt stress. These genes with salt-specific heritable expression are enriched for functional terms associated with response to stimulus and transcription factor activity. Additionally, we discovered that highly and lowly expressed genes, and genes with heritable expression, are distributed differently along the chromosomes, in patterns that follow previously identified high-throughput chromosomal conformation capture (Hi-C) A/B chromatin compartments. Furthermore, multiple genomic hotspots enriched for genes with salt-specific heritability were identified on chromosomes 1, 4, 6, and 8. These hotspots were found to contain genes functionally enriched for transcriptional regulation and to overlap with a previously identified major QTL for salt tolerance in rice.
CONCLUSIONS: Investigating the heritability of traits, and in particular gene expression traits, is important for developing a basic understanding of how regulatory networks behave across a population. This work provides insights into spatial patterns of heritable gene expression at the chromosomal level.
Affiliation(s)
- Matthew T McGowan
  - Molecular Plant Sciences Program, Washington State University, French Ad 324G, Pullman, WA, 99164, USA
- Zhiwu Zhang
  - Molecular Plant Sciences Program, Washington State University, French Ad 324G, Pullman, WA, 99164, USA
  - Department of Crops and Soils, Washington State University, 105 Johnson Hall, Pullman, WA, 99164, USA
- Stephen P Ficklin
  - Molecular Plant Sciences Program, Washington State University, French Ad 324G, Pullman, WA, 99164, USA
  - Department of Horticulture, Washington State University, 149 Johnson Hall, Pullman, WA, 99164, USA
7
Ogle C, Reddick D, McKnight C, Biggs T, Pauly R, Ficklin SP, Feltus FA, Shannigrahi S. Named Data Networking for Genomics Data Management and Integrated Workflows. Front Big Data 2021; 4:582468. [PMID: 33748749 PMCID: PMC7968724 DOI: 10.3389/fdata.2021.582468] [Received: 07/12/2020] [Accepted: 01/04/2021] [Indexed: 11/25/2022]
Abstract
Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high-resolution biological data. The community is rapidly heading toward the petascale in single-investigator laboratory settings. As evidence, the NCBI SRA central DNA sequence repository alone contains over 45 petabytes of biological data. Given the geometric growth of this and other genomics repositories, an exabyte of mineable biological data is imminent. The challenges of effectively utilizing these datasets are enormous, as they are not only large but also stored in geographically distributed repositories such as the National Center for Biotechnology Information (NCBI), DNA Data Bank of Japan (DDBJ), European Bioinformatics Institute (EBI), and NASA’s GeneLab. In this work, we first systematically point out the data-management challenges of the genomics community. We then introduce Named Data Networking (NDN), a novel but well-researched Internet architecture that is capable of solving these challenges at the network layer. NDN performs all operations, such as forwarding requests to data sources, content discovery, access, and retrieval, using content names (which are similar to traditional filenames or filepaths) and eliminates the need for a location layer (the IP address) for data management. Utilizing NDN for genomics workflows simplifies data discovery, speeds up data retrieval using in-network caching of popular datasets, and allows the community to create infrastructure that supports operations such as creating a federation of content repositories, retrieval from multiple sources, remote data subsetting, and others. Name-based operations also streamline deployment and integration of workflows with various cloud platforms.
Our contributions in this work are as follows: (1) we enumerate the cyberinfrastructure challenges of the genomics community that NDN can alleviate; (2) we describe our efforts in applying NDN to a contemporary genomics workflow (GEMmaker) and quantify the improvements, with a preliminary evaluation showing a sixfold speedup in data insertion into the workflow; and (3) as a pilot, we have used an NDN naming scheme (agreed upon by the community and discussed in Section 4) to publish data from broadly used data repositories, including the NCBI SRA. We have loaded the NDN testbed with these pre-processed genomes, which can be accessed over NDN and used by anyone interested in those datasets. Finally, we discuss our continued effort in integrating NDN with cloud computing platforms, such as the Pacific Research Platform (PRP). The reader should note that the goal of this paper is to introduce NDN to the genomics community and discuss NDN’s properties that can benefit that community. We do not present an extensive performance evaluation of NDN; we are working on extending and evaluating our pilot deployment and will present systematic results in future work.
Affiliation(s)
- Cameron Ogle
  - School of Computing, Clemson University, Clemson, SC, United States
- David Reddick
  - Department of Computer Science, Tennessee Tech University, Cookeville, TN, United States
- Coleman McKnight
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States
- Tyler Biggs
  - Department of Horticulture, Washington State University, Pullman, WA, United States
- Rini Pauly
  - Biomedical Data Science and Informatics Program, Clemson, SC, United States
- Stephen P Ficklin
  - Department of Horticulture, Washington State University, Pullman, WA, United States
- F Alex Feltus
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States
  - Biomedical Data Science and Informatics Program, Clemson, SC, United States
  - Center for Human Genetics, Clemson University, Greenwood, SC, United States
- Susmit Shannigrahi
  - Department of Computer Science, Tennessee Tech University, Cookeville, TN, United States
8
McConnel CS, Crisp SA, Biggs TD, Ficklin SP, Parrish LM, Trombetta SC, Sischo WM, Adams-Progar A. A Fixed Cohort Field Study of Gene Expression in Circulating Leukocytes From Dairy Cows With and Without Mastitis. Front Vet Sci 2020; 7:559279. [PMID: 33195534 PMCID: PMC7554338 DOI: 10.3389/fvets.2020.559279] [Received: 05/05/2020] [Accepted: 09/03/2020] [Indexed: 12/04/2022]
Abstract
Specifically designed gene expression studies can be used to prioritize candidate genes and identify novel biomarkers affecting resilience against mastitis and other diseases in dairy cattle. The primary goal of this study was to assess whether peripheral leukocyte genes that were differentially expressed in a previous study of dairy cattle with postpartum disease would also be differentially expressed in peripheral leukocytes from a different, diverse set of dairy cattle with moderate to severe clinical mastitis. Four genes were selected for this study due to their differential expression in a previous transcriptomic analysis of circulating leukocytes from dairy cows with and without evidence of early postpartum disease. An additional 15 genes were included based on their cellular, immunologic, and inflammatory functions associated with resistance and tolerance to mastitis. This fixed cohort study was conducted on a conventional dairy in Washington State. Cows >50 days in milk (DIM) with mastitis (n = 12) were enrolled along with healthy cows (n = 8) selected to match the DIM and lactation numbers of mastitic cows. Blood was collected for a complete blood count (CBC), serum biochemistry, leukocyte isolation, and RNA extraction on the day of enrollment and twice more at 6- to 8-day intervals. Latent class analysis was performed to discriminate healthy vs. mastitic cows and to describe disease resolution. RNA samples were processed by the Primate Diagnostic Services Laboratory (University of Washington, Seattle, WA). Gene expression analysis was performed using the Nanostring System (Nanostring Technologies, Seattle, Washington, USA). Of the four genes (C5AR1, CATHL6, LCN2, and PGLYRP1) with evidence of upregulation in cows with mastitis, three (CATHL6, LCN2, and PGLYRP1) were investigated because of their previously identified association with postpartum disease.
These genes encode immunomodulatory molecules that selectively enhance or alter host innate immune defense mechanisms and modulate pathogen-induced inflammatory responses. Although further research is warranted to explain their functional mechanisms and bioactivity in cattle, our findings suggest that these conserved elements of innate immunity have the potential to bridge disease states and target tissues in diverse dairy populations.
Affiliation(s)
- Craig S McConnel
  - Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Sierra A Crisp
  - Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Tyler D Biggs
  - Department of Horticulture, College of Agriculture, Human, and Natural Resource Sciences, Washington State University, Pullman, WA, United States
- Stephen P Ficklin
  - Department of Horticulture, College of Agriculture, Human, and Natural Resource Sciences, Washington State University, Pullman, WA, United States
- Lindsay M Parrish
  - Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Sophie C Trombetta
  - Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- William M Sischo
  - Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA, United States
- Amber Adams-Progar
  - Department of Animal Sciences, College of Agriculture, Human, and Natural Resource Sciences, Washington State University, Pullman, WA, United States
9
Ma Y, Marzougui A, Coyne CJ, Sankaran S, Main D, Porter LD, Mugabe D, Smitchger JA, Zhang C, Amin MN, Rasheed N, Ficklin SP, McGee RJ. Dissecting the Genetic Architecture of Aphanomyces Root Rot Resistance in Lentil by QTL Mapping and Genome-Wide Association Study. Int J Mol Sci 2020; 21:ijms21062129. [PMID: 32244875 PMCID: PMC7139309 DOI: 10.3390/ijms21062129] [Received: 03/05/2020] [Revised: 03/13/2020] [Accepted: 03/16/2020] [Indexed: 12/15/2022]
Abstract
Lentil (Lens culinaris Medikus) is an important source of protein for people in developing countries. Aphanomyces root rot (ARR) has emerged as one of the most devastating diseases affecting lentil production. In this study, we applied two complementary quantitative trait loci (QTL) analysis approaches to unravel the genetic architecture underlying this complex trait. A recombinant inbred line (RIL) population and an association mapping population were genotyped using genotyping by sequencing (GBS) to discover novel single nucleotide polymorphisms (SNPs). QTL mapping identified 19 QTL associated with ARR resistance, while association mapping detected 38 QTL and highlighted accumulation of favorable haplotypes in most of the resistant accessions. Seven QTL clusters were discovered on six chromosomes, and 15 putative genes were identified within the QTL clusters. To validate QTL mapping and genome-wide association study (GWAS) results, expression analysis of five selected genes was conducted on partially resistant and susceptible accessions. Three of the genes were differentially expressed at early stages of infection, two of which may be associated with ARR resistance. Our findings provide valuable insight into the genetic control of ARR, and genetic and genomic resources developed here can be used to accelerate development of lentil cultivars with high levels of partial resistance to ARR.
Affiliation(s)
- Yu Ma
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Afef Marzougui
- Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164, USA
- Clarice J. Coyne
- USDA-ARS Plant Germplasm Introduction and Testing Unit, Washington State University, Pullman, WA 99164, USA
- Sindhuja Sankaran
- Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Lyndon D. Porter
- USDA-ARS Grain Legume Genetics and Physiology Research Unit, Prosser, WA 99350, USA
- Deus Mugabe
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
- Jamin A. Smitchger
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
- Chongyuan Zhang
- Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164, USA
- Md. Nurul Amin
- Breeder Seed Production Center, Bangladesh Agricultural Research Institute, Debiganj-5020, Panchagarh, Bangladesh
- Naser Rasheed
- Institute of Soil and Environmental Sciences, University of Agriculture, Faisalabad 38000, Pakistan
- Stephen P. Ficklin
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Rebecca J. McGee
- USDA-ARS Grain Legume Genetics and Physiology Research Unit, Pullman, WA 99164, USA
- Correspondence: Tel.: +1-509-335-0300
10
Spoor S, Cheng CH, Sanderson LA, Condon B, Almsaeed A, Chen M, Bretaudeau A, Rasche H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. Database (Oxford) 2020; 2019:5532788. [PMID: 31328773 PMCID: PMC6643302 DOI: 10.1093/database/baz077] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/08/2019] [Revised: 05/12/2019] [Accepted: 05/22/2019] [Indexed: 12/20/2022]
Abstract
Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3, which improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User’s Guide and Developer’s Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.
Affiliation(s)
- Shawna Spoor
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Chun-Huai Cheng
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Bradford Condon
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Abdullah Almsaeed
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Ming Chen
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Anthony Bretaudeau
- INRA, UMR IGEPP, BIPAA/GenOuest, INRIA/Irisa - Campus de Beaulieu, Rennes Cedex, France
- Helena Rasche
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Kirstin Bett
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Computational Biology Core, Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- F Alex Feltus
- Dept. of Genetics and Biochemistry, Clemson University, Clemson, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA, USA
11
Conford B, Almsaeed A, Buehler S, Childers CP, Ficklin SP, Staton ME, Poelchau MF. Tripal EUtils: a Tripal module to increase exchange and reuse of genome assembly metadata. Database (Oxford) 2020; 2019:5709695. [PMID: 31960040 DOI: 10.1093/database/baz143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2019] [Revised: 11/04/2019] [Accepted: 11/17/2019] [Indexed: 11/13/2022]
Abstract
Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus increase the accessibility of data to the scientific community. This process requires (i) metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between the National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, to pull metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils.
Affiliation(s)
- B Conford
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
- A Almsaeed
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
- S Buehler
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
- C P Childers
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
- S P Ficklin
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
- M E Staton
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
- M F Poelchau
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
12
Jung S, Lee T, Cheng CH, Buble K, Zheng P, Yu J, Humann J, Ficklin SP, Gasic K, Scott K, Frank M, Ru S, Hough H, Evans K, Peace C, Olmstead M, DeVetter LW, McFerson J, Coe M, Wegrzyn JL, Staton ME, Abbott AG, Main D. 15 years of GDR: New data and functionality in the Genome Database for Rosaceae. Nucleic Acids Res 2019; 47:D1137-D1145. [PMID: 30357347 PMCID: PMC6324069 DOI: 10.1093/nar/gky1000] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Received: 09/01/2018] [Accepted: 10/09/2018] [Indexed: 12/13/2022]
Abstract
The Genome Database for Rosaceae (GDR, https://www.rosaceae.org) is an integrated web-based community database resource providing access to publicly available genomics, genetics and breeding data and data-mining tools to facilitate basic, translational and applied research in Rosaceae. The volume of data in GDR has increased greatly over the last 5 years. The GDR now houses multiple versions of whole genome assembly and annotation data from 14 species, made available by recent advances in sequencing technology. Annotated and searchable reference transcriptomes, RefTrans, combining peer-reviewed published RNA-Seq as well as EST datasets, are newly available for major crop species. Significantly more quantitative trait loci, genetic maps and markers are available in MapViewer, a new visualization tool that better integrates with other pages in GDR. Pathways can be accessed through the new GDR Cyc Pathways databases, and synteny among the newest genome assemblies from eight species can be viewed through the new synteny browser, SynView. Collated single-nucleotide polymorphism diversity data and phenotypic data from publicly available breeding datasets are integrated with other relevant data. Also, the new Breeding Information Management System allows breeders to upload, manage and analyze their private breeding data within the secure GDR server with an option to release data publicly.
Affiliation(s)
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Taein Lee
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Chun-Huai Cheng
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Katheryn Buble
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Ping Zheng
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Jing Yu
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Jodi Humann
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Ksenija Gasic
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC 29634-0310, USA
- Kristin Scott
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Morgan Frank
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Sushan Ru
- Department of Agronomy and Plant Genetics, University of Minnesota, St Paul, MN 55108, USA
- Heidi Hough
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Kate Evans
- Department of Horticulture, Washington State University Tree Fruit Research and Extension Center, Wenatchee, WA 98801, USA
- Cameron Peace
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Mercy Olmstead
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Lisa W DeVetter
- Department of Horticulture, Washington State University, Northwestern Washington Research and Extension Center, Mount Vernon, WA 98273, USA
- James McFerson
- Department of Horticulture, Washington State University Tree Fruit Research and Extension Center, Wenatchee, WA 98801, USA
- Michael Coe
- Cedar Lake Research Group, LLC, Portland, OR 97293, USA
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Margaret E Staton
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Albert G Abbott
- Forest Health Research and Extension Center, University of Kentucky, Lexington, KY 40546-0091, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
13
Buble K, Jung S, Humann JL, Yu J, Cheng CH, Lee T, Ficklin SP, Hough H, Condon B, Staton ME, Wegrzyn JL, Main D. Tripal MapViewer: A tool for interactive visualization and comparison of genetic maps. Database (Oxford) 2019; 2019:baz100. [PMID: 31688940 PMCID: PMC6829499 DOI: 10.1093/database/baz100] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/21/2018] [Revised: 06/09/2019] [Accepted: 07/16/2019] [Indexed: 11/14/2022]
Abstract
Tripal is an open-source, resource-efficient toolkit for construction of genomic, genetic and breeding databases. It facilitates development of biological websites by providing tools to integrate and display biological data using the generic database schema, Chado, together with Drupal, a popular website creation and content management system. Tripal MapViewer is a new interactive tool for visualizing genetic map data. Developed as a Tripal replacement for Comparative Map Viewer (CMap), it enables visualization of entire maps or linkage groups and features such as molecular markers, quantitative trait loci (QTLs) and heritable phenotypic markers. It also provides graphical comparison of maps sharing the same markers as well as dot plot and correspondence matrices. MapViewer integrates directly with the Tripal application programming interface framework, improving data searching capability and providing a more seamless experience for site visitors. The Tripal MapViewer interface can be integrated in any Tripal map page and linked from any Tripal page for markers, QTLs, heritable morphological markers or genes. Configuration of the display is available through a control panel and the administration interface. The administration interface also allows configuration of the custom database query for building materialized views, providing better performance and flexibility in the way data is stored in the Chado database schema. MapViewer is implemented with the D3.js technology and is currently being used at the Genome Database for Rosaceae (https://www.rosaceae.org), CottonGen (https://www.cottongen.org), Citrus Genome Database (https://citrusgenomedb.org), Vaccinium Genome Database (https://www.vaccinium.org) and Cool Season Food Legume Database (https://www.coolseasonfoodlegume.org). It is also currently in development on the Hardwood Genomics Web (https://hardwoodgenomics.org) and TreeGenes (https://treegenesdb.org). 
Database URL: https://gitlab.com/mainlabwsu/tripal_map.
Affiliation(s)
- Katheryn Buble
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Jodi L Humann
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Jing Yu
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Chun-Huai Cheng
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Taein Lee
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Heidi Hough
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Bradford Condon
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Margaret E Staton
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
14
Abstract
Tumors exhibit complex patterns of aberrant gene expression. Using KINC, a knowledge-independent, noise-reducing gene co-expression network construction software, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology, and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.
Affiliation(s)
- Leland J Dunwoodie
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- William L Poehlman
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
15
Harper L, Campbell J, Cannon EKS, Jung S, Poelchau M, Walls R, Andorf C, Arnaud E, Berardini TZ, Birkett C, Cannon S, Carson J, Condon B, Cooper L, Dunn N, Elsik CG, Farmer A, Ficklin SP, Grant D, Grau E, Herndon N, Hu ZL, Humann J, Jaiswal P, Jonquet C, Laporte MA, Larmande P, Lazo G, McCarthy F, Menda N, Mungall CJ, Munoz-Torres MC, Naithani S, Nelson R, Nesdill D, Park C, Reecy J, Reiser L, Sanderson LA, Sen TZ, Staton M, Subramaniam S, Tello-Ruiz MK, Unda V, Unni D, Wang L, Ware D, Wegrzyn J, Williams J, Woodhouse M, Yu J, Main D. AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database (Oxford) 2018; 2018:5096675. [PMID: 30239679 PMCID: PMC6146126 DOI: 10.1093/database/bay088] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 03/29/2018] [Revised: 07/19/2018] [Accepted: 07/30/2018] [Indexed: 01/07/2023]
Abstract
The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.
Affiliation(s)
- Lisa Harper
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Ethalinda K S Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Computer Science, Iowa State University, Ames, IA, USA
- Sook Jung
- Horticulture, Washington State University, Pullman, WA, USA
- Monica Poelchau
- National Agricultural Library, USDA Agricultural Research Service, Beltsville, MD, USA
- Carson Andorf
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Computer Science, Iowa State University, Ames, IA, USA
- Elizabeth Arnaud
- Bioversity International, Informatics Unit, Conservation and Availability Programme, Parc Scientifique Agropolis II, Montpellier, France
- Tanya Z Berardini
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Fremont, CA, USA
- Steve Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- James Carson
- Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, USA
- Bradford Condon
- Entomology and Plant Pathology, University of Tennessee Knoxville, Knoxville, TN, USA
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christine G Elsik
- Division of Animal Sciences and Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- Andrew Farmer
- National Center for Genome Resources, Santa Fe, NM, USA
- David Grant
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Emily Grau
- National Center for Genome Resources, Santa Fe, NM, USA
- Nic Herndon
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Zhi-Liang Hu
- Animal Science, Iowa State University, Ames, USA
- Jodi Humann
- Horticulture, Washington State University, Pullman, WA, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Clement Jonquet
- Laboratory of Informatics, Robotics, Microelectronics of Montpellier, University of Montpellier & CNRS, Montpellier, France
- Marie-Angélique Laporte
- Bioversity International, Informatics Unit, Conservation and Availability Programme, Parc Scientifique Agropolis II, Montpellier, France
- Gerard Lazo
- Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, CA, USA
- Fiona McCarthy
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Rex Nelson
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Daureen Nesdill
- Marriott Library, University of Utah, Salt Lake City, UT, USA
- Carissa Park
- Animal Science, Iowa State University, Ames, USA
- James Reecy
- Animal Science, Iowa State University, Ames, USA
- Leonore Reiser
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Fremont, CA, USA
- Taner Z Sen
- Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, CA, USA
- Margaret Staton
- Entomology and Plant Pathology, University of Tennessee Knoxville, Knoxville, TN, USA
- Victor Unda
- Horticulture, Washington State University, Pullman, WA, USA
- Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Liya Wang
- Plant Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Doreen Ware
- USDA, Plant, Soil and Nutrition Research, Ithaca, NY, USA
- Plant Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Jill Wegrzyn
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Jason Williams
- Cold Spring Harbor Laboratory, DNA Learning Center, Cold Spring Harbor, NY, USA
- Margaret Woodhouse
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
- Jing Yu
- Horticulture, Washington State University, Pullman, WA, USA
- Dorrie Main
- Horticulture, Washington State University, Pullman, WA, USA
16
Ficklin SP, Dunwoodie LJ, Poehlman WL, Watson C, Roche KE, Feltus FA. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study. Sci Rep 2017; 7:8617. [PMID: 28819158 PMCID: PMC5561081 DOI: 10.1038/s41598-017-09094-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/12/2017] [Accepted: 07/21/2017] [Indexed: 01/10/2023]
Abstract
A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
Affiliation(s)
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Leland J Dunwoodie
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- William L Poehlman
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- Christopher Watson
- Molecular Plant Sciences Program, Washington State University, Pullman, WA, 99164, USA
- Kimberly E Roche
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- F Alex Feltus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
17
Wytko C, Soto B, Ficklin SP. blend4php: a PHP API for galaxy. Database (Oxford) 2017; 2017:baw154. [PMID: 28077564 PMCID: PMC5225400 DOI: 10.1093/database/baw154] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/21/2016] [Revised: 10/12/2016] [Accepted: 11/01/2016] [Indexed: 01/17/2023]
Abstract
Galaxy is a popular framework for execution of complex analytical pipelines, typically for large data sets, and is commonly used for (but not limited to) genomic, genetic and related biological analysis. It provides a web front-end and integrates with high performance computing resources. Here we report the development of the blend4php library that wraps Galaxy’s RESTful API into a PHP-based library. PHP-based web applications can use blend4php to automate execution, monitoring and management of a remote Galaxy server, including its users, workflows, jobs and more. The blend4php library was specifically developed for the integration of Galaxy with Tripal, the open-source toolkit for the creation of online genomic and genetic web sites. However, it was designed as an independent library for use by any application, and is freely available under version 3 of the GNU Lesser General Public License (LGPL v3.0) at https://github.com/galaxyproject/blend4php. Database URL: https://github.com/galaxyproject/blend4php
Affiliation(s)
- Connor Wytko
- Department of Horticulture and School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
- Brian Soto
- Department of Horticulture and School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
18
Wang Y, Ficklin SP, Wang X, Feltus FA, Paterson AH. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots. PLoS One 2016; 11:e0155637. [PMID: 27195960 PMCID: PMC4873151 DOI: 10.1371/journal.pone.0155637] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/22/2015] [Accepted: 05/02/2016] [Indexed: 11/19/2022]
Abstract
Different modes of gene duplication, including whole-genome duplication (WGD) and tandem, proximal and dispersed duplications, are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We call these surviving ancient dispersed duplicates relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots.
Affiliation(s)
- Yupeng Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Stephen P. Ficklin
- Department of Horticulture, Washington State University, Pullman, Washington, United States of America
- Xiyin Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- F. Alex Feltus
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
- Andrew H. Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
19
Jung S, Ficklin SP, Lee T, Cheng CH, Blenda A, Zheng P, Yu J, Bombarely A, Cho I, Ru S, Evans K, Peace C, Abbott AG, Mueller LA, Olmstead MA, Main D. The Genome Database for Rosaceae (GDR): year 10 update. Nucleic Acids Res 2013; 42:D1237-44. [PMID: 24225320 PMCID: PMC3965003 DOI: 10.1093/nar/gkt1012] [Citation(s) in RCA: 154] [Impact Index Per Article: 14.0] [Indexed: 01/25/2023] [Open Access]
Abstract
The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and pathway data are available through the FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision-making.
Affiliation(s)
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA; Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA; Boyce Thompson Institute for Plant Research, Tower Road, Ithaca, NY 14853, USA; Department of Computer Science, Saginaw Valley State University, University Center, MI 48710, USA; and Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
20
Yu J, Jung S, Cheng CH, Ficklin SP, Lee T, Zheng P, Jones D, Percy RG, Main D. CottonGen: a genomics, genetics and breeding database for cotton research. Nucleic Acids Res 2013; 42:D1229-36. [PMID: 24203703 PMCID: PMC3964939 DOI: 10.1093/nar/gkt1064] [Citation(s) in RCA: 185] [Impact Index Per Article: 16.8] [Indexed: 11/25/2022] [Open Access]
Abstract
CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST.
Affiliation(s)
- Jing Yu
- Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA; Cotton Incorporated, Cary, NC 27513, USA; and Crop Germplasm Research Unit, USDA-ARS-SPARC, College Station, TX 77845, USA
21
Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases. Database (Oxford) 2013; 2013:bat075. [PMID: 24163125 PMCID: PMC3808541 DOI: 10.1093/database/bat075] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022]
Abstract
Tripal is an open-source, freely available toolkit for the construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab-delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/
Affiliation(s)
- Lacey-Anne Sanderson
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada; Department of Horticulture, Washington State University, Pullman, WA, USA; and Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
22
Ficklin SP, Feltus FA. A systems-genetics approach and data mining tool to assist in the discovery of genes underlying complex traits in Oryza sativa. PLoS One 2013; 8:e68551. [PMID: 23874666 PMCID: PMC3713027 DOI: 10.1371/journal.pone.0068551] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/11/2013] [Accepted: 05/30/2013] [Indexed: 12/13/2022] [Open Access]
Abstract
Many traits of biological and agronomic significance in plants are controlled in a complex manner, where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.
Affiliation(s)
- Stephen P Ficklin
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
23
Feltus FA, Ficklin SP, Gibson SM, Smith MC. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study. BMC Syst Biol 2013; 7:44. [PMID: 23738693 PMCID: PMC3679940 DOI: 10.1186/1752-0509-7-44] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 10/18/2012] [Accepted: 05/14/2013] [Indexed: 12/11/2022]
Abstract
Background: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories, it is often unmanageable to sort samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is also often applied, further limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples, and individual network construction for each group, as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. Results: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an unsupervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
Conclusions: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using unsupervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high-quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
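The pre-clustering idea in this abstract (partition samples first, then build and threshold one network per group) can be illustrated with a small sketch. It assumes a samples × genes expression matrix, plain Lloyd's k-means with deterministic farthest-first seeding (the abstract says only that k-means was used), and Pearson correlation. The per-group threshold comes from a caller-supplied function, which is where an RMT-derived threshold would go in the published method; RMT itself is not implemented here, and all function names are hypothetical rather than taken from the authors' software.

```python
import numpy as np


def _farthest_first(X, k):
    """Deterministic seeding: sample 0, then repeatedly the sample that is
    farthest from every center chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].astype(float)


def kmeans_samples(X, k, n_iter=100):
    """Lloyd's k-means over the rows (samples) of X; returns one label per sample."""
    X = np.asarray(X, dtype=float)
    centers = _farthest_first(X, k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its samples (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def coexpression_edges(expr, threshold):
    """Undirected gene-gene edges (i, j), i < j, with |Pearson r| >= threshold.
    expr has shape (samples, genes)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(expr, rowvar=False)   # genes x genes correlation matrix
        i, j = np.triu_indices_from(r, k=1)   # visit each gene pair once
        keep = np.abs(r[i, j]) >= threshold   # NaN (constant gene) compares False
    return set(zip(i[keep].tolist(), j[keep].tolist()))


def preclustered_networks(expr, k, threshold_fn, min_samples=3):
    """Cluster samples, build one thresholded network per cluster, and return
    the per-cluster edge sets together with their union (the 'compendium')."""
    expr = np.asarray(expr, dtype=float)
    labels = kmeans_samples(expr, k)
    per_group, union = {}, set()
    for g in range(k):
        sub = expr[labels == g]
        if len(sub) < min_samples:  # too few samples for a meaningful correlation
            continue
        edges = coexpression_edges(sub, threshold_fn(sub))
        per_group[g] = edges
        union |= edges
    return per_group, union
```

As a usage example, take two conditions of ten samples each in which genes 0 and 1 are perfectly correlated in one condition and perfectly anti-correlated in the other, with gene 2 separating the conditions. Pooling all twenty samples makes the gene 0 and gene 1 correlation cancel to about zero, so no edge survives, while each pre-clustered group recovers the edge with |r| = 1. This toy case mirrors the abstract's point that mixing conditions shrinks the recoverable network.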
Affiliation(s)
- F Alex Feltus
- Department of Genetics & Biochemistry, Clemson University, 105 Collings Street, Clemson, SC 29634, USA.
24
Gibson SM, Ficklin SP, Isaacson S, Luo F, Feltus FA, Smith MC. Massive-scale gene co-expression network construction and robustness testing using random matrix theory. PLoS One 2013; 8:e55871. [PMID: 23409071 PMCID: PMC3567026 DOI: 10.1371/journal.pone.0055871] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 09/12/2012] [Accepted: 01/03/2013] [Indexed: 11/18/2022] [Open Access]
Abstract
The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust to perturbations in the input sample set. Such a test requires hundreds of networks, and hence a tool to construct them rapidly. To examine the functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested the functional robustness of networks for human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate a dramatic decrease in network construction time and computational requirements and show that, despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust.
Affiliation(s)
- Scott M. Gibson
- Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, South Carolina, United States of America
- Stephen P. Ficklin
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- Sven Isaacson
- Department of Computer Science, Wittenberg University, Springfield, Ohio, United States of America
- Feng Luo
- School of Computing, Clemson University, Clemson, South Carolina, United States of America
- Frank A. Feltus
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- Department of Genetics & Biochemistry, Clemson University, Clemson, South Carolina, United States of America
- Melissa C. Smith
- Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, South Carolina, United States of America
25
Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH. Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS One 2011; 6:e28150. [PMID: 22164235 PMCID: PMC3229532 DOI: 10.1371/journal.pone.0028150] [Citation(s) in RCA: 109] [Impact Index Per Article: 8.4] [Received: 09/07/2011] [Accepted: 11/02/2011] [Indexed: 11/18/2022] [Open Access]
Abstract
Background: Both single-gene and whole-genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored. Results: In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA-based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to the retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon-mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution. Conclusion: Gene duplication modes differ in their contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
Affiliation(s)
- Yupeng Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
- Xiyin Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- College of Life Sciences, Hebei United University, Tangshan, Hebei, China
- Haibao Tang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Xu Tan
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Stephen P. Ficklin
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- F. Alex Feltus
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
- Andrew H. Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia, United States of America
- Department of Genetics, University of Georgia, Athens, Georgia, United States of America
26
Ficklin SP, Sanderson LA, Cheng CH, Staton ME, Lee T, Cho IH, Jung S, Bett KE, Main D. Tripal: a construction toolkit for online genome databases. Database (Oxford) 2011; 2011:bar044. [PMID: 21959868 PMCID: PMC3263599 DOI: 10.1093/database/bar044] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 11/12/2022]
Abstract
As the availability, affordability and magnitude of genomics and genetics research increase, so does the need to provide online access to the resulting data and analyses. A tailored online database is desired by many investigators and research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost-prohibitive. Tripal simplifies site development by merging the power of Drupal, a popular web Content Management System, with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, and web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualization of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that integrates Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user management is afforded through Drupal. Tripal is an open-source, freely available software package found at http://tripal.sourceforge.net
Affiliation(s)
- Stephen P Ficklin
- Department of Horticulture and Landscape Architecture, Washington State University, Pullman, WA 99164, USA
27
Saski CA, Feltus FA, Staton ME, Blackmon BP, Ficklin SP, Kuhn DN, Schnell RJ, Shapiro H, Motamayor JC. A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6. BMC Genomics 2011; 12:413. [PMID: 21846342 PMCID: PMC3173454 DOI: 10.1186/1471-2164-12-413] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 04/26/2011] [Accepted: 08/16/2011] [Indexed: 12/16/2022] [Open Access]
Abstract
Background: The fermented dried seeds of Theobroma cacao (cacao tree) are the main ingredient in chocolate. World cocoa production was estimated to be 3 million tons in 2010, with an estimated average annual growth rate of 2.2%. The cacao bean production industry is currently under threat from a rise in fungal diseases including black pod, frosty pod, and witches' broom. To address these issues, genome-sequencing efforts have recently been initiated to facilitate identification of genetic markers and genes that could be utilized to accelerate the release of robust T. cacao cultivars. However, problems inherent in the assembly and resolution of distal regions of complex eukaryotic genomes, such as gaps, chimeric joins, and unresolvable repeat-induced compressions, have unavoidably been encountered with the sequencing strategies selected. Results: Here, we describe the construction of a BAC-based integrated genetic-physical map of the T. cacao cultivar Matina 1-6, designed to augment and enhance these sequencing efforts. Three BAC libraries, each providing 10× coverage, were constructed and fingerprinted. A total of 230 genetic markers from a high-resolution genetic recombination map and 96 Arabidopsis-derived conserved ortholog set (COS) II markers were anchored using pooled overgo hybridization. A dense tile path consisting of 29,383 BACs was selected and end-sequenced. The physical map consists of 154 contigs and 4,268 singletons. Forty-nine contigs are genetically anchored and ordered to chromosomes for a total span of 307.2 Mbp. The unanchored contigs (105) span 67.4 Mbp; the estimated genome size of T. cacao is therefore 374.6 Mbp. A comparative analysis with A. thaliana, V. vinifera, and P. trichocarpa suggests that comparisons of the genome assemblies of these distantly related species could provide insights into genome structure, evolutionary history, conservation of functional sites, and improvements in physical map assembly.
A comparison between the two T. cacao cultivars Matina 1-6 and Criollo indicates a high degree of collinearity in their genomes, yet rearrangements were also observed. Conclusions: The results presented in this study are a stand-alone resource for functional exploitation and enhancement of Theobroma cacao but are also expected to complement and augment ongoing genome-sequencing efforts. This resource will serve as a template for refinement of the T. cacao genome through gap-filling, targeted re-sequencing, and resolution of repetitive DNA arrays.
Affiliation(s)
- Christopher A Saski
- Subtropical Horticulture Research Station, USDA-ARS, 13601 Old Cutler Road, Miami, FL 33158, USA
28
Feltus FA, Saski CA, Mockaitis K, Haiminen N, Parida L, Smith Z, Ford J, Staton ME, Ficklin SP, Blackmon BP, Cheng CH, Schnell RJ, Kuhn DN, Motamayor JC. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes. BMC Genomics 2011; 12:379. [PMID: 21794110 PMCID: PMC3154204 DOI: 10.1186/1471-2164-12-379] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/25/2011] [Accepted: 07/27/2011] [Indexed: 11/25/2022] [Open Access]
Abstract
Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region of ~3 Mbp, represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverage from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced, allowing biological discovery to take place before a high-quality whole-genome assembly is completed.
Affiliation(s)
- Frank A Feltus
- Clemson University Genomics Institute, Clemson University, 51 New Cherry Street, Clemson, SC 29634, USA.
29
Ficklin SP, Feltus FA. Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 2011; 156:1244-56. [PMID: 21606319 PMCID: PMC3135956 DOI: 10.1104/pp.111.173047] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Received: 01/23/2011] [Accepted: 05/20/2011] [Indexed: 05/17/2023]
Abstract
One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.
30
Ficklin SP, Luo F, Feltus FA. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. Plant Physiol 2010; 154:13-24. [PMID: 20668062 PMCID: PMC2938148 DOI: 10.1104/pp.110.159459] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 05/17/2010] [Accepted: 07/21/2010] [Indexed: 05/18/2023]
Abstract
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.
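Functional enrichment of a module, as mentioned in this abstract, is often scored with a one-sided hypergeometric test. The abstract does not name the test the authors used, so the following Python sketch is an illustrative assumption, not their method. `enrichment_p_value` is a hypothetical helper that returns P(X ≥ k): the probability that a randomly drawn set of n genes contains at least k genes carrying a given annotation term.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all genes measured on the array)
    K: background genes annotated with the term
    n: genes in the module
    k: module genes annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("expected 0 <= k <= min(n, K), K <= N and n <= N")
    # Sum the hypergeometric counts for x = k .. min(n, K) annotated genes;
    # integer arithmetic keeps the numerator exact until the final division.
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```

For example, with a background of 10 genes, 3 of which carry a term, a 3-gene module containing all 3 gives p = 1/120. When many modules and terms are tested, the resulting p-values would also need a multiple-testing correction.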