1
Swartz LG, Liu S, Dahlquist D, Kramer ST, Walter ES, McInturf SA, Bucksch A, Mendoza-Cózatl DG. OPEN leaf: an open-source cloud-based phenotyping system for tracking dynamic changes at leaf-specific resolution in Arabidopsis. Plant J 2023; 116:1600-1616. [PMID: 37733751 DOI: 10.1111/tpj.16449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Accepted: 08/16/2023] [Indexed: 09/23/2023]
Abstract
The first draft of the Arabidopsis genome was released more than 20 years ago, and despite intensive molecular research, more than 30% of Arabidopsis genes remain uncharacterized or without an assigned function. This is in part due to gene redundancy within gene families or the essential nature of genes, where their deletion results in lethality (i.e., the dark genome). High-throughput plant phenotyping (HTPP) offers an automated and unbiased approach to characterize subtle or transient phenotypes resulting from gene redundancy or inducible gene silencing; however, access to commercial HTPP platforms remains limited. Here we describe the design and implementation of OPEN leaf, an open-source phenotyping system with cloud connectivity and remote bilateral communication to facilitate data collection, sharing and processing. OPEN leaf, coupled with our SMART imaging processing pipeline, was able to consistently document and quantify dynamic changes at the whole rosette level and leaf-specific resolution when plants experienced changes in nutrient availability. Our data also demonstrate that VIS sensors remain underutilized and can be used in high-throughput screens to identify and characterize previously unidentified phenotypes in a leaf-specific, time-dependent manner. Moreover, the modular and open-source design of OPEN leaf allows seamless integration of additional sensors based on user and experimental needs.
Affiliation(s)
- Landon G Swartz
  - Department of Electrical Engineering and Computer Science, University of Missouri, 411 S 6th St., Columbia, Missouri, 65201, USA
  - Division of Plant Science and Technology, C.S. Bond Life Sciences Center, University of Missouri, 1201 Rollins St., Columbia, Missouri, 65211, USA
- Suxing Liu
  - School of Plant Sciences, University of Arizona, 1140 E South Campus, Tucson, Arizona, 85721, USA
- Drew Dahlquist
  - Department of Electrical Engineering and Computer Science, University of Missouri, 411 S 6th St., Columbia, Missouri, 65201, USA
- Skyler T Kramer
  - MU Institute of Data Science and Informatics, C.S. Bond Life Sciences Center, University of Missouri, 1201 Rollins St., Columbia, Missouri, 65211, USA
- Emily S Walter
  - Division of Plant Science and Technology, C.S. Bond Life Sciences Center, University of Missouri, 1201 Rollins St., Columbia, Missouri, 65211, USA
- Samuel A McInturf
  - Division of Plant Science and Technology, C.S. Bond Life Sciences Center, University of Missouri, 1201 Rollins St., Columbia, Missouri, 65211, USA
- Alexander Bucksch
  - School of Plant Sciences, University of Arizona, 1140 E South Campus, Tucson, Arizona, 85721, USA
- David G Mendoza-Cózatl
  - Department of Electrical Engineering and Computer Science, University of Missouri, 411 S 6th St., Columbia, Missouri, 65201, USA
  - Division of Plant Science and Technology, C.S. Bond Life Sciences Center, University of Missouri, 1201 Rollins St., Columbia, Missouri, 65211, USA
2
Raveendran K, Freese NH, Kintali C, Tiwari S, Bole P, Dias C, Loraine AE. BioViz Connect: Web Application Linking CyVerse Cloud Resources to Genomic Visualization in the Integrated Genome Browser. Front Bioinform 2022; 2:764619. [PMID: 36304269 PMCID: PMC9580933 DOI: 10.3389/fbinf.2022.764619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 04/28/2022] [Indexed: 11/19/2022] [Open Access]
Abstract
Genomics researchers do better work when they can interactively explore and visualize data. Due to the vast size of experimental datasets, researchers are increasingly using powerful, cloud-based systems to process and analyze data. These remote systems, called science gateways, offer user-friendly, Web-based access to high performance computing and storage resources, but typically lack interactive visualization capability. In this paper, we present BioViz Connect, a middleware Web application that links CyVerse science gateway resources to the Integrated Genome Browser (IGB), a highly interactive native application implemented in Java that runs on the user's personal computer. Using BioViz Connect, users can 1) stream data from the CyVerse data store into IGB for visualization, 2) improve the IGB user experience for themselves and others by adding IGB specific metadata to CyVerse data files, including genome version and track appearance, and 3) run compute-intensive visual analytics functions on CyVerse infrastructure to create new datasets for visualization in IGB or other applications. To demonstrate how BioViz Connect facilitates interactive data visualization, we describe an example RNA-Seq data analysis investigating how heat and desiccation stresses affect gene expression in the model plant Arabidopsis thaliana. The RNA-Seq use case illustrates how interactive visualization with IGB can help a user identify problematic experimental samples, sanity-check results using a positive control, and create new data files for interactive visualization in IGB (or other tools) using a Docker image deployed to CyVerse via the Terrain API. Lastly, we discuss limitations of the technologies used and suggest opportunities for future work. BioViz Connect is available from https://bioviz.org.
Affiliation(s)
- Ann E. Loraine
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
3
Abstract
Comparative genomic and transcriptomic analyses can help prioritize and facilitate the functional analysis of long noncoding RNAs (lncRNAs). Evolinc-II is a bioinformatic pipeline that automates comparative analyses, searching for sequence and structural conservation for thousands of lncRNAs at once. In addition, Evolinc-II takes a phylogenetic approach to infer key evolutionary events that may have occurred during the emergence of each query lncRNA. Here, we describe how to use command line or GUI (CyVerse's Discovery Environment) versions of Evolinc-II to identify lncRNA homologs and prioritize them for functional analysis.
4
Wieser F, Stryeck S, Lang K, Hahn C, Thallinger G, Feichtinger J, Hack P, Stepponat M, Merchant N, Lindstaedt S, Oberdorfer G. A local platform for user-friendly FAIR data management and reproducible analytics. J Biotechnol 2021; 341:43-50. [PMID: 34400238 DOI: 10.1016/j.jbiotec.2021.08.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2021] [Revised: 06/24/2021] [Accepted: 08/04/2021] [Indexed: 10/20/2022]
Abstract
Collaborative research is common practice in modern life sciences. For most projects, several researchers from multiple universities collaborate on a specific topic. Frequently, these research projects produce a wealth of data that requires central and secure storage, which should also allow for easy sharing among project participants. Only under the best circumstances does this come with minimal technical overhead for the researchers. Moreover, the need for data to be analyzed in a reproducible way often poses a challenge for researchers without a data science background and thus represents an overly time-consuming process. Here, we report on the integration of CyVerse Austria (CAT), a new cyberinfrastructure for a local community of life science researchers, and provide two examples of how it can be used to facilitate FAIR data management and reproducible analytics for teaching and research. In particular, we describe in detail how CAT can be used (i) as a teaching platform with a defined software environment and data management/sharing possibilities, and (ii) to build a data analysis pipeline using the Docker technology tailored to the needs and interests of the researcher.
Affiliation(s)
- Florian Wieser
  - Institute of Biochemistry, Graz University of Technology, 8010, Graz, Austria
- Sarah Stryeck
  - Institute for Interactive Systems and Data Science, Graz University of Technology, 8010, Graz, Austria; Know-Center GmbH, 8010, Graz, Austria
- Konrad Lang
  - Institute for Interactive Systems and Data Science, Graz University of Technology, 8010, Graz, Austria; Know-Center GmbH, 8010, Graz, Austria
- Christoph Hahn
  - Institute of Biology, University of Graz, 8010, Graz, Austria
- Gerhard Thallinger
  - Institute of Biomedical Informatics, Graz University of Technology, 8010, Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010, Graz, Styria, Austria
- Julia Feichtinger
  - Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, 8010, Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010, Graz, Styria, Austria
- Philipp Hack
  - Central Information Technology, Graz University of Technology, 8010, Graz, Austria
- Manfred Stepponat
  - Central Information Technology, Graz University of Technology, 8010, Graz, Austria
- Nirav Merchant
  - Data Science Institute, University of Arizona, BSRL 200 A, Tucson, AZ, 85721, United States
- Stefanie Lindstaedt
  - Institute for Interactive Systems and Data Science, Graz University of Technology, 8010, Graz, Austria; Know-Center GmbH, 8010, Graz, Austria
- Gustav Oberdorfer
  - Institute of Biochemistry, Graz University of Technology, 8010, Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010, Graz, Styria, Austria
5
Hubbard A, Bomhoff M, Schmidt CJ. fRNAkenseq: a fully powered-by-CyVerse cloud integrated RNA-sequencing analysis tool. PeerJ 2020; 8:e8592. [PMID: 32461821 PMCID: PMC7231498 DOI: 10.7717/peerj.8592] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/11/2019] [Accepted: 01/18/2020] [Indexed: 11/20/2022] [Open Access]
Abstract
Background: Decreasing costs make RNA sequencing technologies increasingly affordable for biologists. However, many researchers who can now afford sequencing lack access to resources necessary for downstream analysis. This means that even as algorithms to process RNA-Seq data improve, many biologists still struggle to manage the sheer volume of data produced by next generation sequencing (NGS) technologies. Scalable bioinformatics tools that exploit multiple platforms are needed to democratize bioinformatics resources in the sequencing era. This is essential for equipping many research groups in the life sciences with the tools to process the increasingly unwieldy datasets they produce.
Methods: One strategy to address this challenge is to develop a modern generation of sequence analysis tools capable of seamless data sharing and communication. Such tools will provide interoperability through offerings of interlinked resources. Systems of interlinked, scalable resources, which often incorporate cloud data storage, are broadly referred to as cyberinfrastructure. Cyberinfrastructure integrated tools will help researchers to robustly analyze large scale datasets by efficiently sharing data burdens across a distributed architecture. Additionally, interoperability will allow emerging tools to cross-adapt features of existing tools. It is important that these tools are designed to be easy to use for biologists.
Results: We introduce fRNAkenseq, a powered-by-CyVerse RNA sequencing analysis tool that exhibits interoperability with other resources and meets the needs of biologists for comprehensive, easy to use RNA sequencing analysis. fRNAkenseq leverages a complex set of Application Programming Interfaces (APIs) associated with the NSF-funded cyberinfrastructure project, CyVerse, to execute FASTQ-to-differential expression RNA-Seq analyses. Integrating across bioinformatics platforms, fRNAkenseq also exploits cloud integration and cross-talk with another CyVerse associated tool, CoGe. fRNAkenseq offers novel features for the biologist such as more robust and comprehensive pipelines for enrichment than those currently available by default in a single tool, whether they are cloud-based or local installation. Importantly, cross-talk with CoGe allows fRNAkenseq users to execute RNA-Seq pipelines on an inventory of 47,000 archived genomes stored in CoGe or upload their own draft genome.
Affiliation(s)
- Allen Hubbard
  - Donald Danforth Plant Science Center, Saint Louis, MO, USA
- Matthew Bomhoff
  - Department of Plant and Soil Sciences, University of Arizona, Tucson, AZ, USA
- Carl J Schmidt
  - Department of Animal and Food Sciences, University of Delaware, Newark, DE, USA
6
Chougule KM, Wang L, Stein JC, Wang X, Devisetty UK, Klein RR, Ware D. Improved RNA-seq Workflows Using CyVerse Cyberinfrastructure. Curr Protoc Bioinformatics 2018; 63:e53. [PMID: 30168903 DOI: 10.1002/cpbi.53] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
Abstract
RNA-seq is a vital method for understanding gene structure and expression patterns. Typical RNA-seq analysis protocols use sequencing reads of length 50 to 150 nucleotides for alignment to the reference genome and assembly of transcripts. The resultant transcripts are quantified and used for differential expression and visualization. Existing tools and protocols for RNA-seq are vast and diverse; given their differences in performance, it is critical to select an analysis protocol that is scalable, accurate, and easy to use. Tuxedo, a popular alignment-based protocol for RNA-seq analysis, has been updated with HISAT2, StringTie, StringTie-merge, and Ballgown, and the updated protocol outperforms its predecessor. Similarly, new pseudo-alignment-based protocols like Kallisto and Sleuth reduce runtime and improve performance. However, these tools are challenging for researchers lacking command-line experience. Here, we describe two new RNA-seq analysis protocols, in which all tools are deployed on CyVerse Cyberinfrastructure with user-friendly graphical user interfaces, and validate their performance using plant RNA-seq data. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Liya Wang
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
- Joshua C Stein
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
- Xiaofei Wang
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
- Robert R Klein
  - United States Department of Agriculture-Agriculture Research Service, Southern Plains Agricultural Research Center, College Station, Texas
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
  - United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York
7
Abstract
Docker has become a very popular container-based virtualization platform for software distribution that has revolutionized the way in which scientific software and software dependencies (software stacks) can be packaged, distributed, and deployed. Docker makes the complex and time-consuming installation procedures needed for scientific software a one-time process. Because it enables platform-independent installation, versioning of software environments, and easy redeployment and reproducibility, Docker is an ideal candidate for the deployment of identical software stacks on different compute environments such as XSEDE and Amazon AWS. CyVerse's Discovery Environment also uses Docker for integrating its powerful, community-recommended software tools into CyVerse's production environment for public use. This paper will help users bring their tools into the CyVerse Discovery Environment (DE), which will not only allow users to integrate their tools with relative ease compared to the earlier method of tool deployment in DE but will also help users to share their apps with collaborators and release them for public use.
Affiliation(s)
- Paul Sarando
  - CyVerse, University of Arizona, Tucson, AZ, 85721, USA
- Eric Lyons
  - CyVerse, University of Arizona, Tucson, AZ, 85721, USA