1
Afgan E, Lonie A, Taylor J, Goonasekera N. CloudLaunch: Discover and Deploy Cloud Applications. Future Gener Comput Syst 2019; 94:802-810. [PMID: 34366521; PMCID: PMC8340934; DOI: 10.1016/j.future.2018.04.037] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/13/2023]
Abstract
Cloud computing is a common platform for delivering software to end users. However, the process of making complex-to-deploy applications available across different cloud providers requires isolated and uncoordinated application-specific solutions, often locking-in developers to a particular cloud provider. Here, we present the CloudLaunch application as a uniform platform for discovering and deploying applications for different cloud providers. CloudLaunch allows arbitrary applications to be added to a catalog with each application having its own customizable user interface and control over the launch process, while preserving cloud-agnosticism so that authors can easily make their applications available on multiple clouds with minimal effort. It then provides a uniform interface for launching available applications by end users across different cloud providers. Architecture details are presented along with examples of different deployable applications that highlight architectural features.
Affiliation(s)
- Enis Afgan
  - Department of Biology, Johns Hopkins University, 3400 N Charles Street, Baltimore, MD 21210, USA
- Andrew Lonie
  - Melbourne Bioinformatics, University of Melbourne, 700 Swanston Street, Carlton, Victoria 3053, Australia
- James Taylor
  - Department of Biology, Johns Hopkins University, 3400 N Charles Street, Baltimore, MD 21210, USA
- Nuwan Goonasekera
  - Melbourne Bioinformatics, University of Melbourne, 700 Swanston Street, Carlton, Victoria 3053, Australia
2
Menegidio FB, Jabes DL, Costa de Oliveira R, Nunes LR. Dugong: a Docker image, based on Ubuntu Linux, focused on reproducibility and replicability for bioinformatics analyses. Bioinformatics 2018; 34:514-515. [PMID: 28968637; DOI: 10.1093/bioinformatics/btx554] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/18/2017] [Accepted: 09/01/2017] [Indexed: 11/12/2022]
Abstract
Summary: This manuscript introduces and describes Dugong, a Docker image based on Ubuntu 16.04, which automates installation of more than 3500 bioinformatics tools (along with their respective libraries and dependencies) in alternative computational environments. The software operates through a user-friendly XFCE4 graphic interface that allows software management and installation by users not fully familiarized with the Linux command line, and provides the Jupyter Notebook to assist in the delivery and exchange of consistent and reproducible protocols and results across laboratories, assisting in the development of open science projects. Availability and implementation: Source code and instructions for local installation are available at https://github.com/DugongBioinformatics, under the MIT open source license. Contact: Luiz.nunes@ufabc.edu.br.
Affiliation(s)
- Fabiano B Menegidio
  - Núcleo Integrado de Biotecnologia, Universidade de Mogi das Cruzes (UMC), 08780-911, Mogi das Cruzes, SP, Brazil
- Daniela L Jabes
  - Núcleo Integrado de Biotecnologia, Universidade de Mogi das Cruzes (UMC), 08780-911, Mogi das Cruzes, SP, Brazil
- Regina Costa de Oliveira
  - Núcleo Integrado de Biotecnologia, Universidade de Mogi das Cruzes (UMC), 08780-911, Mogi das Cruzes, SP, Brazil
- Luiz R Nunes
  - Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), 09210-170, Santo André, SP, Brazil
3
Deletion of transketolase triggers a stringent metabolic response in promastigotes and loss of virulence in amastigotes of Leishmania mexicana. PLoS Pathog 2018; 14:e1006953. [PMID: 29554142; PMCID: PMC5882173; DOI: 10.1371/journal.ppat.1006953] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/16/2016] [Revised: 04/03/2018] [Accepted: 02/28/2018] [Indexed: 11/22/2022]
Abstract
Transketolase (TKT) is part of the non-oxidative branch of the pentose phosphate pathway (PPP). Here we describe the impact of removing this enzyme from the pathogenic protozoan Leishmania mexicana. Whereas the deletion had no obvious effect on cultured promastigote forms of the parasite, the Δtkt cells were not virulent in mice. Δtkt promastigotes were more susceptible to oxidative stress and various leishmanicidal drugs than wild-type, and metabolomics analysis revealed profound changes to metabolism in these cells. In addition to changes consistent with those directly related to the role of TKT in the PPP, central carbon metabolism was substantially decreased, the cells consumed significantly less glucose, flux through glycolysis diminished, and production of the main end products of metabolism was decreased. Only minor changes in RNA abundance from genes encoding enzymes in central carbon metabolism, however, were detected although fructose-1,6-bisphosphate aldolase activity was decreased two-fold in the knock-out cell line. We also showed that the dual localisation of TKT between cytosol and glycosomes is determined by the C-terminus of the enzyme and by engineering different variants of the enzyme we could alter its sub-cellular localisation. However, no effect on the overall flux of glucose was noted irrespective of whether the enzyme was found uniquely in either compartment, or in both.
Leishmania parasites endanger over 1 billion people worldwide, infecting 300,000 people and causing 20,000 deaths annually. In this study, we scrutinized metabolism in Leishmania mexicana after deletion of the gene encoding transketolase (TKT), an enzyme involved in sugar metabolism via the pentose phosphate pathway which plays key roles in creating ribose 5-phosphate for nucleotide synthesis and also defence against oxidative stress. The insect stage of the parasite, grown in culture medium, did not suffer from any obvious growth defect after the gene was deleted. However, its metabolism changed dramatically, with metabolomics indicating profound changes to flux through the pentose phosphate pathway: decreased glucose consumption, and generally enhanced efficiency in using metabolic substrates with reduced secretion of partially oxidised end products of metabolism. This ‘stringent’ metabolism is reminiscent of the mammalian stage parasites. The cells were also more sensitive to oxidative stress inducing agents and leishmanicidal drugs. Crucially, mice inoculated with the TKT knock-out parasites did not develop an infection, pointing to the enzyme playing a key role in allowing the parasites to remain viable in the host, indicating that TKT may be considered a useful target for development of new drugs against leishmaniasis.
4
Henke K, Bowen ME, Harris MP. Identification of Mutations in Zebrafish Using Next‐Generation Sequencing. Curr Protoc Mol Biol 2018; 104:7.13.1-7.13.33. [DOI: 10.1002/0471142727.mb0713s104] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022]
Affiliation(s)
- Katrin Henke
  - Department of Genetics, Harvard Medical School, and Department of Orthopedics, Boston Children's Hospital, Boston, Massachusetts
- Margot E. Bowen
  - Department of Genetics, Harvard Medical School, and Department of Orthopedics, Boston Children's Hospital, Boston, Massachusetts
- Matthew P. Harris
  - Department of Genetics, Harvard Medical School, and Department of Orthopedics, Boston Children's Hospital, Boston, Massachusetts
5
Maciejewski MW, Schuyler AD, Gryk MR, Moraru II, Romero PR, Ulrich EL, Eghbalnia HR, Livny M, Delaglio F, Hoch JC. NMRbox: A Resource for Biomolecular NMR Computation. Biophys J 2017; 112:1529-1534. [PMID: 28445744; DOI: 10.1016/j.bpj.2017.03.011] [Citation(s) in RCA: 274] [Impact Index Per Article: 39.1] [Received: 01/06/2017] [Revised: 03/06/2017] [Accepted: 03/13/2017] [Indexed: 10/19/2022]
Abstract
Advances in computation have been enabling many recent advances in biomolecular applications of NMR. Due to the wide diversity of applications of NMR, the number and variety of software packages for processing and analyzing NMR data is quite large, with labs relying on dozens, if not hundreds of software packages. Discovery, acquisition, installation, and maintenance of all these packages is a burdensome task. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and facilitate and enhance data depositions to BioMagResBank, and tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users.
Affiliation(s)
- Mark W Maciejewski
  - Department of Molecular Biology and Biophysics, UConn Health, Farmington, Connecticut
- Adam D Schuyler
  - Department of Molecular Biology and Biophysics, UConn Health, Farmington, Connecticut
- Michael R Gryk
  - Department of Molecular Biology and Biophysics, UConn Health, Farmington, Connecticut
- Ion I Moraru
  - Department of Cell Biology, UConn Health, Farmington, Connecticut
- Pedro R Romero
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin
- Eldon L Ulrich
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin
- Hamid R Eghbalnia
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin
- Miron Livny
  - Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin
- Frank Delaglio
  - Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, Maryland
- Jeffrey C Hoch
  - Department of Molecular Biology and Biophysics, UConn Health, Farmington, Connecticut
6
Sobeslav V, Maresova P, Krejcar O, Franca TC, Kuca K. Use of cloud computing in biomedicine. J Biomol Struct Dyn 2016; 34:2688-2697. [DOI: 10.1080/07391102.2015.1127182] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
7
Feltus FA, Breen JR, Deng J, Izard RS, Konger CA, Ligon WB, Preuss D, Wang KC. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology. Bioinform Biol Insights 2015; 9:9-19. [PMID: 26568680; PMCID: PMC4636112; DOI: 10.4137/bbi.s28988] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/09/2015] [Revised: 08/10/2015] [Accepted: 08/12/2015] [Indexed: 01/14/2023]
Abstract
In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging “Big Data” discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals.
Affiliation(s)
- Frank A Feltus
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Joseph R Breen
  - University of Utah Center for High Performance Computing, Salt Lake City, UT, USA
- Juan Deng
  - Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
- Ryan S Izard
  - Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
- Christopher A Konger
  - Clemson Computing and Information Technology, Clemson University, Anderson, SC, USA
- Walter B Ligon
  - Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
- Kuang-Ching Wang
  - Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
8
Spjuth O, Bongcam-Rudloff E, Hernández GC, Forer L, Giovacchini M, Guimera RV, Kallio A, Korpelainen E, Kańduła MM, Krachunov M, Kreil DP, Kulev O, Łabaj PP, Lampa S, Pireddu L, Schönherr S, Siretskiy A, Vassilev D. Experiences with workflows for automating data-intensive bioinformatics. Biol Direct 2015; 10:43. [PMID: 26282399; PMCID: PMC4539931; DOI: 10.1186/s13062-015-0071-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 02/26/2015] [Accepted: 08/03/2015] [Indexed: 11/16/2022]
Abstract
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution. Reviewers This article was reviewed by Dr Andrew Clark.
Affiliation(s)
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, SE-75124, Uppsala, P.O. Box 591, Sweden
- Erik Bongcam-Rudloff
  - SLU-Global Bioinformatics Centre, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Lukas Forer
  - Division of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, 6020, Austria
- Mario Giovacchini
  - Science for Life Laboratory, Karolinska Institutet, SE-17121, Stockholm, P.O. Box 1031, Sweden
- Roman Valls Guimera
  - Science for Life Laboratory, Karolinska Institutet, SE-17121, Stockholm, P.O. Box 1031, Sweden
- Aleksi Kallio
  - CSC - IT Center for Science Ltd., FI-02101, Espoo, P.O. Box 405, Finland
- Eija Korpelainen
  - CSC - IT Center for Science Ltd., FI-02101, Espoo, P.O. Box 405, Finland
- Maciej M Kańduła
  - Chair of Bioinformatics Research Group, Boku University, Vienna, Austria
- Milko Krachunov
  - Faculty of Mathematics and Informatics, Sofia University, Sofia, Bulgaria
- David P Kreil
  - Chair of Bioinformatics Research Group, Boku University, Vienna, Austria
- Ognyan Kulev
  - Faculty of Mathematics and Informatics, Sofia University, Sofia, Bulgaria
- Paweł P Łabaj
  - Chair of Bioinformatics Research Group, Boku University, Vienna, Austria
- Samuel Lampa
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, SE-75124, Uppsala, P.O. Box 591, Sweden
- Sebastian Schönherr
  - Division of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, 6020, Austria
- Alexey Siretskiy
  - Department of Information Technology, Uppsala University, SE-75105, Uppsala, P.O. Box 337, Sweden
9
Shanahan HP, Owen AM, Harrison AP. Bioinformatics on the cloud computing platform Azure. PLoS One 2014; 9:e102642. [PMID: 25050811; PMCID: PMC4106841; DOI: 10.1371/journal.pone.0102642] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 04/23/2013] [Accepted: 06/20/2014] [Indexed: 12/27/2022]
Abstract
We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.
Affiliation(s)
- Hugh P. Shanahan
  - Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
- Anne M. Owen
  - Department of Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, United Kingdom
- Andrew P. Harrison
  - Department of Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, United Kingdom
  - Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, United Kingdom
10
Onsongo G, Erdmann J, Spears MD, Chilton J, Beckman KB, Hauge A, Yohe S, Schomaker M, Bower M, Silverstein KAT, Thyagarajan B. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory. BMC Res Notes 2014; 7:314. [PMID: 24885806; PMCID: PMC4036707; DOI: 10.1186/1756-0500-7-314] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/30/2013] [Accepted: 05/06/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain that limit the widespread adoption of NGS testing into clinical practice. One such difficulty is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. FINDINGS: To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. CONCLUSIONS: We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic, and all identified pathogenic variants were confirmed using Sanger sequencing, further validating the software.
Affiliation(s)
- Kevin A T Silverstein
  - Research Informatics Support Systems, Minnesota Supercomputing Institute, University of Minnesota, Room 599 Walter Library 117 Pleasant St SE, Minneapolis, MN 55455, USA
11
Cole C, Krampis K, Karagiannis K, Almeida JS, Faison WJ, Motwani M, Wan Q, Golikov A, Pan Y, Simonyan V, Mazumder R. Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics 2014; 15:28. [PMID: 24467687; PMCID: PMC3916084; DOI: 10.1186/1471-2105-15-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/05/2013] [Accepted: 01/22/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
Affiliation(s)
- Raja Mazumder
  - Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA
12
Richardson EJ, Escalettes F, Fotheringham I, Wallace RJ, Watson M. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services. Front Genet 2013; 4:168. [PMID: 24046776; PMCID: PMC3763215; DOI: 10.3389/fgene.2013.00168] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/01/2013] [Accepted: 08/13/2013] [Indexed: 11/22/2022]
Abstract
Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on GitHub, a cloud image is available, and an example implementation can be seen at
Affiliation(s)
- Emily J Richardson
  - ARK-Genomics, The Roslin Institute and R(D)SVS, University of Edinburgh Easter Bush, Midlothian, UK
13
|
|