1
Abstract
Globus, developed as Software-as-a-Service (SaaS) for research data management, also provides APIs that constitute a flexible and powerful Platform-as-a-Service (PaaS) to which developers can outsource data management activities such as transfer and sharing, as well as identity, profile and group management. By providing these frequently important but always challenging capabilities as a service, accessible over the network, Globus PaaS streamlines web application development and makes it easy for individuals, teams, and institutions to create collaborative applications such as science gateways for science communities. We introduce the capabilities of this platform and review representative applications.
Affiliation(s)
- Rachana Ananthakrishnan
- Computation Institute, Argonne National Laboratory & University of Chicago, Chicago, IL 60637, USA
- Kyle Chard
- Computation Institute, Argonne National Laboratory & University of Chicago, Chicago, IL 60637, USA
- Ian Foster
- Computation Institute, Argonne National Laboratory & University of Chicago, Chicago, IL 60637, USA
- Steven Tuecke
- Computation Institute, Argonne National Laboratory & University of Chicago, Chicago, IL 60637, USA
2
Stanberry L, Rekepalli B, Liu Y, Giblock P, Higdon R, Montague E, Broomall W, Kolker N, Kolker E. Optimizing high performance computing workflow for protein functional annotation. Concurr Comput 2014; 26:2112-2121. [PMID: 25313296; PMCID: PMC4194055; DOI: 10.1002/cpe.3264]
Abstract
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins to existing clusters of orthologous groups of proteins. Based on the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
Affiliation(s)
- Larissa Stanberry
- Bioinformatics & High-Throughput Analysis Laboratory and High-Throughput Analysis Core, Seattle Children's Research Institute (SCRI), DELSA Global, Seattle, WA 98101, USA
- Bhanu Rekepalli
- Joint Institute for Computational Sciences, University of Tennessee - Oak Ridge National Laboratory (JICS UT - ORNL), DELSA Global, Oak Ridge, TN, USA
- Yuan Liu
- Joint Institute for Computational Sciences, University of Tennessee - Oak Ridge National Laboratory (JICS UT - ORNL), DELSA Global, Oak Ridge, TN, USA
- Roger Higdon
- Bioinformatics & High-Throughput Analysis Laboratory and High-Throughput Analysis Core, Seattle Children's Research Institute (SCRI), DELSA Global, Seattle, WA 98101, USA
- Elizabeth Montague
- Bioinformatics & High-Throughput Analysis Laboratory and High-Throughput Analysis Core, Seattle Children's Research Institute (SCRI), DELSA Global, Seattle, WA 98101, USA
- William Broomall
- Bioinformatics & High-Throughput Analysis Laboratory and High-Throughput Analysis Core, Seattle Children's Research Institute (SCRI), DELSA Global, Seattle, WA 98101, USA
- Natali Kolker
- Bioinformatics & High-Throughput Analysis Laboratory and High-Throughput Analysis Core, Seattle Children's Research Institute (SCRI), DELSA Global, Seattle, WA 98101, USA
- Eugene Kolker
- Bioinformatics & High-throughput Analysis Laboratory, SCRI; High-throughput Analysis Core, SCRI; Predictive Analytics, Seattle Children's Hospital; Departments of Pediatrics and Biomedical Informatics & Medical Education, University of Washington; DELSA Global