1
|
Gomez-Alvarez V, Teal TK, Schmidt TM. Systematic artifacts in metagenomes from complex microbial communities. ISME JOURNAL 2009; 3:1314-7. [PMID: 19587772 DOI: 10.1038/ismej.2009.72] [Citation(s) in RCA: 329] [Impact Index Per Article: 20.6] [Indexed: 12/12/2022]
Abstract
Metagenomics is providing an unprecedented view of the taxonomic diversity, metabolic potential and ecological role of microbial communities in biomes as diverse as the mammalian gastrointestinal tract, the marine water column and soils. However, we have found a systematic error in metagenomes generated by 454-based pyrosequencing that leads to an overestimation of gene and taxon abundance; between 11% and 35% of sequences in a typical metagenome are artificial replicates. Here we document the error in several published and original datasets and offer a web-based solution (http://microbiomes.msu.edu/replicates) for identifying and removing these artifacts.
|
Research Support, U.S. Gov't, Non-P.H.S. |
16 |
329 |
2
|
Dietrich LEP, Teal TK, Price-Whelan A, Newman DK. Redox-active antibiotics control gene expression and community behavior in divergent bacteria. Science 2008; 321:1203-6. [PMID: 18755976 PMCID: PMC2745639 DOI: 10.1126/science.1160619] [Citation(s) in RCA: 302] [Impact Index Per Article: 17.8] [Indexed: 12/22/2022]
Abstract
It is thought that bacteria excrete redox-active pigments as antibiotics to inhibit competitors. In Pseudomonas aeruginosa, the endogenous antibiotic pyocyanin activates SoxR, a transcription factor conserved in Proteo- and Actinobacteria. In Escherichia coli, SoxR regulates the superoxide stress response. Bioinformatic analysis coupled with gene expression studies in P. aeruginosa and Streptomyces coelicolor revealed that the majority of SoxR regulons in bacteria lack the genes required for stress responses, despite the fact that many of these organisms still produce redox-active small molecules, which indicates that redox-active pigments play a role independent of oxidative stress. These compounds had profound effects on the structural organization of colony biofilms in both P. aeruginosa and S. coelicolor, which shows that "secondary metabolites" play important conserved roles in gene expression and development.
|
Research Support, Non-U.S. Gov't |
17 |
302 |
3
|
Abstract
Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don't know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources, from our daily lives, and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010.
|
Journal Article |
8 |
158 |
4
|
Werling BP, Dickson TL, Isaacs R, Gaines H, Gratton C, Gross KL, Liere H, Malmstrom CM, Meehan TD, Ruan L, Robertson BA, Robertson GP, Schmidt TM, Schrotenboer AC, Teal TK, Wilson JK, Landis DA. Perennial grasslands enhance biodiversity and multiple ecosystem services in bioenergy landscapes. Proc Natl Acad Sci U S A 2014; 111:1652-7. [PMID: 24474791 PMCID: PMC3910622 DOI: 10.1073/pnas.1309492111] [Citation(s) in RCA: 126] [Impact Index Per Article: 11.5] [Indexed: 11/18/2022] Open
Abstract
Agriculture is being challenged to provide food, and increasingly fuel, for an expanding global population. Producing bioenergy crops on marginal lands--farmland suboptimal for food crops--could help meet energy goals while minimizing competition with food production. However, the ecological costs and benefits of growing bioenergy feedstocks--primarily annual grain crops--on marginal lands have been questioned. Here we show that perennial bioenergy crops provide an alternative to annual grains that increases biodiversity of multiple taxa and sustains a variety of ecosystem functions, promoting the creation of multifunctional agricultural landscapes. We found that switchgrass and prairie plantings harbored significantly greater plant, methanotrophic bacteria, arthropod, and bird diversity than maize. Although biomass production was greater in maize, all other ecosystem services, including methane consumption, pest suppression, pollination, and conservation of grassland birds, were higher in perennial grasslands. Moreover, we found that the linkage between biodiversity and ecosystem services is dependent not only on the choice of bioenergy crop but also on its location relative to other habitats, with local landscape context as important as crop choice in determining provision of some services. Our study suggests that bioenergy policy that supports coordinated land use can diversify agricultural landscapes and sustain multiple critical ecosystem services.
|
research-article |
11 |
126 |
5
|
Singh P, Teal TK, Marsh TL, Tiedje JM, Mosci R, Jernigan K, Zell A, Newton DW, Salimnia H, Lephart P, Sundin D, Khalife W, Britton RA, Rudrik JT, Manning SD. Intestinal microbial communities associated with acute enteric infections and disease recovery. MICROBIOME 2015; 3:45. [PMID: 26395244 PMCID: PMC4579588 DOI: 10.1186/s40168-015-0109-2] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 05/18/2015] [Accepted: 09/11/2015] [Indexed: 05/10/2023]
Abstract
BACKGROUND The intestinal microbiome represents a complex network of microbes that are important for human health and preventing pathogen invasion. Studies that examine differences in intestinal microbial communities across individuals with and without enteric infections are useful for identifying microbes that support or impede intestinal health. RESULTS 16S rRNA gene sequencing was conducted on stool DNA from patients with enteric infections (n = 200) and 75 healthy family members to identify differences in intestinal community composition. Stools from 13 patients were also examined post-infection to better understand how intestinal communities recover. Patient communities had lower species richness, evenness, and diversity versus uninfected communities, while principal coordinate analysis demonstrated close clustering of uninfected communities, but not the patient communities, irrespective of age, gender, and race. Differences in community composition between patients and family members were mostly due to variation in the abundance of phyla Proteobacteria, Bacteroidetes, and Firmicutes. Patient communities had significantly more Proteobacteria representing genus Escherichia relative to uninfected communities, which were dominated by Bacteroides. Intestinal communities from patients with bloody diarrhea clustered together in the neighbor-joining phylogeny, while communities from 13 patients post-infection had a significant increase in Bacteroidetes and Firmicutes and clustered together with uninfected communities. CONCLUSIONS These data demonstrate that the intestinal communities in patients with enteric bacterial infections get altered in similar ways. Furthermore, preventing an increase in Escherichia abundance may be an important consideration for future prevention strategies.
|
Research Support, N.I.H., Extramural |
10 |
121 |
6
|
Teal TK, Lies DP, Wold BJ, Newman DK. Spatiometabolic stratification of Shewanella oneidensis biofilms. Appl Environ Microbiol 2006; 72:7324-30. [PMID: 16936048 PMCID: PMC1636161 DOI: 10.1128/aem.01163-06] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.7] [Indexed: 11/20/2022] Open
Abstract
Biofilms, or surface-attached microbial communities, are both ubiquitous and resilient in the environment. Although much is known about how biofilms form, develop, and detach, very little is understood about how these events are related to metabolism and its dynamics. It is commonly thought that large subpopulations of cells within biofilms are not actively producing proteins or generating energy and are therefore dead. An alternative hypothesis is that within the growth-inactive domains of biofilms, significant populations of living cells persist and retain the capacity to dynamically regulate their metabolism. To test this, we employed unstable fluorescent reporters to measure growth activity and protein synthesis in vivo over the course of biofilm development and created a quantitative routine to compare domains of activity in independently grown biofilms. Here we report that Shewanella oneidensis biofilm structures reproducibly stratify with respect to growth activity and metabolism as a function of size. Within domains of growth-inactive cells, genes typically upregulated under anaerobic conditions are expressed well after growth has ceased. These findings reveal that, far from being dead, the majority of cells in mature S. oneidensis biofilms have actively turned-on metabolic programs appropriate to their local microenvironment and developmental stage.
|
Research Support, U.S. Gov't, Non-P.H.S. |
19 |
109 |
7
|
Gorlenko V, Tsapin A, Namsaraev Z, Teal T, Tourova T, Engler D, Mielke R, Nealson K. Anaerobranca californiensis sp. nov., an anaerobic, alkalithermophilic, fermentative bacterium isolated from a hot spring on Mono Lake. Int J Syst Evol Microbiol 2004; 54:739-743. [PMID: 15143017 DOI: 10.1099/ijs.0.02909-0] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] Open
Abstract
A novel, obligately anaerobic, alkalithermophilic, chemo-organotrophic bacterium was isolated from the sediment of an alkaline hot spring located on Paoha Island in Mono Lake, California, USA. This rod-shaped bacterium was motile via peritrichous flagella. Isolated strains grew optimally in 5-25 g NaCl l⁻¹, at pH 9.0-9.5 and at a temperature of 58 °C and were fermentative and mainly proteolytic, utilizing peptone, Casamino acids and yeast extract. Optimal growth was seen in the presence of elemental sulfur, polysulfide or thiosulfate with concomitant reduction to hydrogen sulfide. Sulfite was also formed in an equal ratio to sulfide during reduction of thiosulfate. The novel isolate could also reduce Fe(III) and Se(IV) in the presence of organic matter. On the basis of physiological properties, 16S rRNA gene sequence and DNA-DNA hybridization data, strain PAOHA-1ᵀ (=DSM 14826ᵀ=UNIQEM 227ᵀ) belongs to the genus Anaerobranca and represents a novel species, Anaerobranca californiensis sp. nov.
MESH Headings
- Bacteria, Anaerobic/classification
- Bacteria, Anaerobic/genetics
- Bacteria, Anaerobic/isolation & purification
- Bacteria, Anaerobic/metabolism
- California
- DNA, Bacterial/genetics
- DNA, Ribosomal/genetics
- Fermentation
- Fresh Water/microbiology
- Hot Temperature
- Hydrogen-Ion Concentration
- Microscopy, Electron
- Molecular Sequence Data
- Phenotype
- Phylogeny
- RNA, Bacterial/genetics
- RNA, Ribosomal, 16S/genetics
- Sulfides/metabolism
|
Research Support, U.S. Gov't, Non-P.H.S. |
21 |
64 |
8
|
Teal TK, Cranston KA, Lapp H, White E, Wilson G, Ram K, Pawlik A. Data Carpentry: Workshops to Increase Data Literacy for Researchers. INTERNATIONAL JOURNAL OF DIGITAL CURATION 2015. [DOI: 10.2218/ijdc.v10i1.351] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Indexed: 11/21/2022] Open
Abstract
In many domains the rapid generation of large amounts of data is fundamentally changing how research is done. The deluge of data presents great opportunities, but also many challenges in managing, analyzing and sharing data. However, good training resources for researchers looking to develop skills that will enable them to be more effective and productive researchers are scarce and there is little space in the existing curriculum for courses or additional lectures. To address this need we have developed an introductory two-day intensive workshop, Data Carpentry, designed to teach basic concepts, skills, and tools for working more effectively and reproducibly with data. These workshops are based on Software Carpentry: two-day, hands-on, bootcamp style workshops teaching best practices in software development, that have demonstrated the success of short workshops to teach foundational research skills. Data Carpentry focuses on data literacy in particular, with the objective of teaching skills to researchers to enable them to retrieve, view, manipulate, analyze and store their own and others' data in an open and reproducible way in order to extract knowledge from data.
|
|
10 |
64 |
9
|
Kim Y, Aw TG, Teal TK, Rose JB. Metagenomic Investigation of Viral Communities in Ballast Water. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2015; 49:8396-407. [PMID: 26107908 DOI: 10.1021/acs.est.5b01633] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 05/22/2023]
Abstract
Ballast water is one of the most important vectors for the transport of non-native species to new aquatic environments. Due to the development of new ballast water quality standards for viruses, this study aimed to determine the taxonomic diversity and composition of viral communities (viromes) in ballast and harbor waters using metagenomics approaches. Ballast waters from different sources within the North America Great Lakes and paired harbor waters were collected around the Port of Duluth-Superior. Bioinformatics analysis of over 550 million sequences showed that a majority of the viral sequences could not be assigned to any taxa associated with reference sequences, indicating the lack of knowledge on viruses in ballast and harbor waters. However, the assigned viruses were dominated by double-stranded DNA phages, and sequences associated with potentially emerging viral pathogens of fish and shrimp were detected with low amino acid similarity in both ballast and harbor waters. Annotation-independent comparisons showed that viromes were distinct among the Great Lakes, and the Great Lakes viromes were closely related to viromes of other cold natural freshwater systems but distant from viromes of marine and human designed/managed freshwater systems. These results represent the most detailed characterization to date of viruses in ballast water, demonstrating their diversity and the potential significance of the ship-mediated spread of viruses.
|
|
10 |
53 |
10
|
Abstract
Blunders which occurred over a 1 year period in the clinical chemistry departments of two health districts were recorded and categorized according to type and detection stage. A blunder was defined as an incident leading to an incorrect result/set of results either being reported or detected at the final checking-out stage in the laboratory. Of the total of 120 blunders--which is a blunder rate of less than 0.1% of requests--53 (44%) were detected at the final checking-out stage. Blunders detected after the report had left the laboratory were divided into those subsequently picked up by laboratory personnel (23); those detected by clinicians (19); and those by external quality assessment schemes (21). The types of blunder were fairly equally distributed between the booking-in (36), analysis (38), and reporting (35) stages of the laboratory process. A formal review of blunders detected in laboratories is a valuable aid to overall performance.
|
|
31 |
44 |
11
|
Müller HM, Rangarajan A, Teal TK, Sternberg PW. Textpresso for neuroscience: searching the full text of thousands of neuroscience research papers. Neuroinformatics 2008; 6:195-204. [PMID: 18949581 DOI: 10.1007/s12021-008-9031-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 08/25/2008] [Accepted: 09/22/2008] [Indexed: 10/21/2022]
Abstract
Textpresso is a text-mining system for scientific literature. Its two major features are access to the full text of research papers and the development and use of categories of biological concepts as well as categories that describe or relate objects. A search engine enables the user to search for one or a combination of these categories and/or keywords within an entire literature. Here we describe Textpresso for Neuroscience, part of the core Neuroscience Information Framework (NIF). The Textpresso site currently consists of 67,500 full text papers and 131,300 abstracts. We show that using categories in literature can make a pure keyword query more refined and meaningful. We also show how semantic queries can be formulated with categories only. We explain the build and content of the database and describe the main features of the web pages and the advanced search options. We also give detailed illustrations of the web service developed to provide programmatic access to Textpresso. This web service is used by the NIF interface to access Textpresso. The standalone website of Textpresso for Neuroscience can be accessed at http://www.textpresso.org/neuroscience/.
|
Research Support, Non-U.S. Gov't |
17 |
35 |
12
|
Johnson BK, Scholz MB, Teal TK, Abramovitch RB. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis. BMC Bioinformatics 2016; 17:66. [PMID: 26847232 PMCID: PMC4743240 DOI: 10.1186/s12859-016-0923-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 11/02/2015] [Accepted: 01/29/2016] [Indexed: 12/18/2022] Open
Abstract
Background Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. Results To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. Conclusions SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. 
This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
|
Research Support, Non-U.S. Gov't |
9 |
33 |
13
|
Seviour PW, Teal TK, Richmond W, Elkeles RS. Serum lipids, lipoproteins and macrovascular disease in non-insulin-dependent diabetics: a possible new approach to prevention. Diabet Med 1988; 5:166-71. [PMID: 2964984 DOI: 10.1111/j.1464-5491.1988.tb00965.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 0.8] [Indexed: 01/03/2023]
Abstract
The relationship between macrovascular disease and serum lipids, low density lipoprotein (LDL) cholesterol, high density lipoprotein (HDL), and subfraction cholesterol, and apolipoproteins has been examined in 53 female and 95 male patients with non-insulin-dependent diabetes mellitus (NIDDM). In males, those with macrovascular disease had higher serum and LDL cholesterol concentrations than those without. In females, those with macrovascular disease had higher levels of serum triglyceride, cholesterol, LDL cholesterol, as well as lower HDL, HDL2, and HDL3 cholesterol and apoprotein A-1, than those without. On multivariate analysis, LDL cholesterol was the most important association with macrovascular disease in males and apoprotein A-1 in females. In a subgroup of 36 patients, a double-blind placebo controlled study using bezafibrate or placebo, in addition to conventional oral hypoglycaemic therapy over 4 months, showed falls in serum and LDL cholesterol and in serum triglyceride and a rise in HDL cholesterol in the treated group. These changes should reduce the incidence of macrovascular disease in NIDDM and we suggest further prospective studies of such therapy in addition to conventional oral hypoglycaemic agents.
|
Clinical Trial |
37 |
29 |
14
|
Hampton SE, Jones MB, Wasser LA, Schildhauer MP, Supp SR, Brun J, Hernandez RR, Boettiger C, Collins SL, Gross LJ, Fernández DS, Budden A, White EP, Teal TK, Labou SG, Aukema JE. Skills and Knowledge for Data-Intensive Environmental Research. Bioscience 2017; 67:546-557. [PMID: 28584342 PMCID: PMC5451289 DOI: 10.1093/biosci/bix025] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/15/2022] Open
Abstract
The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap.
|
Journal Article |
8 |
29 |
15
|
Abstract
Extremely large datasets have become routine in biology. However, performing a computational analysis of a large dataset can be overwhelming, especially for novices. Here, we present a step-by-step guide to computing workflows with the biologist end-user in mind. Starting from a foundation of sound data management practices, we make specific recommendations on how to approach and perform computational analyses of large datasets, with a view to enabling sound, reproducible biological research.
|
Research Support, N.I.H., Extramural |
10 |
19 |
16
|
Teal TK, Reed M, Stevens PE, Lamb EJ. Stability of parathyroid hormone ex vivo in haemodialysis patients. Ann Clin Biochem 2003; 40:191-3. [PMID: 12662412 DOI: 10.1258/000456303763046175] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
BACKGROUND The stability of parathyroid hormone (PTH) in blood ex vivo is a significant practical problem for laboratories and clinicians. Several studies have suggested that PTH is more stable in blood collected into a potassium edetate (EDTA) preservative. METHODS To confirm that this was applicable to renal dialysis patients using our assay (Nichols chemiluminescence), we examined PTH stability in 13 patients with end-stage renal failure using three different blood collection tubes. RESULTS PTH remained stable in EDTA plasma for up to 48 h at room temperature. PTH was significantly reduced in serum collected into plain tubes after 2 h, and after 4 h in serum collected into serum separator tubes, at room temperature. CONCLUSION In the assessment of renal osteodystrophy, the use of EDTA plasma can confer significant benefit, especially in busy laboratories where rapid frozen separation of blood may be hard to achieve.
|
|
22 |
15 |
17
|
Teal TK, Schmidt TM. Identifying and removing artificial replicates from 454 pyrosequencing data. Cold Spring Harb Protoc 2010; 2010:pdb.prot5409. [PMID: 20360363 DOI: 10.1101/pdb.prot5409] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
Abstract
An intrinsic artifact of 454-based pyrosequencing leads to artificial overrepresentation of >10% of the original DNA sequencing templates. This artificial amplification of sequences is unbiased with regard to position on the pyrosequencing plate or sequence identity, and it occurs in all currently available 454 technologies. The amplified sequences start at the same position and are identical (duplicates), or vary in length, or contain a sequencing discrepancy. If the abundance of any sequence in a data set is going to be enumerated, either for comparative community analysis, transcriptional analysis or other applications, it is important to remove these artificial replicates before analysis. A web-based tool that incorporates the clustering algorithm cd-hit was developed to identify and remove artificially replicated sequences in 454-based pyrosequencing data sets. This tool cannot be used for data sets that have an initial amplification step before the standard pyrosequencing procedure, because artificial replicates cannot be distinguished from expected replication due to polymerase chain reaction (PCR) amplification, e.g., in sequencing of amplified gene "tags." This protocol provides details on how to use the replicate filter and obtain a file of unique sequences for use in metagenomic or transcriptomic analyses.
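As a rough illustration of the idea in this abstract, the sketch below groups reads that start with the same bases and keeps one representative per group. It is a simplified stand-in, not the published tool: the protocol uses the cd-hit clustering algorithm, whereas this example uses exact prefix matching, so it does not catch replicates that contain a sequencing discrepancy inside the prefix. The function name `remove_prefix_replicates`, the `prefix_len` parameter, and the toy reads are all hypothetical.

```python
from collections import defaultdict


def remove_prefix_replicates(reads, prefix_len=20):
    """Keep one read per group of reads that share the same leading bases.

    reads: iterable of (read_id, sequence) pairs.
    prefix_len: number of leading bases that define a group (an assumed
    cutoff chosen for illustration, not a value from the protocol).
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        # Reads that begin identically are treated as candidate replicates.
        groups[seq[:prefix_len].upper()].append((read_id, seq))

    unique = []
    for members in groups.values():
        # Keep the longest read as the representative of its group.
        unique.append(max(members, key=lambda m: len(m[1])))
    return unique


# Toy input: r1 and r2 are exact duplicates, r3 starts at the same
# position but is longer, and r4 is an unrelated read.
reads = [
    ("r1", "ACGTACGTACGTTTGA"),
    ("r2", "ACGTACGTACGTTTGA"),
    ("r3", "ACGTACGTACGTTTGACCA"),
    ("r4", "TTGACCATGACCAGTA"),
]
kept = remove_prefix_replicates(reads, prefix_len=12)
print([read_id for read_id, _ in kept])  # prints ['r3', 'r4']
```

As the abstract notes, a filter of this kind is not appropriate for data sets with an initial amplification step, because expected PCR replicates would be removed along with the artificial ones.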
|
Journal Article |
15 |
13 |
18
|
Smith AM, Niemeyer KE, Katz DS, Barba LA, Githinji G, Gymrek M, Huff KD, Madan CR, Cabunoc Mayes A, Moerman KM, Prins P, Ram K, Rokem A, Teal TK, Valls Guimera R, Vanderplas JT. Journal of Open Source Software (JOSS): design and first-year review. PeerJ Comput Sci 2018; 4:e147. [PMID: 32704456 PMCID: PMC7340488 DOI: 10.7717/peerj-cs.147] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/06/2017] [Accepted: 01/24/2018] [Indexed: 06/01/2023]
Abstract
This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a digital object identifier (DOI), deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative (OSI).
|
research-article |
7 |
11 |
19
|
Williams JJ, Teal TK. A vision for collaborative training infrastructure for bioinformatics. Ann N Y Acad Sci 2016; 1387:54-60. [PMID: 27603332 DOI: 10.1111/nyas.13207] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/30/2016] [Revised: 07/14/2016] [Accepted: 07/20/2016] [Indexed: 11/29/2022]
Abstract
In biology, a missing link connecting data generation and data-driven discovery is the training that prepares researchers to effectively manage and analyze data. National and international cyberinfrastructure along with evolving private sector resources place biologists and students within reach of the tools needed for data-intensive biology, but training is still required to make effective use of them. In this concept paper, we review a number of opportunities and challenges that can inform the creation of a national bioinformatics training infrastructure capable of servicing the large number of emerging and existing life scientists. While college curricula are slower to adapt, grassroots startup-spirited organizations, such as Software and Data Carpentry, have made impressive inroads in training on the best practices of software use, development, and data analysis. Given the transformative potential of biology and medicine as full-fledged data sciences, more support is needed to organize, amplify, and assess these efforts and their impacts.
|
Review |
9 |
11 |
20
|
Richmond W, Seviour PW, Teal TK, Elkeles RS. Impaired intravascular lipolysis with changes in concentrations of high density lipoprotein subclasses in young smokers. BMJ : BRITISH MEDICAL JOURNAL 1987; 295:246-7. [PMID: 3115392 PMCID: PMC1247082 DOI: 10.1136/bmj.295.6592.246] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|
research-article |
38 |
11 |
21
|
Shade A, Dunivin TK, Choi J, Teal TK, Howe AC. Strategies for Building Computing Skills To Support Microbiome Analysis: a Five-Year Perspective from the EDAMAME Workshop. mSystems 2019; 4:e00297-19. [PMID: 31431509 PMCID: PMC6702294 DOI: 10.1128/msystems.00297-19] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/07/2019] [Accepted: 07/07/2019] [Indexed: 01/13/2023] Open
Abstract
Here, we report our educational approach and learner evaluations of the first 5 years of the Explorations in Data Analysis for Metagenomic Advances in Microbial Ecology (EDAMAME) workshop, held annually at Michigan State University's Kellogg Biological Station from 2014 to 2018. We hope this information will be useful for others who want to organize computing-intensive workshops and will encourage quantitative skill development among microbiologists. IMPORTANCE: High-throughput sequencing and related statistical and bioinformatic analyses have become routine in microbiology in the past decade, but there are few formal training opportunities to develop these skills. A weeklong workshop can offer sufficient time for novices to become introduced to best computing practices and common workflows in sequence analysis. We report our experiences in executing such a workshop targeted to professional learners (graduate students, postdoctoral scientists, faculty, and research staff).
|
research-article |
6 |
9 |
22
|
Knight BL, Teal TK. A comparison between heat-stable phosphatase inhibitors and activators from rabbit skeletal muscle and liver and their effects upon different preparations of phosphoprotein phosphatase. EUROPEAN JOURNAL OF BIOCHEMISTRY 1980; 104:521-8. [PMID: 6244953 DOI: 10.1111/j.1432-1033.1980.tb04454.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.2] [Indexed: 01/19/2023]
|
|
45 |
8 |
23
|
Friesner J, Assmann SM, Bastow R, Bailey-Serres J, Beynon J, Brendel V, Buell CR, Bucksch A, Busch W, Demura T, Dinneny JR, Doherty CJ, Eveland AL, Falter-Braun P, Gehan MA, Gonzales M, Grotewold E, Gutierrez R, Kramer U, Krouk G, Ma S, Markelz RJC, Megraw M, Meyers BC, Murray JAH, Provart NJ, Rhee S, Smith R, Spalding EP, Taylor C, Teal TK, Torii KU, Town C, Vaughn M, Vierstra R, Ware D, Wilkins O, Williams C, Brady SM. The Next Generation of Training for Arabidopsis Researchers: Bioinformatics and Quantitative Biology. PLANT PHYSIOLOGY 2017; 175:1499-1509. [PMID: 29208732 PMCID: PMC5717721 DOI: 10.1104/pp.17.01490] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/23/2017] [Accepted: 10/31/2017] [Indexed: 05/20/2023]
Abstract
Training for experimental plant biologists needs to combine bioinformatics, quantitative approaches, computational biology, and training in the art of collaboration, best achieved through fully integrated curriculum development.
|
article-commentary |
8 |
7 |
24
|
Sahneh F, Balk MA, Kisley M, Chan CK, Fox M, Nord B, Lyons E, Swetnam T, Huppenkothen D, Sutherland W, Walls RL, Quinn DP, Tarin T, LeBauer D, Ribes D, Birnie DP, Lushbough C, Carr E, Nearing G, Fischer J, Tyle K, Carrasco L, Lang M, Rose PW, Rushforth RR, Roy S, Matheson T, Lee T, Brown CT, Teal TK, Papeș M, Kobourov S, Merchant N. Ten simple rules to cultivate transdisciplinary collaboration in data science. PLoS Comput Biol 2021; 17:e1008879. [PMID: 33983959 PMCID: PMC8118297 DOI: 10.1371/journal.pcbi.1008879] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/02/2022] Open
|
Editorial |
4 |
6 |
25
|
Abstract
For many adaptive complex systems, information about the environment is not simply recorded in a look-up table, but is rather encoded in a theory, schema, or model, which compresses information. The grammar of a language can be viewed as such a schema or theory. In a prior study [Teal et al., 1999] we proposed several conjectures about the learning and evolution of language that should follow from these observations: (C1) compression aids in generalization; (C2) compression occurs more easily in a "smooth," as opposed to a "rugged," problem space; and (C3) constraints from compression make it likely that natural languages evolve towards smooth string spaces. This previous work found general, if not complete, support for these three conjectures. Here we build on that study to clarify the relationship between Minimum Description Length (MDL) and error in our model and examine evolution of certain languages in more detail. Our results suggest a fourth conjecture: that, all else being equal, (C4) more complex languages change more rapidly during evolution.
|
|
25 |
2 |