26
|
Wong HM, Bridges SM, Yiu CKY, McGrath CPJ, Au TK, Parthasarathy DS. Development and validation of Hong Kong Rapid Estimate of Adult Literacy in Dentistry. J Investig Clin Dent 2012; 3:118-27. [PMID: 22319026 DOI: 10.1111/j.2041-1626.2012.00113.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
AIM To develop and validate an instrument, the Hong Kong Rapid Estimate of Adult Literacy in Dentistry (HKREALD-30). METHODS The Rapid Estimate of Adult Literacy in Dentistry (REALD-99) was translated into Chinese and modified in the pretest. A total of 200 parents of pediatric dental patients were interviewed using this modified scale and administered three additional sets of self-reported questionnaires. The 99 items of the scale were reduced to 30 (HKREALD-30). Concurrent validity was tested by comparing the HKREALD-30 scores with the participants' educational level, pattern of dental visits and reading habits. Convergent validity was tested by examining the association between HKREALD-30 and the Test of Functional Health Literacy in Dentistry (TOFHLiD). The test-retest reliability and internal consistency of HKREALD-30 were also evaluated. RESULTS A significant correlation (P < 0.01) was found between HKREALD-30 and participants' reading habits. HKREALD-30 was also highly correlated with TOFHLiD (Spearman's rho = 0.693, P < 0.01). In the regression model, HKREALD-30 was positively associated with TOFHLiD (P < 0.05) after controlling for participants' characteristics. The intra-class correlation coefficient of HKREALD-30 was 0.78 and the Cronbach's alpha was 0.84. CONCLUSION Initial testing of HKREALD-30 suggested that it is a valid and reliable instrument for the basic screening of oral health literacy among Chinese people in Hong Kong.
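The internal-consistency figure reported above (Cronbach's alpha) follows a standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is only an illustration of that formula; the function name `cronbach_alpha` and the toy scores are assumptions for this example and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; uses sample variances throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (invented): two items answered identically by every respondent,
# so the items are perfectly consistent with each other.
toy_scores = [[0, 0], [1, 1], [0, 0], [1, 1]]
alpha = cronbach_alpha(toy_scores)
```

The function divides by the total-score variance, so it would fail if every respondent had the same total; a real analysis would need to handle that case.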
|
27
|
Malone BM, Tan F, Bridges SM, Peng Z. Comparison of four ChIP-Seq analytical algorithms using rice endosperm H3K27 trimethylation profiling data. PLoS One 2011; 6:e25260. [PMID: 21984925 PMCID: PMC3184143 DOI: 10.1371/journal.pone.0025260] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 03/30/2011] [Accepted: 08/30/2011] [Indexed: 11/18/2022]
Abstract
Chromatin immunoprecipitation coupled with high throughput DNA Sequencing (ChIP-Seq) has emerged as a powerful tool for genome wide profiling of the binding sites of proteins associated with DNA such as histones and transcription factors. However, no peak calling program has gained consensus acceptance by the scientific community as the preferred tool for ChIP-Seq data analysis. Analyzing the large data sets generated by ChIP-Seq studies remains highly challenging for most molecular biology laboratories. Here we profile H3K27me3 enrichment sites in rice young endosperm using the ChIP-Seq approach and analyze the data using four peak calling algorithms (FindPeaks, PeakSeq, USeq, and MACS). Comparison of the four algorithms reveals that these programs produce very different peaks in terms of peak size, number, and position relative to genes. We verify the peak predictions using ChIP-PCR to evaluate the accuracy of peak prediction of the four algorithms. We discuss the approach of each algorithm and compare similarities and differences in the results. Despite their differences in the peaks identified, all of the programs reach similar conclusions about the effect of H3K27me3 on gene expression. Its presence either upstream or downstream of a gene is predominately associated with repression of the gene. Additionally, GO analysis finds that a substantially higher ratio of genes associated with H3K27me3 were involved in multicellular organism development, signal transduction, response to external and endogenous stimuli, and secondary metabolic pathways than the rest of the rice genome.
|
28
|
Lang NP, Bridges SM, Lulic M. Implant dentistry in undergraduate dental curricula in South-East Asia: forum workshop at the University of Hong Kong, Prince Philip Dental Hospital, 19-20 November 2010. J Investig Clin Dent 2011; 2:152-155. [PMID: 25426784 DOI: 10.1111/j.2041-1626.2011.00085.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
This paper reports on the discussions arising from a 2-day forum on implant dentistry education in South-East Asia. The 10 institutions present represented undergraduate and postgraduate dental curricula from seven countries, including Hong Kong, Indonesia, Malaysia, Taiwan, Thailand, the Philippines, and Singapore. While not aiming to reach consensus as in other such conferences, the outcome was positive in establishing realistic goals in university education in implant dentistry for curriculum leaders and developers.
|
29
|
Sanders WS, Johnston CI, Bridges SM, Burgess SC, Willeford KO. Prediction of cell penetrating peptides by support vector machines. PLoS Comput Biol 2011; 7:e1002101. [PMID: 21779156 PMCID: PMC3136433 DOI: 10.1371/journal.pcbi.1002101] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Received: 01/13/2011] [Accepted: 05/09/2011] [Indexed: 11/18/2022]
Abstract
Cell penetrating peptides (CPPs) are those peptides that can traverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some generate pore-forming complexes resulting in the destruction of cells while others localize to various organelles. Use of machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated the influence of the composition of training datasets on the ability to classify peptides as cell penetrating using support vector machines (SVMs). We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors and used several approaches to build training data sets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal number of positive and negative examples. The SVM based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell-penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. Of the synthesized peptides predicted to be CPPs, 100% of these peptides were shown to be penetrating.
Cell penetrating peptides (CPPs) are peptides that can potentially transport other functional molecules across cellular membranes and therefore serve a role as drug delivery vehicles. The properties of a given peptide that make it cell penetrating are unclear, and the rapid screening of potential CPPs aids researchers by allowing focus on those peptides most likely to be utilized in a therapeutic capacity. This paper shows that basic features representing primary biochemical properties of these peptides can be used to train a classifier that can accurately predict cell penetrating potential of peptides and provide insight into the biochemical properties associated with cell penetration.
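The abstract stresses a balanced training set but does not say how the authors built theirs. One simple option, shown here purely as an assumption, is random undersampling of the larger class. The function name `balanced_training_set` and the placeholder peptide identifiers are hypothetical; only the class sizes (111 and 34) come from the abstract.

```python
import random


def balanced_training_set(positives, negatives, seed=0):
    """Randomly undersample the larger class so both classes are equal in size.

    Returns a list of (example, label) pairs with label 1 for positives
    and 0 for negatives. Illustrative only; not the authors' procedure.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = min(len(positives), len(negatives))
    pos = rng.sample(positives, n)
    neg = rng.sample(negatives, n)
    return [(p, 1) for p in pos] + [(q, 0) for q in neg]


# Placeholder identifiers standing in for real peptide sequences.
cpps = [f"cpp_{i}" for i in range(111)]
non_cpps = [f"non_{i}" for i in range(34)]
train = balanced_training_set(cpps, non_cpps)
```

Undersampling discards most of the positive examples, which is the price of balance with so few negatives; oversampling or class weighting are alternatives with different trade-offs.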
|
30
|
Sanders WS, Wang N, Bridges SM, Malone BM, Dandass YS, McCarthy FM, Nanduri B, Lawrence ML, Burgess SC. The proteogenomic mapping tool. BMC Bioinformatics 2011; 12:115. [PMID: 21513508 PMCID: PMC3107813 DOI: 10.1186/1471-2105-12-115] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 06/23/2010] [Accepted: 04/22/2011] [Indexed: 11/25/2022]
Abstract
Background High-throughput mass spectrometry (MS) proteomics data is increasingly being used to complement traditional structural genome annotation methods. To keep pace with the high speed of experimental data generation and to aid in structural genome annotation, experimentally observed peptides need to be mapped back to their source genome location quickly and exactly. Previously, the tools to do this have been limited to custom scripts designed by individual research groups to analyze their own data, are generally not widely available, and do not scale well with large eukaryotic genomes. Results The Proteogenomic Mapping Tool includes a Java implementation of the Aho-Corasick string searching algorithm which takes as input standardized file types and rapidly searches experimentally observed peptides against a given genome translated in all 6 reading frames for exact matches. The Java implementation allows the application to scale well with larger eukaryotic genomes while providing cross-platform functionality. Conclusions The Proteogenomic Mapping Tool provides a standalone application for mapping peptides back to their source genome on a number of operating system platforms with standard desktop computer hardware and executes very rapidly for a variety of datasets. Allowing the selection of different genetic codes for different organisms allows researchers to easily customize the tool to their own research interests and is recommended for anyone working to structurally annotate genomes using MS derived proteomics data.
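The core idea of searching a peptide against a genome translated in all six reading frames can be sketched briefly. This is not the tool's implementation: the tool is written in Java and uses the Aho-Corasick algorithm, whereas the sketch below uses plain substring search on one peptide at a time and assumes the standard genetic code. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset):
    """Translate one reading frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(offset, len(dna) - 2, 3))


def map_peptide(genome, peptide):
    """Return (strand, frame, amino-acid index) for every exact match."""
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq, frame)
            start = protein.find(peptide)
            while start != -1:
                hits.append((strand, frame, start))
                start = protein.find(peptide, start + 1)
    return hits


# Toy sequence (invented): forward frame 1 reads ATG GCT AAA GGT = M A K G.
hits = map_peptide("CATGGCTAAAGGTCC", "AKG")
```

Naive substring search rescans each translated frame for every peptide; a multi-pattern algorithm such as Aho-Corasick searches for all peptides in a single pass, which is why it suits large peptide sets and genomes.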
|
31
|
Bridges SM, Yiu CKY, McGrath CP. Multilingual interactions in clinical dental education: a focus on mediated interpreting. Commun Med 2011; 8:197-210. [PMID: 23264983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
In clinical dental consultations in multilingual contexts, medical interpreting is often performed by the supporting staff as part of routine triadic formulations. As academic dentistry becomes increasingly internationalized, issues of language and culture add to the interactional complexity of clinical communication and education. A multivariate approach was adopted to investigate one case of multilingualism in dentistry in Asia. Collection of both survey (n = 86) and interactional data provided empirical evidence regarding language use and language demands across integrated Polyclinics. Descriptive statistics of Dental Surgery Assistant (DSA) perception data and conversation analysis (CA) of mediated interpretation indicate that, as members of the oral healthcare team, DSAs in Hong Kong are an essential resource in their role of intercultural mediators between patients and clinicians, both staff and students. Discussion of sociolinguistic notions of place-as-location and place-as-meaning supports a wider conceptualization of the role of support staff as interpreters in clinical settings. Implications are drawn for policy, curriculum and staff development.
|
32
|
McCarthy FM, Gresham CR, Buza TJ, Chouvarine P, Pillai LR, Kumar R, Ozkan S, Wang H, Manda P, Arick T, Bridges SM, Burgess SC. AgBase: supporting functional modeling in agricultural organisms. Nucleic Acids Res 2010; 39:D497-506. [PMID: 21075795 PMCID: PMC3013706 DOI: 10.1093/nar/gkq1115] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Abstract
AgBase (http://www.agbase.msstate.edu/) provides resources to facilitate modeling of functional genomics data and structural and functional annotation of agriculturally important animal, plant, microbe and parasite genomes. The website is redesigned to improve accessibility and ease of use, including improved search capabilities. Expanded capabilities include new dedicated pages for horse, cat, dog, cotton, rice and soybean. We currently provide 590 240 Gene Ontology (GO) annotations to 105 454 gene products in 64 different species, including GO annotations linked to transcripts represented on agricultural microarrays. For many of these arrays, this provides the only functional annotation available. GO annotations are available for download and we provide comprehensive, species-specific GO annotation files for 18 different organisms. The tools available at AgBase have been expanded and several existing tools improved based upon user feedback. One of seven new tools available at AgBase, GOModeler, supports hypothesis testing from functional genomics data. We host several associated databases and provide genome browsers for three agricultural pathogens. Moreover, we provide comprehensive training resources (including worked examples and tutorials) via links to Educational Resources at the AgBase website.
|
33
|
Bridges SM, Botelho MG, Tsang PCS. PBL.2.0: blended learning for an interactive, problem-based pedagogy. Med Educ 2010; 44:1131. [PMID: 20946496 DOI: 10.1111/j.1365-2923.2010.03830.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/04/2023]
|
34
|
Harhay GP, Smith TP, Alexander LJ, Haudenschild CD, Keele JW, Matukumalli LK, Schroeder SG, Van Tassell CP, Gresham CR, Bridges SM, Burgess SC, Sonstegard TS. An atlas of bovine gene expression reveals novel distinctive tissue characteristics and evidence for improving genome annotation. Genome Biol 2010; 11:R102. [PMID: 20961407 PMCID: PMC3218658 DOI: 10.1186/gb-2010-11-10-r102] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 04/20/2010] [Revised: 07/22/2010] [Accepted: 10/20/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines. RESULTS The Bovine Gene Atlas was generated from 7.2 million unique digital gene expression tag sequences (300.2 million total raw tag sequences), from which 1.59 million unique tag sequences were identified that mapped to the draft bovine genome accounting for 85% of the total raw tag abundance. Filtering these tags yielded 87,764 unique tag sequences that unambiguously mapped to 16,517 annotated protein-coding loci in the draft genome accounting for 45% of the total raw tag abundance. Clustering of tissues based on tag abundance profiles generally confirmed ontology classification based on anatomy. There were 5,429 constitutively expressed loci and 3,445 constitutively expressed unique tag sequences mapping outside annotated gene boundaries that represent a resource for enhancing current gene models. Physical measures such as inferred transcript length or antisense tag abundance identified tissues with atypical transcriptional tag profiles. We report for the first time the tissue-specific variation in the proportion of mitochondrial transcriptional tag abundance. CONCLUSIONS The Bovine Gene Atlas is the deepest and broadest transcriptome survey of any livestock genome to date. Commonalities and variation in sense and antisense transcript tag profiles identified in different tissues facilitate the examination of the relationship between gene expression, tissue, and gene function.
|
35
|
Kelley RY, Gresham C, Harper J, Bridges SM, Warburton ML, Hawkins LK, Pechanova O, Peethambaran B, Pechan T, Luthe DS, Mylroie JE, Ankala A, Ozkan S, Henry WB, Williams WP. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize. BMC Bioinformatics 2010; 11 Suppl 6:S25. [PMID: 20946609 PMCID: PMC3026372 DOI: 10.1186/1471-2105-11-s6-s25] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022]
Abstract
BACKGROUND Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. RESULTS In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CONCLUSIONS CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. 
The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu.
|
36
|
Manda P, Freeman MG, Bridges SM, Jankun-Kelly TJ, Nanduri B, McCarthy FM, Burgess SC. GOModeler--a tool for hypothesis-testing of functional genomics datasets. BMC Bioinformatics 2010; 11 Suppl 6:S29. [PMID: 20946613 PMCID: PMC3026376 DOI: 10.1186/1471-2105-11-s6-s29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
Background Functional genomics technologies that measure genome expression at a global scale are accelerating biological knowledge discovery. Generating these high throughput datasets is relatively easy compared to the downstream functional modelling necessary for elucidating the molecular mechanisms that govern the biology under investigation. A number of publicly available ‘discovery-based’ computational tools use the computationally amenable Gene Ontology (GO) for hypothesis generation. However, there are few tools that support hypothesis-based testing using the GO and none that support testing with user defined hypothesis terms. Here, we present GOModeler, a tool that enables researchers to conduct hypothesis-based testing of high throughput datasets using the GO. GOModeler summarizes the overall effect of a user defined gene/protein differential expression dataset on specific GO hypothesis terms selected by the user to describe a biological experiment. The design of the tool allows the user to complement the functional information in the GO with his/her domain specific expertise for comprehensive hypothesis testing. Results GOModeler tests the relevance of the hypothesis terms chosen by the user for the input gene dataset by providing the individual effects of the genes on the hypothesis terms and the overall effect of the entire dataset on each of the hypothesis terms. It matches the GO identifiers (ids) of the genes with the GO ids of the hypothesis terms and parses the names of those ids that match to assign effects. We demonstrate the capabilities of GOModeler with a dataset of nine differentially expressed cytokine genes and compare the results to those obtained through manual analysis of the dataset by an immunologist. The direction of overall effects on all hypothesis terms except one was consistent with the results obtained by manual analysis. The tool’s editing capability enables the user to augment the information extracted. 
GOModeler is available as a part of the AgBase tool suite (http://www.agbase.msstate.edu). Conclusions GOModeler allows hypothesis driven analysis of high throughput datasets using the GO. Using this tool, researchers can quickly evaluate the overall effect of quantitative expression changes of a gene set on specific biological processes of interest. The results are provided in both tabular and graphical formats.
|
37
|
Paul D, Bridges SM, Burgess SC, Dandass YS, Lawrence ML. Complete genome and comparative analysis of the chemolithoautotrophic bacterium Oligotropha carboxidovorans OM5. BMC Genomics 2010; 11:511. [PMID: 20863402 PMCID: PMC3091675 DOI: 10.1186/1471-2164-11-511] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 01/19/2010] [Accepted: 09/23/2010] [Indexed: 11/30/2022]
Abstract
Background Oligotropha carboxidovorans OM5 T. (DSM 1227, ATCC 49405) is a chemolithoautotrophic bacterium capable of utilizing CO (carbon monoxide) and fixing CO2 (carbon dioxide). We previously published the draft genome of this organism and recently submitted the complete genome sequence to GenBank. Results The genome sequence of the chemolithoautotrophic bacterium Oligotropha carboxidovorans OM5 consists of a 3.74-Mb chromosome and a 133-kb megaplasmid that contains the genes responsible for utilization of carbon monoxide, carbon dioxide, and hydrogen. To our knowledge, this strain is the first one to be sequenced in the genus Oligotropha, the closest fully sequenced relatives being Bradyrhizobium sp. BTAi and USDA110 and Nitrobacter hamburgiensis X14. Analysis of the O. carboxidovorans genome reveals potential links between plasmid-encoded chemolithoautotrophy and chromosomally-encoded lipid metabolism. Comparative analysis of O. carboxidovorans with closely related species revealed differences in metabolic pathways, particularly in carbohydrate and lipid metabolism, as well as transport pathways. Conclusion Oligotropha, Bradyrhizobium sp and Nitrobacter hamburgiensis X14 are phylogenetically proximal. Although there is significant conservation of genome organization between the species, there are major differences in many metabolic pathways that reflect the adaptive strategies unique to each species.
|
38
|
Nanduri B, Wang N, Lawrence ML, Bridges SM, Burgess SC. Gene model detection using mass spectrometry. Methods Mol Biol 2010; 604:137-44. [PMID: 20013369 DOI: 10.1007/978-1-60761-444-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 02/09/2023]
Abstract
The utility of a genome sequence in biological research depends entirely on the comprehensive description of all of its functional elements. Analysis of genome sequences is still predominantly gene-centric (i.e., identifying gene models/open reading frames). In this article, we describe a proteomics-based method for identifying open reading frames that are missed by computational algorithms. Mass spectrometry-based identification of peptides and proteins from biological samples provides evidence for the expression of the genome sequence at the protein level. This proteogenomic annotation method combines computationally predicted ORFs and the genome sequence with proteomics to identify novel gene models. We also describe our proteogenomic mapping pipeline, a set of computational tools that automate the proteogenomic annotation workflow. This pipeline is available for download at www.agbase.msstate.edu/tools/.
|
39
|
van den Berg BHJ, Thanthiriwatte C, Manda P, Bridges SM. Comparing gene annotation enrichment tools for functional modeling of agricultural microarray data. BMC Bioinformatics 2009; 10 Suppl 11:S9. [PMID: 19811693 PMCID: PMC3226198 DOI: 10.1186/1471-2105-10-s11-s9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/22/2022]
Abstract
The widespread availability of microarray technology has driven functional genomics to the forefront as scientists seek to draw meaningful biological conclusions from their microarray results. Gene annotation enrichment analysis is a functional analysis technique that has gained widespread attention and for which many tools have been developed. Unfortunately, most of these tools have limited support for agricultural species. Here, we evaluate and compare four publicly available computational tools (Onto-Express, EasyGO, GOstat, and DAVID) that support analysis of gene expression datasets in agricultural species. We use AgBase as the functional annotation reference for agricultural species. The selected tools were evaluated based on i) available features, usage and accessibility, ii) implemented statistical computational methods, and iii) annotation and enrichment performance analysis. Annotation was assessed using a randomly selected test gene annotation set and an experimental differentially expressed gene-set – both from chicken. The experimental set was also used to evaluate identification of enriched functional groups. Comparison of the tools shows that they produce different sets of annotations for the two datasets and different functional groups for the experimental dataset. While DAVID, GOstat and Onto-Express annotate comparable numbers of genes, DAVID provides by far the most annotations per gene. However, many of DAVID's annotations appear to be redundant or are at very high levels in the GO hierarchy. The GOSlim distribution of annotations shows that GOstat, Onto-Express and EasyGO provide similar GO distributions to those found in AgBase while annotations from DAVID show a different GOSlim distribution, again probably due to duplication and many non-specific terms. No consistent trends were found in results of GO term over/under representation analysis applied to the experimental data using different tools. 
While GOstat, DAVID and Onto-Express could retrieve some significantly enriched terms, EasyGO did not show any significantly enriched terms. There was little agreement about the enriched terms identified by the tools. Conclusion Different tools for functionally annotating gene sets and identifying significantly enriched GO categories differ widely in their results when applied to a test annotation gene set and an experimental dataset from chicken. These results emphasize the need for care when interpreting the results of such analysis and the lack of standardization of approaches.
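For orientation, one commonly used statistic for GO term over-representation is the one-sided hypergeometric test. The abstract does not state which test each compared tool implements, so the sketch below is a generic illustration only; the function name, parameter names, and the toy counts are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value for over-representation.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the study set (e.g. differentially expressed)
    hits:       study-set genes carrying the GO term
    Returns P(X >= hits) when drawing `sample` genes without replacement.
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Toy counts (invented): 10 background genes, 3 annotated with the term,
# 3 genes in the study set, all 3 of them annotated.
p = enrichment_p_value(10, 3, 3, 3)
```

Tools can disagree even when they share this statistic, because they may use different background sets, annotation sources, and multiple-testing corrections, which is consistent with the limited agreement the abstract reports.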
|
40
|
Malone BM, Perkins AD, Bridges SM. Integrating phenotype and gene expression data for predicting gene function. BMC Bioinformatics 2009; 10 Suppl 11:S20. [PMID: 19811686 PMCID: PMC3226192 DOI: 10.1186/1471-2105-10-s11-s20] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 08/24/2023]
Abstract
Background This paper presents a framework for integrating disparate data sets to predict gene function. The algorithm constructs a graph, called an integrated similarity graph, by computing similarities based upon both gene expression and textual phenotype data. This integrated graph is then used to make predictions about whether individual genes should be assigned a particular annotation from the Gene Ontology. Results A combined graph was generated from publicly-available gene expression data and phenotypic information from Saccharomyces cerevisiae. This graph was used to assign annotations to genes, as were graphs constructed from gene expression data and textual phenotype information alone. While the F-measure appeared similar for all three methods, annotations based upon the integrated similarity graph exhibited a better overall precision than gene expression or phenotype information alone can generate. The integrated approach was also able to assign almost as many annotations as the gene expression method alone, and generated significantly more total and correct assignments than the phenotype information could provide. Conclusion These results suggest that augmenting standard gene expression data sets with publicly-available textual phenotype data can help generate more precise functional annotation predictions while mitigating the weaknesses of a standard textual phenotype approach.
|
41
|
Jankun-Kelly TJ, Lindeman AD, Bridges SM. Exploratory visual analysis of conserved domains on multiple sequence alignments. BMC Bioinformatics 2009; 10 Suppl 11:S7. [PMID: 19811691 PMCID: PMC3226196 DOI: 10.1186/1471-2105-10-s11-s7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Background Multiple alignment of protein sequences can provide insight into sequence conservation across many species and thus allow identification of those sections of the sequence most critical to protein function. This insight can be augmented by joint display of conserved domains along the sequences. By fusing this metadata visually, biologists can analyze sequence conservation and functional motifs simultaneously and efficiently. Results We present MSAVis, a new approach combining luminance and hue for simultaneous visualization of conserved motifs and sequence alignment. Input for the algorithm is a multiple sequence alignment in a standard format. The NCBI Conserved Domain Database (CDD) is used for finding conserved domains along the alignment. The visualization quickly identifies conserved domains, and allows both macro (sequence-long) and micro (small amino-acid neighborhood) views. Conclusion MSAVis utilizes two visual cues, luminance and hue, to facilitate at-a-glance summary of the conservation of a user-provided protein alignment while enabling multiple comparisons among functional domains. These visual cues are preattentive and separable so that the relationship between conservation strength and domain membership can be understood. The MSAVis software, written in Python and using BioPython and OpenGL, can be found at http://agbase.msstate.edu/tools/MSAVis.html.
|
42
|
Kelley RY, Williams WP, Mylroie JE, Boykin DL, Hawkins LK, Windham GL, Brooks TD, Bridges SM, Scheffler BE, Wilkinson JR. Genomic profile of maize response to Aspergillus flavus infection. TOXIN REV 2009. [DOI: 10.1080/15569540903089239] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
|
43
|
Bridges SM, Dyson JE, Corbet EF. Blended learning, knowledge co-construction and undergraduate group work. MEDICAL EDUCATION 2009; 43:490-491. [PMID: 19422510 DOI: 10.1111/j.1365-2923.2009.03345.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/27/2023]
|
44
|
Chitteti BR, Tan F, Mujahid H, Magee BG, Bridges SM, Peng Z. Comparative analysis of proteome differential regulation during cell dedifferentiation in Arabidopsis. Proteomics 2009; 8:4303-16. [PMID: 18814325 DOI: 10.1002/pmic.200701149] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Cell dedifferentiation is a cell fate switching process in which differentiated cells undergo genome reprogramming to regain the competency of cell division and organ regeneration. The molecular mechanism underlying the cell dedifferentiation process remains obscure. In this report, we investigate the cell dedifferentiation process in Arabidopsis using a shotgun proteomics approach. A total of 758 proteins are identified by two or more matched peptides. Comparative analyses at four time points using two label-free methods reveal that 193 proteins display up-regulation and 183 proteins display down-regulation within 48 h. While the results of the two label-free quantification methods match well with each other, comparison with previously published 2-DE gel results reveals that label-free quantification results differ substantially from those of the 2-DE method for proteins with peptides common to multiple proteins, suggesting a limitation of the label-free methods in quantifying proteins with closely related family members in complex samples. Our results show that the shotgun approach and the traditional 2-DE gel approach complement each other in both protein identification and quantification. An interesting observation is that core histones and histone variants are subjected to extensive down-regulation, indicating that there is a dramatic change in the chromatin during cell differentiation.
|
45
|
Bridges SM, Burgess SC, McCarthy FM. Introduction to the Proceedings of the Avian Genomics and Gene Ontology Annotation Workshop. BMC Genomics 2009; 10 Suppl 2:I1. [PMID: 19607650 PMCID: PMC2966328 DOI: 10.1186/1471-2164-10-s2-i1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
The Avian Genomics Conference and Gene Ontology Annotation Workshop brought together researchers and students from around the world to present their latest research addressing the delivery of value from the billions of base-pairs of Archosaur sequence that have become available in the last few years. This editorial describes the conference itself and introduces the ten peer-reviewed manuscripts accepted for publication in the proceedings. These manuscripts address issues ranging from the poultry industry view of USDA genomics policy to the genomics of a wide variety of Archosaur species including chicken, duck, alligator, and condors and their pathogens.
|
46
|
Dandass YS, Burgess SC, Lawrence M, Bridges SM. Accelerating string set matching in FPGA hardware for bioinformatics research. BMC Bioinformatics 2008; 9:197. [PMID: 18412963 PMCID: PMC2374783 DOI: 10.1186/1471-2105-9-197] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 01/04/2008] [Accepted: 04/15/2008] [Indexed: 11/16/2022] Open
Abstract
Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
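As a rough illustration of the string set matching problem described in this abstract, the following Python sketch builds a conventional software Aho-Corasick automaton and uses it to locate a set of peptides in a protein sequence. It is not the FPGA design from the paper. The helper `bit_planes` only shows how a 5-bit symbol code can be split into five single-bit streams, one per simple FSM; the alphabet ordering and all function names are assumptions.

```python
from collections import deque

# 20 amino acids plus stop ('*') and unknown ('X'); this ordering is hypothetical.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY*X"


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of strings."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: failure links of shallower states are final
    # before they are used for deeper states.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


def bit_planes(sequence):
    """Split 5-bit symbol codes into five bit streams (least significant first)."""
    codes = [ALPHABET.index(ch) for ch in sequence]
    return [[(code >> bit) & 1 for code in codes] for bit in range(5)]


if __name__ == "__main__":
    automaton = build_automaton(["PEP", "EPT", "TIDE"])
    print(find_all("PEPTIDE", automaton))
    print(bit_planes("ACD"))
```

The alphabet above has 22 symbols, which fits within the 32 values that five bits can represent, consistent with the abstract's remark that five bits suffice for amino acids and special symbols.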
|
47
|
Buza TJ, McCarthy FM, Wang N, Bridges SM, Burgess SC. Gene Ontology annotation quality analysis in model eukaryotes. Nucleic Acids Res 2008; 36:e12. [PMID: 18187504 PMCID: PMC2241866 DOI: 10.1093/nar/gkm1167] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022] Open
Abstract
Functional analysis using the Gene Ontology (GO) is crucial for array analysis, but it is often difficult for researchers to assess the amount and quality of GO annotations associated with different sets of gene products. In many cases the source of the GO annotations and the date the GO annotations were last updated are not apparent, further complicating a researcher's ability to assess the quality of the GO data provided. Moreover, GO biocurators need to ensure that the GO quality is maintained and optimal for the functional processes that are most relevant for their research community. We report the GO Annotation Quality (GAQ) score, a quantitative measure of GO quality that includes breadth of GO annotation, the level of detail of annotation and the type of evidence used to make the annotation. As a case study, we apply the GAQ scoring method to a set of diverse eukaryotes and demonstrate how the GAQ score can be used to track changes in GO annotations over time and to assess the quality of GO annotations available for specific biological processes. The GAQ score also allows researchers to quantitatively assess the functional data available for their experimental systems (arrays or databases).
|
48
|
Sanders WS, Bridges SM, McCarthy FM, Nanduri B, Burgess SC. Prediction of peptides observable by mass spectrometry applied at the experimental set level. BMC Bioinformatics 2007; 8 Suppl 7:S23. [PMID: 18047723 PMCID: PMC2099492 DOI: 10.1186/1471-2105-8-s7-s23] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Indexed: 01/18/2023] Open
Abstract
Background When proteins are subjected to proteolytic digestion and analyzed by mass spectrometry using a method such as 2D LC MS/MS, only a portion of the proteotypic peptides associated with each protein will be observed. The ability to predict which peptides can and cannot potentially be observed for a particular experimental dataset has several important applications in proteomics research including calculation of peptide coverage in terms of potentially detectable peptides, systems biology analysis of data sets, and protein quantification. Results We have developed a methodology for constructing artificial neural networks that can be used to predict which peptides are potentially observable for a given set of experimental, instrumental, and analytical conditions for 2D LC MS/MS (a.k.a. Multidimensional Protein Identification Technology [MudPIT]) datasets. Neural network classifiers constructed using this procedure for two MudPIT datasets exhibit 10-fold cross validation accuracy of about 80%. We show that a classifier constructed for one dataset has poor predictive performance with the other dataset, thus demonstrating the need for dataset-specific classifiers. Classification results with each dataset are used to compute informative percent amino acid coverage statistics for each protein in terms of the predicted detectable peptides in addition to the percent coverage of the complete sequence. We also demonstrate the utility of predicted peptide observability for systems analysis to help determine if proteins that were expected but not observed generate sufficient peptides for detection. Conclusion Classifiers that accurately predict the likelihood of detecting proteotypic peptides by mass spectrometry provide proteomics researchers with powerful new approaches for data analysis. We demonstrate that the procedure we have developed for building a classifier based on an individual experimental data set results in classifiers with accuracy comparable to those reported in the literature based on large training sets collected from multiple experiments. Our approach allows the researcher to construct a classifier that is specific for the experimental, instrument, and analytical conditions of a single experiment and amenable to local, condition-specific implementation. The resulting classifiers have application in a number of areas such as determination of peptide coverage for protein identification, pathway analysis, and protein quantification.
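The two coverage statistics mentioned in this abstract can be sketched in a few lines of Python. This is an illustrative reading only: the function names are hypothetical, peptides are located by exact substring search, and the list of predicted detectable peptides is assumed to come from some classifier that is not shown here.

```python
def covered_mask(protein_seq, peptides):
    """Mark each residue that lies inside at least one occurrence of a peptide."""
    mask = [False] * len(protein_seq)
    for peptide in peptides:
        start = protein_seq.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                mask[i] = True
            start = protein_seq.find(peptide, start + 1)
    return mask


def percent_sequence_coverage(protein_seq, observed):
    """Percent of the complete sequence covered by observed peptides."""
    if not protein_seq:
        return 0.0
    return 100.0 * sum(covered_mask(protein_seq, observed)) / len(protein_seq)


def percent_detectable_coverage(protein_seq, observed, predicted_detectable):
    """Percent of residues in predicted detectable peptides that were observed."""
    detectable = covered_mask(protein_seq, predicted_detectable)
    seen = covered_mask(protein_seq, observed)
    total = sum(detectable)
    if total == 0:
        return 0.0
    both = sum(1 for d, s in zip(detectable, seen) if d and s)
    return 100.0 * both / total


if __name__ == "__main__":
    seq = "ABCDEFGHIJ"  # toy sequence, not real protein data
    print(percent_sequence_coverage(seq, ["ABC"]))
    print(percent_detectable_coverage(seq, ["ABC"], ["ABC", "GHI"]))
```

In the toy example the observed peptide covers 30% of the whole sequence but 50% of the residues that were predicted to be detectable, which shows why the second statistic can be more informative for a protein with few detectable peptides.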
|
49
|
Bridges SM, Magee GB, Wang N, Williams WP, Burgess SC, Nanduri B. ProtQuant: a tool for the label-free quantification of MudPIT proteomics data. BMC Bioinformatics 2007; 8 Suppl 7:S24. [PMID: 18047724 PMCID: PMC2099493 DOI: 10.1186/1471-2105-8-s7-s24] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022] Open
Abstract
Background Effective and economical methods for quantitative analysis of high throughput mass spectrometry data are essential to meet the goals of directly identifying, characterizing, and quantifying proteins from a particular cell state. Multidimensional Protein Identification Technology (MudPIT) is a common approach used in protein identification. Two types of methods are used to detect differential protein expression in MudPIT experiments: those involving stable isotope labelling and the so-called label-free methods. Label-free methods are based on the relationship between protein abundance and sampling statistics such as peptide count, spectral count, probabilistic peptide identification scores, and sum of peptide Sequest XCorr scores (ΣXCorr). Although a number of label-free methods for protein quantification have been described in the literature, there are few publicly available tools that implement these methods. We describe ProtQuant, a Java-based tool for label-free protein quantification that uses the previously published ΣXCorr method for quantification and includes an improved method for handling missing data. Results ProtQuant was designed for ease of use and portability for the bench scientist. It implements the ΣXCorr method for label-free protein quantification from MudPIT datasets. ProtQuant has a graphical user interface, accepts multiple file formats, is not limited by the size of the input files, and can process any number of replicates and any number of treatments. In addition, ProtQuant implements a new method for dealing with missing values for peptide scores used for quantification. The new algorithm, called ΣXCorr*, uses "below threshold" peptide scores to provide meaningful non-zero values for missing data points. We demonstrate that ΣXCorr* produces an average reduction in false positive identifications of differential expression of 25% compared to ΣXCorr.
Conclusion ProtQuant is a tool for protein quantification built for multi-platform use with an intuitive user interface. ProtQuant efficiently and uniquely performs label-free quantification of protein datasets produced with Sequest and provides the user with facilities for data management and analysis. Importantly, ProtQuant is available as a self-installing executable for the Windows environment used by many bench scientists.
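The idea behind ΣXCorr and the missing-value handling of ΣXCorr* can be sketched as follows. This is a hedged Python illustration, not ProtQuant's Java code: the abstract does not give the exact rule, so the fallback used here (when a protein has no above-threshold peptide scores, use the sum of its below-threshold scores) is an assumption, as are the function names and the input format.

```python
from collections import defaultdict


def sum_xcorr(peptide_hits, threshold):
    """Sum XCorr scores at or above the threshold for each protein.

    peptide_hits is a list of (protein_id, xcorr) pairs.
    """
    totals = defaultdict(float)
    for protein, xcorr in peptide_hits:
        if xcorr >= threshold:
            totals[protein] += xcorr
    return dict(totals)


def sum_xcorr_star(peptide_hits, threshold, proteins):
    """Like sum_xcorr, but fill missing proteins from below-threshold scores.

    A protein with no above-threshold peptide receives the sum of its
    below-threshold scores instead of a missing value (assumed rule).
    """
    above = sum_xcorr(peptide_hits, threshold)
    below = defaultdict(float)
    for protein, xcorr in peptide_hits:
        if xcorr < threshold:
            below[protein] += xcorr
    return {p: above.get(p, below.get(p, 0.0)) for p in proteins}


if __name__ == "__main__":
    hits = [("P1", 2.5), ("P1", 3.0), ("P2", 1.0), ("P2", 0.5)]  # toy scores
    print(sum_xcorr(hits, threshold=2.0))
    print(sum_xcorr_star(hits, threshold=2.0, proteins=["P1", "P2", "P3"]))
```

In the toy data, protein P2 would be missing under the plain sum but receives a small non-zero value under the starred variant, which is the kind of behaviour the abstract credits with reducing false positive calls of differential expression.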
|
50
|
McCarthy FM, Bridges SM, Burgess SC. GOing from functional genomics to biological significance. Cytogenet Genome Res 2007; 117:278-87. [PMID: 17675869 DOI: 10.1159/000103189] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 06/30/2006] [Accepted: 08/16/2006] [Indexed: 11/19/2022] Open
Abstract
The chicken genome is sequenced and this, together with microarray and other functional genomics technologies, makes post-genomic research possible in the chicken. At this time, however, such research is hindered by a lack of genomic structural and functional annotations. Bio-ontologies have been developed for different annotation requirements, as well as to facilitate data sharing and computational analysis, but these are not yet optimally utilized in the chicken. Here we discuss genomic annotation and bio-ontologies. We focus specifically on the Gene Ontology (GO), chicken GO annotations and how these can facilitate functional genomics in the chicken. The GO is the most developed and widely used bio-ontology. It is the de facto standard for functional annotation. Despite its critical importance in analyzing microarray and other functional genomics data, relatively few chicken gene products have any GO annotation. When these are available, the average quality of chicken gene product annotations (defined using evidence code weight and annotation depth) is much lower than in mouse. Moreover, tools allowing chicken researchers to easily and rapidly use the GO are either lacking or hard to use. To address all of these problems we developed ChickGO and AgBase. Chicken GO annotations are provided by complementary work at MSU-AgBase and EBI-GOA. The GO tools pipeline at AgBase uses GO to derive functional and biological significance from microarray and other functional genomics data. Not only will improved genomic annotation and tools to use these annotations benefit the chicken research community but they will also facilitate research in other avian species and comparative genomics.
|