1
Abstract
The UniProt Knowledgebase is a public database for protein sequence and function, covering the tree of life. This Community Page article presents a community submission system to harness timely scientific knowledge via crowdsourcing of the literature, creating a research ecosystem where researchers play an active role in scaling up UniProt curation while receiving proper attribution for their biocuration work.
2
UniProt genomic mapping for deciphering functional effects of missense variants. Hum Mutat 2019; 40:694-705. [PMID: 30840782 PMCID: PMC6563471 DOI: 10.1002/humu.23738] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/23/2018] [Revised: 12/17/2018] [Accepted: 02/17/2019] [Indexed: 01/08/2023]
Abstract
Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
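Placing positional annotations such as active sites or variants on GRCh38 ultimately means converting residue numbers into genomic coordinates. The following is a minimal sketch, assuming the transcript's coding exons are known as 1-based inclusive genomic intervals listed in transcription order. The function names are hypothetical, and this is not UniProt's mapping pipeline, which must also reconcile sequence differences between UniProtKB entries and the reference genome.

```python
from typing import List, Tuple

# Hypothetical representation: coding exons as (start, end) genomic intervals,
# 1-based inclusive, listed in transcription order (5' to 3' of the transcript).
Exon = Tuple[int, int]


def cds_offset_to_genome(offset: int, exons: List[Exon], strand: str) -> int:
    """Map a 0-based offset within the concatenated CDS to a genomic coordinate."""
    remaining = offset
    for start, end in exons:
        length = end - start + 1
        if remaining < length:
            # Plus strand counts up from the exon start; minus strand counts down from its end.
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("offset lies beyond the end of the CDS")


def residue_to_genome(residue: int, exons: List[Exon], strand: str) -> List[int]:
    """Return the three genomic positions of the codon encoding a 1-based residue."""
    first = 3 * (residue - 1)
    return [cds_offset_to_genome(first + i, exons, strand) for i in range(3)]
```

For example, with exons [(100, 104), (200, 210)] on the plus strand, residue 2 maps to [103, 104, 200]: its codon is split across the exon boundary, which is one reason a single protein position can correspond to non-contiguous genomic positions.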
3
Eye-Tracking Study to Enhance Usability of Molecular Diagnostics Reports in Cancer Precision Medicine. JCO Precis Oncol 2018; 2:1-11. [PMID: 35135129 DOI: 10.1200/po.17.00296] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
Abstract
PURPOSE We conducted usability studies on commercially available molecular diagnostic (MDX) test reports to identify strengths and weaknesses in content and form that drive clinical decision making. Given routine genomic testing in cancer medicine, oncologists must accurately interpret MDX reports, as well as evidence concerning the clinical utility of biomarkers, for treatment or trial selection. This work aims to evaluate the effectiveness of MDX reports in facilitating cancer treatment planning. METHODS Fourteen clinicians at an academic tertiary care medical facility, with a wide range of experience in oncology and in the use of molecular testing, participated in this study. Three commercially available, widely used, Clinical Laboratory Improvement Amendments (CLIA)-certified, College of American Pathologists (CAP)-accredited test reports (labeled Laboratories A, B, and C) were used. Eye tracking, surveys, and think-aloud protocols were used to collect usability data for these MDX reports, focusing on ease of comprehension and actionability. RESULTS Clinicians found two areas of molecular diagnostic reports most useful for patient care: therapy options showing benefit or lack of benefit to patients, including enrollment in clinical trials; and the pathogenic tumor molecular anomalies detected. Therapeutic implications and therapy classes, such as US Food and Drug Administration-approved on-label and off-label therapies and clinical trials, were critical for decision making. However, all reports had usability and comprehension issues in these areas and could be improved. CONCLUSION Focused usability studies can help drive our understanding of the clinical workflow for the use of molecular diagnostic tests in cancer care. This in turn can have major effects on quality of care, outcomes, costs, and patient satisfaction. This study demonstrates the use of specific usability techniques (eye tracking and think-aloud protocols) to help clinical laboratories improve MDX report design in a precision oncology treatment setting.
4
Future of Evidence Synthesis in Precision Oncology: Between Systematic Reviews and Biocuration. JCO Precis Oncol 2018; 2:PO.17.00175. [PMID: 31930186 PMCID: PMC6953752 DOI: 10.1200/po.17.00175] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
5
Computational clustering for viral reference proteomes. Bioinformatics 2016; 32:2041-3. [PMID: 27153712 DOI: 10.1093/bioinformatics/btw110] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/02/2015] [Accepted: 02/21/2016] [Indexed: 11/14/2022]
Abstract
MOTIVATION The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. RESULTS We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome-similarity-based clusters are provided. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that is consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. AVAILABILITY AND IMPLEMENTATION http://proteininformationresource.org/rps/viruses/ CONTACT chenc@udel.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
6
Abstract
RATIONALE Subtypes of cigarette smoke-induced disease affect different lung structures and may have distinct pathophysiological mechanisms. OBJECTIVE To determine whether proteomic classification of the cellular and vascular origins of sputum proteins can characterize these mechanisms and phenotypes. SUBJECTS AND METHODS Individual sputum specimens from lifelong nonsmokers (n=7) and from smokers with normal lung function (n=13), mucous hypersecretion with normal lung function (n=11), obstructed airflow without emphysema (n=15), and obstruction plus emphysema (n=10) were assessed with mass spectrometry. Data reduction, logarithmic transformation of spectral counts, and Cytoscape network-interaction analysis were performed. The original 203 proteins were reduced to the 50 most informative. Their sources were secretory dimeric IgA, submucosal gland serous and mucous cells, goblet and other epithelial cells, and vascular permeability. RESULTS Epithelial proteins discriminated nonsmokers from smokers. Mucin 5AC was elevated in healthy smokers and in chronic bronchitis, suggesting a continuum in which the severity of hypersecretion is determined by mechanisms of goblet-cell hyperplasia. Obstructed airflow was correlated with glandular proteins and with lower levels of Ig joining chain compared with the other groups. Sputum from emphysema subjects was unique, with high levels of plasma proteins and components of neutrophil extracellular traps, such as histones and defensins. In contrast, defensins were correlated with epithelial proteins in all other groups. Protein-network interactions were unique to each group. CONCLUSION The proteomes were interpreted as complex "biosignatures" that suggest distinct pathophysiological mechanisms for the mucin 5AC hypersecretion, airflow obstruction, and inflammatory emphysema phenotypes. Proteomic phenotyping may improve genotyping studies by selecting more homogeneous study groups. Each phenotype may require its own mechanistically based diagnostic, risk-assessment, drug and other treatment algorithms.
7
Abstract
The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute's Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample's proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein's sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium are made freely available to the public through the data portal.
8
In silico analysis of autoimmune diseases and genetic relationships to vaccination against infectious diseases. BMC Immunol 2014; 15:61. [PMID: 25486901 PMCID: PMC4266212 DOI: 10.1186/s12865-014-0061-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/07/2014] [Accepted: 12/01/2014] [Indexed: 01/09/2023]
Abstract
BACKGROUND Near-universal administration of vaccines mandates intense pharmacovigilance for vaccine safety and a stringently low tolerance for adverse events. Reports of autoimmune diseases (AID) following vaccination have been challenging to evaluate given the high rates of vaccination, the background incidence of autoimmunity, and the low incidence and variable times of onset of AID after vaccination. In order to identify biologically plausible pathways to vaccine-related autoimmune adverse events, we used a systems biology approach to create a matrix of innate and adaptive immune mechanisms active in specific diseases and in responses to vaccine antigens, adjuvants, preservatives and stabilizers, for the most common vaccine-associated AID found in the Vaccine Adverse Event Reporting System. RESULTS This report focuses on Guillain-Barre Syndrome (GBS), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Idiopathic (or immune) Thrombocytopenic Purpura (ITP). Multiple curated databases and automated text mining of PubMed literature identified 667 genes associated with RA, 448 with SLE, 49 with ITP and 73 with GBS. While all data sources provided valuable and unique gene associations, text mining using natural language processing (NLP) algorithms provided the most information but required curation to remove incorrect associations. Six genes were associated with all four AIDs, and thirty-three pathways were shared by all four. Classification of genes into twelve immune-system-related categories identified more "Th17 T-cell subtype" genes in RA than in the other AIDs, and more "Chemokine plus Receptors" genes associated with RA than with SLE. Gene networks were visualized and clustered into interconnected modules with specific gene clusters for each AID, including one in RA with ten C-X-C motif chemokines. The intersection of genes associated with GBS, GBS peptide auto-antigens, influenza A infection, and influenza vaccination created a subnetwork of genes that suggested a possible role for the MAPK signaling pathway in influenza vaccine-related GBS. CONCLUSIONS The unique and common gene sets, pathways, immune system categories and functional gene clusters found in four autoimmune diseases suggest that it is possible to develop molecular classifications of autoimmune and inflammatory events. Combining this information with cellular and other disease responses should greatly aid in the assessment of potential immune-mediated adverse events following vaccination.
9
UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2014; 31:926-32. [PMID: 25398609 PMCID: PMC4375400 DOI: 10.1093/bioinformatics/btu739] [Citation(s) in RCA: 898] [Impact Index Per Article: 89.8] [Indexed: 11/15/2022]
Abstract
MOTIVATION UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. RESULTS Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (∼7 times shorter hit list before expansion), faster (∼6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation.
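The search-then-expand strategy described above can be sketched as follows, assuming the BLASTP hits against UniRef50 representatives have already been parsed into (cluster ID, e-value) pairs and that a cluster-to-members table is available. The function name, data layout, and placeholder IDs are hypothetical, and letting every member inherit its representative's e-value is a simplifying assumption, not necessarily how the paper scored expanded hits.

```python
from typing import Dict, List, Tuple


def expand_hits(
    hits: List[Tuple[str, float]],
    members: Dict[str, List[str]],
    max_evalue: float = 1e-4,
) -> List[Tuple[str, str, float]]:
    """Expand cluster-level hits into member-level rows.

    hits: (cluster_id, e-value) pairs from a search against cluster representatives.
    members: cluster_id -> member accessions.
    Returns (accession, cluster_id, e-value) rows; each member inherits its
    cluster's e-value, and clusters are visited in ascending e-value order.
    """
    rows: List[Tuple[str, str, float]] = []
    seen = set()
    for cluster_id, evalue in sorted(hits, key=lambda hit: hit[1]):
        if evalue >= max_evalue:
            continue  # keep only hits below the e-value cutoff
        for accession in members.get(cluster_id, []):
            if accession not in seen:  # skip duplicates if a cluster is hit twice
                seen.add(accession)
                rows.append((accession, cluster_id, evalue))
    return rows
```

Searching the much smaller set of representatives first and expanding afterwards is what keeps the initial hit list short; the expansion itself is a cheap table lookup.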
10
Informatics and data quality at collaborative multicenter Breast and Colon Cancer Family Registries. J Am Med Inform Assoc 2012; 19:e125-8. [PMID: 22323393 DOI: 10.1136/amiajnl-2011-000546] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/04/2022]
Abstract
Quality control and harmonization of data are a vital and challenging undertaking for any successful data coordination center and a responsibility shared among the multiple sites that produce, integrate, and utilize the data. Here we describe a coordinated effort between scientists and data managers in the Cancer Family Registries to implement a data governance infrastructure consisting of both organizational and technical solutions. The technical solution uses a rule-based validation system that facilitates error detection and correction for data centers submitting data to a central informatics database. Validation rules comprise both standard checks on allowable values and a crosscheck of related database elements for logical and scientific consistency. Evaluation over a 2-year timeframe showed a significant decrease in the number of errors in the database and a concurrent increase in data consistency and accuracy.
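A rule-based validator combining the two kinds of rules described (standard allowable-value checks and crosschecks between related elements) might look like the sketch below. The field names, codes, and rules are hypothetical illustrations, not the registries' actual schema or validation system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Record = Dict[str, Any]


@dataclass
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


def allowed_values(field: str, allowed: Set[Any]) -> Rule:
    """Standard check: the field's value must come from a fixed set."""
    return Rule(f"{field} has an allowed value", lambda r: r.get(field) in allowed)


def validate(record: Record, rules: List[Rule]) -> List[str]:
    """Return the names of all rules the record violates, in rule order."""
    return [rule.name for rule in rules if not rule.check(record)]


# Hypothetical rules for illustration; field names and codes are assumptions.
RULES = [
    allowed_values("sex", {"F", "M", "U"}),
    # Crosscheck between related elements (assumes both fields are present).
    Rule(
        "diagnosis year not before birth year",
        lambda r: r["diagnosis_year"] >= r["birth_year"],
    ),
]
```

Returning every violated rule, rather than stopping at the first, gives a submitting site a complete error report per record to correct in one pass.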
11
Abstract
MOTIVATION Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY www.uniprot.org/mapping and proteininformationresource.org/pirwww/search/idmapping.shtml CONTACT huang@dbi.udel.edu.
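Protein-centric ID mapping can be pictured as routing every lookup through UniProtKB accessions: a source ID is resolved to one or more accessions, and those accessions are resolved to IDs in the target database. The sketch below assumes the cross-reference tables are available as (external ID, UniProtKB accession) pairs; the function and the placeholder identifiers are hypothetical and do not reflect the UniProt or PIR service APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (external_id, uniprot_accession)


def map_ids(query: List[str], source: Iterable[Pair], target: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Map source-database IDs to target-database IDs through UniProtKB accessions."""
    source_to_acc: Dict[str, Set[str]] = defaultdict(set)
    for external_id, accession in source:
        source_to_acc[external_id].add(accession)

    acc_to_target: Dict[str, Set[str]] = defaultdict(set)
    for external_id, accession in target:
        acc_to_target[accession].add(external_id)

    result: Dict[str, Set[str]] = {}
    for qid in query:
        targets: Set[str] = set()
        # One source ID may map to several accessions (one-to-many mapping).
        for accession in source_to_acc.get(qid, set()):
            targets |= acc_to_target.get(accession, set())
        result[qid] = targets  # empty set when no mapping exists
    return result
```

Using the protein accession as the hub means each new database needs only one cross-reference table to become mappable to all the others.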
12
Protein Information Resource: a community resource for expert annotation of protein data. Nucleic Acids Res 2001; 29:29-32. [PMID: 11125041 PMCID: PMC29802 DOI: 10.1093/nar/29.1.29] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.3] [Received: 10/02/2000] [Accepted: 10/04/2000] [Indexed: 11/13/2022]
Abstract
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200,000 non-redundant protein sequences, which are classified into families and superfamilies, with their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
13
Abstract
The Protein Information Resource (PIR) has greatly expanded its Web site and developed a set of interactive search and analysis tools to facilitate the analysis, annotation, and functional identification of proteins. New search engines have been implemented to combine sequence similarity search results with database annotation information. The new PIR search systems have proved very useful in providing enriched functional annotation of protein sequences, determining protein superfamily-domain relationships, and detecting annotation errors in genomic database archives. AVAILABILITY http://pir.georgetown.edu/. CONTACT mcgarvey@nbrf.georgetown.edu
14
Abstract
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
15
Abstract
The Protein Information Resource (PIR; http://www-nbrf.georgetown.edu/pir/) supports research on molecular evolution, functional genomics, and computational biology by maintaining a comprehensive, non-redundant, well-organized and freely available protein sequence database. Since 1988 the database has been maintained collaboratively by PIR-International, an international association of data collection centers cooperating to develop this resource during a period of explosive growth in new sequence data and new computer technologies. The PIR Protein Sequence Database entries are classified into superfamilies, families and homology domains, for which sequence alignments are available. Full-scale family classification supports comparative genomics research, aids sequence annotation, assists database organization and improves database integrity. The PIR WWW server supports direct on-line sequence similarity searches, information retrieval, and knowledge discovery by providing the Protein Sequence Database and other supplementary databases. Sequence entries are extensively cross-referenced and hypertext-linked to major nucleic acid, literature, genome, structure, sequence alignment and family databases. The weekly release of the Protein Sequence Database can be accessed through the PIR Web site. The quarterly release of the database is freely available from our anonymous FTP server and is also available on CD-ROM with the accompanying ATLAS database search program.
16
Expression of the rabies virus glycoprotein in transgenic tomatoes. BIO/TECHNOLOGY (NATURE PUBLISHING COMPANY) 1995; 13:1484-7. [PMID: 9636308 DOI: 10.1038/nbt1295-1484] [Citation(s) in RCA: 161] [Impact Index Per Article: 5.6] [Indexed: 02/07/2023]
Abstract
We have engineered tomato plants (Lycopersicon esculentum Mill var. UC82b) to express a gene for the glycoprotein (G-protein), which coats the outer surface of the rabies virus. The recombinant constructs contained the G-protein gene from the ERA strain of rabies virus, including the signal peptide, under the control of the 35S promoter of cauliflower mosaic virus. Plants were transformed by Agrobacterium tumefaciens-mediated transformation of cotyledons and tissue culture on selective media. PCR confirmed the presence of the G-protein gene in plants surviving selection. Northern blot analysis indicated that RNA of the appropriate molecular weight was produced in both leaves and fruit of the transgenic plants. The recombinant G-protein was immunoprecipitated and detected by Western blot from leaves and fruit using different antisera. The G-protein expressed in tomato appeared as two distinct bands with apparent molecular mass of 62 and 60 kDa as compared to the 66 kDa observed for G-protein from virus grown in BHK cells. Electron microscopy of leaf tissue using immunogold-labeling and antisera specific for rabies G-protein showed localization of the G-protein to the Golgi bodies, vesicles, plasmalemma and cell walls of vascular parenchyma cells. In light of our previous demonstration that orally administered rabies G-protein from the same ERA strain elicits protective immunity in animals, these transgenic plants should provide a valuable tool for the development of edible oral vaccines.
17
Transformed tomato plants express a satellite RNA of cucumber mosaic virus and produce lethal necrosis upon infection with viral RNA. Biochem Biophys Res Commun 1990; 170:548-55. [PMID: 1696472 DOI: 10.1016/0006-291x(90)92126-k] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.2] [Indexed: 12/28/2022]
Abstract
Tomato plants transformed with a single copy of a tomato necrosis-causing satellite RNA of cucumber mosaic virus (CMV) express the satellite sequence, but the plants show no disease symptoms and have a normal appearance. Upon challenge infection of the F1 progeny with a CMV strain free of any detectable encapsidated satellite, the plants accumulated single- and double-stranded forms of satellite RNA and developed lethal necrosis.
18
Activation of N-hydroxyphenacetin to mutagenic and nucleic acid-binding metabolites by acyltransfer, deacylation, and sulfate conjugation. Cancer Res 1981; 41:3424-9. [PMID: 7020926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
Abstract
N-Hydroxyphenacetin was activated to a mutagen in the Salmonella-Ames test by rabbit liver acyltransferase, rat liver cytosol, and rat liver microsomes. N-[ring-³H]Hydroxyphenacetin was bound to transfer RNA when activated by acyltransferase from rabbit or rat liver or by rat liver microsomes. The acyltransferase-catalyzed binding was not inhibited by paraoxon, a deacetylase inhibitor. The use of N-hydroxyphenacetin radioactively labeled in the acetyl group, as well as the ring, indicated that deacetylation was involved in the microsome-catalyzed binding reaction. In addition, the microsome-catalyzed binding was inhibited 90% by paraoxon. p-Nitrosophenetole, a deacetylated derivative of N-hydroxyphenacetin, was synthesized and bound to transfer RNA without enzymatic activation. Activation of N-hydroxyphenacetin by sulfate conjugation was also found to lead to binding to transfer RNA. The data implicated acyl transfer, deacetylation, and sulfate conjugation as possible routes for the activation of N-hydroxyphenacetin.