1
|
Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 2007; 24:1596-9. [PMID: 17488738 DOI: 10.1093/molbev/msm092] [Citation(s) in RCA: 19658] [Impact Index Per Article: 1092.1] [Indexed: 12/22/2022]
Abstract
We announce the release of the fourth version of MEGA software, which expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses. Version 4 includes a unique facility to generate captions, written in figure legend format, in order to provide natural language descriptions of the models and methods used in the analyses. This facility aims to promote a better understanding of the underlying assumptions used in analyses, and of the results generated. Another new feature is the Maximum Composite Likelihood (MCL) method for estimating evolutionary distances between all pairs of sequences simultaneously, with and without incorporating rate variation among sites and substitution pattern heterogeneities among lineages. This MCL method also can be used to estimate transition/transversion bias and nucleotide substitution pattern without knowledge of the phylogenetic tree. This new version is a native 32-bit Windows application with multi-threading and multi-user supports, and it is also available to run in a Linux desktop environment (via the Wine compatibility layer) and on Intel-based Macintosh computers under the Parallels program. The current version of MEGA is available free of charge at (http://www.megasoftware.net).
|
Research Support, Non-U.S. Gov't |
18 |
19658 |
2
|
Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005; 21:3674-6. [PMID: 16081474 DOI: 10.1093/bioinformatics/bti610] [Citation(s) in RCA: 8340] [Impact Index Per Article: 417.0] [Indexed: 11/13/2022]
Abstract
SUMMARY We present here Blast2GO (B2G), a research tool designed with the main purpose of enabling Gene Ontology (GO) based data mining on sequence data for which no GO annotation is yet available. B2G joins in one application GO annotation based on similarity searches with statistical analysis and highlighted visualization on directed acyclic graphs. This tool offers a suitable platform for functional genomics research in non-model species. B2G is an intuitive and interactive desktop application that allows monitoring and comprehension of the whole annotation and analysis process. AVAILABILITY Blast2GO is freely available via Java Web Start at http://www.blast2go.de. SUPPLEMENTARY MATERIAL http://www.blast2go.de -> Evaluation.
|
Research Support, Non-U.S. Gov't |
20 |
8340 |
3
|
Kumar S, Tamura K, Nei M. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 2005; 5:150-63. [PMID: 15260895 DOI: 10.1093/bib/5.2.150] [Citation(s) in RCA: 8079] [Impact Index Per Article: 404.0] [Indexed: 02/06/2023]
Abstract
With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.
|
Research Support, U.S. Gov't, P.H.S. |
20 |
8079 |
4
|
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009; 25:1189-91. [PMID: 19151095 PMCID: PMC2672624 DOI: 10.1093/bioinformatics/btp033] [Citation(s) in RCA: 7181] [Impact Index Per Article: 448.8] [Received: 11/24/2008] [Revised: 11/24/2008] [Accepted: 01/08/2009] [Indexed: 12/11/2022]
Abstract
Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server. AVAILABILITY The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org.
|
research-article |
16 |
7181 |
5
|
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 2009; 37:W202-8. [PMID: 19458158 PMCID: PMC2703892 DOI: 10.1093/nar/gkp335] [Citation(s) in RCA: 6817] [Impact Index Per Article: 426.1] [Received: 02/10/2009] [Revised: 04/10/2009] [Accepted: 04/21/2009] [Indexed: 11/13/2022]
Abstract
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.
|
Research Support, N.I.H., Extramural |
16 |
6817 |
6
|
The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 2019; 47:D506-D515. [PMID: 30395287 PMCID: PMC6323992 DOI: 10.1093/nar/gky1049] [Citation(s) in RCA: 5175] [Impact Index Per Article: 1035.0] [Received: 09/14/2018] [Accepted: 10/18/2018] [Indexed: 12/14/2022]
Abstract
The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life. Detailed annotations extracted from the literature by expert curators have been collected for over half a million of these proteins. These annotations are supplemented by annotations provided by rule based automated systems, and those imported from other resources. In this article we describe significant updates that we have made over the last 2 years to the resource. We have greatly expanded the number of Reference Proteomes that we provide and in particular we have focussed on improving the number of viral Reference Proteomes. The UniProt website has been augmented with new data visualizations for the subcellular localization of proteins as well as their structure and interactions. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
|
Research Support, Non-U.S. Gov't |
5 |
5175 |
7
|
Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüssmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer KH. ARB: a software environment for sequence data. Nucleic Acids Res 2004; 32:1363-71. [PMID: 14985472 PMCID: PMC390282 DOI: 10.1093/nar/gkh293] [Citation(s) in RCA: 4658] [Impact Index Per Article: 221.8] [Received: 01/13/2004] [Revised: 01/28/2004] [Accepted: 01/28/2004] [Indexed: 11/12/2022]
Abstract
The ARB (from Latin arbor, tree) project was initiated almost 10 years ago. The ARB program package comprises a variety of directly interacting software tools for sequence database maintenance and analysis which are controlled by a common graphical user interface. Although it was initially designed for ribosomal RNA data, it can be used for any nucleic and amino acid sequence data as well. A central database contains processed (aligned) primary structure data. Any additional descriptive data can be stored in database fields assigned to the individual sequences or linked via local or worldwide networks. A phylogenetic tree visualized in the main window can be used for data access and visualization. The package comprises additional tools for data import and export, sequence alignment, primary and secondary structure editing, profile and filter calculation, phylogenetic analyses, specific hybridization probe design and evaluation and other components for data analysis. Currently, the package is used by numerous working groups worldwide.
|
research-article |
21 |
4658 |
8
|
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res 2014; 42:D222-30. [PMID: 24288371 PMCID: PMC3965110 DOI: 10.1093/nar/gkt1223] [Citation(s) in RCA: 4487] [Impact Index Per Article: 407.9] [Received: 09/26/2013] [Revised: 11/04/2013] [Accepted: 11/05/2013] [Indexed: 01/17/2023]
Abstract
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
|
research-article |
11 |
4487 |
9
|
Schwede T, Kopp J, Guex N, Peitsch MC. SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res 2003; 31:3381-5. [PMID: 12824332 PMCID: PMC168927 DOI: 10.1093/nar/gkg520] [Citation(s) in RCA: 4224] [Impact Index Per Article: 192.0] [Indexed: 11/13/2022]
Abstract
SWISS-MODEL (http://swissmodel.expasy.org) is a server for automated comparative modeling of three-dimensional (3D) protein structures. It pioneered the field of automated modeling starting in 1993 and is the most widely-used free web-based automated modeling facility today. In 2002 the server computed 120 000 user requests for 3D protein models. SWISS-MODEL provides several levels of user interaction through its World Wide Web interface: in the 'first approach mode' only an amino acid sequence of a protein is submitted to build a 3D model. Template selection, alignment and model building are fully automated by the server. In the 'alignment mode', the modeling process is based on a user-defined target-template alignment. Complex modeling tasks can be handled with the 'project mode' using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are sent back via email with a detailed modeling report. WhatCheck analyses and ANOLEA evaluations are provided optionally. The reliability of SWISS-MODEL is continuously evaluated in the EVA-CM project. The SWISS-MODEL server is under constant development to improve the successful implementation of expert knowledge into an easy-to-use server.
|
research-article |
22 |
4224 |
10
|
Abstract
UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists in the database with the majority of sequences coming from genome sequencing projects. We have created a new proteome identifier that uniquely identifies a particular assembly of a species and strain or subspecies to help users track the provenance of sequences. We present a new website that has been designed using a user-experience design process. We have introduced an annotation score for all entries in UniProt to represent the relative amount of knowledge known about each protein. These scores will be helpful in identifying which proteins are the best characterized and most informative for comparative analysis. All UniProt data is provided freely and is available on the web at http://www.uniprot.org/.
|
Research Support, U.S. Gov't, Non-P.H.S. |
11 |
3642 |
11
|
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 2019; 47:W636-W641. [PMID: 30976793 PMCID: PMC6602479 DOI: 10.1093/nar/gkz268] [Citation(s) in RCA: 3042] [Impact Index Per Article: 507.0] [Received: 01/31/2019] [Revised: 03/22/2019] [Accepted: 04/03/2019] [Indexed: 02/07/2023]
Abstract
The EMBL-EBI provides free access to popular bioinformatics sequence analysis applications as well as to a full-featured text search engine with powerful cross-referencing and data retrieval capabilities. Access to these services is provided via user-friendly web interfaces and via established RESTful and SOAP Web Services APIs (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). Both systems have been developed with the same core principles that allow them to integrate an ever-increasing volume of biological data, making them an integral part of many popular data resources provided at the EMBL-EBI. Here, we describe the latest improvements made to the frameworks which enhance the interconnectivity between public EMBL-EBI resources and ultimately enhance biological data discoverability, accessibility, interoperability and reusability.
|
research-article |
6 |
3042 |
12
|
Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Res 2015; 43:W39-49. [PMID: 25953851 PMCID: PMC4489269 DOI: 10.1093/nar/gkv416] [Citation(s) in RCA: 2709] [Impact Index Per Article: 270.9] [Received: 01/31/2015] [Revised: 04/10/2015] [Accepted: 04/18/2015] [Indexed: 11/13/2022]
Abstract
The MEME Suite is a powerful, integrated set of web-based tools for studying sequence motifs in proteins, DNA and RNA. Such motifs encode many biological functions, and their detection and characterization is important in the study of molecular interactions in the cell, including the regulation of gene expression. Since the previous description of the MEME Suite in the 2009 Nucleic Acids Research Web Server Issue, we have added six new tools. Here we describe the capabilities of all the tools within the suite, give advice on their best use and provide several case studies to illustrate how to combine the results of various MEME Suite tools for successful motif-based analyses. The MEME Suite is freely available for academic use at http://meme-suite.org, and source code is also available for download and local installation.
|
Research Support, N.I.H., Extramural |
10 |
2709 |
13
|
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. InterProScan: protein domains identifier. Nucleic Acids Res 2005; 33:W116-20. [PMID: 15980438 PMCID: PMC1160203 DOI: 10.1093/nar/gki442] [Citation(s) in RCA: 2158] [Impact Index Per Article: 107.9] [Indexed: 11/26/2022]
Abstract
InterProScan [E. M. Zdobnov and R. Apweiler (2001) Bioinformatics, 17, 847–848] is a tool that combines different protein signature recognition methods from the InterPro [N. J. Mulder, R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bradley, P. Bork, P. Bucher, L. Cerutti et al. (2005) Nucleic Acids Res., 33, D201–D205] consortium member databases into one resource. At the time of writing there are 10 distinct publicly available databases in the application. Protein as well as DNA sequences can be analysed. A web-based version is accessible for academic and commercial organizations from the EBI (). In addition, a standalone Perl version and a SOAP Web Service [J. Snell, D. Tidwell and P. Kulchenko (2001) Programming Web Services with SOAP, 1st edn. O'Reilly Publishers, Sebastopol, CA] are also available to the users. Various output formats are supported and include text tables, XML documents, as well as various graphs to help interpret the results.
|
Research Support, Non-U.S. Gov't |
20 |
2158 |
14
|
Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 2006; 34:W609-12. [PMID: 16845082 PMCID: PMC1538804 DOI: 10.1093/nar/gkl315] [Citation(s) in RCA: 2157] [Impact Index Per Article: 113.5] [Indexed: 11/22/2022]
Abstract
PAL2NAL is a web server that constructs a multiple codon alignment from the corresponding aligned protein sequences. Such codon alignments can be used to evaluate the type and rate of nucleotide substitutions in coding DNA for a wide range of evolutionary analyses, such as the identification of levels of selective constraint acting on genes, or to perform DNA-based phylogenetic studies. The server takes a protein sequence alignment and the corresponding DNA sequences as input. In contrast to other existing applications, this server is able to construct codon alignments even if the input DNA sequence has mismatches with the input protein sequence, or contains untranslated regions and polyA tails. The server can also deal with frame shifts and inframe stop codons in the input models, and is thus suitable for the analysis of pseudogenes. Another distinct feature is that the user can specify a subregion of the input alignment in order to specifically analyze functional domains or exons of interest. The PAL2NAL server is available at .
|
Research Support, Non-U.S. Gov't |
19 |
2157 |
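The core idea in the PAL2NAL abstract above, turning an aligned protein row plus its coding DNA into a codon alignment row, can be illustrated with a minimal sketch. This is not the PAL2NAL implementation: the function name `thread_codons` is hypothetical, the coding sequence is assumed to be in frame and to start at the first residue, and none of the cases the server is described as handling (mismatches, untranslated regions, polyA tails, frame shifts, in-frame stop codons) are treated here.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Place the codons of an ungapped coding sequence under a gapped protein row.

    Simplifying assumptions (not PAL2NAL's behaviour): the coding sequence is
    in frame, starts at the first residue, and is not checked against the
    protein by translation. Extra trailing nucleotides (for example a stop
    codon) are ignored.
    """
    residue_count = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residue_count:
        raise ValueError("coding sequence is too short for the protein row")

    pieces = []
    position = 0  # index of the next unused nucleotide in cds
    for aa in aligned_protein:
        if aa == gap:
            # One protein gap column becomes three nucleotide gap columns.
            pieces.append(gap * 3)
        else:
            pieces.append(cds[position:position + 3])
            position += 3
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical two-sequence alignment used only for illustration.
    protein_rows = {"seq1": "MK-L", "seq2": "M-RL"}
    coding = {"seq1": "ATGAAACTG", "seq2": "ATGCGTCTGTAA"}
    for name, row in protein_rows.items():
        print(name, thread_codons(row, coding[name]))
```

Each protein column maps to exactly three nucleotide columns, so the codon rows keep the same relative alignment as the protein rows.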
15
|
Abstract
We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel in multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. AVAILABILITY ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). CONTACT dposada@uvigo.es SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
Evaluation Study |
14 |
2078 |
16
|
Fredriksson R, Lagerström MC, Lundin LG, Schiöth HB. The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol Pharmacol 2003; 63:1256-72. [PMID: 12761335 DOI: 10.1124/mol.63.6.1256] [Citation(s) in RCA: 2074] [Impact Index Per Article: 94.3] [Indexed: 12/29/2022]
Abstract
The superfamily of G-protein-coupled receptors (GPCRs) is very diverse in structure and function and its members are among the most pursued targets for drug development. We identified more than 800 human GPCR sequences and simultaneously analyzed 342 unique functional nonolfactory human GPCR sequences with phylogenetic analyses. Our results show, with high bootstrap support, five main families, named glutamate, rhodopsin, adhesion, frizzled/taste2, and secretin, forming the GRAFS classification system. The rhodopsin family is the largest and forms four main groups with 13 sub-branches. Positions of the GPCRs in chromosomal paralogons regions indicate the importance of tetraploidizations or local gene duplication events for their creation. We also searched for "fingerprint" motifs using Hidden Markov Models delineating the putative inter-relationship of the GRAFS families. We show several common structural features indicating that the human GPCRs in the GRAFS families share a common ancestor. This study represents the first overall map of the GPCRs in a single mammalian genome. Our novel approach of analyzing such large and diverse sequence sets may be useful for studies on GPCRs in other genomes and divergent protein families.
MESH Headings
- Chromosome Mapping
- GTP-Binding Proteins/classification
- GTP-Binding Proteins/genetics
- Genome, Human
- Humans
- Membrane Glycoproteins
- Membrane Proteins/classification
- Membrane Proteins/genetics
- Phylogeny
- Platelet Glycoprotein GPIb-IX Complex
- Receptors, Cell Surface/classification
- Receptors, Cell Surface/genetics
- Receptors, G-Protein-Coupled
- Receptors, Gastrointestinal Hormone/classification
- Receptors, Gastrointestinal Hormone/genetics
- Receptors, Glutamate/classification
- Receptors, Glutamate/genetics
- Rhodopsin/classification
- Rhodopsin/genetics
- Sequence Analysis, Protein
- Sequence Homology, Amino Acid
|
|
22 |
2074 |
17
|
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer ELL, Bateman A. The Pfam protein families database. Nucleic Acids Res 2008; 36:D281-8. [PMID: 18039703 PMCID: PMC2238907 DOI: 10.1093/nar/gkm960] [Citation(s) in RCA: 1709] [Impact Index Per Article: 100.5] [Received: 09/15/2007] [Revised: 10/10/2007] [Accepted: 10/16/2007] [Indexed: 12/14/2022]
Abstract
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).
|
research-article |
17 |
1709 |
18
|
Zimmermann L, Stephens A, Nam SZ, Rau D, Kübler J, Lozajic M, Gabler F, Söding J, Lupas AN, Alva V. A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. J Mol Biol 2017; 430:2237-2243. [PMID: 29258817 DOI: 10.1016/j.jmb.2017.12.007] [Citation(s) in RCA: 1694] [Impact Index Per Article: 211.8] [Received: 11/04/2017] [Revised: 12/10/2017] [Accepted: 12/11/2017] [Indexed: 12/12/2022]
Abstract
The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) is a free, one-stop web service for protein bioinformatic analysis. It currently offers 34 interconnected external and in-house tools, whose functionality covers sequence similarity searching, alignment construction, detection of sequence features, structure prediction, and sequence classification. This breadth has made the Toolkit an important resource for experimental biology and for teaching bioinformatic inquiry. Recently, we replaced the first version of the Toolkit, which was released in 2005 and had served around 2.5 million queries, with an entirely new version, focusing on improved features for the comprehensive analysis of proteins, as well as on promoting teaching. For instance, our popular remote homology detection server, HHpred, now allows pairwise comparison of two sequences or alignments and offers additional profile HMMs for several model organisms and domain databases. Here, we introduce the new version of our Toolkit and its application to the analysis of proteins.
|
Research Support, Non-U.S. Gov't |
8 |
1694 |
19
|
Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, Mann M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 2005; 4:1265-72. [PMID: 15958392 DOI: 10.1074/mcp.m500061-mcp200] [Citation(s) in RCA: 1646] [Impact Index Per Article: 82.3] [Indexed: 11/06/2022]
Abstract
To estimate absolute protein contents in complex mixtures, we previously defined a protein abundance index (PAI) as the number of observed peptides divided by the number of observable peptides per protein (Rappsilber, J., Ryder, U., Lamond, A. I., and Mann, M. (2002) Large-scale proteomic analysis of the human spliceosome. Genome. Res. 12, 1231-1245). Here we report that PAI values obtained at different concentrations of serum albumin show a linear relationship with the logarithm of protein concentration in LC-MS/MS experiments. This was also the case for 46 proteins in a mouse whole cell lysate. For absolute quantitation, PAI was converted to exponentially modified PAI (emPAI), equal to 10^PAI minus one, which is proportional to protein content in a protein mixture. For the 46 proteins in the whole lysate, the deviation percentages of the emPAI-based abundances from the actual values were within 63% on average, similar or better than determination of abundance by protein staining. emPAI was applied to comprehensive protein expression analysis and to a comparison study between gene and protein expression in a human cancer cell line, HCT116. The values of emPAI are easily calculated and add important quantitation information to proteomic experiments; therefore we suggest that they should be reported in large scale proteomic identification projects.
|
Research Support, Non-U.S. Gov't |
20 |
1646 |
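The emPAI definition in the Ishihama et al. abstract above (PAI is observed peptides divided by observable peptides; emPAI is 10 raised to the power PAI, minus one) is simple enough to sketch in a few lines. The function names, the example proteins and their peptide counts are all hypothetical, and normalising emPAI values to fractions of their sum is only one illustrative reading of "proportional to protein content", not a procedure stated in the abstract.

```python
from typing import Dict


def pai(observed_peptides: int, observable_peptides: int) -> float:
    """Protein abundance index: observed peptides / observable peptides."""
    if observable_peptides <= 0:
        raise ValueError("observable_peptides must be positive")
    return observed_peptides / observable_peptides


def empai(pai_value: float) -> float:
    """Exponentially modified PAI: 10 ** PAI - 1."""
    return 10.0 ** pai_value - 1.0


def content_fractions(empai_by_protein: Dict[str, float]) -> Dict[str, float]:
    """Scale emPAI values so they sum to 1 (assumed illustration only)."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("sum of emPAI values must be positive")
    return {name: value / total for name, value in empai_by_protein.items()}


if __name__ == "__main__":
    # Hypothetical (observed, observable) peptide counts per protein.
    counts = {"protein_A": (8, 16), "protein_B": (3, 12), "protein_C": (1, 20)}
    scores = {name: empai(pai(obs, total)) for name, (obs, total) in counts.items()}
    for name, fraction in content_fractions(scores).items():
        print(f"{name}: emPAI={scores[name]:.3f} fraction={fraction:.3f}")
```

Because emPAI grows exponentially with PAI, a protein with no observed peptides scores 0, and each increase of one unit in PAI multiplies (emPAI + 1) by ten.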
20
|
|
Review |
24 |
1593 |
21
|
Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res 2004; 32:W327-31. [PMID: 15215404 PMCID: PMC441592 DOI: 10.1093/nar/gkh454] [Citation(s) in RCA: 1518] [Impact Index Per Article: 72.3] [Received: 02/19/2004] [Revised: 04/21/2004] [Accepted: 04/21/2004] [Indexed: 11/14/2022]
Abstract
We describe the Conserved Domain Search service (CD-Search), a web-based tool for the detection of structural and functional domains in protein sequences. CD-Search uses BLAST(R) heuristics to provide a fast, interactive service, and searches a comprehensive collection of domain models. Search results are displayed as domain architecture cartoons and pairwise alignments between the query and domain-model consensus sequences. Search results may be visualized in further detail by embedding the query sequence into multiple alignment displays and by mapping onto three-dimensional molecular graphic displays of known structures within the domain family. CD-Search can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi.
|
research-article |
21 |
1518 |
22
|
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJA, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. InterPro: the integrative protein signature database. Nucleic Acids Res 2009; 37:D211-5. [PMID: 18940856 PMCID: PMC2686546 DOI: 10.1093/nar/gkn785] [Citation(s) in RCA: 1511] [Impact Index Per Article: 94.4] [Received: 09/15/2008] [Revised: 10/08/2008] [Accepted: 10/09/2008] [Indexed: 11/13/2022]
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
|
Research Support, N.I.H., Extramural |
16 |
1511 |
23
|
Kim DE, Chivian D, Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 2004; 32:W526-31. [PMID: 15215442 PMCID: PMC441606 DOI: 10.1093/nar/gkh468] [Citation(s) in RCA: 1505] [Impact Index Per Article: 71.7] [Indexed: 11/14/2022] Open
Abstract
The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein structure prediction and analysis. For structure prediction, sequences submitted to the server are parsed into putative domains and structural models are generated using either comparative modeling or de novo structure prediction methods. If a confident match to a protein of known structure is found using BLAST, PSI-BLAST, FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is found, structure predictions are made using the de novo Rosetta fragment insertion method. Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted with a query sequence for RosettaNMR de novo structure determination. Other current capabilities include the prediction of the effects of mutations on protein-protein interactions using computational interface alanine scanning. The Rosetta protein design and protein-protein docking methodologies will soon be available through the server as well.
|
Journal Article |
21 |
1505 |
24
|
Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res 2016; 44:D286-93. [PMID: 26582926 PMCID: PMC4702882 DOI: 10.1093/nar/gkv1248] [Citation(s) in RCA: 1467] [Impact Index Per Article: 163.0] [Received: 09/14/2015] [Revised: 10/30/2015] [Accepted: 11/02/2015] [Indexed: 01/19/2023] Open
Abstract
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.
|
research-article |
9 |
1467 |
25
|
Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD. HMMER web server: 2018 update. Nucleic Acids Res 2018; 46:W200-W204. [PMID: 29905871 PMCID: PMC6030962 DOI: 10.1093/nar/gky448] [Citation(s) in RCA: 1415] [Impact Index Per Article: 202.1] [Received: 03/13/2018] [Revised: 04/18/2018] [Accepted: 06/12/2018] [Indexed: 12/25/2022] Open
Abstract
The HMMER webserver [http://www.ebi.ac.uk/Tools/hmmer] is a free-to-use service which provides fast searches against widely used sequence databases and profile hidden Markov model (HMM) libraries using the HMMER software suite (http://hmmer.org). The results of a sequence search may be summarized in a number of ways, allowing users to view and filter the significant hits by domain architecture or taxonomy. For large scale usage, we provide an application programmatic interface (API) which has been expanded in scope, such that all result presentations are available via both HTML and API. Furthermore, we have refactored our JavaScript visualization library to provide standalone components for different result representations. These consume the aforementioned API and can be integrated into third-party websites. The range of databases that can be searched against has been expanded, adding four sequence datasets (12 in total) and one profile HMM library (6 in total). To help users explore the biological context of their results, and to discover new data resources, search results are now supplemented with cross references to other EMBL-EBI databases.
|
Research Support, N.I.H., Extramural |
7 |
1415 |