1
Ireland SM, Martin ACR. GraphQL for the delivery of bioinformatics web APIs and application to ZincBind. BIOINFORMATICS ADVANCES 2021; 1:vbab023. [PMID: 35585947 PMCID: PMC9108989 DOI: 10.1093/bioadv/vbab023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Revised: 09/06/2021] [Accepted: 09/23/2021] [Indexed: 01/27/2023]
Abstract
Motivation: Many bioinformatics resources are provided as 'web services', with large databases and analysis software stored on a central server, and clients interacting with them using the Hypertext Transfer Protocol (HTTP). While some provide only a visual HTML interface, requiring a web browser to use them, many provide programmatic access using a web application programming interface (API) which returns XML, JSON or plain text that computer programs can interpret more easily. This allows access to be automated. Initially, many bioinformatics APIs used the 'simple object access protocol' (SOAP) and, more recently, representational state transfer (REST). Results: GraphQL is a novel, increasingly prevalent alternative to REST and SOAP that represents the available data in the form of a graph to which any conceivable query can be submitted, and which is seeing increasing adoption in industry. Here, we review the principles of GraphQL, outline its particular suitability to the delivery of bioinformatics resources and describe its implementation in our ZincBind resource. Availability and implementation: https://api.zincbind.net. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Sam M Ireland
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Andrew C R Martin
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- To whom correspondence should be addressed.
2
Jimenez-Lopez JC, Zafra A, Palanco L, Florido JF, Alché JDD. Identification and Assessment of the Potential Allergenicity of 7S Vicilins in Olive (Olea europaea L.) Seeds. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4946872. [PMID: 27034939 PMCID: PMC4789380 DOI: 10.1155/2016/4946872] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/16/2015] [Revised: 01/25/2016] [Accepted: 02/03/2016] [Indexed: 12/23/2022]
Abstract
Olive seeds, which are a raw material of interest, have been reported to contain 11S seed storage proteins (SSPs). However, the presence of SSPs such as 7S vicilins has not been studied. In this study, following a search in the olive seed transcriptome, 58 sequences corresponding to 7S vicilins were retrieved. A partial sequence was amplified by PCR from olive seed cDNA and subjected to phylogenetic analysis with other sequences. Structural analysis showed that olive 7S vicilin contains 9 α-helixes and 22 β-sheets. Additionally, 3D structural analysis displayed good superimposition with vicilin models generated from Pistacia and Sesamum. In order to assess potential allergenicity, T and B epitopes present in these proteins were identified by bioinformatic approaches. Different motifs were observed among the species, as well as some species-specific motifs. Finally, expression analysis of vicilins was carried out in protein extracts obtained from seeds of different species, including the olive. Noticeable bands were observed for all species in the 15-75 kDa MW interval, which were compatible with vicilins. The reactivity of the extracts to sera from patients allergic to nuts was also analysed. The findings with regard to the potential use of olive seed as food are discussed.
Affiliation(s)
- Jose C. Jimenez-Lopez
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, 18008 Granada, Spain
- Adoración Zafra
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, 18008 Granada, Spain
- Elayo Group, Castillo de Locubín, 23670 Jaén, Spain
- Lucía Palanco
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, 18008 Granada, Spain
- Juan de Dios Alché
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, 18008 Granada, Spain
3
McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res 2013; 41:W597-600. [PMID: 23671338 PMCID: PMC3692137 DOI: 10.1093/nar/gkt376] [Citation(s) in RCA: 1184] [Impact Index Per Article: 107.6] [Indexed: 01/25/2023]
Abstract
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
Affiliation(s)
- Hamish McWilliam
- EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK
4
You M, Yue Z, He W, Yang X, Yang G, Xie M, Zhan D, Baxter SW, Vasseur L, Gurr GM, Douglas CJ, Bai J, Wang P, Cui K, Huang S, Li X, Zhou Q, Wu Z, Chen Q, Liu C, Wang B, Li X, Xu X, Lu C, Hu M, Davey JW, Smith SM, Chen M, Xia X, Tang W, Ke F, Zheng D, Hu Y, Song F, You Y, Ma X, Peng L, Zheng Y, Liang Y, Chen Y, Yu L, Zhang Y, Liu Y, Li G, Fang L, Li J, Zhou X, Luo Y, Gou C, Wang J, Wang J, Yang H, Wang J. A heterozygous moth genome provides insights into herbivory and detoxification. Nat Genet 2013; 45:220-5. [PMID: 23313953 DOI: 10.1038/ng.2524] [Citation(s) in RCA: 378] [Impact Index Per Article: 34.4] [Received: 07/16/2012] [Accepted: 12/12/2012] [Indexed: 11/09/2022]
Abstract
How an insect evolves to become a successful herbivore is of profound biological and practical importance. Herbivores are often adapted to feed on a specific group of evolutionarily and biochemically related host plants, but the genetic and molecular bases for adaptation to plant defense compounds remain poorly understood. We report the first whole-genome sequence of a basal lepidopteran species, Plutella xylostella, which contains 18,071 protein-coding and 1,412 unique genes with an expansion of gene families associated with perception and the detoxification of plant defense compounds. A recent expansion of retrotransposons near detoxification-related genes and a wider system used in the metabolism of plant defense compounds are shown to also be involved in the development of insecticide resistance. This work shows the genetic and molecular bases for the evolutionary success of this worldwide herbivore and offers wider insights into insect adaptation to plant feeding, as well as opening avenues for more sustainable pest management.
Affiliation(s)
- Minsheng You
- Institute of Applied Ecology, Fujian Agriculture and Forestry University, Fuzhou, China.
5
Glykas M. Performance Measurement in Business Process, Workflow and Human Resource Management. KNOWLEDGE AND PROCESS MANAGEMENT 2011. [DOI: 10.1002/kpm.387] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Affiliation(s)
- Michael Glykas
- Financial Management Engineering; University of the Aegean; North Aegean Greece
6
Ayyadurai VAS, Dewey CF. CytoSolve: A Scalable Computational Method for Dynamic Integration of Multiple Molecular Pathway Models. Cell Mol Bioeng 2010; 4:28-45. [PMID: 21423324 PMCID: PMC3032229 DOI: 10.1007/s12195-010-0143-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 05/03/2010] [Accepted: 10/04/2010] [Indexed: 11/26/2022]
Abstract
A grand challenge of computational systems biology is to create a molecular pathway model of the whole cell. Current approaches involve merging smaller molecular pathway models’ source codes to create a large monolithic model (computer program) that runs on a single computer. Such a larger model is difficult, if not impossible, to maintain given ongoing updates to the source codes of the smaller models. This paper describes a new system called CytoSolve that dynamically integrates computations of smaller models that can run in parallel across different machines without the need to merge the source codes of the individual models. This approach is demonstrated on the classic Epidermal Growth Factor Receptor (EGFR) model of Kholodenko. The EGFR model is split into four smaller models and each smaller model is distributed on a different machine. Results from four smaller models are dynamically integrated to generate identical results to the monolithic EGFR model running on a single machine. The overhead for parallel and dynamic computation is approximately twice that of a monolithic model running on a single machine. The CytoSolve approach provides a scalable method since smaller models may reside on any computer worldwide, where the source code of each model can be independently maintained and updated.
Affiliation(s)
- V. A. Shiva Ayyadurai
- Department of Biological Engineering, Massachusetts Institute of Technology, 3-237, 77 Massachusetts Avenue, Cambridge, MA 02138 USA
- International Center for Integrative Systems, 701 Concord Avenue, Cambridge, MA 02138 USA
- C. Forbes Dewey
- Department of Biological Engineering, Massachusetts Institute of Technology, 3-237, 77 Massachusetts Avenue, Cambridge, MA 02138 USA
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 3-254, 77 Massachusetts Avenue, Cambridge, MA 02138 USA
7
Pettifer S, Ison J, Kalas M, Thorne D, McDermott P, Jonassen I, Liaquat A, Fernández JM, Rodriguez JM, Pisano DG, Blanchet C, Uludag M, Rice P, Bartaseviciute E, Rapacki K, Hekkelman M, Sand O, Stockinger H, Clegg AB, Bongcam-Rudloff E, Salzemann J, Breton V, Attwood TK, Cameron G, Vriend G. The EMBRACE web service collection. Nucleic Acids Res 2010; 38:W683-8. [PMID: 20462862 PMCID: PMC2896104 DOI: 10.1093/nar/gkq297] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 02/02/2010] [Revised: 03/29/2010] [Accepted: 04/07/2010] [Indexed: 12/03/2022]
Abstract
The EMBRACE (European Model for Bioinformatics Research and Community Education) web service collection is the culmination of a 5-year project that set out to investigate issues involved in developing and deploying web services for use in the life sciences. The project concluded that in order for web services to achieve widespread adoption, standards must be defined for the choice of web service technology, for semantically annotating both service function and the data exchanged, and a mechanism for discovering services must be provided. Building on this, the project developed: EDAM, an ontology for describing life science web services; BioXSD, a schema for exchanging data between services; and a centralized registry (http://www.embraceregistry.net) that collects together around 1000 services developed by the consortium partners. This article presents the current status of the collection and its associated recommendations and standards definitions.
Affiliation(s)
- Steve Pettifer
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK.
8
Nelson RT, Avraham S, Shoemaker RC, May GD, Ware D, Gessler DDG. Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration. BioData Min 2010; 3:3. [PMID: 20525377 PMCID: PMC2894815 DOI: 10.1186/1756-0381-3-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/30/2009] [Accepted: 06/04/2010] [Indexed: 11/24/2022]
Abstract
BACKGROUND Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of data between information resources difficult and labor intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced "swap") offers the ability to describe data and services in a semantically meaningful way. We report how three major information resources (Gramene, SoyBase and the Legume Information System [LIS]) used SSWAP to semantically describe selected data and web services. METHODS We selected high-priority Quantitative Trait Locus (QTL), genomic mapping, trait, phenotypic, and sequence data and associated services such as BLAST for publication, data retrieval, and service invocation via semantic web services. Data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We used SSWAP to express these offerings in OWL Web Ontology Language (OWL), Resource Description Framework (RDF) and eXtensible Markup Language (XML) documents, which are appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data. These services are registered with the SSWAP Discovery Server and are available for semantic discovery at http://sswap.info. RESULTS A total of ten services delivering QTL information from Gramene were created. From SoyBase, we created six services delivering information about soybean QTLs, and seven services delivering genetic locus information. For LIS we constructed three services, two of which allow the retrieval of DNA and RNA FASTA sequences with the third service providing nucleic acid sequence comparison capability (BLAST). CONCLUSIONS The need for semantic integration technologies has preceded available solutions. 
We report the feasibility of mapping high priority data from local, independent, idiosyncratic data schemas to common shared concepts as implemented in web-accessible ontologies. These mappings are then amenable for use in semantic web services. Our implementation of approximately two dozen services means that biological data at three large information resources (Gramene, SoyBase, and LIS) is available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these resources.
Affiliation(s)
- Rex T Nelson
- USDA-ARS, CICGR, 100 Osborne Dr. Rm. 1575, Ames, IA, 50011-1010 USA
- Shulamit Avraham
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Gregory D May
- National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- USDA-ARS, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
9
Katayama T, Nakao M, Takagi T. TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Res 2010; 38:W706-11. [PMID: 20472643 PMCID: PMC2896079 DOI: 10.1093/nar/gkq386] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
Web services have become widely used in bioinformatics analysis, but there exist incompatibilities in interfaces and data types, which prevent users from making full use of a combination of these services. Therefore, we have developed the TogoWS service to provide an integrated interface with advanced features. In the TogoWS REST (REpresentational State Transfer) API (application programming interface), we introduce a unified access method for major database resources through intuitive URIs that can be used to search, retrieve, parse and convert the database entries. The TogoWS SOAP API resolves compatibility issues found on the server and client-side SOAP implementations. The TogoWS service is freely available at: http://togows.dbcls.jp/.
Affiliation(s)
- Toshiaki Katayama
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan.
10
Lamprecht AL, Margaria T, Steffen B. Bio-jETI: a framework for semantics-based service composition. BMC Bioinformatics 2009; 10 Suppl 10:S8. [PMID: 19796405 PMCID: PMC2755829 DOI: 10.1186/1471-2105-10-s10-s8] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 01/09/2023]
Abstract
BACKGROUND The development of bioinformatics databases, algorithms, and tools throughout the last years has led to a highly distributed world of bioinformatics services. Without adequate management and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even in this combination, in silico researchers currently would have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions. RESULTS In this paper, we demonstrate by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be graphically combined into complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI's model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developer may either modify his service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations. CONCLUSION We show the power of semantic annotations in an adequately modelled and semantically enabled domain setting.
Using model checking and synthesis methods, users may orchestrate complex processes from a wealth of heterogeneous services without worrying about interfaces and (type) consistency. The success of this method strongly depends on a careful semantic annotation of the provided services and on its consequent exploitation for analysis, validation, and synthesis. We are convinced that these annotations will become standard, as they will become preconditions for the success and widespread use of (preferred) services in the Semantic Web.
Affiliation(s)
- Anna-Lena Lamprecht
- Chair for Programming Systems, Dortmund University of Technology, Dortmund, D-44227 Germany
- Tiziana Margaria
- Chair for Service and Software Engineering, Potsdam University, Potsdam, D-14882 Germany
- Bernhard Steffen
- Chair for Programming Systems, Dortmund University of Technology, Dortmund, D-44227 Germany
11
Gessler DDG, Schiltz GS, May GD, Avraham S, Town CD, Grant D, Nelson RT. SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services. BMC Bioinformatics 2009; 10:309. [PMID: 19775460 PMCID: PMC2761904 DOI: 10.1186/1471-2105-10-309] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 09/24/2008] [Accepted: 09/23/2009] [Indexed: 11/30/2022]
Abstract
BACKGROUND SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies. RESULTS There are currently over 2400 resources published in SSWAP. Approximately two dozen are custom-written services for QTL (Quantitative Trait Loci) and mapping data for legumes and grasses (grains). The remaining are wrappers to Nucleic Acids Research Database and Web Server entries. As an architecture, SSWAP establishes how clients (users of data, services, and ontologies), providers (suppliers of data, services, and ontologies), and discovery servers (semantic search engines) interact to allow for the description, querying, discovery, invocation, and response of semantic web services. As a protocol, SSWAP provides the vocabulary and semantics to allow clients, providers, and discovery servers to engage in semantic web services. The protocol is based on the W3C-sanctioned first-order description logic language OWL DL. As an open source platform, a discovery server running at http://sswap.info (as in to "swap info") uses the description logic reasoner Pellet to integrate semantic resources. The platform hosts an interactive guide to the protocol at http://sswap.info/protocol.jsp, developer tools at http://sswap.info/developer.jsp, and a portal to third-party ontologies at http://sswapmeet.sswap.info (a "swap meet"). 
CONCLUSION SSWAP addresses the three basic requirements of a semantic web services architecture (i.e., a common syntax, shared semantic, and semantic discovery) while addressing three technology limitations common in distributed service systems: i.e., i) the fatal mutability of traditional interfaces, ii) the rigidity and fragility of static subsumption hierarchies, and iii) the confounding of content, structure, and presentation. SSWAP is novel by establishing the concept of a canonical yet mutable OWL DL graph that allows data and service providers to describe their resources, to allow discovery servers to offer semantically rich search engines, to allow clients to discover and invoke those resources, and to allow providers to respond with semantically tagged data. SSWAP allows for a mix-and-match of terms from both new and legacy third-party ontologies in these graphs.
Affiliation(s)
- Gary S Schiltz
- National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
- Greg D May
- National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
- Shulamit Avraham
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Christopher D Town
- The J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA
- David Grant
- USDA-ARS-CICGR and Department of Agronomy, Iowa State University, Ames IA, 50011, USA
- Rex T Nelson
- USDA-ARS-CICGR and Department of Agronomy, Iowa State University, Ames IA, 50011, USA
12
Krawczyk J, Kohl TA, Goesmann A, Kalinowski J, Baumbach J. From Corynebacterium glutamicum to Mycobacterium tuberculosis--towards transfers of gene regulatory networks and integrated data analyses with MycoRegNet. Nucleic Acids Res 2009; 37:e97. [PMID: 19494184 PMCID: PMC2724278 DOI: 10.1093/nar/gkp453] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
Abstract
Each year, approximately two million people die from tuberculosis, a disease caused by the bacterium Mycobacterium tuberculosis. There is a tremendous need for new anti-tuberculosis therapies (antituberculotica) and drugs to cope with the spread of tuberculosis. Despite many efforts to obtain a better understanding of M. tuberculosis' pathogenicity and its survival strategy in humans, many questions are still unresolved. Among other cellular processes in bacteria, pathogenicity is controlled by transcriptional regulation. Thus, various studies on M. tuberculosis concentrate on the analysis of transcriptional regulation in order to gain new insights on pathogenicity and other essential processes ensuring mycobacterial survival. We designed a bioinformatics pipeline for the reliable transfer of gene regulations between taxonomically closely related organisms that incorporates (i) a prediction of orthologous genes and (ii) the prediction of transcription factor binding sites. In total, 460 regulatory interactions were identified for M. tuberculosis using our comparative approach. Based on that, we designed a publicly available platform that aims at data integration, analysis, visualization and finally the reconstruction of mycobacterial transcriptional gene regulatory networks: MycoRegNet. It is a comprehensive database system and analysis platform that offers several methods for data exploration and the generation of novel hypotheses. MycoRegNet is publicly available at http://mycoregnet.cebitec.uni-bielefeld.de.
Affiliation(s)
- Justina Krawczyk
- Computational Genomics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany and International Computer Science Institute, Berkeley, CA, USA
13
McWilliam H, Valentin F, Goujon M, Li W, Narayanasamy M, Martin J, Miyar T, Lopez R. Web services at the European Bioinformatics Institute-2009. Nucleic Acids Res 2009; 37:W6-10. [PMID: 19435877 PMCID: PMC2703973 DOI: 10.1093/nar/gkp302] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 11/24/2022]
Abstract
The European Bioinformatics Institute (EMBL-EBI) has been providing access to mainstream databases and tools in bioinformatics since 1997. In addition to the traditional web form based interfaces, APIs exist for core data resources such as EMBL-Bank, Ensembl, UniProt, InterPro, PDB and ArrayExpress. These APIs are based on Web Services (SOAP/REST) interfaces that allow users to systematically access databases and analytical tools. From the user's point of view, these Web Services provide the same functionality as the browser-based forms. However, the APIs free the user from web page constraints and are ideal for the analysis of large batches of data, performing text-mining tasks and the casual or systematic evaluation of mathematical models in regulatory networks. Furthermore, these services are widespread and easy to use, requiring no prior knowledge of the technology and no more than basic experience in programming. In the following we report new and updated services and briefly describe planned developments to be made available during the course of 2009–2010.
Affiliation(s)
- Hamish McWilliam
- European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
14
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009; 25:1189-91. [PMID: 19151095 PMCID: PMC2672624 DOI: 10.1093/bioinformatics/btp033] [Citation(s) in RCA: 6505] [Impact Index Per Article: 433.7] [Received: 11/24/2008] [Revised: 11/24/2008] [Accepted: 01/08/2009] [Indexed: 12/11/2022]
Abstract
Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server. AVAILABILITY The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org.
Affiliation(s)
- Andrew M Waterhouse
- School of Life Sciences Research, College of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
15
Aravindhan G, Kumar RS, Subha K, Subazini T, Dey A, Kant K, Kumar GR. AIM-BLAST-AJAX Interfaced Multisequence Blast. PROTEOMICS INSIGHTS 2009. [DOI: 10.4137/pri.s2260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
AIM-BLAST, AJAX Interfaced Multisequence Blast, is a simplified tool developed to facilitate the multiple sequences blast using AJAX as an interface. This tool has been integrated with the SOAP services of EBI NCBI Blast and the functionality of AJAX (Asynchronous Javascript and XML), so as to minimize the enormous bandwidth consumption while carrying out blast analysis for many sequences at an instance. Although a few tools for multiple sequences blast are already available online, they are restricted only to a limited number of genomes and consume several bytes of data transfer for receiving the results. Further, AIM-BLAST also has enhanced features for automated parsing of the Blast results of individual sequence and presenting them as “one sequence-one function” manner. This will save the users time and effort in interpreting the bulky blast results to identify one suitable hit. The results of the blast search in this tool are displayed in an easily interpretable table format that makes the tool user-friendly too. Hence this tool, with a laconic framework, will remain a well structured, flexible and a highly controlled Blast Program for investigating numerous sequences at a stretch with the consumption of reduced level of data transfer.
Affiliation(s)
- G. Aravindhan
- Bioinformatics Division, AU-KBC Research Centre, MIT Campus, Anna University, Chennai—600 044, India
| | - R. Sathish Kumar
- NRCFOSS, AU-KBC Research Centre, MIT Campus, Anna University, Chennai—600 044, India
| | - K. Subha
- Bioinformatics Division, AU-KBC Research Centre, MIT Campus, Anna University, Chennai—600 044, India
| | - T.K. Subazini
- Bioinformatics Division, AU-KBC Research Centre, MIT Campus, Anna University, Chennai—600 044, India
| | - Alpana Dey
- Ministry of Communications and Information Technology, DT, New Delhi-110 003, India
| | - Krishna Kant
- Ministry of Communications and Information Technology, DT, New Delhi-110 003, India
| | - G. Ramesh Kumar
- Bioinformatics Division, AU-KBC Research Centre, MIT Campus, Anna University, Chennai—600 044, India
|
16
|
Abstract
The development of affordable, high-throughput sequencing technology has led to a flood of publicly available bacterial genome-sequence data. The availability of multiple genome sequences presents both an opportunity and a challenge for microbiologists, and new computational approaches are needed to extract the knowledge that is required to address specific biological problems and to analyse genomic data. The field of e-Science is maturing, and Grid-based technologies can help address this challenge.
|
17
|
Lamprecht AL, Margaria T, Steffen B, Sczyrba A, Hartmeier S, Giegerich R. GeneFisher-P: variations of GeneFisher as processes in Bio-jETI. BMC Bioinformatics 2008; 9 Suppl 4:S13. [PMID: 18460174 PMCID: PMC2367627 DOI: 10.1186/1471-2105-9-s4-s13] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 12/02/2022]
Abstract
BACKGROUND PCR primer design is an everyday, but not trivial task requiring state-of-the-art software. We describe the popular tool GeneFisher and explain its recent restructuring using workflow techniques. We apply a service-oriented approach to model and implement GeneFisher-P, a process-based version of the GeneFisher web application, as a part of the Bio-jETI platform for service modeling and execution. We show how to introduce a flexible process layer to meet the growing demand for improved user-friendliness and flexibility. RESULTS Within Bio-jETI, we model the process using the jABC framework, a mature model-driven, service-oriented process definition platform. We encapsulate remote legacy tools and integrate web services using jETI, an extension of the jABC for seamless integration of remote resources as basic services, ready to be used in the process. Some of the basic services used by GeneFisher are in fact already provided as individual web services at BiBiServ and can be directly accessed. Others are legacy programs, and are made available to Bio-jETI via the jETI technology. The full power of service-based process orientation is required when more bioinformatics tools, available as web services or via jETI, lead to easy extensions or variations of the basic process. This concerns for instance variations of data retrieval or alignment tools as provided by the European Bioinformatics Institute (EBI). CONCLUSIONS The resulting service- and process-oriented GeneFisher-P demonstrates how basic services from heterogeneous sources can be easily orchestrated in the Bio-jETI platform and lead to a flexible family of specialized processes tailored to specific tasks.
Affiliation(s)
- Anna-Lena Lamprecht
- Dortmund University of Technology, Chair of Programming Systems, Dortmund D-44227, Germany
| | - Tiziana Margaria
- Potsdam University, Chair of Service and Software Engineering, Potsdam D-14482, Germany
| | - Bernhard Steffen
- Dortmund University of Technology, Chair of Programming Systems, Dortmund D-44227, Germany
| | - Alexander Sczyrba
- Bielefeld University, Faculty of Technology, Bielefeld D-33594, Germany
| | - Sven Hartmeier
- Bielefeld University, Faculty of Technology, Bielefeld D-33594, Germany
| | - Robert Giegerich
- Bielefeld University, Faculty of Technology, Bielefeld D-33594, Germany
|
18
|
Lai YA, Lai IH, Tseng CF, Lee J, Mao SJT. Evidence of tandem repeat and extra thiol-groups resulted in the polymeric formation of bovine haptoglobin: a unique structure of Hp 2-2 phenotype. BMB Rep 2008; 40:1028-38. [PMID: 18047801 DOI: 10.5483/bmbrep.2007.40.6.1028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
Abstract
Human plasma Hp is classified as 1-1, 2-1, and 2-2. These phenotypes are inherited from two alleles, Hp 1 and Hp 2, but only Hp 1 is found in almost all animal species. The Hp 2-2 molecule is extremely large and heterogeneous, and is associated with the development of inflammation-related diseases. In this study, we expressed entire bovine Hp in E. coli as an αβ linear form. Interestingly, the antibodies prepared against this form could recognize the subunit of native Hp. Instead of a complicated column method, the antibody was able to isolate bovine Hp via immunoaffinity and gel-filtration columns. The isolated Hp is polymeric, containing two major molecular forms (660 and 730 kDa). Their size and hemoglobin binding complex are significantly larger than those of human Hp 2-2. The amino-acid sequence deduced from the nucleotide sequence is similar to human Hp 2, containing a tandem repeat over the α chain. Thus, the Hp 2 allele is not unique to humans. We also found that there is one additional -SH group (Cys-97) in the bovine α chain, for a total of 8 -SH groups, which may be responsible for the overall polymeric structure that is markedly different from human Hp 2-2. The significance of the finding and its relationship to structural evolution are also discussed.
Affiliation(s)
- Yi An Lai
- Research Institute of Biochemical Engineering, Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan, ROC
|
19
|
Farny NG, Hurt JA, Silver PA. Definition of global and transcript-specific mRNA export pathways in metazoans. Genes Dev 2007; 22:66-78. [PMID: 18086857 DOI: 10.1101/gad.1616008] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
Eukaryotic gene expression requires export of messenger RNAs (mRNAs) from their site of transcription in the nucleus to the cytoplasm where they are translated. While mRNA export has been studied in yeast, the complexity of gene structure and cellular function in metazoan cells has likely led to increased diversification of these organisms' export pathways. Here we report the results of a genome-wide RNAi screen in which we identify 72 factors required for polyadenylated [poly(A)+] mRNA export from the nucleus in Drosophila cells. Using structural and functional conservation analysis of yeast and Drosophila mRNA export factors, we expose the evolutionary divergence of eukaryotic mRNA export pathways. Additionally, we demonstrate the differential export requirements of two endogenous heat-inducible transcripts, intronless heat-shock protein 70 (HSP70) and intron-containing HSP83, and identify novel export factors that participate in HSP83 mRNA splicing. We characterize several novel factors and demonstrate their participation in interactions with known components of the Drosophila export machinery. One of these factors, Drosophila melanogaster PCI domain-containing protein 2 (dmPCID2), associates with polysomes and may bridge the transition between exported messenger ribonucleoprotein particles (mRNPs) and polysomes. Our results define the global network of factors involved in Drosophila mRNA export, reveal specificity in the export requirements of different transcripts, and expose new avenues for future work in mRNA export.
Affiliation(s)
- Natalie G Farny
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
|
20
|
Neuweger H, Baumbach J, Albaum S, Bekel T, Dondrup M, Hüser AT, Kalinowski J, Oehm S, Pühler A, Rahmann S, Weile J, Goesmann A. CoryneCenter - an online resource for the integrated analysis of corynebacterial genome and transcriptome data. BMC Systems Biology 2007; 1:55. [PMID: 18034885 PMCID: PMC2212648 DOI: 10.1186/1752-0509-1-55] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/17/2007] [Accepted: 11/22/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. RESULTS To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: Known regulatory networks are confirmed, but moreover CoryneCenter points out additional regulatory interactions. CONCLUSION CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focusses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data. The system is publicly available at http://www.CoryneCenter.de.
Affiliation(s)
- Heiko Neuweger
- Computational Methods for Emerging Technologies group, Bielefeld University, Bielefeld, Germany.
|
21
|
Baumbach J. CoryneRegNet 4.0 - A reference database for corynebacterial gene regulatory networks. BMC Bioinformatics 2007; 8:429. [PMID: 17986320 PMCID: PMC2194740 DOI: 10.1186/1471-2105-8-429] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 08/01/2007] [Accepted: 11/06/2007] [Indexed: 11/10/2022]
Abstract
Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As in previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows the inter-species comparison of reconstructed gene regulatory networks and the projection of gene expression levels onto those networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP-based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative contradictions or further gene regulatory interactions. Furthermore, it integrates protein clusters by means of heuristically solving the weighted graph cluster editing problem. In addition, it provides Web Service based access to up-to-date gene annotation data from GenDB. Conclusion The release 4.0 of CoryneRegNet is a comprehensive system for the integrated analysis of prokaryotic gene regulatory networks. It is a versatile systems biology platform to support the efficient and large-scale analysis of transcriptional regulation of gene expression in microorganisms. It is publicly available at .
Affiliation(s)
- Jan Baumbach
- Computational Methods for Emerging Technologies, Bielefeld University, Bielefeld, Germany.
|
22
|
Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A, Hubbard TJP. Integrating sequence and structural biology with DAS. BMC Bioinformatics 2007; 8:333. [PMID: 17850653 PMCID: PMC2031907 DOI: 10.1186/1471-2105-8-333] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 05/30/2007] [Accepted: 09/12/2007] [Indexed: 11/16/2022]
Abstract
Background The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequences. Results Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility of registering and discovering DAS servers, and provide a convention for supplying different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources. Conclusion Our DAS extensions are essential for the management of the growing number of services and exchange of diverse biological data sets. In addition the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is available at
Affiliation(s)
- Andreas Prlić
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | - Thomas A Down
- Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge University, Cambridge, UK
| | - Eugene Kulesha
- European Bioinformatics Institute, Hinxton, Cambridge, UK
| | - Robert D Finn
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | - Andreas Kähäri
- European Bioinformatics Institute, Hinxton, Cambridge, UK
| | - Tim JP Hubbard
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
|
23
|
Abstract
We present a new version of the European Bioinformatics Institute Web Services, a complete suite of SOAP-based web tools for structural and functional analysis, with new and improved applications. New functionality has been added to most of the services already available, and an improved version of the underlying framework has allowed us to include more applications. Information on the EBI Web Services, tutorials and clients can be found at http://www.ebi.ac.uk/Tools/webservices.
Affiliation(s)
- Rodrigo Lopez
- *To whom correspondence should be addressed. Tel: +44 1223 494423; Fax: +44 1223 494468
|
24
|
Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, Wagener J, Murray-Rust P, Steinbeck C, Wikberg JES. Bioclipse: an open source workbench for chemo- and bioinformatics. BMC Bioinformatics 2007; 8:59. [PMID: 17316423 PMCID: PMC1808478 DOI: 10.1186/1471-2105-8-59] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Received: 12/01/2006] [Accepted: 02/22/2007] [Indexed: 11/13/2022]
Abstract
Background There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. Results Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. Conclusion Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at .
Affiliation(s)
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Tobias Helmus
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
| | - Egon L Willighagen
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
| | - Stefan Kuhn
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
| | - Martin Eklund
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | | | - Peter Murray-Rust
- Department of Chemistry, Unilever Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | - Christoph Steinbeck
- Cologne University Bioinformatics Center, Cologne University, Cologne, Germany
| | - Jarl ES Wikberg
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
|
25
|
Riley ML, Schmidt T, Artamonova II, Wagner C, Volz A, Heumann K, Mewes HW, Frishman D. PEDANT genome database: 10 years online. Nucleic Acids Res 2006; 35:D354-7. [PMID: 17148486 PMCID: PMC1761421 DOI: 10.1093/nar/gkl1005] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
The PEDANT genome database provides exhaustive annotation of 468 genomes by a broad set of bioinformatics algorithms. We describe recent developments of the PEDANT Web server. The all-new Graphical User Interface (GUI) implemented in Java™ allows for more efficient navigation of the genome data, extended search capabilities, user customization and export facilities. The DNA and Protein viewers have been made highly dynamic and customizable. We also provide Web Services to access the entire body of PEDANT data programmatically. Finally, we report on the application of association rule mining for automatic detection of potential annotation errors. PEDANT is freely accessible to academic users at http://pedant.gsf.de.
Affiliation(s)
- M. Louise Riley
- Institute for Bioinformatics, GSF-National Research Center for Health and Environment, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
| | - Thorsten Schmidt
- Department of Genome-oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, 85350 Freising, Germany
| | - Irena I. Artamonova
- Institute for Bioinformatics, GSF-National Research Center for Health and Environment, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
| | - Christian Wagner
- Biomax Informatics AG, Lochhamer Strasse 9, 82152 Martinsried, Germany
| | - Andreas Volz
- Biomax Informatics AG, Lochhamer Strasse 9, 82152 Martinsried, Germany
| | - Klaus Heumann
- Biomax Informatics AG, Lochhamer Strasse 9, 82152 Martinsried, Germany
| | - Hans-Werner Mewes
- Institute for Bioinformatics, GSF-National Research Center for Health and Environment, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Department of Genome-oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, 85350 Freising, Germany
| | - Dmitrij Frishman
- Institute for Bioinformatics, GSF-National Research Center for Health and Environment, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Department of Genome-oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, 85350 Freising, Germany
- To whom correspondence should be addressed. Tel: +49 8161 712134; Fax: +49 8161 712186
|
26
|
Wang Z, Miyake T, Edwards SV, Amemiya CT. Tuatara (Sphenodon) Genomics: BAC Library Construction, Sequence Survey, and Application to the DMRT Gene Family. J Hered 2006; 97:541-8. [PMID: 17135461 DOI: 10.1093/jhered/esl040] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
The tuatara (Sphenodon punctatus) is of "extraordinary biological interest" as the most distinctive surviving reptilian lineage (Rhynchocephalia) in the world. To provide a genomic resource for an understanding of genome evolution in reptiles, and as part of a larger project to produce genomic resources for various reptiles (evogen.jgi.doe.gov/second_levels/BACs/our_libraries.html), a large-insert bacterial artificial chromosome (BAC) library from a male tuatara was constructed. The library consists of 215 424 individual clones whose average insert size was empirically determined to be 145 kb, yielding a genomic coverage of approximately 6.3x. A BAC-end sequencing analysis of 121 420 bp of sequence revealed a genomic GC content of 46.8%, among the highest observed thus far for vertebrates, and identified several short interspersed repetitive elements (mammalian interspersed repeat-type repeats) and long interspersed repetitive elements, including chicken repeat 1 element. Finally, as a quality control measure the arrayed library was screened with probes corresponding to 2 conserved noncoding regions of the candidate sex-determining gene DMRT1 and the DM domain of the related DMRT2 gene. A deep coverage contig spanning nearly 300 kb was generated, supporting the deep coverage and utility of the library for exploring tuatara genomics.
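The coverage figure quoted in this abstract follows from simple arithmetic: coverage is the total cloned sequence (number of clones times mean insert size) divided by the haploid genome size. The abstract does not state the genome size that was assumed, so the sketch below back-calculates it from the three reported numbers. The function names are hypothetical and the result is only an implied estimate, not a value taken from the paper.

```python
def total_insert_bases(n_clones: int, mean_insert_kb: float) -> float:
    """Total cloned sequence in base pairs (clones x mean insert size)."""
    return n_clones * mean_insert_kb * 1_000


def implied_genome_size(n_clones: int, mean_insert_kb: float, coverage: float) -> float:
    """Genome size in base pairs implied by a reported fold coverage."""
    return total_insert_bases(n_clones, mean_insert_kb) / coverage


# Values reported in the abstract.
N_CLONES = 215_424
MEAN_INSERT_KB = 145.0
COVERAGE = 6.3

total_bp = total_insert_bases(N_CLONES, MEAN_INSERT_KB)
genome_bp = implied_genome_size(N_CLONES, MEAN_INSERT_KB, COVERAGE)

print(f"Total cloned sequence: {total_bp / 1e9:.2f} Gb")        # 31.24 Gb
print(f"Implied haploid genome size: {genome_bp / 1e9:.2f} Gb")  # 4.96 Gb
```

Under these numbers the library holds roughly 31 Gb of cloned sequence, which corresponds to a genome of about 5 Gb at 6.3x coverage.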
Affiliation(s)
- Zhenshan Wang
- Department of Biology, University of Washington, Seattle, WA 98195, USA.
|
27
|
XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics 2006; 7:490. [PMID: 17087823 PMCID: PMC2001303 DOI: 10.1186/1471-2105-7-490] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/21/2006] [Accepted: 11/06/2006] [Indexed: 11/30/2022]
Abstract
Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
|
28
|
Byun Y, Han K. PseudoViewer: web application and web service for visualizing RNA pseudoknots and secondary structures. Nucleic Acids Res 2006; 34:W416-22. [PMID: 16845039 PMCID: PMC1538805 DOI: 10.1093/nar/gkl210] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Indexed: 11/25/2022]
Abstract
Visualizing RNA secondary structures and pseudoknot structures is essential to bioinformatics systems that deal with RNA structures. However, many bioinformatics systems use heterogeneous data structures and incompatible software components, so integration of software components (including a visualization component) into a system can be hindered by incompatibilities between the components of the system. This paper presents an XML web service and web application program for visualizing RNA secondary structures with pseudoknots. Experimental results show that the PseudoViewer web service and web application are useful for resolving many problems with incompatible software components as well as for visualizing large-scale RNA secondary structures with pseudoknots of any type. The web service and web application are available at .
Affiliation(s)
| | - Kyungsook Han
- To whom correspondence should be addressed. Tel: +82 32 860 7388; Fax: +82 32 863 4386
|
29
|
Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res 2006; 34:W729-32. [PMID: 16845108 PMCID: PMC1538887 DOI: 10.1093/nar/gkl320] [Citation(s) in RCA: 620] [Impact Index Per Article: 34.4] [Indexed: 11/14/2022]
Abstract
Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from http://taverna.sourceforge.net/.
Affiliation(s)
- Duncan Hull
- School of Computer Science, University of Manchester, M13 9PL, UK.
|
30
|
Rampp M, Soddemann T, Lederer H. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis. Nucleic Acids Res 2006; 34:W15-9. [PMID: 16844980 PMCID: PMC1538907 DOI: 10.1093/nar/gkl254] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/14/2006] [Revised: 03/13/2006] [Accepted: 03/30/2006] [Indexed: 11/14/2022]
Abstract
We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein-structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at www.migenas.org (click 'Start Toolkit').
Affiliation(s)
- Markus Rampp
- Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG), am Max-Planck-Institut für Plasmaphysik, Boltzmannstrasse 2, 85748 Garching, Germany
| | - Thomas Soddemann
- Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG), am Max-Planck-Institut für Plasmaphysik, Boltzmannstrasse 2, 85748 Garching, Germany
| | - Hermann Lederer
- Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG), am Max-Planck-Institut für Plasmaphysik, Boltzmannstrasse 2, 85748 Garching, Germany
|
31
|
Philippi S, Köhler J. Addressing the problems with life-science databases for traditional uses and systems biology. Nat Rev Genet 2006; 7:482-8. [PMID: 16682980 DOI: 10.1038/nrg1872] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
A prerequisite to systems biology is the integration of heterogeneous experimental data, which are stored in numerous life-science databases. However, a wide range of obstacles that relate to access, handling and integration impede the efficient use of the contents of these databases. Addressing these issues will not only be essential for progress in systems biology, it will also be crucial for sustaining the more traditional uses of life-science databases.
Affiliation(s)
- Stephan Philippi
- Department of Computer Science, University of Koblenz, PO Box 201602, 56016 Koblenz, Germany.
32
Stajich JE, Dietrich FS. Evidence of mRNA-mediated intron loss in the human-pathogenic fungus Cryptococcus neoformans. Eukaryotic Cell 2006; 5:789-93. [PMID: 16682456 PMCID: PMC1459680 DOI: 10.1128/ec.5.5.789-793.2006] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/20/2022]
Abstract
Introns are a defining feature of eukaryotic genomes, though the mechanism of intron gain or loss is not well understood. Reverse transcription of mRNA followed by homologous recombination with the genome has been posited as a mechanism of intron loss, though little direct evidence of recent loss events has been described to support this model. We find supporting evidence for an mRNA-mediated mechanism of loss through comparative genome analyses that revealed a recent loss of 10 adjacent introns in a 22-exon gene in the human-pathogenic fungus Cryptococcus neoformans. We surveyed the gene structures of the entire genomes of Cryptococcus gattii, which diverged from the C. neoformans lineage 37 million years ago (Mya), and C. neoformans var. grubii and var. neoformans, which diverged 18 Mya. Our comparison revealed greater than 99.9% intron conservation, with evidence from 20 genes showing evidence of intron loss, but no convincing evidence of intron gain. Our findings confirm that Cryptococcus introns have been quite stable over recent evolutionary time, with occasional mRNA-mediated intron loss events.
Affiliation(s)
- Jason E Stajich
- Department of Molecular Genetics and Microbiology, Center for Applied Genomics and Technology, and Institute for Genome Sciences and Policy, Duke University, Box 3568, Durham, NC 27710, USA
33
Chakravarti R, Adams JC. Comparative genomics of the syndecans defines an ancestral genomic context associated with matrilins in vertebrates. BMC Genomics 2006; 7:83. [PMID: 16620374 PMCID: PMC1464127 DOI: 10.1186/1471-2164-7-83] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 02/14/2006] [Accepted: 04/18/2006] [Indexed: 12/17/2022]
Abstract
BACKGROUND The syndecans are the major family of transmembrane proteoglycans in animals and are known for multiple roles in cell interactions and growth factor signalling during development, inflammatory response, wound-repair and tumorigenesis. Although syndecans have been cloned from several invertebrate and vertebrate species, the extent of conservation of the family across the animal kingdom is unknown and there are gaps in our knowledge of chordate syndecans. Here, we develop a new level of knowledge for the whole syndecan family, by combining molecular phylogeny of syndecan protein sequences with analysis of the genomic contexts of syndecan genes in multiple vertebrate organisms. RESULTS We identified syndecan-encoding sequences in representative Cnidaria and throughout the Bilateria. The C1 and C2 regions of the cytoplasmic domain are highly conserved throughout the animal kingdom. We identified in the variable region a universally-conserved leucine residue and a tyrosine residue that is conserved throughout the Bilateria. Of all the genomes examined, only tetrapod and fish genomes encode multiple syndecans. No syndecan-1 was identified in fish. The genomic context of each vertebrate syndecan gene is syntenic between human, mouse and chicken, and this conservation clearly extends to syndecan-2 and -3 in T. nigroviridis. In addition, tetrapod syndecans were found to be encoded from paralogous chromosomal regions that also contain the four members of the matrilin family. Whereas the matrilin-3 and syndecan-1 genes are adjacent in tetrapods, this chromosomal region appears to have undergone extensive lineage-specific rearrangements in fish. CONCLUSION Throughout the animal kingdom, syndecan extracellular domains have undergone rapid change and elements of the cytoplasmic domains have been very conserved. The four syndecan genes of vertebrates are syntenic across tetrapods, and synteny of the syndecan-2 and -3 genes is apparent between tetrapods and fish. 
In vertebrates, each of the four family members are encoded from paralogous genomic regions in which members of the matrilin family are also syntenic between tetrapods and fish. This genomic organization appears to have been set up after the divergence of urochordates (Ciona) and vertebrates. The syndecan-1 gene appears to have been lost relatively early in the fish lineage. These conclusions provide the basis for a new model of syndecan evolution in vertebrates and a new perspective for analyzing the roles of syndecans in cells and whole organisms.
Affiliation(s)
- Ritu Chakravarti
- Dept. of Cell Biology, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44195, USA
- Josephine C Adams
- Dept. of Cell Biology, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH 44195, USA
- Dept. of Molecular Medicine, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland Clinic Foundation, Cleveland, OH 44195, USA
34
Abstract
Emerging web-services technology allows interoperability between multiple distributed architectures. Here, we present REMORA, a web server implemented according to the BioMoby web-service specifications, providing life science researchers with an easy-to-use workflow generator and launcher, a repository of predefined workflows and a survey system. CONTACT Jerome.Gouzy@toulouse.inra.fr AVAILABILITY The REMORA web server is freely available at http://bioinfo.genopole-toulouse.prd.fr/remora; sources are available upon request from the authors.
Affiliation(s)
- Sébastien Carrere
- INRA-CNRS Laboratoire des Interactions Plantes Micro-organismes (LIPM), BP 52627, 31326 Castanet Tolosan Cedex, France