1
ImmuSort, a database on gene plasticity and electronic sorting for immune cells. Sci Rep 2015; 5:10370. [PMID: 25988315 PMCID: PMC4437374 DOI: 10.1038/srep10370] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/19/2014] [Accepted: 04/09/2015] [Indexed: 01/15/2023] Open
Abstract
Gene expression is highly dynamic and plastic. We present a new immunological database, ImmuSort. Unlike other gene expression databases, ImmuSort provides a convenient way to view global differential gene expression data across thousands of experimental conditions in immune cells. It enables electronic sorting, which is a bioinformatics process to retrieve cell states associated with specific experimental conditions that are mainly based on gene expression intensity. A comparison of gene expression profiles reveals other applications, such as the evaluation of immune cell biomarkers and cell subsets, identification of cell specific and/or disease-associated genes or transcripts, comparison of gene expression in different transcript variants and probe set quality evaluation. A plasticity score is introduced to measure gene plasticity. Average rank and marker evaluation scores are used to evaluate biomarkers. The current version includes 31 human and 17 mouse immune cell groups, comprising 10,422 and 3,929 microarrays derived from public databases, respectively. A total of 20,283 human and 20,963 mouse genes are available to query in the database. Examples show the distinct advantages of the database. The database URL is http://202.85.212.211/Account/ImmuSort.html.
2
Abstract
Currently, biological databases (DBs) are a common tool to complement the research of a wide range of biomedical disciplines, but there are only a few specialized medical DBs for human brain tumour magnetic resonance spectroscopy (MRS) data; they typically store a limited range of biological data (i.e. clinical information, magnetic resonance imaging and MRS data) and are not offered as open-source Structured Query Language relational DB schemas. We present a novel approach to biological DBs: a distributed Web-accessible DB for storing and managing clinical and biomedical data related to brain tumours from different clinical centres. This tool is designed for multi-platform systems with dissimilar DB management systems. Being the main data repository of the HealthAgents (HA) project, it uses multi-agent technology and allows the centres to share data and obtain diagnosis classifications from other centres distributed around the world in a reliable way. The HA project aims to create an agent-based distributed decision support system (DSS) to assist doctors in providing a brain tumour diagnosis and prognosis. The HA DB enables the DSS to integrate fully with its Graphical User Interface to perform classifications with the stored data and visualize the results using the HA distributed agents framework. This new feature makes the presented system the first application in the world to combine a storage and management tool for brain tumour data with a complete Web-based DSS to obtain automatic diagnosis.
3
Jupp S, Klein J, Schanstra J, Stevens R. Developing a kidney and urinary pathway knowledge base. J Biomed Semantics 2011; 2 Suppl 2:S7. [PMID: 21624162 PMCID: PMC3102896 DOI: 10.1186/2041-1480-2-s2-s7] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. RESULTS We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. CONCLUSIONS The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain's ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. AVAILABILITY The KUPKB may be accessed via http://www.e-lico.eu/kupkb.
Affiliation(s)
- Simon Jupp
- School of Computer Science, University of Manchester, UK
- Julie Klein
- Institut National de la Santé et de la Recherche Médicale (INSERM), U858, Toulouse, France
- Université Toulouse III Paul-Sabatier, I2MR, IFR150, Toulouse, France
- Joost Schanstra
- Institut National de la Santé et de la Recherche Médicale (INSERM), U858, Toulouse, France
- Université Toulouse III Paul-Sabatier, I2MR, IFR150, Toulouse, France
- Robert Stevens
- School of Computer Science, University of Manchester, UK
4
5
Web Service management system for bioinformatics research: a case study. SERVICE ORIENTED COMPUTING AND APPLICATIONS 2011. [DOI: 10.1007/s11761-011-0076-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
6
Brochhausen M, Spear AD, Cocos C, Weiler G, Martín L, Anguita A, Stenzhorn H, Daskalaki E, Schera F, Schwarz U, Sfakianakis S, Kiefer S, Dörr M, Graf N, Tsiknakis M. The ACGT Master Ontology and its applications--towards an ontology-driven cancer research and management system. J Biomed Inform 2011; 44:8-25. [PMID: 20438862 PMCID: PMC5755590 DOI: 10.1016/j.jbi.2010.04.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 10/30/2009] [Revised: 04/23/2010] [Accepted: 04/27/2010] [Indexed: 11/28/2022]
Abstract
OBJECTIVE This paper introduces the objectives, methods and results of ontology development in the EU co-funded project Advancing Clinico-genomic Trials on Cancer-Open Grid Services for Improving Medical Knowledge Discovery (ACGT). While the available data in the life sciences has recently grown both in amount and quality, the full exploitation of it is being hindered by the use of different underlying technologies, coding systems, category schemes and reporting methods on the part of different research groups. The goal of the ACGT project is to contribute to the resolution of these problems by developing an ontology-driven, semantic grid services infrastructure that will enable efficient execution of discovery-driven scientific workflows in the context of multi-centric, post-genomic clinical trials. The focus of the present paper is the ACGT Master Ontology (MO). METHODS ACGT project researchers undertook a systematic review of existing domain and upper-level ontologies, as well as of existing ontology design software, implementation methods, and end-user interfaces. This included the careful study of best practices, design principles and evaluation methods for ontology design, maintenance, implementation, and versioning, as well as for use on the part of domain experts and clinicians. 
RESULTS To date, the results of the ACGT project include (i) the development of a master ontology (the ACGT-MO) based on clearly defined principles of ontology development and evaluation; (ii) the development of a technical infrastructure (the ACGT Platform) that implements the ACGT-MO utilizing independent tools, components and resources that have been developed based on open architectural standards, and which includes an application updating and evolving the ontology efficiently in response to end-user needs; and (iii) the development of an Ontology-based Trial Management Application (ObTiMA) that integrates the ACGT-MO into the design process of clinical trials in order to guarantee automatic semantic integration without the need to perform a separate mapping process.
Affiliation(s)
- Mathias Brochhausen
- Institute of Formal Ontology and Medical Information Science, Saarland University, P.O. Box 15 11 50, 66041 Saarbrücken, Germany.
7
8
Rutledge RM, Esser L, Ma J, Xia D. Toward understanding the mechanism of action of the yeast multidrug resistance transporter Pdr5p: a molecular modeling study. J Struct Biol 2010; 173:333-44. [PMID: 21034832 DOI: 10.1016/j.jsb.2010.10.012] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 07/20/2010] [Revised: 10/19/2010] [Accepted: 10/21/2010] [Indexed: 10/18/2022]
Abstract
Pleiotropic drug resistant protein 5 (Pdr5p) is a plasma membrane ATP-binding cassette (ABC) transporter and the major drug efflux pump in Saccharomyces cerevisiae. The Pdr5p family of fungal transporters possesses a number of structural features significantly different from other modeled or crystallized ABC transporters, which include a reverse topology, an atypical ATP-binding site, a very low sequence similarity in the transmembrane section and long linkers between domains. These features present a considerable hurdle in molecular modeling studies of these important transporters. Here, we report the creation of an atomic model of Pdr5p based on a combination of homology modeling and ab initio methods, incorporating information from consensus transmembrane segment prediction, residue lipophilicity, and sequence entropy. Reported mutations in the transmembrane substrate-binding pocket that altered drug-resistance were used to validate the model, and one mutation that changed the communication pattern between transmembrane and nucleotide-binding domains was used in model improvement. The predictive power of the model was demonstrated experimentally by the increased sensitivity to clotrimazole of yeast mutants having alanine substitutions for Thr1213 and Gln1253, which are predicted to be in the substrate-binding pocket, without reducing the amount of Pdr5p in the plasma membrane. The quality and reliability of our model are discussed in the context of various approaches used for modeling different parts of the structure.
Affiliation(s)
- Robert M Rutledge
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
9
Wang K, Bai X, Li J, Ding C. A service-based framework for pharmacogenomics data integration. ENTERP INF SYST-UK 2010. [DOI: 10.1080/17517575.2010.498525] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 10/19/2022]
10
Cadag E, Tarczy-Hornoch P. Supporting retrieval of diverse biomedical data using evidence-aware queries. J Biomed Inform 2010; 43:873-82. [PMID: 20643225 DOI: 10.1016/j.jbi.2010.07.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/11/2009] [Revised: 06/23/2010] [Accepted: 07/03/2010] [Indexed: 11/18/2022]
Abstract
Though there have been many advances in providing access to linked and integrated biomedical data across repositories, developing methods which allow users to specify ambiguous and exploratory queries over disparate sources remains a challenge to extracting well-curated or diversely-supported biological information. In the following work, we discuss the concepts of data coverage and evidence in the context of integrated sources. We address diverse information retrieval via a simple framework for representing coverage and evidence that operates in parallel with an arbitrary schema, and a language upon which queries on the schema and framework may be executed. We show that this approach is capable of answering questions that require ranged levels of evidence or triangulation, and demonstrate that appropriately-formed queries can significantly improve the level of precision when retrieving well-supported biomedical data.
Affiliation(s)
- Eithon Cadag
- Department of Medical Education and Biomedical Informatics, University of Washington, 1959 NE Pacific St., HSB I-264, Seattle, WA 98195-7240, USA.
11
Lim SJ, Tan TW, Tong JC. Computational Epigenetics: the new scientific paradigm. Bioinformation 2010; 4:331-7. [PMID: 20978607 PMCID: PMC2957762 DOI: 10.6026/97320630004331] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 12/15/2009] [Revised: 01/13/2010] [Accepted: 01/21/2010] [Indexed: 12/25/2022] Open
Abstract
Epigenetics has recently emerged as a critical field for studying how non-gene factors can influence the traits and functions of an organism. At the core of this new wave of research is the use of computational tools that play critical roles not only in directing the selection of key experiments, but also in formulating new testable hypotheses through detailed analysis of complex genomic information that is not achievable using traditional approaches alone. Epigenomics, which combines traditional genomics with computer science, mathematics, chemistry, biochemistry and proteomics for the large-scale analysis of heritable changes in phenotype, gene function or gene expression that are not dependent on gene sequence, offers new opportunities to further our understanding of transcriptional regulation, nuclear organization, development and disease. This article examines existing computational strategies for the study of epigenetic factors. The most important databases and bioinformatic tools in this rapidly growing field have been reviewed.
Affiliation(s)
- Shen Jean Lim
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore 117597
12
Tatusova T. Genomic databases and resources at the National Center for Biotechnology Information. Methods Mol Biol 2010; 609:17-44. [PMID: 20221911 DOI: 10.1007/978-1-60327-241-4_2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/28/2023]
Abstract
The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI Web site. Entrez, a text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases. Customized genomic BLAST enables sequence similarity searches against a special collection of organism-specific sequence data and viewing the resulting alignments within a genomic context using NCBI's genome browser, Map Viewer. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery.
13
Wang P, Yu P, Gao P, Shi T, Ma D. Discovery of novel human transcript variants by analysis of intronic single-block EST with polyadenylation site. BMC Genomics 2009; 10:518. [PMID: 19906316 PMCID: PMC2784480 DOI: 10.1186/1471-2164-10-518] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 05/18/2009] [Accepted: 11/12/2009] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND Alternative polyadenylation sites within a gene can lead to alternative transcript variants. Although bioinformatic analysis has been conducted to detect polyadenylation sites using nucleic acid sequences (EST/mRNA) in the public databases, one special type, the single-block EST, has received much less attention. This bias leaves considerable room to discover novel transcript variants. RESULTS In the present study, we identified novel transcript variants in the human genome by detecting intronic polyadenylation sites. Poly(A/T)-tailed ESTs were obtained from single-block ESTs and clustered into 10,844 groups representing 5,670 genes. Most sites were not found in other alternative splicing databases. To verify that these sites are from expressed transcripts, we analyzed the supporting EST number of each site, blasted representative ESTs against known mRNA sequences, traced terminal sequences from cDNA clones, and compared the sites with Affymetrix tiling array data. These analyses confirmed about 84% (9,118/10,844) of the novel alternative transcripts; in particular, 33% (3,575/10,844) of the transcripts, from 2,704 genes, were classified as high-reliability. Additionally, RT-PCR confirmed 38% (10/26) of predicted novel transcript variants. CONCLUSION Our results provide evidence for novel transcript variants with intronic poly(A) sites. The expression of these novel variants was confirmed with computational and experimental tools. Our data provide a genome-wide resource for identification of novel human transcript variants with intronic polyadenylation sites, and offer a new view into the mystery of the human transcriptome.
Affiliation(s)
- Pingzhang Wang
- Chinese National Human Genome Center, #3-707 North YongChang Road BDA, Beijing, PR China.
14
Jonstrup SP, Gray T, Kahns S, Skall HF, Snow M, Olesen NJ. FishPathogens.eu/vhsv: a user-friendly viral haemorrhagic septicaemia virus isolate and sequence database. JOURNAL OF FISH DISEASES 2009; 32:925-929. [PMID: 19538460 DOI: 10.1111/j.1365-2761.2009.01073.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
A database has been created, http://www.FishPathogens.eu, with the aim of providing a single repository for collating important information on significant pathogens of aquaculture, relevant to their control and management. This database will be developed, maintained and managed as part of the European Community Reference Laboratory for Fish Diseases function. This concept has been initially developed for viral haemorrhagic septicaemia virus and will be extended in future to include information on other significant aquaculture pathogens. Information included for each isolate comprises sequence, geographical origin, host origin and useful key literature. Various search mechanisms make it easy to find specific groups of isolates. Search results can be presented in several different ways including table-based, map-based and graph-based outputs. When retrieving sequences, the user is given freedom to obtain data from any selected part of the genome of interest. The output of the sequence search can be readily retrieved as a FASTA file ready to be imported into a sequence alignment tool of choice, facilitating further molecular epidemiological study.
Affiliation(s)
- S P Jonstrup
- Division of Poultry, Fish and Fur Animals, National Veterinary Institute, Technical University of Denmark, Arhus N, Denmark.
15
Mesiti M, Jiménez-Ruiz E, Sanz I, Berlanga-Llavori R, Perlasca P, Valentini G, Manset D. XML-based approaches for the integration of heterogeneous bio-molecular data. BMC Bioinformatics 2009; 10 Suppl 12:S7. [PMID: 19828083 PMCID: PMC2762072 DOI: 10.1186/1471-2105-10-s12-s7] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022] Open
Abstract
Background Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing. Results In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new cutting-edge approaches for the appropriate management of heterogeneous biological data represented through XML. Conclusion XML has succeeded in the integration of heterogeneous biomolecular information, and has established itself as the syntactic glue for biological data sources. Nevertheless, the large variety of XML-based data formats that have been proposed makes effective integration of bioinformatics data schemes difficult. The adoption of a few semantic-rich standard formats is urgently needed to achieve a seamless integration of the current biological resources.
Affiliation(s)
- Marco Mesiti
- Università degli Studi di Milano, Via Comelico 39, Milan, Italy.
16
de la Calle G, García-Remesal M, Chiesa S, de la Iglesia D, Maojo V. BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature. BMC Bioinformatics 2009; 10:320. [PMID: 19811635 PMCID: PMC2765974 DOI: 10.1186/1471-2105-10-320] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 02/12/2009] [Accepted: 10/07/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. RESULTS We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. CONCLUSION These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.
Affiliation(s)
- Guillermo de la Calle
- Dept Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
- Miguel García-Remesal
- Dept Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
- Stefano Chiesa
- Dept Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
- Diana de la Iglesia
- Dept Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
- Victor Maojo
- Dept Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660 Boadilla del Monte, Madrid, Spain
17
Matsunaga T, Yonemori C, Tomita E, Muramatsu M. Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009; 10:205. [PMID: 19566964 PMCID: PMC2721841 DOI: 10.1186/1471-2105-10-205] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 12/17/2008] [Accepted: 07/01/2009] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. RESULTS We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. CONCLUSION We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms.
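The clique enumeration described in this abstract can be illustrated with a minimal sketch. It assumes the basic Bron-Kerbosch recursion, which is a standard way to list maximal cliques and is not necessarily the algorithm the authors used; the gene names and edges are hypothetical toy data, not taken from OMIM.

```python
from typing import Dict, List, Set


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already-explored vertices
        if not p and not x:
            cliques.append(set(r))  # nothing can extend r, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy graph: nodes stand for gene pages, edges for hyperlinks between them.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
]
adj: Dict[str, Set[str]] = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Each maximal clique plays the role of one "gene module".
modules = sorted(sorted(c) for c in maximal_cliques(adj))
print(modules)
```

On a graph the size of a full database, a pivoting variant of the recursion would usually be preferred to limit redundant branches; the plain form is kept here for readability.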
Affiliation(s)
- Tsutomu Matsunaga
- Research and Development Headquarters, NTT DATA Corporation, Tokyo, 135-8671, Japan
- Chikara Yonemori
- Research and Development Headquarters, NTT DATA Corporation, Tokyo, 135-8671, Japan
- Etsuji Tomita
- The Advanced Algorithms Research Laboratory, The University of Electro-Communications, Tokyo, 182-8585, Japan
- Research and Development Initiative, Chuo University, Tokyo, 112-8551, Japan
- Masaaki Muramatsu
- Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 101-0062, Japan
- Research Institute, HuBit Genomix Inc, Tokyo, 102-0092, Japan
18
Rubino F, Attimonelli M. RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences. BMC Bioinformatics 2009; 10 Suppl 6:S5. [PMID: 19534754 PMCID: PMC2697652 DOI: 10.1186/1471-2105-10-s6-s5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND One of the most frequent uses of bioinformatics tools concerns functional characterization of a newly produced nucleotide sequence (a query sequence) by applying Blast or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by applying Regular Expression rules to an MA dataset given as input. The REB algorithm workflow consists of (i) the definition of a dataset of multialignments; (ii) the association of each MA with a pattern, defined by application of regular expression rules; and (iii) automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern best matching the query sequence. RESULTS An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of plant nematode multialigned sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA. CONCLUSION The use of REB for PPNEMA updating and in the PPNEMA "characterize your sequence" option demonstrates the power of the method. REB can also rapidly solve other bioinformatics problems that require adding a new sequence to a pre-existing cluster. The statistical tests carried out here show the flexibility of the method.
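The three-step workflow in this abstract can be sketched roughly as follows. The abstract does not specify REB's actual regular expression rules, so this example assumes one simple rule (a character class per alignment column, made optional where the column contains a gap) and uses exact full matching rather than a best-match score. The function names, cluster names and sequences are all hypothetical.

```python
import re
from typing import Dict, List


def alignment_to_pattern(aligned: List[str]) -> str:
    """Build a regex from equal-length aligned sequences, one token per column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column) - {"-"})
        if not residues:
            continue  # column made only of gaps
        token = residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]"
        if "-" in column:
            token += "?"  # a gap in the column makes the position optional
        parts.append(token)
    return "".join(parts)


def classify(query: str, patterns: Dict[str, str]) -> List[str]:
    """Return the names of all clusters whose pattern matches the whole query."""
    return [name for name, pat in patterns.items() if re.fullmatch(pat, query)]


# Step (i): a toy dataset of multialignments, keyed by hypothetical cluster name.
alignments = {
    "genus_1": ["ACGT-A", "ACGTTA", "ACCTTA"],
    "genus_2": ["TTGACA", "TTGGCA"],
}
# Step (ii): associate each multialignment with a pattern.
patterns = {name: alignment_to_pattern(ma) for name, ma in alignments.items()}
# Step (iii): characterize a submitted sequence by the pattern it matches.
print(patterns)
print(classify("ACGTA", patterns))
```

A best-match variant would need a scoring rule for partial matches, which the abstract mentions but does not define.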
Affiliation(s)
- Francesco Rubino
- Department of Biochemistry and Molecular Biology "E. Quagliariello", Bari, 70126, Italy.
19
Abstract
Motivation: In the biological sciences, the need to analyse vast amounts of information has become commonplace. Such large-scale analyses often involve drawing together data from a variety of different databases, held remotely on the internet or locally on in-house servers. Supporting these tasks are ad hoc collections of data-manipulation tools, scripting languages and visualisation software, which are often combined in arcane ways to create cumbersome systems that have been customised for a particular purpose, and are consequently not readily adaptable to other uses. For many day-to-day bioinformatics tasks, the sizes of current databases, and the scale of the analyses necessary, now demand increasing levels of automation; nevertheless, the unique experience and intuition of human researchers is still required to interpret the end results in any meaningful biological way. Putting humans in the loop requires tools to support real-time interaction with these vast and complex data-sets. Numerous tools do exist for this purpose, but many do not have optimal interfaces, most are effectively isolated from other tools and databases owing to incompatible data formats, and many have limited real-time performance when applied to realistically large data-sets: much of the user's cognitive capacity is therefore focused on controlling the software and manipulating esoteric file formats rather than on performing the research.
Methods: To confront these issues, harnessing expertise in human-computer interaction (HCI), high-performance rendering and distributed systems, and guided by bioinformaticians and end-user biologists, we are building reusable software components that, together, create a toolkit that is both architecturally sound from a computing point of view, and addresses both user and developer requirements. Key to the system's usability is its direct exploitation of semantics, which, crucially, gives individual components knowledge of their own functionality and allows them to interoperate seamlessly, removing many of the existing barriers and bottlenecks from standard bioinformatics tasks.
Results: The toolkit, named Utopia, is freely available from .
|
20
|
Szpakowski S, McCusker J, Krauthammer M. Using semantic web technologies to annotate and align microarray designs. Cancer Inform 2009; 8:65-73. [PMID: 24904201 PMCID: PMC4042255 DOI: 10.4137/cin.s2335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
In this paper, we annotate and align two different gene expression microarray designs using the Genomic ELement Ontology (GELO). GELO is a new ontology that leverages an existing community resource, Sequence Ontology (SO), to create views of genomically-aligned data in a semantic web environment. We start the process by mapping array probes to genomic coordinates. The coordinates represent an implicit link between the probes and multiple genomic elements, such as genes, transcripts, miRNA, and repetitive elements, which are represented using concepts in SO. We then use the RDF Query Language (SPARQL) to create explicit links between the probes and the elements. We show how the approach allows us to easily determine the element coverage and genomic overlap of the two array designs. We believe that the method will ultimately be useful for integration of cancer data across multiple omic studies. The ontology and other materials described in this paper are available at http://krauthammerlab.med.yale.edu/wiki/Gelo.
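The probe-to-element linking described in this abstract can be illustrated with a small sketch. It uses plain Python interval overlap instead of SPARQL over RDF, so it is not the authors' implementation; the `Interval` record, the helper functions, and all probe names, element names, and coordinates are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record type; names and coordinates below are invented.
Interval = namedtuple("Interval", ["name", "chrom", "start", "end"])


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def link_probes(probes, elements):
    """Turn implicit coordinate overlap into explicit (probe, element) links."""
    return [(p.name, e.name) for p in probes for e in elements if overlaps(p, e)]


def element_coverage(links):
    """Names of the elements touched by at least one probe."""
    return {element for _, element in links}


elements = [
    Interval("geneA", "chr1", 100, 500),
    Interval("mirnaB", "chr1", 450, 470),
    Interval("geneC", "chr2", 100, 500),
]
design_1 = [Interval("p1", "chr1", 120, 144), Interval("p2", "chr2", 490, 514)]
design_2 = [Interval("q1", "chr1", 455, 479)]

links_1 = link_probes(design_1, elements)
links_2 = link_probes(design_2, elements)

# Elements covered by both array designs (the "genomic overlap" of the designs).
shared = element_coverage(links_1) & element_coverage(links_2)
print(links_1, links_2, shared)
```

The nested comparison is quadratic in the number of probes and elements, which is acceptable for a toy example; a real array design would need an interval index or, as in the paper, a query engine.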
Affiliation(s)
- Sebastian Szpakowski
- Program for Computational Biology and Bioinformatics (CBB), Yale University School of Medicine, New Haven, CT; Department of Pathology, Yale University School of Medicine, New Haven, CT
- James McCusker
- Department of Pathology, Yale University School of Medicine, New Haven, CT
|
21
|
Current World Literature. Curr Opin Lipidol 2009; 20:135-42. [PMID: 19276892 DOI: 10.1097/mol.0b013e32832a7e09] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
22
|
Modeling genomic data with type attributes, balancing stability and maintainability. BMC Bioinformatics 2009; 10:97. [PMID: 19327130 PMCID: PMC2676260 DOI: 10.1186/1471-2105-10-97] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/19/2008] [Accepted: 03/27/2009] [Indexed: 11/10/2022]
Abstract
Background: Molecular biology (MB) is a dynamic research domain that benefits greatly from the use of modern software technology in preparing experiments, analyzing acquired data, and even performing "in-silico" analyses. As ever new findings change the face of this domain, software for MB has to be sufficiently flexible to accommodate these changes. At the same time, however, the efficient development of high-quality and interoperable software requires a stable model of concepts for the subject domain and their relations. The result of these two contradictory requirements is increased complexity in the development of MB software. A common means to reduce complexity is to consider only a small part of the domain, instead of the domain as a whole. As a result, small, specialized programs develop their own domain understanding. They often use one of the numerous data formats or implement proprietary data models. This makes it difficult to incorporate the results of different programs, which is needed by many users in order to work with the software efficiently. The data conversions required to achieve interoperability involve more than just type conversion. Usually they also require complex data mappings and lead to a loss of information.
Results: To address these problems, we have developed a flexible computer model for the MB domain that supports both changeability and interoperability. This model describes concepts of MB in a formal manner and provides a comprehensive view of the domain. In this model, we adapted the design pattern "Dynamic Object Model" by using meta data and association classes. A small, highly abstract class model, named "operational model," defines the scope of the software system. An object model, named "knowledge model," describes concrete concepts of the MB domain. The structure of the knowledge model is described by a meta model. We proved our model to be stable, flexible, and useful by implementing a prototype of an MB software framework based on the proposed model.
Conclusion: Stability and flexibility of the domain model is achieved by its separation into two model parts, the operational model and the knowledge model. These parts are connected by the meta model of the knowledge model to the whole domain model. This approach makes it possible to comply with the requirements of interoperability and flexibility in MB.
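The idea of keeping a small, stable set of classes while describing concrete domain concepts as data can be sketched as follows. This is a minimal illustration of the general "Dynamic Object Model" pattern under stated assumptions, not the authors' framework: the class names `ConceptType` and `Concept`, and the example attributes and values, are hypothetical.

```python
class ConceptType:
    """Meta-level description of a concept: its name and allowed attributes."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes)  # attribute name -> expected Python type


class Concept:
    """Generic instance whose structure is defined by a ConceptType at run time."""

    def __init__(self, concept_type, **values):
        self.concept_type = concept_type
        self.values = {}
        for key, value in values.items():
            self.set(key, value)

    def set(self, key, value):
        """Store a value after checking it against the type description."""
        expected = self.concept_type.attributes.get(key)
        if expected is None:
            raise KeyError(f"{self.concept_type.name} has no attribute {key!r}")
        if not isinstance(value, expected):
            raise TypeError(f"{key!r} must be of type {expected.__name__}")
        self.values[key] = value


# A new domain concept is introduced as data; the two classes above stay unchanged.
gene_type = ConceptType("Gene", {"symbol": str, "length": int})
gene = Concept(gene_type, symbol="geneX", length=1200)  # made-up example values

# Extending the concept later only touches its type description.
gene_type.attributes["chromosome"] = str
gene.set("chromosome", "17")
print(gene.values)
```

In this sketch the two classes play the role of a stable core, while the `ConceptType` instances carry the parts of the domain that are expected to change.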
|
23
|
Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz RA, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol 2009; 5:e1000215. [PMID: 19325874 PMCID: PMC2653649 DOI: 10.1371/journal.pcbi.1000215] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 09/25/2008] [Accepted: 02/12/2009] [Indexed: 01/17/2023]
Abstract
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
Affiliation(s)
- Sonia M. Leach
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Hannah Tipney
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Weiguo Feng
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- William A. Baumgartner
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Priyanka Kasliwal
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Ronald P. Schuyler
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Trevor Williams
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- Richard A. Spritz
- Human Medical Genetics Program, University of Colorado at Denver, Denver, Colorado, United States of America
- Lawrence Hunter
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
|
24
|
Ruttenberg A, Rees JA, Samwald M, Marshall MS. Life sciences on the Semantic Web: the Neurocommons and beyond. Brief Bioinform 2009; 10:193-204. [PMID: 19282504 DOI: 10.1093/bib/bbp004] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Indexed: 12/25/2022]
Abstract
Translational research, the effort to couple the results of basic research to clinical applications, depends on the ability to effectively answer questions using information that spans multiple disciplines. The Semantic Web, with its emphasis on combining information using standard representation languages, access to that information via standard web protocols, and technologies to leverage computation, such as in the form of inference and distributable query, offers a social and technological basis for assembling, integrating and making available biomedical knowledge at Web scale. In this article, we discuss the use of Semantic Web technology for assembling and querying biomedical knowledge from multiple sources and disciplines. We present the Neurocommons prototype knowledge base, a demonstration intended to show the feasibility and benefits of using these technologies. The prototype knowledge base can be used to experiment with and assess the scalability of current tools and methods for creating such a resource, and to elicit issues that will need to be addressed in order to expand the scope and use of it. We demonstrate the utility of the knowledge base by reviewing a few example queries that provide answers to precise questions relevant to the understanding of disease. All components of the knowledge base are freely available at http://neurocommons.org/, enabling readers to reconstruct the knowledge base and experiment with this new technology.
Affiliation(s)
- Alan Ruttenberg
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
|
26
|
Chiang AP, Butte AJ. Data-driven methods to discover molecular determinants of serious adverse drug events. Clin Pharmacol Ther 2009; 85:259-68. [PMID: 19177064 PMCID: PMC2726746 DOI: 10.1038/clpt.2008.274] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Abstract
The dangers of serious adverse drug reactions (SADRs) are well known to clinicians, pharmacologists, and the lay public. Efforts to elucidate the molecular mechanisms behind SADRs have made significant progress through genetics and gene expression measurements. However, as the field of pharmacology adopts the same novel higher-density measurement modalities that have proven successful in other areas of biology, one wonders whether there can be more ways to benefit from the explosion of data created by these tools. The development of analytic tools and algorithms to interpret these biological data to create tools for medicine is central to the field of translational bioinformatics. In this review we introduce some of the types of SADR predictors that are required, and we discuss several databases that are publicly available for the study of SADRs, ranging from clinical to molecular measurements. We also describe recent examples of how bioinformatics methods coupled with data repositories can advance the science of SADRs.
Affiliation(s)
- A P Chiang
- Department of Medicine, Stanford Center for Biomedical Informatics, Stanford University School of Medicine, Stanford, California, USA
|
27
|
Tipney HJ, Schuyler RP, Hunter L. Consistent visualizations of changing knowledge. Summit on Translational Bioinformatics 2009; 2009:129-32. [PMID: 21347184 PMCID: PMC3041575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Networks are increasingly used in biology to represent complex data in uncomplicated symbolic form. However, as biological knowledge is continually evolving, so must those networks representing this knowledge. Capturing and presenting this type of knowledge change over time is particularly challenging due to the intimate manner in which researchers customize those networks they come into contact with. The effective visualization of this knowledge is important as it creates insight into complex systems and stimulates hypothesis generation and biological discovery. Here we highlight how the retention of user customizations, and the collection and visualization of knowledge-associated provenance, support effective and productive network exploration. We also present an extension of the Hanalyzer system, ReOrient, which supports network exploration and analysis in the presence of knowledge change.
|
29
|
Vizcaíno JA, Mueller M, Hermjakob H, Martens L. Charting online OMICS resources: A navigational chart for clinical researchers. Proteomics Clin Appl 2008; 3:18-29. [PMID: 21136933 DOI: 10.1002/prca.200800082] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/30/2008] [Indexed: 12/22/2022]
Abstract
The life sciences have sprouted several popular and successful OMICS technologies that span all levels of biological information transfer. Ever since the start of the Human Genome Project, the then revolutionary idea to make all resulting data publicly available has been central to all of the efforts across OMICS technologies. As a result, a great variety of publicly available data repositories and resources is currently available to the research community. This widespread availability of data does come at the price of increased confusion on the part of the users, especially for those who see the OMICS technologies as tools to help unravel a larger biological or clinical question. We therefore provide a comprehensive overview of the available resources across OMICS fields, with a special emphasis on those databases that are relevant to the study of proteins. We also describe various integrative systems that have been established, and highlight new developments in the field that can revolutionize the way in which live data integration is achieved over the internet.
Affiliation(s)
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
30
|
Greenhill SJ, Blust R, Gray RD. The Austronesian Basic Vocabulary Database: from bioinformatics to lexomics. Evol Bioinform Online 2008; 4:271-83. [PMID: 19204825 PMCID: PMC2614200 DOI: 10.4137/ebo.s893] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Indexed: 11/26/2022]
Abstract
Phylogenetic methods have revolutionised evolutionary biology and have recently been applied to studies of linguistic and cultural evolution. However, the basic comparative data on the languages of the world required for these analyses is often widely dispersed in hard-to-obtain sources. Here we outline how our Austronesian Basic Vocabulary Database (ABVD) helps remedy this situation by collating wordlists from over 500 languages into one web-accessible database. We describe the technology underlying the ABVD and discuss the benefits that an evolutionary bioinformatic approach can provide. These include facilitating computational comparative linguistic research, answering questions about human prehistory, enabling syntheses with genetic data, and safeguarding fragile linguistic information.
Affiliation(s)
- Simon J Greenhill
- Department of Psychology, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand.
|
32
|
Chemical databases for environmental health and clinical research. Toxicol Lett 2008; 186:62-5. [PMID: 18996453 DOI: 10.1016/j.toxlet.2008.10.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 08/09/2008] [Revised: 09/19/2008] [Accepted: 10/01/2008] [Indexed: 01/09/2023]
Abstract
The increasing number of publicly available biological databases reflects the evolving need for managing and evaluating abundant and complex data in biological, clinical and computational research. Currently there are over 1000 biologically relevant databases in the public domain with varied content and diverse approaches to capturing and presenting data. This review summarizes the comparatively small niche of sophisticated databases and other resources that aim to enhance understanding of chemicals and their biological actions. The databases reviewed include 1 that emphasizes environmental chemicals and 9 that emphasize drugs and small molecules. These databases and their associated resources are incrementally strengthening the expanding field of toxicogenomics-based research by providing centralized sources of manually and computationally curated datasets and highly sophisticated tools for the meta-analysis of continually increasing environmental chemical, drug and small-molecule datasets.
|
33
|
Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR, Evelo C. WikiPathways: pathway editing for the people. PLoS Biol 2008; 6:e184. [PMID: 18651794 PMCID: PMC2475545 DOI: 10.1371/journal.pbio.0060184] [Citation(s) in RCA: 430] [Impact Index Per Article: 26.9] [Indexed: 11/18/2022]
Abstract
WikiPathways provides a collaborative platform for creating, updating, and sharing pathway diagrams and serves as an example of content curation by the biology community.
Affiliation(s)
- Bruce R Conklin (corresponding author)
- Chris Evelo (corresponding author)
|
34
|
Hull D, Pettifer SR, Kell DB. Defrosting the digital library: bibliographic tools for the next generation web. PLoS Comput Biol 2008; 4:e1000204. [PMID: 18974831 PMCID: PMC2568856 DOI: 10.1371/journal.pcbi.1000204] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.8] [Indexed: 12/02/2022]
Abstract
Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as "thought in cold storage," and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.
Affiliation(s)
- Duncan Hull
- School of Chemistry, The University of Manchester, Manchester, UK.
|
35
|
Mazumder R, Vasudevan S. Structure-guided comparative analysis of proteins: principles, tools, and applications for predicting function. PLoS Comput Biol 2008; 4:e1000151. [PMID: 18818720 PMCID: PMC2515338 DOI: 10.1371/journal.pcbi.1000151] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C., United States of America
- Sona Vasudevan
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C., United States of America
|
36
|
Krallinger M, Valencia A, Hirschman L. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol 2008; 9 Suppl 2:S8. [PMID: 18834499 PMCID: PMC2559992 DOI: 10.1186/gb-2008-9-s2-s8] [Citation(s) in RCA: 145] [Impact Index Per Article: 9.1] [Indexed: 11/10/2022]
Abstract
Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet .
Affiliation(s)
- Martin Krallinger
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
|
37
|
Adriaens ME, Jaillard M, Waagmeester A, Coort SLM, Pico AR, Evelo CTA. The public road to high-quality curated biological pathways. Drug Discov Today 2008; 13:856-62. [PMID: 18652912 DOI: 10.1016/j.drudis.2008.06.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 03/18/2008] [Revised: 06/16/2008] [Accepted: 06/24/2008] [Indexed: 11/28/2022]
Abstract
Biological pathways are abstract and functional visual representations of existing biological knowledge. By mapping high-throughput data on these representations, changes and patterns in biological systems on the genetic, metabolic and protein level are instantly assessable. Many public domain repositories exist for storing biological pathways, each applying its own conventions and storage format. A pathway-based content review of these repositories reveals that none of them are comprehensive. To address this issue, we apply a general workflow to create curated biological pathways, in which we combine three content sources: public domain databases, literature and experts. In this workflow all content of a particular biological pathway is manually retrieved from biological pathway databases and literature, after which this content is compared, combined and subsequently curated by experts. From the curated content, new biological pathways can be created for a pathway analysis tool of choice and distributed among its user base. We applied this procedure to construct high-quality curated biological pathways involved in human fatty acid metabolism.
Affiliation(s)
- Michiel E Adriaens
- Department of Bioinformatics-BiGCaT, Maastricht University, Universiteitssingel 40, 6229ER Maastricht, The Netherlands.
|
38
|
Brazas MD, Fox JA, Brown T, McMillan S, Ouellette BFF. Keeping pace with the data: 2008 update on the Bioinformatics Links Directory. Nucleic Acids Res 2008; 36:W2-4. [PMID: 18586831 PMCID: PMC2447757 DOI: 10.1093/nar/gkn399] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
The Bioinformatics Links Directory, http://bioinformatics.ca/links_directory/, is an online resource for public access to all of the life science research web servers published in this and previous issues of Nucleic Acids Research, together with other useful tools, databases and resources for bioinformatics and molecular biology research. Dependent on community input and development, the Bioinformatics Links Directory exemplifies an open access research tool and resource. The 2008 update includes the 94 web servers featured in the July 2008 Web Server issue of Nucleic Acids Research, bringing the total number of servers listed in the Bioinformatics Links Directory to over 1200 links. A complete list of all links listed in this Nucleic Acids Research 2008 Web Server issue can be accessed online at http://bioinformatics.ca/links_directory/narweb2008/. The 2008 update of the Bioinformatics Links Directory, which includes the Web Server list and summaries, is also available online at the Nucleic Acids Research website, http://nar.oxfordjournals.org/.
Affiliation(s)
- Michelle D Brazas
- Ontario Institute for Cancer Research, 101 College St, Suite 800, Toronto, Ontario, Canada
|
39
|
Hughes LM, Bao J, Hu ZL, Honavar V, Reecy JM. Animal trait ontology: The importance and usefulness of a unified trait vocabulary for animal species. J Anim Sci 2008; 86:1485-91. [PMID: 18272850 DOI: 10.2527/jas.2008-0930] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
Ontologies help to identify and formally define the entities and relationships in specific domains of interest. Bio-ontologies, in particular, play a central role in the annotation, integration, analysis, and interpretation of biological data. Missing from the number of bio-ontologies is one that includes phenotypic trait information found in livestock species. As a result, the Animal Trait Ontology (ATO) project being carried out under the auspices of the USDA-National Animal Genome Research Program is aimed at the development of a standardized trait ontology for farm animals and software tools to assist the research community in collaborative creation, editing, maintenance, and use of such an ontology. The ATO is currently inclusive of cattle, pig, and chicken species, and will include other livestock species in the future. The ATO will eventually be linked to other species (e.g., human, rat, mouse) so that comparative analysis can be efficiently performed between species.
Affiliation(s)
- L M Hughes
- Department of Animal Science, Center for Integrated Animal Genomics, Iowa State University, 2255 Kildee Hall, Ames 50011, USA
|