1
Deng CH, Naithani S, Kumari S, Cobo-Simón I, Quezada-Rodríguez EH, Skrabisova M, Gladman N, Correll MJ, Sikiru AB, Afuwape OO, Marrano A, Rebollo I, Zhang W, Jung S. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database (Oxford) 2023; 2023:baad088. [PMID: 38079567 PMCID: PMC10712715 DOI: 10.1093/database/baad088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023] [Indexed: 12/18/2023]
Abstract
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
Affiliation(s)
- Cecilia H Deng
- Molecular and Digital Breeding, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, 120 Mt Albert Road, Auckland 1025, New Zealand
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- Irene Cobo-Simón
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Institute of Forest Science (ICIFOR-INIA, CSIC), Madrid, Spain
- Elsa H Quezada-Rodríguez
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México, México
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
- Maria Skrabisova
- Department of Biochemistry, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Nick Gladman
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853, USA
- Melanie J Correll
- Agricultural and Biological Engineering Department, University of Florida, 1741 Museum Rd, Gainesville, FL 32611, USA
- Annarita Marrano
- Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA 94560, USA
- Wentao Zhang
- National Research Council Canada, 110 Gymnasium Pl, Saskatoon, Saskatchewan S7N 0W9, Canada
- Sook Jung
- Department of Horticulture, Washington State University, 303c Plant Sciences Building, Pullman, WA 99164-6414, USA
2
Kodama Y, Mashima J, Kosuge T, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T. DNA Data Bank of Japan: 30th anniversary. Nucleic Acids Res 2019; 46:D30-D35. [PMID: 29040613 PMCID: PMC5753283 DOI: 10.1093/nar/gkx926] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 09/15/2017] [Accepted: 10/02/2017] [Indexed: 11/17/2022]
Abstract
The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect genotype and phenotype data of human individuals. Here, we outline our database activities for INSDC and JGA over the past year, and introduce submission, retrieval and analysis services running on our supercomputer system and their recent developments. Furthermore, we highlight our responses to the amended Japanese rules for the protection of personal information and the launch of the DDBJ Group Cloud service for sharing pre-publication data among research groups.
Affiliation(s)
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan; National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
3
Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T. DNA Data Bank of Japan. Nucleic Acids Res 2016; 45:D25-D31. [PMID: 27924010 PMCID: PMC5210514 DOI: 10.1093/nar/gkw1001] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 09/16/2016] [Revised: 10/13/2016] [Accepted: 10/15/2016] [Indexed: 12/27/2022]
Abstract
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). The DDBJ Center also operates the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect human subject data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center is improving semantic web technologies for integrating and sharing biological data, in order to provide an RDF version of the sequence data.
Affiliation(s)
- Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Takatomo Fujisawa
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yoshihiro Okuda
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan; National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
4
Mashima J, Kodama Y, Kosuge T, Fujisawa T, Katayama T, Nagasaki H, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T. DNA data bank of Japan (DDBJ) progress report. Nucleic Acids Res 2015; 44:D51-7. [PMID: 26578571 PMCID: PMC4702806 DOI: 10.1093/nar/gkv1105] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 09/18/2015] [Accepted: 10/09/2015] [Indexed: 01/07/2023]
Abstract
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration.
Affiliation(s)
- Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Takatomo Fujisawa
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Hideki Nagasaki
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yoshihiro Okuda
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan; National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
5
Kodama Y, Mashima J, Kosuge T, Katayama T, Fujisawa T, Kaminuma E, Ogasawara O, Okubo K, Takagi T, Nakamura Y. The DDBJ Japanese Genotype-phenotype Archive for genetic and phenotypic human data. Nucleic Acids Res 2014; 43:D18-22. [PMID: 25477381 PMCID: PMC4383935 DOI: 10.1093/nar/gku1120] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 01/08/2023]
Abstract
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency. DDBJ Center provides the JGA database system which securely stores genotype and phenotype data collected from individuals whose consent agreements authorize data release only for specific research use. NBDC has established guidelines and policies for sharing human-derived data and reviews data submission and usage requests from researchers. In addition to the JGA project, DDBJ Center develops Semantic Web technologies for data integration and sharing in collaboration with the Database Center for Life Science. This paper describes the overview of the JGA project, updates to the DDBJ databases, and services for data retrieval, analysis and integration.
Affiliation(s)
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshiaki Katayama
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
- Takatomo Fujisawa
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan; Database Center for Life Science, Chiba 277-0871, Japan
- Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
6
Kosuge T, Mashima J, Kodama Y, Fujisawa T, Kaminuma E, Ogasawara O, Okubo K, Takagi T, Nakamura Y. DDBJ progress report: a new submission system for leading to a correct annotation. Nucleic Acids Res 2013; 42:D44-9. [PMID: 24194602 PMCID: PMC3964987 DOI: 10.1093/nar/gkt1066] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 01/17/2023]
Abstract
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. This database content is shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). DDBJ launched a new nucleotide sequence submission system for receiving traditional nucleotide sequences. We expect that the new submission system will help many submitters to enter accurate annotation and will reduce the time needed for data input. In addition, DDBJ has started a new service, the Japanese Genotype–phenotype Archive (JGA), with our partner institute, the National Bioscience Database Center (NBDC). JGA permanently archives and shares all types of individual human genetic and phenotypic data. We also introduce improvements in the DDBJ services and databases made during the past year.
Affiliation(s)
- Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan and National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
7
Li MW, Qi X, Ni M, Lam HM. Silicon era of carbon-based life: application of genomics and bioinformatics in crop stress research. Int J Mol Sci 2013; 14:11444-83. [PMID: 23759993 PMCID: PMC3709742 DOI: 10.3390/ijms140611444] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/31/2013] [Revised: 05/07/2013] [Accepted: 05/17/2013] [Indexed: 01/25/2023]
Abstract
Abiotic and biotic stresses lead to massive reprogramming of different life processes and are the major limiting factors hampering crop productivity. Omics-based research platforms allow for a holistic and comprehensive survey on crop stress responses and hence may bring forth better crop improvement strategies. Since high-throughput approaches generate considerable amounts of data, bioinformatics tools will play an essential role in storing, retrieving, sharing, processing, and analyzing them. Genomic and functional genomic studies in crops still lag far behind similar studies in humans and other animals. In this review, we summarize some useful genomics and bioinformatics resources available to crop scientists. In addition, we also discuss the major challenges and advancements in the "-omics" studies, with an emphasis on their possible impacts on crop stress research and crop improvement.
Affiliation(s)
- Man-Wah Li
- Center for Soybean Research, State Key Laboratory of Agrobiotechnology and School of Life Sciences, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong
- Xinpeng Qi
- Center for Soybean Research, State Key Laboratory of Agrobiotechnology and School of Life Sciences, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong
- Meng Ni
- Center for Soybean Research, State Key Laboratory of Agrobiotechnology and School of Life Sciences, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong
- Hon-Ming Lam
- Center for Soybean Research, State Key Laboratory of Agrobiotechnology and School of Life Sciences, the Chinese University of Hong Kong, Shatin, N.T., Hong Kong
8
Peng YJ, Shih CF, Yang JY, Tan CM, Hsu WH, Huang YP, Liao PC, Yang CH. A RING-type E3 ligase controls anther dehiscence by activating the jasmonate biosynthetic pathway gene DEFECTIVE IN ANTHER DEHISCENCE1 in Arabidopsis. The Plant Journal: For Cell and Molecular Biology 2013; 74:310-27. [PMID: 23347376 DOI: 10.1111/tpj.12122] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 10/19/2012] [Revised: 01/02/2013] [Accepted: 01/14/2013] [Indexed: 05/21/2023]
Abstract
Suppression of expression of DAF [DEFECTIVE IN ANTHER DEHISCENCE1 (DAD1)-Activating Factor], a gene that encodes a putative RING-finger E3 ligase protein, causes non-dehiscence of the anthers, alters pollen development and causes sterility in 35S:DAF RNAi/antisense Arabidopsis plants. This mutant phenotype correlates with the suppression of DAF but not with expression of the two most closely related genes, DAFL1/2. The expression of DAD1 was significantly reduced in 35S:DAF RNAi/antisense plants, and complementation with 35S:DAF did not rescue the dad1 mutant, indicating that DAF acts upstream of DAD1 in jasmonic acid biosynthesis. This assumption is supported by the finding that 35S:DAF RNAi/antisense plants showed a similar cellular basis for anther dehiscence to that found in dad1 mutants, and that external application of jasmonic acid rescued the anther non-dehiscence and pollen defects in 35S:DAF antisense flowers. We further demonstrate that DAF is an E3 ubiquitin ligase and that its activity is abolished by C132S and H137Y mutations in its RING motif. Furthermore, ectopic expression of the dominant-negative C132S or H137Y mutations causes similar indehiscence of anthers and reduction in DAD1 expression in transgenic Arabidopsis. This result not only confirms that DAF controls anther dehiscence by positively regulating the expression of DAD1 in the jasmonic acid biosynthesis pathway, but also supports the notion that DAF functions as an E3 ubiquitin ligase, and that the conserved RING-finger region is required for its activity.
Affiliation(s)
- Yan-Jhu Peng
- Institute of Biotechnology, National Chung Hsing University, Taichung, 40227, Taiwan
9
Kalia VC, Raju SC, Purohit HJ. Genomic analysis reveals versatile organisms for quorum quenching enzymes: acyl-homoserine lactone-acylase and -lactonase. Open Microbiol J 2011; 5:1-13. [PMID: 21660112 PMCID: PMC3106361 DOI: 10.2174/1874285801105010001] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Received: 12/04/2010] [Revised: 12/28/2010] [Accepted: 12/30/2010] [Indexed: 01/22/2023]
Abstract
The virulence of microbes and their resistance to multiple drugs have obliged researchers to look for novel drug targets. Virulence of pathogenic microbes is regulated by signal molecules such as acylated homoserine lactone (AHL) produced during the cell-density-dependent phenomenon of quorum sensing (QS). In contrast, certain microbes produce AHL-lactonases and -acylases to degrade QS signals, a process termed quorum quenching. Mining sequenced genome databases has revealed organisms possessing conserved domains for AHL-lactonases and -acylases: i) Streptomyces (Actinobacteria), ii) Deinococcus (Deinococcus-Thermus), iii) Hyphomonas (α-Proteobacteria), iv) Ralstonia (β-Proteobacteria), v) Photorhabdus (γ-Proteobacteria), and certain marine gamma proteobacterium. Presence of genes for both the enzymes within an organism was observed in the following: i) Deinococcus radiodurans R1, ii) Hyphomonas neptunium ATCC 15444 and iii) Photorhabdus luminescens subsp. laumondii TTO1. These observations are supported by the presence of motifs for lactonase and acylase in these strains. Phylogenetic analysis and multiple sequence alignment of the gene sequences for AHL-lactonases and -acylases have revealed consensus sequences which can be used to design primers for amplifying these genes even among mixed cultures and metagenomes. Quorum quenching can be exploited to prevent food spoilage and bacterial infections, and for bioremediation.
Affiliation(s)
- Vipin Chandra Kalia
- Microbial Biotechnology and Genomics, Institute of Genomics and Integrative Biology (IGIB), CSIR, Delhi University Campus, Mall Road, Delhi-110007, India
- Sajan C Raju
- Environmental Genomics Unit, National Environmental Engineering Research Institute (NEERI), CSIR, Nehru Marg, Nagpur - 440020, India
- Hemant J Purohit
- Environmental Genomics Unit, National Environmental Engineering Research Institute (NEERI), CSIR, Nehru Marg, Nagpur - 440020, India
10
Katayama T, Arakawa K, Nakao M, Ono K, Aoki-Kinoshita KF, Yamamoto Y, Yamaguchi A, Kawashima S, Chun HW, Aerts J, Aranda B, Barboza LH, Bonnal RJ, Bruskiewich R, Bryne JC, Fernández JM, Funahashi A, Gordon PM, Goto N, Groscurth A, Gutteridge A, Holland R, Kano Y, Kawas EA, Kerhornou A, Kibukawa E, Kinjo AR, Kuhn M, Lapp H, Lehvaslaiho H, Nakamura H, Nakamura Y, Nishizawa T, Nobata C, Noguchi T, Oinn TM, Okamoto S, Owen S, Pafilis E, Pocock M, Prins P, Ranzinger R, Reisinger F, Salwinski L, Schreiber M, Senger M, Shigemoto Y, Standley DM, Sugawara H, Tashiro T, Trelles O, Vos RA, Wilkinson MD, York W, Zmasek CM, Asai K, Takagi T. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*. J Biomed Semantics 2010; 1:8. [PMID: 20727200 PMCID: PMC2939597 DOI: 10.1186/2041-1480-1-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 11/06/2009] [Accepted: 08/21/2010] [Indexed: 11/30/2022]
Abstract
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.
Affiliation(s)
- Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.
11
Eilbeck K, Lewis SE. Sequence ontology annotation guide. Comp Funct Genomics 2010; 5:642-7. [PMID: 18629179 PMCID: PMC2447471 DOI: 10.1002/cfg.446] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 11/17/2004] [Revised: 11/24/2004] [Accepted: 11/25/2004] [Indexed: 11/07/2022]
Abstract
The Sequence Ontology (SO) [13] aims to unify the way in which we describe sequence annotations, by providing a controlled vocabulary of terms and the relationships between them. Using SO terms to label the parts of sequence annotations ensures that annotations produced by different groups conform to a single standard. This greatly facilitates downstream analyses of annotation contents and characteristics, e.g. comparisons of UTRs, alternative splicing, etc. Because SO also specifies the relationships between features, e.g. part_of, kind_of, annotations described with SO terms are also better substrates for validation and visualization software. This document provides a step-by-step guide to producing a SO compliant file describing a sequence annotation. We illustrate this by using an annotated gene as an example. First we show where the terms needed to describe the gene's features are located in SO and their relationships to one another. We then show line by line how to format the file to construct a SO compliant annotation of this gene.
Affiliation(s)
- Karen Eilbeck
- Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, California 94729-3200, USA.
12
Hsu HF, Hsieh WP, Chen MK, Chang YY, Yang CH. C/D class MADS box genes from two monocots, orchid (Oncidium Gower Ramsey) and lily (Lilium longiflorum), exhibit different effects on floral transition and formation in Arabidopsis thaliana. Plant & Cell Physiology 2010; 51:1029-45. [PMID: 20395287 DOI: 10.1093/pcp/pcq052] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 05/05/2023]
Abstract
We have characterized three C/D class MADS box genes from an orchid (Oncidium Gower Ramsey) and a lily (Lilium longiflorum). OMADS4 of orchid and LMADS10 of lily are C class gene orthologs, whereas OMADS2 of orchid is a putative D class gene ortholog. The identity of these three genes is further supported by the presence of conserved motifs in the C-terminal regions of the proteins. The mRNA for these three genes can be detected in flowers and is absent in vegetative leaves. In flowers, OMADS4 and LMADS10 show similar expression patterns, being specifically expressed in the stamens and carpels. The expression of OMADS2 is restricted to the stigmatic cavity and ovary of the carpel. The similarities of the expression patterns of OMADS4/LMADS10 and OMADS2 to those of C and D class genes, respectively, indicate that their transcriptional regulation is highly evolutionarily conserved in these monocot species. Yeast two-hybrid analysis indicates that both OMADS2 and OMADS4 form homodimers and heterodimers with each other. Similar interactions are observed for LMADS2 and LMADS10. Ectopic expression of LMADS10 causes extremely early flowering, terminal flower formation and conversion of the sepals into carpel-like structures, similar to ectopic expression of the lily D class gene LMADS2. In contrast, 35S::OMADS2 and 35S::OMADS4 cause only early or moderately early flowering in transgenic Arabidopsis plants without floral organ conversion. This result indicates that C/D class genes from the lily have stronger effects than those from the orchid in transgenic Arabidopsis, revealing possible functional diversification of C/D class genes from the two monocots in regulating floral transition and formation.
Affiliation(s)
- Hsing-Fun Hsu
- Graduate Institute of Biotechnology, National Chung Hsing University, Taichung, Taiwan 40227 ROC
13
Katayama T, Nakao M, Takagi T. TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Res 2010; 38:W706-11. [PMID: 20472643 PMCID: PMC2896079 DOI: 10.1093/nar/gkq386] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
Web services have become widely used in bioinformatics analysis, but there exist incompatibilities in interfaces and data types, which prevent users from making full use of a combination of these services. Therefore, we have developed the TogoWS service to provide an integrated interface with advanced features. In the TogoWS REST (REpresentative State Transfer) API (application programming interface), we introduce a unified access method for major database resources through intuitive URIs that can be used to search, retrieve, parse and convert the database entries. The TogoWS SOAP API resolves compatibility issues found on the server and client-side SOAP implementations. The TogoWS service is freely available at: http://togows.dbcls.jp/.
Affiliation(s)
- Toshiaki Katayama
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan.
|
14
|
A new species of Calicotyle Diesing, 1850 (Monogenea: Monocotylidae) from the shortspine spurdog Squalus mitsukurii Jordan & Snyder and the synonymy of Gymnocalicotyle Nybelin, 1941 with this genus. Syst Parasitol 2010; 75:117-24. [DOI: 10.1007/s11230-009-9228-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/23/2009] [Accepted: 10/21/2009] [Indexed: 10/19/2022]
|
15
|
Abstract
This chapter gives an overview of the most commonly used biological databases of nucleic acid sequences and their structures. We cover general sequence databases, databases for specific DNA features, noncoding RNA sequences, and RNA secondary and tertiary structures.
|
16
|
Lamprecht AL, Margaria T, Steffen B. Bio-jETI: a framework for semantics-based service composition. BMC Bioinformatics 2009; 10 Suppl 10:S8. [PMID: 19796405 PMCID: PMC2755829 DOI: 10.1186/1471-2105-10-s10-s8] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 01/09/2023]
Abstract
BACKGROUND The development of bioinformatics databases, algorithms, and tools in recent years has led to a highly distributed world of bioinformatics services. Without adequate management and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even in this combination, in silico researchers currently would have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions. RESULTS In this paper, we demonstrate by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be combined graphically into complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI's model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developer may either modify the service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations. CONCLUSION We show the power of semantic annotations in an adequately modelled and semantically enabled domain setting. Using model checking and synthesis methods, users may orchestrate complex processes from a wealth of heterogeneous services without worrying about interfaces and (type) consistency. The success of this method strongly depends on a careful semantic annotation of the provided services and on its consequent exploitation for analysis, validation, and synthesis. We are convinced that these annotations will become standard, as they will become preconditions for the success and widespread use of (preferred) services in the Semantic Web.
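The notion of detecting a type gap between two services and bridging it with a mediator can be sketched in a few lines. The sketch below uses plain type comparison over a linear chain, which is far simpler than the model checking and synthesis described in the abstract; all service names, type names and the mediator table are hypothetical.

```python
# Hypothetical services: name -> (input type, output type)
SERVICES = {
    "fetch_sequence": ("accession", "fasta"),
    "align": ("fasta", "alignment"),
    "build_tree": ("phylip", "tree"),
}

# Hypothetical mediators: (produced type, expected type) -> mediator name
MEDIATORS = {("alignment", "phylip"): "alignment_to_phylip"}


def check_composition(chain):
    """Return (repaired_chain, problems) for a linear chain of services."""
    repaired, problems = [], []
    for current, following in zip(chain, chain[1:]):
        repaired.append(current)
        produced = SERVICES[current][1]
        expected = SERVICES[following][0]
        if produced == expected:
            continue  # types line up, nothing to do
        mediator = MEDIATORS.get((produced, expected))
        if mediator:
            repaired.append(mediator)  # bridge the type gap automatically
        else:
            # No known mediator: report the incompatible pair
            problems.append((current, following, produced, expected))
    if chain:
        repaired.append(chain[-1])
    return repaired, problems


if __name__ == "__main__":
    print(check_composition(["fetch_sequence", "align", "build_tree"]))
```

A reported problem corresponds to the case where a developer would either pick a different service or add a new mediator to the table for later reuse.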
Affiliation(s)
- Anna-Lena Lamprecht
- Chair for Programming Systems, Dortmund University of Technology, Dortmund, D-44227 Germany
- Tiziana Margaria
- Chair for Service and Software Engineering, Potsdam University, Potsdam, D-14882 Germany
- Bernhard Steffen
- Chair for Programming Systems, Dortmund University of Technology, Dortmund, D-44227 Germany
|
17
|
Wagener J, Spjuth O, Willighagen EL, Wikberg JES. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services. BMC Bioinformatics 2009; 10:279. [PMID: 19732427 PMCID: PMC2755485 DOI: 10.1186/1471-2105-10-279] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 05/20/2009] [Accepted: 09/04/2009] [Indexed: 01/12/2023]
Abstract
BACKGROUND Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. RESULTS We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. That XMPP cloud services are capable of asynchronous communication implies that clients do not have to poll repetitively for status, but the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. CONCLUSION XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.
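The difference between polling for status and having the service push its result on completion can be shown with an in-process simulation. The sketch below uses Python's standard `asyncio` module rather than XMPP or any real messaging library; `run_service` and `invoke` are hypothetical names, and the "computation" is a placeholder.

```python
import asyncio


async def run_service(payload, on_done):
    """Stand-in for a remote service: does its work, then pushes the result."""
    await asyncio.sleep(0.01)  # simulated computation time
    on_done(payload[::-1])     # the service notifies the client when finished


async def invoke(payload):
    """Client side: submit a job and wait for the pushed result."""
    loop = asyncio.get_running_loop()
    result = loop.create_future()
    # The callback handed to the service completes the client's future.
    task = asyncio.create_task(run_service(payload, result.set_result))
    value = await result  # no repeated status requests are issued
    await task            # let the service coroutine finish cleanly
    return value


if __name__ == "__main__":
    print(asyncio.run(invoke("ACGT")))
```

The client holds a single pending future instead of a loop of status checks, which is the property the abstract attributes to asynchronous invocation.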
Affiliation(s)
- Johannes Wagener
- Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
|
18
|
Chang YY, Chiu YF, Wu JW, Yang CH. Four Orchid (Oncidium Gower Ramsey) AP1/AGL9-like MADS Box Genes Show Novel Expression Patterns and Cause Different Effects on Floral Transition and Formation in Arabidopsis thaliana. Plant Cell Physiol 2009; 50:1425-38. [DOI: 10.1093/pcp/pcp087] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 12/29/2022]
|
19
|
Kwon Y, Shigemoto Y, Kuwana Y, Sugawara H. Web API for biology with a workflow navigation system. Nucleic Acids Res 2009; 37:W11-6. [PMID: 19417067 PMCID: PMC2703950 DOI: 10.1093/nar/gkp300] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
DNA Data Bank of Japan (DDBJ) provides Web-based systems for biological analysis, called Web APIs for biology (WABI). So far, we have developed over 20 SOAP services and several workflows that consist of a series of method invocations. In this article, we present newly developed services of WABI, that is, REST-based Web services, additional workflows and a workflow navigation system. Each Web service and workflow can be used as a complete service or a building block for programmers to construct more complex information processing systems. The workflow navigation system aims to help non-programming biologists perform analysis tasks by providing next applicable services on Web browsers according to the output of a previously selected service. With this function, users can apply multiple services consecutively only by following links without any programming or manual copy-and-paste operations on Web browsers. The listed services are determined automatically by the system referring to the dictionaries of service categories, the input/output types of services and HTML tags. WABI and the workflow navigation system are freely accessible at http://www.xml.nig.ac.jp/index.html and http://cyclamen.ddbj.nig.ac.jp/, respectively.
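Offering "next applicable services" according to the output of the previous one amounts to a lookup over input and output types. The following is a minimal sketch of that lookup only; the service names, type names and the `SERVICE_TYPES` dictionary are hypothetical, and the real system is also described as consulting service categories and HTML tags.

```python
# Hypothetical service dictionary: name -> (input type, output type)
SERVICE_TYPES = {
    "get_entry": ("accession", "sequence"),
    "blast": ("sequence", "hit_list"),
    "clustalw": ("sequence", "alignment"),
    "get_hits_entries": ("hit_list", "sequence"),
}


def next_services(previous_service):
    """List services whose input type matches the previous service's output."""
    out_type = SERVICE_TYPES[previous_service][1]
    return sorted(
        name
        for name, (in_type, _out_type) in SERVICE_TYPES.items()
        if in_type == out_type
    )


if __name__ == "__main__":
    for service in SERVICE_TYPES:
        print(service, "->", next_services(service))
```

Rendering each returned name as a link would give the follow-the-links navigation that the abstract describes for non-programming users.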
Affiliation(s)
- Yeondae Kwon
- Laboratory for Research and Development of Biological Databases, Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan.
|
20
|
Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009; 10:9-18. [PMID: 19065136 DOI: 10.1038/nrg2483] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Indexed: 01/23/2023]
Abstract
The flow of research data concerning the genetic basis of health and disease is rapidly increasing in speed and complexity. In response, many projects are seeking to ensure that there are appropriate informatics tools, systems and databases available to manage and exploit this flood of information. Previous solutions, such as central databases, journal-based publication and manually intensive data curation, are now being enhanced with new systems for federated databases, database publication, and more automated management of data flows and quality control. Along with emerging technologies that enhance connectivity and data retrieval, these advances should help to create a powerful knowledge environment for genotype-phenotype information.
|
21
|
Orchard S, Kerrien S, Jones P, Ceol A, Chatr-Aryamontri A, Salwinski L, Nerothin J, Hermjakob H. Submit your interaction data the IMEx way: a step by step guide to trouble-free deposition. Proteomics 2008; 7 Suppl 1:28-34. [PMID: 17893861 DOI: 10.1002/pmic.200700286] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 11/09/2022]
Abstract
The ever-increasing generation of, and corresponding interest in, molecular interaction data has led to the establishment of a number of high-quality molecular interaction databases which manually curate interaction data extracted from the literature. In order to effectively share the curation load, and ensure that data is stored in and accessible from multiple sources, these databases have united to form the IMEx consortium. All of the IMEx databases also accept direct deposition of interaction data from authors prior to publication, thus both assisting the scientist in preparing the dataset for publication and ensuring that its subsequent representation in the public domain databases is fully accurate. This article walks the potential submitter through the various routes by which data may be deposited with the databases and describes the tools which have been developed to assist in this process.
Affiliation(s)
- Sandra Orchard
- European Bioinformatics Institute, Hinxton, Cambridge, UK.
|
22
|
Abstract
We present a new version of the European Bioinformatics Institute Web Services, a complete suite of SOAP-based web tools for structural and functional analysis, with new and improved applications. New functionality has been added to most of the services already available, and an improved version of the underlying framework has allowed us to include more applications. Information on the EBI Web Services, tutorials and clients can be found at http://www.ebi.ac.uk/Tools/webservices.
Affiliation(s)
- Rodrigo Lopez
- *To whom correspondence should be addressed. +44 1223 494423+44 1223 494468
|
23
|
Takeuchi S. Molecular cloning, sequence, function and structural basis of human heart 150 kDa oxygen-regulated protein, an ER chaperone. Protein J 2007; 25:517-28. [PMID: 17131193 DOI: 10.1007/s10930-006-9038-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Apoptosis of heart tissues followed by hypoxia and ischemia leads finally to cardiac insufficiency. The full-length coding sequence of 3301 bp including cDNA(s) of the ER chaperone ORP150, which was specifically induced by hypoxia stress, was cloned from human cardiac infarct. Phylogenetic analyses reveal that human heart ORP150 shares a highly conserved N-terminal ATPase domain among its related family members. Moreover, hydropathic profiling reveals that their ca. 70 N-terminal residues and unique C-terminal halves exhibit similar hydropathy profiles among members. These findings suggest that ORP150 is structurally and functionally well conserved in distant species.
Affiliation(s)
- Satoru Takeuchi
- Department of Protein Research, Hibergenome (formerly ProstaColon), 85 NE, Takamatsu, Kahoku, Ishikawa, 929-1215, Japan.
|
24
|
Shirai T, Igarashi K, Ozawa T, Hagihara H, Kobayashi T, Ozaki K, Ito S. Ancestral sequence evolutionary trace and crystal structure analyses of alkaline alpha-amylase from Bacillus sp. KSM-1378 to clarify the alkaline adaptation process of proteins. Proteins 2007; 66:600-10. [PMID: 17154418 DOI: 10.1002/prot.21255] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
The crystal structure of alkaline liquefying alpha-amylase (AmyK) from the alkaliphilic Bacillus sp. KSM-1378 was determined at 2.1 Å resolution. The AmyK structure, which consists of three domains, belongs to the GH13 glycoside hydrolase family and binds three calcium ions and one sodium ion. The alkaline adaptation mechanism of AmyK was investigated by the ancestral sequence evolutionary trace method and by extensive comparisons between alkaline and nonalkaline enzyme structures, including three other protein families: protease, cellulase, and phosphoserine aminotransferase. The consensus change for the alkaline adaptation process was a decrease in the Lys content. The loss of a Lys residue is associated with ion pair remodeling, which mainly consists of the loss of Lys-Asp/Glu ion pairs and the acquisition of Arg ion pairs, preferably Arg-Glu. The predicted replacements of the positively charged amino acids were often, although not always, used for ion pair remodeling.
Affiliation(s)
- Tsuyoshi Shirai
- Department of Bioscience, Nagahama Institute of Bio-science and Technology, Nagahama 526-0829, Japan.
|
25
|
Abstract
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of killer cell immunoglobulin-like receptors (KIRs); IPD-MHC, a database of sequences of the major histocompatibility complex (MHC) of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure.
Affiliation(s)
- James Robinson
- Anthony Nolan Research Institute, Royal Free Hospital, Hampstead, London, UK
|
26
|
Takeuchi S. Expression and Purification of Human PAG, a Transmembrane Adapter Protein Using an Insect Cell Expression System and its Structure Basis. Protein J 2006; 25:295-9. [PMID: 16947079 DOI: 10.1007/s10930-006-9015-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/27/2022]
Abstract
In this study, we report the purification and structure basis of human phosphoprotein associated with glycosphingolipid-enriched microdomains (PAG), a C-SRC tyrosine kinase (CSK)-binding protein. Human PAG was produced using an insect cell expression system. The PAG was purified by metal affinity, ion exchange, and gel filtration chromatographies. The final purity of gel-purified PAG was evaluated by SDS-PAGE and mass spectrometry. Recombinant human PAG migrates to 60 kDa on SDS-PAGE gel, while native PAG is a 46 kDa transmembrane adapter protein in lipid rafts. The mass of recombinant human PAG observed by mass spectrometry (50394.1 Da) differs from the calculated mass (47803.41 Da) by 2590.7 Da. Consequently, although the human PAG sequence shares well-known sites for modifications such as myristoylation, palmitoylation, and tyrosine phosphorylation, the difference perhaps suggests the existence of unknown modification sites. We show the high PAG-binding ability with CSK in vitro as well as the human PAG structure characterized by 11 alpha-helix structures including a 3 kDa transmembrane domain.
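The mass difference quoted in the abstract can be checked with a short calculation. The two masses are taken from the abstract; the variable names and the added relative-excess figure are illustrative and not part of the original report.

```python
from decimal import Decimal

# Values as stated in the abstract (in Da)
calculated_mass = Decimal("47803.41")
observed_mass = Decimal("50394.1")

# Absolute difference between observed and calculated mass
difference = observed_mass - calculated_mass

# Excess expressed as a percentage of the calculated mass (illustrative extra)
relative_excess = difference / calculated_mass * 100

if __name__ == "__main__":
    print("difference (Da):", difference)
    print("difference rounded to 0.1 Da:", round(difference, 1))
    print("relative excess (%):", round(relative_excess, 2))
```

The exact difference is 2590.69 Da, which rounds to the 2590.7 Da given in the abstract.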
Affiliation(s)
- Satoru Takeuchi
- Department of Protein Research, ProstaColon, 85 NE, Takamatsu, Kahoku, Ishikawa, 929-1215, Japan.
|
27
|
Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res 2006; 34:W729-32. [PMID: 16845108 PMCID: PMC1538887 DOI: 10.1093/nar/gkl320] [Citation(s) in RCA: 620] [Impact Index Per Article: 34.4] [Indexed: 11/14/2022]
Abstract
Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from http://taverna.sourceforge.net/.
Affiliation(s)
- Duncan Hull
- School of Computer Science, University of Manchester, M13 9PL, UK.
|
28
|
Malmström L, Marko-Varga G, Westergren-Thorsson G, Laurell T, Malmström J. 2DDB - a bioinformatics solution for analysis of quantitative proteomics data. BMC Bioinformatics 2006; 7:158. [PMID: 16549013 PMCID: PMC1435938 DOI: 10.1186/1471-2105-7-158] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 05/17/2005] [Accepted: 03/20/2006] [Indexed: 11/13/2022]
Abstract
Background We present 2DDB, a bioinformatics solution for storage, integration and analysis of quantitative proteomics data. As the data complexity and the rate at which data are produced increase in the proteomics field, the need for flexible analysis software increases. Results 2DDB is based on a core data model describing fundamentals such as experiment description and identified proteins. The extended data models are built on top of the core data model to capture more specific aspects of the data. A number of public databases and bioinformatical tools have been integrated, giving the user access to large amounts of relevant data. A statistical and graphical package, R, is used for statistical and graphical analysis. The current implementation handles quantitative data from 2D gel electrophoresis and multidimensional liquid chromatography/mass spectrometry experiments. Conclusion The software has successfully been employed in a number of projects ranging from quantitative liquid-chromatography-mass spectrometry based analysis of transforming growth factor-beta stimulated fibroblasts to 2D gel electrophoresis/mass spectrometry analysis of biopsies from human cervix. The software is available for download at SourceForge.
Affiliation(s)
- Lars Malmström
- Department of Electrical Measurements, LTH, P.O Box 118, SE-221 00, Lund, Sweden
- György Marko-Varga
- Department of Analytical Chemistry, Lund University, SE-221 87, Lund, Sweden
- Thomas Laurell
- Department of Electrical Measurements, LTH, P.O Box 118, SE-221 00, Lund, Sweden
- Johan Malmström
- Department of Cell and Molecular Biology, C13, BMC, University of Lund, SE-221 84, Lund, Sweden
- Institute for Molecular Systems Biology, ETH Hönggerberg, HPT E 53, Wolfgang Pauli-Str. 16, CH-8093 Zürich, Switzerland
|
29
|
Takeuchi S. Analytical assays of human HSP27 and thermal-stress survival of Escherichia coli cells that overexpress it. Biochem Biophys Res Commun 2006; 341:1252-6. [PMID: 16466698 DOI: 10.1016/j.bbrc.2006.01.090] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 01/13/2006] [Accepted: 01/17/2006] [Indexed: 11/29/2022]
Abstract
HSP27 is a small heat-shock protein (sHSP). Such proteins are produced in all organisms. These small HSPs exhibit chaperone-like activity that can bind to unfolded polypeptides and prevent uncontrolled protein aggregation in vitro. Cellular anti-apoptosis function and enhanced cell survival are correlated with increased expression of HSPs. This study presents a thermal-stress survival model for cells using the Escherichia coli expression system for which human HSP27, a recombinant protein, is inducible. Results show that E. coli cells overexpressing human HSP27 have enhanced tolerance to 50 °C thermal stress.
Affiliation(s)
- Satoru Takeuchi
- Department of Protein Research, ProstaColon, 85 NE, Takamatsu, Kahoku, Ishikawa 929-1215, Japan.
|
30
|
Affiliation(s)
- P Farahani
- Centre for Evaluation of Medicines, St Joseph's Hospital, McMaster University, Hamilton, ON, Canada.
|
31
|
Navarange M, Game L, Fowler D, Wadekar V, Banks H, Cooley N, Rahman F, Hinshelwood J, Broderick P, Causton HC. MiMiR: a comprehensive solution for storage, annotation and exchange of microarray data. BMC Bioinformatics 2005; 6:268. [PMID: 16280078 PMCID: PMC1299320 DOI: 10.1186/1471-2105-6-268] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 05/10/2005] [Accepted: 11/09/2005] [Indexed: 11/25/2022]
Abstract
Background The generation of large amounts of microarray data presents challenges for data collection, annotation, exchange and analysis. Although there are now widely accepted formats, minimum standards for data content and ontologies for microarray data, only a few groups are using them together to build and populate large-scale databases. Structured environments for data management are crucial for making full use of these data. Description The MiMiR database provides a comprehensive infrastructure for microarray data annotation, storage and exchange and is based on the MAGE format. MiMiR is MIAME-supportive, customised for use with data generated on the Affymetrix platform and includes a tool for data annotation using ontologies. Detailed information on the experiment, methods, reagents and signal intensity data can be captured in a systematic format. Reports screens permit the user to query the database, to view annotation on individual experiments and provide summary statistics. MiMiR has tools for automatic upload of the data from the microarray scanner and export to databases using MAGE-ML. Conclusion MiMiR facilitates microarray data management, annotation and exchange, in line with international guidelines. The database is valuable for underpinning research activities and promotes a systematic approach to data handling. Copies of MiMiR are freely available to academic groups under licence.
Affiliation(s)
- Mahendra Navarange
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Laurence Game
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Derek Fowler
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Vihar Wadekar
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Helen Banks
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Nicola Cooley
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Fatimah Rahman
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Justin Hinshelwood
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Peter Broderick
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
- Helen C Causton
- CSC-IC Microarray Centre, Imperial College, Hammersmith Campus, DuCane Road, London W12 0NN, UK
|
32
|
Abstract
Complete genomic sequences of several oral pathogens have been deciphered and multiple sources of independently annotated data are available for the same genomes. Different gene identification schemes and functional annotation methods used in these databases present a challenge for cross-referencing and the efficient use of the data. The Bioinformatics Resource for Oral Pathogens (BROP) aims to integrate bioinformatics data from multiple sources for easy comparison, analysis and data-mining through specially designed software interfaces. Currently, databases and tools provided by BROP include: (i) a graphical genome viewer (Genome Viewer) that allows side-by-side visual comparison of independently annotated datasets for the same genome; (ii) a pipeline of automatic data-mining algorithms to keep the genome annotation always up-to-date; (iii) comparative genomic tools such as Genome-wide ORF Alignment (GOAL); and (iv) the Oral Pathogen Microarray Database. BROP can also handle unfinished genomic sequences and provides secure yet flexible control over data access. The concept of providing an integrated source of genomic data, as well as the data-mining model used in BROP can be applied to other organisms. BROP can be publicly accessed at .
Affiliation(s)
- Tsute Chen
- The Forsyth Institute, 140 Fenway, Boston, MA 02115, USA.
|
33
|
Kriventseva EV, Koutsos AC, Blass C, Kafatos FC, Christophides GK, Zdobnov EM. AnoEST: toward A. gambiae functional genomics. Genome Res 2005; 15:893-9. [PMID: 15899967 PMCID: PMC1142480 DOI: 10.1101/gr.3756405] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
Here, we present an analysis of 215,634 EST and cDNA sequences of a major vector of human malaria Anopheles gambiae structured into the AnoEST database. The expressed sequences are grouped into clusters using genomic sequence as template and associated with inferred functional annotation, including the following: corresponding Ensembl gene prediction, putative orthologous genes in other species, homology to known proteins, protein domains, associated Gene Ontology terms, and corresponding classification into broad GO-slim functional groups. AnoEST is a vital resource for interpretation of expression profiles derived using recently developed A. gambiae cDNA microarrays. Using these cDNA microarrays, we have experimentally confirmed the expression of 7961 clusters during mosquito development. Of these, 3100 are not associated with currently predicted genes. Moreover, we found that clusters with confirmed expression are nonbiased with respect to the current gene annotation or homology to known proteins. Consequently, we expect that many as yet unconfirmed clusters are likely to be actual A. gambiae genes. [AnoEST is publicly available at http://komar.embl.de, and is also accessible as a Distributed Annotation Service (DAS).].
|
34
|
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol 2005; 6:R44. [PMID: 15892872 PMCID: PMC1175956 DOI: 10.1186/gb-2005-6-5-r44] [Citation(s) in RCA: 480] [Impact Index Per Article: 25.3] [Received: 10/04/2004] [Revised: 02/01/2005] [Accepted: 03/30/2005] [Indexed: 11/10/2022]
Abstract
The goal of the Sequence Ontology (SO) project is to produce a structured controlled vocabulary with a common set of terms and definitions for parts of a genomic annotation, and to describe the relationships among them. Details of SO construction, design and use, particularly with regard to part-whole relationships are discussed and the practical utility of SO is demonstrated for a set of genome annotations from Drosophila melanogaster. The Sequence Ontology (SO) is a structured controlled vocabulary for the parts of a genomic annotation. SO provides a common set of terms and definitions that will facilitate the exchange, analysis and management of genomic data. Because SO treats part-whole relationships rigorously, data described with it can become substrates for automated reasoning, and instances of sequence features described by the SO can be subjected to a group of logical operations termed extensional mereology operators.
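Treating part-whole relationships rigorously is what makes annotations usable for automated reasoning. The sketch below shows only the simplest such inference, transitivity of `part_of`, over a toy graph; the edges listed are simplified assumptions for illustration, not a copy of the ontology, and the extensional mereology operators mentioned in the abstract are not implemented here.

```python
# Illustrative part_of edges between feature types (assumed, simplified)
PART_OF = {
    "exon": {"transcript"},
    "intron": {"transcript"},
    "transcript": {"gene"},
}


def is_part_of(part, whole):
    """True if `part` is directly or transitively part_of `whole`."""
    stack, seen = [part], set()
    while stack:
        term = stack.pop()
        for parent in PART_OF.get(term, ()):
            if parent == whole:
                return True
            if parent not in seen:
                seen.add(parent)  # guard against revisiting terms
                stack.append(parent)
    return False


if __name__ == "__main__":
    print(is_part_of("exon", "gene"))
    print(is_part_of("gene", "exon"))
```

With explicit edges of this kind, a query such as "is this exon part of that gene" can be answered from the annotation structure alone.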
Affiliation(s)
- Karen Eilbeck
- Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94729-3200, USA.
|
35
|
Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C, Kanapin A, Das U, Michoud K, Phan I, Gattiker A, Kulikova T, Faruque N, Duggan K, Mclaren P, Reimholz B, Duret L, Penel S, Reuter I, Apweiler R. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res 2005; 33:D297-302. [PMID: 15608201 PMCID: PMC539993 DOI: 10.1093/nar/gki039] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.0] [Indexed: 11/12/2022]
Abstract
Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.
Affiliation(s)
- Paul Kersey
- The EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
36
|
Giudicelli V, Chaume D, Lefranc MP. IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes. Nucleic Acids Res 2005; 33:D256-61. [PMID: 15608191 PMCID: PMC539964 DOI: 10.1093/nar/gki010] [Citation(s) in RCA: 369] [Impact Index Per Article: 19.4] [Indexed: 11/14/2022]
Abstract
IMGT/GENE-DB is the comprehensive IMGT genome database for immunoglobulin (IG) and T cell receptor (TR) genes from human and mouse, and, in development, from other vertebrates. IMGT/GENE-DB is the international reference for the IG and TR gene nomenclature and works in close collaboration with the HUGO Nomenclature Committee, Mouse Genome Database and genome committees for other species. IMGT/GENE-DB allows a search of IG and TR genes by locus, group and subgroup, which are CLASSIFICATION concepts of IMGT-ONTOLOGY. Shortcuts allow the retrieval of gene information by gene name or clone name. Direct links with configurable URL give access to information usable by humans or programs. An IMGT/GENE-DB entry displays accurate gene data related to genome (gene localization), allelic polymorphisms (number of alleles, IMGT reference sequences, functionality, etc.), gene expression (known cDNAs), proteins and structures (Protein displays, IMGT Colliers de Perles). It provides internal links to the IMGT sequence databases and to the IMGT Repertoire Web resources, and external links to genome and generalist sequence databases. IMGT/GENE-DB manages the IMGT reference directory used by the IMGT tools for IG and TR gene and allele comparison and assignment, and by the IMGT databases for gene data annotation. IMGT/GENE-DB is freely available at http://imgt.cines.fr.
Affiliation(s)
- Véronique Giudicelli
- IMGT, the international ImMunoGeneTics information system, Laboratoire d'ImmunoGénétique Moléculaire, LIGM, Université Montpellier II, Institut de Génétique Humaine, IGH, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France
|
37
|
Abstract
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in SRS, BLAST and FASTA search engines at the European Bioinformatics Institute.
Affiliation(s)
- James Robinson
- Anthony Nolan Research Institute, Royal Free Hospital, Pond Street, Hampstead, London NW3 2QG, UK
|
38
|
Abstract
GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 165 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
Affiliation(s)
- Dennis A Benson
- Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
39
|
Petersen G, Johnson P, Andersson L, Klinga-Levan K, Gómez-Fabre PM, Ståhl F. RatMap--rat genome tools and data. Nucleic Acids Res 2005; 33:D492-4. [PMID: 15608244 PMCID: PMC540079 DOI: 10.1093/nar/gki125] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB–Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided.
Affiliation(s)
- Greta Petersen
- Department of Cell and Molecular Biology-Genetics, Göteborg University, Box 462, SE 40530 Göteborg, Sweden
|
40
|
Kanz C, Aldebert P, Althorpe N, Baker W, Baldwin A, Bates K, Browne P, van den Broek A, Castro M, Cochrane G, Duggan K, Eberhardt R, Faruque N, Gamble J, Diez FG, Harte N, Kulikova T, Lin Q, Lombard V, Lopez R, Mancuso R, McHale M, Nardone F, Silventoinen V, Sobhany S, Stoehr P, Tuli MA, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R. The EMBL Nucleotide Sequence Database. Nucleic Acids Res 2005; 33:D29-33. [PMID: 15608199 PMCID: PMC540052 DOI: 10.1093/nar/gki098] [Citation(s) in RCA: 173] [Impact Index Per Article: 9.1] [Indexed: 11/14/2022]
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.
Affiliation(s)
- Carola Kanz
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
41
|
Matthews KA, Kaufman TC, Gelbart WM. Research resources for Drosophila: the expanding universe. Nat Rev Genet 2005; 6:179-93. [PMID: 15738962 DOI: 10.1038/nrg1554] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Indexed: 12/28/2022]
Abstract
Drosophila melanogaster has been the subject of research into central questions about biological mechanisms for almost a century. The experimental tools and resources that are available or under development for D. melanogaster and its related species, particularly those for genomic analysis, are truly outstanding. Here we review three types of resource that have been developed for D. melanogaster research: databases and other sources of information, biological materials and experimental services. These resources are there to be exploited and we hope that this guide will encourage new uses for D. melanogaster information, materials and services, both by those new to flies and by experienced D. melanogaster researchers.
Affiliation(s)
- Kathleen A Matthews
- Department of Biology, Indiana University, Bloomington, Indiana 47405-3700, USA.
|
42
|
Atlas - a data warehouse for integrative bioinformatics. BMC Bioinformatics 2005; 6:34. [PMID: 15723693 PMCID: PMC554782 DOI: 10.1186/1471-2105-6-34] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.4] [Received: 09/04/2004] [Accepted: 02/21/2005] [Indexed: 11/24/2022]
Abstract
Background: We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description: The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion: The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels.
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at:
|
43
|
Lefranc MP, Pommié C, Kaas Q, Duprat E, Bosc N, Guiraudou D, Jean C, Ruiz M, Da Piédade I, Rouard M, Foulquier E, Thouvenin V, Lefranc G. IMGT unique numbering for immunoglobulin and T cell receptor constant domains and Ig superfamily C-like domains. Dev Comp Immunol 2005; 29:185-203. [PMID: 15572068 DOI: 10.1016/j.dci.2004.07.003] [Citation(s) in RCA: 186] [Impact Index Per Article: 9.8] [Received: 05/19/2004] [Accepted: 07/16/2004] [Indexed: 05/24/2023]
Abstract
IMGT, the international ImMunoGeneTics information system (http://imgt.cines.fr), provides common access to expertly annotated data on the genome, proteome, genetics and structure of immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC), and related proteins of the immune system (RPI) of human and other vertebrates. The NUMEROTATION concept of IMGT-ONTOLOGY has made it possible to define a unique numbering for the variable domains (V-DOMAINs) and for the V-LIKE-DOMAINs. In this paper, this standardized characterization is extended to the constant domains (C-DOMAINs) and to the C-LIKE-DOMAINs, leading, for the first time, to a standardized description of their mutations, allelic polymorphisms, two-dimensional (2D) representations and three-dimensional (3D) structures. The IMGT unique numbering is, therefore, highly valuable for the comparative, structural or evolutionary studies of the immunoglobulin superfamily (IgSF) domains, V-DOMAINs and C-DOMAINs of IG and TR in vertebrates, and V-LIKE-DOMAINs and C-LIKE-DOMAINs of proteins other than IG and TR, in any species.
Affiliation(s)
- Marie-Paule Lefranc
- IMGT, the International ImMunoGeneTics Information System, LIGM, Laboratoire d'ImmunoGénétique Moléculaire, Université Montpellier II, UPR CNRS 1142, IGH, 141 rue de la Cardonille, 34396 Montpellier cedex 5, France.
|
44
|
Furey TS, Diekhans M, Lu Y, Graves TA, Oddy L, Randall-Maher J, Hillier LW, Wilson RK, Haussler D. Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing. Genome Res 2004; 14:2034-40. [PMID: 15489323 PMCID: PMC528917 DOI: 10.1101/gr.2467904] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022]
Abstract
The NCBI Reference Sequence (RefSeq) project and the NIH Mammalian Gene Collection (MGC) together define a set of approximately 30,000 nonredundant human mRNA sequences with identified coding regions representing 17,000 distinct loci. These high-quality mRNA sequences allow for the identification of transcribed regions in the human genome sequence, and many researchers accept them as the correct representation of each defined gene sequence. Computational comparison of these mRNA sequences and the recently published essentially finished human genome sequence reveals several thousand undocumented nonsynonymous substitution and frame shift discrepancies between the two resources. Additional analysis is undertaken to verify that the euchromatic human genome is sufficiently complete--containing nearly the whole mRNA collection, thus allowing for a comprehensive analysis to be undertaken. Many of the discrepancies will prove to be genuine polymorphisms in the human population, somatic cell genomic variants, or examples of RNA editing. It is observed that the genome sequence variant has significant additional support from other mRNAs and ESTs, almost four times more often than does the mRNA variant, suggesting that the genome sequence is more accurate. In approximately 15% of these cases, there is substantial support for both variants, suggestive of an undocumented polymorphism. An initial screening against a 24-individual genomic DNA diversity panel verified 60% of a small set of potential single nucleotide polymorphisms from which successful results could be obtained. We also find statistical evidence that a few of these discrepancies are due to RNA editing. Overall, these results suggest that the mRNA collections may contain a substantial number of errors. 
For current and future mRNA collections, it may be prudent to fully reconcile each genome sequence discrepancy, classifying each as a polymorphism, site of RNA editing or somatic cell variation, or genome sequence error.
Affiliation(s)
- Terrence S Furey
- Center for Biomolecular Science and Engineering, Department of Computer Science, University of California, Santa Cruz, Santa Cruz, California 95064, USA.
|
45
|
Giudicelli V, Chaume D, Lefranc MP. IMGT/V-QUEST, an integrated software program for immunoglobulin and T cell receptor V-J and V-D-J rearrangement analysis. Nucleic Acids Res 2004; 32:W435-40. [PMID: 15215425 PMCID: PMC441550 DOI: 10.1093/nar/gkh412] [Citation(s) in RCA: 224] [Impact Index Per Article: 11.2] [Received: 02/12/2004] [Revised: 04/01/2004] [Accepted: 04/01/2004] [Indexed: 11/14/2022]
Abstract
IMGT/V-QUEST, for 'V-QUEry and STandardization', is an integrated software program which analyses the immunoglobulin (IG) and T cell receptor (TR) rearranged nucleotide sequences. The extraordinary diversity of the IG and TR repertoires (10^12 antibodies and 10^12 TR per individual) results from several mechanisms at the DNA level: the combinatorial diversity of the variable (V), diversity (D) and joining (J) genes, the N-diversity and, for IG, the somatic mutations. IMGT/V-QUEST identifies the V, D and J genes and alleles by alignment with the germline IG and TR gene and allele sequences of the IMGT reference directory. IMGT/V-QUEST delimits the structurally important features, frameworks and complementarity-determining regions (the last of these forming the antigen binding site), on the basis of the IMGT unique numbering. The tool localizes the somatic mutations of the IG rearranged sequences. IMGT/V-QUEST also dynamically displays a graphical two-dimensional representation, or IMGT Collier de Perles, of the IG and TR variable regions. Moreover, IMGT/V-QUEST can interact with IMGT/JunctionAnalysis for the detailed description of the V-J and V-D-J junctions, and with IMGT/PhyloGene for the construction of phylogenetic trees. IMGT/V-QUEST is currently available for human and mouse, and partly for non-human primates, sheep, chondrichthyes and teleostei. IMGT/V-QUEST is freely available at http://imgt.cines.fr.
Affiliation(s)
- Véronique Giudicelli
- IMGT, the international ImMunoGeneTics information system, Laboratoire d'ImmunoGénétique Moléculaire, LIGM, Institut de Génétique Humaine IGH, UPR CNRS 1142, 141 rue de la Cardonille, F-34396 Montpellier Cedex 5, France
|