1
|
Baranzini SE, Börner K, Morris J, Nelson CA, Soman K, Schleimer E, Keiser M, Musen M, Pearce R, Reza T, Smith B, Herr BW, Oskotsky B, Rizk‐Jackson A, Rankin KP, Sanders SJ, Bove R, Rose PW, Israni S, Huang S. A biomedical open knowledge network harnesses the power of AI to understand deep human biology. AI MAG 2022; 43:46-58. [PMID: 36093122 PMCID: PMC9456356 DOI: 10.1002/aaai.12037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging in part due to the daunting complexity of molecular and cellular pathways that govern human physiology and pathology. In this article we describe concrete uses of SPOKE, an open knowledge network that connects curated information from 37 specialized and human-curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID-19 research and chronic disease diagnosis and management.
Affiliation(s)
- Sergio E. Baranzini
- Weill Institute for Neurosciences Department of Neurology University of California San Francisco San Francisco California USA
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Katy Börner
- Department of Intelligent Systems Engineering Indiana University Bloomington Indiana USA
- John Morris
- Department of Pharmaceutical Chemistry University of California San Francisco San Francisco California USA
- Charlotte A. Nelson
- Weill Institute for Neurosciences Department of Neurology University of California San Francisco San Francisco California USA
- Karthik Soman
- Weill Institute for Neurosciences Department of Neurology University of California San Francisco San Francisco California USA
- Erica Schleimer
- Weill Institute for Neurosciences Department of Neurology University of California San Francisco San Francisco California USA
- Michael Keiser
- Department of Pharmaceutical Chemistry University of California San Francisco San Francisco California USA
- Institute for Neurodegenerative Diseases University of California San Francisco San Francisco California USA
- Mark Musen
- Department of Medicine (Biomedical Informatics) and of Biomedical Data Science Stanford University School of Medicine Stanford California USA
- Roger Pearce
- Center for Applied Scientific Computing (CASC) Lawrence Livermore National Laboratory Livermore California USA
- Tahsin Reza
- Center for Applied Scientific Computing (CASC) Lawrence Livermore National Laboratory Livermore California USA
- Brett Smith
- Institute for Systems Biology Seattle Washington USA
- Bruce W. Herr
- Department of Intelligent Systems Engineering Indiana University Bloomington Indiana USA
- Boris Oskotsky
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Angela Rizk‐Jackson
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Katherine P. Rankin
- Weill Institute for Neurosciences Department of Neurology University of California San Francisco San Francisco California USA
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Stephan J. Sanders
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Weill Institute for Neurosciences Department of Psychiatry and Behavioral Sciences University of California San Francisco San Francisco California USA
- Riley Bove
- Weill Institute for Neurosciences Department of Neurology University of California San Francisco San Francisco California USA
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Peter W. Rose
- San Diego Supercomputer Center University of California San Diego La Jolla California USA
- Sharat Israni
- Bakar Institute for Computational Health Sciences University of California San Francisco San Francisco California USA
- Sui Huang
- Institute for Systems Biology Seattle Washington USA
|
2
|
Chierici M, Bussola N, Marcolini A, Francescatto M, Zandonà A, Trastulla L, Agostinelli C, Jurman G, Furlanello C. Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling. Front Oncol 2020; 10:1065. [PMID: 32714870 PMCID: PMC7340129 DOI: 10.3389/fonc.2020.01065] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/31/2020] [Accepted: 05/28/2020] [Indexed: 12/20/2022] Open
Abstract
Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in large collections of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. A compact RF model, called rSNFi, trained on the intersection of top-ranked biomarkers from the two approaches juXT and rSNF is finally derived. All the classifiers are run in a 10x5-fold cross-validation schema to warrant reproducibility, following the guidelines for an unbiased Data Analysis Plan by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved similar Matthews Correlation Coefficient (MCC) values and 97% to 83% smaller feature sizes (FS), compared with juXT for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), improving KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than one-dimensional omics models, with smaller signatures too, where transcriptomics consistently plays the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
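The feature-size reductions quoted in this abstract can be checked from the reported counts, and the Matthews Correlation Coefficient has a simple closed form in the binary case (for example, a two-class task such as BRCA-ER). The following is a minimal sketch for illustration only, not code from the INF pipeline; the function names `feature_reduction_pct` and `mcc_binary` are assumptions introduced here.

```python
import math

def feature_reduction_pct(compact: int, baseline: int) -> float:
    """Percentage by which `compact` is smaller than `baseline`."""
    return 100.0 * (1.0 - compact / baseline)

def mcc_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal sum is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Feature sizes (INF vs. juXT) exactly as reported in the abstract.
reported = {
    "BRCA-ER": (56, 1801),
    "BRCA-subtypes": (302, 1801),
    "KIRC-OS": (111, 2319),
}
reductions = {
    task: round(feature_reduction_pct(inf_fs, juxt_fs), 1)
    for task, (inf_fs, juxt_fs) in reported.items()
}
```

With these counts, the BRCA-ER and BRCA-subtypes reductions come out at roughly 97% and 83%, matching the range stated in the abstract. A multi-class task such as BRCA-subtypes would need the multi-class generalization of MCC rather than `mcc_binary`.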
Affiliation(s)
- Nicole Bussola
- Fondazione Bruno Kessler, Trento, Italy
- University of Trento, Trento, Italy
- Margherita Francescatto
- Fondazione Bruno Kessler, Trento, Italy
- Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
|
3
|
Correlation-based network analysis combined with machine learning techniques highlight the role of the GABA shunt in Brachypodium sylvaticum freezing tolerance. Sci Rep 2020; 10:4489. [PMID: 32161322 PMCID: PMC7066199 DOI: 10.1038/s41598-020-61081-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/02/2019] [Accepted: 02/14/2020] [Indexed: 12/18/2022] Open
Abstract
Perennial grasses will account for approximately 16 billion gallons of renewable fuels by the year 2022, contributing significantly to carbon and nitrogen sequestration. However, the productivity of perennial grasses can be limited by severe freezing conditions in some geographical areas. Although these risks could decrease as the climate warms, the possibility of unpredictable early cold events cannot be discarded. We conducted a study on the model perennial grass Brachypodium sylvaticum to investigate the molecular mechanisms that contribute to cold and freezing adaptation. The study was performed on two different B. sylvaticum accessions, Ain1 and Osl1, typical of warm and cold climates, respectively. Both accessions were grown under controlled conditions with subsequent cold acclimation followed by freezing stress. For each treatment, a set of morphological parameters and transcription, metabolite, and lipid profiles were measured. State-of-the-art algorithms were employed to analyze cross-component relationships. Phenotypic analysis revealed higher adaptation of Osl1 to freezing stress. Our analysis highlighted the differential regulation of the TCA cycle and the GABA shunt between Ain1 and Osl1. Osl1 adapted to freezing stress by repressing the GABA shunt activity, avoiding the detrimental reduction in fatty acid biosynthesis and the concomitant detrimental effects on membrane integrity.
|
4
|
Lapatas V, Stefanidakis M, Jimenez RC, Via A, Schneider MV. Data integration in biological research: an overview. JOURNAL OF BIOLOGICAL RESEARCH (THESSALONIKE, GREECE) 2015; 22:9. [PMID: 26336651 PMCID: PMC4557916 DOI: 10.1186/s40709-015-0032-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 04/20/2015] [Accepted: 08/10/2015] [Indexed: 11/16/2022]
Abstract
Data sharing, integration and annotation are essential to ensure the reproducibility of the analysis and interpretation of experimental findings. Often these activities are perceived as a role that bioinformaticians and computer scientists have to take on with little or no input from the experimental biologist. On the contrary, biological researchers, being the producers and often the end users of such data, have a big role in enabling biological data integration. The quality and usefulness of data integration depend on the existence and adoption of standards, shared formats, and mechanisms that are suitable for biological researchers to submit and annotate the data, so that the data are easily searchable, conveniently linked and consequently usable for further biological analysis and discovery. Here, we provide background on what data integration is from a computational science point of view, how it has been applied to biological research, which key aspects contributed to its success, and future directions.
Affiliation(s)
- Vasileios Lapatas
- Department of Informatics, Ionian University, 7 Tsirigoti Square, Corfu, 49100 Greece
- Michalis Stefanidakis
- Department of Informatics, Ionian University, 7 Tsirigoti Square, Corfu, 49100 Greece
- Allegra Via
- Biocomputing Group, Sapienza University, Piazzale Aldo Moro 5, Rome, 00185 Italy
|
5
|
Eisenhaber F. Unix interfaces, Kleisli, bucandin structure, etc. -- the heroic beginning of bioinformatics in Singapore. J Bioinform Comput Biol 2014; 12:1471002. [PMID: 24969753 DOI: 10.1142/s0219720014710024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Remarkably, Singapore, one of today's hotspots for bioinformatics and computational biology research, emerged de novo from the pioneering efforts of engaged local individuals in the early 1990s; supported by increasing public funds from 1996 on, these efforts grew into the present vibrant research community. This article recalls the pioneers, their first successes and early institutional developments.
Affiliation(s)
- Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore 117597, Singapore
- School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore 637553, Singapore
|
6
|
Jimenez-Lopez JC, Gachomo EW, Sharma S, Kotchoni SO. Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases. 2013. [DOI: 10.4236/ajmb.2013.32016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
|
7
|
Abstract
This article aims to introduce the nature of data integration to life scientists. Generally, the subject of data integration is not discussed outside the field of computational science and is not covered in any detail, or is even neglected, when teaching or training trainees. End users (hereby defined as wet-lab trainees, clinicians, lab researchers) will mostly interact with bioinformatics resources and tools through web interfaces that hide the data integration processes from the user. However, the lack of formal training in, or acquaintance with, even simple database concepts and terminology often becomes a real obstacle to fully understanding the resources and tools that end users wish to access. Understanding how data integration works is fundamental to empowering trainees to see the limitations as well as the possibilities when exploring, retrieving, and analysing biological data from databases. Here we introduce a game-based learning activity for teaching the topic of data integration that trainers and educators can adopt and adapt for their classroom. In particular, we provide an example using DAS (Distributed Annotation Systems) as a method for data integration.
Affiliation(s)
- Maria Victoria Schneider
- Outreach and Training Team, European Molecular Biology Laboratory Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
|
8
|
|
9
|
Louie B, Detwiler L, Dalvi N, Shaker R, Tarczy-Hornoch P, Suciu D. Incorporating Uncertainty Metrics into a General-Purpose Data Integration System. 2007. [DOI: 10.1109/ssdbm.2007.36] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
|
10
|
Lee TJ, Pouliot Y, Wagner V, Gupta P, Stringer-Calvert DWJ, Tenenbaum JD, Karp PD. BioWarehouse: a bioinformatics database warehouse toolkit. BMC Bioinformatics 2006; 7:170. [PMID: 16556315 PMCID: PMC1444936 DOI: 10.1186/1471-2105-7-170] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Received: 08/09/2005] [Accepted: 03/23/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This article addresses the problem of interoperation of heterogeneous bioinformatics databases. RESULTS We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. 
CONCLUSION BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
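The warehouse idea described in this abstract, loading several sources into one relational schema so that a single SQL query can span them, can be sketched as follows. This is an illustrative assumption only: it uses Python's built-in `sqlite3` in place of the MySQL and Oracle managers named in the abstract, and the table names, column names, and rows are invented toy data, not the BioWarehouse schema or its results.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption: SQLite,
# not the MySQL/Oracle back ends that BioWarehouse itself targets).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table loaded from an enzyme nomenclature source.
    CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
    -- Hypothetical table loaded from a protein sequence source.
    CREATE TABLE protein_sequence (accession TEXT PRIMARY KEY, ec_number TEXT);

    -- Toy rows, invented for illustration.
    INSERT INTO enzyme_activity VALUES
        ('1.1.1.1'), ('2.7.1.1'), ('3.4.21.4'), ('4.2.1.11');
    INSERT INTO protein_sequence VALUES
        ('P00001', '1.1.1.1'), ('P00002', '1.1.1.1'), ('P00003', '3.4.21.4');
    """
)

# One query across both "sources": EC numbers with no associated sequence.
missing = conn.execute(
    """
    SELECT COUNT(*)
    FROM enzyme_activity AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM protein_sequence AS p WHERE p.ec_number = e.ec_number
    )
    """
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM enzyme_activity").fetchone()[0]
fraction_without_sequence = missing / total
conn.close()
```

On the toy rows, two of the four EC numbers have no sequence. The 36% figure in the abstract comes from the authors' query over the real integrated databases, which this sketch does not reproduce.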
Affiliation(s)
- Thomas J Lee
- Bioinformatics Research Group, SRI International, Menlo Park, USA
- Yannick Pouliot
- Bioinformatics Research Group, SRI International, Menlo Park, USA
- Valerie Wagner
- Bioinformatics Research Group, SRI International, Menlo Park, USA
- Priyanka Gupta
- Bioinformatics Research Group, SRI International, Menlo Park, USA
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, USA
|
11
|
Abstract
Genomic medicine aims to revolutionize health care by applying our growing understanding of the molecular basis of disease. Research in this arena is data intensive: the data sets are large and highly heterogeneous. To create knowledge from data, researchers must integrate these large and diverse data sets. This presents daunting informatics challenges, such as representing data in a form suitable for computational inference (knowledge representation) and linking heterogeneous data sets (data integration). Fortunately, many of these challenges can be classified as data integration problems, and technologies exist in the area of data integration that may be applied to them. In this paper, we discuss the opportunities of genomic medicine and identify the informatics challenges in this domain. We also review concepts and methodologies in the field of data integration. These concepts and methodologies are then aligned with informatics challenges in genomic medicine and presented as potential solutions. We conclude with challenges still not addressed in genomic medicine and gaps that remain in data integration research to facilitate genomic medicine.
|
12
|
Mork P, Shaker R, Tarczy-Hornoch P. The Multiple Roles of Ontologies in the BioMediator Data Integration System. LECTURE NOTES IN COMPUTER SCIENCE 2005. [DOI: 10.1007/11530084_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
|
13
|
Marenco L, Wang TY, Shepherd G, Miller PL, Nadkarni P. QIS: A framework for biomedical database federation. J Am Med Inform Assoc 2004; 11:523-34. [PMID: 15298995 PMCID: PMC524633 DOI: 10.1197/jamia.m1506] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
Query Integrator System (QIS) is a database mediator framework intended to address robust data integration from continuously changing heterogeneous data sources in the biosciences. Currently in the advanced prototype stage, it is being used on a production basis to integrate data from neuroscience databases developed for the SenseLab project at Yale University with external neuroscience and genomics databases. The QIS framework uses standard technologies and is intended to be deployable by administrators with a moderate level of technological expertise: it comes with various tools, such as interfaces for the design of distributed queries. The QIS architecture is based on a set of distributed network-based servers (data source servers, integration servers, and ontology servers) that exchange metadata as well as mappings of both metadata and data elements to elements in an ontology. Metadata version difference determination, coupled with decomposition of stored queries, is used as the basis for partial query recovery when the schema of a data source changes.
Affiliation(s)
- Luis Marenco
- Center for Medical Informatics, Yale University, School of Medicine, New Haven, CT 06520-8009, USA.
|
14
|
Tsur S. A plea for normalization of biosciences information. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2003; 7:109-12. [PMID: 12831569 DOI: 10.1089/153623103322006733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Shalom Tsur
- Real-Time Enterprise Group, Mountain View, California 94040, USA.
|
15
|
|
16
|
|
17
|
Michalickova K, Bader GD, Dumontier M, Lieu H, Betel D, Isserlin R, Hogue CWV. SeqHound: biological sequence and structure database as a platform for bioinformatics research. BMC Bioinformatics 2002; 3:32. [PMID: 12401134 PMCID: PMC138791 DOI: 10.1186/1471-2105-3-32] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Received: 08/01/2002] [Accepted: 10/25/2002] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.
Affiliation(s)
- Katerina Michalickova
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
- Gary D Bader
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
- Michel Dumontier
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
- Hao Lieu
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
- Doron Betel
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
- Ruth Isserlin
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
- Christopher WV Hogue
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, Canada M5G 1X5
|
18
|
Abstract
The explosive growth in biotechnology combined with major advances in information technology has the potential to radically transform immunology in the postgenomics era. Not only do we now have ready access to vast quantities of existing data, but new data with relevance to immunology are being accumulated at an exponential rate. Resources for computational immunology include biological databases and methods for data extraction, comparison, analysis and interpretation. Publicly accessible biological databases of relevance to immunologists number in the hundreds and are growing daily. The ability to efficiently extract and analyse information from these databases is vital for efficient immunology research. Most importantly, a new generation of computational immunology tools enables modelling of peptide transport by the transporter associated with antigen processing (TAP), modelling of antibody binding sites, identification of allergenic motifs and modelling of T-cell receptor serial triggering.
Affiliation(s)
- Nikolai Petrovsky
- National Bioinformatics Centre, University of Canberra and National Health Sciences Centre, Canberra Clinical School, Woden, Australian Capital Territory, Australia.
|
19
|
Affiliation(s)
- D R Masys
- University of California, San Diego School of Medicine, USA.
|
20
|
Abstract
Pharmacogenomics requires the integration and analysis of genomic, molecular, cellular, and clinical data, and it thus offers a remarkable set of challenges to biomedical informatics. These include infrastructural challenges such as the creation of data models and databases for storing these data, the integration of these data with external databases, the extraction of information from natural language text, and the protection of databases with sensitive information. There are also scientific challenges in creating tools to support gene expression analysis, three-dimensional structural analysis, and comparative genomic analysis. In this review, we summarize the current uses of informatics within pharmacogenomics and show how the technical challenges that remain for biomedical informatics are typical of those that will be confronted in the postgenomic era.
Affiliation(s)
- Russ B Altman
- Stanford Medical Informatics, Stanford, California 94305-5479, USA.
|
21
|
Abstract
The advent of whole-genome data resources--not only sequence but also other genome-scale data collections such as gene expression, protein interaction, and genetic variation--is having two marked, complementary effects on the relatively new discipline of bioinformatics. First, the veritable flood of data is creating a need and demand for new tools for dealing adequately with the deluge, and, second, the unprecedented extent, diversity, and impending completeness of the data sets are creating opportunities for new approaches to discovery based on computational methods.
Affiliation(s)
- D B Searls
- Bioinformatics Department, SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania 19406, USA.
|
22
|
Sobral BW, Mangalam H, Siepel A, Mendes P, Pecherer R, McLaren G. Bioinformatics for rice resources. NOVARTIS FOUNDATION SYMPOSIUM 2002; 236:59-81; discussion 81-4. [PMID: 11387987 DOI: 10.1002/9780470515778.ch6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 02/20/2023]
Abstract
The distinguishing feature of the 'new biology' is that it is information intensive. Not only does it demand access to and assimilation of vast data sets accumulated by engineered laboratory processes, but it also demands a previously unimaginable level of data integration across data types and sources. There are various information resources available for rice. In addition, there are various information resources that are not focused on rice but that contain rice data. The challenge for rice researchers and breeders is to access this wealth of data meaningfully. This challenge will grow significantly as international efforts aimed at sequencing the entire rice genome come into full swing. Only through concerted efforts in bioinformatics will the power of these public data be brought to bear on the needs of rice researchers and breeders worldwide. These efforts will need to focus on two large but distinct areas: (1) development of an effective bioinformatics infrastructure (hardware systems, software systems, and software engineers and support staff) and (2) computational biology research in visualization and analysis of very large, complex data sets, such as those that will be developed using high-throughput expression technologies, large-scale insertional mutagenesis, and biochemical profiling of various types. In the midst of the large flow of high-throughput data that the international rice genome sequencing efforts will produce, it is also imperative that integration of those data with unique germplasm data held in trust by the CGIAR be a part of the informatics infrastructure. This paper will focus on the state of rice information resources, the needs of the rice community, and some proposed bioinformatics activities to support these needs.
Affiliation(s)
- B W Sobral
- Virginia Bioinformatics Institute, Virginia Tech (0477), 1750 Kraft Drive, Suite 1400, Blacksburg, VA 24061, USA
|
23
|
Lecompte O, Thompson JD, Plewniak F, Thierry J, Poch O. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene 2001; 270:17-30. [PMID: 11403999 DOI: 10.1016/s0378-1119(01)00461-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Multiple alignment, since its introduction in the early seventies, has become a cornerstone of modern molecular biology. It has traditionally been used to deduce structure / function by homology, to detect conserved motifs and in phylogenetic studies. There has recently been some renewed interest in the development of multiple alignment techniques, with current opinion moving away from a single all-encompassing algorithm to iterative and / or co-operative strategies. The exploitation of multiple alignments in genome annotation projects represents a qualitative leap in the functional analysis process, opening the way to the study of the co-evolution of validated sets of proteins and to reliable phylogenomic analysis. However, the alignment of the highly complex proteins detected by today's advanced database search methods is a daunting task. In addition, with the explosion of the sequence databases and with the establishment of numerous specialized biological databases, multiple alignment programs must evolve if they are to successfully rise to the new challenges of the post-genomic era. The way forward is clearly an integrated system bringing together sequence data, knowledge-based systems and prediction methods with their inherent unreliability. The incorporation of such heterogeneous, often non-consistent, data will require major changes to the fundamental alignment algorithms used to date. Such an integrated multiple alignment system will provide an ideal workbench for the validation, propagation and presentation of this information in a format that is concise, clear and intuitive.
Affiliation(s)
- O Lecompte
- Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS/INSERM/ULP), BP 163, 67404 Cedex, Illkirch, France
|