Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chung SY, Wong L. Kleisli: a new tool for data integration in biology. Trends Biotechnol 1999;17:351-5. [PMID: 10461180 DOI: 10.1016/s0167-7799(99)01342-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]

Cited by Other Article(s)

Baranzini SE, Börner K, Morris J, Nelson CA, Soman K, Schleimer E, Keiser M, Musen M, Pearce R, Reza T, Smith B, Herr BW, Oskotsky B, Rizk‐Jackson A, Rankin KP, Sanders SJ, Bove R, Rose PW, Israni S, Huang S. A biomedical open knowledge network harnesses the power of AI to understand deep human biology. AI Mag 2022;43:46-58. [PMID: 36093122 PMCID: PMC9456356 DOI: 10.1002/aaai.12037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]

Affiliation(s)

Sergio E. Baranzini: Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, California, USA; Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA
Katy Börner: Department of Intelligent Systems Engineering, Indiana University, Bloomington, Indiana, USA
John Morris: Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
Charlotte A. Nelson: Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, California, USA
Karthik Soman: Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, California, USA
Erica Schleimer: Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, California, USA
Michael Keiser: Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA; Institute for Neurodegenerative Diseases, University of California San Francisco, San Francisco, California, USA
Mark Musen: Department of Medicine (Biomedical Informatics) and of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
Roger Pearce: Center for Applied Scientific Computing (CASC), Lawrence Livermore National Laboratory, Livermore, California, USA
Tahsin Reza: Center for Applied Scientific Computing (CASC), Lawrence Livermore National Laboratory, Livermore, California, USA
Brett Smith: Institute for Systems Biology, Seattle, Washington, USA
Bruce W. Herr: Department of Intelligent Systems Engineering, Indiana University, Bloomington, Indiana, USA
Boris Oskotsky: Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA
Angela Rizk‐Jackson: Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA
Katherine P. Rankin: Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, California, USA; Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA
Stephan J. Sanders: Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA; Weill Institute for Neurosciences, Department of Psychiatry and Behavioral Sciences, University of California San Francisco, San Francisco, California, USA
Riley Bove: Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, California, USA; Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA
Peter W. Rose: San Diego Supercomputer Center, University of California San Diego, La Jolla, California, USA
Sharat Israni: Bakar Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California, USA
Sui Huang: Institute for Systems Biology, Seattle, Washington, USA

Chierici M, Bussola N, Marcolini A, Francescatto M, Zandonà A, Trastulla L, Agostinelli C, Jurman G, Furlanello C. Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling. Front Oncol 2020;10:1065. [PMID: 32714870 PMCID: PMC7340129 DOI: 10.3389/fonc.2020.01065] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/31/2020] [Accepted: 05/28/2020] [Indexed: 12/20/2022]

Abstract

Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in a large collection of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. A compact RF model, called rSNFi, trained on the intersection of top-ranked biomarkers from the two approaches juXT and rSNF is finally derived. All the classifiers are run in a 10x5-fold cross-validation schema to warrant reproducibility, following the guidelines for an unbiased Data Analysis Plan by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data is used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved similar Matthews Correlation Coefficient (MCC) values and 97% to 83% smaller feature sizes (FS), compared with juXT for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), improving KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than one-dimensional omics models, with smaller signatures too, where transcriptomics consistently plays the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
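
The abstract above reports classifier performance as the Matthews Correlation Coefficient (MCC). As a minimal sketch, and not the INF authors' code, the binary-class MCC can be computed from confusion-matrix counts as shown below. The function names are illustrative, and returning 0.0 when the denominator is zero is an assumed convention. Multi-class tasks need the generalized form of the coefficient, which is not shown here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Assumed convention: return 0.0 when any marginal total is zero,
    because the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(matthews_corrcoef(*confusion_counts(y_true, y_pred)))
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it more informative than accuracy when classes are imbalanced.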

Correlation-based network analysis combined with machine learning techniques highlight the role of the GABA shunt in Brachypodium sylvaticum freezing tolerance. Sci Rep 2020;10:4489. [PMID: 32161322 PMCID: PMC7066199 DOI: 10.1038/s41598-020-61081-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/02/2019] [Accepted: 02/14/2020] [Indexed: 12/18/2022]

Lapatas V, Stefanidakis M, Jimenez RC, Via A, Schneider MV. Data integration in biological research: an overview. Journal of Biological Research (Thessalonike, Greece) 2015;22:9. [PMID: 26336651 PMCID: PMC4557916 DOI: 10.1186/s40709-015-0032-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 04/20/2015] [Accepted: 08/10/2015] [Indexed: 11/16/2022]

Eisenhaber F. Unix interfaces, Kleisli, bucandin structure, etc. -- the heroic beginning of bioinformatics in Singapore. J Bioinform Comput Biol 2014;12:1471002. [PMID: 24969753 DOI: 10.1142/s0219720014710024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Jimenez-Lopez JC, Gachomo EW, Sharma S, Kotchoni SO. Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases. American Journal of Molecular Biology 2013. [DOI: 10.4236/ajmb.2013.32016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]

Schneider MV, Jimenez RC. Teaching the fundamentals of biological data integration using classroom games. PLoS Comput Biol 2012;8:e1002789. [PMID: 23300402 PMCID: PMC3531283 DOI: 10.1371/journal.pcbi.1002789] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]

Brief overview of bioinformatics activities in Singapore. PLoS Comput Biol 2009;5:e1000508. [PMID: 19779544 PMCID: PMC2737619 DOI: 10.1371/journal.pcbi.1000508] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]

Louie B, Detwiler L, Dalvi N, Shaker R, Tarczy-Hornoch P, Suciu D. Incorporating Uncertainty Metrics into a General-Purpose Data Integration System. International Conference on Scientific and Statistical Database Management (SSDBM) 2007. [DOI: 10.1109/ssdbm.2007.36] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]

Lee TJ, Pouliot Y, Wagner V, Gupta P, Stringer-Calvert DWJ, Tenenbaum JD, Karp PD. BioWarehouse: a bioinformatics database warehouse toolkit. BMC Bioinformatics 2006;7:170. [PMID: 16556315 PMCID: PMC1444936 DOI: 10.1186/1471-2105-7-170] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Received: 08/09/2005] [Accepted: 03/23/2006] [Indexed: 11/10/2022]

Abstract

BACKGROUND

This article addresses the problem of interoperation of heterogeneous bioinformatics databases.

RESULTS

We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research.

CONCLUSION

BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
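
The BioWarehouse abstract above describes a warehouse-style SQL query that asks what fraction of enzyme activities with EC numbers have no sequence. The sketch below illustrates that kind of query on a toy in-memory SQLite database. The table names, column names, and sequence identifiers are hypothetical and do not reflect the actual BioWarehouse schema, which targets MySQL and Oracle.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY);
        CREATE TABLE protein_sequence (seq_id TEXT PRIMARY KEY, ec_number TEXT);
        INSERT INTO enzyme_activity VALUES
            ('1.1.1.1'), ('2.7.1.1'), ('3.1.1.1'), ('4.1.1.1');
        -- Toy sequence rows: three EC numbers are covered, one twice.
        INSERT INTO protein_sequence VALUES
            ('seqA', '1.1.1.1'), ('seqB', '1.1.1.1'),
            ('seqC', '2.7.1.1'), ('seqD', '3.1.1.1');
        """
    )
    return conn


def fraction_without_sequence(conn):
    """Fraction of enzyme activities that have no associated sequence."""
    total, orphans = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN s.ec_number IS NULL THEN 1 ELSE 0 END)
        FROM enzyme_activity AS a
        LEFT JOIN (SELECT DISTINCT ec_number FROM protein_sequence) AS s
               ON s.ec_number = a.ec_number
        """
    ).fetchone()
    # SUM over zero rows yields NULL, so guard on the row count first.
    return orphans / total if total else 0.0


if __name__ == "__main__":
    print(fraction_without_sequence(build_toy_warehouse()))
```

The left join against the distinct EC numbers keeps one row per enzyme activity, so activities with several sequences are not counted more than once.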

Data integration and genomic medicine. J Biomed Inform 2006;40:5-16. [PMID: 16574494 DOI: 10.1016/j.jbi.2006.02.007] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Received: 12/08/2005] [Accepted: 02/05/2006] [Indexed: 10/25/2022]

Mork P, Shaker R, Tarczy-Hornoch P. The Multiple Roles of Ontologies in the BioMediator Data Integration System. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11530084_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]

Marenco L, Wang TY, Shepherd G, Miller PL, Nadkarni P. QIS: A framework for biomedical database federation. J Am Med Inform Assoc 2004;11:523-34. [PMID: 15298995 PMCID: PMC524633 DOI: 10.1197/jamia.m1506] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]

Tsur S. A plea for normalization of biosciences information. OMICS: A Journal of Integrative Biology 2003;7:109-12. [PMID: 12831569 DOI: 10.1089/153623103322006733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Genomic data modeling. Inform Syst 2003. [DOI: 10.1016/s0306-4379(02)00071-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]

Chung SY, Wooley JC. Challenges Faced in the Integration of Biological Information. Bioinformatics 2003. [DOI: 10.1016/b978-155860829-0/50004-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]

Michalickova K, Bader GD, Dumontier M, Lieu H, Betel D, Isserlin R, Hogue CWV. SeqHound: biological sequence and structure database as a platform for bioinformatics research. BMC Bioinformatics 2002;3:32. [PMID: 12401134 PMCID: PMC138791 DOI: 10.1186/1471-2105-3-32] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Received: 08/01/2002] [Accepted: 10/25/2002] [Indexed: 11/18/2022]

Petrovsky N, Brusic V. Computational immunology: The coming of age. Immunol Cell Biol 2002;80:248-54. [PMID: 12067412 DOI: 10.1046/j.1440-1711.2002.01093.x] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022]

Masys DR. Database designs for microarray data. The Pharmacogenomics Journal 2002;1:232-3. [PMID: 11908763 DOI: 10.1038/sj.tpj.6500048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/09/2022]

Altman RB, Klein TE. Challenges for biomedical informatics and pharmacogenomics. Annu Rev Pharmacol Toxicol 2002;42:113-33. [PMID: 11807167 DOI: 10.1146/annurev.pharmtox.42.082401.140850] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/09/2022]

Searls DB. Bioinformatics tools for whole genomes. Annu Rev Genomics Hum Genet 2002;1:251-79. [PMID: 11701631 DOI: 10.1146/annurev.genom.1.1.251] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]

Sobral BW, Mangalam H, Siepel A, Mendes P, Pecherer R, McLaren G. Bioinformatics for rice resources. Novartis Foundation Symposium 2002;236:59-81; discussion 81-4. [PMID: 11387987 DOI: 10.1002/9780470515778.ch6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 02/20/2023]

Lecompte O, Thompson JD, Plewniak F, Thierry J, Poch O. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene 2001;270:17-30. [PMID: 11403999 DOI: 10.1016/s0378-1119(01)00461-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]