Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Koike A, Kobayashi Y, Takagi T. Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource. Genome Res 2003;13:1231-43. [PMID: 12799355 PMCID: PMC403651 DOI: 10.1101/gr.835903] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

For:	Koike A, Kobayashi Y, Takagi T. Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource. Genome Res 2003;13:1231-43. [PMID: 12799355 PMCID: PMC403651 DOI: 10.1101/gr.835903] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Number

Cited by Other Article(s)

Jiang Y, Wang Y, Shen L, Adjeroh DA, Liu Z, Lin J. Identification of all-against-all protein-protein interactions based on deep hash learning. BMC Bioinformatics 2022;23:266. [PMID: 35804303 PMCID: PMC9264577 DOI: 10.1186/s12859-022-04811-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 06/17/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein-protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner. This is time consuming.

RESULTS

In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with M proteins can be transformed into a much more simpler problem: to find a number inside a sorted array of length M. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship.

CONCLUSIONS

The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from [Formula: see text] to [Formula: see text] for performing an all-against-all PPI prediction for a database with M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.

Collapse

Wang H, Qiu J, Liu H, Xu Y, Jia Y, Zhao Y. HKPocket: human kinase pocket database for drug design. BMC Bioinformatics 2019;20:617. [PMID: 31783725 PMCID: PMC6884818 DOI: 10.1186/s12859-019-3254-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2019] [Accepted: 11/15/2019] [Indexed: 01/06/2023] Open

Hyung D, Mallon AM, Kyung DS, Cho SY, Seong JK. TarGo: network based target gene selection system for human disease related mouse models. Lab Anim Res 2019;35:23. [PMID: 32257911 PMCID: PMC7081697 DOI: 10.1186/s42826-019-0023-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022] Open

Pozzi B, Mammi P, Bragado L, Giono LE, Srebrow A. When SUMO met splicing. RNA Biol 2018;15:689-695. [PMID: 29741121 PMCID: PMC6152442 DOI: 10.1080/15476286.2018.1457936] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2017] [Revised: 02/22/2018] [Accepted: 03/20/2018] [Indexed: 12/12/2022] Open

Stroehlein AJ, Young ND, Gasser RB. Advances in kinome research of parasitic worms - implications for fundamental research and applied biotechnological outcomes. Biotechnol Adv 2018;36:915-934. [PMID: 29477756 DOI: 10.1016/j.biotechadv.2018.02.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Revised: 02/15/2018] [Accepted: 02/21/2018] [Indexed: 12/17/2022]

Žitnik S, Žitnik M, Zupan B, Bajec M. Sieve-based relation extraction of gene regulatory networks from biological literature. BMC Bioinformatics 2015;16 Suppl 16:S1. [PMID: 26551454 PMCID: PMC4642041 DOI: 10.1186/1471-2105-16-s16-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open

Abstract

Background

Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks.

Results

We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions.

Conclusions

Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to broad range of relation extraction tasks and data domains.

Collapse

van Linden OPJ, Kooistra AJ, Leurs R, de Esch IJP, de Graaf C. KLIFS: a knowledge-based structural database to navigate kinase-ligand interaction space. J Med Chem 2013;57:249-77. [PMID: 23941661 DOI: 10.1021/jm400378w] [Citation(s) in RCA: 212] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Friedman C, Rindflesch TC, Corn M. Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine. J Biomed Inform 2013;46:765-73. [PMID: 23810857 DOI: 10.1016/j.jbi.2013.06.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2013] [Revised: 06/07/2013] [Accepted: 06/07/2013] [Indexed: 01/29/2023]

Tikk D, Solt I, Thomas P, Leser U. A detailed error analysis of 13 kernel methods for protein-protein interaction extraction. BMC Bioinformatics 2013;14:12. [PMID: 23323857 PMCID: PMC3680070 DOI: 10.1186/1471-2105-14-12] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2012] [Accepted: 12/19/2012] [Indexed: 11/21/2022] Open

RASOnD-a comprehensive resource and search tool for RAS superfamily oncogenes from various species. BMC Genomics 2011;12:341. [PMID: 21729256 PMCID: PMC3141677 DOI: 10.1186/1471-2164-12-341] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2011] [Accepted: 07/05/2011] [Indexed: 12/30/2022] Open

Abstract

Background

The Ras superfamily plays an important role in the control of cell signalling and division. Mutations in the Ras genes convert them into active oncogenes. The Ras oncogenes form a major thrust of global cancer research as they are involved in the development and progression of tumors. This has resulted in the exponential growth of data on Ras superfamily across different public databases and in literature. However, no dedicated public resource is currently available for data mining and analysis on this family. The present database was developed to facilitate straightforward accession, retrieval and analysis of information available on Ras oncogenes from one particular site.

Description

We have developed the RAS Oncogene Database (RASOnD) as a comprehensive knowledgebase that provides integrated and curated information on a single platform for oncogenes of Ras superfamily. RASOnD encompasses exhaustive genomics and proteomics data existing across diverse publicly accessible databases. This resource presently includes overall 199,046 entries from 101 different species. It provides a search tool to generate information about their nucleotide and amino acid sequences, single nucleotide polymorphisms, chromosome positions, orthologies, motifs, structures, related pathways and associated diseases. We have implemented a number of user-friendly search interfaces and sequence analysis tools. At present the user can (i) browse the data (ii) search any field through a simple or advance search interface and (iii) perform a BLAST search and subsequently CLUSTALW multiple sequence alignment by selecting sequences of Ras oncogenes. The Generic gene browser, GBrowse, JMOL for structural visualization and TREEVIEW for phylograms have been integrated for clear perception of retrieved data. External links to related databases have been included in RASOnD.

Conclusions

This database is a resource and search tool dedicated to Ras oncogenes. It has utility to cancer biologists and cell molecular biologists as it is a ready source for research, identification and elucidation of the role of these oncogenes. The data generated can be used for understanding the relationship between the Ras oncogenes and their association with cancer. The database updated monthly is freely accessible online at http://202.141.47.181/rasond/ and http://www.aiims.edu/RAS.html.

Collapse

Dlxin-1, a member of MAGE family, inhibits cell proliferation, invasion and tumorigenicity of glioma stem cells. Cancer Gene Ther 2010;18:206-18. [DOI: 10.1038/cgt.2010.71] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Yang L, Zhang X, Chen J, Wang Q, Wang L, Jiang Y, Pan Y. ReCGiP, a database of reproduction candidate genes in pigs based on bibliomics. Reprod Biol Endocrinol 2010;8:96. [PMID: 20707928 PMCID: PMC3224910 DOI: 10.1186/1477-7827-8-96] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/21/2010] [Accepted: 08/14/2010] [Indexed: 11/23/2022] Open

Krallinger M, Leitner F, Valencia A. Analysis of biological processes and diseases using text mining approaches. Methods Mol Biol 2010;593:341-382. [PMID: 19957157 DOI: 10.1007/978-1-60327-194-3_16] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]

Abstract

A number of biomedical text mining systems have been developed to extract biologically relevant information directly from the literature, complementing bioinformatics methods in the analysis of experimentally generated data. We provide a short overview of the general characteristics of natural language data, existing biomedical literature databases, and lexical resources relevant in the context of biomedical text mining. A selected number of practically useful systems are introduced together with the type of user queries supported and the results they generate. The extraction of biological relationships, such as protein-protein interactions as well as metabolic and signaling pathways using information extraction systems, will be discussed through example cases of cancer-relevant proteins. Basic strategies for detecting associations of genes to diseases together with literature mining of mutations, SNPs, and epigenetic information (methylation) are described. We provide an overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects. Moreover, we discuss recent efforts for finding biomarkers through text mining and for gene list analysis and prioritization. Some relevant issues for implementing a customized biomedical text mining system will be pointed out. To demonstrate the usefulness of literature mining for the molecular oncology domain, we implemented two cancer-related applications. The first tool consists of a literature mining system for retrieving human mutations together with supporting articles. Specific gene mutations are linked to a set of predefined cancer types. The second application consists of a text categorization system supporting breast cancer-specific literature search and document-based breast cancer gene ranking. Future trends in text mining emphasize the importance of community efforts such as the BioCreative challenge for the development and integration of multiple systems into a common platform provided by the BioCreative Metaserver.

Collapse

Episodic Src activation in uveal melanoma revealed by kinase activity profiling. Br J Cancer 2009;101:312-9. [PMID: 19568237 PMCID: PMC2720193 DOI: 10.1038/sj.bjc.6605172] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open

Yang CY, Chang CH, Yu YL, Lin TCE, Lee SA, Yen CC, Yang JM, Lai JM, Hong YR, Tseng TL, Chao KM, Huang CYF. PhosphoPOINT: a comprehensive human kinase interactome and phospho-protein database. ACTA ACUST UNITED AC 2008;24:i14-20. [PMID: 18689816 DOI: 10.1093/bioinformatics/btn297] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Rinaldi F, Kappeler T, Kaljurand K, Schneider G, Klenner M, Clematide S, Hess M, von Allmen JM, Parisot P, Romacker M, Vachon T. OntoGene in BioCreative II. Genome Biol 2008;9 Suppl 2:S13. [PMID: 18834491 PMCID: PMC2559984 DOI: 10.1186/gb-2008-9-s2-s13] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Giles CB, Wren JD. Large-scale directional relationship extraction and resolution. BMC Bioinformatics 2008;9 Suppl 9:S11. [PMID: 18793456 PMCID: PMC2537562 DOI: 10.1186/1471-2105-9-s9-s11] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

Kim JD, Ohta T, Tsujii J. Corpus annotation for mining biomedical events from literature. BMC Bioinformatics 2008;9:10. [PMID: 18182099 PMCID: PMC2267702 DOI: 10.1186/1471-2105-9-10] [Citation(s) in RCA: 120] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2007] [Accepted: 01/08/2008] [Indexed: 11/24/2022] Open

Tsuruoka Y, McNaught J, Tsujii J, Ananiadou S. Learning string similarity measures for gene/protein name dictionary look-up using logistic regression. Bioinformatics 2007;23:2768-74. [PMID: 17698493 DOI: 10.1093/bioinformatics/btm393] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Xuan W, Wang P, Watson SJ, Meng F. Medline search engine for finding genetic markers with biological significance. Bioinformatics 2007;23:2477-84. [PMID: 17823133 DOI: 10.1093/bioinformatics/btm375] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Mueller M, Martens L, Apweiler R. Annotating the human proteome: Beyond establishing a parts list. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2007;1774:175-91. [PMID: 17223395 DOI: 10.1016/j.bbapap.2006.11.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/03/2006] [Revised: 11/16/2006] [Accepted: 11/21/2006] [Indexed: 12/31/2022]

Rinaldi F, Schneider G, Kaljurand K, Hess M, Romacker M. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics 2006;7 Suppl 3:S3. [PMID: 17134476 PMCID: PMC1764447 DOI: 10.1186/1471-2105-7-s3-s3] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open

Kohn KW, Aladjem MI, Weinstein JN, Pommier Y. Molecular interaction maps of bioregulatory networks: a general rubric for systems biology. Mol Biol Cell 2005;17:1-13. [PMID: 16267266 PMCID: PMC1345641 DOI: 10.1091/mbc.e05-09-0824] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open

Liu H, Hu ZZ, Zhang J, Wu C. BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics 2005;22:103-5. [PMID: 16267085 DOI: 10.1093/bioinformatics/bti749] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Krallinger M, Valencia A. Text-mining and information-retrieval services for molecular biology. Genome Biol 2005;6:224. [PMID: 15998455 PMCID: PMC1175978 DOI: 10.1186/gb-2005-6-7-224] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Mullen T, Mizuta Y, Collier N. A baseline feature set for learning rhetorical zones using full articles in the biomedical domain. ACTA ACUST UNITED AC 2005. [DOI: 10.1145/1089815.1089823] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

Hakenberg J, Schmeier S, Kowald A, Klipp E, Leser U. Finding kinetic parameters using text mining. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2005;8:131-52. [PMID: 15268772 DOI: 10.1089/1536231041388366] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Eis K, Ince SJ, Jahn C, Jautelat R, Katchourovsky V, Kettschau G, Woloszczak R. Kinase Data Mining: Dealing with the Information (Over-)Flow. Chembiochem 2005;6:567-70. [PMID: 15712317 DOI: 10.1002/cbic.200400154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Santos C, Eggle D, States DJ. Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction. Bioinformatics 2004;21:1653-8. [PMID: 15564295 DOI: 10.1093/bioinformatics/bti165] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Abstract

MOTIVATION

Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing (NLP) system that is able to identify references to biological interaction networks in free text and automatically assembles a protein association and interaction map.

RESULTS

A 'gold standard' set of names and assertions was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html) including 53 interactions involved in Wnt signaling. This system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling including 3369 Pubmed and 1230 full text papers. Names for key Wnt-pathway associated proteins and biological entities are identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules which were missing or different from those in the canonical diagram, and these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases.

AVAILABILITY

The pipeline software components are freely available on request to the authors.

CONTACT

dstates@umich.edu

SUPPLEMENTARY INFORMATION

http://stateslab.bioinformatics.med.umich.edu/software.html.

Collapse

Koike A, Niwa Y, Takagi T. Automatic extraction of gene/protein biological functions from biomedical text. Bioinformatics 2004;21:1227-36. [PMID: 15509601 DOI: 10.1093/bioinformatics/bti084] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Hanash S. Integrated global profiling of cancer. Nat Rev Cancer 2004;4:638-44. [PMID: 15286743 DOI: 10.1038/nrc1414] [Citation(s) in RCA: 99] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]