Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018;34:80-87. [PMID: 28968638 DOI: 10.1093/bioinformatics/btx541] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Accepted: 08/31/2017] [Indexed: 11/12/2022] Open

For:	Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018;34:80-87. [PMID: 28968638 DOI: 10.1093/bioinformatics/btx541] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Accepted: 08/31/2017] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

Lee K, Famiglietti ML, McMahon A, Wei CH, MacArthur JAL, Poux S, Breuza L, Bridge A, Cunningham F, Xenarios I, Lu Z. Scaling up data curation using deep learning: An application to literature triage in genomic variation resources. PLoS Comput Biol 2018;14:e1006390. [PMID: 30102703 PMCID: PMC6107285 DOI: 10.1371/journal.pcbi.1006390] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Revised: 08/23/2018] [Accepted: 07/24/2018] [Indexed: 11/18/2022] Open

Abstract

Manually curating biomedical knowledge from publications is necessary to build a knowledge based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning assisted triage method. We collect previously curated publications from two databases UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and used them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.

As the volume of literature on genomic variants continues to grow at an increasing rate, it is becoming more difficult for a curator of a variant knowledge base to keep up with and curate all the published papers. Here, we suggest a deep learning-based literature triage method for genomic variation resources. Our method achieves state-of-the-art performance on the triage task. Moreover, our model does not require any laborious preprocessing or feature engineering steps, which are required for traditional machine learning triage methods. We applied our method to the literature triage process of UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog for genomic variation by collaborating with the database curators. Both the manual curation teams confirmed that our method achieved higher precision than their previous query-based triage methods without compromising recall. Both results show that our method is more efficient and can replace the traditional query-based triage methods of manually curated databases. Our method can give human curators more time to focus on more challenging tasks such as actual curation as well as the discovery of novel papers/experimental techniques to consider for inclusion.

Collapse

Piñeiro-Yáñez E, Reboiro-Jato M, Gómez-López G, Perales-Patón J, Troulé K, Rodríguez JM, Tejero H, Shimamura T, López-Casas PP, Carretero J, Valencia A, Hidalgo M, Glez-Peña D, Al-Shahrour F. PanDrugs: a novel method to prioritize anticancer drug treatments according to individual genomic data. Genome Med 2018;10:41. [PMID: 29848362 PMCID: PMC5977747 DOI: 10.1186/s13073-018-0546-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2017] [Accepted: 05/04/2018] [Indexed: 02/07/2023] Open

Abstract

BACKGROUND

Large-sequencing cancer genome projects have shown that tumors have thousands of molecular alterations and their frequency is highly heterogeneous. In such scenarios, physicians and oncologists routinely face lists of cancer genomic alterations where only a minority of them are relevant biomarkers to drive clinical decision-making. For this reason, the medical community agrees on the urgent need of methodologies to establish the relevance of tumor alterations, assisting in genomic profile interpretation, and, more importantly, to prioritize those that could be clinically actionable for cancer therapy.

RESULTS

We present PanDrugs, a new computational methodology to guide the selection of personalized treatments in cancer patients using the variant lists provided by genome-wide sequencing analyses. PanDrugs offers the largest database of drug-target associations available from well-known targeted therapies to preclinical drugs. Scoring data-driven gene cancer relevance and drug feasibility PanDrugs interprets genomic alterations and provides a prioritized evidence-based list of anticancer therapies. Our tool represents the first drug prescription strategy applying a rational based on pathway context, multi-gene markers impact and information provided by functional experiments. Our approach has been systematically applied to TCGA patients and successfully validated in a cancer case study with a xenograft mouse model demonstrating its utility.

CONCLUSIONS

PanDrugs is a feasible method to identify potentially druggable molecular alterations and prioritize drugs to facilitate the interpretation of genomic landscape and clinical decision-making in cancer patients. Our approach expands the search of druggable genomic alterations from the concept of cancer driver genes to the druggable pathway context extending anticancer therapeutic options beyond already known cancer genes. The methodology is public and easily integratable with custom pipelines through its programmatic API or its docker image. The PanDrugs webtool is freely accessible at http://www.pandrugs.org .

Collapse

Hsu YY, Wei CH, Lu Z. Assisting document triage for human kinome curation via machine learning. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018;2018:5094578. [PMID: 30239677 PMCID: PMC6146134 DOI: 10.1093/database/bay091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2018] [Accepted: 08/13/2018] [Indexed: 11/16/2022]

Gonzalez-Hernandez G, Sarker A, O’Connor K, Greene C. Advances in Text Mining and Visualization for Precision Medicine. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018;23:559-565. [PMID: 29218914 PMCID: PMC7466870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Gobeill J, Gaudet P, Dopp D, Morrone A, Kahanda I, Hsu YY, Wei CH, Lu Z, Ruch P. Overview of the BioCreative VI text-mining services for Kinome Curation Track. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018;2018:5133467. [PMID: 30329035 PMCID: PMC6191643 DOI: 10.1093/database/bay104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/03/2018] [Accepted: 09/13/2018] [Indexed: 11/30/2022]

Chen Q, Panyam NC, Elangovan A, Verspoor K. BioCreative VI Precision Medicine Track system performance is constrained by entity recognition and variations in corpus characteristics. Database (Oxford) 2018;2018:5255181. [PMID: 30576491 PMCID: PMC6301335 DOI: 10.1093/database/bay122] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2018] [Revised: 09/24/2018] [Accepted: 10/16/2018] [Indexed: 01/01/2023]

Abstract

Precision medicine aims to provide personalized treatments based on individual patient profiles. One critical step towards precision medicine is leveraging knowledge derived from biomedical publications-a tremendous literature resource presenting the latest scientific discoveries on genes, mutations and diseases. Biomedical natural language processing (BioNLP) plays a vital role in supporting automation of this process. BioCreative VI Track 4 brings community effort to the task of automatically identifying and extracting protein-protein interactions (PPi) affected by mutations (PPIm), important in the precision medicine context for capturing individual genotype variation related to disease.We present the READ-BioMed team's approach to identifying PPIm-related publications and to extracting specific PPIm information from those publications in the context of the BioCreative VI PPIm track. We observe that current BioNLP tools are insufficient to recognise entities for these two tasks; the best existing mutation recognition tool achieves only 55% recall in the document triage training set, while relation extraction performance is limited by the low recall performance of gene entity recognition. We develop the models accordingly: for document triage, we develop term lists capturing interactions and mutations to complement BioNLP tools, and select effective features via a feature contribution study, whereas an ensemble of BioNLP tools is employed for relation extraction.Our best document triage model achieves an F-score of 66.77% while our best model for relation extraction achieved an F-score of 35.09% over the final (updated post-task) test set. Impacting the document triage task, the characteristics of mutations are statistically different in the training and testing sets. While a vital new direction for biomedical text mining research, this early attempt to tackle the problem of identifying genetic variation of substantial biological significance highlights the importance of representative training data and the cascading impact of tool limitations in a modular system.

Collapse