Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018;34:80-87. [PMID: 28968638 DOI: 10.1093/bioinformatics/btx541] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 05/17/2017] [Accepted: 08/31/2017] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Huang DL, Zeng Q, Xiong Y, Liu S, Pang C, Xia M, Fang T, Ma Y, Qiang C, Zhang Y, Zhang Y, Li H, Yuan Y. A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature. Interdiscip Sci 2024;16:333-344. [PMID: 38340264 PMCID: PMC11289304 DOI: 10.1007/s12539-024-00605-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 02/12/2024]

Asadi-Pooya AA, Malekpour M, Taherifard E, Mallahzadeh A, Farjoud Kouhanjani M. Coexistence of temporal lobe epilepsy and idiopathic generalized epilepsy. Epilepsy Behav 2024;151:109602. [PMID: 38160579 DOI: 10.1016/j.yebeh.2023.109602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 12/07/2023] [Accepted: 12/21/2023] [Indexed: 01/03/2024]

Jin Q, Leaman R, Lu Z. PubMed and beyond: biomedical literature search in the age of artificial intelligence. EBioMedicine 2024;100:104988. [PMID: 38306900 PMCID: PMC10850402 DOI: 10.1016/j.ebiom.2024.104988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 01/14/2024] [Accepted: 01/15/2024] [Indexed: 02/04/2024] Open

Wu Z, Feng C, Hu Y, Zhou Y, Li S, Zhang S, Hu Y, Chen Y, Chao H, Ni Q, Chen M. HALD, a human aging and longevity knowledge graph for precision gerontology and geroscience analyses. Sci Data 2023;10:851. [PMID: 38040715 PMCID: PMC10692171 DOI: 10.1038/s41597-023-02781-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 11/23/2023] [Indexed: 12/03/2023] Open

Xu Q, Liu Y, Sun D, Huang X, Li F, Zhai J, Li Y, Zhou Q, Qian N, Niu B. OncoCTMiner: streamlining precision oncology trial matching via molecular profile analysis. Database (Oxford) 2023;2023:baad077. [PMID: 37935585 PMCID: PMC10630409 DOI: 10.1093/database/baad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 09/08/2023] [Accepted: 10/21/2023] [Indexed: 11/09/2023]

Affiliation(s)

Quan Xu: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China; Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Yueyue Liu: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Dawei Sun: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China; Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Xiaoqian Huang: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Feihong Li: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
JinCheng Zhai: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Yang Li: Beijing International Center for Mathematical Research, Peking University, No. 5 Yiheyuan Road Haidian District, Beijing 100871, China; Chongqing Research Institute of Big Data, Peking University, Chongqing 401333, China
Qiming Zhou: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China; Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Niansong Qian: Department of Oncology, Senior Department of Respiratory and Critical Care Medicine, The Eighth Medical Center of Chinese PLA General Hospital, No.17 A Heishanhu Road, Haidian District, Beijing 100853, China
Beifang Niu: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Hussain M, Muhammad K, Khan M, Din AU. A Novel CRYBB2 Silent Variant in Autosomal Dominant Congenital Cataracts (ADCC) in Pakistani families. Pak J Med Sci 2023;39:1399-1405. [PMID: 37680813 PMCID: PMC10480720 DOI: 10.12669/pjms.39.5.7061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 10/30/2022] [Accepted: 06/18/2023] [Indexed: 09/09/2023] Open

Pu Y, Beck D, Verspoor K. Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease. J Biomed Inform 2023;145:104464. [PMID: 37541406 DOI: 10.1016/j.jbi.2023.104464] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/22/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/06/2023]

Abstract

OBJECTIVE

We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology.

METHODS

We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed.
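
The time-slicing step described above can be illustrated with a minimal sketch. It assumes each edge carries the year it first appeared, and it keeps only future edges whose endpoints already exist in the training graph; that filter, the function name `time_slice`, and the toy data are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def time_slice(first_seen: Dict[Edge, int], cutoff: int, window: int) -> Tuple[Set[Edge], Set[Edge]]:
    """Split edges into a training graph and the positives to be predicted.

    Training edges first appeared in or before `cutoff`. Test positives first
    appeared within `window` years after `cutoff` and connect two nodes that
    are already present in the training graph (an assumption of this sketch).
    """
    train = {edge for edge, year in first_seen.items() if year <= cutoff}
    known_nodes = {node for edge in train for node in edge}
    test = {
        edge
        for edge, year in first_seen.items()
        if cutoff < year <= cutoff + window
        and edge[0] in known_nodes
        and edge[1] in known_nodes
    }
    return train, test


# Toy data: hypothetical concept pairs with the year each link first appeared.
FIRST_SEEN = {
    ("geneA", "disease"): 2000,
    ("geneB", "disease"): 2001,
    ("drugX", "disease"): 2002,
    ("geneA", "geneB"): 2003,
    ("geneA", "drugX"): 2010,
    ("geneC", "disease"): 2012,  # geneC is unseen at the cutoff, so it is skipped
}

train, short_term = time_slice(FIRST_SEEN, cutoff=2002, window=1)
_, long_term = time_slice(FIRST_SEEN, cutoff=2002, window=10)
print(len(train), len(short_term), len(long_term))
```

In this toy case the longer window yields more positive links to recover from the same training graph, which is one way a longer prediction window can change evaluation results.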

RESULTS

We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation.

CONCLUSION

Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases.
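
As a rough check on the density figure, the sketch below computes the density of a graph with the approximate node and edge counts reported in the results. It assumes an undirected simple graph with no self-loops; the abstract does not state how density was defined, so this is only an approximation.

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density of an undirected simple graph: edges present / edges possible."""
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_edges


# Rounded figures from the abstract: about 11 k nodes and about 394 k edges.
density = graph_density(11_000, 394_000)
print(f"{density:.4%}")
```

Under these assumptions the density comes out at roughly 0.65%, consistent with the stated figure of less than 1%.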

AVAILABILITY

Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.

Sun Z, Tao C. Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023;2023:558-564. [PMID: 38283164 PMCID: PMC10815931 DOI: 10.1109/ichi57859.2023.00100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]

Abstract

Alzheimer's Disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Finding effective treatments for this disease is crucial. Clinical trials play an essential role in developing and testing new treatments for AD. However, identifying eligible participants can be challenging, time-consuming, and costly. In recent years, the development of natural language processing (NLP) techniques, specifically named entity recognition (NER) and named entity normalization (NEN), has helped to automate the identification and extraction of relevant information from eligibility criteria (EC), in order to facilitate semi-automatic patient recruitment and enable data FAIRness for clinical trial data. Nevertheless, most current biomedical NER models only provide annotations for a restricted set of entity types that may not be applicable to clinical trial data. Additionally, there are currently no established techniques for accurately performing NEN on entities that are negated with a negative prefix. In this paper, we introduce a pipeline designed for information extraction from AD clinical trial EC, which involves preprocessing of the EC data, clinical NER, and biomedical NEN to the Unified Medical Language System (UMLS). Our NER model can identify named entities in seven pre-defined categories, while our NEN model employs a combination of exact match and partial match search strategies, as well as customized rules to accurately normalize entities with negative prefixes. To evaluate the performance of our pipeline, we measured the precision, recall, and F1 score for the NER component, and we manually reviewed the top five mapping results produced by the NEN component. Our evaluation showed that the pipeline can successfully normalize named entities in clinical trial EC with high accuracy. The NER component achieved an overall F1 of 0.816, demonstrating its ability to accurately identify seven types of named entities in clinical text. The NEN component also performed well, with customized rules and a combination of exact and partial match strategies leading to an accuracy of 0.940 for normalized entities.
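
The normalization strategy described above (negative-prefix rules, then exact match, then partial match) might look something like the following minimal sketch. The vocabulary, the `TOY:` identifiers, the prefix list, and the function name `normalize` are all hypothetical; real UMLS lookups and the authors' actual rules are not reproduced here.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a UMLS-style vocabulary; the identifiers are invented.
TOY_VOCAB = {
    "diabetes mellitus": "TOY:001",
    "smoker": "TOY:002",
    "dementia": "TOY:003",
}

# Illustrative negative prefixes only; a real rule set would be much richer.
NEGATIVE_PREFIXES = ("non-", "non ")


def normalize(mention: str) -> Tuple[Optional[str], bool]:
    """Return (concept id or None, negated flag) for an entity mention."""
    text = mention.lower().strip()
    negated = False
    for prefix in NEGATIVE_PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            negated = True
            break
    if not text:
        return None, negated
    # Exact match first.
    if text in TOY_VOCAB:
        return TOY_VOCAB[text], negated
    # Partial match: the mention contains a vocabulary term, or the reverse.
    for term, concept_id in TOY_VOCAB.items():
        if text in term or term in text:
            return concept_id, negated
    return None, negated


print(normalize("Non-smoker"))
print(normalize("vascular dementia"))
```

Stripping the prefix before lookup lets the negated mention map to the same concept as its positive form, with the negation carried separately as a flag.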

Nicholson DN, Alquaddoomi F, Rubinetti V, Greene CS. Changing word meanings in biomedical literature reveal pandemics and new technologies. BioData Min 2023;16:16. [PMID: 37147665 PMCID: PMC10161184 DOI: 10.1186/s13040-023-00332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/24/2023] [Indexed: 05/07/2023] Open

Malekpour M, Salarikia SR, Kashkooli M, Asadi-Pooya AA. The genetic link between systemic autoimmune disorders and temporal lobe epilepsy: A bioinformatics study. Epilepsia Open 2023. [PMID: 36929812 DOI: 10.1002/epi4.12727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 03/11/2023] [Indexed: 03/18/2023] Open

Nicholson DN, Himmelstein DS, Greene CS. Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts. BioData Min 2022;15:26. [PMID: 36258252 PMCID: PMC9578183 DOI: 10.1186/s13040-022-00311-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 09/17/2022] [Indexed: 02/04/2023] Open

Abstract

BACKGROUND

Knowledge graphs support biomedical research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via manual curation, which is challenging to scale with an exponentially rising publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to annotate textual data automatically. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This bottleneck makes populating a knowledge graph with multiple nodes and edge types practically infeasible. Thus, we sought to accelerate the label function creation process by evaluating how label functions can be re-used across multiple edge types.
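
To make the idea of a label function concrete, here is a minimal sketch in the style of data programming. The three functions, the toy database, and the majority-vote combiner are illustrative assumptions; in particular, majority voting is a simple stand-in for a learned label model and is not the method used in the paper.

```python
from typing import List

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical database of known compound-gene pairs.
KNOWN_PAIRS = {("compound_x", "gene_y")}


def lf_in_database(sentence: str, compound: str, gene: str) -> int:
    """Database label function: vote positive if the pair is already known."""
    return POSITIVE if (compound, gene) in KNOWN_PAIRS else ABSTAIN


def lf_binds_keyword(sentence: str, compound: str, gene: str) -> int:
    """Text heuristic: vote positive if the sentence contains 'binds'."""
    return POSITIVE if "binds" in sentence.lower() else ABSTAIN


def lf_negation(sentence: str, compound: str, gene: str) -> int:
    """Text heuristic: vote negative if the sentence contains 'does not'."""
    return NEGATIVE if "does not" in sentence.lower() else ABSTAIN


LABEL_FUNCTIONS = [lf_in_database, lf_binds_keyword, lf_negation]


def majority_vote(votes: List[int]) -> int:
    """Combine votes, ignoring abstentions; ties and all-abstain give ABSTAIN."""
    positives = votes.count(POSITIVE)
    negatives = votes.count(NEGATIVE)
    if positives == negatives:
        return ABSTAIN
    return POSITIVE if positives > negatives else NEGATIVE


def annotate(sentence: str, compound: str, gene: str) -> int:
    return majority_vote([lf(sentence, compound, gene) for lf in LABEL_FUNCTIONS])


print(annotate("compound_x binds gene_y in vitro.", "compound_x", "gene_y"))
```

Each function encodes one cheap heuristic and may abstain, which is why many functions are usually needed per edge type and why re-using them across edge types is attractive.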

RESULTS

We obtained entity-tagged abstracts and subsetted these entities to only contain compounds, genes, and disease mentions. We extracted sentences containing co-mentions of certain biomedical entities contained in a previously described knowledge graph, Hetionet v1. We trained a baseline model that used database-only label functions and then used a sampling approach to measure how well adding edge-specific or edge-mismatch label function combinations improved over our baseline. Next, we trained a discriminator model to detect sentences that indicated a biomedical relationship and then estimated the number of edge types that could be recalled and added to Hetionet v1. We found that adding edge-mismatch label functions rarely improved relationship extraction, while control edge-specific label functions did. There were two exceptions to this trend, Compound-binds-Gene and Gene-interacts-Gene, which both indicated physical relationships and showed signs of transferability. Across the scenarios tested, discriminative model performance strongly depends on generated annotations. Using the best discriminative model for each edge type, we recalled close to 30% of established edges within Hetionet v1.

CONCLUSIONS

Our results show that this framework can incorporate novel edges into our source knowledge graph. However, results with label function transfer were mixed. Only label functions describing very similar edge types supported improved performance when transferred. We expect that the continued development of this strategy may provide essential building blocks to populating biomedical knowledge graphs with discoveries, ensuring that these resources include cutting-edge results.

Luo L, Lai PT, Wei CH, Arighi CN, Lu Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform 2022;23:6645993. [PMID: 35849818 PMCID: PMC9487702 DOI: 10.1093/bib/bbac282] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 04/17/2022] [Revised: 06/02/2022] [Accepted: 06/19/2022] [Indexed: 11/13/2022] Open

Wei CH, Allot A, Riehle K, Milosavljevic A, Lu Z. tmVar 3.0: an improved variant concept recognition and normalization tool. Bioinformatics 2022;38:4449-4451. [PMID: 35904569 PMCID: PMC9477515 DOI: 10.1093/bioinformatics/btac537] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/07/2022] [Revised: 07/07/2022] [Accepted: 07/27/2022] [Indexed: 12/24/2022] Open

Xu Q, Liu Y, Hu J, Duan X, Song N, Zhou J, Zhai J, Su J, Liu S, Chen F, Zheng W, Guo Z, Li H, Zhou Q, Niu B. OncoPubMiner: a platform for mining oncology publications. Brief Bioinform 2022;23:6691792. [PMID: 36058206 DOI: 10.1093/bib/bbac383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/12/2022] Open

Affiliation(s)

Quan Xu: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Yueyue Liu: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Jifang Hu: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xiaohong Duan: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Niuben Song: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Jiale Zhou: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Jincheng Zhai: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Junyan Su: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Siyao Liu: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Fan Chen: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Wei Zheng: The Department of Nephrology and Hypertension Medicine, Beijing Electric Power Hospital, Beijing 100073, China
Zhongjia Guo: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Hexiang Li: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Qiming Zhou: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Beifang Niu: ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Sung M, Jeong M, Choi Y, Kim D, Lee J, Kang J. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics 2022;38:4837-4839. [PMID: 36053172 PMCID: PMC9563680 DOI: 10.1093/bioinformatics/btac598] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 01/06/2022] [Revised: 07/09/2022] [Accepted: 08/31/2022] [Indexed: 11/14/2022] Open

Goto A, Rodriguez-Esteban R, Scharf SH, Morris GM. Understanding the genetics of viral drug resistance by integrating clinical data and mining of the scientific literature. Sci Rep 2022;12:14476. [PMID: 36008431 PMCID: PMC9403226 DOI: 10.1038/s41598-022-17746-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Accepted: 07/30/2022] [Indexed: 11/16/2022] Open

Lin PC, Tsai YS, Yeh YM, Shen MR. Cutting-Edge AI Technologies Meet Precision Medicine to Improve Cancer Care. Biomolecules 2022;12:1133. [PMID: 36009026 PMCID: PMC9405970 DOI: 10.3390/biom12081133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022] Open

Wang C, Yu P, Hu L, Liang M, Mao Y, Zeng Q, Wang X, Huang K, Yan J, Xie L, Zhang F, Zhu F. Prevalence and prognosis of molecularly defined familial hypercholesterolemia in patients with acute coronary syndrome. Front Cardiovasc Med 2022;9:921803. [PMID: 35966514 PMCID: PMC9363594 DOI: 10.3389/fcvm.2022.921803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 07/06/2022] [Indexed: 11/13/2022] Open

Affiliation(s)

Cheng Wang: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China; Clinic Center of Human Gene Research, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Puliang Yu: Wuhan University of Science and Technology, Wuhan, China
Lizhi Hu: Clinic Center of Human Gene Research, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Minglu Liang: Clinic Center of Human Gene Research, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Yi Mao: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Qiutang Zeng: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Xiang Wang: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Kai Huang: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China; Clinic Center of Human Gene Research, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Jin Yan: Department of Clinical Laboratory, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Li Xie: Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Fengxiao Zhang: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China; Clinic Center of Human Gene Research, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Feng Zhu: Department of Cardiology, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China; Clinic Center of Human Gene Research, Tongji Medical College, Union Hospital, Huazhong University of Science and Technology, Wuhan, China
Correspondence: Fengxiao Zhang; Feng Zhu

Garda S, Lenihan-Geels F, Proft S, Hochmuth S, Schülke M, Seelow D, Leser U. RegEl corpus: identifying DNA regulatory elements in the scientific literature. Database (Oxford) 2022;2022:6618549. [PMID: 35758881 PMCID: PMC9235371 DOI: 10.1093/database/baac043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 05/25/2022] [Accepted: 06/02/2022] [Indexed: 11/17/2022]

Wilhelm K, Edick MJ, Berry SA, Hartnett M, Brower A. Using Long-Term Follow-Up Data to Classify Genetic Variants in Newborn Screened Conditions. Front Genet 2022;13:859837. [PMID: 35692825 PMCID: PMC9178101 DOI: 10.3389/fgene.2022.859837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 04/28/2022] [Indexed: 11/16/2022] Open

Li PH, Chen TF, Yu JY, Shih SH, Su CH, Lin YH, Tsai HK, Juan HF, Chen CY, Huang JH. pubmedKB: an interactive web server for exploring biomedical entity relations in the biomedical literature. Nucleic Acids Res 2022;50:W616-W622. [PMID: 35536289 PMCID: PMC9252824 DOI: 10.1093/nar/gkac310] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/27/2022] [Revised: 04/06/2022] [Accepted: 04/18/2022] [Indexed: 11/15/2022] Open

Wang X, Zhu X, Peng L, Zhao Y. Identification of lncRNA Biomarkers and LINC01198 Promotes Progression of Chronic Rhinosinusitis with Nasal Polyps through Sponge miR-6776-5p. BIOMED RESEARCH INTERNATIONAL 2022;2022:9469207. [PMID: 35572732 PMCID: PMC9106458 DOI: 10.1155/2022/9469207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 01/06/2022] [Accepted: 01/18/2022] [Indexed: 11/17/2022]

Abstract

Background

Chronic sinusitis (CRS) is a chronic inflammation that originates in the nasal mucosa and affects the health of many people around the world. Chronic rhinosinusitis with nasal polyps (CRSwNP) is one type of chronic sinusitis. Emerging research suggests that long noncoding RNAs (lncRNAs) play vital roles in inflammatory diseases and in the development of inflammation.

Methods

We acquired GEO data to analyze the differential expression of miRNAs, immune genes, transcription factors (TFs), and lncRNAs between CRSwNP and the corresponding control tissues. Bioinformatic analysis, including coexpression analysis, construction of a competing endogenous RNA (ceRNA) network, and pathway enrichment analysis, was used to predict the functions of these noncoding RNAs. Differences in pathway expression in CRSwNP patients were confirmed by applying GSVA to the differentially expressed immune gene and TF data sets. Functional annotation and protein-protein interaction (PPI) network construction were performed on the differentially expressed immune gene and TF data sets. We predicted the potential genes and RNAs related to CRSwNP by constructing a ceRNA network. In addition, we used 19 hub immune genes to predict potential drugs for CRSwNP. lncRNA biomarkers in CRSwNP were identified by LASSO regression. The CIBERSORT algorithm was used to compare immune infiltration across 22 immune cell subtypes between CRSwNP and normal inferior turbinate tissues.

Results

We identified a total of 48 miRNAs, 304 lncRNAs, 92 TFs, and 525 immune genes as CRSwNP-specific RNAs. GO and KEGG pathway analyses were both performed on the differentially expressed immune gene and TF data sets. We predicted the potential genes GNG7, TUSC8, LINC01198, and hsa-miR-6776-5p by constructing ceRNA and PPI networks. We also found that these genes were involved in two important pathways: the chemokine signaling pathway and the PI3K/AKT signaling pathway. In addition, by analyzing 19 hub immune genes, we predicted 5 small-molecule drugs for treating CRSwNP, namely, danazol, ikarugamycin, semustine, cefamandole, and molindone. Finally, we identified 5 biomarkers in CRSwNP, namely, LINC01198, LINC01094, LINC01798, LINC01829, and LINC01320.

Conclusions

We identified CRSwNP-related miRNAs, lncRNAs, TFs, and immune genes, which may serve as potential therapeutic targets for CRSwNP. We also identified 5 lncRNA biomarkers in CRSwNP. The results of this study showed that LINC01198 promoted the progression of CRSwNP by sponging miR-6776-5p. Our study provides a new avenue for further analysis of the pathogenesis of CRSwNP.

Zhu X, Gu Y, Xiao Z. HerbKG: Constructing a Herbal-Molecular Medicine Knowledge Graph Using a Two-Stage Framework Based on Deep Transfer Learning. Front Genet 2022;13:799349. [PMID: 35571049 PMCID: PMC9091197 DOI: 10.3389/fgene.2022.799349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 04/05/2022] [Indexed: 11/13/2022] Open

Pasche E, Mottaz A, Caucheteur D, Gobeill J, Michel PA, Ruch P. Variomes: a high recall search engine to support the curation of genomic variants. Bioinformatics 2022;38:2595-2601. [PMID: 35274687 PMCID: PMC9048643 DOI: 10.1093/bioinformatics/btac146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 02/07/2022] [Accepted: 03/10/2022] [Indexed: 12/02/2022] Open

Kaushik V, Plazzer JP, Winship I, Macrae F. Genetic variant interpretation: a primer for clinicians. Intern Med J 2021;51:1401-1406. [PMID: 34541770 DOI: 10.1111/imj.15485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 02/08/2021] [Accepted: 05/31/2021] [Indexed: 11/28/2022]

Becker TE, Jakobsson E. ResidueFinder: extracting individual residue mentions from protein literature. J Biomed Semantics 2021;12:14. [PMID: 34289903 PMCID: PMC8293528 DOI: 10.1186/s13326-021-00243-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2019] [Accepted: 05/07/2021] [Indexed: 11/10/2022] Open

Abstract

Background

The revolution in molecular biology has shown how protein function and structure are based on specific sequences of amino acids. Thus, an important feature in many papers is the mention of the significance of individual amino acids in the context of the entire sequence of the protein. MutationFinder is a widely used program for finding mentions of specific mutations in texts. We report on augmenting the positive attributes of MutationFinder with a more inclusive regular expression list to create ResidueFinder, which finds mentions of native amino acids as well as mutations. We also consider parameter options for both ResidueFinder and MutationFinder to explore trade-offs between precision, recall, and computational efficiency. We test our methods and software in full text as well as abstracts.
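
As a rough illustration of the regular-expression approach described above, the following sketch matches three-letter residue mentions with a position (such as "Ser195" or "His-57") and simple point mutations (such as "Gly12Asp"). The pattern, the names `AA3`, `RESIDUE_RE`, and `find_residue_mentions`, and the example sentence are assumptions made for illustration; they are not the ResidueFinder or MutationFinder expressions, which are far more inclusive.

```python
import re

# The 20 standard amino acids as three-letter codes.
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Illustrative pattern (an assumption, not the published regex list):
# a three-letter code, an optional hyphen or space, a sequence position,
# and optionally a second code, which marks a point mutation.
RESIDUE_RE = re.compile(
    rf"\b(?P<wt>{AA3})[- ]?(?P<pos>[1-9]\d*)(?P<mut>{AA3})?\b"
)


def find_residue_mentions(text):
    """Return (residue, position, mutant-or-None) tuples found in text."""
    return [
        (m.group("wt"), int(m.group("pos")), m.group("mut"))
        for m in RESIDUE_RE.finditer(text)
    ]


# Hypothetical example sentence.
EXAMPLE = "Ser195 and His-57 form the triad; the Gly12Asp mutant was inactive."
MENTIONS = find_residue_mentions(EXAMPLE)
```

A mention with no mutant residue corresponds to a native amino acid, which is the case the abstract says MutationFinder alone does not cover.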

Results

We find there is much more variety of formats for mentioning residues in the entire text of papers than in abstracts alone. Failure to take these multiple formats into account results in many false negatives in the program. Since MutationFinder, like several other programs, was primarily tested on abstracts, we found it necessary to build an expanded regular expression list to achieve acceptable recall in full text searches. We also discovered a number of artifacts arising from PDF to text conversion, which we addressed by adding elements to the regular expression library. Taking those factors into account resulted in high recall on randomly selected primary research articles. We also developed a streamlined regular expression (called “cut”) which enables a several hundredfold speedup in both MutationFinder and ResidueFinder with only a modest compromise of recall. All regular expressions were tested using expanded F-measure statistics, i.e., we compute F_β for various values of β, where the larger the value of β the more recall is weighted, and the smaller the value of β the more precision is weighted.
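
The F_β statistic mentioned above can be sketched with the standard textbook definition, F_β = (1 + β²)·P·R / (β²·P + R). The function name `f_beta` and the precision and recall values below are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical system with high precision and lower recall.
P, R = 0.9, 0.6
SCORES = {beta: f_beta(P, R, beta) for beta in (0.5, 1.0, 2.0)}
```

Because precision exceeds recall in this hypothetical case, the score falls as β grows, which is the weighting behaviour the abstract describes.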

Conclusions

ResidueFinder is a simple, effective, and efficient program for finding individual residue mentions in primary literature starting with text files, implemented in Python, and available in SourceForge.net. The most computationally efficient versions of ResidueFinder could enable creation and maintenance of a database of residue mentions encompassing all articles in PubMed.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13326-021-00243-3.

Islamaj R, Wei CH, Cissel D, Miliaras N, Printseva O, Rodionov O, Sekiya K, Ward J, Lu Z. NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition. J Biomed Inform 2021;118:103779. [PMID: 33839304 PMCID: PMC11037554 DOI: 10.1016/j.jbi.2021.103779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2021] [Revised: 03/14/2021] [Accepted: 04/05/2021] [Indexed: 10/21/2022]

Lee K, Wei CH, Lu Z. Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature. Brief Bioinform 2021;22:bbaa142. [PMID: 32770181 PMCID: PMC8138883 DOI: 10.1093/bib/bbaa142] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 06/07/2020] [Accepted: 06/25/2020] [Indexed: 12/28/2022] Open

Kaushik V, Plazzer J, Macrae F. Evaluation of literature searching tools for curation of mismatch repair gene variants in hereditary colon cancer. ADVANCED GENETICS (HOBOKEN, N.J.) 2021;2:e10039. [PMID: 36618447 PMCID: PMC9744508 DOI: 10.1002/ggn2.10039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Revised: 01/12/2021] [Accepted: 01/14/2021] [Indexed: 01/11/2023]

Abstract

Pathogenic constitutional genomic variants in the mismatch repair (MMR) genes are the drivers of Lynch syndrome; optimal variant interpretation is required for the management of suspected and confirmed cases. The International Society for Hereditary Gastrointestinal Tumours (InSiGHT) provides expert classifications for MMR variants for the US National Human Genome Research Institute's (NHGRI) ClinGen initiative and interprets variants with discordant classifications and those of uncertain significance (VUSs). Given the onerous nature of extracting information related to variants, literature searching tools which harness artificial intelligence may aid in retrieving information to allow optimum variant classification. In this study, we described the nature of discordance in a sample of 80 variants from a list of variants requiring updating by InSiGHT for ClinGen by comparing their existing InSiGHT classifications with the various submissions for each variant on the US National Centre for Biotechnology Information's (NCBI) ClinVar database. To identify the potential value of a literature searching tool in extracting information related to classification, all variants were searched for using a traditional method (Google Scholar) and literature searching tool (Mastermind) independently. Descriptive statistics were used to compare: the number of articles before and after screening for relevance and the number of relevant articles unique to either method. Relevance was defined as containing the variant in question as well as data informing variant interpretation. A total of 916 articles were returned by both methods and Mastermind averaged four relevant articles per search compared to Google Scholar's three. Of relevant Mastermind articles, 193/308 (62.7%) were unique to it, compared to 87/202 (43.0%) for Google Scholar. For 24 variants, either or both methods found no information.
All 6/80 (20%) variants with pathogenic or likely pathogenic InSiGHT classifications have newer VUS assertions on ClinVar. Our study demonstrated that, for a sample of variants with varying discordant interpretations, Mastermind was able to return, on average, a more relevant and unique literature search. Google Scholar was able to retrieve information that Mastermind did not, which supports a conclusion that Mastermind could play a complementary role in literature searching for classification. This work will aid InSiGHT in its role of classifying MMR variants.

Martin RL. Gene-Centric Database Reveals Environmental and Lifestyle Relationships for Potential Risk Modification and Prevention. Lifestyle Genom 2021;14:30-36. [PMID: 33461193 DOI: 10.1159/000512690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2020] [Accepted: 10/29/2020] [Indexed: 11/19/2022] Open

Dingerdissen HM, Bastian F, Vijay-Shanker K, Robinson-Rechavi M, Bell A, Gogate N, Gupta S, Holmes E, Kahsay R, Keeney J, Kincaid H, King CH, Liu D, Crichton DJ, Mazumder R. OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data. JCO Clin Cancer Inform 2020;4:210-220. [PMID: 32142370 PMCID: PMC7101249 DOI: 10.1200/cci.19.00117] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Rehmat N, Farooq H, Kumar S, Ul Hussain S, Naveed H. Predicting the pathogenicity of protein coding mutations using Natural Language Processing. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020;2020:5842-5846. [PMID: 33019302 DOI: 10.1109/embc44109.2020.9175781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Sarmah P, Bharali R, Khatonier R, Khan A. Polymorphism in Toll interacting protein (TOLLIP) gene and its association with Visceral Leishmaniasis. GENE REPORTS 2020. [DOI: 10.1016/j.genrep.2020.100705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Saberian N, Shafi A, Peyvandipour A, Draghici S. MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature. Sci Rep 2020;10:12365. [PMID: 32703994 PMCID: PMC7378213 DOI: 10.1038/s41598-020-68649-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2019] [Accepted: 06/17/2020] [Indexed: 11/09/2022] Open

Alag S. Unique insights from ClinicalTrials.gov by mining protein mutations and RSids in addition to applying the Human Phenotype Ontology. PLoS One 2020;15:e0233438. [PMID: 32459809 PMCID: PMC7252633 DOI: 10.1371/journal.pone.0233438] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2019] [Accepted: 05/05/2020] [Indexed: 01/31/2023] Open

UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 2020;47:D506-D515. [PMID: 30395287 PMCID: PMC6323992 DOI: 10.1093/nar/gky1049] [Citation(s) in RCA: 4747] [Impact Index Per Article: 1186.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Accepted: 10/18/2018] [Indexed: 12/14/2022] Open

Wei CH, Allot A, Leaman R, Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res 2020;47:W587-W593. [PMID: 31114887 DOI: 10.1093/nar/gkz389] [Citation(s) in RCA: 188] [Impact Index Per Article: 47.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2019] [Revised: 04/08/2019] [Accepted: 04/30/2019] [Indexed: 11/12/2022] Open

Huang KY, Lee TY, Kao HJ, Ma CT, Lee CC, Lin TH, Chang WC, Huang HD. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res 2020;47:D298-D308. [PMID: 30418626 PMCID: PMC6323979 DOI: 10.1093/nar/gky1074] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Accepted: 10/19/2018] [Indexed: 12/25/2022] Open

Chen Q, Du J, Kim S, Wilbur WJ, Lu Z. Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records. BMC Med Inform Decis Mak 2020;20:73. [PMID: 32349758 PMCID: PMC7191680 DOI: 10.1186/s12911-020-1044-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

Abstract

Background

Capturing sentence semantics plays a vital role in a range of text mining applications. Despite continuous efforts on the development of related datasets and models in the general domain, both datasets and models are limited in biomedical and clinical domains. The BioCreative/OHNLP2018 organizers have made the first attempt to annotate 1068 sentence pairs from clinical notes and have called for a community effort to tackle the Semantic Textual Similarity (BioCreative/OHNLP STS) challenge.

Methods

We developed models using traditional machine learning and deep learning approaches. For the post challenge, we focused on two models: the Random Forest and the Encoder Network. We applied sentence embeddings pre-trained on PubMed abstracts and MIMIC-III clinical notes and updated the Random Forest and the Encoder Network accordingly.

Results

The official results demonstrated our best submission was the ensemble of eight models. It achieved a Pearson correlation coefficient of 0.8328 – the highest performance among 13 submissions from 4 teams. For the post challenge, the performance of both Random Forest and the Encoder Network was improved; in particular, the correlation of the Encoder Network was improved by ~13%. During the challenge task, no end-to-end deep learning models had better performance than machine learning models that take manually-crafted features. In contrast, with the sentence embeddings pre-trained on biomedical corpora, the Encoder Network now achieves a correlation of ~0.84, which is higher than the original best model. The ensembled model taking the improved versions of the Random Forest and Encoder Network as inputs further increased performance to 0.8528.
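
The correlation metric reported above can be illustrated with a minimal sketch of the standard Pearson correlation coefficient between predicted and annotated similarity scores. The function name `pearson` and the score lists are hypothetical and are not taken from the challenge data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical gold-standard and predicted similarity scores (0 to 5 scale).
GOLD = [0.0, 1.5, 3.0, 4.0, 5.0]
PREDICTED = [0.4, 1.2, 2.7, 4.4, 4.6]
R_VALUE = pearson(GOLD, PREDICTED)
```

A value close to 1 means the predicted scores rise and fall with the annotated ones; it does not by itself mean the two sets of scores are numerically equal.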

Conclusions

Deep learning models with sentence embeddings pre-trained on biomedical corpora achieve the highest performance on the test set. Through error analysis, we find that end-to-end deep learning models and traditional machine learning models with manually-crafted features complement each other by finding different types of sentences. We suggest a combination of these models can better find similar sentences in practice.

Chen Q, Lee K, Yan S, Kim S, Wei CH, Lu Z. BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale. PLoS Comput Biol 2020;16:e1007617. [PMID: 32324731 PMCID: PMC7237030 DOI: 10.1371/journal.pcbi.1007617] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2019] [Revised: 05/19/2020] [Accepted: 12/19/2019] [Indexed: 12/14/2022] Open

Abstract

A massive number of biological entities, such as genes and mutations, are mentioned in the biomedical literature. The capturing of the semantic relatedness of biological entities is vital to many biological applications, such as protein-protein interaction prediction and literature-based discovery. Concept embeddings, which involve the learning of vector representations of concepts using machine learning models, have been employed to capture the semantics of concepts. To develop concept embeddings, named-entity recognition (NER) tools are first used to identify and normalize concepts from the literature, and then different machine learning models are used to train the embeddings. Despite multiple attempts, existing biomedical concept embeddings generally suffer from suboptimal NER tools, small-scale evaluation, and limited availability. In response, we employed high-performance machine learning-based NER tools for concept recognition and trained our concept embeddings, BioConceptVec, via four different machine learning models on ~30 million PubMed abstracts. BioConceptVec covers over 400,000 biomedical concepts mentioned in the literature and is one of the largest among the publicly available biomedical concept embeddings to date. To evaluate the validity and utility of BioConceptVec, we respectively performed two intrinsic evaluations (identifying related concepts based on drug-gene and gene-gene interactions) and two extrinsic evaluations (protein-protein interaction prediction and drug-drug interaction extraction), collectively using over 25 million instances from nine independent datasets (17 million instances from six intrinsic evaluation tasks and 8 million instances from three extrinsic evaluation tasks), which is, to the best of our knowledge, by far the most comprehensive evaluation. The intrinsic evaluation results demonstrate that BioConceptVec consistently has, by a large margin, better performance than existing concept embeddings in identifying similar and related concepts.
More importantly, the extrinsic evaluation results demonstrate that using BioConceptVec with advanced deep learning models can significantly improve performance in downstream bioinformatics studies and biomedical text-mining applications. Our BioConceptVec embeddings and benchmarking datasets are publicly available at https://github.com/ncbi-nlp/BioConceptVec.

Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (also known as embeddings) of over 400,000 biological concepts mentioned across all PubMed abstracts. Our learned embeddings, namely BioConceptVec, can capture related concepts based on their surrounding contextual information in the literature, which is beyond exact term match or co-occurrence-based methods. BioConceptVec has been thoroughly evaluated in multiple bioinformatics tasks consisting of over 25 million instances from nine different biological datasets. The evaluation results demonstrate that BioConceptVec has better performance than existing methods in all tasks. Finally, BioConceptVec is made freely available to the research community and general public.
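
To make the idea of identifying related concepts from vector representations concrete, here is a minimal sketch that ranks concepts by cosine similarity. The three-dimensional vectors, the concept names, and the helper names `cosine_similarity` and `most_similar` are invented for illustration; they are not BioConceptVec data or its API, and the source does not state which similarity measure its evaluations used.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, top_k=3):
    """Rank every other concept by cosine similarity to the query concept."""
    q = embeddings[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy embeddings (hypothetical; real concept embeddings have many more dimensions).
TOY_EMBEDDINGS = {
    "gene_A": [1.0, 0.0, 0.0],
    "gene_B": [0.9, 0.1, 0.0],
    "drug_X": [0.0, 1.0, 0.0],
}
NEIGHBOURS = most_similar("gene_A", TOY_EMBEDDINGS, top_k=2)
```

In this toy setting `gene_B` ranks first for `gene_A` because its vector points in nearly the same direction, even though the two names share no exact term match.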

Mutation profiles of classic myeloproliferative neoplasms detected by a customized next-generation sequencing-based 50-gene panel. JOURNAL OF BIO-X RESEARCH 2020. [DOI: 10.1097/jbr.0000000000000061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open

Nie A, Pineda AL, Wright MW, Wand H, Wulf B, Costa H, Patel R, Bustamante CD, Zou J. LitGen: Genetic Literature Recommendation Guided by Human Explanations. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020;25:67-78. [PMID: 31797587 PMCID: PMC7478937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

PGxMine: Text mining for curation of PharmGKB. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020;25:611-622. [PMID: 31797632 PMCID: PMC6917032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Jiang Y, Wu C, Zhang Y, Zhang S, Yu S, Lei P, Lu Q, Xi Y, Wang H, Song Z. GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining. BMC Med Genomics 2019;12:193. [PMID: 31856831 PMCID: PMC6923899 DOI: 10.1186/s12920-019-0637-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023] Open

Merrick BA. Next generation sequencing data for use in risk assessment. CURRENT OPINION IN TOXICOLOGY 2019;18:18-26. [PMID: 31058248 PMCID: PMC6497160 DOI: 10.1016/j.cotox.2019.02.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Li L, Feng J, Zhang D, Yong J, Wang Y, Yao J, Huang R. Differential expression of miR-4492 and IL-10 is involved in chronic rhinosinusitis with nasal polyps. Exp Ther Med 2019;18:3968-3976. [PMID: 31611936 PMCID: PMC6781800 DOI: 10.3892/etm.2019.8022] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2018] [Accepted: 07/05/2019] [Indexed: 12/22/2022] Open

Kwon D, Kim S, Wei CH, Leaman R, Lu Z. ezTag: tagging biomedical concepts via interactive learning. Nucleic Acids Res 2019;46:W523-W529. [PMID: 29788413 PMCID: PMC6030907 DOI: 10.1093/nar/gky428] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2018] [Accepted: 05/07/2018] [Indexed: 12/22/2022] Open

Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res 2019;46:W530-W536. [PMID: 29762787 PMCID: PMC6030971 DOI: 10.1093/nar/gky355] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2018] [Accepted: 05/08/2018] [Indexed: 01/10/2023] Open

Association of imputed prostate cancer transcriptome with disease risk reveals novel mechanisms. Nat Commun 2019;10:3107. [PMID: 31308362 PMCID: PMC6629701 DOI: 10.1038/s41467-019-10808-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2019] [Accepted: 06/04/2019] [Indexed: 12/16/2022] Open

Islamaj Dogan R, Kim S, Chatr-Aryamontri A, Wei CH, Comeau DC, Antunes R, Matos S, Chen Q, Elangovan A, Panyam NC, Verspoor K, Liu H, Wang Y, Liu Z, Altinel B, Hüsünbeyi ZM, Özgür A, Fergadis A, Wang CK, Dai HJ, Tran T, Kavuluru R, Luo L, Steppi A, Zhang J, Qu J, Lu Z. Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine. Database (Oxford) 2019;2019:5303240. [PMID: 30689846 PMCID: PMC6348314 DOI: 10.1093/database/bay147] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2018] [Accepted: 12/19/2018] [Indexed: 12/16/2022]

Abstract

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of BioCreative VI Precision Medicine Track was to extract this particular type of information and was organized in two tasks: (i) document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations and (ii) relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods. 
Given the level of participation and the individual team results we find the precision medicine track to be successful in engaging the text-mining research community. In the meantime, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPI affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
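
The precision, recall and F-score figures quoted above compare system predictions with human annotations. A minimal sketch of such a comparison for the relation extraction setting, treating each interaction as an unordered protein pair, might look as follows. The identifiers, the helper names `pair` and `precision_recall_f1`, and the example sets are hypothetical; the official track evaluation (including its handling of homologous genes) is not reproduced here.

```python
def pair(a, b):
    """Represent an interaction as an unordered pair, so (a, b) equals (b, a)."""
    return frozenset((a, b))


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of predicted pairs against gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotated and predicted interactions for one document.
GOLD = {pair("P1", "P2"), pair("P3", "P4"), pair("P5", "P6")}
PREDICTED = {pair("P2", "P1"), pair("P3", "P4"),
             pair("P7", "P8"), pair("P9", "P10")}
PRECISION, RECALL, F1 = precision_recall_f1(PREDICTED, GOLD)
```

Using `frozenset` means a prediction that lists the two proteins in the opposite order still counts as correct, which is a design choice of this sketch rather than a documented rule of the track.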

Affiliation(s)

Rezarta Islamaj Dogan National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Sun Kim National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Andrew Chatr-Aryamontri Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada
Chih-Hsuan Wei National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Donald C Comeau National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Rui Antunes Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
Sérgio Matos Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
Qingyu Chen School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Aparna Elangovan School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Nagesh C Panyam School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Karin Verspoor School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Hongfang Liu Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
Yanshan Wang Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
Zhuang Liu School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Berna Altinel Department of Computer Engineering, Marmara University, Istanbul, Turkey
Zehra Melce Hüsünbeyi Department of Computer Engineering, Bogaziçi University, Istanbul, Turkey
Arzucan Özgür
Aris Fergadis School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece
Chen-Kai Wang Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
Hong-Jie Dai Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Tung Tran Department of Computer Science, University of Kentucky, Lexington, KY, USA
Ramakanth Kavuluru Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
Ling Luo College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Albert Steppi Department of Statistics, Florida State University, Florida, USA
Jinfeng Zhang Department of Statistics, Florida State University, Florida, USA
Jinchan Qu Department of Statistics, Florida State University, Florida, USA
Zhiyong Lu National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
