Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wei CH, Harris BR, Li D, Berardini TZ, Huala E, Kao HY, Lu Z. Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts. Database (Oxford) 2012;2012:bas041. [PMID: 23160414 PMCID: PMC3500520 DOI: 10.1093/database/bas041] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Cited by Other Article(s)

Almeida T, Jonker RAA, Antunes R, Almeida JR, Matos S. Towards discovery: an end-to-end system for uncovering novel biomedical relations. Database (Oxford) 2024;2024:baae057. [PMID: 38994795 PMCID: PMC11240158 DOI: 10.1093/database/baae057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Revised: 05/20/2024] [Accepted: 06/19/2024] [Indexed: 07/13/2024]

Feng X, Ma Z, Yu C, Xin R. MRNDR: Multihead Attention-Based Recommendation Network for Drug Repurposing. J Chem Inf Model 2024;64:2654-2669. [PMID: 38373300 DOI: 10.1021/acs.jcim.3c01726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/21/2024]

Abstract

As is well-known, the process of developing new drugs is extremely expensive, whereas drug repurposing represents a promising approach to augment the efficiency of new drug development. While this method can indeed spare us from expensive drug toxicity and safety experiments, it still demands a substantial amount of time to carry out precise efficacy experiments for specific diseases, thereby consuming a significant quantity of resources. Therefore, if we can prescreen potential other indications for selected drugs, it could result in substantial cost savings. In light of this, this paper introduces a drug repurposing recommendation model called MRNDR, which stands for Multi-head attention-based Recommendation Network for Drug Repurposing. This model serves as a prediction tool for drug-disease relationships, leveraging the multihead self-attention mechanism that demonstrates robust generalization capabilities. These capabilities stem not only from our extensive million-level training data set, BioRE (Biology Recommended Entity data), but also from the utilization of the WRDS (Weighted Representation Distance Score) algorithm proposed by us. The MRNDR model has achieved new state-of-the-art results on the GP-KG public data set, with an MRR (Mean Reciprocal Rank) score of 0.308 and a Hits@10 score of 0.628. This represents significant improvements of 4.7% (MRR) and 18.1% (Hits@10) over the current best-performing models. Additionally, to further validate the practical utility of the model, we examined results recommended by MRNDR that were not present in the training data set. Some of these recommendations have undergone clinical trials, as evidenced by their presence on ClinicalTrials.gov and the China Clinical Trials Center, indirectly confirming the applicability of MRNDR. The MRNDR model can predict the reusability of candidate drugs, reducing the need for manual expert assessments and enabling efficient drug repurposing.
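
The two ranking metrics quoted in this abstract, MRR and Hits@10, can be illustrated with a minimal sketch. It assumes each test query yields the 1-based rank of the correct entity; the function names and the sample ranks are hypothetical and are not taken from the MRNDR paper or its code.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where ranks[i] is the 1-based rank of the
    correct answer for query i (illustrative helper, not MRNDR code)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for four drug-disease queries.
ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(ranks)
hits10 = hits_at_k(ranks, k=10)
print(f"MRR={mrr:.3f}  Hits@10={hits10:.3f}")
```

Under this reading, an MRR of 0.308 means the reciprocal rank of the correct answer averages 0.308 across queries, and a Hits@10 of 0.628 means the correct answer appears in the top ten for about 63% of queries.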

Yao X, He Z, Liu Y, Wang Y, Ouyang S, Xia J. Cancer-Alterome: a literature-mined resource for regulatory events caused by genetic alterations in cancer. Sci Data 2024;11:265. [PMID: 38431735 PMCID: PMC10908799 DOI: 10.1038/s41597-024-03083-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Accepted: 02/20/2024] [Indexed: 03/05/2024] Open

Preston S, Wei M, Rao R, Tinn R, Usuyama N, Lucas M, Gu Y, Weerasinghe R, Lee S, Piening B, Tittel P, Valluri N, Naumann T, Bifulco C, Poon H. Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision. PATTERNS (NEW YORK, N.Y.) 2023;4:100726. [PMID: 37123439 PMCID: PMC10140604 DOI: 10.1016/j.patter.2023.100726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/27/2022] [Revised: 11/11/2022] [Accepted: 03/14/2023] [Indexed: 05/02/2023]

Tinn R, Cheng H, Gu Y, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Fine-tuning large neural language models for biomedical natural language processing. PATTERNS (NEW YORK, N.Y.) 2023;4:100729. [PMID: 37123444 PMCID: PMC10140607 DOI: 10.1016/j.patter.2023.100729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 12/12/2022] [Accepted: 03/17/2023] [Indexed: 05/02/2023]

Chen Q, Du J, Allot A, Lu Z. LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:2584-2595. [PMID: 35536809 PMCID: PMC9647722 DOI: 10.1109/tcbb.2022.3173562] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/11/2021] [Revised: 04/19/2022] [Accepted: 04/22/2022] [Indexed: 05/20/2023]

Chen Q, Allot A, Leaman R, Islamaj R, Du J, Fang L, Wang K, Xu S, Zhang Y, Bagherzadeh P, Bergler S, Bhatnagar A, Bhavsar N, Chang YC, Lin SJ, Tang W, Zhang H, Tavchioski I, Pollak S, Tian S, Zhang J, Otmakhova Y, Yepes AJ, Dong H, Wu H, Dufour R, Labrak Y, Chatterjee N, Tandon K, Laleye FAA, Rakotoson L, Chersoni E, Gu J, Friedrich A, Pujari SC, Chizhikova M, Sivadasan N, VG S, Lu Z. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database (Oxford) 2022;2022:baac069. [PMID: 36043400 PMCID: PMC9428574 DOI: 10.1093/database/baac069] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2022] [Revised: 08/02/2022] [Accepted: 08/13/2022] [Indexed: 05/03/2023]

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. The related findings such as vaccine and drug development have been reported in biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID literature, rapidly locating articles of interest and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite the continuing advances in biomedical text-mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.
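
The three F1 variants named in this abstract differ only in how counts are pooled. The sketch below uses one common set of definitions (macro: mean of per-label F1; micro: F1 over pooled counts; instance-based: mean of per-document F1). It is an assumption that these match the track's official scorer, and the function names and toy labels are hypothetical.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Return (macro, micro, instance-based) F1 for multi-label output."""
    labels = sorted(set().union(*gold, *pred))
    # Per-label [tp, fp, fn] counts accumulated over all documents.
    counts = {label: [0, 0, 0] for label in labels}
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    instance = sum(f1(len(g & p), len(p - g), len(g - p))
                   for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, instance


# Toy example with three documents and three topic labels.
gold = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
pred = [{"Treatment"}, {"Prevention", "Diagnosis"}, {"Treatment"}]
macro, micro, instance = multilabel_f1(gold, pred)
print(f"macro={macro:.4f} micro={micro:.4f} instance={instance:.4f}")
```

Macro-F1 weights every topic equally, so rare topics pull it down more than they affect micro-F1, which is consistent with the macro score being the lowest of the three reported values.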

Affiliation(s)

Qingyu Chen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, MD, Bethesda 20892, USA
Alexis Allot National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, MD, Bethesda 20892, USA
Robert Leaman National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, MD, Bethesda 20892, USA
Rezarta Islamaj National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, MD, Bethesda 20892, USA
Jingcheng Du School of Biomedical Informatics, UT Health, TX, Houston 77030, USA
Li Fang Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Kai Wang Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Shuo Xu College of Economics and Management, Beijing University of Technology, Beijing, QC, China
Yuefu Zhang College of Economics and Management, Beijing University of Technology, Beijing, QC, China
Parsa Bagherzadeh CLaC Labs, Concordia University, Montreal, Canada
Sabine Bergler CLaC Labs, Concordia University, Montreal, Canada
Aakash Bhatnagar Navrachana University, Vadodara, India
Nidhir Bhavsar Navrachana University, Vadodara, India
Yung-Chun Chang Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
Sheng-Jie Lin Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
Wentai Tang College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Hongtong Zhang College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Ilija Tavchioski Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia Jožef Stefan Institute, Ljubljana, Slovenia
Senja Pollak Jožef Stefan Institute, Ljubljana, Slovenia
Shubo Tian Department of Statistics, Florida State University, Tallahassee, FL, USA
Jinfeng Zhang Department of Statistics, Florida State University, Tallahassee, FL, USA
Yulia Otmakhova School of Computing and Information Systems, University of Melbourne, Melbourne, AU-VIC, Australia
Antonio Jimeno Yepes School of Computing Technologies, RMIT University, Melbourne, AU-VIC, Australia
Hang Dong Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
Honghan Wu Institute of Health Informatics, University College London, London, UK
Richard Dufour LS2N, Nantes University, Nantes, France
Yanis Labrak LIA, Avignon University, Avignon, France
Niladri Chatterjee Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
Kushagri Tandon Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
Fréjus A A Laleye Opscidia, Paris, France
Loïc Rakotoson Opscidia, Paris, France
Emmanuele Chersoni Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Jinghang Gu Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Annemarie Friedrich Bosch Center for Artificial Intelligence, Renningen, Germany
Subhash Chandra Pujari Institute of Computer Science, Heidelberg University, Heidelberg, Germany Bosch Center for Artificial Intelligence, Renningen, Germany
Mariia Chizhikova SINAI Group, Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Jaén, Spain
Naveen Sivadasan TCS Research, Life Sciences, Hyderabad, India
Saipradeep VG TCS Research, Life Sciences, Hyderabad, India
Zhiyong Lu National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, MD, Bethesda 20892, USA

Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: The case of gluten bibliome. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Church K, Liu B. Acronyms and Opportunities for Improving Deep Nets. Front Artif Intell 2022;4:732381. [PMID: 34988434 PMCID: PMC8721666 DOI: 10.3389/frai.2021.732381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2021] [Accepted: 10/21/2021] [Indexed: 11/13/2022] Open

Yan N, Huang S, Kong C. Extracting Entity Synonymous Relations via Context-Aware Permutation Invariance. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING 2022. [DOI: 10.4018/ijitwe.288039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Zhu T, Qin Y, Xiang Y, Hu B, Chen Q, Peng W. Distantly supervised biomedical relation extraction using piecewise attentive convolutional neural network and reinforcement learning. J Am Med Inform Assoc 2021;28:2571-2581. [PMID: 34524450 DOI: 10.1093/jamia/ocab176] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Revised: 07/08/2021] [Accepted: 08/06/2021] [Indexed: 11/13/2022] Open

Grissette H, Nfaoui EH. Affective Concept-Based Encoding of Patient Narratives via Sentic Computing and Neural Networks. Cognit Comput 2021;14:274-299. [PMID: 34422122 PMCID: PMC8371039 DOI: 10.1007/s12559-021-09903-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 06/23/2021] [Indexed: 11/30/2022]

Abstract

The automatic generation of features without human intervention is the most critical task for biomedical sentiment analysis. Regarding the high dynamicity of shared patient narrative data, the lack of formal medical language sentiment dictionaries prevents retrieval of the appropriate sentiment, which is unapproachable and can be prone to annotator bias. We propose a novel affective biomedical concept-based encoding via sentic computing and neural networks. The main contributions include four aspects. First, a biomedical embedding, in which a medical entity is defined, normalized, and synthesized from a text, is built using online patient narratives after being combined with label propagation from a widely used comprehensive biomedical vocabulary. Second, considering the dependence on biomedical definitions, drug reaction sample selection based on general matching is suggested. These feature settings are then used to build and recognize affective semantics and sentics based on an extreme learning machine. Finally, a semisupervised LSTM-BiLSTM model for biomedical sentiment analysis is constructed. There was a massive influx of patient self-reports related to the COVID-19 pandemic. A study was conducted in this direction, and we tested the validity, medical language familiarity, and transferability of our approach by analyzing millions of COVID-19 tweets. Comparisons to affective lexicons also indicate that integrating extreme learning machine cognitive capabilities has advantages over biomedical sentiment analysis. By considering sentics vectors on top of the formed embeddings, our semisupervised LSTM-BiLSTM achieved an accuracy of 87.5%. The evaluations of unsupervised learning approximated the results of the previous model when dealing with a serious loss of biomedical data. 
In this paper, we demonstrate the effectiveness of integrating deep-learning-based cognitive capabilities for both enhancing distributed biomedical definitions and inferring sentiment compositions from many patient self-reports on social networks. The relevant encoding of affective information conveyed regarding medication subjects clearly reveals defined roles and expectations that can have a positive impact on public health.

Chen Q, Leaman R, Allot A, Luo L, Wei CH, Yan S, Lu Z. Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annu Rev Biomed Data Sci 2021;4:313-339. [PMID: 34465169 DOI: 10.1146/annurev-biodatasci-021821-061045] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Turina P, Fariselli P, Capriotti E. ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed. Front Mol Biosci 2021;8:620475. [PMID: 33842537 PMCID: PMC8027235 DOI: 10.3389/fmolb.2021.620475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Accepted: 02/18/2021] [Indexed: 11/13/2022] Open

Abstract

During the last years, the increasing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at functional and structural levels. Nevertheless, the collection of manually curated data from literature is a highly time consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on the thermodynamic data extracted from literature. In the past, such data were deposited in the ProTherm database, which however is no longer maintained since 2013. For facilitating the collection of protein thermodynamic data from literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empiric score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database, and tested on a new curated set of articles, manually selected for presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts. Availability: The ThermoScan server is freely accessible online at https://folding.biofold.org/thermoscan. 
The ThermoScan python code and the Google Chrome extension for submitting visualized PMC web pages to the ThermoScan server are available at https://github.com/biofold/ThermoScan.

Guo S, Huang L, Yao G, Wang Y, Guan H, Bai T. Extracting Biomedical Entity Relations using Biological Interaction Knowledge. Interdiscip Sci 2021;13:312-320. [PMID: 33730356 DOI: 10.1007/s12539-021-00425-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Revised: 02/24/2021] [Accepted: 03/05/2021] [Indexed: 10/21/2022]

Kaushik V, Plazzer J, Macrae F. Evaluation of literature searching tools for curation of mismatch repair gene variants in hereditary colon cancer. ADVANCED GENETICS (HOBOKEN, N.J.) 2021;2:e10039. [PMID: 36618447 PMCID: PMC9744508 DOI: 10.1002/ggn2.10039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Revised: 01/12/2021] [Accepted: 01/14/2021] [Indexed: 01/11/2023]

Abstract

Pathogenic constitutional genomic variants in the mismatch repair (MMR) genes are the drivers of Lynch syndrome; optimal variant interpretation is required for the management of suspected and confirmed cases. The International Society for Hereditary Gastrointestinal Tumours (InSiGHT) provides expert classifications for MMR variants for the US National Human Genome Research Institute's (NHGRI) ClinGen initiative and interprets variants with discordant classifications and those of uncertain significance (VUSs). Given the onerous nature of extracting information related to variants, literature searching tools which harness artificial intelligence may aid in retrieving information to allow optimum variant classification. In this study, we described the nature of discordance in a sample of 80 variants from a list of variants requiring updating by InSiGHT for ClinGen by comparing their existing InSiGHT classifications with the various submissions for each variant on the US National Centre for Biotechnology Information's (NCBI) ClinVar database. To identify the potential value of a literature searching tool in extracting information related to classification, all variants were searched for using a traditional method (Google Scholar) and literature searching tool (Mastermind) independently. Descriptive statistics were used to compare: the number of articles before and after screening for relevance and the number of relevant articles unique to either method. Relevance was defined as containing the variant in question as well as data informing variant interpretation. A total of 916 articles were returned by both methods and Mastermind averaged four relevant articles per search compared to Google Scholar's three. Of relevant Mastermind articles, 193/308 (62.7%) were unique to it, compared to 87/202, (43.0%) for Google Scholar. For 24 variants, either or both methods found no information. 
All 6/80 (20%) variants with pathogenic or likely pathogenic InSiGHT classifications have newer VUS assertions on ClinVar. Our study demonstrated that for a sample of variants with varying discordant interpretations, Mastermind was able to return on average, a more relevant and unique literature search. Google Scholar was able to retrieve information that Mastermind did not, which supports a conclusion that Mastermind could play a complementary role in literature searching for classification. This work will aid InSiGHT in its role of classifying MMR variants.

Tworowski D, Gorohovski A, Mukherjee S, Carmi G, Levy E, Detroja R, Mukherjee SB, Frenkel-Morgenstern M. COVID19 Drug Repository: text-mining the literature in search of putative COVID19 therapeutics. Nucleic Acids Res 2021;49:D1113-D1121. [PMID: 33166390 PMCID: PMC7778969 DOI: 10.1093/nar/gkaa969] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Revised: 10/07/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open

Arango-Argoty GA, Guron GKP, Garner E, Riquelme MV, Heath LS, Pruden A, Vikesland PJ, Zhang L. ARGminer: a web platform for the crowdsourcing-based curation of antibiotic resistance genes. Bioinformatics 2020;36:2966-2973. [PMID: 32058567 DOI: 10.1093/bioinformatics/btaa095] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Revised: 01/31/2020] [Accepted: 02/08/2020] [Indexed: 12/20/2022] Open

Méar L, Herr M, Fauconnier A, Pineau C, Vialard F. Polymorphisms and endometriosis: a systematic review and meta-analyses. Hum Reprod Update 2020;26:73-102. [PMID: 31821471 DOI: 10.1093/humupd/dmz034] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Revised: 08/20/2019] [Accepted: 08/28/2019] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

Endometriosis is an estrogen-dependent gynecological disorder that affects at least 10% of women of reproductive age. It may lead to infertility and non-specific symptoms such as chronic pelvic pain. Endometriosis screening and diagnosis are difficult and time-consuming. Late diagnosis (with a delay ranging from 3.3 to 10.7 years) is a major problem and may contribute to disease progression and a worse response to treatment once initiated. Efficient screening tests might reduce this diagnostic delay. As endometriosis is presumed to be a complex disease with several genetic and non-genetic pathogenic factors, many researchers have sought to identify polymorphisms that predispose to this condition.

OBJECTIVE AND RATIONALE

We performed a systematic review and meta-analysis of the most regularly reported polymorphisms in order to identify those that might predispose to endometriosis and might thus be of value in screening.

SEARCH METHODS

The MEDLINE database was searched for English-language publications on DNA polymorphisms in endometriosis, with no date restriction. The PubTator text mining tool was used to extract gene names from the selected publications' abstracts. We only selected polymorphisms reported by at least three studies, having applied strict inclusion and exclusion criteria to their control populations. No stratification based on ethnicity was performed. All steps were carried out according to PRISMA guidelines.

OUTCOMES

The initial selection of 395 publications cited 242 different genes. Sixty-two genes (corresponding to 265 different polymorphisms) were cited in at least three publications. After the application of our other selection criteria (an original case-control study of endometriosis, a reported association between endometriosis and at least one polymorphism, data on women of reproductive age and a diagnosis of endometriosis in the cases established by surgery and/or MRI and confirmed by histology), 28 polymorphisms were eligible for meta-analysis. Only five of the 28 polymorphisms were found to be significantly associated with endometriosis: interferon gamma (IFNG) (CA) repeat, glutathione S-transferase mu 1 (GSTM1) null genotype, glutathione S-transferase pi 1 (GSTP1) rs1695 and wingless-type MMTV integration site family member 4 (WNT4) rs16826658 and rs2235529. Six others showed a significant trend towards an association: progesterone receptor (PGR) PROGINS, intercellular adhesion molecule 1 (ICAM1) rs1799969, aryl-hydrocarbon receptor repressor (AHRR) rs2292596, cytochrome family 17 subfamily A polypeptide 1 (CYP17A1) rs743572, CYP2C19 rs4244285 and peroxisome proliferator-activated receptor gamma (PPARG) rs1801282. Twelve showed a significant trend towards the lack of an association: tumor necrosis factor (TNF) rs1799964, interleukin 6 (IL6) rs1800796, transforming growth factor beta 1 (TGFB1) rs1800469, estrogen receptor 1 (ESR1) rs2234693, PGR rs10895068, FSH receptor (FSHR) rs6166, ICAM1 rs5498, CYP1A1 rs4646903, CYP19A1 rs10046, tumor protein 53 (TP53) rs1042522, X-ray repair complementing defective repair in Chinese hamster cells 1 (XRCC1) rs25487 and serpin peptidase inhibitor clade E member 1 (SERPINE1) rs1799889. However, for the 18 polymorphisms identified in the latter two groups, further studies of the potential association with the endometriosis risk are needed. The remaining five of the 28 polymorphisms were not associated with endometriosis: glutathione S-transferase theta 1 (GSTT1) null genotype, vascular endothelial growth factor alpha (VEGFA) rs699947, rs833061, rs2010963 and rs3025039.

WIDER IMPLICATIONS

By carefully taking account of how the control populations were defined, we identified polymorphisms that might be candidates for use in endometriosis screening and polymorphisms not associated with endometriosis. This might constitute the first step towards identifying polymorphism combinations that predispose to endometriosis (IFNG (CA) repeat, GSTM1 null genotype, GSTP1 rs1695, WNT4 rs16826658 and WNT4 rs2235529) in a large cohort of patients with well-defined inclusion criteria. In turn, these results might improve the diagnosis of endometriosis in primary care. Lastly, our present findings may enable a better understanding of endometriosis and improve the management of patients with this disease.

Hansson LK, Hansen RB, Pletscher-Frankild S, Berzins R, Hansen DH, Madsen D, Christensen SB, Christiansen MR, Boulund U, Wolf XA, Kjærulff SK, van de Bunt M, Tulin S, Jensen TS, Wernersson R, Jensen JN. Semantic text mining in early drug discovery for type 2 diabetes. PLoS One 2020;15:e0233956. [PMID: 32542027 PMCID: PMC7295186 DOI: 10.1371/journal.pone.0233956] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2019] [Accepted: 05/15/2020] [Indexed: 11/18/2022] Open

Döring K, Qaseem A, Becer M, Li J, Mishra P, Gao M, Kirchner P, Sauter F, Telukunta KK, Moumbock AFA, Thomas P, Günther S. Automated recognition of functional compound-protein relationships in literature. PLoS One 2020;15:e0220925. [PMID: 32126064 PMCID: PMC7053725 DOI: 10.1371/journal.pone.0220925] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 01/29/2020] [Indexed: 11/18/2022] Open

Boland MR, Kashyap A, Xiong J, Holmes J, Lorch S. Development and validation of the PEPPER framework (Prenatal Exposure PubMed ParsER) with applications to food additives. J Am Med Inform Assoc 2019;25:1432-1443. [PMID: 30371821 PMCID: PMC6213088 DOI: 10.1093/jamia/ocy119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2018] [Accepted: 08/13/2018] [Indexed: 11/14/2022] Open

Sun Y, Hou L, Qin L, Liu Y, Li J, Qian Q. RCorp: a resource for chemical disease semantic extraction in Chinese. BMC Med Inform Decis Mak 2019;19:234. [PMID: 31801523 PMCID: PMC6894109 DOI: 10.1186/s12911-019-0936-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022] Open

Couch D, Yu Z, Nam JH, Allen C, Ramos PS, da Silveira WA, Hunt KJ, Hazard ES, Hardiman G, Lawson A, Chung D. GAIL: An interactive webserver for inference and dynamic visualization of gene-gene associations based on gene ontology guided mining of biomedical literature. PLoS One 2019;14:e0219195. [PMID: 31260503 PMCID: PMC6602258 DOI: 10.1371/journal.pone.0219195] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Accepted: 06/18/2019] [Indexed: 01/08/2023] Open

Jiang X, Ringwald M, Blake JA, Arighi C, Zhang G, Shatkay H. An effective biomedical document classification scheme in support of biocuration: addressing class imbalance. Database (Oxford) 2019;2019:baz045. [PMID: 31032839 PMCID: PMC6482935 DOI: 10.1093/database/baz045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2018] [Revised: 02/26/2019] [Accepted: 03/18/2019] [Indexed: 01/01/2023]

Urda D, Aragón F, Bautista R, Franco L, Veredas FJ, Claros MG, Jerez JM. BLASSO: integration of biological knowledge into a regularized linear model. BMC SYSTEMS BIOLOGY 2018;12:94. [PMID: 30458775 PMCID: PMC6245593 DOI: 10.1186/s12918-018-0612-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Abstract

Background

In RNA-Seq gene expression analysis, a genetic signature or biomarker is defined as a subset of genes that is probably involved in a given complex human trait and usually provides predictive capabilities for that trait. The discovery of new genetic signatures is challenging, as it entails the analysis of complex-nature information encoded at gene level. Moreover, biomarker selection becomes unstable, since high correlation usually exists among the thousands of genes included in each sample, thus yielding very low overlapping rates between the genetic signatures proposed by different authors. In this sense, this paper proposes BLASSO, a simple and highly interpretable linear model with l₁-regularization that incorporates prior biological knowledge into the prediction of breast cancer outcomes. Two different approaches to integrate biological knowledge in BLASSO, Gene-specific and Gene-disease, are proposed to test their predictive performance and biomarker stability on a public RNA-Seq gene expression dataset for breast cancer. The relevance of the genetic signature for the model is inspected by a functional analysis.

Results

BLASSO has been compared with a baseline LASSO model. Using 10-fold cross-validation with 100 repetitions for models’ assessment, average AUC values of 0.7 and 0.69 were obtained for the Gene-specific and the Gene-disease approaches, respectively. These efficacy rates outperform the average AUC of 0.65 obtained with the LASSO. With respect to the stability of the genetic signatures found, BLASSO outperformed the baseline model in terms of the robustness index (RI). The Gene-specific approach gave RI of 0.15±0.03, compared to RI of 0.09±0.03 given by LASSO, thus being 66% more robust. The functional analysis performed on the genetic signature obtained with the Gene-disease approach showed a significant presence of genes related to cancer, as well as one gene (IFNK) and one pseudogene (PCNAP1) which a priori had not been described as related to cancer.

Conclusions

BLASSO has been shown to be a good choice both in terms of predictive efficacy and biomarker stability, when compared to other similar approaches. Further functional analyses of the genetic signatures obtained with BLASSO have not only revealed genes with important roles in cancer, but also genes that should play an unknown or collateral role in the studied disease.

Global trends in infectious diseases of swine. Proc Natl Acad Sci U S A 2018;115:11495-11500. [PMID: 30348781 DOI: 10.1073/pnas.1806068115] [Citation(s) in RCA: 155] [Impact Index Per Article: 25.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018;34:80-87. [PMID: 28968638 DOI: 10.1093/bioinformatics/btx541] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Accepted: 08/31/2017] [Indexed: 11/12/2022] Open

Abstract

Motivation

Despite significant efforts in expert curation, clinical relevance about most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the variant biological function/disease impact is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques but are of limited use because their mutation extraction results are not standardized or integrated with curated data.

Results

We propose an automatic method to extract and normalize variant mentions to unique identifiers (dbSNP RSIDs). Our method, in benchmarking results, demonstrates a high F-measure of ∼90% and compared favorably to the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes which are presumed to be deleterious and not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating/prioritizing variants in genomic research.

Availability and implementation

The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/.

Contact

zhiyong.lu@nih.gov.

Fergadis A, Baziotis C, Pappas D, Papageorgiou H, Potamianos A. Hierarchical bi-directional attention-based RNNs for supporting document classification on protein-protein interactions affected by genetic mutations. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018;2018:5077305. [PMID: 30137284 PMCID: PMC6105093 DOI: 10.1093/database/bay076] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/15/2018] [Accepted: 06/22/2018] [Indexed: 02/03/2023]

Chen Q, Panyam NC, Elangovan A, Verspoor K. BioCreative VI Precision Medicine Track system performance is constrained by entity recognition and variations in corpus characteristics. Database (Oxford) 2018;2018:5255181. [PMID: 30576491 PMCID: PMC6301335 DOI: 10.1093/database/bay122] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2018] [Revised: 09/24/2018] [Accepted: 10/16/2018] [Indexed: 01/01/2023]

Abstract

Precision medicine aims to provide personalized treatments based on individual patient profiles. One critical step towards precision medicine is leveraging knowledge derived from biomedical publications, a tremendous literature resource presenting the latest scientific discoveries on genes, mutations and diseases. Biomedical natural language processing (BioNLP) plays a vital role in supporting automation of this process. BioCreative VI Track 4 brings community effort to the task of automatically identifying and extracting protein-protein interactions (PPi) affected by mutations (PPIm), important in the precision medicine context for capturing individual genotype variation related to disease. We present the READ-BioMed team's approach to identifying PPIm-related publications and to extracting specific PPIm information from those publications in the context of the BioCreative VI PPIm track. We observe that current BioNLP tools are insufficient to recognise entities for these two tasks; the best existing mutation recognition tool achieves only 55% recall in the document triage training set, while relation extraction performance is limited by the low recall performance of gene entity recognition. We develop the models accordingly: for document triage, we develop term lists capturing interactions and mutations to complement BioNLP tools, and select effective features via a feature contribution study, whereas an ensemble of BioNLP tools is employed for relation extraction. Our best document triage model achieves an F-score of 66.77% while our best model for relation extraction achieved an F-score of 35.09% over the final (updated post-task) test set. Impacting the document triage task, the characteristics of mutations are statistically different in the training and testing sets.
While a vital new direction for biomedical text mining research, this early attempt to tackle the problem of identifying genetic variation of substantial biological significance highlights the importance of representative training data and the cascading impact of tool limitations in a modular system.

Differential gene expression in disease: a comparison between high-throughput studies and the literature. BMC Med Genomics 2017;10:59. [PMID: 29020950 PMCID: PMC5637346 DOI: 10.1186/s12920-017-0293-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2017] [Accepted: 10/02/2017] [Indexed: 11/10/2022] Open

Camel V, Galeano E, Carrer H. RED DE COEXPRESIÓN DE 320 GENES DE Tectona grandis RELACIONADOS CON PROCESOS DE ESTRÉS ABIÓTICO Y XILOGÉNESIS. TIP REVISTA ESPECIALIZADA EN CIENCIAS QUÍMICO-BIOLÓGICAS 2017. [DOI: 10.1016/j.recqb.2017.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022] Open

Liu X, Yang Z, Lin H, Simmons M, Lu Z. DIGNiFI: Discovering causative genes for orphan diseases using protein-protein interaction networks. BMC SYSTEMS BIOLOGY 2017;11:23. [PMID: 28361678 PMCID: PMC5374555 DOI: 10.1186/s12918-017-0402-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]

Abstract

BACKGROUND

An orphan disease is any disease that affects a small percentage of the population. Orphan diseases are a great burden to patients and society, and most of them are genetic in origin. Unfortunately, our current understanding of the genes responsible for inherited orphan diseases is still quite limited. Developing effective computational algorithms to discover disease-causing genes would help unveil disease mechanisms and may enable better diagnosis and treatment.

RESULTS

We have developed a novel method, named DIGNiFI (Disease causIng GeNe FInder), which uses Protein-Protein Interaction (PPI) network-based features to discover and rank candidate disease-causing genes. Specifically, our approach computes topologically similar genes by taking into account both local and global connected paths in PPI networks via Direct Neighbors and Local Random Walks, respectively. Furthermore, since genes with similar phenotypes tend to be functionally related, we have integrated PPI data with gene ontology (GO) annotations and protein complex data to further improve the performance of this approach. Results on 128 orphan diseases with 1184 known disease genes collected from the Orphanet show that our proposed methods outperform existing state-of-the-art methods for discovering candidate disease-causing genes. We also show that further performance improvement can be achieved when enriching the human-curated PPI network data with text-mined interactions from the biomedical literature. Finally, we demonstrate the utility of our approach by applying our method to identifying novel candidate genes for a set of four inherited retinal dystrophies. In this study, we found the top predictions for these retinal dystrophies consistent with literature reports and online databases of other retinal dystrophies.

CONCLUSIONS

Our method successfully prioritizes orphan-disease-causative genes. This method has great potential to benefit the field of orphan disease research, where resources are scarce and greatly needed.

Singhal A, Leaman R, Catlett N, Lemberger T, McEntyre J, Polson S, Xenarios I, Arighi C, Lu Z. Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges. Database (Oxford) 2016;2016:baw161. [PMID: 28025348 PMCID: PMC5199160 DOI: 10.1093/database/baw161] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2016] [Revised: 11/10/2016] [Accepted: 11/11/2016] [Indexed: 12/24/2022]

Gore R, Diallo S, Padilla J. Classifying modeling and simulation as a scientific discipline. Scientometrics 2016. [DOI: 10.1007/s11192-016-2050-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Liu JL, Zhao M. A PubMed-wide study of endometriosis. Genomics 2016;108:151-157. [DOI: 10.1016/j.ygeno.2016.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2016] [Revised: 09/30/2016] [Accepted: 10/12/2016] [Indexed: 12/18/2022]

Evaluation and Verification of the Global Rapid Identification of Threats System for Infectious Diseases in Textual Data Sources. Interdiscip Perspect Infect Dis 2016;2016:5080746. [PMID: 27698665 PMCID: PMC5028852 DOI: 10.1155/2016/5080746] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2016] [Revised: 08/06/2016] [Accepted: 08/15/2016] [Indexed: 11/17/2022] Open

Fluck J, Madan S, Ansari S, Kodamullil AT, Karki R, Rastegar-Mojarad M, Catlett NL, Hayes W, Szostak J, Hoeng J, Peitsch M. Training and evaluation corpora for the extraction of causal relationships encoded in biological expression language (BEL). DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw113. [PMID: 27554092 PMCID: PMC4995071 DOI: 10.1093/database/baw113] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2015] [Accepted: 07/07/2016] [Indexed: 01/21/2023]

Abstract

Success in extracting biological relationships is mainly dependent on the complexity of the task as well as the availability of high-quality training data. Here, we describe the new corpora in the systems biology modeling language BEL for training and testing biological relationship extraction systems that we prepared for the BioCreative V BEL track. BEL was designed to capture relationships not only between proteins or chemicals, but also complex events such as biological processes or disease states. A BEL nanopub is the smallest unit of information and represents a biological relationship with its provenance. In BEL relationships (called BEL statements), the entities are normalized to defined namespaces mainly derived from public repositories, such as sequence databases, MeSH or publicly available ontologies. In the BEL nanopubs, the BEL statements are associated with citation information and supportive evidence such as a text excerpt. To enable the training of extraction tools, we prepared BEL resources and made them available to the community. We selected a subset of these resources focusing on a reduced set of namespaces, namely, human and mouse genes, ChEBI chemicals, MeSH diseases and GO biological processes, as well as relationship types ‘increases’ and ‘decreases’. The published training corpus contains 11 000 BEL statements from over 6000 supportive text excerpts. For method evaluation, we selected and re-annotated two smaller subcorpora containing 100 text excerpts. For this re-annotation, the inter-annotator agreement was measured by the BEL track evaluation environment and resulted in a maximal F-score of 91.18% for full statement agreement. In addition, for a set of 100 BEL statements, we do not only provide the gold standard expert annotations, but also text excerpts pre-selected by two automated systems. Those text excerpts were evaluated and manually annotated as true or false supportive in the course of the BioCreative V BEL track task.

Database URL: http://wiki.openbel.org/display/BIOC/Datasets

Li J, Sun Y, Johnson RJ, Sciaky D, Wei CH, Leaman R, Davis AP, Mattingly CJ, Wiegers TC, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw068. [PMID: 27161011 PMCID: PMC4860626 DOI: 10.1093/database/baw068] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/25/2015] [Accepted: 04/11/2016] [Indexed: 11/14/2022]

Gu J, Qian L, Zhou G. Chemical-induced disease relation extraction with various linguistic features. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw042. [PMID: 27052618 PMCID: PMC4822558 DOI: 10.1093/database/baw042] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/04/2015] [Accepted: 03/04/2016] [Indexed: 01/06/2023]

Pafilis E, Buttigieg PL, Ferrell B, Pereira E, Schnetzer J, Arvanitidis C, Jensen LJ. EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw005. [PMID: 26896844 PMCID: PMC4761108 DOI: 10.1093/database/baw005] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/02/2015] [Accepted: 01/11/2016] [Indexed: 12/11/2022]

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W. Learning from Co-expression Networks: Possibilities and Challenges. FRONTIERS IN PLANT SCIENCE 2016;7:444. [PMID: 27092161 PMCID: PMC4825623 DOI: 10.3389/fpls.2016.00444] [Citation(s) in RCA: 186] [Impact Index Per Article: 23.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/14/2016] [Accepted: 03/21/2016] [Indexed: 05/18/2023]

Abstract

Plants are fascinating and complex organisms. A comprehensive understanding of the organization, function and evolution of plant genes is essential to disentangle important biological processes and to advance crop engineering and breeding strategies. The ultimate aim in deciphering complex biological processes is the discovery of causal genes and regulatory mechanisms controlling these processes. The recent surge of omics data has opened the door to a system-wide understanding of the flow of biological information underlying complex traits. However, dealing with the corresponding large data sets represents a challenging endeavor that calls for the development of powerful bioinformatics methods. A popular approach is the construction and analysis of gene networks. Such networks are often used for genome-wide representation of the complex functional organization of biological systems. Networks based on similarity in gene expression are called (gene) co-expression networks. One of the major applications of gene co-expression networks is the functional annotation of unknown genes. Constructing co-expression networks is generally straightforward. In contrast, the resulting network of connected genes can become very complex, which limits its biological interpretation. Several strategies can be employed to enhance the interpretation of the networks. A strategy in coherence with the biological question addressed needs to be established to infer reliable networks. Additional benefits can be gained from network-based strategies using prior knowledge and data integration to further enhance the elucidation of gene regulatory relationships. As a result, biological networks provide many more applications beyond the simple visualization of co-expressed genes. In this study we review the different approaches for co-expression network inference in plants.
We analyse integrative genomics strategies used in recent studies that successfully identified candidate genes taking advantage of gene co-expression networks. Additionally, we discuss promising bioinformatics approaches that predict networks for specific purposes.

Rodriguez-Esteban R. Biocuration with insufficient resources and fixed timelines. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015;2015:bav116. [PMID: 26708987 PMCID: PMC4691339 DOI: 10.1093/database/bav116] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/11/2015] [Accepted: 11/17/2015] [Indexed: 11/14/2022]

GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. BIOMED RESEARCH INTERNATIONAL 2015;2015:918710. [PMID: 26380306 PMCID: PMC4561873 DOI: 10.1155/2015/918710] [Citation(s) in RCA: 111] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/15/2015] [Revised: 04/03/2015] [Accepted: 04/04/2015] [Indexed: 02/01/2023]

Wei CH, Leaman R, Lu Z. SimConcept: a hybrid approach for simplifying composite named entities in biomedical text. IEEE J Biomed Health Inform 2015;19:1385-91. [PMID: 25879978 PMCID: PMC4543296 DOI: 10.1109/jbhi.2015.2422651] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

Hsu YY, Kao HY. Curatable Named-Entity Recognition Using Semantic Relations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:785-792. [PMID: 26357317 DOI: 10.1109/tcbb.2014.2366770] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Huang CC, Lu Z. Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform 2015;17:132-44. [PMID: 25935162 DOI: 10.1093/bib/bbv024] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Indexed: 11/13/2022] Open

Khare R, Good BM, Leaman R, Su AI, Lu Z. Crowdsourcing in biomedicine: challenges and opportunities. Brief Bioinform 2015;17:23-32. [PMID: 25888696 DOI: 10.1093/bib/bbv021] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open

Khare R, Burger JD, Aberdeen JS, Tresner-Kirsch DW, Corrales TJ, Hirchman L, Lu Z. Scaling drug indication curation through crowdsourcing. Database (Oxford) 2015;2015:bav016. [PMID: 25797061 PMCID: PMC4369375 DOI: 10.1093/database/bav016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2014] [Revised: 02/04/2015] [Accepted: 02/09/2015] [Indexed: 01/24/2023]

Leaman R, Wei CH, Lu Z. tmChem: a high performance approach for chemical named entity recognition and normalization. J Cheminform 2015;7:S3. [PMID: 25810774 PMCID: PMC4331693 DOI: 10.1186/1758-2946-7-s1-s3] [Citation(s) in RCA: 126] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open