Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

118
(from Reference Citation Analysis)

Article PDFs (60)

Cited by > 0 (87)

Searched Name

Karin Verspoor

Wassell M, Vitiello A, Butler-Henderson K, Verspoor K, Pollard H. Generalizability of a Musculoskeletal Therapist Electronic Health Record for Modelling Outcomes to Work-Related Musculoskeletal Disorders. J Occup Rehabil 2024:10.1007/s10926-024-10196-w. [PMID: 38739344 DOI: 10.1007/s10926-024-10196-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/07/2024] [Indexed: 05/14/2024]

Liu Y, Ritchie SC, Teo SM, Ruuskanen MO, Kambur O, Zhu Q, Sanders J, Vázquez-Baeza Y, Verspoor K, Jousilahti P, Lahti L, Niiranen T, Salomaa V, Havulinna AS, Knight R, Méric G, Inouye M. Integration of polygenic and gut metagenomic risk prediction for common diseases. Nat Aging 2024;4:584-594. [PMID: 38528230 PMCID: PMC11031402 DOI: 10.1038/s43587-024-00590-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 02/13/2024] [Indexed: 03/27/2024]

Affiliation(s)

Yang Liu Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Department of Clinical Pathology, Melbourne Medical School, University of Melbourne, Melbourne, Victoria, Australia; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
Scott C Ritchie Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cambridge Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
Shu Mei Teo Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Centre for Youth Mental Health, University of Melbourne, Melbourne, Victoria, Australia
Matti O Ruuskanen Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Department of Computing, University of Turku, Turku, Finland
Oleg Kambur Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Qiyun Zhu School of Life Sciences, Arizona State University, Tempe, AZ, USA; Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
Jon Sanders Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
Yoshiki Vázquez-Baeza Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
Karin Verspoor School of Computing Technologies, RMIT University, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
Pekka Jousilahti Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Leo Lahti Department of Computing, University of Turku, Turku, Finland
Teemu Niiranen Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Division of Medicine, Turku University Hospital and University of Turku, Turku, Finland
Veikko Salomaa Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Aki S Havulinna Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Institute for Molecular Medicine Finland, FIMM-HiLIFE, University of Helsinki, Helsinki, Finland
Rob Knight Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
Guillaume Méric Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia; Department of Cardiovascular Research, Translation and Implementation, La Trobe University, Melbourne, Victoria, Australia; Department of Medical Sciences, Molecular Epidemiology, Uppsala University, Uppsala, Sweden
Michael Inouye Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Department of Clinical Pathology, Melbourne Medical School, University of Melbourne, Melbourne, Victoria, Australia; Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Cambridge Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK.

Liu J, Capurro D, Nguyen A, Verspoor K. Uncovering Variations in Clinical Notes for NLP Modeling. Stud Health Technol Inform 2024;310:1460-1461. [PMID: 38269696 DOI: 10.3233/shti231244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]

Khanina A, Rozova V, Elkins S, Verspoor K, Thursky K. Designing a Digital Health Solution: A Platform for Automated Surveillance of Fungal Infection. Stud Health Technol Inform 2024;310:1454-1455. [PMID: 38269693 DOI: 10.3233/shti231241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]

Wassell M, Murray JL, Kumar C, Verspoor K, Butler-Henderson K. Understanding Clinician EHR Data Quality for Reuse in Predictive Modelling. Stud Health Technol Inform 2024;310:169-173. [PMID: 38269787 DOI: 10.3233/shti230949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]

Liu J, Capurro D, Nguyen A, Verspoor K. Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities. J Biomed Inform 2023;145:104466. [PMID: 37549722 DOI: 10.1016/j.jbi.2023.104466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 06/09/2023] [Accepted: 08/01/2023] [Indexed: 08/09/2023]

Abstract

OBJECTIVE

With the increasing amount and growing variety of healthcare data, multimodal machine learning supporting integrated modeling of structured and unstructured data is an increasingly important tool for clinical machine learning tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of data modalities in the context of a shared target task. Furthermore, patients can have substantial variations in the availability of data, while existing multimodal modeling methods typically assume data completeness and lack a mechanism to handle missing modalities.

METHODS

We propose a Transformer-based fusion model with modality-specific tokens that summarize the corresponding modalities to achieve effective cross-modal interaction accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model as Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and unstructured text), six clinical prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities.
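As a rough illustration of how attention over modality-specific tokens can accommodate a missing modality, the sketch below masks absent modalities before the softmax so that they receive zero weight. This is not the ARMOUR implementation: the function name `fuse_modalities`, the toy vectors, and the single-query, single-head setup are all assumptions made for illustration, and the contrastive learning step is omitted.

```python
import math

def fuse_modalities(query, tokens, present):
    """Attend over per-modality summary tokens, ignoring missing modalities.

    query   -- list of floats (hypothetical fusion query vector)
    tokens  -- one summary vector per modality, same length as query
    present -- one bool per modality; False marks a missing modality
    """
    if not any(present):
        raise ValueError("at least one modality must be present")
    scale = math.sqrt(len(query))
    # Missing modalities get a score of -inf, so exp() maps them to weight 0.
    scores = [
        sum(q * t for q, t in zip(query, token)) / scale if ok else -math.inf
        for token, ok in zip(tokens, present)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the modality tokens gives the fused representation.
    fused = [
        sum(w * token[i] for w, token in zip(weights, tokens))
        for i in range(len(query))
    ]
    return weights, fused

# Toy summary tokens for the two modalities named in the abstract.
structured = [1.0, 0.0]
text = [0.0, 1.0]
w_both, _ = fuse_modalities([1.0, 1.0], [structured, text], [True, True])
w_miss, fused_miss = fuse_modalities([1.0, 1.0], [structured, text], [True, False])
```

With both modalities present the two tokens share the attention weight; when the text token is flagged as missing, all weight falls on the structured token and the fused vector equals it.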

RESULTS

Our model shows improved performance over unimodal and multimodal baselines in both evaluation regimes, that is, whether patients with missing modalities are included in or excluded from the input. The contrastive learning improves the representation power and is shown to be essential for better results. The simple setup of modality-specific tokens enables ARMOUR to handle patients with missing modalities and allows comparison with existing unimodal benchmark results.

CONCLUSION

We propose a multimodal model for robust clinical prediction to achieve improved performance while accommodating patients with missing modalities. This work could inspire future research to study the effective incorporation of multiple, more complex modalities of clinical data into a single model.

Pu Y, Beck D, Verspoor K. Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease. J Biomed Inform 2023;145:104464. [PMID: 37541406 DOI: 10.1016/j.jbi.2023.104464] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/22/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/06/2023]

Abstract

OBJECTIVE

We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology.

METHODS

We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed.
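The time-slicing idea in the third stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function `time_slice`, the `(concept_a, concept_b, year)` edge format, and the example concept names are hypothetical. Edges first seen up to a cutoff year form the training graph, and edges that first appear within the prediction window after the cutoff form the test set.

```python
def time_slice(edges, cutoff_year, window):
    """Split dated edges into a training graph and a future test set.

    edges       -- iterable of (concept_a, concept_b, year) tuples
    cutoff_year -- last year included in the training graph
    window      -- prediction window length in years
    """
    edges = list(edges)
    # Undirected edges: store each pair as a frozenset.
    train = {frozenset((a, b)) for a, b, year in edges if year <= cutoff_year}
    test = {
        frozenset((a, b))
        for a, b, year in edges
        if cutoff_year < year <= cutoff_year + window
    }
    # Only links that are new relative to the training graph count as discoveries.
    return train, test - train

# Hypothetical dated co-occurrence edges.
example_edges = [
    ("gene_A", "phenotype_X", 2001),
    ("gene_A", "gene_B", 2003),
    ("gene_B", "phenotype_X", 2004),
    ("gene_A", "phenotype_X", 2005),  # already known in 2001, so not a new link
    ("gene_C", "phenotype_Y", 2012),
]
train_short, test_short = time_slice(example_edges, cutoff_year=2003, window=1)
train_long, test_long = time_slice(example_edges, cutoff_year=2003, window=10)
```

Lengthening the window enlarges the set of future links a prediction can be credited for, which is one way to see why the choice of window length can change the evaluation outcome.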

RESULTS

We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation.

CONCLUSION

Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases.
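As a back-of-the-envelope check on the sparsity mentioned above, the density of a simple undirected graph is the number of edges divided by the number of possible node pairs. The sketch below plugs in the approximate totals from the abstract (about 11 k nodes and 394 k edges). It assumes a simple undirected graph with at most one edge per node pair, which may not match how the authors computed density, for example if it was measured per time slice.

```python
def undirected_density(num_nodes, num_edges):
    """Fraction of possible node pairs that are connected by an edge."""
    possible_pairs = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_pairs

# Approximate totals reported in the abstract.
density = undirected_density(11_000, 394_000)
print(f"density = {density:.2%}")  # below 1%
```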

AVAILABILITY

Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.

Coiera EW, Verspoor K, Hansen DP. We need to chat about artificial intelligence. Med J Aust 2023;219:98-100. [PMID: 37302124 PMCID: PMC10952508 DOI: 10.5694/mja2.51992] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/05/2023] [Revised: 05/15/2023] [Accepted: 05/23/2023] [Indexed: 06/13/2023]

Šuster S, Baldwin T, Verspoor K. Analysis of predictive performance and reliability of classifiers for quality assessment of medical evidence revealed important variation by medical area. J Clin Epidemiol 2023;159:58-69. [PMID: 37120028 DOI: 10.1016/j.jclinepi.2023.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 03/30/2023] [Accepted: 04/18/2023] [Indexed: 05/01/2023]

El-Hayek C, Barzegar S, Faux N, Doyle K, Pillai P, Mutch SJ, Vaisey A, Ward R, Sanci L, Dunn AG, Hellard ME, Hocking JS, Verspoor K, Boyle DI. An evaluation of existing text de-identification tools for use with patient progress notes from Australian general practice. Int J Med Inform 2023;173:105021. [PMID: 36870249 DOI: 10.1016/j.ijmedinf.2023.105021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 02/07/2023] [Accepted: 02/10/2023] [Indexed: 02/13/2023]

Abstract

INTRODUCTION

Digitized patient progress notes from general practice represent a significant resource for clinical and public health research but cannot feasibly and ethically be used for these purposes without automated de-identification. Internationally, several open-source natural language processing tools have been developed; however, given wide variations in clinical documentation practices, these cannot be used without appropriate review. We evaluated the performance of four de-identification tools and assessed their suitability for customization to Australian general practice progress notes.

METHODS

Four tools were selected: three rule-based (HMS Scrubber, MIT De-id, Philter) and one machine learning-based (MIST). A total of 300 patient progress notes from three general practice clinics were manually annotated with personally identifying information. We conducted a pairwise comparison between the manual annotations and the patient identifiers automatically detected by each tool, measuring recall (sensitivity), precision (positive predictive value), F1-score (harmonic mean of precision and recall), and F2-score (which weights recall twice as heavily as precision). Error analysis was also conducted to better understand each tool's structure and performance.

RESULTS

Manual annotation detected 701 identifiers in seven categories. The rule-based tools detected identifiers in six categories and MIST in three. Philter achieved the highest aggregate recall (67%) and the highest recall for NAME (87%). HMS Scrubber achieved the highest recall for DATE (94%). All tools performed poorly on LOCATION. MIST achieved the highest precision for NAME and DATE while also achieving similar recall to the rule-based tools for DATE and the highest recall for LOCATION. Philter had the lowest aggregate precision (37%); however, preliminary adjustments of its rules and dictionaries showed a substantial reduction in false positives.
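For readers unfamiliar with the F-measures named in the methods, the sketch below implements the general F-beta formula, of which F1 (beta = 1) and F2 (beta = 2) are special cases. The function name `f_beta` is ours. As an illustration only, it is applied to Philter's reported aggregate precision (37%) and recall (67%); the resulting scores are derived here and are not figures reported in the abstract.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the usual F1.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Philter's aggregate precision and recall as reported in the results.
f1 = f_beta(0.37, 0.67, beta=1)  # about 0.48
f2 = f_beta(0.37, 0.67, beta=2)  # about 0.58
```

Because F2 leans toward recall, a high-recall, low-precision tool such as Philter scores noticeably better on F2 than on F1, which suits de-identification, where a missed identifier is costlier than a false positive.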

CONCLUSION

Existing off-the-shelf solutions for automated de-identification of clinical text are not immediately suitable for our context without modification. Philter is the most promising candidate due to its high recall and flexibility; however, it will require extensive revision of its pattern-matching rules and dictionaries.

Šuster S, Baldwin T, Lau JH, Jimeno Yepes A, Martinez Iraola D, Otmakhova Y, Verspoor K. Automating Quality Assessment of Medical Evidence in Systematic Reviews: Model Development and Validation Study. J Med Internet Res 2023;25:e35568. [PMID: 36722350 PMCID: PMC10131699 DOI: 10.2196/35568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2021] [Revised: 01/18/2023] [Accepted: 01/31/2023] [Indexed: 02/01/2023]

Abstract

BACKGROUND

Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria.

OBJECTIVE

We proposed a quality assessment task that provides an overall quality rating for each body of evidence (BoE), as well as finer-grained justification for different quality criteria according to the Grading of Recommendation, Assessment, Development, and Evaluation formalization framework. For this purpose, we constructed a new data set and developed a machine learning baseline system (EvidenceGRADEr).

METHODS

We algorithmically extracted quality-related data from all summaries of findings found in the Cochrane Database of Systematic Reviews. Each BoE was defined by a set of population, intervention, comparison, and outcome criteria and assigned a quality grade (high, moderate, low, or very low) together with quality criteria (justification) that influenced that decision. Different statistical data, metadata about the review, and parts of the review text were extracted as support for grading each BoE. After pruning the resulting data set with various quality checks, we used it to train several neural-model variants. The predictions were compared against the labels originally assigned by the authors of the systematic reviews.

RESULTS

Our quality assessment data set, Cochrane Database of Systematic Reviews Quality of Evidence, contains 13,440 instances, or BoEs labeled for quality, originating from 2252 systematic reviews published on the internet from 2002 to 2020. On the basis of a 10-fold cross-validation, the best neural binary classifiers for quality criteria detected risk of bias at 0.78 F₁ (P=.68; R=0.92) and imprecision at 0.75 F₁ (P=.66; R=0.86), while the performance on inconsistency, indirectness, and publication bias criteria was lower (F₁ in the range of 0.3-0.4). The prediction of the overall quality grade into 1 of the 4 levels resulted in 0.5 F₁. When casting the task as a binary problem by merging the Grading of Recommendation, Assessment, Development, and Evaluation classes (high+moderate vs low+very low-quality evidence), we attained 0.74 F₁. We also found that the results varied depending on the supporting information that is provided as an input to the models.
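The binary recasting described above can be sketched as a simple label mapping. The dictionary `GRADE_TO_BINARY`, the output label names, and the helper functions are illustrative assumptions rather than part of EvidenceGRADEr. The second half checks that the reported F₁ values agree with the reported precision and recall under the standard harmonic-mean definition.

```python
# Hypothetical mapping from the four GRADE levels to the two merged classes.
GRADE_TO_BINARY = {
    "high": "high+moderate",
    "moderate": "high+moderate",
    "low": "low+very low",
    "very low": "low+very low",
}

def collapse_grades(labels):
    """Map four-level GRADE labels onto the merged binary classes."""
    return [GRADE_TO_BINARY[label] for label in labels]

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

binary = collapse_grades(["high", "very low", "moderate", "low"])

# Consistency check against the figures reported in the abstract.
risk_of_bias_f1 = round(f1_score(0.68, 0.92), 2)  # reported as 0.78
imprecision_f1 = round(f1_score(0.66, 0.86), 2)   # reported as 0.75
```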

CONCLUSIONS

Different factors affect the quality of evidence in the context of systematic reviews of medical evidence. Some of these (risk of bias and imprecision) can be automated with reasonable accuracy. Other quality dimensions such as indirectness, inconsistency, and publication bias prove more challenging for machine learning, largely because they are much rarer. This technology could substantially reduce reviewer workload in the future and expedite quality assessment as part of evidence synthesis.

Rozova V, Khanina A, Teng JC, Teh JSK, Worth LJ, Slavin MA, Thursky KA, Verspoor K. Detecting evidence of invasive fungal infections in cytology and histopathology reports enriched with concept-level annotations. J Biomed Inform 2023;139:104293. [PMID: 36682389 DOI: 10.1016/j.jbi.2023.104293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 01/09/2023] [Accepted: 01/16/2023] [Indexed: 01/22/2023]

Affiliation(s)

Vlada Rozova School of Computing Technologies, RMIT University, Melbourne, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Australia; National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia.
Anna Khanina National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Infectious Diseases, Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Australia
Jasmine C Teng National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Infectious Diseases, Peter MacCallum Cancer Centre, Melbourne, Australia
Joanne S K Teh National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Infectious Diseases, Peter MacCallum Cancer Centre, Melbourne, Australia
Leon J Worth National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Infectious Diseases, Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Australia
Monica A Slavin National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Infectious Diseases, Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Australia
Karin A Thursky National Centre for Infections in Cancer, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Infectious Diseases, Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Australia; National Centre for Antimicrobial Stewardship, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
Karin Verspoor School of Computing Technologies, RMIT University, Melbourne, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Australia.

Jimeno Yepes AJ, Verspoor K. Classifying literature mentions of biological pathogens as experimentally studied using natural language processing. J Biomed Semantics 2023;14:1. [PMID: 36721225 PMCID: PMC9889128 DOI: 10.1186/s13326-023-00282-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 01/17/2023] [Indexed: 02/02/2023]

Abstract

BACKGROUND

Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health.

OBJECTIVE

In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications.

METHODS

We developed a pathogen mention characterisation literature data set, READBiomed-Pathogens, automatically using NCBI resources, and we make it available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations, including titles and abstracts, with experimentally researched pathogens. We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen.

RESULTS

We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents.

CONCLUSIONS

We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. The performance of the pathogen mention identification and characterisation algorithms was additionally evaluated on a small manually annotated data set; the results show that the data set we generated allows characterising pathogens of interest.

TRIAL REGISTRATION

N/A.

Liu Y, Teo SM, Méric G, Tang HHF, Zhu Q, Sanders JG, Vázquez-Baeza Y, Verspoor K, Vartiainen VA, Jousilahti P, Lahti L, Niiranen T, Havulinna AS, Knight R, Salomaa V, Inouye M. The gut microbiome is a significant risk factor for future chronic lung disease. J Allergy Clin Immunol 2022;151:943-952. [PMID: 36587850 PMCID: PMC10109092 DOI: 10.1016/j.jaci.2022.12.810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 11/21/2022] [Accepted: 12/05/2022] [Indexed: 12/30/2022]

Affiliation(s)

Yang Liu Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, Australia; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia.
Shu Mei Teo Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; Centre for Youth Mental Health, University of Melbourne, Melbourne, Australia
Guillaume Méric Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
Howard H F Tang Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
Qiyun Zhu School of Life Sciences, Arizona State University, Tempe, Ariz; Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, Ariz
Jon G Sanders Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY
Yoshiki Vázquez-Baeza Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, Calif
Karin Verspoor School of Computing Technologies, RMIT University, Melbourne, Australia; School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
Ville A Vartiainen Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Individualized Drug Therapy Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland; Department of Pulmonary Medicine, Heart and Lung Center, Helsinki University Hospital, Helsinki, Finland
Pekka Jousilahti Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Leo Lahti Department of Computing, University of Turku, Turku, Finland
Teemu Niiranen Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Division of Medicine, Turku University Hospital and University of Turku, Turku, Finland
Aki S Havulinna Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Institute for Molecular Medicine Finland, FIMM-HiLIFE, University of Helsinki, Helsinki, Finland
Rob Knight Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, Calif; Department of Computer Science and Engineering, University of California San Diego, La Jolla, Calif; Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, Calif
Veikko Salomaa Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Michael Inouye Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, Australia; Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom; British Heart Foundation Cambridge Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom; Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom; The Alan Turing Institute, London, United Kingdom; Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom.


Eysenbach G, Šuster S, Baldwin T, Verspoor K. Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study. J Med Internet Res 2022;24:e38859. [PMID: 36563029 PMCID: PMC9823568 DOI: 10.2196/38859] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 10/14/2022] [Accepted: 11/16/2022] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial's publishability given an individual (planned) clinical trial description.

OBJECTIVE

We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes.

METHODS

In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36% published and 49,248/76,950, 64% unpublished). This is the largest data set of its kind, which we released as part of this work. The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text.

RESULTS

First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F₁-score=0.62-0.64 vs F₁-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two.

CONCLUSIONS

Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously.


Ghosh Roy G, Geard N, Verspoor K, He S. MPVNN: Mutated Pathway Visible Neural Network architecture for interpretable prediction of cancer-specific survival risk. Bioinformatics 2022;38:5026-5032. [PMID: 36124954 DOI: 10.1093/bioinformatics/btac636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2022] [Revised: 08/04/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022] Open

Goudey B, Geard N, Verspoor K, Zobel J. Propagation, detection and correction of errors using the sequence database network. Brief Bioinform 2022;23:6764545. [PMID: 36266246 PMCID: PMC9677457 DOI: 10.1093/bib/bbac416] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Revised: 07/31/2022] [Accepted: 08/28/2022] [Indexed: 12/14/2022] Open

Abstract

Nucleotide and protein sequences stored in public databases are the cornerstone of many bioinformatics analyses. The records containing these sequences are prone to a wide range of errors, including incorrect functional annotation, sequence contamination and taxonomic misclassification. One source of information that can help to detect errors is the strong interdependency between records. Novel sequences in one database draw their annotations from existing records, may generate new records in multiple other locations and will have varying degrees of similarity with existing records across a range of attributes. A network perspective of these relationships between sequence records, within and across databases, offers new opportunities to detect, or even correct, erroneous entries and more broadly to make inferences about record quality. Here, we describe this novel perspective of sequence database records as a rich network, which we call the sequence database network, and illustrate the opportunities this perspective offers for quantification of database quality and detection of spurious entries. We provide an overview of the relevant databases and describe how the interdependencies between sequence records across these databases can be exploited by network analyses. We review the process of sequence annotation and provide a classification of sources of error, highlighting propagation as a major source. We illustrate the value of a network perspective through three case studies that use network analysis to detect errors, and explore the quality and quantity of critical relationships that would inform such network analyses. This systematic description of a network perspective of sequence database records provides a novel direction to combat the proliferation of errors within these critical bioinformatics resources.


Liu J, Capurro D, Nguyen A, Verspoor K. "Note Bloat" impacts deep learning-based NLP models for clinical prediction tasks. J Biomed Inform 2022;133:104149. [PMID: 35878821 DOI: 10.1016/j.jbi.2022.104149] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Revised: 05/28/2022] [Accepted: 07/19/2022] [Indexed: 10/17/2022]

Lederman A, Lederman R, Verspoor K. Tasks as needs: reframing the paradigm of clinical natural language processing research for real-world decision support. J Am Med Inform Assoc 2022;29:1810-1817. [PMID: 35848784 PMCID: PMC9471702 DOI: 10.1093/jamia/ocac121] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2022] [Revised: 06/06/2022] [Accepted: 07/04/2022] [Indexed: 12/13/2022] Open

Chen J, Goudey B, Zobel J, Geard N, Verspoor K. Exploring automatic inconsistency detection for literature-based gene ontology annotation. Bioinformatics 2022;38:i273-i281. [PMID: 35758780 PMCID: PMC9235499 DOI: 10.1093/bioinformatics/btac230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/08/2022] [Indexed: 11/12/2022] Open

Liu Y, Méric G, Havulinna AS, Teo SM, Åberg F, Ruuskanen M, Sanders J, Zhu Q, Tripathi A, Verspoor K, Cheng S, Jain M, Jousilahti P, Vázquez-Baeza Y, Loomba R, Lahti L, Niiranen T, Salomaa V, Knight R, Inouye M. Early prediction of incident liver disease using conventional risk factors and gut-microbiome-augmented gradient boosting. Cell Metab 2022;34:719-730.e4. [PMID: 35354069 PMCID: PMC9097589 DOI: 10.1016/j.cmet.2022.03.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Revised: 01/06/2022] [Accepted: 03/08/2022] [Indexed: 02/08/2023]

Affiliation(s)

Yang Liu Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, VIC, Australia.
Guillaume Méric Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, VIC, Australia; Baker Department of Cardiometabolic Health, The University of Melbourne, Melbourne, VIC, Australia; Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
Aki S Havulinna Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Institute of Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
Shu Mei Teo Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Fredrik Åberg Transplantation and Liver Surgery Clinic, Helsinki University Hospital, University of Helsinki, Helsinki, Finland
Matti Ruuskanen Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Department of Internal Medicine, University of Turku, Turku, Finland
Jon Sanders Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
Qiyun Zhu Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA
Anupriya Tripathi Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA; Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
Karin Verspoor School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia; School of Computing Technologies, RMIT University, Melbourne, VIC, Australia
Susan Cheng Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
Mohit Jain Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
Pekka Jousilahti Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Yoshiki Vázquez-Baeza Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA; Department of Computer Science & Engineering, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA
Rohit Loomba NAFLD Research Center, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
Leo Lahti Department of Computing, University of Turku, Turku, Finland
Teemu Niiranen Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland; Department of Internal Medicine, University of Turku, Turku, Finland; Division of Medicine, Turku University Hospital, Turku, Finland
Veikko Salomaa Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
Rob Knight Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA; Department of Computer Science & Engineering, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA
Michael Inouye Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, VIC, Australia; Baker Department of Cardiometabolic Health, The University of Melbourne, Melbourne, VIC, Australia; Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Health Data Research UK Cambridge, Wellcome Genome Campus, University of Cambridge, Cambridge, UK; British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK.


Hur B, Hardefeldt LY, Verspoor K, Baldwin T, Gilkerson JR. Overcoming challenges in extracting prescribing habits from veterinary clinics using big data and deep learning. Aust Vet J 2022;100:220-222. [DOI: 10.1111/avj.13145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Accepted: 01/02/2022] [Indexed: 11/27/2022]

Cao K, Verspoor K, Sahebjada S, Baird PN. Accuracy of Machine Learning Assisted Detection of Keratoconus: A Systematic Review and Meta-Analysis. J Clin Med 2022;11:jcm11030478. [PMID: 35159930 PMCID: PMC8836961 DOI: 10.3390/jcm11030478] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Revised: 01/10/2022] [Accepted: 01/13/2022] [Indexed: 12/26/2022] Open

Abstract

(1) Background: The objective of this review was to synthesize available data on the use of machine learning to evaluate its accuracy (as determined by pooled sensitivity and specificity) in detecting keratoconus (KC), and to measure the reporting completeness of machine learning models in KC based on the TRIPOD (transparent reporting of multivariable prediction models for individual prognosis or diagnosis) statement. (2) Methods: Two independent reviewers searched the electronic databases for all potential articles on machine learning and KC published prior to 2021. The TRIPOD 29-item checklist was used to evaluate the adherence to reporting guidelines of the studies, and the adherence rate to each item was computed. We conducted a meta-analysis to determine the pooled sensitivity and specificity of machine learning models for detecting KC. (3) Results: Thirty-five studies were included in this review. Thirty studies evaluated machine learning models for detecting KC eyes from controls and 14 studies evaluated machine learning models for detecting early KC eyes from controls. The pooled sensitivity for detecting KC was 0.970 (95% CI 0.949–0.982), with a pooled specificity of 0.985 (95% CI 0.971–0.993), whereas the pooled sensitivity of detecting early KC was 0.882 (95% CI 0.822–0.923), with a pooled specificity of 0.947 (95% CI 0.914–0.967). Between 3% and 48% of TRIPOD items were adhered to in studies, and the average (median) adherence rate for a single TRIPOD item was 23% across all studies. (4) Conclusions: Application of machine learning models has the potential to make the diagnosis and monitoring of KC more efficient, resulting in reduced vision loss to the patients. This review provides current information on the machine learning models that have been developed for detecting KC and early KC. Presently, the machine learning models performed poorly in identifying early KC from control eyes and many of these research studies did not follow established reporting standards, resulting in the failure of clinical translation of these machine learning models. We present possible approaches for future studies for improvement in studies related to both KC and early KC models to more efficiently and widely utilize machine learning models for the diagnostic process.

Elangovan A, Li Y, Pires DEV, Davis MJ, Verspoor K. Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT. BMC Bioinformatics 2022;23:4. [PMID: 34983371 PMCID: PMC8729035 DOI: 10.1186/s12859-021-04504-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open

Abstract

MOTIVATION

Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature by using distantly supervised training data using deep learning to aid human curation.

METHOD

We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the use of the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and to extract high confidence predictions.

RESULTS AND CONCLUSION

The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets), and filtered approximately 5700 (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
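
As a rough illustration of the two ideas in this abstract, the sketch below recomputes the F1 score from the reported precision and recall, and filters predictions by ensemble-average confidence combined with low variation across ensemble members. It is a minimal sketch, not the authors' code: the function names, the use of the population standard deviation as the variation measure, the thresholds, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_high_quality(ensemble_probs, min_mean=0.9, max_std=0.05):
    """Keep items with high average confidence and low spread.

    ensemble_probs maps an item id to the confidences that each ensemble
    member assigned to the predicted label. Both thresholds are
    hypothetical values, not taken from the paper.
    """
    kept = []
    for item, probs in ensemble_probs.items():
        if mean(probs) >= min_mean and pstdev(probs) <= max_std:
            kept.append(item)
    return kept


# Precision and recall as reported in the abstract (percent).
# The result is roughly 41.35, in line with the reported F1-micro of 41.3.
reported_f1 = f1_score(58.1, 32.1)

# Toy confidences from a three-member ensemble (invented data).
toy = {
    "pair_a": [0.97, 0.95, 0.96],  # confident and consistent: kept
    "pair_b": [0.99, 0.99, 0.75],  # high average, but members disagree
    "pair_c": [0.55, 0.50, 0.52],  # consistent, but not confident
}
selected = select_high_quality(toy)
```

Requiring both conditions trades recall for precision: an item like `pair_b` passes the average-confidence threshold but is rejected because the ensemble members disagree.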


Rozova V, Witt K, Robinson J, Li Y, Verspoor K. Detection of self-harm and suicidal ideation in emergency department triage notes. J Am Med Inform Assoc 2021;29:472-480. [PMID: 34897466 PMCID: PMC8800520 DOI: 10.1093/jamia/ocab261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2021] [Revised: 09/30/2021] [Accepted: 11/11/2021] [Indexed: 12/15/2022] Open

Zhai Z, Druckenbrodt C, Thorne C, Akhondi SA, Nguyen DQ, Cohn T, Verspoor K. ChemTables: a dataset for semantic classification on tables in chemical patents. J Cheminform 2021;13:97. [PMID: 34895295 PMCID: PMC8665561 DOI: 10.1186/s13321-021-00568-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2021] [Accepted: 11/06/2021] [Indexed: 11/10/2022] Open

Abstract

Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both the number and the size of tables can be very large in patent documents. In addition, various types of information can be presented in tables in patents, including spectroscopic and physical data, or pharmacological use and effects of chemicals. Since images of Markush structures and merged cells are commonly used in these tables, their structure also shows substantial variation. This heterogeneity in content and structure of tables in chemical patents makes relevant information difficult to find. We therefore propose a new text mining task of automatically categorising tables in chemical patents based on their contents. Categorisation of tables based on the nature of their content can help to identify tables containing key information, improving the accessibility of information in patents that is highly relevant for new inventions. For developing and evaluating methods for the table classification task, we developed a new dataset, called CHEMTABLES, which consists of 788 chemical patent tables with labels of their content type. We introduce this data set in detail. We further establish strong baselines for the table classification task in chemical patents by applying state-of-the-art neural network models developed for natural language processing, including TabNet, ResNet and Table-BERT on CHEMTABLES. The best performing model, Table-BERT, achieves a performance of 88.66 micro-averaged F₁ score on the table classification task. The CHEMTABLES dataset is publicly available at https://doi.org/10.17632/g7tjh7tbrj.3, subject to the CC BY NC 3.0 license. Code/models evaluated in this work are in a GitHub repository: https://github.com/zenanz/ChemTables.


Chen J, Geard N, Zobel J, Verspoor K. Automatic consistency assurance for literature-based gene ontology annotation. BMC Bioinformatics 2021;22:565. [PMID: 34823464 PMCID: PMC8620237 DOI: 10.1186/s12859-021-04479-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2021] [Accepted: 11/15/2021] [Indexed: 12/21/2022] Open

Cao K, Verspoor K, Chan E, Daniell M, Sahebjada S, Baird PN. Machine learning with a reduced dimensionality representation of comprehensive Pentacam tomography parameters to identify subclinical keratoconus. Comput Biol Med 2021;138:104884. [PMID: 34607273 DOI: 10.1016/j.compbiomed.2021.104884] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2021] [Revised: 09/15/2021] [Accepted: 09/19/2021] [Indexed: 12/26/2022]

Ghosh Roy G, Geard N, Verspoor K, He S. PoLoBag: Polynomial Lasso Bagging for signed gene regulatory network inference from expression data. Bioinformatics 2021;36:5187-5193. [PMID: 32697830 DOI: 10.1093/bioinformatics/btaa651] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Revised: 06/06/2020] [Accepted: 07/16/2020] [Indexed: 02/01/2023] Open

Verspoor K. The Evolution of Clinical Knowledge During COVID-19: Towards a Global Learning Health System. Yearb Med Inform 2021;30:176-184. [PMID: 34479389 PMCID: PMC8416229 DOI: 10.1055/s-0041-1726503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

Liu J, Capurro D, Nguyen A, Verspoor K. Early prediction of diagnostic-related groups and estimation of hospital cost by processing clinical notes. NPJ Digit Med 2021;4:103. [PMID: 34211109 PMCID: PMC8249417 DOI: 10.1038/s41746-021-00474-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 06/08/2021] [Indexed: 11/09/2022] Open

Ghosh Roy G, He S, Geard N, Verspoor K. Bow-tie architecture of gene regulatory networks in species of varying complexity. J R Soc Interface 2021;18:20210069. [PMID: 34102083 PMCID: PMC8187011 DOI: 10.1098/rsif.2021.0069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open

Reddy S, Bhaskar R, Padmanabhan S, Verspoor K, Mamillapalli C, Lahoti R, Makinen VP, Pradhan S, Kushwah P, Sinha S. Use and validation of text mining and cluster algorithms to derive insights from Corona Virus Disease-2019 (COVID-19) medical literature. Comput Methods Programs Biomed Update 2021;1:100010. [PMID: 34337589 DOI: 10.1016/j.cmpbup.2021.100014] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/11/2021] [Revised: 04/01/2021] [Accepted: 04/02/2021] [Indexed: 05/26/2023]

He J, Nguyen DQ, Akhondi SA, Druckenbrodt C, Thorne C, Hoessel R, Afzal Z, Zhai Z, Fang B, Yoshikawa H, Albahem A, Cavedon L, Cohn T, Baldwin T, Verspoor K. ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents. Front Res Metr Anal 2021;6:654438. [PMID: 33870071 PMCID: PMC8028406 DOI: 10.3389/frma.2021.654438] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2021] [Accepted: 02/24/2021] [Indexed: 11/21/2022] Open

Reddy S, Bhaskar R, Padmanabhan S, Verspoor K, Mamillapalli C, Lahoti R, Makinen VP, Pradhan S, Kushwah P, Sinha S. Use and validation of text mining and cluster algorithms to derive insights from Corona Virus Disease-2019 (COVID-19) medical literature. Comput Methods Programs Biomed Update 2021;1:100010. [PMID: 34337589 PMCID: PMC8050406 DOI: 10.1016/j.cmpbup.2021.100010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/11/2021] [Revised: 04/01/2021] [Accepted: 04/02/2021] [Indexed: 05/04/2023]

Robinson J, Witt K, Lamblin M, Spittal MJ, Carter G, Verspoor K, Page A, Rajaram G, Rozova V, Hill NTM, Pirkis J, Bleeker C, Pleban A, Knott JC. Development of a Self-Harm Monitoring System for Victoria. Int J Environ Res Public Health 2020;17:ijerph17249385. [PMID: 33333970 PMCID: PMC7765445 DOI: 10.3390/ijerph17249385] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/05/2020] [Revised: 11/28/2020] [Accepted: 12/10/2020] [Indexed: 12/18/2022]

Affiliation(s)

Jo Robinson Orygen, Parkville, VIC 3052, Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC 3052, Australia. Correspondence: Tel.: +61-393-420-2866
Katrina Witt Orygen, Parkville, VIC 3052, Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC 3052, Australia
Michelle Lamblin Orygen, Parkville, VIC 3052, Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC 3052, Australia
Matthew J. Spittal Centre for Mental Health, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, VIC 3010, Australia
Greg Carter Centre for Brain and Mental Health Research, Faculty of Health and Medicine, University of Newcastle, Callaghan, NSW 2308, Australia; Calvary Mater Newcastle, Callaghan, NSW 2308, Australia
Karin Verspoor School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3052, Australia; Centre for Digital Transformation of Health, The University of Melbourne, Melbourne, VIC 3000, Australia
Andrew Page Translational Health Research Institute, Western Sydney University, Campbelltown, NSW 2560, Australia
Gowri Rajaram Orygen, Parkville, VIC 3052, Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC 3052, Australia
Vlada Rozova School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3052, Australia
Nicole T. M. Hill Orygen, Parkville, VIC 3052, Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC 3052, Australia; Telethon Kids Institute, Nedlands, WA 6009, Australia
Jane Pirkis Centre for Mental Health, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, VIC 3010, Australia
Caitlin Bleeker Orygen, Parkville, VIC 3052, Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC 3052, Australia
Alex Pleban Mid-West Area Mental Health Service, Emergency Department, Sunshine Hospital, Sunshine, VIC 3021, Australia
Jonathan C. Knott Centre for Integrated Critical Care, Melbourne Medical School, The University of Melbourne, Parkville, VIC 3010, Australia


Al Bkhetan Z, Chana G, Ramamohanarao K, Verspoor K, Goudey B. Evaluation of consensus strategies for haplotype phasing. Brief Bioinform 2020;22:5998997. [PMID: 33236761 DOI: 10.1093/bib/bbaa280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/13/2020] [Revised: 09/22/2020] [Accepted: 09/22/2020] [Indexed: 01/05/2023]

Hardefeldt L, Hur B, Verspoor K, Baldwin T, Bailey KE, Scarborough R, Richards S, Billman-Jacobe H, Browning GF, Gilkerson J. Use of cefovecin in dogs and cats attending first-opinion veterinary practices in Australia. Vet Rec 2020;187:e95. [PMID: 32826347 DOI: 10.1136/vr.105997] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/21/2020] [Revised: 06/13/2020] [Accepted: 07/13/2020] [Indexed: 12/12/2022]

Affiliation(s)

Laura Hardefeldt National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia
Brian Hur National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia
Karin Verspoor School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia; Health and Biomedical Informatics Centre, University of Melbourne, Parkville, Victoria, Australia
Timothy Baldwin School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia
Kirsten E Bailey National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia
Ri Scarborough National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia
Suzanna Richards National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Veterinary Biosciences, University of Melbourne, Parkville, Victoria, Australia
Helen Billman-Jacobe National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia
Glenn Francis Browning National Centre for Antimicrobial Stewardship, Carlton, Victoria, Australia; Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia
James Gilkerson Asia Pacific Centre for Animal Health, University of Melbourne, Parkville, Victoria, Australia


Pedersen M, Verspoor K, Jenkinson M, Law M, Abbott DF, Jackson GD. Artificial intelligence for clinical decision support in neurology. Brain Commun 2020;2:fcaa096. [PMID: 33134913 PMCID: PMC7585692 DOI: 10.1093/braincomms/fcaa096] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/19/2020] [Revised: 05/19/2020] [Accepted: 06/12/2020] [Indexed: 01/13/2023]

Cao K, Verspoor K, Sahebjada S, Baird PN. Evaluating the Performance of Various Machine Learning Algorithms to Detect Subclinical Keratoconus. Transl Vis Sci Technol 2020;9:24. [PMID: 32818085 PMCID: PMC7396174 DOI: 10.1167/tvst.9.2.24] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/30/2019] [Accepted: 02/05/2020] [Indexed: 12/26/2022]

Jose JM, Yilmaz E, Magalhães J, Castells P, Ferro N, Silva MJ, Martins F, Akhondi SA, Cohn T, Baldwin T, Verspoor K. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. Lecture Notes in Computer Science 2020;12036. [PMCID: PMC7148043 DOI: 10.1007/978-3-030-45442-5_74] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/23/2022]

Al Bkhetan Z, Zobel J, Kowalczyk A, Verspoor K, Goudey B. Exploring effective approaches for haplotype block phasing. BMC Bioinformatics 2019;20:540. [PMID: 31666002 PMCID: PMC6822470 DOI: 10.1186/s12859-019-3095-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/25/2019] [Accepted: 09/10/2019] [Indexed: 01/19/2023]

Abstract

BACKGROUND

Knowledge of phase, the specific allele sequence on each copy of homologous chromosomes, is increasingly recognized as critical for detecting certain classes of disease-associated mutations. One approach for detecting such mutations is through phased haplotype association analysis. While the accuracy of methods for phasing genotype data has been widely explored, there has been little attention given to phasing accuracy at haplotype block scale. Understanding the combined impact of the accuracy of phasing tool and the method used to determine haplotype blocks on the error rate within the determined blocks is essential to conduct accurate haplotype analyses.

RESULTS

We present a systematic study exploring the relationship between seven widely used phasing methods and two common methods for determining haplotype blocks. The evaluation focuses on the number of haplotype blocks that are incorrectly phased. Insights from these results are used to develop a haplotype estimator based on a consensus of three tools. The consensus estimator achieved the most accurate phasing in all applied tests. Individually, EAGLE2, BEAGLE and SHAPEIT2 alternate in being the best performing tool in different scenarios. Determining haplotype blocks based on linkage disequilibrium leads to more correctly phased blocks compared to a sliding window approach. We find that there is little difference between phasing sections of a genome (e.g. a gene) compared to phasing entire chromosomes. Finally, we show that the location of phasing error varies when the tools are applied to the same data several times, a finding that could be important for downstream analyses.

CONCLUSIONS

The choice of phasing and block determination algorithms and their interaction impacts the accuracy of phased haplotype blocks. This work provides guidance and evidence for the different design choices needed for analyses using haplotype blocks. The study highlights a number of issues that may have limited the replicability of previous haplotype analysis.
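The idea of a consensus of three phasing tools can be illustrated with a minimal sketch. It assumes a simple per-site majority vote over heterozygous sites and is not the authors' estimator; the function names and the 0/1 encoding of the allele carried on the first haplotype are hypothetical choices made for illustration.

```python
def flip(hap):
    """Swap the two haplotypes: every 0/1 label on the first copy is inverted."""
    return [1 - allele for allele in hap]


def align_to_reference(hap, ref):
    """Phase is only defined up to swapping the two copies, so orient `hap`
    to agree with `ref` at as many sites as possible."""
    agree = sum(a == b for a, b in zip(hap, ref))
    return hap if agree * 2 >= len(ref) else flip(hap)


def consensus_phase(tool_haps):
    """Majority vote per heterozygous site across an odd number of tools.

    tool_haps: list of equal-length 0/1 lists, one per phasing tool
    (hypothetical encoding: the allele placed on haplotype 1 at each site).
    """
    if len(tool_haps) % 2 == 0:
        raise ValueError("use an odd number of tools to avoid tied votes")
    ref = tool_haps[0]
    aligned = [align_to_reference(h, ref) for h in tool_haps]
    # A site is called 1 when more than half of the aligned tools say 1.
    return [1 if sum(site) * 2 > len(site) else 0 for site in zip(*aligned)]


# Toy outputs for three tools at five heterozygous sites.
tool_a = [0, 1, 1, 0, 1]
tool_b = [1, 0, 0, 1, 1]   # same phase as tool_a but reported flipped, one disagreement
tool_c = [0, 1, 0, 0, 1]   # one disagreement with tool_a
consensus = consensus_phase([tool_a, tool_b, tool_c])
```

A whole-vector flip handles only the global orientation ambiguity; switch errors inside a block would need a block-wise alignment, which this sketch leaves out.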


Hassanzadeh H, Nguyen A, Verspoor K. Quantifying semantic similarity of clinical evidence in the biomedical literature to facilitate related evidence synthesis. J Biomed Inform 2019;100:103321. [PMID: 31676460 DOI: 10.1016/j.jbi.2019.103321] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/04/2019] [Revised: 09/28/2019] [Accepted: 10/25/2019] [Indexed: 10/25/2022]

Abstract

OBJECTIVE

Published clinical trials and high quality peer reviewed medical publications are considered as the main sources of evidence used for synthesizing systematic reviews or practicing Evidence Based Medicine (EBM). Finding all relevant published evidence for a particular medical case is a time and labour intensive task, given the breadth of the biomedical literature. Automatic quantification of conceptual relationships between key clinical evidence within and across publications, despite variations in the expression of clinically-relevant concepts, can help to facilitate synthesis of evidence. In this study, we aim to provide an approach towards expediting evidence synthesis by quantifying semantic similarity of key evidence as expressed in the form of individual sentences. Such semantic textual similarity can be applied as a key approach for supporting selection of related studies.

MATERIAL AND METHODS

We propose a generalisable approach for quantifying semantic similarity of clinical evidence in the biomedical literature, specifically considering the similarity of sentences corresponding to a given type of evidence, such as clinical interventions, population information, clinical findings, etc. We develop three sets of generic, ontology-based, and vector-space models of similarity measures that make use of a variety of lexical, conceptual, and contextual information to quantify the similarity of full sentences containing clinical evidence. To understand the impact of different similarity measures on the overall evidence semantic similarity quantification, we provide a comparative analysis of these measures when used as input to an unsupervised linear interpolation and a supervised regression ensemble. In order to provide a reliable test-bed for this experiment, we generate a dataset of 1000 pairs of sentences from biomedical publications that are annotated by ten human experts. We also extend the experiments on an external dataset for further generalisability testing.

RESULTS

The combination of all diverse similarity measures showed stronger correlations with the gold standard similarity scores in the dataset than any individual kind of measure. Our approach reached near 0.80 average Pearson correlation across different clinical evidence types using the devised similarity measures. Although they were more effective when combined together, individual generic and vector-space measures also resulted in strong similarity quantification when used in both unsupervised and supervised models. On the external dataset, our similarity measures were highly competitive with the state-of-the-art approaches developed and trained specifically on that dataset for predicting semantic similarity.

CONCLUSION

Experimental results showed that the proposed semantic similarity quantification approach can effectively identify related clinical evidence that is reported in the literature. The comparison with a state-of-the-art method demonstrated the effectiveness of the approach, and experiments with an external dataset support its generalisability.
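As a rough illustration of combining several sentence-similarity measures by unsupervised linear interpolation and scoring the result by Pearson correlation against gold ratings, the sketch below uses two toy lexical measures. The measures, weights, and function names are assumptions for illustration only, not the models described in the abstract.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Token-set overlap between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of bag-of-words count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def interpolate(scores, weights):
    """Weighted average of individual similarity scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def combined_similarity(a, b, weights=(0.5, 0.5)):
    """Interpolate the two toy measures with hypothetical equal weights."""
    return interpolate([jaccard(a, b), cosine(a, b)], weights)
```

Given a list of sentence pairs with human ratings, `pearson` applied to the `combined_similarity` outputs and the ratings yields the kind of correlation figure the abstract reports.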


Lopez-Campos G, Kiossoglou P, Borda A, Hawthorne C, Gray K, Verspoor K. Characterizing the Scope of Exposome Research Through Topic Modeling and Ontology Analysis. Stud Health Technol Inform 2019;264:1530-1531. [PMID: 31438216 DOI: 10.3233/shti190519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]

Hur B, Hardefeldt LY, Verspoor K, Baldwin T, Gilkerson JR. Using natural language processing and VetCompass to understand antimicrobial usage patterns in Australia. Aust Vet J 2019;97:298-300. [PMID: 31209869 DOI: 10.1111/avj.12836] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/07/2019] [Accepted: 02/16/2019] [Indexed: 11/30/2022]

Bouadjenek MR, Zobel J, Verspoor K. Automated assessment of biological database assertions using the scientific literature. BMC Bioinformatics 2019;20:216. [PMID: 31035936 PMCID: PMC6489365 DOI: 10.1186/s12859-019-2801-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/14/2018] [Accepted: 04/09/2019] [Indexed: 12/27/2022]

Nguyen DQ, Verspoor K. From POS tagging to dependency parsing for biomedical event extraction. BMC Bioinformatics 2019;20:72. [PMID: 30755172 PMCID: PMC6373122 DOI: 10.1186/s12859-019-2604-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/20/2018] [Accepted: 01/03/2019] [Indexed: 01/03/2023]

Islamaj Dogan R, Kim S, Chatr-Aryamontri A, Wei CH, Comeau DC, Antunes R, Matos S, Chen Q, Elangovan A, Panyam NC, Verspoor K, Liu H, Wang Y, Liu Z, Altinel B, Hüsünbeyi ZM, Özgür A, Fergadis A, Wang CK, Dai HJ, Tran T, Kavuluru R, Luo L, Steppi A, Zhang J, Qu J, Lu Z. Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine. Database (Oxford) 2019;2019:5303240. [PMID: 30689846 PMCID: PMC6348314 DOI: 10.1093/database/bay147] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/02/2018] [Accepted: 12/19/2018] [Indexed: 12/16/2022]

Abstract

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of BioCreative VI Precision Medicine Track was to extract this particular type of information and was organized in two tasks: (i) document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations and (ii) relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods. 
Given the level of participation and the individual team results we find the precision medicine track to be successful in engaging the text-mining research community. In the meantime, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPI affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
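The evaluation figures quoted above are precision, recall and F-score. A minimal sketch of how these relate is given below, assuming the F-score is the balanced F1 (harmonic mean of precision and recall); the abstract does not state which F-measure was used, and the function names and counts are hypothetical.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (assumed variant)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and
    false negative counts for one system's predictions."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical counts for a single system.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)

# Combining the best triage precision (62.89%) and best recall (98.0%)
# as if they came from one system gives an F1 of about 76.6%.
hypothetical_f1 = f_score(0.6289, 0.98)
```

Under the F1 assumption, that hypothetical value exceeds the reported best F-score of 69.06%, so the "best" precision, recall and F-score for the triage task cannot all belong to the same submission.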


Affiliation(s)

Rezarta Islamaj Dogan National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Sun Kim National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Andrew Chatr-Aryamontri Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada
Chih-Hsuan Wei National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Donald C Comeau National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Rui Antunes Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
Sérgio Matos Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
Qingyu Chen School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Aparna Elangovan School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Nagesh C Panyam School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Karin Verspoor School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Hongfang Liu Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
Yanshan Wang Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
Zhuang Liu School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Berna Altinel Department of Computer Engineering, Marmara University, Istanbul, Turkey
Zehra Melce Hüsünbeyi Department of Computer Engineering, Bogaziçi University, Istanbul, Turkey
Arzucan Özgür
Aris Fergadis School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece
Chen-Kai Wang Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
Hong-Jie Dai Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Tung Tran Department of Computer Science, University of Kentucky, Lexington, KY, USA
Ramakanth Kavuluru Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
Ling Luo College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Albert Steppi Department of Statistics, Florida State University, Florida, USA
Jinfeng Zhang Department of Statistics, Florida State University, Florida, USA
Jinchan Qu Department of Statistics, Florida State University, Florida, USA
Zhiyong Lu National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA


Chen Q, Zhang X, Wan Y, Zobel J, Verspoor K. Search Effectiveness in Nonredundant Sequence Databases: Assessments and Solutions. J Comput Biol 2018;26:605-617. [PMID: 30585742 DOI: 10.1089/cmb.2018.0198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/23/2022]

Khumrin P, Ryan A, Juddy T, Verspoor K. DrKnow: A Diagnostic Learning Tool with Feedback from Automated Clinical Decision Support. AMIA Annu Symp Proc 2018;2018:1348-1357. [PMID: 30815179 PMCID: PMC6371235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]