1
Pinard CJ, Poon AC, Lagree A, Wu K, Li J, Tran WT. Precision in Parsing: Evaluation of an Open-Source Named Entity Recognizer (NER) in Veterinary Oncology. Vet Comp Oncol 2025; 23:102-108. [PMID: 39711253 PMCID: PMC11830456 DOI: 10.1111/vco.13035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 11/14/2024] [Accepted: 12/02/2024] [Indexed: 12/24/2024]
Abstract
Integrating Artificial Intelligence (AI) through Natural Language Processing (NLP) can improve veterinary medical oncology clinical record analytics. Named Entity Recognition (NER), a critical component of NLP, can facilitate efficient data extraction and automated labelling for research and clinical decision-making. This study assesses the efficacy of the Bio-Epidemiology-NER (BioEN), an open-source NER developed using human epidemiological and medical data, on veterinary medical oncology records. The NER's performance was compared with manual annotations by a veterinary medical oncologist and a veterinary intern. Evaluation metrics included Jaccard similarity, intra-rater reliability, ROUGE scores, and standard NER performance metrics (precision, recall, F1-score). Results indicate poor direct translatability to veterinary medical oncology record text and room for improvement in the NER's performance, with precision, recall, and F1-score suggesting a marginally better alignment with the oncologist than the intern. While challenges remain, these insights contribute to the ongoing development of AI tools tailored for veterinary healthcare and highlight the need for veterinary-specific models.
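As a rough illustration of the evaluation metrics named in this abstract, the sketch below compares a model's extracted entities with one annotator's entities using Jaccard similarity and precision, recall and F1. It is not the authors' evaluation code: the exact-match comparison on (text, label) pairs, the function names, and the example entities are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def precision_recall_f1(predicted: set, reference: set) -> tuple:
    """Exact-match precision, recall and F1 of predicted entities against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical entities (text, label); not taken from the study's data.
model_entities = {
    ("lymphoma", "DISEASE"),
    ("vincristine", "DRUG"),
    ("spleen", "ANATOMY"),
}
oncologist_entities = {
    ("lymphoma", "DISEASE"),
    ("vincristine", "DRUG"),
    ("prednisone", "DRUG"),
    ("lymph node", "ANATOMY"),
}

similarity = jaccard(model_entities, oncologist_entities)  # 2 shared / 5 total = 0.4
p, r, f1 = precision_recall_f1(model_entities, oncologist_entities)  # 2/3, 1/2, 4/7
```

Exact matching is the strictest choice; a study could equally score partial span overlaps, which would raise all three values.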
Affiliation(s)
- Christopher J. Pinard
  - Department of Clinical Studies, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
  - Department of Oncology, Lakeshore Animal Health Partners, Mississauga, Ontario, Canada
  - Centre for Advancing Responsible & Ethical Artificial Intelligence, University of Guelph, Guelph, Ontario, Canada
  - Radiogenomics Laboratory, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
  - ANI.ML Research, ANI.ML Health Inc., Toronto, Ontario, Canada
- Andrew C. Poon
  - VCA Mississauga Oakville Veterinary Emergency Hospital, Mississauga, Ontario, Canada
- Andrew Lagree
  - Radiogenomics Laboratory, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
  - ANI.ML Research, ANI.ML Health Inc., Toronto, Ontario, Canada
  - Odette Cancer Program, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Kuan-Chuen Wu
  - ANI.ML Research, ANI.ML Health Inc., Toronto, Ontario, Canada
- Jiaxu Li
  - Radiogenomics Laboratory, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- William T. Tran
  - Radiogenomics Laboratory, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
  - Odette Cancer Program, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
  - Department of Radiation Oncology, University of Toronto, Toronto, Ontario, Canada
  - Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
2
Davies H, Nenadic G, Alfattni G, Arguello Casteleiro M, Al Moubayed N, Farrell S, Radford AD, Noble PJM. Text mining for disease surveillance in veterinary clinical data: part two, training computers to identify features in clinical text. Front Vet Sci 2024; 11:1352726. [PMID: 39239390 PMCID: PMC11376235 DOI: 10.3389/fvets.2024.1352726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 07/17/2024] [Indexed: 09/07/2024]
Abstract
In part two of this mini-series, we evaluate the range of machine-learning tools now available for application to veterinary clinical text-mining. These tools will be vital to automate extraction of information from large datasets of veterinary clinical narratives curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, where volumes of millions of records preclude reading records and the complexities of clinical notes limit usefulness of more "traditional" text-mining approaches. We discuss the application of various machine learning techniques ranging from simple models for identifying words and phrases with similar meanings to expand lexicons for keyword searching, to the use of more complex language models. Specifically, we describe the use of language models for record annotation, unsupervised approaches for identifying topics within large datasets, and discuss more recent developments in the area of generative models (such as ChatGPT). As these models become increasingly complex it is pertinent that researchers and clinicians work together to ensure that the outputs of these models are explainable in order to instill confidence in any conclusions drawn from them.
Affiliation(s)
- Heather Davies
  - Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Goran Nenadic
  - Department of Computer Science, Manchester University, Manchester, United Kingdom
- Ghada Alfattni
  - Department of Computer Science, Manchester University, Manchester, United Kingdom
- Noura Al Moubayed
  - Department of Computer Science, Durham University, Durham, United Kingdom
- Sean Farrell
  - Department of Computer Science, Durham University, Durham, United Kingdom
- Alan D Radford
  - Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- P-J M Noble
  - Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
3
Lyu D, Wang X, Chen Y, Wang F. Language model and its interpretability in biomedicine: A scoping review. iScience 2024; 27:109334. [PMID: 38495823 PMCID: PMC10940999 DOI: 10.1016/j.isci.2024.109334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
With advancements in large language models, artificial intelligence (AI) is undergoing a paradigm shift where AI models can be repurposed with minimal effort across various downstream tasks. This provides great promise in learning generally useful representations from biomedical corpora, at scale, which would empower AI solutions in healthcare and biomedical research. Nonetheless, our understanding of how they work, when they fail, and what they are capable of remains underexplored due to their emergent properties. Consequently, there is a need to comprehensively examine the use of language models in biomedicine. This review aims to summarize existing studies of language models in biomedicine and identify topics ripe for future research, along with the technical and analytical challenges w.r.t. interpretability. We expect this review to help researchers and practitioners better understand the landscape of language models in biomedicine and what methods are available to enhance the interpretability of their models.
Affiliation(s)
- Daoming Lyu
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Xingbo Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Fei Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
4
Dettori A, Ferroni L, Felici A, Scoccia E, Maresca C. Canine mortality in Umbria Region (Central Italy): a population-based analysis. Vet Res Commun 2023; 47:2301-2306. [PMID: 37264175 DOI: 10.1007/s11259-023-10146-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 05/26/2023] [Indexed: 06/03/2023]
Abstract
Companion dogs may be valuable sentinels to better understand the environmental determinants of morbidity and mortality in humans. This study aimed to assess the dog population and mortality in Umbria Region. The source of data was the local Canine Registry. Attribute-specific crude mortality rates by sex, age, and breed were produced on a five-year basis (2014-2018). The human ICD-10 was employed to code the causes of death. Over 2014-2018, an annual average population of 226,875 dogs and a total of 46,743 deaths were estimated. The mortality rate was higher in young males than in young females. A specific cause of death was reported for 5,209 dogs; 62.8% (95% CI = 61.4-64.1) of these deaths were due to external causes. Neoplasms were the fourth leading cause of death. Differences in mortality between sexes were consistent with those seen in humans. The death registration procedure needs improvement through systematic coding of the causes. An adaptation of the human ICD could address the lack of a coding system until international standards for animals are introduced.
Affiliation(s)
- Annalisa Dettori
  - Istituto Zooprofilattico Sperimentale dell'Umbria e delle Marche "Togo Rosati", via G. Salvemini, 1, 06126, Perugia, Italy
- Laura Ferroni
  - Istituto Zooprofilattico Sperimentale dell'Umbria e delle Marche "Togo Rosati", via G. Salvemini, 1, 06126, Perugia, Italy
- Andrea Felici
  - Istituto Zooprofilattico Sperimentale dell'Umbria e delle Marche "Togo Rosati", via G. Salvemini, 1, 06126, Perugia, Italy
- Eleonora Scoccia
  - Istituto Zooprofilattico Sperimentale dell'Umbria e delle Marche "Togo Rosati", via G. Salvemini, 1, 06126, Perugia, Italy
- Carmen Maresca
  - Istituto Zooprofilattico Sperimentale dell'Umbria e delle Marche "Togo Rosati", via G. Salvemini, 1, 06126, Perugia, Italy
5
Farrell S, Appleton C, Noble PJM, Al Moubayed N. PetBERT: automated ICD-11 syndromic disease coding for outbreak detection in first opinion veterinary electronic health records. Sci Rep 2023; 13:18015. [PMID: 37865683 PMCID: PMC10590382 DOI: 10.1038/s41598-023-45155-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 10/17/2023] [Indexed: 10/23/2023]
Abstract
Effective public health surveillance requires consistent monitoring of disease signals such that researchers and decision-makers can react dynamically to changes in disease occurrence. However, whilst surveillance initiatives exist in production animal veterinary medicine, comparable frameworks for companion animals are lacking. First-opinion veterinary electronic health records (EHRs) have the potential to reveal disease signals and often represent the initial reporting of clinical syndromes in animals presenting for medical attention, highlighting their possible significance in early disease detection. Yet despite their availability, there are limitations surrounding their free text-based nature, inhibiting the ability for national-level mortality and morbidity statistics to occur. This paper presents PetBERT, a large language model trained on over 500 million words from 5.1 million EHRs across the UK. PetBERT-ICD is the additional training of PetBERT as a multi-label classifier for the automated coding of veterinary clinical EHRs with the International Classification of Disease 11 framework, achieving F1 scores exceeding 83% across 20 disease codings with minimal annotations. PetBERT-ICD effectively identifies disease outbreaks, outperforming current clinician-assigned point-of-care labelling strategies up to 3 weeks earlier. The potential for PetBERT-ICD to enhance disease surveillance in veterinary medicine represents a promising avenue for advancing animal health and improving public health outcomes.
Affiliation(s)
- Sean Farrell
  - Department of Computer Science, Durham University, Durham, UK
- Charlotte Appleton
  - Centre for Health Informatics, Computing, and Statistics, Lancaster Medical School, Lancaster University, Lancaster, UK
- Peter-John Mäntylä Noble
  - Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
- Noura Al Moubayed
  - Department of Computer Science, Durham University, Durham, UK
  - Evergreen Life Ltd, Manchester, UK
6
Kennedy U, Paterson M, Clark N. Using a gradient boosted model for case ascertainment from free-text veterinary records. Prev Vet Med 2023; 212:105850. [PMID: 36638610 DOI: 10.1016/j.prevetmed.2023.105850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/01/2022] [Revised: 01/06/2023] [Accepted: 01/09/2023] [Indexed: 01/11/2023]
Abstract
Case ascertainment for prevalence and incidence studies from veterinary clinical data poses a major challenge because medical notes are not consistently structured or complete. Using natural language processing (NLP) and machine learning, this study aimed to obtain accurate case recognition for feline upper respiratory tract infections (primarily caused by viruses such as feline herpesvirus (FHV-1) and feline calicivirus (FCV), and bacteria such as Chlamydophila felis, Mycoplasma felis and Bordetella bronchiseptica) using retrospective electronic veterinary records from the Royal Society for Prevention of Cruelty to Animals, Queensland (RSPCA Qld). Data cleaning and NLP on eight years of free-text veterinary records from RSPCA Queensland were carried out to derive text-based predictors. The NLP steps included sorting records by length of stay, vectorising, tokenising and spell checking against a bespoke veterinary database. A gradient boosted model (GBM) was trained to predict the probability of each animal having a diagnosis of upper respiratory infection. A manually annotated dataset was used for training the algorithm to learn dominant patterns between predictors (frequencies of n-grams) and responses (manual binary case classification). The GBM's performance was tested against an out-of-sample validation dataset, and model-agnostic methods were used to interrogate the model's learning process. The GBM used patient-level frequencies of 1250 unique n-grams as predictor variables and was able to predict the probability of cases in the validation dataset with an accuracy of 0.95 (95% CI 0.92, 0.97) and F1 score of 0.96. Predictors that exerted the highest influence on the model included frequencies of "doxycycline", "flu", "sneezing", "doxybrom" and "ocular". The trained GBM was deployed on the full dataset spanning eight years, comprising 60,258 clinical entries. The prevalence in the full dataset was predicted to be 23.59%, which is in line with domain expertise from practicing veterinarians at the shelter. Case ascertainment is a crucial step for further epidemiological study of cat flu. Ultimately, this tool can be extended to other clinical procedures, conditions, and diseases such as intensive care treatment due to snake bites and tick paralysis, physical injuries such as orthopaedic fractures or chest injuries and labour-intensive infectious diseases like parvovirus, canine cough, and ringworm, all of which require prolonged quarantine and care.
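To make the idea of "frequencies of n-grams" as predictors concrete, the sketch below counts unigrams and bigrams in a single free-text note using only the Python standard library. It is a minimal illustration, not the study's pipeline: the regular-expression tokeniser, the `ngram_counts` helper, and the example note are assumptions, and the spell checking and model training steps described in the abstract are not shown.

```python
import re
from collections import Counter


def ngram_counts(text: str, n_max: int = 2) -> Counter:
    """Count all n-grams of length 1..n_max in a lower-cased, punctuation-stripped note."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical clinical note; not taken from the RSPCA Queensland records.
note = "Sneezing and ocular discharge. Started doxycycline. Sneezing improved."
counts = ngram_counts(note)

# Each n-gram count becomes one column of a patient-level feature row.
features = {term: counts[term] for term in ("sneezing", "ocular discharge", "doxycycline", "flu")}
```

Because punctuation is dropped before the n-grams are formed, bigrams in this sketch can span sentence boundaries (for example "discharge started"); a real pipeline might split on sentences first.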
Affiliation(s)
- Uttara Kennedy
  - UQ School of Veterinary Science, The University of Queensland, Gatton, Queensland 4343, Australia
  - RSPCA Queensland, Animal Care Campus, 139 Wacol Station Road, Wacol, Queensland 4076, Australia
- Mandy Paterson
  - UQ School of Veterinary Science, The University of Queensland, Gatton, Queensland 4343, Australia
  - RSPCA Queensland, Animal Care Campus, 139 Wacol Station Road, Wacol, Queensland 4076, Australia
- Nicholas Clark
  - UQ School of Veterinary Science, The University of Queensland, Gatton, Queensland 4343, Australia
7
Hennessey E, DiFazio M, Hennessey R, Cassel N. Artificial intelligence in veterinary diagnostic imaging: A literature review. Vet Radiol Ultrasound 2022; 63 Suppl 1:851-870. [PMID: 36468206 DOI: 10.1111/vru.13163] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/22/2022] [Revised: 05/05/2022] [Accepted: 07/07/2022] [Indexed: 12/09/2022]
Abstract
Artificial intelligence in veterinary medicine is an emerging field. Machine learning, a subfield of artificial intelligence, allows computer programs to analyze large imaging datasets and learn to perform tasks relevant to veterinary diagnostic imaging. This review summarizes the small, yet growing body of artificial intelligence literature in veterinary imaging, provides necessary background to understand these papers, and provides author commentary on the state of the field. To date, less than 40 peer-reviewed publications have utilized machine learning to perform imaging-associated tasks across multiple anatomic regions in veterinary clinical and biomedical research. Major challenges in this field include collection and cleaning of sufficient image data, selection of high-quality ground truth labels, formation of relationships between veterinary and machine learning professionals, and closure of the gap between academic uses of artificial intelligence and currently available commercial products. Further development of artificial intelligence has the potential to help meet the growing need for radiological services through applications in workflow, quality control, and image interpretation for both general practitioners and radiologists.
Affiliation(s)
- Erin Hennessey
  - Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, Kansas, USA
  - Army Medical Department, Student Detachment, San Antonio, Texas, USA
- Matthew DiFazio
  - Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, Kansas, USA
- Ryan Hennessey
  - Department of Computer Science, College of Engineering, Kansas State University, Manhattan, Kansas, USA
- Nicky Cassel
  - Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, Kansas, USA
8
Ouyang ZB, Hodgson JL, Robson E, Havas K, Stone E, Poljak Z, Bernardo TM. Day-1 Competencies for Veterinarians Specific to Health Informatics. Front Vet Sci 2021; 8:651238. [PMID: 34179157 PMCID: PMC8231916 DOI: 10.3389/fvets.2021.651238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
In 2015, the American Association of Veterinary Medical Colleges (AAVMC) developed the Competency-Based Veterinary Education (CBVE) framework to prepare practice-ready veterinarians through competency-based education, which is an outcomes-based approach to equipping students with the skills, knowledge, attitudes, values, and abilities to do their jobs. With increasing use of health informatics (HI: the use of information technology to deliver healthcare) by veterinarians, competencies in HI need to be developed. To reach consensus on a HI competency framework in this study, the Competency Framework Development (CFD) process was conducted using an online adaptation of Developing-A-Curriculum, an established methodology in veterinary medicine for reaching consensus among experts. The objectives of this study were to (1) create an HI competency framework for new veterinarians; (2) group the competency statements into common themes; (3) map the HI competency statements to the AAVMC competencies as illustrative sub-competencies; (4) provide insight into specific technologies that are currently relevant to new veterinary graduates; and (5) measure panelist satisfaction with the CFD process. The primary emphasis of the final HI competency framework was that veterinarians must be able to assess, select, and implement technology to optimize the client-patient experience, delivery of healthcare, and work-life balance for the veterinary team. Veterinarians must also continue their own education regarding technology by engaging relevant experts and opinion leaders.
Affiliation(s)
- Zenhwa Ben Ouyang
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Jennifer Louise Hodgson
  - Department of Population Health Sciences, Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States
- Elizabeth Stone
  - Department of Clinical Studies, University of Guelph, Guelph, ON, Canada
- Zvonimir Poljak
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Theresa Marie Bernardo
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
9
Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks. NPJ Digit Med 2021; 4:37. [PMID: 33637859 PMCID: PMC7910461 DOI: 10.1038/s41746-021-00404-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/16/2020] [Accepted: 01/26/2021] [Indexed: 12/02/2022]
Abstract
Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. 
Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
10
Lustgarten JL, Zehnder A, Shipman W, Gancher E, Webb TL. Veterinary informatics: forging the future between veterinary medicine, human medicine, and One Health initiatives-a joint paper by the Association for Veterinary Informatics (AVI) and the CTSA One Health Alliance (COHA). JAMIA Open 2020; 3:306-317. [PMID: 32734172 PMCID: PMC7382640 DOI: 10.1093/jamiaopen/ooaa005] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/14/2019] [Revised: 12/26/2019] [Accepted: 02/26/2020] [Indexed: 12/25/2022]
Abstract
Objectives: This manuscript reviews the current state of veterinary medical electronic health records and the ability to aggregate and analyze large datasets from multiple organizations and clinics. We also review analytical techniques as well as research efforts into veterinary informatics with a focus on applications relevant to human and animal medicine. Our goal is to provide references and context for these resources so that researchers can identify resources of interest and translational opportunities to advance the field. Methods and Results: This review covers various methods of veterinary informatics including natural language processing and machine learning techniques in brief and various ongoing and future projects. After detailing techniques and sources of data, we describe some of the challenges and opportunities within veterinary informatics as well as providing reviews of common One Health techniques and specific applications that affect both humans and animals. Discussion: Current limitations in the field of veterinary informatics include limited sources of training data for developing machine learning and artificial intelligence algorithms, siloed data between academic institutions, corporate institutions, and many small private practices, and inconsistent data formats that make many integration problems difficult. Despite those limitations, there have been significant advancements in the field in the last few years and continued development of a few, key, large data resources that are available for interested clinicians and researchers. These real-world use cases and applications show current and significant future potential as veterinary informatics grows in importance. Veterinary informatics can forge new possibilities within veterinary medicine and between veterinary medicine, human medicine, and One Health initiatives.
Affiliation(s)
- Jonathan L Lustgarten
  - Association for Veterinary Informatics, Dixon, California, USA
  - VCA Inc., Health Technology & Informatics, Los Angeles, California, USA
- Wayde Shipman
  - Veterinary Medical Databases, Columbia, Missouri, USA
- Elizabeth Gancher
  - Department of Infectious diseases and HIV medicine, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Tracy L Webb
  - Department of Clinical Sciences, Colorado State University, Fort Collins, Colorado, USA
11
Wei X, Ke J, Huang H, Zhou S, Guo A, Wang K, Zhan Y, Mai C, Ao W, Xie F, Luo R, Xiao J, Wei H, Chen B. Screening and Identification of Potential Biomarkers for Hepatocellular Carcinoma: An Analysis of TCGA Database and Clinical Validation. Cancer Manag Res 2020; 12:1991-2000. [PMID: 32231440 PMCID: PMC7085335 DOI: 10.2147/cmar.s239795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/23/2019] [Accepted: 02/20/2020] [Indexed: 12/12/2022]
Abstract
Introduction: Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world. Up to now, many genes associated with HCC have not yet been identified. In this study, we screened the HCC-related genes through the integrated analysis of the TCGA database, of which the potential biomarkers were also further validated by clinical specimens. The discovery of potential biomarkers for HCC provides more opportunities for diagnostic indicators or gene-targeted therapies. Methods: Cancer-related genes in The Cancer Genome Atlas (TCGA) HCC database were screened by a random forest (RF) classifier based on the RF algorithm. Proteins encoded by the candidate genes and other associated proteins obtained via protein–protein interaction (PPI) analysis were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. The newly identified genes were further validated in the HCC cell lines and clinical tissue specimens by Western blotting, immunofluorescence, and immunohistochemistry (IHC). Survival analysis verified the clinical value of genes. Results: Ten genes with the best feature importance in the RF classifier were screened as candidate genes. By comprehensive analysis of PPI, GO and KEGG, these genes were confirmed to be closely related to HCC tumors. Representative NOX4 and FLVCR1 were selected for further validation by biochemical analysis which showed upregulation in both cancer cell lines and clinical tumor tissues. High expression of NOX4 or FLVCR1 in cancer cells predicts low survival. Conclusion: Herein, we report that NOX4 and FLVCR1 are promising biomarkers for HCC that may be used as diagnostic indicators or therapeutic targets.
Affiliation(s)
- Xianli Wei
  - Department of Medical Instruments, Guangdong Food and Drug Vocational College, Guangzhou 510520, People's Republic of China
- Junzi Ke
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Haonan Huang
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Shikun Zhou
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Ao Guo
  - School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, People's Republic of China
- Kun Wang
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Yujuan Zhan
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Cong Mai
  - Department of Abdominal Surgery, Cancer Center of Guangzhou Medical University, Guangzhou 510095, People's Republic of China
- Weizhen Ao
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Fuda Xie
  - The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Guangdong Provincial Academy of Chinese Medical Sciences, Guangzhou 510006, People's Republic of China
- Rongping Luo
  - School of Foreign Language, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Jianyong Xiao
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
- Hang Wei
  - School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, People's Republic of China
- Bonan Chen
  - Department of Biochemistry, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
  - Research Center of Integrative Medicine, School of Basic Medicine, Guangzhou University of Chinese Medicine, Guangzhou 510006, People's Republic of China
12
Nie A, Pineda AL, Wright MW, Wand H, Wulf B, Costa H, Patel R, Bustamante CD, Zou J. LitGen: Genetic Literature Recommendation Guided by Human Explanations. Pac Symp Biocomput 2020; 25:67-78. [PMID: 31797587 PMCID: PMC7478937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence, e.g. biochemical assays or case-control analysis. In collaboration with the Clinical Genomic Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.
Affiliation(s)
- Allen Nie
  - Department of Biomedical Data Science, Stanford University School of Medicine
  - Department of Computer Science, Stanford University
- Arturo L. Pineda
  - Department of Biomedical Data Science, Stanford University School of Medicine
- Matt W. Wright
  - Department of Biomedical Data Science, Stanford University School of Medicine
  - Department of Pathology, Stanford University School of Medicine
- Hannah Wand
  - Department of Biomedical Data Science, Stanford University School of Medicine
  - Department of Pathology, Stanford University School of Medicine
  - Department of Cardiology, Stanford Healthcare
- Bryan Wulf
  - Department of Biomedical Data Science, Stanford University School of Medicine
- Helio Costa
  - Department of Biomedical Data Science, Stanford University School of Medicine
- Ronak Patel
  - Department of Molecular and Human Genetics, Baylor College of Medicine
- Carlos D. Bustamante
  - Department of Biomedical Data Science, Stanford University School of Medicine
  - Chan-Zuckerberg Biohub
- James Zou
  - Department of Biomedical Data Science, Stanford University School of Medicine
  - Department of Computer Science, Stanford University
  - Chan-Zuckerberg Biohub