1
Machine learning approaches for classification of colorectal cancer with and without feature selection method on microarray data. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101419] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023]
2
Grossman LV, Masterson Creber RM, Benda NC, Wright D, Vawdrey DK, Ancker JS. Interventions to increase patient portal use in vulnerable populations: a systematic review. J Am Med Inform Assoc 2021; 26:855-870. [PMID: 30958532 DOI: 10.1093/jamia/ocz023] [Citation(s) in RCA: 137] [Impact Index Per Article: 34.3] [Received: 12/28/2018] [Revised: 02/05/2019] [Accepted: 02/12/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND More than 100 studies document disparities in patient portal use among vulnerable populations. Developing and testing strategies to reduce disparities in use is essential to ensure portals benefit all populations. OBJECTIVE To systematically review the impact of interventions designed to: (1) increase portal use or predictors of use in vulnerable patient populations, or (2) reduce disparities in use. MATERIALS AND METHODS A librarian searched Ovid MEDLINE, EMBASE, CINAHL, and Cochrane Reviews for studies published before September 1, 2018. Two reviewers independently selected English-language research articles that evaluated any interventions designed to impact an eligible outcome. One reviewer extracted data and categorized interventions, then another assessed accuracy. Two reviewers independently assessed risk of bias. RESULTS Out of 18 included studies, 15 (83%) assessed an intervention's impact on portal use, 7 (39%) on predictors of use, and 1 (6%) on disparities in use. Most interventions studied focused on the individual (13 out of 26, 50%), as opposed to facilitating conditions, such as the tool, task, environment, or organization (SEIPS model). Twelve studies (67%) reported a statistically significant increase in portal use or predictors of use, or reduced disparities. Five studies (28%) had high or unclear risk of bias. CONCLUSION Individually focused interventions have the most evidence for increasing portal use in vulnerable populations. Interventions affecting other system elements (tool, task, environment, organization) have not been sufficiently studied to draw conclusions. Given the well-established evidence for disparities in use and the limited research on effective interventions, research should move beyond identifying disparities to systematically addressing them at multiple levels.
Affiliation(s)
- Lisa V Grossman
  - Department of Biomedical Informatics, College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Natalie C Benda
  - Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA
- Drew Wright
  - Samuel J Wood Library, Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- David K Vawdrey
  - Department of Biomedical Informatics, College of Physicians and Surgeons, Columbia University, New York, New York, USA; Value Institute, NewYork-Presbyterian Hospital, New York, New York, USA
- Jessica S Ancker
  - Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA
3
Qummar S, Khan FG, Shah S, Khan A, Din A, Gao J. Deep Learning Techniques for Diabetic Retinopathy Detection. Curr Med Imaging 2021; 16:1201-1213. [DOI: 10.2174/1573405616666200213114026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2019] [Revised: 11/26/2019] [Accepted: 12/19/2019] [Indexed: 11/22/2022]
Abstract
Diabetes results from excess glucose in the blood and may affect many organs of the body. Elevated blood sugar causes many problems, including diabetic retinopathy (DR). DR occurs due to damage to the blood vessels in the retina. Manual detection of DR by ophthalmologists is complicated and time-consuming. Automatic detection is therefore required, and various machine learning and deep learning techniques have recently been applied to detect and classify DR. In this paper, we survey the techniques available in the literature for the identification and classification of DR, discuss the strengths and weaknesses of the datasets available for each method, and provide future directions. We also discuss the different steps of detection: segmentation of blood vessels in the retina, detection of lesions, and detection of other abnormalities of DR.
Affiliation(s)
- Sehrish Qummar
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Fiaz Gul Khan
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Sajid Shah
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Ahmad Khan
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Ahmad Din
  - Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Jinfeng Gao
  - Department of Information Engineering, Huanghuai University, Henan, China
4
Liu S, Wang Y, Wen A, Wang L, Hong N, Shen F, Bedrick S, Hersh W, Liu H. Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation. JMIR Med Inform 2020; 8:e17376. [PMID: 33021486 PMCID: PMC7576539 DOI: 10.2196/17376] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/10/2019] [Revised: 06/04/2020] [Accepted: 07/28/2020] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in extracting the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS CREATE is a proof-of-concept system that combines structured queries with information retrieval techniques applied to natural language processing results to improve cohort retrieval performance, using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and the document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text, with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS The implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text on complex textual cohort queries.
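The precision at 5 metric reported in this abstract can be illustrated with a minimal sketch. This is not the CREATE evaluation code; the function names and the patient identifiers below are hypothetical, and the sketch assumes the common definition in which the number of relevant items among the top 5 results is divided by 5.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k ranked items that are relevant (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def mean_precision_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average precision-at-k over several (ranking, relevant set) pairs, one per query."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


# Hypothetical rankings for two cohort queries (identifiers are made up).
query_runs = [
    (["p1", "p2", "p3", "p4", "p5"], {"p1", "p2", "p3", "p4"}),  # 4 of 5 relevant
    (["p9", "p8", "p7", "p6", "p5"], {"p9", "p8", "p7", "p6", "p5"}),  # 5 of 5 relevant
]
mean_score = mean_precision_at_k(query_runs)
```

A mean over per-query scores, as in `mean_precision_at_k`, is one plausible reading of how a single figure is obtained from 5 queries.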
Affiliation(s)
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Liwei Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Na Hong
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Feichen Shen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Steven Bedrick
  - Department of Computer Science and Electrical Engineering, Oregon Health & Science University, Portland, OR, United States
- William Hersh
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
5
Sarrouti M, Ouatik El Alaoui S. SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions. Artif Intell Med 2019; 102:101767. [PMID: 31980104 DOI: 10.1016/j.artmed.2019.101767] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/04/2018] [Revised: 11/19/2019] [Accepted: 11/19/2019] [Indexed: 12/11/2022]
Abstract
BACKGROUND AND OBJECTIVE Question answering (QA), the identification of short, accurate answers to users' questions written in natural language, is a longstanding problem that has been widely studied in the open domain over the last decades. However, it remains a real challenge in the biomedical domain, as most existing systems support a limited number of question and answer types and still require further effort to improve their precision on the supported questions. Here, we present a semantic biomedical QA system named SemBioNLQA, which can handle yes/no, factoid, list, and summary natural language questions. METHODS This paper describes the system architecture and an evaluation of the developed end-to-end biomedical QA system, SemBioNLQA, which consists of question classification, document retrieval, passage retrieval, and answer extraction modules. It takes natural language questions as input and outputs both short precise answers and summaries as results. The SemBioNLQA system, dealing with four types of questions, is based on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words and UMLS concepts for passage retrieval, and (4) UMLS metathesaurus, BioPortal synonyms, sentiment analysis and term frequency metric for answer extraction. RESULTS AND CONCLUSION Compared with the current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, has the potential to deal with a large number of question and answer types. SemBioNLQA quickly addresses users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name, etc.) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid, and list questions, whereas it provides only ideal answers for summary questions. Moreover, experimental evaluations performed on biomedical questions and answers provided by the BioASQ challenge, in particular in 2015, 2016, and 2017 (as part of our participation), show that SemBioNLQA performs well compared with the most recent state-of-the-art systems and offers a practical and competitive alternative to help information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.
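The abstract names the BM25 model as part of passage retrieval. The following is a minimal sketch of Okapi BM25 scoring, not the SemBioNLQA implementation: it assumes plain lowercase whitespace tokens (the paper uses stemmed words and UMLS concepts), the widely used defaults `k1=1.5` and `b=0.75`, and a smoothed IDF variant. The function name and the example passages are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, passages: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each passage against the query with Okapi BM25 (whitespace tokens)."""
    tokenized = [p.lower().split() for p in passages]
    n = len(tokenized)
    if n == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n

    # Document frequency: in how many passages each term occurs at least once.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps it positive for very common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical passages; the second shares no term with the query.
example_passages = [
    "aspirin reduces fever",
    "the retina contains blood vessels",
    "fever and aspirin dosage in adults",
]
example_scores = bm25_scores("aspirin fever", example_passages)
```

Passages would then be ranked by descending score; a passage that shares no term with the query scores zero.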
Affiliation(s)
- Mourad Sarrouti
  - Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD.
- Said Ouatik El Alaoui
  - National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco
6
Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:139-153. [PMID: 29994486 PMCID: PMC6388621 DOI: 10.1109/tcbb.2018.2849968] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Indexed: 06/08/2023]
Abstract
This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI), and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives.
Affiliation(s)
- Zexian Zeng
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
- Yu Deng
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
- Xiaoyu Li
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
- Tristan Naumann
  - Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139.
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
7
Chen J, Jagannatha AN, Fodeh SJ, Yu H. Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach. JMIR Med Inform 2017; 5:e42. [PMID: 29089288 PMCID: PMC5686421 DOI: 10.2196/medinform.8531] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/22/2017] [Revised: 09/19/2017] [Accepted: 09/20/2017] [Indexed: 11/13/2022] Open
Abstract
Background Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. Objective We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms. Methods Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. Results The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P<.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS’s performance substantially. Conclusions ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS’s performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
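The average precision figures quoted in this abstract (0.850 and 0.819) measure ranking quality. A minimal sketch of one common definition follows: the mean of the precision values taken at each rank where an important term appears. Whether the paper uses exactly this variant is an assumption, and the function name and example labels are hypothetical.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    ranked_labels[i] is True when the item at rank i + 1 (best first) is
    relevant, for example an EHR term annotated as important.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # With no relevant item the metric is undefined; 0.0 is returned by convention here.
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: important terms at ranks 1 and 3.
example_ap = average_precision([True, False, True])
```

A ranker that places all important terms ahead of all nonimportant ones scores 1.0 under this definition.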
Affiliation(s)
- Jinying Chen
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States
- Samah J Fodeh
  - Yale Center for Medical Informatics, Yale University, New Haven, CT, United States
- Hong Yu
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States; Bedford Veterans Affairs Medical Center, Bedford, MA, United States