1. Patra BG, Sharma MM, Vekaria V, Adekkanattu P, Patterson OV, Glicksberg B, Lepow LA, Ryu E, Biernacka JM, Furmanchuk A, George TJ, Hogan W, Wu Y, Yang X, Bian J, Weissman M, Wickramaratne P, Mann JJ, Olfson M, Campion TR, Weiner M, Pathak J. Extracting social determinants of health from electronic health records using natural language processing: a systematic review. J Am Med Inform Assoc 2021; 28:2716-2727. [PMID: 34613399; PMCID: PMC8633615; DOI: 10.1093/jamia/ocab170] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Received: 04/29/2021] [Revised: 07/09/2021] [Accepted: 08/04/2021] [Indexed: 11/27/2022]
Abstract
OBJECTIVE Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. MATERIALS AND METHODS A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. RESULTS Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). CONCLUSION NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
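The review notes that homelessness and other less-studied SDoH are mostly identified with rule-based approaches. The sketch below shows what such a rule can look like. It assumes a hypothetical keyword lexicon and a crude fixed-width negation window; neither comes from any reviewed system. Production pipelines typically use curated lexicons and more robust negation and context handling.

```python
import re

# Hypothetical lexicon: each SDoH category maps to trigger patterns.
SDOH_PATTERNS = {
    "homelessness": [r"\bhomeless(ness)?\b", r"\blives? in (a |his |her )?(car|shelter)\b"],
    "smoking": [r"\bsmok(es|er|ing)\b", r"\bcigarettes?\b", r"\btobacco\b"],
}

# Crude negation cues, searched only in a short character window before each match.
NEGATION_CUES = re.compile(r"\b(denies|no|not|never|negative for)\b", re.IGNORECASE)


def extract_sdoh(sentence: str, window: int = 30) -> set[str]:
    """Return the SDoH categories asserted (not negated) in one sentence."""
    found = set()
    for category, patterns in SDOH_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                prefix = sentence[max(0, match.start() - window):match.start()]
                if not NEGATION_CUES.search(prefix):
                    found.add(category)
    return found


print(extract_sdoh("Patient is currently homeless and staying in a shelter."))  # {'homelessness'}
print(extract_sdoh("Denies tobacco use."))  # set()
```

A fixed character window is easy to audit but can suppress a mention that follows an unrelated negation in the same sentence. That brittleness is one reason the review contrasts rule-based and machine learning approaches.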
Affiliation(s)
- Braja G Patra
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Mohit M Sharma
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Veer Vekaria
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Olga V Patterson
  - Department of Internal Medicine, Division of Epidemiology, University of Utah, Salt Lake City, Utah, USA
  - US Department of Veterans Affairs, Salt Lake City, Utah, USA
- Lauren A Lepow
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Euijung Ryu
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Joanna M Biernacka
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Thomas J George
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- William Hogan
  - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, Florida, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Myrna Weissman
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Priya Wickramaratne
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- J John Mann
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Mark Olfson
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Thomas R Campion
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Mark Weiner
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
2. Bompelli A, Wang Y, Wan R, Singh E, Zhou Y, Xu L, Oniani D, Kshatriya BSA, Balls-Berry J(JE), Zhang R. Social and Behavioral Determinants of Health in the Era of Artificial Intelligence with Electronic Health Records: A Scoping Review. Health Data Science 2021; 2021:9759016. [PMID: 38487504; PMCID: PMC10880156; DOI: 10.34133/2021/9759016] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/22/2021] [Accepted: 06/28/2021] [Indexed: 03/17/2024]
Abstract
Background. There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been limited review of how to make the most of SBDH information from EHRs using AI approaches. Methods. A systematic search was conducted in six databases to find relevant, recently published peer-reviewed publications. Relevance was determined by screening and evaluating the articles. Based on the selected studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results. Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, natural language processing (NLP) approaches for extracting SBDH from clinical notes, and predictive models using SBDH for health outcomes. Discussion. The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using NLP technology to support the extraction of SBDH and other clinical concepts simplifies the identification and extraction of essential concepts from clinical data, efficiently unlocks unstructured data, and aids in the resolution of unstructured data-related issues. Conclusion. Despite known associations between SBDH and diseases, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, ultimately promoting health and health equity.
Affiliation(s)
- Anusha Bompelli
  - Department of Pharmaceutical Care & Health Systems, University of Minnesota, USA
- Yanshan Wang
  - Department of Health Information Management, University of Pittsburgh, USA
- Ruyuan Wan
  - Department of Computer Science, University of Minnesota, USA
- Esha Singh
  - Department of Computer Science, University of Minnesota, USA
- Yuqi Zhou
  - Institute for Health Informatics and College of Pharmacy, University of Minnesota, USA
- Lin Xu
  - Carlson School of Business, University of Minnesota, USA
- David Oniani
  - Department of Computer Science and Mathematics, Luther College, USA
- Rui Zhang
  - Institute for Health Informatics, Department of Pharmaceutical Care & Health Systems, University of Minnesota, USA
3. Stemerman R, Arguello J, Brice J, Krishnamurthy A, Houston M, Kitzmiller R. Identification of social determinants of health using multi-label classification of electronic health record clinical notes. JAMIA Open 2021; 4:ooaa069. [PMID: 34514351; PMCID: PMC8423426; DOI: 10.1093/jamiaopen/ooaa069] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/16/2020] [Revised: 11/16/2020] [Accepted: 11/20/2020] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department. METHODS AND MATERIALS We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 identified SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). We adopted 5 common evaluation measures (accuracy, average precision-recall (AP), area under the receiver operating characteristic curve (AUC-ROC), Hamming loss, and log loss) to compare the performance of different methods for MLL classification, using the F1 score as the primary evaluation metric. RESULTS Our results suggested that, overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of MLL models of SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (N = 1119) had at least one positive documentation of SDH. DISCUSSION AND CONCLUSION The proposed approach of training an MLL model on an SDH-rich data source can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with lexical diversity by health professionals and the auto-generation of clinical note sentences to document SDH.
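Hamming loss, one of the measures listed above, is the fraction of individual label decisions that are wrong, averaged over all sentences and all SDH domains. The plain-Python sketch below assumes a binary label matrix with one column per domain. The toy labels are invented, not taken from the study's corpus.

```python
def hamming_loss(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of (sample, label) pairs where the prediction disagrees with the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mismatches = 0
    total = 0
    for gold_row, pred_row in zip(y_true, y_pred):
        if len(gold_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        mismatches += sum(g != p for g, p in zip(gold_row, pred_row))
        total += len(gold_row)
    return mismatches / total


# Toy example: 3 sentences x 6 hypothetical SDH domains.
gold = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],  # misses one domain
    [0, 0, 0, 1, 0, 1],  # adds one spurious domain
]
print(hamming_loss(gold, pred))  # 2 mismatches out of 18 label decisions
```

Lower is better. On this reading, a Hamming loss of 0.12 means roughly 12% of individual sentence-domain decisions were wrong. scikit-learn's `sklearn.metrics.hamming_loss` computes the same quantity for binary indicator matrices.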
Affiliation(s)
- Rachel Stemerman
  - Carolina Health Informatics Program, The University of North Carolina, Chapel Hill, North Carolina, USA
- Jaime Arguello
  - School of Information and Library Sciences, The University of North Carolina, Chapel Hill, North Carolina, USA
- Jane Brice
  - Department of Emergency Medicine, The University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Ashok Krishnamurthy
  - Department of Computer Science, The University of North Carolina, Chapel Hill, North Carolina, USA
- Mary Houston
  - Department of Emergency Medicine, The University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Rebecca Kitzmiller
  - School of Nursing, The University of North Carolina, Chapel Hill, North Carolina, USA
4. Chilman N, Song X, Roberts A, Tolani E, Stewart R, Chui Z, Birnie K, Harber-Aschan L, Gazard B, Chandran D, Sanyal J, Hatch S, Kolliakou A, Das-Munshi J. Text mining occupations from the mental health electronic health record: a natural language processing approach using records from the Clinical Record Interactive Search (CRIS) platform in south London, UK. BMJ Open 2021; 11:e042274. [PMID: 33766838; PMCID: PMC7996661; DOI: 10.1136/bmjopen-2020-042274] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/01/2020] [Revised: 11/06/2020] [Accepted: 11/10/2020] [Indexed: 11/09/2022]
Abstract
OBJECTIVES We set out to develop, evaluate and implement a novel application using natural language processing to text mine occupations from the free-text of psychiatric clinical notes. DESIGN Development and validation of a natural language processing application using General Architecture for Text Engineering software to extract occupations from de-identified clinical records. SETTING AND PARTICIPANTS Electronic health records from a large secondary mental healthcare provider in south London, accessed through the Clinical Record Interactive Search platform. The text mining application was run over the free-text fields in the electronic health records of 341 720 patients (all aged ≥16 years). OUTCOMES Precision and recall estimates of the application performance; occupation retrieval using the application compared with structured fields; most common patient occupations; and analysis of key sociodemographic and clinical indicators for occupation recording. RESULTS Using the structured fields alone, only 14% of patients had occupation recorded. By implementing the text mining application in addition to the structured fields, occupations were identified in 57% of patients. The application performed on gold-standard human-annotated clinical text at a precision level of 0.79 and recall level of 0.77. The most common patient occupations recorded were 'student' and 'unemployed'. Patients with more service contact were more likely to have an occupation recorded, as were patients of a male gender, older age and those living in areas of lower deprivation. CONCLUSION This is the first time a natural language processing application has been used to successfully derive patient-level occupations from the free-text of electronic mental health records, performing with good levels of precision and recall, and applied at scale. This may be used to inform clinical studies relating to the broader social determinants of health using electronic health records.
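The abstract reports precision (0.79) and recall (0.77) but not their harmonic mean. Under the standard definitions, precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R). For the reported values this gives an F1 of about 0.78. The helper names below are hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


print(round(f1_from(0.79, 0.77), 3))  # 0.78
```

Because precision and recall are close here, F1 sits close to both. The harmonic mean would fall further below the arithmetic mean if one of them were much lower.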
Affiliation(s)
- Natasha Chilman
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Xingyi Song
  - Department of Computer Science, University of Sheffield, Sheffield, UK
- Angus Roberts
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Esther Tolani
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Robert Stewart
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Zoe Chui
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Karen Birnie
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - King's College Hospital NHS Trust, London, UK
- Lisa Harber-Aschan
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Billy Gazard
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- David Chandran
  - South London and Maudsley NHS Foundation Trust, London, UK
- Jyoti Sanyal
  - South London and Maudsley NHS Foundation Trust, London, UK
- Stephani Hatch
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - Economic and Social Research Council (ESRC) Centre for Society and Mental Health, King's College London, London, UK
- Anna Kolliakou
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Jayati Das-Munshi
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
  - Economic and Social Research Council (ESRC) Centre for Society and Mental Health, King's College London, London, UK
5. Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021; 85:138-144. [PMID: 33461032; DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022]
Abstract
As automated data extraction and natural language processing (NLP) rapidly evolve, improving healthcare delivery by harnessing large datasets is garnering great interest. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.
Affiliation(s)
- Barbara M Decker
  - Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
- Chloé E Hill
  - Department of Neurology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States
- Steven N Baldassano
  - Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
- Pouya Khankhanian
  - Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
6. Cooke Bailey JN, Bush WS, Crawford DC. Editorial: The Importance of Diversity in Precision Medicine Research. Front Genet 2020; 11:875. [PMID: 33005167; PMCID: PMC7479241; DOI: 10.3389/fgene.2020.00875] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/22/2020] [Accepted: 07/17/2020] [Indexed: 11/13/2022]
Affiliation(s)
- Jessica N. Cooke Bailey
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States
- William S. Bush
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States
- Dana C. Crawford
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States
7. Hollister BM, Farber-Eger E, Aldrich MC, Crawford DC. A Social Determinant of Health May Modify Genetic Associations for Blood Pressure: Evidence From a SNP by Education Interaction in an African American Population. Front Genet 2019; 10:428. [PMID: 31134134; PMCID: PMC6523518; DOI: 10.3389/fgene.2019.00428] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/30/2018] [Accepted: 04/18/2019] [Indexed: 01/11/2023]
Abstract
African Americans experience the highest burden of hypertension in the United States compared with other groups. Genetic contributions to this complex condition are now emerging in this as well as other populations through large-scale genome-wide association studies (GWAS) and meta-analyses. Despite these recent discovery efforts, relatively few large-scale studies of blood pressure have considered the joint influence of genetics and social determinants of health, despite extensive evidence supporting their impact on hypertension. To identify these expected interactions, we accessed a subset of the Vanderbilt University Medical Center (VUMC) biorepository linked to de-identified electronic health records (EHRs) of adult African Americans genotyped using the Illumina Metabochip (n = 2,577). To examine potential interactions between education, a recognized social determinant of health, and genetic variants contributing to blood pressure, we used linear regression models to investigate two-way interactions for systolic blood pressure and diastolic blood pressure (DBP). We identified a two-way interaction between rs6687976 and education affecting DBP (p = 0.052). Individuals homozygous for the minor allele and having less than a high school education had higher DBP compared with (1) individuals homozygous for the minor allele and high school education or greater and (2) individuals not homozygous for the minor allele and less than a high school education. To our knowledge, this is the first EHR-based study to suggest a gene-environment interaction for blood pressure in African Americans, supporting the hypothesis that genetic contributions to hypertension may be modulated by social factors.
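The abstract describes linear regression models with a two-way SNP-by-education interaction. The sketch below fits that model form under several assumptions:

- the SNP uses recessive coding (1 = homozygous for the minor allele);
- education is a binary indicator (1 = less than high school);
- any covariates the study may have adjusted for are omitted;
- the data are synthetic, with made-up effect sizes.

It also assumes NumPy is available. A positive `snp_x_education` coefficient means the combination raises DBP beyond the sum of the two main effects.

```python
import numpy as np


def fit_gxe(minor_hom: np.ndarray, low_edu: np.ndarray, dbp: np.ndarray) -> dict[str, float]:
    """OLS fit of DBP = b0 + b1*minor_hom + b2*low_edu + b3*(minor_hom*low_edu)."""
    X = np.column_stack([
        np.ones_like(dbp, dtype=float),  # intercept
        minor_hom,                        # 1 if homozygous for the minor allele
        low_edu,                          # 1 if less than high school education
        minor_hom * low_edu,              # SNP-by-education interaction term
    ])
    coef, *_ = np.linalg.lstsq(X, dbp, rcond=None)
    return dict(zip(["intercept", "snp", "education", "snp_x_education"], coef.tolist()))


# Synthetic data with hypothetical effect sizes (not the study's estimates).
rng = np.random.default_rng(0)
n = 2000
minor_hom = rng.integers(0, 2, n)
low_edu = rng.integers(0, 2, n)
dbp = 75 + 1.0 * minor_hom + 0.5 * low_edu + 4.0 * minor_hom * low_edu + rng.normal(0, 1, n)
print(fit_gxe(minor_hom, low_edu, dbp))
```

In practice a statistics package would also report standard errors and a p-value for the interaction coefficient. A bare least-squares solve, as here, returns only the point estimates.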
Affiliation(s)
- Brittany M Hollister
  - Social and Behavioral Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States
- Eric Farber-Eger
  - Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, United States
- Melinda C Aldrich
  - Department of Thoracic Surgery, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, United States
- Dana C Crawford
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, United States
8. Pendergrass SA, Crawford DC. Using Electronic Health Records To Generate Phenotypes For Research. Current Protocols in Human Genetics 2019; 100:e80. [PMID: 30516347; PMCID: PMC6318047; DOI: 10.1002/cphg.80] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 12/19/2022]
Abstract
Electronic health records contain patient-level data collected during and for clinical care. Data within the electronic health record include diagnostic billing codes, procedure codes, vital signs, laboratory test results, clinical imaging, and physician notes. With repeated clinic visits, these data are longitudinal, providing important information on disease development, progression, and response to treatment or intervention strategies. The near universal adoption of electronic health records nationally has the potential to provide population-scale real-world clinical data accessible for biomedical research, including genetic association studies. For this research potential to be realized, high-quality research-grade variables must be extracted from these clinical data warehouses. We describe here common and emerging electronic phenotyping approaches applied to electronic health records, as well as current limitations of both the approaches and the biases associated with these clinically collected data that impact their use in research. © 2018 by John Wiley & Sons, Inc.
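Many electronic phenotyping algorithms start from rules over diagnostic billing codes. The sketch below shows one common rule pattern. The record layout is hypothetical, and the rule ("qualifying hypertension codes on at least two distinct service dates") is illustrative only, not a validated phenotype definition. Requiring distinct dates helps guard against a single miscoded encounter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    """One billing-code occurrence in a hypothetical EHR extract."""
    patient_id: str
    code: str  # ICD-10-CM code
    service_date: date


# Illustrative code set: I10 (essential hypertension) and the I11-I13 hypertensive disease codes.
CASE_PREFIXES = ("I10", "I11", "I12", "I13")


def hypertension_cases(diagnoses: list[Diagnosis], min_dates: int = 2) -> set[str]:
    """Patients with qualifying codes on at least `min_dates` distinct service dates."""
    dates_by_patient: dict[str, set[date]] = {}
    for dx in diagnoses:
        if dx.code.startswith(CASE_PREFIXES):
            dates_by_patient.setdefault(dx.patient_id, set()).add(dx.service_date)
    return {pid for pid, dates in dates_by_patient.items() if len(dates) >= min_dates}


records = [
    Diagnosis("p1", "I10", date(2018, 1, 5)),
    Diagnosis("p1", "I10", date(2018, 3, 2)),
    Diagnosis("p2", "I10", date(2018, 1, 5)),
    Diagnosis("p2", "I10", date(2018, 1, 5)),  # same day twice counts once
    Diagnosis("p3", "E11.9", date(2018, 2, 1)),  # not a hypertension code
]
print(hypertension_cases(records))  # {'p1'}
```

Real phenotype algorithms usually combine such code rules with medications, laboratory values, or NLP over notes, and they are validated against chart review.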
Affiliation(s)
- Sarah A. Pendergrass
  - Biomedical and Translational Informatics Institute, Geisinger Research, Rockville MD
- Dana C. Crawford
  - Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH
9. Wang H, Liu X, Tao Y, Ye W, Jin Q, Cohen WW, Xing EP. Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning. Pacific Symposium on Biocomputing 2019; 24:112-123. [PMID: 30864315; PMCID: PMC6417822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The increasing amount of scientific literature in biological and biomedical research has created a challenge in the continuous and reliable curation of newly discovered knowledge, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificial intelligence reader that can automatically identify authentic articles and effectively acquire the knowledge they convey. We construct a system whose current primary task is to build a database of genetic associations between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as a bidirectional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example in constructing a genetic association database. 3) We release our implementation as a generic framework for researchers in the community to conveniently construct other databases.
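The system organizes its query, select, and read behaviors with a Deep Q-Network. A DQN approximates the action-value function Q(s, a) with a neural network and trains it toward the same temporal-difference target used in tabular Q-learning. The sketch below shows that target with a simple table instead of a network. The states, actions, reward, and hyperparameters are hypothetical, not the authors' environment.

```python
from collections import defaultdict

ACTIONS = ("query", "select", "read", "stop")  # hypothetical action set


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0
q[("abstract_read", "read")] = 2.0
value = q_update(q, "results_listed", "select", reward=1.0, next_state="abstract_read")
print(value)  # 0.1 * (1.0 + 0.9 * 2.0 - 0.0), about 0.28
```

A DQN replaces the table lookup with a network forward pass. It typically adds experience replay and a separate target network to stabilize training.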
Affiliation(s)
- Haohan Wang
  - Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiang Liu
  - Chinese University of Hong Kong, Shenzhen, China
- Yifeng Tao
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenting Ye
  - Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Qiao Jin
  - Tsinghua University, Beijing, China
- William W. Cohen
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
  - Google AI, Pittsburgh, PA, USA
- Eric P. Xing
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
  - Petuum Inc., Pittsburgh, PA, USA
10. Alegría M, NeMoyer A, Falgas I, Wang Y, Alvarez K. Social Determinants of Mental Health: Where We Are and Where We Need to Go. Curr Psychiatry Rep 2018; 20:95. [PMID: 30221308; PMCID: PMC6181118; DOI: 10.1007/s11920-018-0969-9] [Citation(s) in RCA: 387] [Impact Index Per Article: 55.3] [Indexed: 12/15/2022]
Abstract
PURPOSE OF REVIEW The present review synthesizes recent literature on social determinants and mental health outcomes and provides recommendations for how to advance the field. We summarize current studies related to changes in the conceptualization of social determinants, how social determinants impact mental health, what we have learned from social determinant interventions, and new methods to collect, use, and analyze social determinant data. RECENT FINDINGS Recent research has increasingly focused on interactions between multiple social determinants, interventions to address upstream causes of mental health challenges, and use of simulation models to represent complex systems. However, methodological challenges and inconsistent findings prevent a definitive understanding of which social determinants should be addressed to improve mental health, and within what populations these interventions may be most effective. Recent advances in strategies to collect, evaluate, and analyze social determinants suggest the potential to better appraise their impact and to implement relevant interventions.
Affiliation(s)
- Margarita Alegría
  - Disparities Research Unit, Department of Medicine, Massachusetts General Hospital, 50 Staniford Street, Suite 830, Boston, MA, 02114, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Amanda NeMoyer
  - Disparities Research Unit, Department of Medicine, Massachusetts General Hospital
  - Department of Health Care Policy, Harvard Medical School
- Irene Falgas
  - Disparities Research Unit, Department of Medicine, Massachusetts General Hospital
- Ye Wang
  - Disparities Research Unit, Department of Medicine, Massachusetts General Hospital
- Kiara Alvarez
  - Disparities Research Unit, Department of Medicine, Massachusetts General Hospital
  - Department of Psychiatry, Harvard Medical School
11. Hindorff LA, Bonham VL, Ohno-Machado L. Enhancing diversity to reduce health information disparities and build an evidence base for genomic medicine. Per Med 2018; 15:403-412. [PMID: 30209973; PMCID: PMC6287493; DOI: 10.2217/pme-2018-0037] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 04/26/2018] [Accepted: 06/27/2018] [Indexed: 12/17/2022]
Abstract
Advances in genomic medicine are arising from efforts to build a national learning healthcare system (LHS) and large-scale precision medicine studies. However, the underlying evidence base lacks sufficient data from populations historically underrepresented in biomedical research. Although the literature on health and healthcare disparities is extensive, disparities in the availability and quality of health information about diverse and underrepresented populations are less well characterized. This Perspective describes scientific and ethical benefits to incorporating health information from diverse and underrepresented populations in the LHS, resulting in a more robust and generalizable LHS. Near-term recommendations for incorporating diversity into the evidence base for genomic medicine are proposed, even as the groundwork for national and international efforts is underway.
Affiliation(s)
- Lucia A Hindorff
  - Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Vence L Bonham
  - Division of Intramural Research, Social & Behavioral Research Branch & Office of the Director, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
12. Blizinsky KD, Bonham VL. Leveraging the Learning Health Care Model to Improve Equity in the Age of Genomic Medicine. Learn Health Syst 2018; 2:e10046. [PMID: 29457138; PMCID: PMC5813818; DOI: 10.1002/lrh2.10046] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/24/2017] [Revised: 09/22/2017] [Accepted: 10/17/2017] [Indexed: 01/09/2023]
Abstract
To fully achieve the goals of a genomics-enabled learning health care system, purposeful efforts to understand and reduce health disparities and improve equity of care are essential. This paper highlights three major challenges facing genomics-enabled learning health care systems, as they pertain to ancestrally diverse populations: inequality in the utility of genomic medicine; lack of access to pharmacogenomics in clinical care; and inadequate incorporation of social and environmental data into the electronic health care record (EHR). We advance a framework that can not only be used to directly improve care for all within the learning health system, but can also be used to focus on the needs to address racial and ethnic health disparities and improve health equity.
Affiliation(s)
- Katherine D. Blizinsky
- Social and Behavioral Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- All of Us Research Program, National Institutes of Health, Rockville, Maryland
- Rush Alzheimer's Disease Center, Rush University, Chicago, Illinois
- Vence L. Bonham
- Social and Behavioral Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
13
Kasthurirathne SN, Vest JR, Menachemi N, Halverson PK, Grannis SJ. Assessing the capacity of social determinants of health data to augment predictive models identifying patients in need of wraparound social services. J Am Med Inform Assoc 2018; 25:47-53. [PMID: 29177457 PMCID: PMC7647142 DOI: 10.1093/jamia/ocx130] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 05/30/2017] [Revised: 08/15/2017] [Accepted: 10/19/2017] [Indexed: 11/12/2022] [Open access]
Abstract
Introduction A growing variety of diverse data sources is emerging to better inform health care delivery and health outcomes. We sought to evaluate the capacity for clinical, socioeconomic, and public health data sources to predict the need for various social service referrals among patients at a safety-net hospital. Materials and Methods We integrated patient clinical data and community-level data representing patients' social determinants of health (SDH) obtained from multiple sources to build random forest decision models to predict the need for any, mental health, dietitian, social work, or other SDH service referrals. To assess the impact of SDH on improving performance, we built separate decision models using clinical and SDH determinants and clinical data only. Results Decision models predicting the need for any, mental health, and dietitian referrals yielded sensitivity, specificity, and accuracy measures ranging between 60% and 75%. Specificity and accuracy scores for social work and other SDH services ranged between 67% and 77%, while sensitivity scores were between 50% and 63%. Area under the receiver operating characteristic curve values for the decision models ranged between 70% and 78%. Models for predicting the need for any services reported positive predictive values between 65% and 73%. Positive predictive values for predicting individual outcomes were below 40%. Discussion The need for various social service referrals can be predicted with considerable accuracy using a wide range of readily available clinical and community data that measure socioeconomic and public health conditions. While the use of SDH did not result in significant performance improvements, our approach represents a novel and important application of risk predictive modeling.
Affiliation(s)
- Joshua R Vest
- Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, IN, USA
- Regenstrief Institute, Indianapolis, IN, USA
- Nir Menachemi
- Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, IN, USA
- Regenstrief Institute, Indianapolis, IN, USA
- Paul K Halverson
- Indiana University Richard M. Fairbanks School of Public Health, Indianapolis, IN, USA
- Shaun J Grannis
- Regenstrief Institute, Indianapolis, IN, USA
- Indiana University School of Medicine, Indianapolis, IN, USA
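Every metric in the Kasthurirathne et al. abstract except the area under the ROC curve comes from a binary confusion matrix. The Python sketch below is an illustration, not the authors' code. It computes those metrics and shows one general reason a model can post 60% to 75% sensitivity, specificity, and accuracy while its positive predictive value stays much lower: when the predicted need is uncommon, false positives drawn from the large no-need group can outnumber true positives. The `ConfusionMatrix` type, the `referral_metrics` function, and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary "needs referral" classifier (illustrative only)."""
    tp: int  # predicted need, actual need
    fp: int  # predicted need, no actual need
    tn: int  # predicted no need, no actual need
    fn: int  # predicted no need, actual need


def referral_metrics(cm: ConfusionMatrix) -> dict:
    """Return sensitivity, specificity, accuracy, and PPV as fractions in [0, 1]."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # share of true needs that are flagged
        "specificity": cm.tn / (cm.tn + cm.fp),  # share of non-needs left unflagged
        "accuracy": (cm.tp + cm.tn) / total,     # share of all patients classified correctly
        "ppv": cm.tp / (cm.tp + cm.fp),          # share of flagged patients who truly need it
    }


if __name__ == "__main__":
    # Hypothetical cohort of 1,000 patients, 10% of whom need the service.
    # These counts are invented for illustration, NOT taken from the study.
    cm = ConfusionMatrix(tp=60, fp=225, tn=675, fn=40)
    for name, value in referral_metrics(cm).items():
        print(f"{name}: {value:.1%}")
```

With these invented counts, sensitivity is 60%, specificity 75%, and accuracy 73.5%, yet PPV is only about 21%, because the 225 false positives from the no-need group outnumber the 60 true positives.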
14
Extracting Country-of-Origin from Electronic Health Records for Gene-Environment Studies as Part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Study. AMIA Jt Summits Transl Sci Proc 2017; 2017:50-57. [PMID: 28815105 PMCID: PMC5543359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
We describe here the extraction of country-of-origin, an acculturation variable relevant for gene-environment studies, in a biorepository linked to de-identified electronic health records (EHRs) assessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE), a study site of the Population Architecture using Genomics and Epidemiology (PAGE) I study. We extracted country-of-origin from the unstructured clinical free text using regular expressions within the MySQL relational database system in a cohort of 15,863 subjects of mostly non-European descent (including 11,519 African Americans, 1,702 Hispanics, and 1,118 Asians). We performed searches for 231 world countries (including independent sovereign states, dependent areas, and disputed territories) and common misspellings in >14 gigabytes of data including >13 billion characters of clinical text. Manual review of a fraction of the initial country-of-origin assignments established rules for data cleaning and quality control to achieve final country-of-origin status for each subject. After data cleaning, a total of 1,911/15,893 (12.02%) subjects were assigned to a country-of-origin outside of the United States. Mexico was the most commonly assigned country outside of the United States (264 subjects; 13.8% of subjects with a foreign country-of-origin assignment). The distribution of the countries assigned followed expectations based on known migration patterns to the United States with an emphasis on the southeastern region. These data suggest country-of-origin can be successfully extracted from unstructured clinical text for downstream genetic association studies.
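The abstract above describes regular-expression searches for country names and common misspellings, followed by rule-based cleaning derived from manual review. The study ran its expressions inside MySQL; the sketch below swaps in Python's `re` module to illustrate the same idea. The four-country subset, the misspellings, the "New Mexico" exclusion, and the exactly-one-country assignment rule are all hypothetical stand-ins, not the study's actual lists or rules.

```python
from __future__ import annotations

import re

# Hypothetical subset of the 231 country names searched in the study, each
# mapped to assumed spelling variants and misspellings (not the study's list).
COUNTRY_VARIANTS: dict[str, list[str]] = {
    "Mexico": ["Mexico", "Mexcio", "Meixco"],
    "Guatemala": ["Guatemala", "Guatamala"],
    "Vietnam": ["Vietnam", "Viet Nam"],
    "Nigeria": ["Nigeria", "Nigera"],
}


def build_pattern(variants: dict[str, list[str]]) -> re.Pattern:
    """Compile one case-insensitive alternation of all spellings.

    Longer spellings are listed first so multi-word names are preferred, and
    word boundaries stop matches inside longer words. The negative lookbehind
    is an example cleaning rule (assumed, not from the study) that keeps the
    U.S. state "New Mexico" from being read as the country Mexico.
    """
    spellings = sorted({s for forms in variants.values() for s in forms},
                       key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in spellings)
    return re.compile(rf"(?<!New )\b(?:{alternation})\b", re.IGNORECASE)


PATTERN = build_pattern(COUNTRY_VARIANTS)
# Map every lower-cased spelling back to its canonical country name.
LOOKUP = {s.lower(): country
          for country, forms in COUNTRY_VARIANTS.items() for s in forms}


def extract_countries(note: str) -> set[str]:
    """Return the canonical country names mentioned in one clinical note."""
    return {LOOKUP[m.group(0).lower()] for m in PATTERN.finditer(note)}


def assign_country(notes: list[str]) -> str | None:
    """Assign a country only if a subject's notes name exactly one (assumed rule).

    Returns None for zero or conflicting mentions so those subjects can be
    routed to manual review instead of being assigned automatically.
    """
    found: set[str] = set()
    for note in notes:
        found |= extract_countries(note)
    return found.pop() if len(found) == 1 else None


if __name__ == "__main__":
    subject_notes = [
        "45 yo woman born in Mexcio, moved to Tennessee in 2001.",
        "Sister visiting from New Mexico this week.",
    ]
    print(assign_country(subject_notes))
```

Here the misspelled "Mexcio" in the first note resolves to Mexico, while "New Mexico" in the second is excluded by the lookbehind, so the call prints `Mexico`. Routing ambiguous subjects to review rather than guessing mirrors, in spirit, the abstract's use of manual review to set cleaning rules.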