1
Chaturvedi J, Velupillai S, Stewart R, Roberts A. Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach. Stud Health Technol Inform 2024; 310:695-699. [PMID: 38269898 DOI: 10.3233/shti231054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source for studying this overlap. However, much of the information on pain is held in the free text of these records, where mentions of pain present a distinctive natural language processing problem because of their ambiguity. This project uses data from an anonymised mental health electronic health records database. A machine learning-based classification algorithm is trained to classify sentences as discussing patient pain or not, which will facilitate the extraction of relevant pain information from large databases. A total of 1,985 documents were manually triple-annotated to create gold-standard training data, which was used to train four classification algorithms. The best-performing model achieved an F1-score of 0.98 (95% CI 0.98-0.99).
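The abstract reports an F1-score with a 95% confidence interval but does not say how the interval was obtained. A percentile bootstrap over test sentences is one common way to compute such an interval; the sketch below assumes that method and binary sentence labels (1 = discusses patient pain). The function names `f1_score` and `bootstrap_f1_ci` are hypothetical, and this is not the authors' code.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = sentence discusses patient pain).

    Returns 0.0 when there are no true positives, which also covers
    resamples that happen to contain no positive sentences.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point F1 plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample sentences with replacement, keeping gold/pred pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[round(alpha / 2 * n_boot)]
    hi = scores[round((1 - alpha / 2) * n_boot) - 1]
    return f1_score(y_true, y_pred), (lo, hi)
```

Calling `bootstrap_f1_ci(gold, predicted)` on held-out sentence labels returns the point estimate together with the interval bounds; a narrow interval such as 0.98-0.99 is what one would expect from a large test set with few errors.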
Affiliation(s)
- Jaya Chaturvedi
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London
- Sumithra Velupillai
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London
- Robert Stewart
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London
  - Health Data Research UK
  - South London and Maudsley Biomedical Research Centre, London, United Kingdom
- Angus Roberts
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London
  - Health Data Research UK
2
Chaturvedi J, Wang T, Velupillai S, Stewart R, Roberts A. Development of a Knowledge Graph Embeddings Model for Pain. AMIA Annu Symp Proc 2024; 2023:299-308. [PMID: 38222382 PMCID: PMC10785867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Pain is a complex concept that interconnects with other concepts, such as a disorder that might cause pain or a medication that might relieve it. To fully understand the context of pain experienced by an individual or across a population, we may need to examine all concepts related to pain and the relationships between them. This is especially useful when modeling pain that has been recorded in electronic health records. Knowledge graphs represent concepts and their relations as an interlinked network, enabling semantic and context-based reasoning in a computationally tractable form. These graphs can, however, be too large for efficient computation. Knowledge graph embeddings help to resolve this by representing the graphs in a low-dimensional vector space. These embeddings can then be used in downstream tasks such as classification and link prediction. The relations associated with pain that are required to construct such a knowledge graph can be obtained from external medical knowledge bases such as SNOMED CT, a hierarchical systematic nomenclature of medical terms. A knowledge graph built in this way could be further enriched with real-world examples of pain and its relations extracted from electronic health records. This paper describes the construction of such knowledge graph embedding models of pain concepts, extracted from the unstructured text of mental health electronic health records and combined with external knowledge created from relations described in SNOMED CT, and their evaluation on a subject-object link prediction task. The performance of the models was compared with that of baseline models.
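The abstract does not name the embedding model or the link prediction metrics. Purely as an illustration, TransE is a widely used knowledge graph embedding model that scores a triple (subject, relation, object) by how close `subject + relation` lands to `object`, and link prediction is commonly reported by ranking the true object among all candidate entities (for example via mean reciprocal rank). The sketch below assumes TransE; the entity names, relation and 2-D vectors are hypothetical toy values, not trained embeddings and not taken from SNOMED CT.

```python
import math


def transe_score(h, r, t):
    """TransE plausibility of triple (h, r, t): negative L2 distance ||h + r - t||.

    Scores closer to 0 mean the triple is judged more plausible.
    """
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_object(entity_vecs, relation_vec, subject, true_object):
    """Rank (1 = best) of the true object among all candidate entities."""
    h = entity_vecs[subject]
    scores = {e: transe_score(h, relation_vec, v) for e, v in entity_vecs.items()}
    # Count candidates that score strictly better than the true object.
    return 1 + sum(1 for s in scores.values() if s > scores[true_object])


def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank over a list of test-triple ranks."""
    return sum(1 / r for r in ranks) / len(ranks)


# Hypothetical 2-D embeddings, for illustration only (not trained, not from SNOMED CT).
entities = {
    "migraine": [0.0, 0.0],
    "headache": [1.0, 0.0],
    "chest_pain": [0.0, 1.0],
}
has_finding = [1.0, 0.0]  # hypothetical relation vector
```

Here `rank_object(entities, has_finding, "migraine", "headache")` is 1 because `migraine + has_finding` lands exactly on `headache`. In a real model the vectors would be learned from the graph's triples rather than set by hand.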
Affiliation(s)
- Jaya Chaturvedi
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Tao Wang
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Sumithra Velupillai
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Robert Stewart
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
  - South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Angus Roberts
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
  - South London and Maudsley NHS Foundation Trust, London, United Kingdom
3
Parlatini V, Frangou L, Zhang S, Epstein S, Morris A, Grant C, Zalewski L, Jewell A, Velupillai S, Simonoff E, Downs J. Emotional and behavioral outcomes among youths with mental disorders during the first Covid lockdown and school closures in England: a large clinical population study using health care record integrated surveys. Soc Psychiatry Psychiatr Epidemiol 2024; 59:175-186. [PMID: 37353579 PMCID: PMC10799796 DOI: 10.1007/s00127-023-02517-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/16/2023] [Accepted: 06/06/2023] [Indexed: 06/25/2023]
Abstract
PURPOSE Emotional and behavioral problems in children and young people (CYP) increased during the pandemic. Those with pre-existing mental disorders are more vulnerable but have been understudied. We investigated emotional and behavioral outcomes in this population; differences across diagnostic groups; and social, educational, and clinical determinants. METHODS We invited 5386 caregivers and CYP (aged 5-17) under child mental health services pre-pandemic to complete an online survey on CYP's emotional/behavioral symptoms and pandemic-related circumstances, and integrated responses with clinicodemographic information extracted from electronic health records. We compared four parent-rated outcomes (total emotional/behavioral scores and emotional/behavioral changes compared with before the pandemic) across the three most common diagnostic groups in our population: Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorder (ASD) and emotional disorders (EmD). We then estimated the association of clinicodemographic and pandemic-related characteristics with emotional/behavioral outcomes. RESULTS A total of 1741 parents (32.3%) completed the survey. Parents of CYP with ADHD or ASD reported more behavioral difficulties (t(591) = 5.618 (0.001); t(663) = 6.527 (0.001)), greater emotional deterioration (t(591) = 2.592 (0.009); t(664) = 4.670 (< 0.001)) and greater behavioral deterioration (t(594) = 4.529 (< 0.001); t(664) = 5.082 (< 0.001)) compared with the EmD group. Those with ASD and EmD showed more emotional difficulties than those with ADHD (t(891) = -4.431 (< 0.001); t(590) = -3.254 (0.001)). Across diagnoses, poor parental mental health and challenges with education were most strongly associated with worse outcomes. CONCLUSIONS Within our clinical population, CYP with ADHD/ASD were the most adversely affected during lockdown. Enhancing clinical service provision that tackles parental stress and supports education may help mitigate the impact of future restrictions.
Affiliation(s)
- V Parlatini
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- L Frangou
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- S Zhang
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- S Epstein
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- A Morris
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- C Grant
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - Department of Epidemiology and Public Health, University College London, London, UK
- L Zalewski
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - National Institute for Health Research (NIHR) Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, UK
- A Jewell
  - National Institute for Health Research (NIHR) Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, UK
- S Velupillai
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- E Simonoff
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - National Institute for Health Research (NIHR) Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, UK
- J Downs
  - Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - National Institute for Health Research (NIHR) Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, UK
4
Chaturvedi J, Chance N, Mirza L, Vernugopan V, Velupillai S, Stewart R, Roberts A. Development of a Corpus Annotated With Mentions of Pain in Mental Health Records: Natural Language Processing Approach. JMIR Form Res 2023; 7:e45849. [PMID: 37358897 PMCID: PMC10337440 DOI: 10.2196/45849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 04/06/2023] [Accepted: 04/06/2023] [Indexed: 04/09/2023]
Abstract
BACKGROUND Pain is a widespread issue, with 20% of adults (1 in 5) experiencing it globally. A strong association has been demonstrated between pain and mental health conditions, and this association is known to exacerbate disability and impairment. Pain is also known to be strongly related to emotions, which can lead to damaging consequences. As pain is a common reason for people to access health care facilities, electronic health records (EHRs) are a potential source of information on pain. Mental health EHRs could be particularly beneficial since they can show the overlap of pain with mental health. Most mental health EHRs hold the majority of their information in the free-text sections of the records. However, it is challenging to extract information from free text, so natural language processing (NLP) methods are required. OBJECTIVE This research describes the development of a corpus of manually labeled mentions of pain and pain-related entities from the documents of a mental health EHR database, for use in the development and evaluation of future NLP methods. METHODS The EHR database used, Clinical Record Interactive Search, consists of anonymized patient records from the South London and Maudsley National Health Service Foundation Trust in the United Kingdom. The corpus was developed through manual annotation in which pain mentions were marked as relevant (ie, referring to physical pain afflicting the patient), negated (ie, indicating absence of pain), or not relevant (ie, referring to pain affecting someone other than the patient, or metaphorical and hypothetical mentions). Relevant mentions were also annotated with additional attributes, if mentioned, such as the anatomical location affected by pain, pain character, and pain management measures. RESULTS A total of 5644 annotations were collected from 1985 documents (723 patients). Over 70% (n=4028) of the mentions were annotated as relevant, and about half of these also included the anatomical location affected by the pain. The most common pain character was chronic pain, and the most commonly mentioned anatomical location was the chest. The largest share of annotations (n=1857, 33%) came from patients with a primary diagnosis of mood disorders (International Classification of Diseases-10th edition, chapter F30-39). CONCLUSIONS This research has improved understanding of how pain is mentioned within mental health EHRs and provided insight into the kind of information that is typically recorded around pain in such a data source. In future work, the extracted information will be used to develop and evaluate a machine learning-based NLP application to automatically extract relevant pain information from EHR databases.
Affiliation(s)
- Jaya Chaturvedi
  - Department of Biostatistics and Health Informatics, King's College London, London, United Kingdom
- Natalia Chance
  - Department of Psychological Medicine, King's College London, London, United Kingdom
- Luwaiza Mirza
  - Department of Psychological Medicine, King's College London, London, United Kingdom
- Veshalee Vernugopan
  - College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- Sumithra Velupillai
  - Department of Psychological Medicine, King's College London, London, United Kingdom
- Robert Stewart
  - Department of Psychological Medicine, King's College London, London, United Kingdom
  - South London and Maudsley Biomedical Research Centre, London, United Kingdom
- Angus Roberts
  - Department of Biostatistics and Health Informatics, King's College London, London, United Kingdom
  - South London and Maudsley Biomedical Research Centre, London, United Kingdom
5
Ter-Minassian L, Viani N, Wickersham A, Cross L, Stewart R, Velupillai S, Downs J. Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data. BMJ Open 2022; 12:e058058. [PMID: 36576182 PMCID: PMC9723859 DOI: 10.1136/bmjopen-2021-058058] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/06/2021] [Accepted: 08/08/2022] [Indexed: 12/11/2022]
Abstract
OBJECTIVES Attention deficit hyperactivity disorder (ADHD) is a prevalent childhood disorder, but often goes unrecognised and untreated. To improve access to services, accurate predictions of populations at high risk of ADHD are needed for effective resource allocation. Using a unique linked health and education data resource, we examined how machine learning (ML) approaches can predict risk of ADHD. DESIGN Retrospective population cohort study. SETTING South London (2007-2013). PARTICIPANTS n=56 258 pupils with linked education and health data. PRIMARY OUTCOME MEASURES Using area under the curve (AUC), we compared the predictive accuracy of four ML models and one neural network for ADHD diagnosis. Ethnic group and language biases were weighted using a fair pre-processing algorithm. RESULTS Random forest and logistic regression prediction models provided the highest predictive accuracy for ADHD in population samples (AUC 0.86 and 0.86, respectively) and clinical samples (AUC 0.72 and 0.70). Precision-recall curve analyses were less favourable. Sociodemographic biases were effectively reduced by a fair pre-processing algorithm without loss of accuracy. CONCLUSIONS ML approaches using linked routinely collected education and health data offer accurate, low-cost and scalable prediction models of ADHD. These approaches could help identify areas of need and inform resource allocation. Introducing 'fairness weighting' attenuates some sociodemographic biases which would otherwise underestimate ADHD risk within minority groups.
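The abstract refers to a "fair pre-processing algorithm" without naming it. Reweighing (Kamiran and Calders) is one well-known pre-processing method of this kind: each training example receives the weight P(g)P(y)/P(g, y), so that the protected attribute g (for example, ethnic group or language) and the outcome y become independent in the weighted data. The sketch below assumes that method for illustration only; the function names and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Instance weights w(g, y) = P(g) * P(y) / P(g, y), estimated from counts.

    With these weights, group and label are independent in the weighted data,
    provided every (group, label) combination occurs at least once.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    pairs = [(y, w) for g, y, w in zip(groups, labels, weights) if g == group]
    return sum(w for y, w in pairs if y == 1) / sum(w for _, w in pairs)


# Hypothetical toy data: group "A" has a 50% positive rate, group "B" 25%.
groups = ["A", "A", "B", "B", "B", "B"]
labels = [1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
```

After weighting, both toy groups have a weighted positive rate of 1/3, the overall rate; the weights would then be passed to a classifier that accepts per-sample weights. This changes only the training distribution, not the model itself, which is why such methods are called pre-processing.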
Affiliation(s)
- Natalia Viani
  - Department of Psychological Medicine, King's College London, London, UK
- Alice Wickersham
  - Department of Psychological Medicine, King's College London, London, UK
- Lauren Cross
  - Department of Psychological Medicine, King's College London, London, UK
- Robert Stewart
  - Department of Psychological Medicine, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Johnny Downs
  - South London and Maudsley NHS Foundation Trust, London, UK
  - Department of Child and Adolescent Psychiatry, King's College London, London, UK
6
Cusick M, Velupillai S, Downs J, Campion TR, Sholle ET, Dutta R, Pathak J. Portability of natural language processing methods to detect suicidality from clinical text in US and UK electronic health records. J Affect Disord Rep 2022; 10:100430. [PMID: 36644339 PMCID: PMC9835770 DOI: 10.1016/j.jadr.2022.100430] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
Background In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. Methods In this study, we developed a process to share NLP approaches that were individually developed at King's College London (KCL), UK, and Weill Cornell Medicine (WCM), US, two academic medical centers based in different countries with vastly different healthcare systems. We tested and compared the algorithms' performance on manually annotated clinical notes (KCL: n = 4,911; WCM: n = 837). Results After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68; WCM approach: F1-score 0.87 vs. 0.72). Limitations Independent NLP algorithm development and patient cohort selection at the two institutions compromised direct comparability. Conclusions Shared use of these NLP approaches is a critical step towards improving data-driven algorithms for early suicide risk identification and timely prevention.
Affiliation(s)
- Marika Cusick (corresponding author)
  - Weill Cornell Medicine, 402 E. 67th St., New York, NY 10065, USA
  - South London and Maudsley NHS Foundation Trust, London, UK
- Sumithra Velupillai
  - IoPPN, King’s College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Johnny Downs
  - IoPPN, King’s College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Thomas R. Campion
  - Weill Cornell Medicine, 402 E. 67th St., New York, NY 10065, USA
  - South London and Maudsley NHS Foundation Trust, London, UK
- Evan T. Sholle
  - Weill Cornell Medicine, 402 E. 67th St., New York, NY 10065, USA
  - South London and Maudsley NHS Foundation Trust, London, UK
- Rina Dutta
  - IoPPN, King’s College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Jyotishman Pathak
  - Weill Cornell Medicine, 402 E. 67th St., New York, NY 10065, USA
  - South London and Maudsley NHS Foundation Trust, London, UK
7
Mahabadi Z, Mahabadi M, Velupillai S, Roberts A, McGuire P, Ibrahim Z, Patel R. Evaluating physical urban features in several mental illnesses using electronic health record data. Front Digit Health 2022; 4:874237. [PMID: 36158997 PMCID: PMC9490173 DOI: 10.3389/fdgth.2022.874237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 08/08/2022] [Indexed: 01/19/2023]
Abstract
Objectives To understand the potential impact of physical characteristics of the urban environment on clinical outcomes in severe mental illnesses (SMI). Materials and Methods Physical features of the urban environment were examined as predictors of affective and non-affective SMI, the number and length of psychiatric hospital admissions, and the number of short-acting and long-acting injectable antipsychotic prescriptions. In addition, the urban features with the greatest weight in the predictive model were determined. The data included 28 urban features and 6 clinical variables obtained from 30,210 people with SMI receiving care from the South London and Maudsley NHS Foundation Trust (SLaM), using the Clinical Record Interactive Search (CRIS) tool. Five machine learning regression models were compared for prediction accuracy, and a Self-Organising Map (SOM) was then used to represent the results visually. Results The prevalence of SMI, the number and duration of psychiatric hospital admissions, and antipsychotic prescribing were greater in urban areas. However, machine learning analysis was unable to accurately predict clinical outcomes using urban environmental data. Discussion The urban environment is associated with an increased prevalence of SMI. However, urban features alone cannot explain the variation observed in psychotic disorder prevalence or in clinical outcomes measured through psychiatric hospitalisation or exposure to antipsychotic treatments. Conclusion Urban areas are associated with a greater prevalence of SMI, but clinical outcomes are likely to depend on a combination of urban and individual patient-level factors. Future mental healthcare service planning should focus on providing appropriate resources to people with SMI in urban environments.
Affiliation(s)
- Zahra Mahabadi (corresponding author)
  - Centre for Urban Science and Progress, King’s College London, London, United Kingdom
- Maryam Mahabadi
  - Warwick Manufacturing Group, University of Warwick, Coventry, United Kingdom
- Sumithra Velupillai
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
- Angus Roberts
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Philip McGuire
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
  - NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Zina Ibrahim
  - Department of Biostatistics & Health Informatics, King’s College London, London, United Kingdom
- Rashmi Patel
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology, and Neuroscience, King’s College London, London, United Kingdom
  - NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Foundation Trust, London, United Kingdom
8
Widnall E, Epstein S, Polling C, Velupillai S, Jewell A, Dutta R, Simonoff E, Stewart R, Gilbert R, Ford T, Hotopf M, Hayes RD, Downs J. Autism spectrum disorders as a risk factor for adolescent self-harm: a retrospective cohort study of 113,286 young people in the UK. BMC Med 2022; 20:137. [PMID: 35484575 PMCID: PMC9052640 DOI: 10.1186/s12916-022-02329-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/13/2021] [Accepted: 03/09/2022] [Indexed: 12/28/2022]
Abstract
BACKGROUND Individuals with autism spectrum disorder (ASD) are at particularly high risk of suicide and suicide attempts. Presentation to hospital with self-harm is one of the strongest risk factors for later suicide. We describe the use of a novel data linkage between routinely collected education data and child and adolescent mental health data to examine whether adolescents with ASD are at higher risk than the general population of presenting to emergency care with self-harm. METHODS A retrospective cohort study was conducted on the population aged 11-17 resident in four South London boroughs between January 2009 and March 2013 and attending state secondary schools, as identified in the National Pupil Database (NPD). Exposure data on ASD status were derived from the NPD. We used Cox regression to model time to first self-harm presentation to the Emergency Department (ED). RESULTS A total of 1,020 adolescents presented to the ED with self-harm, of whom 763 were matched to the NPD. The sample for analysis included 113,286 adolescents (2.2% with ASD). For boys only, there was an increased risk of self-harm associated with ASD (adjusted hazard ratio 2.79, 95% CI 1.40-5.57, P<0.01). Several other factors, including school absence, exclusion from school and having been in foster care, were also associated with a higher risk of self-harm. CONCLUSIONS This study provides evidence that ASD in boys, and other educational, social and clinical factors, are risk factors for emergency presentation with self-harm in adolescents. These findings are an important step in developing early recognition and prevention programmes.
Affiliation(s)
- Emily Widnall
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - Department of Population Health Sciences, University of Bristol, Bristol, UK
- Sophie Epstein
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Catherine Polling
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Sumithra Velupillai
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Amelia Jewell
  - South London and Maudsley NHS Foundation Trust, London, UK
- Rina Dutta
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Emily Simonoff
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Robert Stewart
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Ruth Gilbert
  - Population, Policy and Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, London, UK
- Tamsin Ford
  - Population, Policy and Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, London, UK
  - Department of Psychiatry, University of Cambridge, Cambridge, UK
- Matthew Hotopf
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Richard D Hayes
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Johnny Downs
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
9
Botelle R, Bhavsar V, Kadra-Scalzo G, Mascio A, Williams MV, Roberts A, Velupillai S, Stewart R. Can natural language processing models extract and classify instances of interpersonal violence in mental healthcare electronic records: an applied evaluative study. BMJ Open 2022; 12:e052911. [PMID: 35172999 PMCID: PMC8852656 DOI: 10.1136/bmjopen-2021-052911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
OBJECTIVE This paper evaluates the application of a natural language processing (NLP) model for extracting clinical text referring to interpersonal violence from the electronic health records (EHRs) of a large mental healthcare provider. DESIGN A multidisciplinary team iteratively developed guidelines for annotating clinical text referring to violence. Keywords were used to generate a dataset, which was annotated (ie, classified as affirmed, negated or irrelevant) for presence of violence, patient status (ie, as perpetrator, witness and/or victim of violence) and violence type (domestic, physical and/or sexual). An NLP approach using a pretrained transformer model, BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), was fine-tuned on the annotated dataset and evaluated using 10-fold cross-validation. SETTING We used the Clinical Record Interactive Search (CRIS) database, comprising over 500 000 de-identified EHRs of patients within the South London and Maudsley NHS Foundation Trust, a specialist mental healthcare provider serving an urban catchment area. PARTICIPANTS Searches of CRIS were carried out based on 17 predefined keywords. Randomly selected text fragments were taken from the results for each keyword, amounting to 3771 text fragments from the records of 2832 patients. OUTCOME MEASURES We estimated precision, recall and F1 score for each NLP model. We examined sociodemographic and clinical variables in the patients whose records gave rise to the text data, and frequencies for each annotated violence characteristic. RESULTS Binary classification models were developed for six labels (violence presence, perpetrator, victim, domestic, physical and sexual). Among annotations affirmed for the presence of any violence, 78% (1724) referred to physical violence, 61% (1350) referred to patients as perpetrator and 33% (731) to domestic violence. NLP models' precision ranged from 89% (perpetrator) to 98% (sexual); recall ranged from 89% (victim, perpetrator) to 97% (sexual). CONCLUSIONS State-of-the-art NLP models can extract and classify clinical text on violence from EHRs at acceptable levels of scale, efficiency and accuracy.
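The abstract states that the fine-tuned model was evaluated with 10-fold cross-validation using precision, recall and F1 for each binary label. The sketch below shows only that evaluation loop for a single label, assuming plain (unstratified) folds and averaging metrics across folds, which is one convention among several. The `train_fn` argument is a hypothetical stand-in for fine-tuning; this is not the authors' pipeline.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle example indices once and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(texts, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold; return mean (P, R, F1).

    `train_fn` is a hypothetical callable taking (texts, labels) and returning
    a predict(text) -> 0/1 function (e.g. a fine-tuned classifier).
    """
    folds = k_fold_indices(len(texts), k, seed)
    totals = [0.0, 0.0, 0.0]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predict = train_fn([texts[j] for j in train_idx], [labels[j] for j in train_idx])
        preds = [predict(texts[j]) for j in test_idx]
        scores = precision_recall_f1([labels[j] for j in test_idx], preds)
        totals = [a + b for a, b in zip(totals, scores)]
    return tuple(t / k for t in totals)
```

In practice, with six labels this loop would run once per label, and stratifying folds by label helps when a label (such as sexual violence) is rare, since an unstratified fold with no positives yields zero precision and recall under this averaging.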
Affiliation(s)
- Riley Botelle
- School of Medical Education, Guy's, King's and St Thomas' School of Medicine, London, UK
| | - Vishal Bhavsar
- Section of Women's Mental Health, Department of Health Services and Population Research, King's College London, London, UK
| | - Giouliana Kadra-Scalzo
- Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Aurelie Mascio
- Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Marcus V Williams
- School of Medical Education, Guy's, King's and St Thomas' School of Medicine, London, UK
| | - Angus Roberts
- Biostatistics and Health Informatics, King's College London, London, UK
- Health Data Research UK, London, UK
| | - Sumithra Velupillai
- Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Robert Stewart
- Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- South London and Maudsley Mental Health NHS Trust, London, UK
10
Cliffe C, Seyedsalehi A, Vardavoulia K, Bittar A, Velupillai S, Shetty H, Schmidt U, Dutta R. Using natural language processing to extract self-harm and suicidality data from a clinical sample of patients with eating disorders: a retrospective cohort study. BMJ Open 2021; 11:e053808. [PMID: 34972768 PMCID: PMC8720985 DOI: 10.1136/bmjopen-2021-053808] [Indexed: 12/18/2022]
Abstract
OBJECTIVES The objective of this study was to determine risk factors for those diagnosed with eating disorders who report self-harm and suicidality. DESIGN AND SETTING This study was a retrospective cohort study within a secondary mental health service, South London and Maudsley National Health Service Trust. PARTICIPANTS All patients with an F50 (eating disorder) diagnosis from January 2009 to September 2019 were included. INTERVENTION AND MEASURES Electronic health records (EHRs) for these patients were extracted and two natural language processing tools were used to determine documentation of self-harm and suicidality in their clinical notes. These tools were validated manually for attribute agreement scores within this study. RESULTS The attribute agreement for precision of positive mentions was 0.96 for self-harm and 0.80 for suicidality; this demonstrates 'near perfect' and 'strong' agreement, respectively, and highlights the reliability of the tools in identifying the EHRs reporting self-harm or suicidality. There were 7434 patients with available EHRs and a diagnosis of an eating disorder included in the study from January 2007 to September 2019. Of these, 4591 (61.8%) had a mention of self-harm within their records and 4764 (64.0%) had a mention of suicidality; 3899 (52.4%) had mentions of both. Patients reporting either self-harm or suicidality were more likely to have a diagnosis of anorexia nervosa (AN) (self-harm, AN OR=3.44, 95% CI 1.05 to 11.3, p=0.04; suicidality, AN OR=8.20, 95% CI 2.17 to 30.1; p=0.002). They were also more likely to have a diagnosis of borderline personality disorder (p≤0.001), bipolar disorder (p<0.001) or substance misuse disorder (p<0.001). CONCLUSION A high percentage of patients (>60%) diagnosed with eating disorders report either self-harm or suicidal thoughts. Relative to other eating disorders, those diagnosed with AN were more likely to report either self-harm or suicidal thoughts.
Psychiatric comorbidity, in particular borderline personality disorder and substance misuse, was also associated with an increased risk of self-harm and suicidality. Therefore, risk assessment among patients diagnosed with eating disorders is crucial.
Affiliation(s)
- Charlotte Cliffe
- South London & Maudsley NHS Foundation Trust, London, UK
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Aida Seyedsalehi
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Katerina Vardavoulia
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - André Bittar
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Hitesh Shetty
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Ulrike Schmidt
- South London & Maudsley NHS Foundation Trust, London, UK
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Rina Dutta
- South London & Maudsley NHS Foundation Trust, London, UK
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
11
Laparra E, Mascio A, Velupillai S, Miller T. A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records. Yearb Med Inform 2021; 30:239-244. [PMID: 34479396 PMCID: PMC8416218 DOI: 10.1055/s-0041-1726522] [Indexed: 11/13/2022]
Abstract
Objectives:
We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research.
Methods:
We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computational Linguistics (ACL) anthology, the Association for the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful work, and manually extracted data points from each of these papers to characterize the types of methods and tasks that were studied, in which clinical domains, and current state-of-the-art results.
Results:
The ubiquity of pre-trained transformers in clinical NLP research has contributed to an increase in domain adaptation and generalization-focused work that uses these models as the key component. Most recently, work has started to train biomedical transformers and to extend the fine-tuning process with additional domain adaptation techniques. We also highlight recent research in cross-lingual adaptation, as a special case of adaptation.
Conclusions:
While pre-trained transformer models have led to some large performance improvements, general domain pre-training does not always transfer adequately to the clinical domain due to its highly specialized language. There is also much work to be done in showing that the gains obtained by pre-trained transformers are beneficial in real world use cases. The amount of work in domain adaptation and transfer learning is limited by dataset availability and creating datasets for new domains is challenging. The growing body of research in languages other than English is encouraging, and more collaboration between researchers across the language divide would likely accelerate progress in non-English clinical NLP.
Affiliation(s)
- Egoitz Laparra
- School of Information, University of Arizona, Tucson, USA
| | - Aurelie Mascio
- Department of Biostatistics and Health Informatics, King's College London, London, United Kingdom
| | - Sumithra Velupillai
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
| | - Timothy Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, USA
- Department of Pediatrics, Harvard Medical School, Boston, USA
12
Abstract
BACKGROUND Rates of suicide attempts and deaths are highest on Mondays and these occur more frequently in the morning or early afternoon, suggesting weekly temporal and diurnal variation in suicidal behaviour. It is unknown whether posts relevant to suicide show similar time trends on social media. We aimed to determine temporal and diurnal variation in posting patterns on the Reddit forum SuicideWatch, an online community for individuals who might be at risk of, or who know someone at risk of, suicide. METHODS We used time series analysis to compare date and time stamps of 90,518 SuicideWatch posts from 1st December 2008 to 31st August 2015 to (i) 6,616,431 posts on the most commonly subscribed general subreddit, AskReddit, and (ii) 66,934 of these AskReddit posts, which were posted by the SuicideWatch authors. RESULTS Mondays showed the highest proportion of posts on SuicideWatch. Clear diurnal variation was observed, with a peak in the early morning (2:00-5:00 h), and a subsequent decrease to a trough in late morning/early afternoon (11:00-14:00 h). Conversely, the highest volume of posts in the control data was between 20:00-23:00 h. CONCLUSIONS Posts on SuicideWatch occurred most frequently on Mondays: the day most associated with suicide risk. The early morning peak in SuicideWatch posts precedes the time of day during which suicide attempts and deaths most commonly occur. Further research into these weekly and diurnal rhythms should help target populations with support and suicide prevention interventions when needed most.
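The descriptive step behind the weekday and diurnal results above, binning post timestamps by day of week and by 3-hour window, might look like the sketch below. It is an illustration only, not the authors' time series analysis. It assumes all timestamps are already in one time zone (the abstract does not say whether Reddit's UTC times were converted), and it aligns windows to 02:00 purely so they match the windows quoted in the abstract. The same functions would be run on the AskReddit control timestamps for comparison.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_proportions(timestamps: Iterable[datetime]) -> Dict[str, float]:
    """Share of posts falling on each day of the week (locale-independent names)."""
    counts = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    total = sum(counts.values())
    return {day: counts[day] / total for day in WEEKDAYS if counts[day]}


def three_hour_window_proportions(
    timestamps: Iterable[datetime], offset: int = 2
) -> Dict[str, float]:
    """Share of posts in 3-hour windows starting at `offset` o'clock (02-05, 05-08, ..., 23-02)."""
    counts: Counter = Counter()
    for ts in timestamps:
        # Shift by the offset, floor to a multiple of 3, shift back, and wrap past midnight.
        start = (((ts.hour - offset) // 3) * 3 + offset) % 24
        counts[start] += 1
    total = sum(counts.values())
    return {
        f"{start:02d}:00-{(start + 3) % 24:02d}:00": n / total
        for start, n in sorted(counts.items())
    }
```

Posts made between midnight and 02:00 fall into the wrapped "23:00-02:00" window, so every hour of the day belongs to exactly one bin.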
Affiliation(s)
- Rina Dutta
- Department of Psychological Medicine, School of Academic Psychiatry, King’s College London, IoPPN, PO Box 84, 3rd Floor East Wing, Room E3.07, De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, London, UK
| | - George Gkotsis
- Department of Psychological Medicine, School of Academic Psychiatry, King’s College London, IoPPN, PO Box 84, 3rd Floor East Wing, Room E3.07, De Crespigny Park, London, SE5 8AF UK
| | - Sumithra Velupillai
- Department of Psychological Medicine, School of Academic Psychiatry, King’s College London, IoPPN, PO Box 84, 3rd Floor East Wing, Room E3.07, De Crespigny Park, London, SE5 8AF UK
- School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden
| | - Ioannis Bakolis
- Department of Psychological Medicine, School of Academic Psychiatry, King’s College London, IoPPN, PO Box 84, 3rd Floor East Wing, Room E3.07, De Crespigny Park, London, SE5 8AF UK
| | - Robert Stewart
- Department of Psychological Medicine, School of Academic Psychiatry, King’s College London, IoPPN, PO Box 84, 3rd Floor East Wing, Room E3.07, De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, London, UK
13
Bittar A, Velupillai S, Roberts A, Dutta R. Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis. JMIR Med Inform 2021; 9:e22397. [PMID: 33847595 PMCID: PMC8080148 DOI: 10.2196/22397] [Received: 07/10/2020] [Revised: 11/26/2020] [Accepted: 12/05/2020] [Indexed: 11/21/2022]
Abstract
Background Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians’ subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora. Objective This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment. Methods The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score. Results The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. 
The sentiment-bearing status of these keywords for this use case is thus doubtful. Conclusions Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non–suicide-related EHR texts.
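A rough sketch of the coverage analysis described above follows. The abstract does not define the keyword measure or the weighting, so both are assumptions here. Keywords are ranked by a smoothed log2 ratio of relative frequencies in the case versus control subcorpora, and weighted recall is the share of total keyword weight that a lexicon covers. Weighted precision is omitted because it needs a judgement of which lexicon hits are correct, which the abstract does not specify. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set


def representative_keywords(
    case_tokens: Iterable[str], control_tokens: Iterable[str], top_n: int = 50
) -> Dict[str, float]:
    """Words over-represented in the case subcorpus, weighted by a smoothed log2 ratio."""
    case, control = Counter(case_tokens), Counter(control_tokens)
    n_case, n_control = sum(case.values()), sum(control.values())
    vocab = set(case) | set(control)
    scores = {}
    for word in vocab:
        # Add-one smoothing keeps the ratio finite for words missing from one subcorpus.
        p_case = (case[word] + 1) / (n_case + len(vocab))
        p_control = (control[word] + 1) / (n_control + len(vocab))
        scores[word] = math.log2(p_case / p_control)
    # Sort by descending score, breaking ties alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    # Keep only words that are genuinely more frequent in the case subcorpus.
    return {word: score for word, score in ranked if score > 0}


def weighted_recall(keywords: Dict[str, float], lexicon: Set[str]) -> float:
    """Fraction of total keyword weight that the sentiment lexicon covers."""
    total = sum(keywords.values())
    covered = sum(score for word, score in keywords.items() if word in lexicon)
    return covered / total if total else 0.0
```

A lexicon that misses the highest-weighted case keywords scores low on this measure even if every word it does contain is sentiment-bearing, which is the pattern of reasonable precision and very low recall the study reports.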
Affiliation(s)
- André Bittar
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Angus Roberts
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
14
Viani N, Botelle R, Kerwin J, Yin L, Patel R, Stewart R, Velupillai S. A natural language processing approach for identifying temporal disease onset information from mental healthcare text. Sci Rep 2021; 11:757. [PMID: 33436814 PMCID: PMC7804184 DOI: 10.1038/s41598-020-80457-0] [Received: 07/13/2020] [Accepted: 12/21/2020] [Indexed: 11/09/2022]
Abstract
Receiving timely and appropriate treatment is crucial for better health outcomes, and research on the contribution of specific variables is essential. In the mental health domain, an important research variable is the date of psychosis symptom onset, as longer delays in treatment are associated with worse intervention outcomes. The growing adoption of electronic health records (EHRs) within mental health services provides an invaluable opportunity to study this problem retrospectively at scale. However, disease onset information is often only available in open text fields, requiring natural language processing (NLP) techniques for automated analyses. Since this variable can be documented at different points during a patient's care, NLP methods that model clinical and temporal associations are needed. We address the identification of psychosis onset by: 1) manually annotating a corpus of mental health EHRs with disease onset mentions, 2) modelling the underlying NLP problem as a paragraph classification task, and 3) combining multiple onset paragraphs at the patient level to generate a ranked list of likely disease onset dates. For 22/31 test patients (71%) the correct onset date was found among the top-3 NLP predictions. The proposed approach was also applied at scale, allowing an onset date to be estimated for 2483 patients.
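Step 3 above, combining paragraph-level predictions into a ranked list of onset dates per patient and scoring it by top-3 accuracy, can be sketched as follows. The aggregation rule (summing classifier probabilities for paragraphs that mention the same normalised date) is an assumption for illustration, not necessarily the authors' method, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def rank_onset_dates(paragraphs: List[Tuple[date, float]], k: int = 3) -> List[date]:
    """Aggregate one patient's paragraph-level onset predictions into a top-k date list.

    Each item is (normalised onset date mentioned in the paragraph, classifier
    probability that the paragraph describes onset). Scores for the same date are summed.
    """
    scores: Dict[date, float] = defaultdict(float)
    for onset_date, prob in paragraphs:
        scores[onset_date] += prob
    # Highest total score first; the earlier date breaks ties (an arbitrary choice).
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def top_k_accuracy(ranked: Dict[str, List[date]], gold: Dict[str, date]) -> float:
    """Share of patients whose gold onset date appears in their ranked list."""
    hits = sum(1 for patient, true_date in gold.items() if true_date in ranked.get(patient, []))
    return hits / len(gold)
```

With 22 hits among 31 test patients, `top_k_accuracy` would return 22/31, roughly 0.71, which corresponds to the 71% reported above.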
Affiliation(s)
- Jack Kerwin
- IoPPN, King's College London, SE5 8AF, London, UK
| | - Lucia Yin
- IoPPN, King's College London, SE5 8AF, London, UK
| | - Rashmi Patel
- IoPPN, King's College London, SE5 8AF, London, UK
- South London and Maudsley NHS Foundation Trust, SE5 8AZ, London, UK
| | - Robert Stewart
- IoPPN, King's College London, SE5 8AF, London, UK
- South London and Maudsley NHS Foundation Trust, SE5 8AZ, London, UK
15
Affiliation(s)
- Robert Stewart
- King's College London, (Institute of Psychiatry, Psychology and Neuroscience), London, UK.
- South London and Maudsley NHS Foundation Trust, London, UK.
| | - Sumithra Velupillai
- King's College London, (Institute of Psychiatry, Psychology and Neuroscience), London, UK
16
Bittar A, Velupillai S, Downs J, Sedgwick R, Dutta R. Reviewing a Decade of Research Into Suicide and Related Behaviour Using the South London and Maudsley NHS Foundation Trust Clinical Record Interactive Search (CRIS) System. Front Psychiatry 2020; 11:553463. [PMID: 33329090 PMCID: PMC7729078 DOI: 10.3389/fpsyt.2020.553463] [Received: 04/18/2020] [Accepted: 10/29/2020] [Indexed: 11/13/2022]
Abstract
Suicide is a serious public health issue worldwide, yet current clinical methods for assessing a person's risk of taking their own life remain unreliable and new methods for assessing suicide risk are being explored. The widespread adoption of electronic health records (EHRs) has opened up new possibilities for epidemiological studies of suicide and related behaviour amongst those receiving healthcare. These types of records capture valuable information entered by healthcare practitioners at the point of care. However, much recent work has relied heavily on the structured data of EHRs, whilst much of the important information about a patient's care pathway is recorded in the unstructured text of clinical notes. Accessing and structuring text data for use in clinical research, and particularly for suicide and self-harm research, is a significant challenge that is increasingly being addressed using methods from the fields of natural language processing (NLP) and machine learning (ML). In this review, we provide an overview of the range of suicide-related studies that have been carried out using the Clinical Records Interactive Search (CRIS): a database for epidemiological and clinical research that contains de-identified EHRs from the South London and Maudsley NHS Foundation Trust. We highlight the variety of clinical research questions, cohorts and techniques that have been explored for suicide and related behaviour research using CRIS, including the development of NLP and ML approaches. We demonstrate how EHR data provides comprehensive material to study the prevalence of suicide and self-harm in clinical populations. Structured data alone is insufficient, and NLP methods are needed to more accurately identify relevant information from EHR data. We also show how the text in clinical notes provides signals for ML approaches to suicide risk assessment.
We envision increased progress in the decades to come, particularly in externally validating findings across multiple sites and countries, both in terms of clinical evidence and in terms of NLP and machine learning method transferability.
Affiliation(s)
- André Bittar
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Johnny Downs
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
| | - Rosemary Sedgwick
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
| | - Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
17
Widnall E, Grant CE, Wang T, Cross L, Velupillai S, Roberts A, Stewart R, Simonoff E, Downs J. User Perspectives of Mood-Monitoring Apps Available to Young People: Qualitative Content Analysis. JMIR Mhealth Uhealth 2020; 8:e18140. [PMID: 33037875 PMCID: PMC7585773 DOI: 10.2196/18140] [Received: 03/23/2020] [Revised: 06/04/2020] [Accepted: 08/11/2020] [Indexed: 12/29/2022]
Abstract
BACKGROUND Mobile health apps are increasingly available and used in a clinical context to monitor young people's mood and mental health. Despite the benefits of accessibility and cost-effectiveness, consumer engagement remains a hurdle for uptake and continued use. Hundreds of mood-monitoring apps are publicly available to young people on app stores; however, few studies have examined consumer perspectives. App store reviews held on Google and Apple platforms provide a large, rich source of naturally generated, publicly available user reviews. Although commercial developers use these data to modify and improve their apps, to date, there has been very little in-depth evaluation of app store user reviews within scientific research, and our current understanding of what makes apps engaging and valuable to young people is limited. OBJECTIVE This study aims to gain a better understanding of what app users consider useful to encourage frequent and prolonged use of mood-monitoring apps appropriate for young people. METHODS A systematic approach was applied to the selection of apps and reviews. We identified mood-monitoring apps (n=53) by a combination of automated application programming interface (API) methods. We only included apps appropriate for young people based on app store age categories (apps available to those younger than 18 years). We subsequently downloaded all available user reviews via API data scraping methods and selected a representative subsample of reviews (n=1803) for manual qualitative content analysis. RESULTS The qualitative content analysis revealed 8 main themes: accessibility (34%), flexibility (21%), recording and representation of mood (18%), user requests (17%), reflecting on mood (16%), technical features (16%), design (13%), and health promotion (11%). A total of 6 minor themes were also identified: notification and reminders; recommendation; privacy, security, and transparency; developer; adverts; and social/community. 
CONCLUSIONS Users value mood-monitoring apps that can be personalized to their needs, have a simple and intuitive design, and allow accurate representation and review of complex and fluctuating moods. App store reviews are a valuable repository of user engagement feedback and provide a wealth of information about what users value in an app and what user needs are not being met. Users perceive mood-monitoring apps positively, but over 20% of reviews identified the need for improvement.
Affiliation(s)
- Emily Widnall
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Claire Ellen Grant
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Tao Wang
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Lauren Cross
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Angus Roberts
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Emily Simonoff
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Johnny Downs
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
18
Bhavsar V, Sanyal J, Patel R, Shetty H, Velupillai S, Stewart R, Broadbent M, MacCabe JH, Das-Munshi J, Howard LM. The association between neighbourhood characteristics and physical victimisation in men and women with mental disorders. BJPsych Open 2020; 6:e73. [PMID: 32669154 PMCID: PMC7443921 DOI: 10.1192/bjo.2020.52] [Received: 01/20/2020] [Revised: 05/15/2020] [Accepted: 06/02/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND How neighbourhood characteristics affect the physical safety of people with mental illness is unclear. AIMS To examine neighbourhood effects on physical victimisation towards people using mental health services. METHOD We developed and evaluated a machine-learning-derived free-text-based natural language processing (NLP) algorithm to ascertain clinical text referring to physical victimisation. This was applied to records on all patients attending National Health Service mental health services in Southeast London. Sociodemographic and clinical data, and diagnostic information on use of acute hospital care (from Hospital Episode Statistics, linked to Clinical Record Interactive Search), were collected in this group, defined as 'cases', and in concurrently sampled controls. Multilevel logistic regression models estimated associations (odds ratios, ORs) between neighbourhood-level fragmentation, crime, income deprivation, and population density and physical victimisation. RESULTS Based on a human-rated gold standard, the NLP algorithm had a positive predictive value of 0.92 and sensitivity of 0.98 for (clinically recorded) physical victimisation. A 1 s.d. increase in neighbourhood crime was accompanied by a 7% increase in odds of physical victimisation in women and a 13% increase in men (adjusted OR (aOR) for women: 1.07, 95% CI 1.01-1.14, aOR for men: 1.13, 95% CI 1.06-1.21, P for gender interaction, 0.218). Although small, adjusted associations for neighbourhood fragmentation appeared greater in magnitude for women (aOR = 1.05, 95% CI 1.01-1.11) than for men, in whom this association was not statistically significant (aOR = 1.00, 95% CI 0.95-1.04, P for gender interaction, 0.096). Neighbourhood income deprivation was associated with victimisation in men and women with similar magnitudes of association.
CONCLUSIONS Neighbourhood factors influencing safety, as well as individual characteristics including gender, may be relevant to understanding pathways to physical victimisation towards people with mental illness.
Affiliation(s)
- Vishal Bhavsar
- Section of Women's Mental Health, Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK
| | - Jyoti Sanyal
- Clinical Informatics, BRC Nucleus, South London and Maudsley NHS Foundation Trust, UK
| | - Rashmi Patel
- Department of Psychosis Studies, Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK
| | - Hitesh Shetty
- Clinical Informatics, BRC Nucleus, South London and Maudsley NHS Foundation Trust, UK
| | - Robert Stewart
- BRC Nucleus, South London and Maudsley NHS Foundation Trust, UK
| | - Matthew Broadbent
- Clinical Informatics, BRC Nucleus, South London and Maudsley NHS Foundation Trust, UK
| | - Jayati Das-Munshi
- Department of Health Services and Population Research, King's College London, UK
| | - Louise M. Howard
- Section of Women's Mental Health, Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK
19
Tollinton L, Metcalf AM, Velupillai S. Enhancing predictions of patient conveyance using emergency call handler free text notes for unconscious and fainting incidents reported to the London Ambulance Service. Int J Med Inform 2020; 141:104179. [PMID: 32663739 DOI: 10.1016/j.ijmedinf.2020.104179] [Received: 02/10/2020] [Revised: 04/28/2020] [Accepted: 05/13/2020] [Indexed: 11/29/2022]
Abstract
OBJECTIVE Pre-hospital emergency medical services use clinical decision support systems (CDSS) to triage calls. Call handlers often supplement this by making free text notes covering key incident information. We investigated whether machine learning approaches using features from such free text notes can improve prediction of unconscious patients who require conveyance. MATERIALS AND METHODS We analysed a subset of all London Ambulance Service calls that were triaged through the Medical Priority Dispatch System (MPDS) as involving an unconscious or fainting patient in 2018. We used and compared two machine learning algorithms: random forest (RF) and gradient boosting machine (GBM). For each incident, we predicted whether the patient would be conveyed to a hospital emergency department or equivalent, using as features 1) the MPDS code, 2) the free text notes and 3) the two together. We evaluated model performance using the area under the curve (AUC) metric. Given the imbalance of outcomes (patient conveyed 71%, not conveyed 29%), we also considered sensitivity and specificity. RESULTS Using only the MPDS code resulted in an AUC of 0.57. Using the text notes gave an improved AUC score of 0.63 and combining the two gave an AUC score of 0.64 (scores were similar for RF and GBM). GBM models scored better on sensitivity (0.93 vs 0.62 for RF in the combined model), but specificity was lower (0.17 vs 0.56 for RF in the combined model). CONCLUSIONS Using information contained in the free text notes made by call handlers in combination with MPDS improves prediction of unconscious and fainting patients requiring conveyance to a hospital emergency department (or equivalent) when compared with machine learning models using MPDS codes only. This suggests there is some useful information in unstructured data captured by emergency call handlers that complements MPDS codes.
Quantifying this gain can help inform emergency medical service policy when evaluating the decision to expand or augment existing CDSS.
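The metrics used in this study can be illustrated with a small sketch. AUC is computed here in its rank form, as the probability that a randomly chosen conveyed patient receives a higher score than a randomly chosen non-conveyed one; this pairwise loop is quadratic and only suits small examples (scikit-learn's `roc_auc_score` is the usual library route). Coding conveyed as 1 and the 0.5 decision threshold are assumptions for illustration. Moving the threshold trades sensitivity against specificity without changing AUC, which is one way two models with similar AUC can show contrasting sensitivity and specificity like those reported above.

```python
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (conveyed patients flagged) and specificity (non-conveyed patients not flagged)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```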
Affiliation(s)
- Liam Tollinton
- Centre for Urban Science and Progress Studies, King's College London, UK
| | | | - Sumithra Velupillai
- Centre for Urban Science and Progress Studies, King's College London, UK; Institute for Psychiatry, Psychology & Neuroscience, King's College London, UK.
20
Priou S, Viani N, Vernugopan V, Tytherleigh C, Hassan FA, Dutta R, Chalder T, Velupillai S. Clinical History Segment Extraction from Chronic Fatigue Syndrome Assessments to Model Disease Trajectories. Stud Health Technol Inform 2020; 270:98-102. [PMID: 32570354 DOI: 10.3233/shti200130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Chronic fatigue syndrome (CFS) is a long-term illness with a wide range of symptoms and condition trajectories. To improve the understanding of these, automated analysis of large amounts of patient data holds promise. Routinely documented assessments are useful for large-scale analysis; however, the relevant information is mainly recorded in free text. As a first step towards extracting symptom and condition trajectories, natural language processing (NLP) methods can identify important textual content and relevant information. In this paper, we propose an agnostic NLP method for extracting segments of patients' clinical histories from CFS assessments. Moreover, we present initial results on the advantage of using these segments to quantify and analyse the presence of certain clinically relevant concepts.
Affiliation(s)
- Sonia Priou
- IoPPN, King's College London; NIHR Maudsley Biomedical Research Centre
- Natalia Viani
- IoPPN, King's College London; NIHR Maudsley Biomedical Research Centre
- Chloe Tytherleigh
- IoPPN, King's College London; NIHR Maudsley Biomedical Research Centre
- Rina Dutta
- IoPPN, King's College London; NIHR Maudsley Biomedical Research Centre
- Trudie Chalder
- IoPPN, King's College London; NIHR Maudsley Biomedical Research Centre
21
Leightley D, Pernet D, Velupillai S, Stewart RJ, Mark KM, Opie E, Murphy D, Fear NT, Stevelink SAM. The Development of the Military Service Identification Tool: Identifying Military Veterans in a Clinical Research Database Using Natural Language Processing and Machine Learning. JMIR Med Inform 2020; 8:e15852. [PMID: 32348287 PMCID: PMC7281146 DOI: 10.2196/15852] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/13/2019] [Revised: 12/11/2019] [Accepted: 01/26/2020] [Indexed: 02/07/2023]
Abstract
Background Electronic health care records (EHRs) are a rich source of health-related information, with potential for secondary research use. In the United Kingdom, there is no national marker for identifying those who have previously served in the Armed Forces, making analysis of the health and well-being of veterans using EHRs difficult. Objective This study aimed to develop a tool to identify veterans from free-text clinical documents recorded in a psychiatric EHR database. Methods Veterans were manually identified using the South London and Maudsley (SLaM) Biomedical Research Centre Clinical Record Interactive Search—a database holding secondary mental health care electronic records for the SLaM National Health Service Foundation Trust. An iterative approach was taken; first, a structured query language (SQL) method was developed, which was then refined using natural language processing and machine learning to create the Military Service Identification Tool (MSIT) to identify if a patient was a civilian or veteran. Performance, defined as correct classification of veterans compared with incorrect classification, was measured using positive predictive value, negative predictive value, sensitivity, F1 score, and accuracy (otherwise termed Youden Index). Results A gold standard dataset of 6672 free-text clinical documents was manually annotated by human coders. Of these documents, 66.00% (4470/6672) were then used to train the SQL and MSIT approaches and 34.00% (2202/6672) were used for testing the approaches. To develop the MSIT, an iterative 2-stage approach was undertaken. In the first stage, an SQL method was developed to identify veterans using a keyword rule–based approach. This approach obtained an accuracy of 0.93 in correctly predicting civilians and veterans, a positive predictive value of 0.81, a sensitivity of 0.75, and a negative predictive value of 0.95. 
This method informed the second stage, which was the development of the MSIT using machine learning, which, when tested, obtained an accuracy of 0.97, a positive predictive value of 0.90, a sensitivity of 0.91, and a negative predictive value of 0.98. Conclusions The MSIT has the potential to be used in identifying veterans in the United Kingdom from free-text clinical documents, providing new and unique insights into the health and well-being of this population and their use of mental health care services.
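Each of the metrics reported for the SQL and MSIT approaches can be derived from the four cells of a confusion matrix. The Python sketch below uses hypothetical counts, not the study's, treats "veteran" as the positive class, and for comparison also computes Youden's J statistic (sensitivity + specificity - 1), which is calculated differently from accuracy.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Confusion-matrix counts, treating 'veteran' as the positive class."""
    tp: int  # veterans classified as veterans
    fp: int  # civilians classified as veterans
    tn: int  # civilians classified as civilians
    fn: int  # veterans classified as civilians


def _ratio(numerator: float, denominator: float) -> float:
    # Avoid ZeroDivisionError when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    ppv = _ratio(c.tp, c.tp + c.fp)          # positive predictive value (precision)
    npv = _ratio(c.tn, c.tn + c.fn)          # negative predictive value
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    youden_j = sensitivity + specificity - 1  # Youden's J, a different quantity from accuracy
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
        "youden_j": youden_j,
    }


# Hypothetical counts for a 1000-document test set.
for name, value in metrics(Confusion(tp=90, fp=10, tn=880, fn=20)).items():
    print(f"{name}: {value:.3f}")
```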
Affiliation(s)
- Daniel Leightley
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- David Pernet
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Robert J Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Katharine M Mark
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Elena Opie
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Dominic Murphy
- King's Centre for Military Health Research, King's College London, London, United Kingdom.,Combat Stress, Leatherhead, United Kingdom
- Nicola T Fear
- King's Centre for Military Health Research, King's College London, London, United Kingdom.,Academic Department of Military Mental Health, King's College London, London, United Kingdom
- Sharon A M Stevelink
- King's Centre for Military Health Research, King's College London, London, United Kingdom.,Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
22
Ive J, Viani N, Kam J, Yin L, Verma S, Puntis S, Cardinal RN, Roberts A, Stewart R, Velupillai S. Generation and evaluation of artificial mental health records for Natural Language Processing. NPJ Digit Med 2020; 3:69. [PMID: 32435697 PMCID: PMC7224173 DOI: 10.1038/s41746-020-0267-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/19/2019] [Accepted: 03/13/2020] [Indexed: 11/22/2022]
Abstract
A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
Affiliation(s)
- Julia Ive
- Department of Computing, Imperial College London, London, SW7 2AZ UK
- Joyce Kam
- IoPPN, King’s College London, SE5 8AF London, UK
- Lucia Yin
- IoPPN, King’s College London, SE5 8AF London, UK
- Somain Verma
- IoPPN, King’s College London, SE5 8AF London, UK
- Stephen Puntis
- Department of Psychiatry, University of Oxford, Warneford Hospital, OX3 7JX Oxford, UK
- Rudolf N. Cardinal
- Department of Psychiatry, University of Cambridge, Downing Street, Cambridge, CB2 3EB UK
- Cambridge Biomedical Campus, Cambridgeshire and Peterborough NHS Foundation Trust, Box 190, Cambridge, CB2 0QQ UK
- Robert Stewart
- IoPPN, King’s College London, SE5 8AF London, UK
- South London and Maudsley NHS Foundation Trust, SE5 8AZ London, UK
23
Holden R, Mueller J, McGowan J, Sanyal J, Kikoler M, Simonoff E, Velupillai S, Downs J. Investigating Bullying as a Predictor of Suicidality in a Clinical Sample of Adolescents with Autism Spectrum Disorder. Autism Res 2020; 13:988-997. [PMID: 32198982 PMCID: PMC8647922 DOI: 10.1002/aur.2292] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/19/2019] [Revised: 01/13/2020] [Accepted: 03/03/2020] [Indexed: 12/12/2022]
Abstract
For typically developing adolescents, being bullied is associated with increased risk of suicidality. Although adolescents with autism spectrum disorder (ASD) are at increased risk of both bullying and suicidality, there is very little research that examines the extent to which an experience of being bullied may increase suicidality within this specific population. To address this, we conducted a retrospective cohort study to investigate the longitudinal association between experiencing bullying and suicidality in a clinical population of 680 adolescents with ASD. Electronic health records of adolescents (13–17 years) with a diagnosis of ASD who were using mental health services in South London were analyzed. Natural language processing was employed to identify mentions of bullying and suicidality in the free text fields of adolescents' clinical records. Cox regression analysis was employed to investigate the longitudinal relationship between bullying and suicidality outcomes. Reported experience of bullying in the first month of clinical contact was associated with an increased risk of suicidality over the follow‐up period (hazard ratio = 1.82; 95% confidence interval = 1.28–2.59). In addition, female gender, psychosis, affective disorder diagnoses, and higher intellectual ability were all associated with suicidality at follow‐up. This study is the first to demonstrate the strength of longitudinal associations between bullying and suicidality in a clinical population of adolescents with ASD, using automated approaches to detect key life events within clinical records. Our findings provide support for identifying and dealing with bullying in schools, and for incorporating antibullying strategies into wider suicide prevention programs for young people with ASD. Autism Res 2020, 13: 988‐997. © 2020 The Authors. Autism Research published by International Society for Autism Research published by Wiley Periodicals, Inc.
Affiliation(s)
- Rachel Holden
- NIHR South London and Maudsley NHS Foundation Trust, Biomedical Research Centre, London, UK.,Canterbury Christ Church University, Canterbury, UK
- Joanne Mueller
- NIHR South London and Maudsley NHS Foundation Trust, Biomedical Research Centre, London, UK.,Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- John McGowan
- Canterbury Christ Church University, Canterbury, UK
- Jyoti Sanyal
- NIHR South London and Maudsley NHS Foundation Trust, Biomedical Research Centre, London, UK
- Emily Simonoff
- NIHR South London and Maudsley NHS Foundation Trust, Biomedical Research Centre, London, UK.,Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Sumithra Velupillai
- NIHR South London and Maudsley NHS Foundation Trust, Biomedical Research Centre, London, UK.,Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Johnny Downs
- NIHR South London and Maudsley NHS Foundation Trust, Biomedical Research Centre, London, UK.,Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
24
Viani N, Kam J, Yin L, Bittar A, Dutta R, Patel R, Stewart R, Velupillai S. Temporal information extraction from mental health records to identify duration of untreated psychosis. J Biomed Semantics 2020; 11:2. [PMID: 32156302 PMCID: PMC7063705 DOI: 10.1186/s13326-020-00220-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/29/2019] [Accepted: 03/03/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes. DUP estimation requires knowledge about when psychosis symptoms first started (symptom onset), and when psychosis treatment was initiated. Electronic health records (EHRs) represent a useful resource for retrospective clinical studies on DUP, but the core information underlying this construct is most likely to lie in free text, meaning it is not readily available for clinical research. Natural Language Processing (NLP) is a means of addressing this problem by automatically extracting relevant information in a structured form. As a first step, it is important to identify appropriate documents, i.e., those that are likely to include the information of interest. Next, temporal information extraction methods are needed to identify time references for early psychosis symptoms. This NLP challenge requires solving three different tasks: time expression extraction, symptom extraction, and temporal "linking". In this study, we focus on the first step, using two relevant EHR datasets. RESULTS We applied a rule-based NLP system for time expression extraction that we had previously adapted to a corpus of mental health EHRs from patients with a diagnosis of schizophrenia (first referrals). We extended this work by applying this NLP system to a larger set of documents and patients, to identify additional texts that would be relevant for our long-term goal, and developed a new corpus from a subset of these new texts (early intervention services). Furthermore, we added normalized value annotations ("2011-05") to the annotated time expressions ("May 2011") in both corpora. The finalized corpora were used for further NLP development and evaluation, with promising results (normalization accuracy 71-86%). 
To highlight the specificities of our annotation task, we also applied the final adapted NLP system to a different temporally annotated clinical corpus. CONCLUSIONS Developing domain-specific methods is crucial to address complex NLP tasks such as symptom onset extraction and retrospective calculation of duration of a preclinical syndrome. To the best of our knowledge, this is the first clinical text resource annotated for temporal entities in the mental health domain.
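The normalization step described above maps a time expression such as "May 2011" to a value such as "2011-05". The abstract does not detail the rules of the adapted system, so the Python below is only a minimal, hypothetical sketch covering three absolute date patterns with ISO 8601-style output; it does not validate day numbers, and relative expressions such as "two years ago" would need the document date and are not handled.

```python
import re
from typing import Optional

# Month names mapped to numbers, plus three-letter abbreviations ("sep", "dec").
MONTHS = {
    name: index
    for index, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
MONTHS.update({name[:3]: index for name, index in list(MONTHS.items())})

DAY_MONTH_YEAR = re.compile(r"^(\d{1,2})\s+([a-z]+)\s+(\d{4})$")  # "3 Sep 2015"
MONTH_YEAR = re.compile(r"^([a-z]+)\s+(\d{4})$")                 # "May 2011"
YEAR = re.compile(r"^(\d{4})$")                                  # "2009"


def normalize(expression: str) -> Optional[str]:
    """Return an ISO 8601-style value for a few absolute patterns, else None."""
    text = expression.strip().lower()
    match = DAY_MONTH_YEAR.match(text)
    if match and match.group(2) in MONTHS:
        day, month, year = int(match.group(1)), MONTHS[match.group(2)], match.group(3)
        return f"{year}-{month:02d}-{day:02d}"
    match = MONTH_YEAR.match(text)
    if match and match.group(1) in MONTHS:
        return f"{match.group(2)}-{MONTHS[match.group(1)]:02d}"
    match = YEAR.match(text)
    if match:
        return match.group(1)
    return None  # relative or vague expressions ("last summer") are not handled


for expr in ["May 2011", "3 Sep 2015", "2009", "last summer"]:
    print(expr, "->", normalize(expr))
```

Real clinical text needs many more patterns (ranges, ages, durations, relative dates), which is one reason the authors stress domain-specific adaptation.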
Affiliation(s)
- Natalia Viani
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- Joyce Kam
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- Lucia Yin
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- André Bittar
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, London, UK
- Rashmi Patel
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, London, UK
- Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, London, UK
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
25
Bittar A, Velupillai S, Roberts A, Dutta R. Text Classification to Inform Suicide Risk Assessment in Electronic Health Records. Stud Health Technol Inform 2019; 264:40-44. [PMID: 31437881 DOI: 10.3233/shti190179] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Assessing a patient's risk of an impending suicide attempt has been hampered by limited information about dynamic factors that change rapidly in the days leading up to an attempt. The storage of patient data in electronic health records (EHRs) has facilitated population-level risk assessment studies using machine learning techniques. Until recently, most such work has used only structured EHR data and excluded the unstructured text of clinical notes. In this article, we describe our experiments on suicide risk assessment, modelling the problem as a classification task. Given the wealth of text data in mental health EHRs, we aimed to assess the impact of using this data in distinguishing periods prior to a suicide attempt from those not preceding such an attempt. We compare three different feature sets, one structured and two text-based, and show that inclusion of text features significantly improves classification accuracy in suicide risk assessment.
Affiliation(s)
- André Bittar
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden
- Angus Roberts
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,South London and Maudsley NHS Foundation Trust, London, UK
26
Velupillai S, Epstein S, Bittar A, Stephenson T, Dutta R, Downs J. Identifying Suicidal Adolescents from Mental Health Records Using Natural Language Processing. Stud Health Technol Inform 2019; 264:413-417. [PMID: 31437956 DOI: 10.3233/shti190254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
Suicidal ideation is a risk factor for self-harm and completed suicide, and can be indicative of mental health issues. Adolescents are a particularly vulnerable group, but few studies have examined suicidal behaviour prevalence in large cohorts. Electronic Health Records (EHRs) are a rich source of secondary health care data that could be used to estimate prevalence. Most EHR documentation related to suicide risk is written in free text, thus requiring Natural Language Processing (NLP) approaches. We adapted and evaluated a simple lexicon- and rule-based NLP approach to identify suicidal adolescents from a large EHR database. We developed a comprehensive manually annotated EHR reference standard and assessed NLP performance at both document and patient level on data from 200 patients (approximately 5000 documents). We achieved promising results (>80% F1 score at both document and patient level). Simple NLP approaches can be successfully used to identify patients who exhibit suicidal risk behaviour, and our proposed approach could be useful for other populations and settings.
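The abstract describes a lexicon- and rule-based approach evaluated at document and patient level but does not list its terms or rules. As a minimal Python sketch under those gaps, the code below uses a hypothetical lexicon, a simple preceding-window negation rule, and one possible aggregation (a patient is positive if any of their documents is); none of these details should be read as the authors' actual configuration.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical lexicon and negation rules, for illustration only.
LEXICON = ["suicidal", "suicide attempt", "wants to die", "end her life", "end his life"]
NEGATION_CUES = {"no", "not", "denies", "denied", "never"}
NEGATION_WINDOW = 5  # a cue within this many tokens before a match negates it


def find_term(tokens: List[str], term_tokens: List[str]) -> List[int]:
    """Start positions where the (possibly multi-word) term occurs in the tokens."""
    n = len(term_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term_tokens]


def sentence_positive(sentence: str) -> bool:
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for term in LEXICON:
        for start in find_term(tokens, term.split()):
            preceding = tokens[max(0, start - NEGATION_WINDOW):start]
            if not NEGATION_CUES.intersection(preceding):
                return True  # affirmed (non-negated) mention
    return False


def document_positive(text: str) -> bool:
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(sentence_positive(s) for s in sentences)


def patient_level(documents: Iterable[Tuple[str, str]]) -> Dict[str, bool]:
    """(patient_id, text) pairs -> True if any of the patient's documents is positive."""
    result: Dict[str, bool] = {}
    for patient_id, text in documents:
        result[patient_id] = result.get(patient_id, False) or document_positive(text)
    return result


example_notes = [
    ("p1", "Patient denies feeling suicidal. Mood low."),
    ("p1", "Reports a suicide attempt last year."),
    ("p2", "No suicidal ideation reported."),
    ("p3", "Sleeping well."),
]
print(patient_level(example_notes))
```

Separating document-level and patient-level decisions in this way lets each level be scored against its own reference standard, as the abstract reports doing.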
Affiliation(s)
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden
- Sophie Epstein
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,South London and Maudsley NHS Foundation Trust, London, UK
- André Bittar
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,South London and Maudsley NHS Foundation Trust, London, UK
- Johnny Downs
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,South London and Maudsley NHS Foundation Trust, London, UK
27
Viani N, Kam J, Yin L, Verma S, Stewart R, Patel R, Velupillai S. Annotating Temporal Relations to Determine the Onset of Psychosis Symptoms. Stud Health Technol Inform 2019; 264:418-422. [PMID: 31437957 DOI: 10.3233/shti190255] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
For patients with a diagnosis of schizophrenia, determining symptom onset is crucial for timely and successful intervention. In mental health records, information about early symptoms is often documented only in free text, and thus needs to be extracted to support clinical research. To achieve this, natural language processing (NLP) methods can be used. Development and evaluation of NLP systems requires manually annotated corpora. We present a corpus of mental health records annotated with temporal relations for psychosis symptoms. We propose a methodology for document selection and manual annotation to detect symptom onset information, and develop an annotated corpus. To assess the utility of the created corpus, we propose a pilot NLP system. To the best of our knowledge, this is the first temporally-annotated corpus tailored to a specific clinical use-case.
Affiliation(s)
- Natalia Viani
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Joyce Kam
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Lucia Yin
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Somain Verma
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,South London and Maudsley NHS Foundation Trust, London, UK
- Rashmi Patel
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,South London and Maudsley NHS Foundation Trust, London, UK
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.,School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden
28
29
Velupillai S, Hadlaczky G, Baca-Garcia E, Gorrell GM, Werbeloff N, Nguyen D, Patel R, Leightley D, Downs J, Hotopf M, Dutta R. Risk Assessment Tools and Data-Driven Approaches for Predicting and Preventing Suicidal Behavior. Front Psychiatry 2019; 10:36. [PMID: 30814958 PMCID: PMC6381841 DOI: 10.3389/fpsyt.2019.00036] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/29/2018] [Accepted: 01/21/2019] [Indexed: 12/14/2022]
Abstract
Risk assessment of suicidal behavior is a time-consuming but notoriously inaccurate activity for mental health services globally. In the last 50 years, a large number of tools have been designed for suicide risk assessment and tested in a wide variety of populations, but studies show that these tools suffer from low positive predictive values. More recently, advances in research fields such as machine learning and natural language processing applied to large datasets have shown promising results for health care, and may enable an important shift in advancing precision medicine. In this conceptual review, we discuss established risk assessment tools and examples of novel data-driven approaches that have been used for identification of suicidal behavior and risk. We provide a perspective on the strengths and weaknesses of these applications to mental health-related data, and suggest research directions to enable improvement in clinical practice.
Affiliation(s)
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Gergö Hadlaczky
- National Center for Suicide Research and Prevention (NASP), Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Stockholm, Sweden.,National Center for Suicide Research and Prevention (NASP), Centre for Health Economics, Informatics and Health Services Research (CHIS), Stockholm Health Care Services (SLSO), Stockholm, Sweden
- Enrique Baca-Garcia
- Department of Psychiatry, IIS-Jimenez Diaz Foundation, Madrid, Spain.,Department of Psychiatry, Autonoma University, Madrid, Spain.,Department of Psychiatry, General Hospital of Villalba, Madrid, Spain.,CIBERSAM, Carlos III Institute of Health, Madrid, Spain.,Department of Psychiatry, University Hospital Rey Juan Carlos, Móstoles, Spain.,Department of Psychiatry, University Hospital Infanta Elena, Valdemoro, Spain.,Department of Psychiatry, Universidad Católica del Maule, Talca, Chile
- Genevieve M Gorrell
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Nomi Werbeloff
- Division of Psychiatry, University College London, London, United Kingdom
- Dong Nguyen
- Alan Turing Institute, London, United Kingdom.,School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Rashmi Patel
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Daniel Leightley
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Johnny Downs
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Matthew Hotopf
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.,South London and Maudsley NHS Foundation Trust, London, United Kingdom
30
Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, Osborn D, Hayes J, Stewart R, Downs J, Chapman W, Dutta R. Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. J Biomed Inform 2018; 88:11-19. [PMID: 30368002 PMCID: PMC6986921 DOI: 10.1016/j.jbi.2018.10.005] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 05/25/2018] [Revised: 10/14/2018] [Accepted: 10/15/2018] [Indexed: 12/27/2022]
Abstract
The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized over the past years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient- or population-level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between the scientific objectives of the two fields, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose that more emphasis be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.
Affiliation(s)
- Sumithra Velupillai
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden.
- Hanna Suominen
- College of Engineering and Computer Science, The Australian National University, Data61/CSIRO, University of Canberra, Australia; University of Turku, Finland.
- Maria Liakata
- Department of Computer Science, University of Warwick/Alan Turing Institute, UK.
- Angus Roberts
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK.
- Anoop D Shah
- Institute of Health Informatics, University College London, UK; University College London NHS Foundation Trust, London, UK.
- Katherine Morley
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; Melbourne School of Population and Global Health, The University of Melbourne, Australia.
- David Osborn
- Division of Psychiatry, University College London, UK; Camden and Islington NHS Foundation Trust, London, UK.
- Joseph Hayes
- Division of Psychiatry, University College London, UK; Camden and Islington NHS Foundation Trust, London, UK.
- Robert Stewart
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
- Johnny Downs
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
- Wendy Chapman
- Department of Biomedical Informatics, University of Utah, United States.
- Rina Dutta
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, UK; South London and Maudsley NHS Foundation Trust, London, UK.
| |
Collapse
|
31
|
Abstract
This article describes the development and evaluation of a set of knowledge patterns that provide guidelines and implications of design for developers of mental health portals. The knowledge patterns were based on three foundations: (1) knowledge integration of language technology approaches; (2) experiments with language technology applications and (3) user studies of portal interaction. A mixed-methods approach was employed for the evaluation of the knowledge patterns: formative workshops with knowledge pattern experts and summative surveys with experts in specific domains. The formative evaluation improved the cohesion of the patterns. The results of the summative evaluation showed that the problems discussed in the patterns were relevant for the domain, and that the knowledge embedded was useful to solve them. Ten patterns out of thirteen achieved an average score above 4.0, which is a positive result that leads us to conclude that they can be used as guidelines for developing health portals.
32
Fernandes AC, Dutta R, Velupillai S, Sanyal J, Stewart R, Chandran D. Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing. Sci Rep 2018; 8:7426. [PMID: 29743531 PMCID: PMC5943451 DOI: 10.1038/s41598-018-25773-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 04/26/2017] [Accepted: 04/27/2018] [Indexed: 01/11/2023]
Abstract
Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free text notes as well as structured fields concerning suicidality, allowing access to much larger cohorts than previously possible. This paper presents two novel NLP approaches: a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. The good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier should a need arise to refine performance or meet alternative classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets, given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.
Affiliation(s)
- Andrea C Fernandes
- Institute of Psychiatry, Psychology and Neuroscience, Academic Department of Psychological Medicine, London, SE5 8AF, United Kingdom.
- UK National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, SE5 8AZ, United Kingdom.
- Rina Dutta
- Institute of Psychiatry, Psychology and Neuroscience, Academic Department of Psychological Medicine, London, SE5 8AF, United Kingdom
- UK National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, SE5 8AZ, United Kingdom
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, Academic Department of Psychological Medicine, London, SE5 8AF, United Kingdom
- UK National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, SE5 8AZ, United Kingdom
- Jyoti Sanyal
- Institute of Psychiatry, Psychology and Neuroscience, Academic Department of Psychological Medicine, London, SE5 8AF, United Kingdom
- UK National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, SE5 8AZ, United Kingdom
- Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, Academic Department of Psychological Medicine, London, SE5 8AF, United Kingdom
- UK National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, SE5 8AZ, United Kingdom
- David Chandran
- Institute of Psychiatry, Psychology and Neuroscience, Academic Department of Psychological Medicine, London, SE5 8AF, United Kingdom
- UK National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, SE5 8AZ, United Kingdom
33
Downs J, Velupillai S, Gkotsis G, Holden R, Kikoler M, Dean H, Fernandes A, Dutta R. Detection of Suicidality in Adolescents with Autism Spectrum Disorders: Developing a Natural Language Processing Approach for Use in Electronic Health Records. AMIA Annu Symp Proc 2018; 2017:641-649. [PMID: 29854129 PMCID: PMC5977628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Over 15% of young people with autism spectrum disorders (ASD) will contemplate or attempt suicide during adolescence. Yet there is limited evidence concerning risk factors for suicidality in childhood ASD. Electronic health records (EHRs) can be used to create retrospective clinical cohort data for large samples of children with ASD. However, systems to accurately extract suicidality-related concepts need to be developed so that putative models of suicide risk in ASD can be explored. We present a systematic approach to 1) adapt Natural Language Processing (NLP) solutions to screen with high sensitivity for references to suicidal constructs in a large clinical ASD EHR corpus (230,465 documents), and 2) evaluate, within a screened subset of 500 patients, the performance of an NLP classification tool for positive and negated suicidal mentions within clinical text. When evaluated, the NLP classification tool showed high system performance for positive suicidality, with precision, recall, and F1 scores all > 0.85 at both document and patient level. The application therefore provides accurate output for epidemiological research into the factors contributing to the onset and recurrence of suicidality, and has potential utility within clinical settings as an automated surveillance or risk prediction tool for specialist ASD services.
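The abstract reports precision, recall and F1 at both document and patient level. As an illustration only, the sketch below computes these metrics for the positive class and derives patient-level labels by treating a patient as positive if any of their documents is positive. That aggregation rule, the function names (`prf1`, `to_patient_level`) and the toy data are assumptions, not the evaluation procedure used in the study.

```python
from collections import defaultdict


def prf1(gold, pred):
    """Precision, recall and F1 for the positive class over parallel boolean label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def to_patient_level(doc_rows):
    """Roll document labels up to patients (assumed rule: positive if ANY document is positive)."""
    gold, pred = defaultdict(bool), defaultdict(bool)
    for patient_id, g, p in doc_rows:
        gold[patient_id] |= g
        pred[patient_id] |= p
    ids = sorted(gold)
    return [gold[i] for i in ids], [pred[i] for i in ids]


# Hypothetical toy data: (patient_id, gold_positive, predicted_positive) per document.
docs = [
    ("p1", True, True),
    ("p1", False, False),
    ("p2", True, False),
    ("p2", True, True),
    ("p3", False, True),
    ("p4", False, False),
]

doc_scores = prf1([g for _, g, _ in docs], [p for _, _, p in docs])
patient_scores = prf1(*to_patient_level(docs))
print("document level:", doc_scores)
print("patient level:", patient_scores)
```

In this toy example the document-level miss for `p2` disappears at patient level because another of that patient's documents is detected, which is one reason patient-level scores can differ from document-level ones.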
Affiliation(s)
- Johnny Downs
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- South London and Maudsley NHS Foundation Trust, London, UK
- Sumithra Velupillai
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- School of Computer Science and Communication, KTH, Stockholm
- George Gkotsis
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Rachel Holden
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- University of Canterbury, Southborough, UK
- Maxim Kikoler
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- University of Canterbury, Southborough, UK
- Harry Dean
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Andrea Fernandes
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Rina Dutta
- Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- South London and Maudsley NHS Foundation Trust, London, UK
34
Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semantics 2018; 9:12. [PMID: 29602312 PMCID: PMC5877394 DOI: 10.1186/s13326-018-0179-8] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Received: 05/22/2017] [Accepted: 02/14/2018] [Indexed: 01/22/2023]
Abstract
Background Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Affiliation(s)
- Aurélie Névéol
- LIMSI, CNRS, Université Paris Saclay, Rue John von Neumann, Paris, F-91405 Orsay, France
- Sumithra Velupillai
- School of Computer Science and Communication, KTH, Stockholm, Sweden; Institute of Psychiatry, Psychology and Neuroscience, King's College, London, UK
- Guergana Savova
- Children's Hospital Boston and Harvard Medical School, Boston, Massachusetts, USA
- Pierre Zweigenbaum
- LIMSI, CNRS, Université Paris Saclay, Rue John von Neumann, Paris, F-91405 Orsay, France
35
Jackson R, Patel R, Velupillai S, Gkotsis G, Hoyle D, Stewart R. Knowledge discovery for Deep Phenotyping serious mental illness from Electronic Mental Health records. F1000Res 2018; 7:210. [PMID: 29899974 PMCID: PMC5968362 DOI: 10.12688/f1000research.13830.2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Accepted: 04/30/2018] [Indexed: 11/23/2022]
Abstract
Background: Deep Phenotyping is the precise and comprehensive analysis of phenotypic features in which the individual components of the phenotype are observed and described. In UK mental health clinical practice, most clinically relevant information is recorded as free text in the Electronic Health Record, where it offers a granularity of information beyond what is expressed in most medical knowledge bases. The SNOMED CT nomenclature potentially offers the means to model such information at scale, yet given a sufficiently large body of clinical text collected over many years, it is difficult to identify the language that clinicians favour to express concepts. Methods: By utilising a large corpus of healthcare data, we sought to make use of semantic modelling and clustering techniques to represent the relationship between the clinical vocabulary of internationally recognised Serious Mental Illness (SMI) symptoms and the preferred language used by clinicians within a care setting. We explore how such models can be used for discovering novel vocabulary relevant to the task of phenotyping SMI with only a small amount of prior knowledge. Results: 20 403 terms were derived and curated via a two-stage methodology. The list was reduced to 557 putative concepts by eliminating redundant information content. These were then organised into 9 distinct categories pertaining to different aspects of psychiatric assessment. 235 concepts were found to be expressions of putative clinical significance. Of these, 53 were identified as having novel synonymy with existing SNOMED CT concepts, and 106 had no mapping to SNOMED CT. Conclusions: We demonstrate a scalable approach to discovering new concepts of SMI symptomatology based on real-world clinical observation. 
Such approaches may offer the opportunity to consider broader manifestations of SMI symptomatology than is typically assessed via current diagnostic frameworks, and create the potential for enhancing nomenclatures such as SNOMED CT based on real-world expressions.
Affiliation(s)
- Richard Jackson
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
- Rashmi Patel
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK; South London and Maudsley NHS Foundation Trust, London, SE5 8AZ, UK
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK; School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, SE-100 44, Sweden
- George Gkotsis
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
- Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK; South London and Maudsley NHS Foundation Trust, London, SE5 8AZ, UK
36
Gkotsis G, Oellrich A, Velupillai S, Liakata M, Hubbard TJP, Dobson RJB, Dutta R. Corrigendum: Characterisation of mental health conditions in social media using Informed Deep Learning. Sci Rep 2017; 7:46813. [PMID: 28507325 PMCID: PMC5432835 DOI: 10.1038/srep46813] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
37
Gkotsis G, Oellrich A, Velupillai S, Liakata M, Hubbard TJP, Dobson RJB, Dutta R. Characterisation of mental health conditions in social media using Informed Deep Learning. Sci Rep 2017; 7:45141. [PMID: 28327593 PMCID: PMC5361083 DOI: 10.1038/srep45141] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Received: 12/22/2016] [Accepted: 02/15/2017] [Indexed: 11/09/2022]
Abstract
The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.
Affiliation(s)
- Sumithra Velupillai
- King’s College London, IoPPN, London, SE5 8AF, UK
- School of Computer Science and Communication, KTH, Stockholm
- Maria Liakata
- Department of Computer Science, University of Warwick, Coventry
- Tim J. P. Hubbard
- King’s College London, Department of Medical & Molecular Genetics, London, SE1 9RT
- Richard J. B. Dobson
- King’s College London, IoPPN, London, SE5 8AF, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT, UK
- Rina Dutta
- King’s College London, IoPPN, London, SE5 8AF, UK
38
Mowery DL, South BR, Christensen L, Leng J, Peltonen LM, Salanterä S, Suominen H, Martinez D, Velupillai S, Elhadad N, Savova G, Pradhan S, Chapman WW. Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. J Biomed Semantics 2016; 7:43. [PMID: 27370271 PMCID: PMC4930590 DOI: 10.1186/s13326-016-0084-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/28/2014] [Accepted: 06/01/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referred to as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a simpler lay term. METHODS In this study, we evaluate 1) the accuracy of participating systems in normalizing short forms compared to a majority sense baseline approach, 2) the performance of participating systems on short forms with variable majority sense distributions, and 3) the accuracy of participating systems in normalizing concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. RESULTS The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second-best performance. For short forms with two or more senses, the accuracy of participating systems ranged from 52 to 78 % at low ambiguity (majority sense greater than 80 %), from 23 to 57 % at moderate ambiguity (majority sense between 50 and 80 %), and from 2 to 45 % at high ambiguity (majority sense less than 50 %). With respect to the ShARe test set, 69 % of short form annotations had concept unique identifiers in common with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy. 
CONCLUSION Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
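A majority sense baseline, reported above as the second-best approach, assigns each short form the sense it most often carries in annotated training data. A minimal sketch, assuming training and test data are lists of (short form, concept label) pairs; the function names and the sense labels are hypothetical placeholders (not real UMLS concept unique identifiers), and short forms unseen in training are simply scored as errors.

```python
from collections import Counter, defaultdict


def train_majority_sense(training_pairs):
    """Map each short form to the concept label it is most often annotated with."""
    counts = defaultdict(Counter)
    for short_form, label in training_pairs:
        counts[short_form][label] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}


def predict(model, short_form):
    """Return the majority sense, or None for a short form unseen in training."""
    return model.get(short_form)


def accuracy(model, test_pairs):
    correct = sum(1 for sf, label in test_pairs if predict(model, sf) == label)
    return correct / len(test_pairs)


# Hypothetical annotations with placeholder sense labels.
train = [
    ("RA", "SENSE_RHEUMATOID_ARTHRITIS"),
    ("RA", "SENSE_RHEUMATOID_ARTHRITIS"),
    ("RA", "SENSE_RIGHT_ATRIUM"),
    ("CP", "SENSE_CHEST_PAIN"),
]
test = [
    ("RA", "SENSE_RHEUMATOID_ARTHRITIS"),
    ("RA", "SENSE_RIGHT_ATRIUM"),
    ("CP", "SENSE_CHEST_PAIN"),
    ("SOB", "SENSE_SHORTNESS_OF_BREATH"),
]

model = train_majority_sense(train)
print(accuracy(model, test))  # prints 0.5
```

Because the baseline always outputs one sense per short form, it can never be correct on a minority-sense occurrence, so its accuracy on a given short form is bounded by that form's majority sense share. This is consistent with accuracy falling as ambiguity rises.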
Affiliation(s)
- Danielle L Mowery
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
- Brett R South
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Lee Christensen
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Jianwei Leng
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Laura-Maria Peltonen
- Nursing Science, University of Turku, and Turku University Hospital, Turku, Finland
- Sanna Salanterä
- Nursing Science, University of Turku, and Turku University Hospital, Turku, Finland
- Hanna Suominen
- Data61, CSIRO, The Australian National University, University of Canberra, and University of Turku, Locked Bag 8001, Canberra, 2601, ACT, Australia
- David Martinez
- MedWhat.com, San Francisco, CA, USA; University of Melbourne, Parkville, VIC, Australia
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Stockholm, Sweden
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Guergana Savova
- Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Sameer Pradhan
- Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Wendy W Chapman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
39
Velupillai S, Mowery DL, Abdelrahman S, Christensen L, Chapman WW. Towards a Generalizable Time Expression Model for Temporal Reasoning in Clinical Notes. AMIA Annu Symp Proc 2015; 2015:1252-1259. [PMID: 26958265 PMCID: PMC4765564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Accurate temporal identification and normalization are imperative for many biomedical and clinical tasks such as generating timelines and identifying phenotypes. A major natural language processing challenge is developing and evaluating a generalizable temporal modeling approach that performs well across corpora and institutions. Our long-term goal is to create such a model. We initiate our work on reaching this goal by focusing on temporal expression (TIMEX3) identification. We present a systematic approach to 1) generalize existing solutions for automated TIMEX3 span detection, and 2) assess similarities and differences between various instantiations of TIMEX3 models applied to separate clinical corpora. Evaluating on the 2012 i2b2 and the 2015 Clinical TempEval challenge corpora, we conclude that our approach is successful: we achieve competitive results for automated classification, and we identify similarities and differences in TIMEX3 modeling that will be informative in the development of a simplified, general temporal model.
Affiliation(s)
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Stockholm, Sweden; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Danielle L Mowery
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Samir Abdelrahman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT; Computer Science Department, Faculty of Computers and Information, Cairo University, Egypt
- Lee Christensen
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Wendy W Chapman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
40
Abstract
OBJECTIVES We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. METHODS We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. RESULTS Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we reflect upon the most recent developments and potential areas of future NLP development and applications. CONCLUSIONS There has been an increase in advances within key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices.
Affiliation(s)
- S Velupillai
- Sumithra Velupillai, Department of Computer and Systems Sciences, Stockholm University, Postbox 7003, 164 07 Kista, Sweden, Tel: +46 8 161 174, Fax: +46 8 703 9025, E-mail:
41
Velupillai S, Duneld M, Henriksson A, Kvist M, Skeppstedt M, Dalianis H. Louhi 2014: Special issue on health text mining and information analysis. BMC Med Inform Decis Mak 2015; 15 Suppl 2:S1. [PMID: 26099575 PMCID: PMC4474544 DOI: 10.1186/1472-6947-15-s2-s1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
42
Velupillai S, Skeppstedt M, Kvist M, Mowery D, Chapman BE, Dalianis H, Chapman WW. Cue-based assertion classification for Swedish clinical text--developing a lexicon for pyConTextSwe. Artif Intell Med 2014; 61:137-44. [PMID: 24556644 PMCID: PMC4104142 DOI: 10.1016/j.artmed.2014.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 06/01/2013] [Revised: 12/19/2013] [Accepted: 01/10/2014] [Indexed: 11/17/2022]
Abstract
OBJECTIVE The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. METHODS AND MATERIAL We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance. RESULTS Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-scores on an evaluation set were 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. CONCLUSIONS We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available.
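To illustrate what a cue lexicon contributes to a cue-based assertion system, here is a heavily simplified sketch loosely in the spirit of ConText-style algorithms. It uses the four assertion classes named above: a cue preceding the target within a fixed token window assigns its class unless a terminating word intervenes, and a target with no cue defaults to definite existence. The English cues, window size, terminator words and the `classify` function are hypothetical. This is not the pyConTextNLP API or the pyConTextSwe lexicon, and real systems also handle cues that follow the target.

```python
import re

# Hypothetical toy lexicon of forward-looking cues (NOT the pyConTextSwe lexicon).
CUES = {
    "no evidence of": "definite_negated_existence",
    "denies": "definite_negated_existence",
    "low suspicion of": "probable_negated_existence",
    "suspected": "probable_existence",
    "possible": "probable_existence",
}
TERMINATORS = {"but", "however"}  # assumed words that end a cue's scope
WINDOW = 6  # assumed maximum number of tokens between cue and target


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def classify(sentence, target):
    """Return an assertion class for target in sentence, or None if target is absent."""
    tokens, target_tokens = tokenize(sentence), tokenize(target)
    n = len(target_tokens)
    start = next((i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target_tokens), None)
    if start is None:
        return None
    best = None  # (cue start index, label) of the in-scope cue starting closest to the target
    for cue, label in CUES.items():
        cue_tokens = tokenize(cue)
        m = len(cue_tokens)
        for i in range(start - m + 1):  # cue must end at or before the target
            if tokens[i:i + m] != cue_tokens:
                continue
            between = tokens[i + m:start]
            if len(between) <= WINDOW and not TERMINATORS.intersection(between):
                if best is None or i > best[0]:
                    best = (i, label)
    return best[1] if best else "definite_existence"


print(classify("Chest X-ray shows no evidence of pneumonia.", "pneumonia"))
print(classify("Patient denies chest pain but reports pneumonia symptoms.", "pneumonia"))
```

The second example shows why terminators matter: without "but" ending the scope of "denies", the pneumonia mention would wrongly be classed as negated. Extending or pruning `CUES` changes the output directly, which is why the abstract treats lexicon quality as central to performance.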
Affiliation(s)
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden.
- Maria Skeppstedt
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden.
- Maria Kvist
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden; Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Widerström Building, Tomtebodavägen 18A, Solna, Sweden.
- Danielle Mowery
- Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, BAUM 423, Pittsburgh, PA 15206-3701, United States.
- Brian E Chapman
- Department of Radiology, University of Utah, 729 Arapeen Drive, Salt Lake City, UT 84108, United States.
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden.
- Wendy W Chapman
- Department of Biomedical Informatics, University of Utah, 26 South 2000 East, Room 5775 HSEB, Salt Lake City, UT 84112-5775, United States.
43
Lövestam E, Velupillai S, Kvist M. Abbreviations in Swedish Clinical Text--use by three professions. Stud Health Technol Inform 2014; 205:720-724. [PMID: 25160281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
A list of 266 abbreviations from dieticians' notes in patient records was used to extract the same abbreviations from patient records written by three professions: dieticians, nurses and physicians. A context analysis of 40 of the abbreviations showed that ambiguous meanings were common. Abbreviations used by dieticians were found to be used by other professions, but not always with the same meaning. This ambiguity of abbreviations might cause misunderstandings and put patient safety at risk.
Affiliation(s)
- Elin Lövestam
- Dept. of Food, Nutrition and Dietetics, Uppsala University, Uppsala, Sweden
- Sumithra Velupillai
- Dept. of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
- Maria Kvist
- Dept. of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
44
Suominen H, Salanterä S, Velupillai S, Chapman WW, Savova G, Elhadad N, Pradhan S, South BR, Mowery DL, Jones GJF, Leveling J, Kelly L, Goeuriot L, Martinez D, Zuccon G. Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. Lecture Notes in Computer Science 2013. [DOI: 10.1007/978-3-642-40802-1_24] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 12/12/2022]
45
Chapman WW, Hillert D, Velupillai S, Kvist M, Skeppstedt M, Chapman BE, Conway M, Tharp M, Mowery DL, Deleger L. Extending the NegEx lexicon for multiple languages. Stud Health Technol Inform 2013; 192:677-681. [PMID: 23920642 PMCID: PMC3923890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
We translated an existing English negation lexicon (NegEx) into Swedish, French, and German and compared the lexicons on corpora from each language. We observed Zipf's law for all languages, i.e., a few phrases occur a large number of times, while a large number of phrases occur only a few times. The negation triggers "no" and "not" were common in all languages; however, other triggers varied considerably. The lexicon is available in OWL and RDF formats and can be extended to other languages. We discuss the challenges in translating negation triggers into other languages and issues in representing multilingual lexical knowledge.
Affiliation(s)
- Wendy W Chapman
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA
46
Allvin H, Carlsson E, Dalianis H, Danielsson-Ojala R, Daudaravičius V, Hassel M, Kokkinakis D, Lundgrén-Laine H, Nilsson GH, Nytrø Ø, Salanterä S, Skeppstedt M, Suominen H, Velupillai S. Characteristics of Finnish and Swedish intensive care nursing narratives: a comparative analysis to support the development of clinical language technologies. J Biomed Semantics 2011; 2 Suppl 3:S1. [PMID: 21992572 PMCID: PMC3194173 DOI: 10.1186/2041-1480-2-s3-s1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Free text is helpful for entering information into electronic health records, but reusing it is a challenge. The need for language technology for processing Finnish and Swedish healthcare text is therefore evident; however, Finnish and Swedish are linguistically very dissimilar. In this paper we present a comparison of characteristics in Finnish and Swedish free-text nursing narratives from intensive care. This creates a framework for characterising and comparing clinical text and lays the groundwork for developing clinical language technologies. METHODS Our material included daily nursing narratives from one intensive care unit in Finland and one in Sweden. Inclusion criteria for patients were an inpatient period of at least five days and an age of at least 16 years. We performed a comparative analysis, with both qualitative and quantitative aspects, as part of a collaborative effort between Finnish- and Swedish-speaking healthcare and language technology professionals. The qualitative analysis addressed the content and structure of three average-sized health records from each country. In the quantitative analysis, 514 Finnish and 379 Swedish health records were studied using various language technology tools. RESULTS Although the two languages are not closely related, nursing narratives in Finland and Sweden had many properties in common. Both made use of specialised jargon, and their content was very similar. However, many of these characteristics posed challenges for developing language technology to support producing and using clinical documentation. CONCLUSIONS The way Finnish and Swedish intensive care nursing was documented was not country- or language-dependent, but shared a common context, principles, structural features and even similar vocabulary elements. Technology solutions are therefore likely to be applicable to a wider range of natural languages, but they need linguistic tailoring.
AVAILABILITY The Finnish and Swedish data can be found at: http://www.dsv.su.se/hexanord/data/.
Affiliation(s)
- Helen Allvin
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Elin Carlsson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Riitta Danielsson-Ojala
- Department of Nursing Science, University of Turku and Hospital District of Southwest Finland, FI-20014 University of Turku, Turku, Finland
- Vidas Daudaravičius
- Faculty of Informatics, Vytautas Magnus University, S. Daukanto g. 27 (301–309), LT-44249 Kaunas, Lithuania
- Martin Hassel
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Dimitrios Kokkinakis
- Department of Swedish, University of Gothenburg, Box 200, SE-405 30 Gothenburg, Sweden
- Heljä Lundgrén-Laine
- Department of Nursing Science, University of Turku and Hospital District of Southwest Finland, FI-20014 University of Turku, Turku, Finland
- Gunnar H Nilsson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Øystein Nytrø
- Department of Computer and Information Science, Norwegian University of Science and Technology, Sem Sælands vei 7-9, NO-7491 Trondheim, Norway
- Sanna Salanterä
- Department of Nursing Science, University of Turku and Hospital District of Southwest Finland, FI-20014 University of Turku, Turku, Finland
- Maria Skeppstedt
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Hanna Suominen
- NICTA, Canberra Research Laboratory and Australian National University, College of Engineering and Computer Science, Locked Bag 8001, ACT-2601, Canberra, Australia
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
47
Velupillai S, Dalianis H, Kvist M. Factuality levels of diagnoses in Swedish clinical text. Stud Health Technol Inform 2011; 169:559-563. [PMID: 21893811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Different levels of knowledge certainty, or factuality levels, are expressed in clinical health record documentation. This information is currently not fully exploited, as the subtleties expressed in natural language cannot easily be analyzed by machine. Extracting relevant information from knowledge-intensive resources such as electronic health records can help improve health care in general, e.g. by building automated information access systems. We present an annotation model of six factuality levels linked to diagnoses in Swedish clinical assessments from an emergency ward. Our main finding is that overall agreement is fairly high (intra-/inter-annotator: 0.7/0.58 F-measure, 0.73/0.6 Cohen's κ). These distinctions are important for knowledge models, since only approx. 50% of the diagnoses are affirmed with certainty. Moreover, our results indicate that patterns inherent in the diagnosis expressions themselves convey factuality levels, showing that certainty does not depend only on context cues.
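The Cohen's κ values reported above correct raw agreement for the agreement expected by chance, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each annotator's label proportions. A minimal sketch for two annotators labelling the same diagnosis mentions; the labels "L1" and "L2" are hypothetical placeholders, not the paper's six factuality levels.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: products of the two annotators' label proportions, summed over labels.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout, so kappa is undefined.
        # This sketch returns 1.0 for that case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for four diagnosis mentions.
annotator_1 = ["L1", "L1", "L2", "L2"]
annotator_2 = ["L1", "L2", "L2", "L2"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.5
```

Here the annotators agree on 3 of 4 mentions (p_o = 0.75), but their label proportions alone would give p_e = 0.5, so κ is 0.5 rather than 0.75; this is why κ is typically lower than raw agreement.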
Affiliation(s)
- Sumithra Velupillai
- Dept. of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
48
Dalianis H, Velupillai S. De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields. J Biomed Semantics 2010; 1:6. [PMID: 20618985 PMCID: PMC2895734 DOI: 10.1186/2041-1480-1-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 12/07/2009] [Accepted: 04/12/2010] [Indexed: 12/05/2022]
Abstract
Background In order to perform research on the information contained in Electronic Patient Records (EPRs), access to the data itself is needed. This is often very difficult due to confidentiality regulations. The data sets need to be fully de-identified before they can be distributed to researchers. De-identification is a difficult task where the definitions of annotation classes are not self-evident. Results We present work on the creation of two refined variants of a manually annotated Gold standard for de-identification, one created automatically and one created through discussions among the annotators. The data is a subset of the Stockholm EPR Corpus, a data set available within our research group. These variants are used for the training and evaluation of an automatic system based on the Conditional Random Fields algorithm. Evaluating with four-fold cross-validation on sets of around 4,000-6,000 annotation instances, we obtained very promising results for both Gold standards: F-scores around 0.80 for a number of experiments, with higher results for certain annotation classes. Moreover, the system found 49 instances that were counted as false positives but, on verification, were true positives missed by the annotators. Conclusions Our intention is to make this Gold standard, the Stockholm EPR PHI Corpus, available to other research groups in the future. Despite being slightly more time-consuming, we believe the manual consensus Gold standard is the most valuable for further research. We also propose a set of annotation classes to be used for similar de-identification tasks.
Affiliation(s)
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden.
49
Velupillai S, Dalianis H, Hassel M, Nilsson GH. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. Int J Med Inform 2009; 78:e19-26. [PMID: 19482543 DOI: 10.1016/j.ijmedinf.2009.04.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 10/31/2008] [Revised: 03/02/2009] [Accepted: 04/09/2009] [Indexed: 11/26/2022]
Abstract
BACKGROUND Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive, since the free-text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI) annotation trial for EPRs written in Swedish. METHODS This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of existing de-identification software, written for American English, to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure. RESULTS This study reports fairly high Inter-Annotator Agreement (IAA) on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting de-identification software written for American English directly to Swedish proved non-trivial and yielded poor results. CONCLUSION Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions of identifiable information are needed, as well as further development of both the tag sets and the annotation guidelines, in order to obtain a reliable gold standard. Completely new de-identification software needs to be developed.
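The pairwise IAA figures above (average and highest pairwise F-measure, overall and for name tags) can be computed by treating one annotator's PHI annotations as the reference and another's as the candidate; F-measure is symmetric, so the choice of reference does not change the score. A minimal sketch, assuming annotations are exact (start, end, tag) spans and that only exact span-and-tag matches count; the study's actual matching criteria may differ, and the tag names and offsets in the example are hypothetical.

```python
from itertools import combinations

# Hypothetical span representation: (start offset, end offset, PHI tag).
Span = tuple[int, int, str]


def f_measure(reference: set[Span], candidate: set[Span]) -> float:
    """F-measure between two annotation sets, counting only exact span-and-tag matches."""
    matched = len(reference & candidate)
    if matched == 0:
        # Also covers two empty sets; this sketch scores that case as 0.0.
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def pairwise_iaa(annotations: dict[str, set[Span]]) -> tuple[float, float]:
    """Return (average, highest) pairwise F-measure over all annotator pairs."""
    if len(annotations) < 2:
        raise ValueError("need at least two annotators")
    scores = [
        f_measure(annotations[a], annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    ]
    return sum(scores) / len(scores), max(scores)


def restrict_to_tag(annotations: dict[str, set[Span]], tag: str) -> dict[str, set[Span]]:
    """Keep only spans with the given tag, e.g. to compute IAA for name tags alone."""
    return {name: {s for s in spans if s[2] == tag} for name, spans in annotations.items()}


# Hypothetical annotations of one record by three annotators.
example = {
    "annotator_1": {(0, 5, "Name"), (10, 20, "Date")},
    "annotator_2": {(0, 5, "Name")},
    "annotator_3": {(0, 5, "Name"), (10, 20, "Date")},
}
average, highest = pairwise_iaa(example)  # average is 7/9 (about 0.78), highest is 1.0
```

Restricting to a single tag before computing pairwise scores mirrors how a per-tag figure (such as the name-tag IAA) differs from the all-tag average.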
Affiliation(s)
- Sumithra Velupillai
- Department of Computer and Systems Sciences, Stockholm University/KTH, Kista, Sweden.
50
Davidson M, Batchelar D, Velupillai S, Denstedt J, Cunningham I. Sci-YIS Fri - 10: Tomographic composition analysis of intact urinary calculi by x-ray coherent scatter. Med Phys 2005. [DOI: 10.1118/1.2031032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]