1
Barcroft JF, Yom-Tov E, Lampos V, Ellis LB, Guzman D, Ponce-López V, Bourne T, Cox IJ, Saso S. Using online search activity for earlier detection of gynaecological malignancy. BMC Public Health 2024; 24:608. [PMID: 38462622 PMCID: PMC10926628 DOI: 10.1186/s12889-024-17673-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Accepted: 01/04/2024] [Indexed: 03/12/2024] Open
Abstract
BACKGROUND Ovarian cancer is the most lethal and endometrial cancer the most common gynaecological cancer in the UK, yet neither has a screening program in place to facilitate early disease detection. The aim is to evaluate whether online search data can be used to differentiate between individuals with malignant and benign gynaecological diagnoses. METHODS This is a prospective cohort study evaluating online search data in symptomatic individuals (Google users) referred from primary care (GP) with a suspected cancer to a London hospital (UK) between December 2020 and June 2022. Informed written consent was obtained and online search data was extracted via Google Takeout and anonymised. A health filter was applied to extract health-related terms for 24 months prior to GP referral. A predictive model (outcome: malignancy) was developed using (1) search queries (terms model) and (2) categorised search queries (categories model). Area under the ROC curve (AUC) was used to evaluate model performance. 844 women were approached, 652 were eligible to participate and 392 were recruited. Of those recruited, 108 did not complete enrollment, 12 withdrew and 37 were excluded as they did not track Google searches or had an empty search history, leaving a cohort of 235. RESULTS The cohort had a median age of 53 years (range 20-81) and a malignancy rate of 26.0%. There was a difference in online search data between those with a benign and those with a malignant diagnosis, noted as early as 360 days in advance of GP referral when search queries were used directly, but only 60 days in advance when queries were divided into health categories. A model using online search data from patients who performed health-related searches (n = 153), corrected for sample size, achieved its highest sample-corrected AUC of 0.82 at 60 days prior to GP referral. CONCLUSIONS Online search data appears to be different between individuals with malignant and benign gynaecological conditions, with a signal observed in advance of GP referral date. Online search data needs to be evaluated in a larger dataset to determine its value as an early disease detection tool and whether its use leads to improved clinical outcomes.
Affiliation(s)
- Jennifer F Barcroft
  - Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London, W12 0HS, UK
- Vasileios Lampos
  - Department of Computer Science, University College London, London, UK
- Laura Burney Ellis
  - Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London, W12 0HS, UK
- David Guzman
  - Department of Computer Science, University College London, London, UK
- Tom Bourne
  - Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London, W12 0HS, UK
- Ingemar J Cox
  - Department of Computer Science, University College London, London, UK
  - Computer Science, University of Copenhagen, Copenhagen, Denmark
- Srdjan Saso
  - Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London, W12 0HS, UK
2
Yom-Tov E, Navar I, Fraenkel E, Berry JD. Identifying amyotrophic lateral sclerosis through interactions with an internet search engine. Muscle Nerve 2024; 69:40-47. [PMID: 37877320 DOI: 10.1002/mus.27991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/03/2023] [Accepted: 10/05/2023] [Indexed: 10/26/2023]
Abstract
INTRODUCTION/AIMS Amyotrophic lateral sclerosis (ALS), a motor neuron disease, remains a clinical diagnosis with an average time from onset of symptoms to diagnosis of about 1 year. Herein we examine the possibility that interactions with an internet search engine can identify people with ALS. METHODS We identified 285 anonymous Bing users whose queries indicated that they had been diagnosed with ALS and matched them to: (1) 3276 control users; and (2) 1814 users whose searches indicated they had ALS disease mimics. We tested whether the ALS group could be distinguished from controls and disease mimics based on search engine query data. Finally, we conducted a prospective validation from participants who provided access to their Bing search data. RESULTS The model distinguished between the ALS group and controls with an area under the curve (AUC) of 0.81. Model scores for the ALS group differed from the disease mimics group (rank sum test, p < .05 with Bonferroni correction). Mild cognitive impairment could not be distinguished from ALS (p > .05). In the prospective analysis, the model reached an AUC of 0.74. DISCUSSION Our results suggest that interactions with search engines should be further studied to understand the potential to act as a tool to assist in screening for ALS and to reduce diagnostic delay.
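The AUC values quoted in this abstract can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, which is the same quantity a rank-sum statistic measures. The following is a minimal sketch of that definition, not the authors' implementation; the function name `auc_from_scores` and the example scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc_from_scores(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A tied pair counts as half a correct ranking, which matches the
    normalised Mann-Whitney U statistic.
    """
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical model scores, for illustration only (not data from the study).
    cases = [0.9, 0.8, 0.4]
    controls = [0.7, 0.3, 0.2, 0.1]
    print(auc_from_scores(cases, controls))
```

The pairwise loop is quadratic in the number of users; for cohorts of the size reported here that is acceptable, while larger datasets would normally use a sort-based computation.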
Affiliation(s)
- Ernest Fraenkel
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- James D Berry
  - Department of Neurology, Sean M. Healey and AMG Center for ALS, Massachusetts General Hospital, Boston, Massachusetts, USA
3
Hossain MA, Amenta F. Machine Learning-Based Classification of Parkinson's Disease Patients Using Speech Biomarkers. JOURNAL OF PARKINSON'S DISEASE 2024; 14:95-109. [PMID: 38160364 PMCID: PMC10836572 DOI: 10.3233/jpd-230002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/09/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND Parkinson's disease (PD) is the most prevalent neurodegenerative movement disorder and a growing health concern in demographically aging societies. The prevalence of PD among individuals over the age of 60 and 80 years has been reported to range between 1% and 4%. A timely diagnosis of PD is desirable, even though it poses challenges to medical systems. OBJECTIVE This study aimed to classify PD patients and healthy controls based on the analysis of voice records at different frequencies using machine learning (ML) algorithms. METHODS The voices of 252 individuals aged 33 to 87 years were recorded. Based on the voice record data, ML algorithms can distinguish PD patients from healthy controls. The dataset comprised 756 instances and 754 attributes, with one binary decision variable. Voice record data were analyzed through supervised ML algorithms and pipelines. A 10-fold cross-validation method was used to validate models. RESULTS In the classification of PD patients, ML models achieved 84.21% accuracy, 93% precision, 89% sensitivity, an F1-score of 89%, and an AUC of 87%. The pipeline performance improved to 85.09% accuracy, 92% precision, 91% sensitivity, an F1-score of 89%, and an AUC of 90%. The pipeline methods improved the performance of classifying PD from voice records. CONCLUSIONS Our study demonstrated that ML classifiers and pipelines can classify PD patients based on speech biomarkers. It was found that pipelines were more effective at selecting the most relevant features from high-dimensional data and at accurately classifying PD patients and healthy controls. This approach can therefore be used for early diagnosis of initial forms of PD.
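For readers less familiar with the metrics reported in this abstract, the sketch below shows how accuracy, precision, sensitivity and F1-score follow from a binary confusion matrix. It is an illustration under stated assumptions, not the authors' pipeline: the function name `classification_metrics` is hypothetical, labels are assumed to be coded 1 for PD and 0 for healthy control, and the example labels are invented.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels, for illustration only (not data from the study).
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0]
    print(classification_metrics(truth, predicted))
```

In a 10-fold cross-validation such as the one described above, these metrics would typically be computed on each held-out fold and then averaged.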
Affiliation(s)
- Mohammad Amran Hossain
  - Telemedicine and Telepharmacy Centre, School of Medicinal and Health Products Sciences, University of Camerino, Camerino, Italy
- Francesco Amenta
  - Telemedicine and Telepharmacy Centre, School of Medicinal and Health Products Sciences, University of Camerino, Camerino, Italy
4
Cohen Zion M, Gescheit I, Levy N, Yom-Tov E. Identifying Sleep Disorders From Search Engine Activity: Combining User-Generated Data With a Clinically Validated Questionnaire. J Med Internet Res 2022; 24:e41288. [DOI: 10.2196/41288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 10/25/2022] [Accepted: 11/15/2022] [Indexed: 11/24/2022] Open
Abstract
Background
Sleep disorders are experienced by up to 40% of the population, but their diagnosis is often delayed by the limited availability of specialists.
Objective
We propose the use of search engine activity in conjunction with a validated web-based sleep questionnaire to facilitate wide-scale screening of prevalent sleep disorders.
Methods
Search advertisements offering a web-based sleep disorder screening questionnaire were shown on the Bing search engine to individuals who indicated an interest in sleep disorders. People who clicked on the advertisements and completed the sleep questionnaire were identified as being at risk for 1 of 4 common sleep disorders. A machine learning algorithm was applied to previous search engine queries to predict their suspected sleep disorder, as identified by the questionnaire.
Results
A total of 397 users consented to participate in the study and completed the questionnaire. Of them, 132 had sufficient past query data for analysis. Our findings show that diurnal patterns of people with sleep disorders were shifted by 2-3 hours compared to those of the controls. Past query activity was predictive of sleep disorders, approaching an area under the receiver operating characteristic curve of 0.62-0.69, depending on the sleep disorder.
Conclusions
Targeted advertisements can be used as an initial screening tool for people with sleep disorders. However, search engine data are seemingly insufficient as a sole method for screening. Nevertheless, we believe that evaluable web-based information, easily collected and processed with little effort on part of the physician and with low burden on the individual, can assist in the diagnostic process and possibly drive people to seek sleep assessment and diagnosis earlier than they currently do.
5
Kamikubo R, Wang L, Marte C, Mahmood A, Kacorri H. Data Representativeness in Accessibility Datasets: A Meta-Analysis. ASSETS. ANNUAL ACM CONFERENCE ON ASSISTIVE TECHNOLOGIES 2022; 2022:8. [PMID: 36939417 PMCID: PMC10024595 DOI: 10.1145/3517428.3544826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2023]
Abstract
As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets (datasets sourced from people with disabilities and older adults) that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced from people with disabilities by reviewing publicly available information on 190 datasets, which we call accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables makes classification difficult and inconsistent (e.g., gender, race & ethnicity), with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
Affiliation(s)
- Rie Kamikubo
  - College of Information Studies, University of Maryland, College Park, United States
- Lining Wang
  - Department of Computer Science, University of Maryland, College Park, United States
- Crystal Marte
  - College of Information Studies, University of Maryland, College Park, United States
- Amnah Mahmood
  - Department of Mathematics, University of Maryland, College Park, United States
- Hernisa Kacorri
  - College of Information Studies, University of Maryland, College Park, United States
6
Salari N, Kazeminia M, Sagha H, Daneshkhah A, Ahmadi A, Mohammadi M. The performance of various machine learning methods for Parkinson’s disease recognition: a systematic review. CURRENT PSYCHOLOGY 2022. [DOI: 10.1007/s12144-022-02949-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
7
Shaklai S, Gilad-Bachrach R, Yom-Tov E, Stern N. Detecting Impending Stroke From Cognitive Traits Evident in Internet Searches: Analysis of Archival Data. J Med Internet Res 2021; 23:e27084. [PMID: 34047699 PMCID: PMC8196360 DOI: 10.2196/27084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/11/2021] [Revised: 02/03/2021] [Accepted: 05/04/2021] [Indexed: 01/13/2023] Open
Abstract
Background: Cerebrovascular disease is a leading cause of mortality and disability. Common risk assessment tools for stroke are based on the Framingham equation, which relies on traditional cardiovascular risk factors to predict an acute event within the coming decade. However, no tools are currently available to predict a near/impending stroke, which might alert patients at risk to seek immediate preventive action (eg, anticoagulants for atrial fibrillation, control of hypertension). Objective: Here, we propose that an algorithm based on internet search queries can identify people at increased risk for a near stroke event. Methods: We analyzed queries submitted to the Bing search engine by 285 people who self-identified as having undergone a stroke event and 1195 controls with regard to attributes previously shown to reflect cognitive function. Controls included random people 60 years and above, or those of similar age who queried for one of nine control conditions. Results: The model performed well against all comparator groups with an area under the receiver operating characteristic curve of 0.985 or higher and a true positive rate (at a 1% false-positive rate) above 80% for separating patients from each of the controls. The predictive power rose as the stroke date approached and when data were acquired beginning 120 days prior to the event. Good prediction accuracy was obtained for a prospective cohort of users collected 1 year later. The most predictive attributes of the model were associated with cognitive function, including the use of common queries, repetition of queries, appearance of spelling mistakes, and number of queries per session. Conclusions: The proposed algorithm offers a screening test for a near stroke event. After clinical validation, this algorithm may enable the administration of rapid preventive intervention. Moreover, it could be applied inexpensively, continuously, and on a large scale with the aim of reducing stroke events.
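The abstract above reports a true positive rate at a fixed 1% false-positive rate, an operating point often preferred for screening because it caps the share of controls flagged in error. The sketch below shows one way such a figure can be computed from model scores. It is not the authors' code: the function name `tpr_at_fpr`, the rule that a user is flagged when the score is at or above the threshold, and the example scores are all assumptions made for illustration.

```python
from typing import Sequence


def tpr_at_fpr(pos_scores: Sequence[float], neg_scores: Sequence[float],
               max_fpr: float = 0.01) -> float:
    """Best true positive rate among thresholds whose false-positive rate is <= max_fpr.

    A user is flagged when the score is >= the threshold. Every observed score
    is tried as a candidate threshold.
    """
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        raise ValueError("both groups need at least one score")
    best = 0.0
    for threshold in sorted(set(pos_scores) | set(neg_scores)):
        fpr = sum(1 for s in neg_scores if s >= threshold) / len(neg_scores)
        if fpr <= max_fpr:
            tpr = sum(1 for s in pos_scores if s >= threshold) / len(pos_scores)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not data from the study).
    controls = list(range(100))           # 100 controls scored 0..99
    patients = [150, 120, 99, 50, 98.5]   # 5 patients
    print(tpr_at_fpr(patients, controls, max_fpr=0.01))
```

With only 100 controls in the example, a 1% false-positive rate allows exactly one control to be flagged, which illustrates why tight false-positive constraints need large control groups to be estimated reliably.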
Affiliation(s)
- Sigal Shaklai
  - Institute of Endocrinology, Metabolism and Hypertension, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Sagol Center for Epigenetics of Aging and Metabolism, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Ran Gilad-Bachrach
  - Faculty of Bio-Medical Engineering, Tel Aviv University, Tel Aviv, Israel
  - Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv, Israel
- Elad Yom-Tov
  - Microsoft Research, Herzeliya, Israel
  - Faculty of Industrial Engineering and Management, Technion, Haifa, Israel
- Naftali Stern
  - Institute of Endocrinology, Metabolism and Hypertension, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Sagol Center for Epigenetics of Aging and Metabolism, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
8
A Study on the Essential and Parkinson’s Arm Tremor Classification. SIGNALS 2021. [DOI: 10.3390/signals2020016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
In this article, the challenge of discriminating between essential and Parkinson’s tremor is addressed. Although a variety of methods have been proposed for diagnosing the severity of these highly prevalent tremor types, their rapid and effective identification, especially in their early stages, proves particularly difficult and complicated due to their wide range of causes and similarity of symptoms. To this end, a clinical analysis was performed in which a number of volunteers, including patients diagnosed with essential and Parkinson’s tremor, performed a series of pre-defined motion patterns, during which a wearable sensing setup was used to measure their lower arm tremor characteristics from multiple selected points. Features extracted from the acquired accelerometer signals were used to train classification algorithms, including decision trees, discriminant analysis, support vector machine (SVM), K-nearest neighbor (KNN) and ensemble learning algorithms, to provide a comparative study and evaluate the potential of utilizing machine learning to accurately discriminate among different tremor types. Overall, SVM-related classifiers proved to be the most successful in classifying between Parkinson’s tremor, essential tremor and no diagnosed tremor, with accuracies reaching up to 100% for a single accelerometer measurement at the metacarpal area. In general, and in the "in motion while holding an object" position, the Coarse Gaussian SVM classifier reached 82.62% accuracy.
9
Kamikubo R, Dwivedi U, Kacorri H. Sharing Practices for Datasets Related to Accessibility and Aging. ASSETS. ANNUAL ACM CONFERENCE ON ASSISTIVE TECHNOLOGIES 2021; 1:10.1145/3441852.3471208. [PMID: 35187541 PMCID: PMC8855358 DOI: 10.1145/3441852.3471208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Datasets sourced from people with disabilities and older adults play an important role in innovation, benchmarking, and mitigating bias for both assistive and inclusive AI-infused applications. However, they are scarce. We conduct a systematic review of 137 accessibility datasets manually located across different disciplines over the last 35 years. Our analysis highlights how researchers navigate tensions between benefits and risks in data collection and sharing. We uncover patterns in data collection purpose, terminology, sample size, data types, and data sharing practices across communities of focus. We conclude by critically reflecting on challenges and opportunities related to locating and sharing accessibility datasets calling for technical, legal, and institutional privacy frameworks that are more attuned to concerns from these communities.
Affiliation(s)
- Rie Kamikubo
  - College of Information Studies, University of Maryland, College Park
- Utkarsh Dwivedi
  - College of Information Studies, University of Maryland, College Park
- Hernisa Kacorri
  - College of Information Studies, University of Maryland, College Park
10
Yom-Tov E, Cherlow Y. Ethical Challenges and Opportunities Associated With the Ability to Perform Medical Screening From Interactions With Search Engines: Viewpoint. J Med Internet Res 2020; 22:e21922. [PMID: 32936082 PMCID: PMC7527909 DOI: 10.2196/21922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2020] [Revised: 08/09/2020] [Accepted: 08/11/2020] [Indexed: 11/13/2022] Open
Abstract
Recent research has shown the efficacy of screening for serious medical conditions from data collected while people interact with online services. In particular, queries to search engines and the interactions with them were shown to be advantageous for screening a range of conditions including diabetes, several forms of cancer, eating disorders, and depression. These screening abilities offer unique advantages in that they can serve broad strata of society, including people in underserved populations and in countries with poor access to medical services. However, these advantages need to be balanced against the potential harm to privacy, autonomy, and nonmaleficence, which are recognized as the cornerstones of ethical medical care. Here, we discuss these opportunities and challenges, both when collecting data to develop online screening services and when deploying them. We offer several solutions that balance the advantages of these services with the ethical challenges they pose.
Affiliation(s)
- Elad Yom-Tov
  - Microsoft Research, Herzeliya, Israel
  - Faculty of Industrial Engineering and Management, Technion, Haifa, Israel