1
Chen H, Zhang B, Huang J. Recent advances and applications of artificial intelligence in 3D bioprinting. BIOPHYSICS REVIEWS 2024; 5:031301. [PMID: 39036708 PMCID: PMC11260195 DOI: 10.1063/5.0190208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 06/11/2024] [Indexed: 07/23/2024]
Abstract
3D bioprinting techniques enable the precise deposition of living cells, biomaterials, and biomolecules, emerging as a promising approach for engineering functional tissues and organs. Meanwhile, recent advances in 3D bioprinting enable researchers to build in vitro models with finely controlled and complex micro-architecture for drug screening and disease modeling. Recently, artificial intelligence (AI) has been applied to different stages of 3D bioprinting, including medical image reconstruction, bioink selection, and printing process, with both classical AI and machine learning approaches. The ability of AI to handle complex datasets, make complex computations, learn from past experiences, and optimize processes dynamically makes it an invaluable tool in advancing 3D bioprinting. The review highlights the current integration of AI in 3D bioprinting and discusses future approaches to harness the synergistic capabilities of 3D bioprinting and AI for developing personalized tissues and organs.
Affiliation(s)
- Bin Zhang
- Department of Mechanical and Aerospace Engineering, Brunel University London, London, United Kingdom
- Jie Huang
- Department of Mechanical Engineering, University College London, London, United Kingdom
2
Sogandi F. Identifying diseases symptoms and general rules using supervised and unsupervised machine learning. Sci Rep 2024; 14:17956. [PMID: 39095606 PMCID: PMC11297332 DOI: 10.1038/s41598-024-69029-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 07/30/2024] [Indexed: 08/04/2024] Open
Abstract
The symptoms of diseases can vary among individuals and may remain undetected in the early stages. Detecting these symptoms is crucial in the initial stage to effectively manage and treat cases of varying severity. Machine learning has made major advances in recent years, proving its effectiveness in various healthcare applications. This study aims to identify patterns of symptoms and general rules regarding symptoms among patients using supervised and unsupervised machine learning. A rule-based machine learning technique is integrated with classification methods to extend a prediction model. This study analyzes patient data that was available online through the Kaggle repository. After preprocessing the data and exploring descriptive statistics, the Apriori algorithm was applied to identify frequent symptoms and patterns in the discovered rules. Additionally, the study applied several machine learning models for predicting diseases, including stepwise regression, support vector machine, bootstrap forest, boosted trees, and neural-boosted methods. The stepwise method for fitting outperformed all competitors in this study, as determined through cross-validation conducted for each model based on established criteria. Moreover, numerous significant decision rules were extracted in the study, which can streamline clinical applications without the need for additional expertise. These rules enable the prediction of relationships between symptoms and diseases, as well as between different diseases. Therefore, the results obtained in this study have the potential to improve the performance of prediction models, and disease symptoms and general rules can be discovered from the dataset using supervised and unsupervised machine learning. Overall, the proposed algorithm can support not only healthcare professionals but also patients who face cost and time constraints in diagnosing and treating these diseases.
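The Apriori step mentioned in this abstract can be sketched as follows. This is a minimal illustration on made-up symptom records, not the study's data or code; the function names `frequent_itemsets` and `rules` and the thresholds are assumptions chosen for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori-style search: return {itemset: support} for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while current:
        # Count how many records contain each candidate.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        keys = list(level)
        joined = {a | b for a in keys for b in keys if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return result

def rules(itemsets, min_confidence):
    """Derive rules (antecedent, consequent, confidence) from frequent itemsets."""
    out = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    out.append((tuple(sorted(a)), tuple(sorted(itemset - a)), confidence))
    return out

# Hypothetical records: each set lists symptoms plus a diagnosis label.
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "headache"},
    {"cough", "flu"},
]
itemsets = frequent_itemsets(records, min_support=0.5)
strong_rules = rules(itemsets, min_confidence=0.9)
```

In this toy data, `cough` and `flu` always co-occur, so the rule cough → flu is reported with confidence 1.0, while rarer items such as `headache` are pruned at the first level.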
Affiliation(s)
- Fatemeh Sogandi
- Department of Industrial Engineering, University of Torbat Heydarieh, Torbat Heydarieh, Iran.
3
Ong SQ, Ahmad H. Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis. PeerJ 2024; 12:e17045. [PMID: 39670104 PMCID: PMC11636683 DOI: 10.7717/peerj.17045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 02/13/2024] [Indexed: 12/14/2024] Open
Abstract
Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health. Therefore, this study aims to demonstrate a text analytics pipeline to identify the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter, topics were modelled using LDA and sentiment polarities were calculated using the VADER model. After data cleaning, we obtained a total of 6,243 tweets, which we were able to process with the feature selection algorithms. Boruta was used as a feature selection algorithm to determine the importance of topics to public opinion. The result was validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and experts. The MLR result shows that the topics selected by LASSO perform significantly better than the other algorithms, and the experts further justify the topics in the discussion.
Affiliation(s)
- Song-Quan Ong
- Institute of Tropical and Conservation, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia
- Department of Ecoscience and Arctic Research Centre, Aarhus University, Aarhus, Denmark
- Hamdan Ahmad
- Vector Control Research Unit, Universiti Sains Malaysia, Bayan Lepas, Penang, Malaysia
4
Lossio-Ventura JA, Weger R, Lee AY, Guinee EP, Chung J, Atlas L, Linos E, Pereira F. A Comparison of ChatGPT and Fine-Tuned Open Pre-Trained Transformers (OPT) Against Widely Used Sentiment Analysis Tools: Sentiment Analysis of COVID-19 Survey Data. JMIR Ment Health 2024; 11:e50150. [PMID: 38271138 PMCID: PMC10813836 DOI: 10.2196/50150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 01/27/2024] Open
Abstract
BACKGROUND Health care providers and health-related researchers face significant challenges when applying sentiment analysis tools to health-related free-text survey data. Most state-of-the-art applications were developed in domains such as social media, and their performance in the health care context remains relatively unknown. Moreover, existing studies indicate that these tools often lack accuracy and produce inconsistent results. OBJECTIVE This study aims to address the lack of comparative analysis on sentiment analysis tools applied to health-related free-text survey data in the context of COVID-19. The objective was to automatically predict sentence sentiment for 2 independent COVID-19 survey data sets from the National Institutes of Health and Stanford University. METHODS Gold standard labels were created for a subset of each data set using a panel of human raters. We compared 8 state-of-the-art sentiment analysis tools on both data sets to evaluate variability and disagreement across tools. In addition, few-shot learning was explored by fine-tuning Open Pre-Trained Transformers (OPT; a large language model [LLM] with publicly available weights) using a small annotated subset and zero-shot learning using ChatGPT (an LLM without available weights). RESULTS The comparison of sentiment analysis tools revealed high variability and disagreement across the evaluated tools when applied to health-related survey data. OPT and ChatGPT demonstrated superior performance, outperforming all other sentiment analysis tools. Moreover, ChatGPT outperformed OPT, exhibiting higher accuracy by 6% and higher F-measure by 4% to 7%. CONCLUSIONS This study demonstrates the effectiveness of LLMs, particularly the few-shot learning and zero-shot learning approaches, in the sentiment analysis of health-related survey data. These results have implications for saving human labor and improving efficiency in sentiment analysis tasks, contributing to advancements in the field of automated sentiment analysis.
Affiliation(s)
- Rachel Weger
- School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Angela Y Lee
- Department of Communication, Stanford University, Stanford, CA, United States
- Emily P Guinee
- National Institute of Mental Health, National Institutes of Health, Bethesda, MD, United States
- Joyce Chung
- National Institute of Mental Health, National Institutes of Health, Bethesda, MD, United States
- Lauren Atlas
- National Center For Complementary and Alternative Medicine, National Institutes of Health, Bethesda, MD, United States
- Eleni Linos
- School of Medicine, Stanford University, Stanford, CA, United States
- Francisco Pereira
- National Institute of Mental Health, National Institutes of Health, Bethesda, MD, United States
5
Sarsam SM, Alzahrani AI, Al-Samarraie H. Early-stage pregnancy recognition on microblogs: Machine learning and lexicon-based approaches. Heliyon 2023; 9:e20132. [PMID: 37809524 PMCID: PMC10559919 DOI: 10.1016/j.heliyon.2023.e20132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Revised: 09/02/2023] [Accepted: 09/12/2023] [Indexed: 10/10/2023] Open
Abstract
Pregnancy carries high medical and psychosocial risks that could lead pregnant women to experience serious health consequences. Providing protective measures for pregnant women is one of the critical tasks during the pregnancy period. This study proposes an emotion-based mechanism to detect the early stage of pregnancy using real-time data from Twitter. Pregnancy-related emotions (e.g., anger, fear, sadness, joy, and surprise) and polarity (positive and negative) were extracted from users' tweets using NRC Affect Intensity Lexicon and SentiStrength techniques. Then, pregnancy-related terms were extracted and mapped with pregnancy-related sentiments using part-of-speech tagging and association rules mining techniques. The results showed that pregnancy tweets contained high positivity, as well as significant amounts of joy, sadness, and fear. The classification results demonstrated the possibility of using users' sentiments for early-stage pregnancy recognition on microblogs. The proposed mechanism offers valuable insights to healthcare decision-makers, allowing them to develop a comprehensive understanding of users' health status based on social media posts.
Affiliation(s)
- Samer Muthana Sarsam
- School of Strategy and Leadership, Coventry University, Coventry, United Kingdom
- Ahmed Ibrahim Alzahrani
- Computer Science Department, Community College, King Saud University, Riyadh, 11437, Saudi Arabia
- Hosam Al-Samarraie
- School of Design, University of Leeds, Leeds, United Kingdom
- Centre for Instructional Technology and Multimedia, Universiti Sains Malaysia, Penang, Malaysia
6
Sarsam SM, Al-Samarraie H, Alzahrani AI, Shibghatullah AS. A non-invasive machine learning mechanism for early disease recognition on Twitter: The case of anemia. Artif Intell Med 2022; 134:102428. [PMID: 36462907 DOI: 10.1016/j.artmed.2022.102428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2021] [Revised: 09/10/2022] [Accepted: 10/13/2022] [Indexed: 12/14/2022]
Abstract
Social media sites, such as Twitter, provide the means for users to share their stories, feelings, and health conditions during the disease course. Anemia, the most common type of blood disorder, is recognized as a major public health problem all over the world. Yet very few studies have explored the potential of recognizing anemia from online posts. This study proposed a novel mechanism for recognizing anemia based on the associations between disease symptoms and patients' emotions posted on the Twitter platform. We used k-means and Latent Dirichlet Allocation (LDA) algorithms to group similar tweets and to identify hidden disease topics. Both disease emotions and symptoms were mapped using the Apriori algorithm. The proposed approach was evaluated using a number of classifiers. A higher prediction accuracy of 98.96 % was achieved using Sequential Minimal Optimization (SMO). The results revealed that fear and sadness emotions are dominant among anemic patients. The proposed mechanism is the first of its kind to diagnose anemia using textual information posted on social media sites. It can advance the development of intelligent health monitoring systems and clinical decision-support systems.
Affiliation(s)
- Hosam Al-Samarraie
- School of Design, University of Leeds, Leeds, UK; Centre for Instructional Technology & Multimedia, Universiti Sains Malaysia, Penang, Malaysia.
7
Matsunaga N, Kamata K, Asai Y, Tsuzuki S, Sakamoto Y, Ijichi S, Akiyama T, Yu J, Yamada G, Terada M, Suzuki S, Suzuki K, Saito S, Hayakawa K, Ohmagari N. Predictive model of risk factors of High Flow Nasal Cannula using machine learning in COVID-19. Infect Dis Model 2022; 7:526-534. [PMID: 35945955 PMCID: PMC9352414 DOI: 10.1016/j.idm.2022.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 07/23/2022] [Accepted: 07/27/2022] [Indexed: 01/08/2023] Open
Abstract
With the rapid increase in the number of COVID-19 patients in Japan, the number of patients receiving oxygen at home has also increased rapidly, and some of these patients have died. An efficient approach to identify high-risk patients with slowly progressing and rapidly worsening COVID-19, and to avoid missing the timing of therapeutic intervention will improve patient prognosis and prevent medical complications. Patients admitted to medical institutions in Japan from November 14, 2020 to April 11, 2021 and registered in the COVID-19 Registry Japan were included. Risk factors for patients with High Flow Nasal Cannula invasive respiratory management or higher were comprehensively explored using machine learning. Age-specific cohorts were created, and severity prediction was performed for the patient surge period. We were able to obtain a model that was able to predict severe disease with a sensitivity of 57% when the specificity was set at 90% for those aged 40-59 years, and with a specificity of 50% and 43% when the sensitivity was set at 90% for those aged 60-79 years and 80 years and older, respectively. We were able to identify lactate dehydrogenase level (LDH) as an important factor in predicting the severity of illness in all age groups. Using machine learning, we were able to identify risk factors with high accuracy, and predict the severity of the disease. We plan to develop a tool that will be useful in determining the indications for hospitalisation for patients undergoing home care and early hospitalisation.
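The abstract reports sensitivity at a fixed specificity (and the reverse). A minimal sketch of how such an operating point can be read off a set of model scores is given below; the scores, labels, and the function name `best_sensitivity_at_specificity` are hypothetical and are not taken from the study.

```python
def best_sensitivity_at_specificity(scores, labels, min_specificity):
    """Scan candidate thresholds and return (threshold, sensitivity, specificity)
    with the highest sensitivity whose specificity is at least min_specificity.
    Returns None if no threshold qualifies."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for t in sorted(set(scores)):
        # A case is predicted "severe" when its score is >= t.
        sensitivity = sum(s >= t for s in positives) / len(positives)
        specificity = sum(s < t for s in negatives) / len(negatives)
        if specificity >= min_specificity and (best is None or sensitivity > best[1]):
            best = (t, sensitivity, specificity)
    return best

# Hypothetical risk scores: five non-severe cases (label 0), four severe cases (label 1).
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
operating_point = best_sensitivity_at_specificity(scores, labels, min_specificity=0.8)
```

Raising the required specificity pushes the threshold up and lowers the achievable sensitivity, which is the trade-off behind the age-specific figures quoted above.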
Affiliation(s)
- Nobuaki Matsunaga
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Yusuke Asai
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Shinya Tsuzuki
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Takayuki Akiyama
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Jiefu Yu
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Gen Yamada
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Mari Terada
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Biostatistics Division, Center for Research Administration and Support, National Cancer Center, Tokyo, Japan
- Setsuko Suzuki
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Kumiko Suzuki
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Sho Saito
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Kayoko Hayakawa
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
- Norio Ohmagari
- AMR Clinical Reference Center, National Center for Global Health and Medicine, Tokyo, Japan
- Disease Control and Prevention Center, National Center for Global Health and Medicine, Tokyo, Japan
8
Modern Machine-Learning Predictive Models for Diagnosing Infectious Diseases. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:6902321. [PMID: 35693267 PMCID: PMC9185172 DOI: 10.1155/2022/6902321] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/31/2021] [Revised: 04/03/2022] [Accepted: 05/26/2022] [Indexed: 12/16/2022]
Abstract
Controlling infectious diseases is a major health priority because they can spread and infect humans, thus evolving into epidemics or pandemics. Therefore, early detection of infectious diseases is a significant need, and many researchers have developed models to diagnose them in the early stages. This paper reviewed research articles for recent machine-learning (ML) algorithms applied to infectious disease diagnosis. We searched the Web of Science, ScienceDirect, PubMed, Springer, and IEEE databases from 2015 to 2022, identified the pros and cons of the reviewed ML models, and discussed the possible recommendations to advance the studies in this field. We found that most of the articles used small datasets, and few of them used real-time data. Our results demonstrated that a suitable ML technique depends on the nature of the dataset and the desired goal. Moreover, heterogeneous data could ensure the model's generalization, while big data, many features, and a hybrid model will increase the resulting performance. Furthermore, using other techniques such as deep learning and NLP to extract vast features from unstructured data is a powerful approach to enhancing the performance of ML diagnostic models.
9
Lee S, Ma S, Meng J, Zhuang J, Peng TQ. Detecting Sentiment toward Emerging Infectious Diseases on Social Media: A Validity Evaluation of Dictionary-Based Sentiment Analysis. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19116759. [PMID: 35682341 PMCID: PMC9180278 DOI: 10.3390/ijerph19116759] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2022] [Revised: 05/27/2022] [Accepted: 05/30/2022] [Indexed: 11/30/2022]
Abstract
Despite the popularity and efficiency of dictionary-based sentiment analysis (DSA) for public health research, limited empirical evidence has been produced about the validity of DSA and potential harms to the validity of DSA. A random sample of a second-hand Ebola tweet dataset was used to evaluate the validity of DSA compared to the manual coding approach and examine the influences of textual features on the validity of DSA. The results revealed substantial inconsistency between DSA and the manual coding approach. The presence of certain textual features such as negation can partially account for the inconsistency between DSA and manual coding. The findings imply that scholars should be careful and critical about findings in disease-related public health research that use DSA. Certain textual features should be more carefully addressed in DSA.
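The effect of negation on dictionary-based scoring, noted in this abstract, can be illustrated with a toy example. The lexicon, the negator list, and the function `dsa_score` below are assumptions made for illustration only; they are not the dictionary or procedure evaluated in the study.

```python
# Hypothetical word-level sentiment dictionary and negation cues.
LEXICON = {"good": 1, "safe": 1, "bad": -1, "scared": -1, "deadly": -1}
NEGATORS = {"not", "no", "never"}

def dsa_score(text, handle_negation=False):
    """Sum dictionary values over tokens; optionally flip the sign of a word
    that directly follows a negator."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if handle_negation and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total

naive = dsa_score("I am not scared of Ebola.")
negation_aware = dsa_score("I am not scared of Ebola.", handle_negation=True)
```

The naive score reads the sentence as negative because it only sees "scared", whereas a human coder would not. The one-word look-back used here is itself a crude rule and would miss longer-range negation.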
Affiliation(s)
- Sanguk Lee
- Department of Communication, Michigan State University, East Lansing, MI 48824, USA; (S.L.); (S.M.); (J.M.)
- Siyuan Ma
- Department of Communication, Michigan State University, East Lansing, MI 48824, USA; (S.L.); (S.M.); (J.M.)
- Jingbo Meng
- Department of Communication, Michigan State University, East Lansing, MI 48824, USA; (S.L.); (S.M.); (J.M.)
- Jie Zhuang
- Bob Schieffer College of Communication, Texas Christian University, Fort Worth, TX 76129, USA;
- Tai-Quan Peng
- Department of Communication, Michigan State University, East Lansing, MI 48824, USA; (S.L.); (S.M.); (J.M.)
- Correspondence: ; Tel.: +1-517-355-0221; Fax: +1-517-432-1192
10
Patel R, Tseng CC, Choudhry HS, Lemdani MS, Talmor G, Paskhover B. Applying Machine Learning to Determine Popular Patient Questions About Mentoplasty on Social Media. Aesthetic Plast Surg 2022; 46:2273-2279. [PMID: 35201377 DOI: 10.1007/s00266-022-02808-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Accepted: 01/22/2022] [Indexed: 11/28/2022]
Abstract
PURPOSE Patient satisfaction in esthetic surgery often necessitates synergy between patient and physician goals. The authors aim to characterize patient questions before and after mentoplasty to reflect the patient perspective and enhance the physician-patient relationship. METHODS Mentoplasty reviews were gathered from Realself.com using an automated web crawler. Questions were defined as preoperative or postoperative. Each question was reviewed and characterized by the authors into general categories to best reflect the overall theme of the question. A machine learning approach was utilized to create a list of the most common patient questions, asked both preoperatively and postoperatively. RESULTS A total of 2,012 questions were collected. Of these, 1,708 (84.9%) were preoperative and 304 (15.1%) were postoperative. The primary category for patients preoperatively was "eligibility for surgery" (86.3%), followed by "surgical techniques and logistics" (5.4%) and "cost" (5.4%). Of the postoperative questions, the most common questions were about "options to revise surgery" (44.1%), "symptoms after surgery" (27.0%), and "appearance" (26.3%). Our machine learning approach generated the 10 most common pre- and postoperative questions about mentoplasty. The majority of preoperative questions dealt with potential surgical indications, while most postoperative questions principally addressed appearance. CONCLUSIONS The majority of mentoplasty patient questions were preoperative and asked about eligibility for surgery. Our study also found a significant proportion of postoperative questions inquired about revision, suggesting a small but nontrivial subset of patients highly dissatisfied with their results. Our handout of the 10 most common preoperative and postoperative questions can help better inform physicians about the patient perspective on mentoplasty throughout their surgical course. Level of Evidence V.
Affiliation(s)
- Rushi Patel
- Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 90 Bergen St., Suite 8100, Newark, NJ, 07103, USA
- Christopher C Tseng
- Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 90 Bergen St., Suite 8100, Newark, NJ, 07103, USA
- Hannaan S Choudhry
- Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 90 Bergen St., Suite 8100, Newark, NJ, 07103, USA
- Mehdi S Lemdani
- Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 90 Bergen St., Suite 8100, Newark, NJ, 07103, USA
- Guy Talmor
- Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 90 Bergen St., Suite 8100, Newark, NJ, 07103, USA
- Boris Paskhover
- Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 90 Bergen St., Suite 8100, Newark, NJ, 07103, USA.
11
Catlow J, Bray B, Morris E, Rutter M. Power of big data to improve patient care in gastroenterology. Frontline Gastroenterol 2022; 13:237-244. [PMID: 35493622 PMCID: PMC8996101 DOI: 10.1136/flgastro-2019-101239] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/08/2021] [Accepted: 05/23/2021] [Indexed: 02/04/2023] Open
Abstract
Big data is defined as being large, varied or frequently updated, and usually generated from real-world interaction. With the unprecedented availability of big data, comes an obligation to maximise its potential for healthcare improvements in treatment effectiveness, disease prevention and healthcare delivery. We review the opportunities and challenges that big data brings to gastroenterology. We review its sources for healthcare improvement in gastroenterology, including electronic medical records, patient registries and patient-generated data. Big data can complement traditional research methods in hypothesis generation, supporting studies and disseminating findings; and in some cases holds distinct advantages where traditional trials are unfeasible. There is great potential power in patient-level linkage of datasets to help quantify inequalities, identify best practice and improve patient outcomes. We exemplify this with the UK colorectal cancer repository and the potential of linkage using the National Endoscopy Database, the inflammatory bowel disease registry and the National Health Service bowel cancer screening programme. Artificial intelligence and machine learning are increasingly being used to improve diagnostics in gastroenterology, with image analysis entering clinical practice, and the potential of machine learning to improve outcome prediction and diagnostics in other clinical areas. Big data brings issues with large sample sizes, real-world biases, data curation, keeping clinical context at analysis and General Data Protection Regulation compliance. There is a tension between our obligation to use data for the common good and protecting individual patient's data. We emphasise the importance of engaging with our patients to enable them to understand their data usage as fully as they wish.
Affiliation(s)
- Jamie Catlow
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Gastroenterology, University Hospital of North Tees, Stockton-on-Tees, UK
- Benjamin Bray
- Medical Director & Head of Epidemiology, EMEA Data Science, IQVIA Europe, Reading, UK
- Medicine Clinical Academic Group, King's College London, London, UK
- Eva Morris
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- Matt Rutter
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Gastroenterology, University Hospital of North Tees, Stockton-on-Tees, UK
12
Garett R, Young SD. Digital Public Health Surveillance Tools for Alcohol Use and HIV Risk Behaviors. AIDS Behav 2021; 25:333-338. [PMID: 33730254 PMCID: PMC7966886 DOI: 10.1007/s10461-021-03221-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 03/08/2021] [Indexed: 11/25/2022]
Abstract
There is a need for real-time and predictive data on alcohol use both broadly and specific to HIV. However, substance use and HIV data often suffer from lag times in reporting as they are typically measured from surveys, clinical case visits and other methods requiring extensive time for collection and analysis. Social big data might help to address this problem and be used to provide near real-time assessments of people's alcohol use and/or alcohol. This manuscript describes three types of social data sources (i.e., social media data, internet search data, and wearable device data) that might be used in surveillance of alcohol and HIV, and then discusses the implications and potential of implementing them as additional tools for public health surveillance.
Affiliation(s)
- Renee Garett
- ElevateU, LLC; and Department of Informatics, University of California, Irvine, CA, USA
- Sean D Young
- Department of Emergency Medicine, University of California, Irvine, Irvine, CA, USA.
- University of California Institute for Prediction Technology, Department of Informatics, University of California, Irvine, Bren Hall, Irvine, CA, 6091, USA.
13
Abstract
Social media popularity and importance is on the increase due to people using it for various types of social interaction across multiple channels. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research which totals 485 published studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, and other aspects derived. Social Opinion Mining can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. The latest developments in Social Opinion Mining beyond 2018 are also presented together with future research directions, with the aim of leaving a wider academic and societal impact in several real-world applications.
Affiliation(s)
- Keith Cortis
- ADAPT Centre, Dublin City University, Dublin, Ireland
- Brian Davis
- ADAPT Centre, Dublin City University, Dublin, Ireland

14
Pavan Kumar C, Dhinesh Babu L. Fuzzy based feature engineering architecture for sentiment analysis of medical discussion over online social networks. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-202874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Sentiment analysis is widely used to retrieve the hidden sentiments in medical discussions over Online Social Networking platforms such as Twitter, Facebook, and Instagram. People often tend to convey their feelings concerning their medical problems over social media platforms. Practitioners and health care workers have started to observe these discussions to assess the impact of health-related issues among the people. This helps in providing better care to improve the quality of life. Dementia is a serious disease in western countries like the United States of America and the United Kingdom, and the respective governments are providing facilities to the affected people. There is much chatter over social media platforms concerning the patients’ care, healthy measures to be followed to avoid disease, and checking for early indications. This chatter has to be carefully monitored to help the officials take necessary precautions for the betterment of the affected. A novel feature engineering architecture that involves a feature split for sentiment analysis of medical chatter over online social networks is proposed, with a pipeline that can be used with any machine learning model. The proposed model uses the fuzzy membership function in refining the outputs. The sentiment score obtained by the machine learning model is subjected to fuzzification and defuzzification using the trapezoid membership function and the center of sums method, respectively. Three datasets are considered for comparison of the proposed and the regular model. The proposed approach delivered better results than the normal approach and proved to be an effective approach for sentiment analysis of medical discussions over online social networks.
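The fuzzification and defuzzification step named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three sentiment sets, their breakpoints, the score range [-1, 1], the grid resolution, and all function names are hypothetical choices made for the example.

```python
# Hypothetical fuzzy sets over a sentiment score in [-1, 1]; the breakpoints
# (a, b, c, d) are illustrative assumptions, not values from the paper.
SETS = {
    "negative": (-1.0, -1.0, -0.5, 0.0),
    "neutral": (-0.5, -0.1, 0.1, 0.5),
    "positive": (0.0, 0.5, 1.0, 1.0),
}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify(score):
    """Map a crisp sentiment score to a membership degree for each set."""
    return {name: trapezoid(score, *params) for name, params in SETS.items()}


def center_of_sums(strengths, lo=-1.0, hi=1.0, steps=2000):
    """Center-of-sums defuzzification on a uniform grid.

    Each output set is clipped at its firing strength and the clipped sets are
    summed, so overlapping regions are counted once per set.
    """
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        for name, params in SETS.items():
            mu = min(strengths[name], trapezoid(x, *params))
            num += x * mu
            den += mu
    return num / den if den else 0.0


def refine(score):
    """Fuzzify a raw model score, then defuzzify it back to a crisp value."""
    return center_of_sums(fuzzify(score))
```

Calling `refine` on a raw score from any classifier returns a crisp value pulled toward the centroids of the sets that the score activates. Center of sums is cheaper than taking the centroid of the union of the clipped sets because each set's contribution is accumulated independently.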
Affiliation(s)
- C.S. Pavan Kumar
- School of Computer Science & Engineering, Vellore Institute of Technology, Vellore, India
- L.D. Dhinesh Babu
- School of Information Technology & Engineering, Vellore Institute of Technology, Vellore, India

15
Alamoodi AH, Zaidan BB, Zaidan AA, Albahri OS, Mohammed KI, Malik RQ, Almahdi EM, Chyad MA, Tareq Z, Albahri AS, Hameed H, Alaa M. Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. EXPERT SYSTEMS WITH APPLICATIONS 2021; 167:114155. [PMID: 33139966 PMCID: PMC7591875 DOI: 10.1016/j.eswa.2020.114155] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 04/21/2020] [Revised: 10/23/2020] [Accepted: 10/23/2020] [Indexed: 05/05/2023]
Abstract
The COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2 occurred unexpectedly in China in December 2019. Tens of millions of confirmed cases and more than hundreds of thousands of confirmed deaths are reported worldwide according to the World Health Organisation. News about the virus is spreading all over social media websites. Consequently, these social media outlets are experiencing and presenting different views, opinions and emotions during various outbreak-related incidents. For computer scientists and researchers, big data are valuable assets for understanding people's sentiments regarding current events, especially those related to the pandemic. Therefore, analysing these sentiments will yield remarkable findings. To the best of our knowledge, previous related studies have focused on one kind of infectious disease. No previous study has examined multiple diseases via sentiment analysis. Accordingly, this research aimed to review and analyse articles about the occurrence of different types of infectious diseases, such as epidemics, pandemics, viruses or outbreaks, during the last 10 years, understand the application of sentiment analysis and obtain the most important literature findings. Articles on related topics were systematically searched in five major databases, namely, ScienceDirect, PubMed, Web of Science, IEEE Xplore and Scopus, from 1 January 2010 to 30 June 2020. These indices were considered sufficiently extensive and reliable to cover our scope of the literature. Articles were selected based on our inclusion and exclusion criteria for the systematic review, with a total of n = 28 articles selected. All these articles were formed into a coherent taxonomy to describe the corresponding current standpoints in the literature in accordance with four main categories: lexicon-based models, machine learning-based models, hybrid-based models and individuals. 
The obtained articles were categorised into motivations related to disease mitigation, data analysis and challenges faced by researchers with respect to data, social media platforms and community. Other aspects, such as the protocol being followed by the systematic review and demographic statistics of the literature distribution, were included in the review. Interesting patterns were observed in the literature, and the identified articles were grouped accordingly. This study emphasised the current standpoint and opportunities for research in this area and promoted additional efforts towards the understanding of this research field.
Affiliation(s)
- A H Alamoodi
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- B B Zaidan
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- Future Technology Research Center, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin 64002, Taiwan, ROC
- A A Zaidan
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- O S Albahri
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- K I Mohammed
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- R Q Malik
- Department of Engineering Technology, Universiti Tun Hussein Onn (UTHM), Batu Pahat, Malaysia
- E M Almahdi
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- M A Chyad
- Department of Computing, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia
- Z Tareq
- Department of Computer Science, Computer Science and Mathematics College, Tikrit University, Tikrit 34001, Iraq
- A S Albahri
- Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq
- Hamsa Hameed
- Faculty of Human Development, Sultan Idris University of Education (UPSI), Tanjung Malim, Malaysia
- Musaab Alaa
- Faculty of Languages and Communication, Sultan Idris University of Education (UPSI), Tanjong Malim, Malaysia

16
Li L, Ma Z, Lee H, Lee S. Can social media data be used to evaluate the risk of human interactions during the COVID-19 pandemic? INTERNATIONAL JOURNAL OF DISASTER RISK REDUCTION : IJDRR 2021; 56:102142. [PMID: 33643835 PMCID: PMC7902209 DOI: 10.1016/j.ijdrr.2021.102142] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2020] [Revised: 01/25/2021] [Accepted: 02/16/2021] [Indexed: 06/12/2023]
Abstract
The U.S. has taken multiple measures to contain the spread of COVID-19, including the implementation of lockdown orders and social distancing practices. Evaluating social distancing is critical since it reflects the risk of close human interactions. While questionnaire surveys or mobility data-based systems have provided valuable insights, social media data can contribute as an additional instrument to help monitor the risk of human interactions during the pandemic. For this reason, this study introduced a social media-based approach that quantifies the pro/anti-lockdown ratio as an indicator of the risk of human interactions. With the aid of natural language processing and machine learning techniques, this study classified the lockdown-related tweets and quantified the pro/anti-lockdown ratio for each state over time. The anti-lockdown ratio showed a moderate and negative correlation with the state-level social distancing index on a weekly basis, suggesting that people are more likely to travel out of the state where the higher anti-lockdown level is observed. The study further showed that the perception expressed on social media could reflect people's behaviors. The findings of the study are of significance for government agencies to assess the risk of close human interactions and to evaluate their policy effectiveness in the context of social distancing and lockdown.
Affiliation(s)
- Lingyao Li
- Department of Civil and Environmental Engineering, A. James Clark School of Engineering, University of Maryland, College Park, MD, USA
- Zihui Ma
- Department of Civil and Environmental Engineering, A. James Clark School of Engineering, University of Maryland, College Park, MD, USA
- Hyesoo Lee
- University of Maryland School of Dentistry, Baltimore, MD, USA
- Sanggyu Lee
- Department of Civil and Environmental Engineering, A. James Clark School of Engineering, University of Maryland, College Park, MD, USA

17
Ray A, Chaudhuri AK. Smart healthcare disease diagnosis and patient management: Innovation, improvement and skill development. MACHINE LEARNING WITH APPLICATIONS 2021. [DOI: 10.1016/j.mlwa.2020.100011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023] Open

18
Develop and implement unsupervised learning through hybrid FFPA clustering in large-scale datasets. Soft comput 2021. [DOI: 10.1007/s00500-020-05140-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/27/2022]

19
Alshamrani R, Althbiti A, Alshamrani Y, Alkomah F, Ma X. Model-Driven Decision Making in Multiple Sclerosis Research: Existing Works and Latest Trends. PATTERNS 2020; 1:100121. [PMID: 33294867 PMCID: PMC7691382 DOI: 10.1016/j.patter.2020.100121] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Multiple sclerosis (MS) is a neurological disorder that strikes the central nervous system. Due to the complexity of this disease, healthcare sectors are increasingly in need of shared clinical decision-making tools to provide practitioners with insightful knowledge and information about MS. These tools ought to be comprehensible by both technical and non-technical healthcare audiences. To aid this cause, this literature review analyzes the state-of-the-art decision support systems (DSSs) in MS research with a special focus on model-driven decision-making processes. The review clusters common methodologies used to support the decision-making process in classifying, diagnosing, predicting, and treating MS. This work observes that the majority of the investigated DSSs rely on knowledge-based and machine learning (ML) approaches, so the utilization of ontology and ML in the MS domain is observed to extend the scope of this review. Finally, this review summarizes the state-of-the-art DSSs, discusses the methods that have commonalities, and addresses the future work of applying DSS technologies in the MS field. Multiple sclerosis (MS) is a disorder that strikes the central nervous system of the human body. This article reviews state-of-the-art decision support systems (DSSs) in MS research, as recent studies have highlighted the importance of DSSs in the medical realm. However, the utilization of decision support systems for MS remains an open challenge. A special focus in this article is given to model-driven DSSs, which use knowledge representation to simplify the complex process for decision makers. We find that most investigated studies use knowledge-based and machine learning approaches. Based on the literature review, we suggest some future work on applying DSSs in the MS domain. Potential future directions should focus on applying DSS technologies to understand the MS patterns, etiology, effects on the quality of life, and correlations with other disorders.
Affiliation(s)
- Rayan Alshamrani
- Department of Computer Science, University of Idaho, Moscow, ID 83844-1010, USA
- Department of Information Technology, Taif University, Taif, Makkah 26571, Saudi Arabia
- Ashrf Althbiti
- Department of Computer Science, University of Idaho, Moscow, ID 83844-1010, USA
- Department of Information Technology, Taif University, Taif, Makkah 26571, Saudi Arabia
- Yara Alshamrani
- Department of Information Technology, Taif University, Taif, Makkah 26571, Saudi Arabia
- INTO Program, Washington State University, Pullman, WA 99164-3251, USA
- Fatimah Alkomah
- Department of Computer Science, University of Idaho, Moscow, ID 83844-1010, USA
- Department of Information Systems, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Xiaogang Ma
- Department of Computer Science, University of Idaho, Moscow, ID 83844-1010, USA

20
A Comparative Performance Evaluation of Classification Algorithms for Clinical Decision Support Systems. MATHEMATICS 2020. [DOI: 10.3390/math8101814] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]
Abstract
Classification algorithms are widely used in clinical decision support systems. However, it is not always straightforward to understand the behavior of such algorithms on a multiple disease prediction task. When a new classifier is introduced, we will, in most cases, ask ourselves whether the classifier performs well on a particular clinical dataset or not. The choice of classifier mostly depends on the type of data and the classification task, so it is often made arbitrarily. In this study, a comparative evaluation is carried out of a wide array of classifiers from six different families, i.e., tree, ensemble, neural, probability, discriminant, and rule-based classifiers. A number of real-world, publicly available datasets covering different diseases are used in the experiment in order to demonstrate the generalizability of the classifiers in multiple disease prediction. A total of 25 classifiers, 14 datasets, and three different resampling techniques are explored. This study reveals that the classifier most likely to be the best performer is the conditional inference tree forest (cforest), followed by linear discriminant analysis, generalized linear model, random forest, and Gaussian process classifier. This work contributes to the existing literature a thorough benchmark of classification algorithms for multiple disease prediction.

21
Hao Y, Xu T, Hu H, Wang P, Bai Y. Prediction and analysis of Corona Virus Disease 2019. PLoS One 2020; 15:e0239960. [PMID: 33017421 PMCID: PMC7535054 DOI: 10.1371/journal.pone.0239960] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/26/2020] [Accepted: 09/16/2020] [Indexed: 12/24/2022] Open
Abstract
The outbreak of Corona Virus Disease 2019 (COVID-19) in Wuhan has significantly impacted the economy and society globally. Countries are in a strict state of prevention and control of this pandemic. In this study, the development trend analysis of the cumulative confirmed cases, cumulative deaths, and cumulative cured cases was conducted based on data from Wuhan, Hubei Province, China from January 23, 2020 to April 6, 2020 using an Elman neural network, long short-term memory (LSTM), and support vector machine (SVM). A SVM with fuzzy granulation was used to predict the growth range of confirmed new cases, new deaths, and new cured cases. The experimental results showed that the Elman neural network and SVM used in this study can predict the development trend of cumulative confirmed cases, deaths, and cured cases, whereas LSTM is more suitable for the prediction of the cumulative confirmed cases. The SVM with fuzzy granulation can successfully predict the growth range of confirmed new cases and new cured cases, although the average predicted values are slightly large. Currently, the United States is the epicenter of the COVID-19 pandemic. We also used data modeling from the United States to further verify the validity of the proposed models.
Affiliation(s)
- Yan Hao
- School of Information and Communication Engineering, North University of China, Taiyuan, China
- Ting Xu
- Department of Mathematics, School of Science, North University of China, Taiyuan, China
- Hongping Hu
- Department of Mathematics, School of Science, North University of China, Taiyuan, China
- Peng Wang
- Department of Mathematics, School of Science, North University of China, Taiyuan, China
- Yanping Bai
- Department of Mathematics, School of Science, North University of China, Taiyuan, China

22
Predicting WNV Circulation in Italy Using Earth Observation Data and Extreme Gradient Boosting Model. REMOTE SENSING 2020. [DOI: 10.3390/rs12183064] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022]
Abstract
West Nile Disease (WND), caused by a vector-borne virus, is one of the most widespread zoonoses in Italy and Europe. Its transmission cycle is well understood, with birds acting as the primary hosts and mosquito vectors transmitting the virus to other birds, while humans and horses are occasional dead-end hosts. Identifying suitable environmental conditions across large areas containing multiple species of potential hosts and vectors can be difficult. The recent and massive availability of Earth Observation data and the continuous development of innovative Machine Learning methods can help to automatically identify patterns in big datasets and to identify areas at risk with high accuracy. In this paper, we investigated West Nile Virus (WNV) circulation in relation to Land Surface Temperature, Normalized Difference Vegetation Index and Surface Soil Moisture collected during the 160 days before the infection took place, with the aim of evaluating the predictive capacity of lagged remotely sensed variables in the identification of areas at risk for WNV circulation. WNV detections in mosquitoes, birds and horses in 2017, 2018 and 2019 were collected from the National Information System for Animal Disease Notification. An Extreme Gradient Boosting model was trained with data from 2017 and 2018 and tested on the 2019 epidemic, predicting the spatio-temporal WNV circulation two weeks in advance with an overall accuracy of 0.84. This work lays the basis for a future early warning system that could alert public authorities when climatic and environmental conditions become favourable to the onset and spread of WNV.

23
Di Cara NH, Boyd A, Tanner AR, Al Baghal T, Calderwood L, Sloan LS, Davis OSP, Haworth CMA. Views on social media and its linkage to longitudinal data from two generations of a UK cohort study. Wellcome Open Res 2020. [PMID: 32904854 DOI: 10.12688/wellcomeopenres.15755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022] Open
Abstract
Background: Cohort studies gather huge volumes of information about a range of phenotypes but new sources of information such as social media data are yet to be integrated. Participant's long-term engagement with cohort studies, as well as the potential for their social media data to be linked to other longitudinal data, may give participants a unique perspective on the acceptability of this growing research area. Methods: Two focus groups explored participant views towards the acceptability and best practice for the collection of social media data for research purposes. Participants were drawn from the Avon Longitudinal Study of Parents and Children cohort; individuals from the index cohort of young people (N=9) and from the parent generation (N=5) took part in two separate 90-minute focus groups. The discussions were audio recorded and subjected to qualitative analysis. Results: Participants were generally supportive of the collection of social media data to facilitate health and social research. They felt that their trust in the cohort study would encourage them to do so. Concern was expressed about the collection of data from friends or connections who had not consented. In terms of best practice for collecting the data, participants generally preferred the use of anonymous data derived from social media to be shared with researchers. Conclusion: Cohort studies have trusting relationships with their participants; for this relationship to extend to linking their social media data with longitudinal information, procedural safeguards are needed. Participants understand the goals and potential of research integrating social media data into cohort studies, but further research is required on the acquisition of their friend's data. The views gathered from participants provide important guidance for future work seeking to integrate social media in cohort studies.
Affiliation(s)
- Nina H Di Cara
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Andy Boyd
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Alastair R Tanner
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Tarek Al Baghal
- Institute for Social and Economic Research, University of Essex, Colchester, Essex, CO4 3SQ, UK
- Lisa Calderwood
- Centre for Longitudinal Studies, University College London, London, WC1H 0NU, UK
- Luke S Sloan
- School of Social Sciences, Cardiff University, Cardiff, CF10 3AT, UK
- Oliver S P Davis
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- The Alan Turing Institute, British Library, London, NW1 2DB, UK
- Claire M A Haworth
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- The Alan Turing Institute, British Library, London, NW1 2DB, UK
- School of Psychological Science, University of Bristol, Bristol, BS8 1TU, UK

24
Di Cara NH, Boyd A, Tanner AR, Al Baghal T, Calderwood L, Sloan LS, Davis OSP, Haworth CMA. Views on social media and its linkage to longitudinal data from two generations of a UK cohort study. Wellcome Open Res 2020; 5:44. [PMID: 32904854 PMCID: PMC7459850 DOI: 10.12688/wellcomeopenres.15755.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 08/06/2020] [Indexed: 11/22/2022] Open
Abstract
Background: Cohort studies gather huge volumes of information about a range of phenotypes but new sources of information such as social media data are yet to be integrated. Participant's long-term engagement with cohort studies, as well as the potential for their social media data to be linked to other longitudinal data, may give participants a unique perspective on the acceptability of this growing research area. Methods: Two focus groups explored participant views towards the acceptability and best practice for the collection of social media data for research purposes. Participants were drawn from the Avon Longitudinal Study of Parents and Children cohort; individuals from the index cohort of young people (N=9) and from the parent generation (N=5) took part in two separate 90-minute focus groups. The discussions were audio recorded and subjected to qualitative analysis. Results: Participants were generally supportive of the collection of social media data to facilitate health and social research. They felt that their trust in the cohort study would encourage them to do so. Concern was expressed about the collection of data from friends or connections who had not consented. In terms of best practice for collecting the data, participants generally preferred the use of anonymous data derived from social media to be shared with researchers. Conclusion: Cohort studies have trusting relationships with their participants; for this relationship to extend to linking their social media data with longitudinal information, procedural safeguards are needed. Participants understand the goals and potential of research integrating social media data into cohort studies, but further research is required on the acquisition of their friend's data. The views gathered from participants provide important guidance for future work seeking to integrate social media in cohort studies.
Affiliation(s)
- Nina H. Di Cara
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Andy Boyd
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Alastair R. Tanner
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Tarek Al Baghal
- Institute for Social and Economic Research, University of Essex, Colchester, Essex, CO4 3SQ, UK
- Lisa Calderwood
- Centre for Longitudinal Studies, University College London, London, WC1H 0NU, UK
- Luke S. Sloan
- School of Social Sciences, Cardiff University, Cardiff, CF10 3AT, UK
- Oliver S. P. Davis
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- The Alan Turing Institute, British Library, London, NW1 2DB, UK
- Claire M. A. Haworth
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- The Alan Turing Institute, British Library, London, NW1 2DB, UK
- School of Psychological Science, University of Bristol, Bristol, BS8 1TU, UK

25
Abstract
Objectives: In this study, we carried out a text analysis on the information disseminated and discussed among netizens on the Baidu Post Bar (the world’s largest Chinese forum) during the coronavirus disease 2019 (COVID-19) epidemic, to create a policy basis for health administrative departments. Methods: We used Python tools to search for the relevant data on the Baidu Post Bar. Next, a text analysis was performed on the posts’ contents using a combination of latent Dirichlet allocation (LDA), sentiment analysis, and correlation analysis. Results: According to the LDA analysis, the public was highly interested in topics such as COVID-19 prevention, infection symptoms, infection and coping measures, sources of transmission and treatments, community management, and work resumption. The majority of the public had negative emotional values, yet a portion of the public held positive emotional values. A correlation analysis of the influencing factors was also performed. Conclusions: Netizens’ degree of concern shown in their posts was greatly associated with the spread of COVID-19. With the rise, diffusion, outbreak, and mitigation of COVID-19 in China, netizens successively created a large number of posts, and the topics of discussion varied over time. Therefore, the media and the government have the responsibility to distribute positive information and to correctly guide the public’s emotions in order to reassure the public.

26
Bansal A, Padappayil RP, Garg C, Singal A, Gupta M, Klein A. Utility of Artificial Intelligence Amidst the COVID 19 Pandemic: A Review. J Med Syst 2020; 44:156. [PMID: 32740678 PMCID: PMC7395799 DOI: 10.1007/s10916-020-01617-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/08/2020] [Accepted: 07/15/2020] [Indexed: 01/07/2023]
Abstract
The term machine learning refers to a collection of tools used for identifying patterns in data. As opposed to traditional methods of pattern identification, machine learning tools rely on artificial intelligence to map out patterns from large amounts of data, can self-improve as and when new data become available, and are quicker in accomplishing these tasks. This review describes various techniques of machine learning that have been used in the past in the prediction, detection and management of infectious diseases, and how these tools are being brought into the battle against COVID-19. In addition, we discuss their applications in various stages of the pandemic, along with their advantages, disadvantages and possible pitfalls.
Affiliation(s)
- Agam Bansal
- Internal Medicine, Cleveland Clinic, Cleveland, OH USA
- Chandan Garg
- Department of Statistics, Columbia University, New York, NY USA
- Anjali Singal
- Department of Anatomy, All India Institute of Medical Sciences, Bathinda, India
- Mohak Gupta
- All India Institute of Medical Sciences, New Delhi, India
- Allan Klein
- Department of Cardiology, Cleveland Clinic, Cleveland, OH USA

27
Edo-Osagie O, De La Iglesia B, Lake I, Edeghere O. A scoping review of the use of Twitter for public health research. Comput Biol Med 2020; 122:103770. [PMID: 32502758 PMCID: PMC7229729 DOI: 10.1016/j.compbiomed.2020.103770] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 11/12/2019] [Revised: 04/01/2020] [Accepted: 04/17/2020] [Indexed: 11/25/2022]
Abstract
Public health practitioners and researchers have used traditional medical databases to study and understand public health for a long time. Recently, social media data, particularly Twitter, has seen some use for public health purposes. Every large technological development in history has had an impact on the behaviour of society. The advent of the internet and social media is no different. Social media creates public streams of communication, and scientists are starting to understand that such data can provide some level of access into people's opinions and situations. As such, this paper aims to review and synthesize the literature on Twitter applications for public health, highlighting current research and products in practice. A scoping review methodology was employed and four leading health, computer science and cross-disciplinary databases were searched. A total of 755 articles were retrieved, 92 of which met the criteria for review. From the reviewed literature, six domains for the application of Twitter to public health were identified: (i) Surveillance; (ii) Event Detection; (iii) Pharmacovigilance; (iv) Forecasting; (v) Disease Tracking; and (vi) Geographic Identification. From our review, we were able to obtain a clear picture of the use of Twitter for public health. We gained insights into interesting observations such as how the popularity of different domains changed with time, the diseases and conditions studied and the different approaches to understanding each disease, which algorithms and techniques were popular with each domain, and more.
Affiliation(s)
- Oduwa Edo-Osagie
- School of Computing Science, University of East Anglia, Norwich, NR4 7TJ, UK.
| | | | - Iain Lake
- School of Environmental Science, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Obaghe Edeghere
- National Infection Service, Public Health England, Birmingham, B3 2PW, UK
| |
|
28
|
Choi S, Seo J. An Exploratory Study of the Research on Caregiver Depression: Using Bibliometrics and LDA Topic Modeling. Issues Ment Health Nurs 2020; 41:592-601. [PMID: 32286089 DOI: 10.1080/01612840.2019.1705944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
Purpose: The purpose of this paper is to provide readers with a comprehensive overview of scholarly work on the depression of caregivers using bibliometrics and text mining. Methods: A total of 426 articles published between 2000 and 2018 were retrieved from the Clarivate Analytics Web of Science, and computer-aided bibliometric analysis as well as Latent Dirichlet Allocation (LDA) topic modeling were then conducted on the collected data. Results: Descriptive statistics on the increasing number of publications, network analysis of scientific collaboration between countries, word co-occurrence analysis, conceptual structure, and the six latent topics (k = 6) identified are discussed. Conclusions: Preventing or managing depression among caregivers is a growing field with the highest priority for the aging population. In the future, collaboration between countries and attention to cultural backgrounds in caregiver depression research are needed. This study is expected to contribute to the field of psychological distress of caregivers by giving a big-picture view of its current position through data-driven analysis and by pointing towards a better direction.
Affiliation(s)
- Soyoung Choi
- The Pennsylvania State University, College of Nursing, University Park, Pennsylvania, USA
| | - JooYoung Seo
- The Pennsylvania State University, Learning, Design, and Technology, University Park, Pennsylvania, USA
| |
|
29
|
Di Cara NH, Boyd A, Tanner AR, Al Baghal T, Calderwood L, Sloan LS, Davis OSP, Haworth CMA. Views on social media and its linkage to longitudinal data from two generations of a UK cohort study. Wellcome Open Res 2020; 5:44. [PMID: 32904854 PMCID: PMC7459850 DOI: 10.12688/wellcomeopenres.15755.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/05/2020] [Indexed: 03/30/2024] Open
Abstract
Background: Cohort studies gather huge volumes of information about a range of phenotypes but new sources of information such as social media data are yet to be integrated. Participants' long-term engagement with cohort studies, as well as the potential for their social media data to be linked to other longitudinal data, could provide novel advances but may also give participants a unique perspective on the acceptability of this growing research area. Methods: Two focus groups explored participant views towards the acceptability and best practice for the collection of social media data for research purposes. Participants were drawn from the Avon Longitudinal Study of Parents and Children cohort; individuals from the index cohort of young people (N=9) and from the parent generation (N=5) took part in two separate 90-minute focus groups. The discussions were audio recorded and subjected to qualitative analysis. Results: Participants were generally supportive of the collection of social media data to facilitate health and social research. They felt that their trust in the cohort study would encourage them to do so. Concern was expressed about the collection of data from friends or connections who had not consented. In terms of best practice for collecting the data, participants generally preferred the use of anonymous data derived from social media to be shared with researchers. Conclusion: Cohort studies have trusting relationships with their participants; for this relationship to extend to linking their social media data with longitudinal information, procedural safeguards are needed. Participants understand the goals and potential of research integrating social media data into cohort studies, but further research is required on the acquisition of their friends' data. The views gathered from participants provide important guidance for future work seeking to integrate social media in cohort studies.
Affiliation(s)
- Nina H. Di Cara
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Andy Boyd
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Alastair R. Tanner
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
| | - Tarek Al Baghal
- Institute for Social and Economic Research, University of Essex, Colchester, Essex, CO4 3SQ, UK
| | - Lisa Calderwood
- Centre for Longitudinal Studies, University College London, London, WC1H 0NU, UK
| | - Luke S. Sloan
- School of Social Sciences, Cardiff University, Cardiff, CF10 3AT, UK
| | - Oliver S. P. Davis
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- The Alan Turing Institute, British Library, London, NW1 2DB, UK
| | - Claire M. A. Haworth
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UNITED KINGDOM, BS8 2BN, UK
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- The Alan Turing Institute, British Library, London, NW1 2DB, UK
- School of Psychological Science, University of Bristol, Bristol, BS8 1TU, UK
| |
|
30
|
Zunic A, Corcoran P, Spasic I. Sentiment Analysis in Health and Well-Being: Systematic Review. JMIR Med Inform 2020; 8:e16023. [PMID: 32012057 PMCID: PMC7013658 DOI: 10.2196/16023] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/27/2019] [Revised: 10/26/2019] [Accepted: 10/27/2019] [Indexed: 12/22/2022] Open
Abstract
Background: Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in a free text. It has found practical applications across a wide range of societal contexts including marketing, economy, and politics. This review focuses specifically on applications related to health, which is defined as “a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.” Objective: This study aimed to establish the state of the art in SA related to health and well-being by conducting a systematic review of the recent literature. To capture the perspective of those individuals whose health and well-being are affected, we focused specifically on spontaneously generated content and not necessarily that of health care professionals. Methods: Our methodology is based on the guidelines for performing systematic reviews. In January 2019, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified a total of 86 relevant studies and extracted data about the datasets analyzed, discourse topics, data creators, downstream applications, algorithms used, and their evaluation. Results: The majority of data were collected from social networking and Web-based retailing platforms. The primary purpose of online conversations is to exchange information and provide social support online. These communities tend to form around health conditions with high severity and chronicity rates. Different treatments and services discussed include medications, vaccination, surgery, orthodontic services, individual physicians, and health care services in general. We identified 5 roles with respect to health and well-being among the authors of the types of spontaneously generated narratives considered in this review: a sufferer, an addict, a patient, a carer, and a suicide victim.
Out of 86 studies considered, only 4 reported the demographic characteristics. A wide range of methods were used to perform SA. Most common choices included support vector machines, naïve Bayesian learning, decision trees, logistic regression, and adaptive boosting. In contrast with general trends in SA research, only 1 study used deep learning. The performance lags behind the state of the art achieved in other domains when measured by F-score, which was found to be below 60% on average. In the context of SA, the domain of health and well-being was found to be resource poor: few domain-specific corpora and lexica are shared publicly for research purposes. Conclusions: SA results in the area of health and well-being lag behind those in other domains. It is yet unclear if this is because of the intrinsic differences between the domains and their respective sublanguages, the size of training datasets, the lack of domain-specific sentiment lexica, or the choice of algorithms.
Affiliation(s)
- Anastazia Zunic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| | - Padraig Corcoran
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| | - Irena Spasic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| |
|
31
|
Abstract
Infectious diseases are caused by microorganisms belonging to the class of bacteria, viruses, fungi, or parasites. These pathogens are transmitted, directly or indirectly, and can lead to epidemics or even pandemics. The resulting infection may lead to mild-to-severe symptoms such as life-threatening fever or diarrhea. Infectious diseases may be asymptomatic in some individuals but may lead to disastrous effects in others. Despite the advances in medicine, infectious diseases are a leading cause of death worldwide, especially in low-income countries. With the advent of mathematical tools, scientists are now able to better predict epidemics, understand the specificity of each pathogen, and identify potential targets for drug development. Artificial intelligence and its components have been widely publicized for their ability to better diagnose certain types of cancer from imaging data. This chapter aims at identifying potential applications of machine learning in the field of infectious diseases. We are deliberately focusing on key aspects of infection: diagnosis, transmission, response to treatment, and resistance. We are proposing the use of extreme values as an avenue of interest for future developments in the field of infectious diseases. This chapter covers a series of applications selectively chosen to showcase how artificial intelligence is moving the field of infectious disease further and how it helps institutions to better tackle them, especially in low-income countries.
Affiliation(s)
- Said Agrebi
- Yobitrust, Technopark El Gazala, Ariana, Tunisia
| | - Anis Larbi
- Singapore Immunology Network, Agency for Science, Technology and Research, Singapore, Singapore; Department of Microbiology & Immunology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| |
|
32
|
Park S, Lee SW, Han S, Cha M. Clustering Insomnia Patterns by Data From Wearable Devices: Algorithm Development and Validation Study. JMIR Mhealth Uhealth 2019; 7:e14473. [PMID: 31804187 PMCID: PMC6923760 DOI: 10.2196/14473] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/23/2019] [Revised: 09/30/2019] [Accepted: 10/19/2019] [Indexed: 11/13/2022] Open
Abstract
Background: As societies become more complex, larger populations suffer from insomnia. In 2014, the US Centers for Disease Control and Prevention declared that sleep disorders should be dealt with as a public health epidemic. However, it is hard to provide adequate treatment for each insomnia sufferer, since various behavioral characteristics influence symptoms of insomnia collectively. Objective: We aim to develop a neural-net-based unsupervised user clustering method for insomnia sufferers in order to clarify the unique traits of each derived group. Unlike the current diagnosis of insomnia that requires qualitative analysis from interview results, the classification of individuals with insomnia by using various information modalities from smart bands and neural-nets can provide better insight into insomnia treatments. Methods: This study, as part of the precision psychiatry initiative, is based on a smart band experiment conducted over 6 weeks on individuals with insomnia. During the experiment period, a total of 42 participants (19 male; average age 22.00 [SD 2.79]) from a large university wore smart bands 24/7, and 3 modalities were collected and examined: sleep patterns, daily activities, and personal demographics. We considered the consecutive daily information as a form of images, learned the latent variables of the images via a convolutional autoencoder (CAE), and clustered and labeled the input images based on the derived features. We then converted consecutive daily information into a sequence of the labels for each subject and finally clustered the people with insomnia based on their predominant labels.
Results: Our method identified 5 new insomnia-activity clusters of participants that conventional methods have not recognized, and significant differences in sleep and behavioral characteristics were shown among groups (analysis of variance on rank: F(4,37)=2.36, P=.07 for the sleep_min feature; F(4,37)=9.05, P<.001 for sleep_efficiency; F(4,37)=8.16, P<.001 for active_calorie; F(4,37)=6.53, P<.001 for walks; and F(4,37)=3.51, P=.02 for stairs). Analyzing the consecutive data through a CAE and clustering could reveal intricate connections between insomnia and various everyday activity markers. Conclusions: Our research suggests that unsupervised learning allows health practitioners to devise precise and tailored interventions at the level of data-guided user clusters (ie, precision psychiatry), which could be a novel solution to treating insomnia and other mental disorders.
Affiliation(s)
- Sungkyu Park
- Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Sang Won Lee
- Kyungpook National University Chilgok Hospital, Daegu, Republic of Korea
| | - Sungwon Han
- School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Meeyoung Cha
- School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea; Data Science Group, Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, Republic of Korea
| |
|
33
|
Ghani NA, Hamid S, Targio Hashem IA, Ahmed E. Social media big data analytics: A survey. COMPUTERS IN HUMAN BEHAVIOR 2019. [DOI: 10.1016/j.chb.2018.08.039] [Citation(s) in RCA: 155] [Impact Index Per Article: 25.8] [Indexed: 01/18/2023]
|
34
|
Abstract
In the last several decades, avian influenza virus has caused numerous outbreaks around the world. These outbreaks pose a significant threat to the poultry industry and also to public health. When an avian influenza (AI) outbreak occurs, it is critical to make informed decisions about the potential risks, impact, and control measures. To this end, many modeling approaches have been proposed to acquire knowledge from different sources of data and perspectives to enhance decision making. Although some of these approaches have been shown to be effective, they do not follow the process of knowledge discovery in databases (KDD). KDD is an iterative process, consisting of five steps, that aims at extracting unknown and useful information from the data. The present review attempts to survey AI modeling methods in the context of the KDD process. We first divide the modeling techniques used in AI into two main categories: data-intensive modeling and small-data modeling. We then investigate the existing gaps in the literature and suggest several potential directions and techniques for future studies. Overall, this review provides insights into the control of AI in terms of the risk of introduction and spread of the virus.
|
35
|
PISIoT: A Machine Learning and IoT-Based Smart Health Platform for Overweight and Obesity Control. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9153037] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
Overweight and obesity are affecting productivity and quality of life worldwide. The Internet of Things (IoT) makes it possible to interconnect, detect, identify, and process data between objects or services to fulfill a common objective. The main advantages of IoT in healthcare are the monitoring, analysis, diagnosis, and control of conditions such as overweight and obesity and the generation of recommendations to prevent them. However, the objects used in the IoT have limited resources, so it has become necessary to consider other alternatives to analyze the data generated from monitoring, analysis, diagnosis, control, and the generation of recommendations, such as machine learning. This work presents PISIoT: a machine learning and IoT-based smart health platform for the prevention, detection, treatment, and control of overweight and obesity, and other associated conditions or health problems. Weka API and the J48 machine learning algorithm were used to identify critical variables and classify patients, while Apache Mahout and RuleML were used to generate medical recommendations. Finally, to validate the PISIoT platform, we present a case study on the prevention of myocardial infarction in elderly patients with obesity by monitoring biomedical variables.
|
36
|
Edo-Osagie O, Smith G, Lake I, Edeghere O, De La Iglesia B. Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance. PLoS One 2019; 14:e0210689. [PMID: 31318885 PMCID: PMC6638773 DOI: 10.1371/journal.pone.0210689] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/06/2018] [Accepted: 06/13/2019] [Indexed: 11/19/2022] Open
Abstract
We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet's tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and found that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we found some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions.
Affiliation(s)
- Oduwa Edo-Osagie
- School of Computing Science, University of East Anglia, Norwich, Norfolk, United Kingdom
| | - Gillian Smith
- Real-time Syndromic Surveillance Team, National Infection Service, Public Health England, Birmingham, United Kingdom
| | - Iain Lake
- School of Environmental Sciences, University of East Anglia, Norwich, Norfolk, United Kingdom
| | - Obaghe Edeghere
- Epidemiology West Midlands, Field Service, National Infection Service, Public Health England, Birmingham, United Kingdom
| | - Beatriz De La Iglesia
- School of Computing Science, University of East Anglia, Norwich, Norfolk, United Kingdom
| |
|
37
|
Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc 2019; 26:561-576. [PMID: 30908576 PMCID: PMC7647332 DOI: 10.1093/jamia/ocz009] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 09/23/2018] [Revised: 01/06/2019] [Accepted: 01/11/2019] [Indexed: 02/07/2023] Open
Abstract
OBJECTIVE: User-generated content (UGC) in online environments provides opportunities to learn an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. MATERIALS AND METHODS: We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. RESULTS: We identified 103 eligible studies which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. CONCLUSIONS: The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and model interpretability.
Affiliation(s)
- Zhijun Yin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Lina M Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
| |
|
38
|
O'Sullivan S, Ali Z, Jiang X, Abdolvand R, Ünlü MS, Silva HPD, Baca JT, Kim B, Scott S, Sajid MI, Moradian S, Mansoorzare H, Holzinger A. Developments in Transduction, Connectivity and AI/Machine Learning for Point-of-Care Testing. SENSORS (BASEL, SWITZERLAND) 2019; 19:E1917. [PMID: 31018573 PMCID: PMC6515310 DOI: 10.3390/s19081917] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/29/2019] [Revised: 04/02/2019] [Accepted: 04/02/2019] [Indexed: 12/19/2022]
Abstract
We review some emerging trends in transduction, connectivity and data analytics for Point-of-Care Testing (POCT) of infectious and non-communicable diseases. The patient need for POCT is described along with developments in portable diagnostics, specifically in respect of Lab-on-chip and microfluidic systems. We describe some novel electrochemical and photonic systems and the use of mobile phones in terms of hardware components and device connectivity for POCT. Developments in data analytics that are applicable for POCT are described with an overview of data structures and recent AI/Machine learning trends. The most important methodologies of machine learning, including deep learning methods, are summarised. The potential value of trends within POCT systems for clinical diagnostics within Lower Middle Income Countries (LMICs) and the Least Developed Countries (LDCs) is highlighted.
Affiliation(s)
- Shane O'Sullivan
- Department of Pathology, Faculdade de Medicina, Universidade de São Paulo, São Paulo 05508-060, Brazil.
| | - Zulfiqur Ali
- Healthcare Innovation Centre, Teesside University, Middlesbrough TS1 3BX, UK.
| | - Xiaoyi Jiang
- Faculty of Mathematics and Computer Science, University Münster, Münster 48149, Germany.
| | - Reza Abdolvand
- Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA.
| | - M Selim Ünlü
- Department of Electrical and Computer Engineering and Biomedical Engineering, Boston University, Boston, MA 02215, USA.
| | | | - Justin T Baca
- Department of Emergency Medicine, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA.
| | - Brian Kim
- Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA.
| | - Simon Scott
- Healthcare Innovation Centre, Teesside University, Middlesbrough TS1 3BX, UK.
| | - Mohammed Imran Sajid
- Department of Upper GI Surgery, Wirral University Teaching Hospital, Wirral CH49 5PE, UK.
| | - Sina Moradian
- Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA.
| | - Hakhamanesh Mansoorzare
- Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA.
| | - Andreas Holzinger
- Institute for interactive Systems and Data Science, Graz University of Technology, Graz 8074, Austria.
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz 8036, Austria.
| |
|
39
|
Kalimeri K, Delfino M, Cattuto C, Perrotta D, Colizza V, Guerrisi C, Turbelin C, Duggan J, Edmunds J, Obi C, Pebody R, Franco AO, Moreno Y, Meloni S, Koppeschaar C, Kjelsø C, Mexia R, Paolotti D. Unsupervised extraction of epidemic syndromes from participatory influenza surveillance self-reported symptoms. PLoS Comput Biol 2019; 15:e1006173. [PMID: 30958817 PMCID: PMC6472822 DOI: 10.1371/journal.pcbi.1006173] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/03/2018] [Revised: 04/18/2019] [Accepted: 03/01/2019] [Indexed: 11/18/2022] Open
Abstract
Seasonal influenza surveillance is usually carried out by sentinel general practitioners (GPs) who compile weekly reports based on the number of influenza-like illness (ILI) clinical cases observed among visited patients. This traditional practice for surveillance generally presents several issues, such as a delay of one week or more in releasing reports, population biases in the health-seeking behaviour, and the lack of a common definition of an ILI case. On the other hand, the availability of novel data streams has recently led to the emergence of non-traditional approaches for disease surveillance that can alleviate these issues. In Europe, a participatory web-based surveillance system called Influenzanet represents a powerful tool for monitoring seasonal influenza epidemics thanks to the aid of self-selected volunteers from the general population who monitor and report their health status through Internet-based surveys, thus allowing a real-time estimate of the level of influenza circulating in the population. In this work, we propose an unsupervised probabilistic framework that combines time series analysis of self-reported symptoms collected by the Influenzanet platforms and performs an algorithmic detection of groups of symptoms, called syndromes. The aim of this study is to show that participatory web-based surveillance systems are capable of detecting the temporal trends of influenza-like illness even without relying on a specific case definition. The methodology was applied to data collected by Influenzanet platforms over the course of six influenza seasons, from 2011-2012 to 2016-2017, with an average of 34,000 participants per season.
Results show that our framework is capable of selecting temporal trends of syndromes that closely follow the ILI incidence rates reported by the traditional surveillance systems in the various countries (Pearson correlations ranging from 0.69 for Italy to 0.88 for the Netherlands, with the sole exception of Ireland with a correlation of 0.38). The proposed framework was able to forecast quite accurately the ILI trend of the forthcoming influenza season (2016-2017) based only on the available information of the previous years (2011-2016). Furthermore, to broaden the scope of our approach, we applied it both in a forecasting fashion to predict the ILI trend of the 2016-2017 influenza season (Pearson correlations ranging from 0.60 for Ireland and the UK to 0.85 for the Netherlands) and also to detect gastrointestinal syndrome in France (Pearson correlation of 0.66). The final result is a near-real-time flexible surveillance framework not constrained by any specific case definition and capable of capturing the heterogeneity in symptoms circulation during influenza epidemics in the various European countries.
Affiliation(s)
- Vittoria Colizza
  - INSERM, Sorbonne Université, Institut Pierre Louis d’Epidémiologie et de Santé Publique, IPLESP, Paris, France
- Caroline Guerrisi
  - Sorbonne Université, INSERM, Institut Pierre Louis d’Epidémiologie et de Santé Publique, IPLESP, Paris, France
- Clement Turbelin
  - Sorbonne Université, INSERM, Institut Pierre Louis d’Epidémiologie et de Santé Publique, IPLESP, Paris, France
- Jim Duggan
  - School of Computer Science, National University of Ireland Galway, Galway, Ireland
- John Edmunds
  - Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Chinelo Obi
  - Immunisation and Countermeasures Division, National Infections Service, Public Health England, London, United Kingdom
- Richard Pebody
  - Immunisation and Countermeasures Division, National Infections Service, Public Health England, London, United Kingdom
- Yamir Moreno
  - ISI Foundation, Turin, Italy
  - Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
  - Department of Theoretical Physics, University of Zaragoza, Zaragoza, Spain
- Sandro Meloni
  - IFISC, Institute for Cross-Disciplinary Physics and Complex Systems (CSIC-UIB), Palma de Mallorca, Spain
- Ricardo Mexia
  - Departamento de Epidemiologia, Instituto Nacional de Saúde Doutor Ricardo Jorge, Lisbon, Portugal
|
40
|
Jiménez-Zafra SM, Martín-Valdivia MT, Molina-González MD, Ureña-López LA. How do we talk about doctors and drugs? Sentiment analysis in forums expressing opinions for medical domain. Artif Intell Med 2019; 93:50-57. [DOI: 10.1016/j.artmed.2018.03.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 10/13/2017] [Revised: 03/19/2018] [Accepted: 03/29/2018] [Indexed: 11/26/2022]
|
41
|
Lopez C, Tucker S, Salameh T, Tucker C. An unsupervised machine learning method for discovering patient clusters based on genetic signatures. J Biomed Inform 2018; 85:30-39. [PMID: 30016722 PMCID: PMC6621561 DOI: 10.1016/j.jbi.2018.07.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/21/2017] [Revised: 06/22/2018] [Accepted: 07/07/2018] [Indexed: 01/04/2023]
Abstract
INTRODUCTION: Many chronic disorders have genomic etiology, disease progression, clinical presentation, and response to treatment that vary on a patient-to-patient basis. Such variability creates a need to identify characteristics within patient populations that have clinically relevant predictive value in order to advance personalized medicine. Unsupervised machine learning methods are suitable to address this type of problem, in which no a priori class label information is available to guide this search. However, it is challenging for existing methods to identify cluster memberships that are not just a result of natural sampling variation. Moreover, most of the current methods require researchers to provide specific input parameters a priori. METHOD: This work presents an unsupervised machine learning method to cluster patients based on their genomic makeup without providing input parameters a priori. The method implements internal validity metrics to algorithmically identify the number of clusters, as well as statistical analyses to test for the significance of the results. Furthermore, the method takes advantage of the high degree of linkage disequilibrium between single nucleotide polymorphisms. Finally, a gene pathway analysis is performed to identify potential relationships between the clusters in the context of existing biological knowledge. DATASETS AND RESULTS: The method is tested with a cluster validation and a genomic dataset previously used in the literature. Benchmark results indicate that the proposed method performs best among the methods tested. Furthermore, the method is implemented on a sample genome-wide study dataset of 191 multiple sclerosis patients. The results indicate that the method was able to identify genetically distinct patient clusters without the need to select parameters a priori. Additionally, variants identified as significantly different between clusters are shown to be enriched for protein-protein interactions, especially in immune processes and cell adhesion pathways, via Gene Ontology term analysis. CONCLUSION: Once links are drawn between clusters and clinically relevant outcomes, Immunochip data can be used to classify high-risk and newly diagnosed chronic disease patients into known clusters for predictive value. Further investigation can extend beyond pathway analysis to evaluate these clusters for clinical significance of genetically related characteristics such as age of onset, disease course, heritability, and response to treatment.
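The abstract above mentions using internal validity metrics to pick the number of clusters without a user-supplied parameter, but does not name the metric or the clustering algorithm. The sketch below illustrates the general idea under stated assumptions: it swaps in plain k-means and the mean silhouette width, and runs on a hypothetical 2-D toy dataset rather than genomic data. All function names are illustrative.

```python
from math import dist

def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: three well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
          (10.0, 0.2), (10.1, 0.0), (9.8, 0.3)]

# Try several candidate cluster counts and keep the best-scoring one.
best_k = max(range(2, 6), key=lambda k: silhouette(points, kmeans(points, k)))
print("selected number of clusters:", best_k)
```

The design point is that the candidate range is scanned automatically and the metric decides, so the analyst never supplies the cluster count directly. The significance testing described in the abstract is a separate step that this sketch does not attempt.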
Affiliation(s)
- Christian Lopez
  - Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Scott Tucker
  - Hershey College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
  - Engineering Science and Mechanics, The Pennsylvania State University, University Park, PA 16802, USA
- Tarik Salameh
  - Hershey College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
- Conrad Tucker
  - Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
  - Engineering Design Technology and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA
  - Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
42
|
Oldroyd RA, Morris MA, Birkin M. Identifying Methods for Monitoring Foodborne Illness: Review of Existing Public Health Surveillance Techniques. JMIR Public Health Surveill 2018; 4:e57. [PMID: 29875090 PMCID: PMC6010836 DOI: 10.2196/publichealth.8218] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/15/2017] [Revised: 01/16/2018] [Accepted: 01/31/2018] [Indexed: 11/13/2022] Open
Abstract
Background: Traditional methods of monitoring foodborne illness are associated with problems of untimeliness and underreporting. In recent years, alternative data sources such as social media data have been used to monitor the incidence of disease in the population (infodemiology and infoveillance). These data sources prove timelier than traditional general practitioner data, they can help to fill the gaps in the reporting process, and they often include additional metadata that is useful for supplementary research. Objective: The aim of the study was to identify and formally analyze research papers using consumer-generated data, such as social media data or restaurant reviews, to quantify a disease or public health ailment. Studies of this nature are scarce within the food safety domain; therefore, identification and understanding of transferrable methods in other health-related fields are of particular interest. Methods: Structured scoping methods were used to identify and analyze primary research papers using consumer-generated data for disease or public health surveillance. The title, abstract, and keyword fields of 5 databases were searched using predetermined search terms. A total of 5239 papers matched the search criteria, of which 145 were taken to full-text review; 62 papers were deemed relevant and were subjected to data characterization and thematic analysis. Results: The majority of studies (40/62, 65%) focused on the surveillance of influenza-like illness. Only 10 studies (16%) used consumer-generated data to monitor outbreaks of foodborne illness. Twitter data (58/62, 94%) and Yelp reviews (3/62, 5%) were the most commonly used data sources. Studies reporting high correlations against baseline statistics used advanced statistical and computational approaches to calculate the incidence of disease. These include classification and regression approaches, clustering approaches, and lexicon-based approaches. Although they are computationally intensive because they require training data, studies using classification approaches reported the best performance. Conclusions: By analyzing studies in digital epidemiology, computer science, and public health, this paper has identified and analyzed methods of disease monitoring that can be transferred to foodborne disease surveillance. These methods fall into 4 main categories: basic approach, classification and regression, clustering approaches, and lexicon-based approaches. Although studies using a basic approach to calculate disease incidence generally report good performance against baseline measures, they are sensitive to chatter generated by media reports. More computationally advanced approaches are required to filter spurious messages and protect predictive systems against false alarms. Research using consumer-generated data for monitoring influenza-like illness is extensive; however, research regarding the use of restaurant reviews and social media data in the context of food safety is limited. Considering the advantages reported in this review, methods using consumer-generated data for foodborne disease surveillance warrant further investment.
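The abstract above notes that simple counting approaches are sensitive to chatter from media reports and that spurious messages need filtering. The review's own category definitions are not given here, so the following is only a loose sketch of a term-list filter: it counts messages that match a symptom lexicon and skips those that look like news coverage. Both term lists, the function names, and the sample messages are hypothetical and invented for illustration.

```python
import re
from collections import Counter

# Hypothetical term lists; a real lexicon would be curated and validated.
SYMPTOM_TERMS = {"vomiting", "diarrhea", "nausea", "food poisoning", "stomach cramps"}
NEWS_TERMS = {"outbreak", "recall", "reported", "officials"}

def tokens(text):
    """Lower-cased alphabetic word set of a message."""
    return set(re.findall(r"[a-z]+", text.lower()))

def mentions_illness(text):
    """True if the text matches a symptom term and does not look like news chatter."""
    lowered = text.lower()
    # Substring matching lets multi-word terms such as "food poisoning" match.
    has_symptom = any(term in lowered for term in SYMPTOM_TERMS)
    looks_like_news = bool(tokens(text) & NEWS_TERMS)
    return has_symptom and not looks_like_news

def weekly_counts(messages):
    """Count matching messages per week; messages are (week, text) pairs."""
    return Counter(week for week, text in messages if mentions_illness(text))

# Hypothetical messages, invented for illustration only.
messages = [
    (1, "Got food poisoning after dinner, vomiting all night"),
    (1, "Officials reported a salmonella outbreak, vomiting among symptoms"),
    (2, "Great tacos, no complaints"),
    (2, "Nausea and stomach cramps since lunch there"),
]
counts = weekly_counts(messages)
print(dict(counts))
```

A filter this crude will miss negations and sarcasm, which is consistent with the abstract's point that trained classifiers tend to perform better at the cost of needing labelled data.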
Affiliation(s)
- Rachel A Oldroyd
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
  - School of Geography, University of Leeds, Leeds, United Kingdom
- Michelle A Morris
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
  - School of Medicine, University of Leeds, Leeds, United Kingdom
- Mark Birkin
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
  - School of Geography, University of Leeds, Leeds, United Kingdom
|
43
|
Disease Diagnosis in Smart Healthcare: Innovation, Technologies and Applications. SUSTAINABILITY 2017. [DOI: 10.3390/su9122309] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Indexed: 11/16/2022]
|