1
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.16867.5]
Abstract
Background: While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 26 terms appeared with a frequency of 0.08 or greater, while in 2021, 27 terms met this criterion. The LDA algorithm defined two groups: the first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text; these keywords identified some underlying characteristics of the political party (e.g., its place on the political spectrum, such as left-wing). The Rake algorithm delivered phrases, among which we found 'universal health coverage' in both 2016 and 2021. Conclusion: NLP analysis could be used to reveal the underlying priorities in each government plan. It could also be included in research on health policies and politics during general elections, and could provide informative summaries for the general population.
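As a minimal illustration of the TF-IDF keyword extraction this abstract describes (not the authors' actual pipeline; the toy plan sentences below are invented), per-document term scores can be computed in plain Python:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Rank each document's terms by TF-IDF and return the top_k per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in tokenized for term in set(doc))
    rankings = []
    for doc in tokenized:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
        rankings.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return rankings

# Invented stand-ins for sentences from the health chapters.
plans = [
    "universal health coverage insurance for all citizens",
    "hospital capacity and health system strengthening",
    "covid vaccination campaign and health emergency response",
]
top_terms = tfidf_keywords(plans)  # 'health' scores 0: it occurs in every plan
```

Terms shared by every plan receive an inverse document frequency of zero, which is why distinctive words such as 'insurance' or 'capacity' surface while ubiquitous ones like 'health' do not.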
2
Bonnici V, Cicceri G, Distefano S, Galletta L, Polignano M, Scaffidi C. Covid19/IT the digital side of Covid19: A picture from Italy with clustering and taxonomy. PLoS One 2022; 17:e0269687. [PMID: 35679235] [PMCID: PMC9182266] [DOI: 10.1371/journal.pone.0269687] [Received: 02/01/2022] [Accepted: 05/26/2022]
Abstract
The Covid19 pandemic has significantly impacted our lives, triggering a strong reaction that has produced vaccines, more effective diagnoses and therapies, and policies to contain the outbreak, to name but a few. A significant contribution to their success comes from the computer science and information technology communities, both in support of other disciplines and as the primary driver of solutions for, e.g., diagnostics, social distancing, and contact tracing. In this work, we surveyed the initiatives of the Italian computer science and engineering community against the Covid19 pandemic. The 128 responses collected document this community's response during the first pandemic wave in Italy (February–May 2020), through several initiatives carried out by both single researchers and research groups able to react promptly to Covid19, even remotely. The survey data are reported, discussed, and further investigated with Natural Language Processing techniques to generate semantic clusters based on embedding representations of the surveyed activity descriptions. The resulting clusters were then used to extend an existing Covid19 taxonomy with the classification of related research activities in computer science and information technology, summarizing this work's contribution through a reproducible survey-to-taxonomy methodology.
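The clustering step behind such a survey-to-taxonomy methodology can be sketched as follows. This is a lightweight stand-in: it uses bag-of-words cosine similarity with a greedy threshold, whereas the paper clusters learned embedding representations, and the activity descriptions are invented:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a Counter over lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(descriptions, threshold=0.3):
    """Assign each description to the first cluster whose seed member is
    similar enough; otherwise start a new cluster. Returns index groups."""
    clusters = []  # each cluster: list of (index, bow) pairs
    for i, text in enumerate(descriptions):
        v = bow(text)
        for cluster in clusters:
            if cosine(v, cluster[0][1]) >= threshold:
                cluster.append((i, v))
                break
        else:
            clusters.append([(i, v)])
    return [[i for i, _ in c] for c in clusters]

# Invented stand-ins for surveyed activity descriptions.
activities = [
    "contact tracing app for mobile devices",
    "mobile app for digital contact tracing",
    "deep learning diagnosis from chest x-ray images",
    "x-ray image classification with deep learning",
]
groups = greedy_cluster(activities)
```

Each resulting index group would then be mapped to a node of the taxonomy; swapping `bow` for a sentence-embedding model changes the representation without changing the clustering logic.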
3
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.16867.4]
Abstract
Background: While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 22 terms appeared with a frequency of 0.05 or greater, while in 2021, 27 terms met this criterion. The LDA algorithm defined two groups: the first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text; these keywords identified some underlying characteristics of the political party (e.g., its place on the political spectrum, such as left-wing). The Rake algorithm delivered phrases, among which we found 'universal health coverage' in both 2016 and 2021. Conclusion: NLP analysis could be used to reveal the underlying priorities in each government plan. It could also be included in research on health policies and politics during general elections, and could provide informative summaries for the general population.
4
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16867.3]
Abstract
Background: While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 22 terms appeared with a frequency of 0.05 or greater, while in 2021, 27 terms met this criterion. The LDA algorithm defined two groups: the first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text; these keywords identified some underlying characteristics of the political party (e.g., its place on the political spectrum, such as left-wing). The Rake algorithm delivered phrases, among which we found 'universal health coverage' in both 2016 and 2021. Conclusion: NLP analysis could be used to reveal the underlying priorities in each government plan. It could also be included in research on health policies and politics during general elections, and could provide informative summaries for the general population.
5
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16867.2]
Abstract
Background: While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, nine terms appeared with a frequency of 0.10 or greater, while in 2021, 43 terms met this criterion. The LDA algorithm defined two groups: the first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text; these keywords identified some underlying characteristics of the political party (e.g., its place on the political spectrum, such as left-wing). The Rake algorithm delivered phrases, among which we found 'universal health coverage' in both 2016 and 2021. Conclusion: NLP analysis could be used to reveal the underlying priorities in each government plan. It could also be included in research on health policies and politics during general elections, and could provide informative summaries for the general population.
6
Kumar SA, Nasralla MM, García-Magariño I, Kumar H. A machine-learning scraping tool for data fusion in the analysis of sentiments about pandemics for supporting business decisions with human-centric AI explanations. PeerJ Comput Sci 2021; 7:e713. [PMID: 34616891] [PMCID: PMC8459777] [DOI: 10.7717/peerj-cs.713] [Received: 07/23/2021] [Accepted: 08/23/2021]
Abstract
The COVID-19 pandemic is changing daily routines for many citizens, with a high economic impact in some sectors. Small and medium-sized enterprises in these sectors need to be aware of both the pandemic's evolution and the corresponding customer sentiment in order to determine the best commercialization techniques. This article proposes an expert system that combines machine learning and sentiment analysis to support business decisions, fusing data gathered through web scraping. The system uses human-centric artificial intelligence to automatically generate explanations. The expert system feeds on online content from different sources via a scraping module. Users can provide feedback to the expert system, and the system uses this feedback to improve its recommendations with supervised learning.
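A toy, lexicon-based version of the sentiment-analysis component might look like the following (the paper's system is machine-learning based and learns from user feedback; the lexicon, weights, and review texts here are illustrative assumptions):

```python
# Toy sentiment lexicon; a real system would learn weights from labelled data.
LEXICON = {
    "good": 1.0, "great": 1.5, "safe": 1.0, "effective": 1.0,
    "bad": -1.0, "worse": -1.5, "fear": -1.0, "closed": -0.5,
}

def sentiment(text):
    """Average lexicon score of the words in `text`; 0.0 if none match."""
    words = text.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

# Invented stand-ins for scraped customer comments.
reviews = [
    "great service and safe delivery during lockdown",
    "shop closed fear of a worse outbreak",
]
scores = [sentiment(r) for r in reviews]
```

Feeding such scores, aggregated per sector and over time, into a forecasting model is one plausible way to connect scraped sentiment to commercialization decisions.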
Affiliation(s)
Moustafa M. Nasralla
- Department of Communications and Networks Engineering, Prince Sultan University, Riyadh, Saudi Arabia
Iván García-Magariño
- Universidad Complutense de Madrid, Madrid, Spain
- Instituto de Tecnología del Conocimiento, UCM, Madrid, Spain
Harsh Kumar
- Peoples' Friendship University of Russia, Moscow, Russia
7
Guo Y, Zhang Y, Lyu T, Prosperi M, Wang F, Xu H, Bian J. The application of artificial intelligence and data integration in COVID-19 studies: a scoping review. J Am Med Inform Assoc 2021; 28:2050-2067. [PMID: 34151987] [PMCID: PMC8344463] [DOI: 10.1093/jamia/ocab098] [Received: 01/07/2021] [Revised: 05/03/2021] [Accepted: 05/06/2021]
Abstract
OBJECTIVE To summarize how artificial intelligence (AI) is being applied in COVID-19 research and determine whether these AI applications integrated heterogeneous data from different sources for modeling. MATERIALS AND METHODS We searched 2 major COVID-19 literature databases, the National Institutes of Health's LitCovid and the World Health Organization's COVID-19 database, on March 9, 2021. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline, 2 reviewers independently reviewed all the articles in 2 rounds of screening. RESULTS In the 794 studies included in the final qualitative analysis, we identified 7 key COVID-19 research areas in which AI was applied: disease forecasting; medical imaging-based diagnosis and prognosis; early detection and prognosis (non-imaging); drug repurposing and early drug discovery; social media data analysis; genomic, transcriptomic, and proteomic data analysis; and other COVID-19 research topics. We also found a lack of heterogeneous data integration in these AI applications. DISCUSSION Risk factors relevant to COVID-19 outcomes exist in heterogeneous data sources, including electronic health records, surveillance systems, sociodemographic datasets, and many more. However, most AI applications in COVID-19 research adopted a single-sourced approach that could omit important risk factors and thus lead to biased algorithms. Integrating heterogeneous data for modeling will help realize the full potential of AI algorithms, improve precision, and reduce bias. CONCLUSION There is a lack of data integration in the AI applications in COVID-19 research and a need for a multilevel AI framework that supports the analysis of heterogeneous data from different sources.
Affiliation(s)
Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA
Yahan Zhang
- Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, Florida, USA
Tianchen Lyu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA
Mattia Prosperi
- Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, Gainesville, Florida, USA
Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, Florida, USA
8
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16867.1]
Abstract
Background: While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, nine terms appeared with a frequency of 0.10 or greater, while in 2021, 43 terms met this criterion. The LDA algorithm defined two groups: the first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text; these keywords identified some underlying characteristics of the political party (e.g., its place on the political spectrum, such as left-wing). The Rake algorithm delivered phrases, among which we found 'universal health coverage' in both 2016 and 2021. Conclusion: NLP analysis could be used to reveal the underlying priorities in each government plan. It could also be included in research on health policies and politics during general elections, and could provide informative summaries for the general population.
9
Su Z, McDonnell D, Bentley BL, He J, Shi F, Cheshmehzangi A, Ahmad J, Jia P. Addressing Biodisaster X Threats With Artificial Intelligence and 6G Technologies: Literature Review and Critical Insights. J Med Internet Res 2021; 23:e26109. [PMID: 33961583] [PMCID: PMC8153034] [DOI: 10.2196/26109] [Received: 11/27/2020] [Revised: 01/21/2021] [Accepted: 04/07/2021]
Abstract
BACKGROUND With advances in science and technology, biotechnology is becoming more accessible to people of all demographics. These advances hold the promise to improve personal and population well-being and welfare substantially. It is paradoxical that while greater access to biotechnology on a population level has many advantages, it may also increase the likelihood and frequency of biodisasters due to accidental or malicious use. Similar to "Disease X" (which describes unknown naturally emerging pathogenic diseases with pandemic potential), we term this unknown risk from biotechnologies "Biodisaster X." To date, no studies have examined the potential role of information technologies in preventing and mitigating Biodisaster X. OBJECTIVE This study aimed to explore (1) what Biodisaster X might entail and (2) solutions that use artificial intelligence (AI) and emerging 6G technologies to help monitor and manage Biodisaster X threats. METHODS A review of the literature on applying AI and 6G technologies to monitoring and managing biodisasters was conducted on PubMed, using articles published from database inception through November 16, 2020. RESULTS Our findings show that Biodisaster X has the potential to upend lives and livelihoods and destroy economies, posing a looming risk for civilizations worldwide. To shed light on Biodisaster X threats, we detail effective AI- and 6G-enabled strategies, ranging from natural language processing to deep learning-based image analysis, that address issues spanning early Biodisaster X detection (eg, identification of suspicious behaviors), remote design and development of pharmaceuticals (eg, treatment development), public health interventions (eg, reactive shelter-at-home mandate enforcement), and disaster recovery (eg, sentiment analysis of social media posts to gauge the public's feelings and readiness for recovery building). CONCLUSIONS Biodisaster X is a looming but avoidable catastrophe. Considering the potential human and economic consequences Biodisaster X could cause, actions that can effectively monitor and manage its threats must be taken promptly and proactively. Rather than depending solely on the overstretched attention of health experts and government officials, it is perhaps more cost-effective and practical to deploy technology-based solutions to prevent and control Biodisaster X threats. This study discusses what Biodisaster X could entail and emphasizes the importance of monitoring and managing its threats with AI techniques and 6G technologies. Future studies could explore how the convergence of AI and 6G systems may further advance preparedness for high-impact, less likely events beyond Biodisaster X.
Affiliation(s)
Zhaohui Su
- Center on Smart and Connected Health Technologies, Mays Cancer Center, School of Nursing, UT Health San Antonio, San Antonio, TX, United States
Dean McDonnell
- Department of Humanities, Institute of Technology Carlow, Carlow, Ireland
Barry L Bentley
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, United Kingdom
Jiguang He
- Centre for Wireless Communications, University of Oulu, Oulu, Finland
Feng Shi
- Department of Research and Development, Shanghai United Imaging Intelligence, Shanghai, China
Ali Cheshmehzangi
- Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Network for Education and Research on Peace and Sustainability, Hiroshima University, Hiroshima, Japan
Junaid Ahmad
- Prime Institute of Public Health, Peshawar Medical College, Peshawar, Pakistan
Peng Jia
- Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- International Institute of Spatial Lifecourse Epidemiology, Hong Kong, China
10
He Z, Erdengasileng A, Luo X, Xing A, Charness N, Bian J. How the clinical research community responded to the COVID-19 pandemic: an analysis of the COVID-19 clinical studies in ClinicalTrials.gov. JAMIA Open 2021; 4:ooab032. [PMID: 34056559] [PMCID: PMC8083215] [DOI: 10.1093/jamiaopen/ooab032] [Received: 12/14/2020] [Revised: 02/15/2021] [Accepted: 04/13/2021]
Abstract
OBJECTIVE In the past few months, a large number of clinical studies on the novel coronavirus disease (COVID-19) have been initiated worldwide to find effective therapeutics, vaccines, and preventive strategies for COVID-19. In this study, we aim to understand the landscape of COVID-19 clinical research and identify issues that may cause recruitment difficulty or reduce study generalizability. METHODS We analyzed 3765 COVID-19 studies registered in the largest public registry, ClinicalTrials.gov, leveraging natural language processing (NLP) and using descriptive, association, and clustering analyses. We first characterized COVID-19 studies by features such as phase and tested intervention. We then took a deep dive into their eligibility criteria to understand whether these studies: (1) considered the reported underlying health conditions that may lead to severe illness, and (2) excluded older adults, either explicitly or implicitly, which may reduce the generalizability of these studies to the older adult population. RESULTS Our analysis included 2295 interventional studies and 1470 observational studies. Most trials did not explicitly exclude older adults with common chronic conditions. However, known risk factors such as diabetes and hypertension were considered by fewer than 5% of trials based on their trial descriptions. Pregnant women were excluded by 34.9% of the studies. CONCLUSIONS Most COVID-19 clinical studies included both genders and older adults. However, risk factors such as diabetes, hypertension, and pregnancy were under-represented, likely skewing the population that was sampled. A careful examination of existing COVID-19 studies can inform future COVID-19 trial design towards balanced internal validity and generalizability.
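The eligibility-criteria screening described in this abstract can be approximated, very roughly, with pattern matching; the regular expressions, the age cutoff, and the criteria text below are illustrative assumptions, not the study's actual NLP pipeline:

```python
import re

# Illustrative patterns for an explicit age range and any mention of pregnancy.
AGE_RANGE = re.compile(r"aged?\s+(\d{1,3})\s*(?:to|-|–)\s*(\d{1,3})", re.I)
PREGNANT = re.compile(r"\bpregnan\w*", re.I)

def screen(criteria, older_adult_cutoff=80):
    """Flag criteria whose upper age bound falls below the cutoff
    (an implicit older-adult exclusion) or which mention pregnancy."""
    flags = set()
    m = AGE_RANGE.search(criteria)
    if m and int(m.group(2)) < older_adult_cutoff:
        flags.add("implicit_older_adult_exclusion")
    if PREGNANT.search(criteria):
        flags.add("mentions_pregnancy")
    return flags

# Invented eligibility-criteria text in the style of a registry entry.
text = ("Inclusion: adults aged 18 to 65 with confirmed infection. "
        "Exclusion: pregnant or breastfeeding women.")
flags = screen(text)
```

Real registry criteria are far messier (free text, implicit bounds, negations), which is why the study needed NLP plus clustering rather than a handful of regexes.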
Affiliation(s)
Zhe He
- School of Information, Florida State University, Tallahassee, Florida, USA
Xiao Luo
- Department of Computer Information and Graphics Technology, Indiana University–Purdue University Indianapolis, Indianapolis, Indiana, USA
Aiwen Xing
- Department of Statistics, Florida State University, Tallahassee, Florida, USA
Neil Charness
- Department of Psychology, Florida State University, Tallahassee, Florida, USA
Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA