1. Ofer D, Brandes N, Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput Struct Biotechnol J 2021;19:1750-1758. PMID: 33897979; PMCID: PMC8050421; DOI: 10.1016/j.csbj.2021.03.022. Review. Cited in RCA: 146.
Abstract
Natural language processing (NLP) is a field of computer science concerned with automated text and language analysis. In recent years, following a series of breakthroughs in deep and machine learning, NLP methods have shown overwhelming progress. Here, we review the success, promise and pitfalls of applying NLP algorithms to the study of proteins. Proteins, which can be represented as strings of amino-acid letters, are a natural fit to many NLP methods. We explore the conceptual similarities and differences between proteins and language, and review a range of protein-related tasks amenable to machine learning. We present methods for encoding the information of proteins as text and analyzing it with NLP methods, reviewing classic concepts such as bag-of-words, k-mers/n-grams and text search, as well as modern techniques such as word embedding, contextualized embedding, deep learning and neural language models. In particular, we focus on recent innovations such as masked language modeling, self-supervised learning and attention-based models. Finally, we discuss trends and challenges in the intersection of NLP and protein research.
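As a concrete illustration of the k-mer/n-gram idea mentioned above (a minimal sketch, not code from the cited review; the sequence below is an arbitrary example), a protein string can be turned into a bag of overlapping "protein words":

```python
from collections import Counter

def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (the protein analogue of word n-grams)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# An arbitrary amino-acid string used only for illustration
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(kmer_counts(protein, k=3).most_common(5))  # bag-of-words style feature vector over 3-mers
```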
2. Review. Cited in RCA: 69.
Abstract
Adult semantic memory has been traditionally conceptualized as a relatively static memory system that consists of knowledge about the world, concepts, and symbols. Considerable work in the past few decades has challenged this static view of semantic memory, and instead proposed a more fluid and flexible system that is sensitive to context, task demands, and perceptual and sensorimotor information from the environment. This paper (1) reviews traditional and modern computational models of semantic memory, within the umbrella of network (free association-based), feature (property generation norms-based), and distributional semantic (natural language corpora-based) models, (2) discusses the contribution of these models to important debates in the literature regarding knowledge representation (localist vs. distributed representations) and learning (error-free/Hebbian learning vs. error-driven/predictive learning), and (3) evaluates how modern computational models (neural network, retrieval-based, and topic models) are revisiting the traditional "static" conceptualization of semantic memory and tackling important challenges in semantic modeling such as addressing temporal, contextual, and attentional influences, as well as incorporating grounding and compositionality into semantic representations. The review also identifies new challenges regarding the abundance and availability of data, the generalization of semantic models to other languages, and the role of social interaction and collaboration in language learning and development. The concluding section advocates the need for integrating representational accounts of semantic memory with process-based accounts of cognitive behavior, as well as the need for explicit comparisons of computational models to human baselines in semantic tasks to adequately assess their psychological plausibility as models of human semantic memory.
3. Van Bulck L, Moons P. What if your patient switches from Dr. Google to Dr. ChatGPT? A vignette-based survey of the trustworthiness, value, and danger of ChatGPT-generated responses to health questions. Eur J Cardiovasc Nurs 2024;23:95-98. PMID: 37094282; DOI: 10.1093/eurjcn/zvad038. Cited in RCA: 59.
Abstract
ChatGPT is a new artificial intelligence system that is changing the way information is sought and obtained. In this study, ChatGPT-generated responses to four vignettes representing virtual patient questions were evaluated for trustworthiness, value, and danger by 20 experts in the domains of congenital heart disease, atrial fibrillation, heart failure, or cholesterol. Experts generally considered ChatGPT-generated responses trustworthy and valuable, with few considering them dangerous. Forty percent of the experts found ChatGPT responses more valuable than Google. Experts appreciated the sophistication and nuance of the responses but also recognized that responses were often incomplete and sometimes misleading.
4. Liu Z, Roberts RA, Lal-Nag M, Chen X, Huang R, Tong W. AI-based language models powering drug discovery and development. Drug Discov Today 2021;26:2593-2607. PMID: 34216835; PMCID: PMC8604259; DOI: 10.1016/j.drudis.2021.06.009. Review. Cited in RCA: 50.
Abstract
The discovery and development of new medicines is expensive, time-consuming, and often inefficient, with many failures along the way. Powered by artificial intelligence (AI), language models (LMs) have changed the landscape of natural language processing (NLP) and offer new possibilities for more effective treatment development. Here, we summarize advances in AI-powered LMs and their potential to aid drug discovery and development. We highlight opportunities for AI-powered LMs in target identification, clinical design, regulatory decision-making, and pharmacovigilance. We specifically emphasize their potential role in developing treatment strategies for Coronavirus Disease 2019 (COVID-19), including drug repurposing, an approach that can be extrapolated to other infectious diseases with pandemic potential. Finally, we set out the remaining challenges and propose possible solutions for improvement.
5. Moons P, Van Bulck L. Using ChatGPT and Google Bard to improve the readability of written patient information: a proof of concept. Eur J Cardiovasc Nurs 2024;23:122-126. PMID: 37603843; DOI: 10.1093/eurjcn/zvad087. Cited in RCA: 37.
Abstract
Patient information materials often tend to be written at a reading level that is too advanced for patients. In this proof-of-concept study, we used ChatGPT and Google Bard to reduce the reading level of three selected patient information sections from scientific journals. ChatGPT successfully improved readability. However, it could not achieve the recommended 6th-grade reading level. Bard reached the reading level of 6th graders but oversimplified the texts by omitting up to 83% of the content. Despite the present limitations, developers of patient information are encouraged to employ large language models, preferably ChatGPT, to optimize their materials.
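The reading-level comparison described here can be reproduced in spirit with a standard readability formula; the sketch below uses the Flesch-Kincaid grade via the third-party textstat package and invented example texts (the study's own texts and exact readability measures are not shown here).

```python
import textstat  # third-party package: pip install textstat

original = ("Hypertension, defined as persistently elevated arterial blood pressure, "
            "substantially increases the likelihood of adverse cardiovascular events.")
simplified = "High blood pressure makes heart problems more likely."

# Flesch-Kincaid grade approximates the US school grade needed to understand a text;
# patient information is commonly targeted at roughly a 6th-grade level.
for label, text in [("original", original), ("simplified", simplified)]:
    print(label, textstat.flesch_kincaid_grade(text))
```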
6. Karkera N, Acharya S, Palaniappan SK. Leveraging pre-trained language models for mining microbiome-disease relationships. BMC Bioinformatics 2023;24:290. PMID: 37468830; PMCID: PMC10357883; DOI: 10.1186/s12859-023-05411-z. Research article. Cited in RCA: 18.
Abstract
BACKGROUND The growing recognition of the microbiome's impact on human health and well-being has prompted extensive research into discovering the links between microbiome dysbiosis and disease (versus healthy) states. However, this valuable information is scattered in unstructured form within the biomedical literature. The structured extraction and qualification of microbe-disease interactions are therefore important. In parallel, recent advancements in deep-learning-based natural language processing algorithms have revolutionized language-related tasks such as this one. This study aims to leverage state-of-the-art deep-learning language models to extract microbe-disease relationships from biomedical literature. RESULTS In this study, we first evaluate multiple pre-trained large language models in a zero-shot or few-shot setting. Here, the models performed poorly out of the box, emphasizing the need for domain-specific fine-tuning. Subsequently, we fine-tune multiple language models (specifically, GPT-3, BioGPT, BioMedLM, BERT, BioMegatron, PubMedBERT, BioClinicalBERT, and BioLinkBERT) using labeled training data and evaluate their performance. Our experimental results demonstrate state-of-the-art performance of the fine-tuned models (specifically GPT-3, BioMedLM, and BioLinkBERT), achieving an average F1 score, precision, and recall of over [Formula: see text], compared to the previous best of 0.74. CONCLUSION Overall, this study establishes that pre-trained language models excel as transfer learners when fine-tuned with domain- and problem-specific data, enabling them to achieve state-of-the-art results even with limited training data for extracting microbiome-disease interactions from scientific publications.
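A rough sketch of the fine-tuning setup this abstract describes is given below (not the authors' code; the checkpoint, label set, and example sentence are illustrative assumptions). A pre-trained encoder receives a small classification head, which is then trained on labeled microbe-disease sentences.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Generic BERT used here for simplicity; a biomedical checkpoint (e.g. PubMedBERT) would normally be preferred.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)  # relation present / absent

sentence = "Faecalibacterium prausnitzii depletion is associated with Crohn's disease."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)  # meaningless until the head is fine-tuned on labeled pairs
print(probs)
```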
7. Lakretz Y, Hupkes D, Vergallito A, Marelli M, Baroni M, Dehaene S. Mechanisms for handling nested dependencies in neural-network language models and humans. Cognition 2021;213:104699. PMID: 33941375; DOI: 10.1016/j.cognition.2021.104699. Cited in RCA: 11.
Abstract
Recursive processing in sentence comprehension is considered a hallmark of human linguistic abilities. However, its underlying neural mechanisms remain largely unknown. We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing, namely the storing of grammatical number and gender information in working memory and its use in long-distance agreement (e.g., capturing the correct number agreement between subject and verb when they are separated by other phrases). Although the network, a recurrent architecture with Long Short-Term Memory units, was solely trained to predict the next word in a large corpus, analysis showed the emergence of a very sparse set of specialized units that successfully handled local and long-distance syntactic agreement for grammatical number. However, the simulations also showed that this mechanism does not support full recursion and fails with some long-range embedded dependencies. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns, with or without embedding. Human and model error patterns were remarkably similar, showing that the model echoes various effects observed in human data. However, a key difference was that, with embedded long-range dependencies, humans remained above chance level, while the model's systematic errors brought it below chance. Overall, our study shows that exploring the ways in which modern artificial neural networks process sentences leads to precise and testable hypotheses about human linguistic performance.
8. Kauf C, Ivanova AA, Rambelli G, Chersoni E, She JS, Chowdhury Z, Fedorenko E, Lenci A. Event Knowledge in Large Language Models: The Gap Between the Impossible and the Unlikely. Cogn Sci 2023;47:e13386. PMID: 38009752; DOI: 10.1111/cogs.13386. Cited in RCA: 8.
Abstract
Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pretrained LLMs (from 2018's BERT to 2023's MPT) assign a higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n = 1215), we found that pretrained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign a higher likelihood to possible versus impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely versus unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
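The minimal-pair comparison can be sketched as follows, with GPT-2 standing in for the models tested in the paper (an illustrative choice, not the study's scoring protocol):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token-level cross-entropy
    return -loss.item() * (ids.shape[1] - 1)  # rescale to a sum over predicted tokens

plausible = "The teacher bought the laptop."
implausible = "The laptop bought the teacher."
# A model with event knowledge should prefer the plausible version.
print(sentence_logprob(plausible) > sentence_logprob(implausible))
```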
9. Semantic coherence markers: The contribution of perplexity metrics. Artif Intell Med 2022;134:102393. PMID: 36462890; DOI: 10.1016/j.artmed.2022.102393. Cited in RCA: 7.
Abstract
Devising automatic tools to assist specialists in the early detection of mental disturbances and psychotic disorders is to date a challenging scientific problem and a practically relevant activity. In this work we explore how language models (that are probability distributions over text sequences) can be employed to analyze language and discriminate between mentally impaired and healthy subjects. We preliminarily explored whether perplexity can be considered a reliable metric to characterize an individual's language. Perplexity was originally conceived as an information-theoretic measure to assess how much a given language model is suited to predict a text sequence or, equivalently, how well a word sequence fits a specific language model. We carried out extensive experimentation with healthy subjects, employing language models as diverse as N-grams - from 2-grams to 5-grams - and GPT-2, a transformer-based language model. Our experiments show that, irrespective of the complexity of the employed language model, perplexity scores are stable and sufficiently consistent for analyzing the language of individual subjects, and at the same time sensitive enough to capture differences due to linguistic registers adopted by the same speaker, e.g., in interviews and political rallies. A second array of experiments was designed to investigate whether perplexity scores may be used to discriminate between the transcripts of healthy subjects and subjects suffering from Alzheimer's disease (AD). Our best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class and control subjects. These results suggest that perplexity can be a valuable analytical metric with potential application to supporting early diagnosis of symptoms of mental disorders.
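For reference, perplexity is the inverse probability of a sequence normalized by its length; for a model p and a word sequence w_1, ..., w_N:

```latex
\mathrm{PP}(w_1,\dots,w_N)
  = p(w_1,\dots,w_N)^{-1/N}
  = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\!\left(w_i \mid w_1,\dots,w_{i-1}\right)\right)
```

Lower values mean the transcript is better predicted by the model, which is why elevated or unstable perplexity over a subject's speech can serve as a coherence marker.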
10. Verlingue L, Boyer C, Olgiati L, Brutti Mairesse C, Morel D, Blay JY. Artificial intelligence in oncology: ensuring safe and effective integration of language models in clinical practice. Lancet Reg Health Eur 2024;46:101064. PMID: 39290808; PMCID: PMC11406067; DOI: 10.1016/j.lanepe.2024.101064. Review. Cited in RCA: 7.
Abstract
In this Personal View, we address the latest advancements in automatic text analysis with artificial intelligence (AI) in medicine, with a focus on its implications in aiding treatment decisions in medical oncology. Acknowledging that a majority of hospital medical content is embedded in narrative format, natural language processing has become one of the most dynamic research fields for developing clinical decision support tools. In addition, large language models have recently reached unprecedented performance, notably when answering medical questions. Emerging applications include prognosis estimation, treatment recommendations, multidisciplinary tumor board recommendations and matching patients to recruiting clinical trials. Altogether, we advocate for a forward-looking approach in which the community efficiently initiates global prospective clinical evaluations of promising AI-based decision support systems. Such assessments will be essential to validate and evaluate potential biases, ensuring these innovations can be effectively and safely translated into practical tools for oncological practice. We are at a pivotal moment, where continued advancements in patient care must be pursued with scientific rigor.
11. Valentín-Bravo FJ, Mateos-Álvarez E, Usategui-Martín R, Andrés-Iglesias C, Pastor-Jimeno JC, Pastor-Idoate S. Artificial intelligence and new language models in ophthalmology: complications of the use of silicone oil in vitreoretinal surgery. Arch Soc Esp Oftalmol 2023;98:298-303. PMID: 37094759; DOI: 10.1016/j.oftale.2023.04.011. Case report. Cited in RCA: 4.
Abstract
Artificial intelligence (AI) is an emerging technology that facilitates everyday tasks and automates tasks in various fields such as medicine. However, the emergence of a language model in academia has generated a lot of interest. This paper evaluates the potential of ChatGPT, a language model developed by OpenAI, and DALL-E 2, an image generator, in the writing of scientific articles in ophthalmology. The selected topic is the complications of the use of silicone oil in vitreoretinal surgery. ChatGPT was used to generate an abstract and a structured article, suggestions for a title and bibliographical references. In conclusion, despite the knowledge demonstrated by this tool, the scientific accuracy and reliability on specific topics is insufficient for the automatic generation of scientifically rigorous articles. In addition, scientists should be aware of the possible ethical and legal implications of these tools.
12. Haverkamp W, Strodthoff N, Tennenbaum J, Israel C. [Big hype about ChatGPT in medicine: Is it something for rhythmologists? What must be taken into consideration?]. Herzschrittmacherther Elektrophysiol 2023;34:240-245. PMID: 37523010; PMCID: PMC10462516; DOI: 10.1007/s00399-023-00960-5. English abstract. Cited in RCA: 4.
Abstract
ChatGPT, a chatbot based on a large language model, is currently attracting much attention. Modern machine learning (ML) architectures enable the program to answer almost any question, to summarize, translate, and even generate its own texts, all in a text-based dialogue with the user. Underlying technologies, summarized under the acronym NLP (natural language processing), go back to the 1960s. In almost all areas including medicine, ChatGPT is raising enormous hopes. It can easily pass medical exams and may be useful in patient care, diagnostic and therapeutic assistance, and medical research. The enthusiasm for this new technology shown even by medical professionals is surprising. Although the system knows much, it does not know everything; not everything it outputs is accurate either. Every output has to be carefully checked by the user for correctness, which is often not easily done since references to sources are lacking. Issues regarding data protection and ethics also arise. Today's language models are not free of bias and systematic distortion. These shortcomings have led to calls for stronger regulation of the use of ChatGPT and an increasing number of similar language models. However, this new technology represents an enormous progress in knowledge processing and dissemination. Numerous scenarios in which ChatGPT can provide assistance are conceivable, including in rhythmology. In the future, it will be crucial to render the models error-free and transparent and to clearly define the rules for their use. Responsible use requires systematic training to improve the digital competence of users, including physicians who use such programs.
13. Henriksson A, Pawar Y, Hedberg P, Nauclér P. Multimodal fine-tuning of clinical language models for predicting COVID-19 outcomes. Artif Intell Med 2023;146:102695. PMID: 38042595; DOI: 10.1016/j.artmed.2023.102695. Multicenter study. Cited in RCA: 4.
Abstract
Clinical prediction models tend only to incorporate structured healthcare data, ignoring information recorded in other data modalities, including free-text clinical notes. Here, we demonstrate how multimodal models that effectively leverage both structured and unstructured data can be developed for predicting COVID-19 outcomes. The models are trained end-to-end using a technique we refer to as multimodal fine-tuning, whereby a pre-trained language model is updated based on both structured and unstructured data. The multimodal models are trained and evaluated using a multicenter cohort of COVID-19 patients encompassing all encounters at the emergency department of six hospitals. Experimental results show that multimodal models, leveraging the notion of multimodal fine-tuning and trained to predict (i) 30-day mortality, (ii) safe discharge and (iii) readmission, outperform unimodal models trained using only structured or unstructured healthcare data on all three outcomes. Sensitivity analyses are performed to better understand how well the multimodal models perform on different patient groups, while an ablation study is conducted to investigate the impact of different types of clinical notes on model performance. We argue that multimodal models that make effective use of routinely collected healthcare data to predict COVID-19 outcomes may facilitate patient management and contribute to the effective use of limited healthcare resources.
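A schematic of the multimodal fine-tuning idea (a simplified sketch under my own assumptions about the architecture, not the authors' implementation): the note is encoded with a pre-trained language model, its [CLS] vector is concatenated with the structured features, and a joint head is trained end-to-end so that gradients also update the encoder.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultimodalClassifier(nn.Module):
    """Joint outcome classifier over a clinical-note encoder and structured (tabular) features."""
    def __init__(self, encoder_name: str = "bert-base-uncased", n_structured: int = 10):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # updated during multimodal fine-tuning
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(nn.Linear(hidden + n_structured, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, input_ids, attention_mask, structured):
        cls = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.head(torch.cat([cls, structured], dim=-1))  # one logit, e.g. for 30-day mortality

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["Patient presents with dyspnea and fever."], return_tensors="pt", padding=True)
model = MultimodalClassifier()
logit = model(batch["input_ids"], batch["attention_mask"], torch.randn(1, 10))  # random features, illustration only
```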
14. Portelance E, Duan Y, Frank MC, Lupyan G. Predicting Age of Acquisition for Children's Early Vocabulary in Five Languages Using Language Model Surprisal. Cogn Sci 2023;47:e13334. PMID: 37695825; DOI: 10.1111/cogs.13334. Cited in RCA: 3.
Abstract
What makes a word easy to learn? Early-learned words are frequent and tend to name concrete referents. But words typically do not occur in isolation. Some words are predictable from their contexts; others are less so. Here, we investigate whether predictability relates to when children start producing different words (age of acquisition; AoA). We operationalized predictability in terms of a word's surprisal in child-directed speech, computed using n-gram and long-short-term-memory (LSTM) language models. Predictability derived from LSTMs was generally a better predictor than predictability derived from n-gram models. Across five languages, average surprisal was positively correlated with the AoA of predicates and function words but not nouns. Controlling for concreteness and word frequency, more predictable predicates and function words were learned earlier. Differences in predictability between languages were associated with cross-linguistic differences in AoA: the same word (when it was a predicate) was produced earlier in languages where the word was more predictable.
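Surprisal, the predictability measure used here, is the negative log-probability of a word given its preceding context (conventionally in bits):

```latex
s(w_i) = -\log_2 P\left(w_i \mid w_1,\dots,w_{i-1}\right)
```

Averaging this quantity over a word's occurrences in child-directed speech gives the per-word predictability score that is then related to age of acquisition.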
15. Fröhling L, Zubiaga A. Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover. PeerJ Comput Sci 2021;7:e443. PMID: 33954234; PMCID: PMC8049133; DOI: 10.7717/peerj-cs.443. Research article. Cited in RCA: 3.
Abstract
The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost and effort undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection problem, using carefully crafted features that attempt to model intrinsic differences between human and machine text. Our research contributes to the field in producing a detection method that achieves performance competitive with far more expensive methods, offering an accessible "first line-of-defense" against the abuse of language models. Furthermore, our experiments show that different sampling methods lead to different types of flaws in generated text.
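The feature-based strategy can be sketched as follows; the three features and the toy training texts are illustrative placeholders, not the feature set of the cited paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stylometric_features(text: str) -> list:
    """A few cheap hand-crafted features of the kind a feature-based detector might use."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [
        len(words) / max(len(sentences), 1),                   # mean sentence length
        len({w.lower() for w in words}) / max(len(words), 1),  # type-token ratio (lexical diversity)
        sum(c in ",;:" for c in text) / max(len(words), 1),    # punctuation rate
    ]

human_texts = ["I walked to the shop, but it was shut; typical."]
machine_texts = ["The shop is a place where goods are sold. The shop sells many goods."]
X = np.array([stylometric_features(t) for t in human_texts + machine_texts])
y = np.array([0] * len(human_texts) + [1] * len(machine_texts))  # 0 = human, 1 = machine
clf = LogisticRegression().fit(X, y)  # in practice, thousands of labeled examples per class are needed
```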
16. Casalnuovo C, Lee K, Wang H, Devanbu P, Morgan E. Do Programmers Prefer Predictable Expressions in Code? Cogn Sci 2020;44:e12921. PMID: 33314282; DOI: 10.1111/cogs.12921. Cited in RCA: 2.
Abstract
Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language model surprisal as a guiding mechanism. While surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity-do programmers prefer more predictable expressions in code? Using meaning-preserving transformations, we produce equivalent alternatives to developer-written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developers choose to write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and transformed). We find that programmers do prefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences.
17. Dadkhah M, Oermann MH, Hegedüs M, Raman R, Dávid LD. Diagnosis Unreliability of ChatGPT for Journal Evaluation. Adv Pharm Bull 2024;14:1-4. PMID: 38585462; PMCID: PMC10997925; DOI: 10.34172/apb.2024.020. Editorial. Cited in RCA: 2.
Abstract
Purpose Academic and other researchers have limited tools with which to address the current proliferation of predatory and hijacked journals. These journals can have negative effects on science, research funding, and the dissemination of information. As most predatory and hijacked journals are not error free, this study used ChatGPT, an artificial intelligence (AI) tool, to evaluate journal quality. Methods Predatory and hijacked journals were analyzed using ChatGPT, and the reliability of the results is discussed. Results The findings show that ChatGPT is an unreliable tool for journal quality evaluation, for both hijacked and predatory journals. Conclusion To address this gap, an early trial version of a Journal Checker Chatbot has been developed and is discussed as an alternative that can assist researchers in detecting hijacked journals.
18. Weichselbraun A, Steixner J, Braşoveanu AMP, Scharl A, Göbel M, Nixon LJB. Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications. Cognit Comput 2021;14:228-245. PMID: 33552304; PMCID: PMC7846919; DOI: 10.1007/s12559-021-09839-4. Cited in RCA: 2.
Abstract
Sentic computing relies on well-defined affective models of different complexity—polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation’s strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands. The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.
19. Barreto S, Moura R, Carvalho J, Paes A, Plastino A. Sentiment analysis in tweets: an assessment study from classical to modern word representation models. Data Min Knowl Discov 2023;37:318-380. PMID: 36406157; PMCID: PMC9664439; DOI: 10.1007/s10618-022-00853-0. Research article. Cited in RCA: 1.
Abstract
With the exponential growth of social media networks, such as Twitter, plenty of user-generated data emerge daily. The short texts published on Twitter - the tweets - have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled mainly by machine learning-based classifiers. The literature has adopted different types of word representation models to transform tweets to vector-based inputs to feed sentiment classifiers. The representations come from simple count-based methods, such as bag-of-words, to more sophisticated ones, such as BERTweet, built upon the trendy BERT architecture. Nevertheless, most studies mainly focus on evaluating those models using only a small number of datasets. Despite the progress made in recent years in language modeling, there is still a gap regarding a robust evaluation of induced embeddings applied to sentiment analysis on tweets. Furthermore, while fine-tuning the model from downstream tasks is prominent nowadays, less attention has been given to adjustments based on the specific linguistic style of the data. In this context, this study fulfills an assessment of existing neural language models in distinguishing the sentiment expressed in tweets, by using a rich collection of 22 datasets from distinct domains and five classification algorithms. The evaluation includes static and contextualized representations. Contexts are assembled from Transformer-based autoencoder models that are also adapted based on the masked language model task, using a plethora of strategies.
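As a minimal illustration of the classical end of the representation spectrum surveyed here (TF-IDF bag-of-words feeding a standard classifier; the tweets and labels are toy placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["love this phone!!", "worst service ever", "great value, very happy", "totally broken on arrival"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy data)

# Contextualized encoders such as BERTweet would replace the TF-IDF step in the "modern" setups compared in the paper.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["very happy with this"]))
```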
20. Chen J, Engelhard M, Henao R, Berchuck S, Eichner B, Perrin EM, Sapiro G, Dawson G. Enhancing early autism prediction based on electronic records using clinical narratives. J Biomed Inform 2023;144:104390. PMID: 37182592; PMCID: PMC10526711; DOI: 10.1016/j.jbi.2023.104390. Cited in RCA: 1.
Abstract
Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction had not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and clinical narratives separately, and then an ensemble model that integrated both sources of data. We assessed these models using Duke University Health System data spanning 14 years, evaluating ensemble models that predict later autism diagnosis (by age 4 years) from data collected between ages 30 and 360 days. Our sample included 11,750 children above age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance: at age 30 days it achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%), 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), and an AUC4 (with at least 4-year follow-up for controls) of 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%), 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), and an AUC4 of 0.797 (CI: 0.746, 0.840). Results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, the findings suggest that additional features learned from clinician narratives might be hypothesis generating for understanding early development in autism.
21. de la Iglesia I, Vivó M, Chocrón P, Maeztu GD, Gojenola K, Atutxa A. An open source corpus and automatic tool for section identification in Spanish health records. J Biomed Inform 2023;145:104461. PMID: 37536643; DOI: 10.1016/j.jbi.2023.104461. Cited in RCA: 1.
Abstract
BACKGROUND Electronic Clinical Narratives (ECNs) store valuable individual health information. However, few open-source data are available. Besides, ECNs can be structurally heterogeneous, ranging from documents with explicit section headings or titles to unstructured notes. This lack of structure complicates building automatic systems and their evaluation. OBJECTIVE The aim of the present work is to provide the scientific community with a Spanish open-source dataset to build and evaluate automatic section identification systems. Together with this dataset, the purpose is to design and implement a suitable evaluation measure and a fine-tuned language model adapted to the task. MATERIALS AND METHODS A corpus of unstructured clinical records, in this case progress notes written in Spanish, was annotated with seven major section types. Existing metrics for the presented task were thoroughly assessed and, based on the most suitable one, we defined a new B2 metric better tailored to the task. RESULTS The annotated corpus, as well as the designed new evaluation script and a baseline model, are freely available to the community. This model reaches an average B2 score of 71.3 on our open-source dataset and an average B2 of 67.0 in data-scarcity scenarios where the target corpus and its structure differ from the dataset used for training the LM. CONCLUSION Although section identification in unstructured clinical narratives is challenging, this work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the implemented evaluation scripts, and the section identification language model are open-sourced in the hope that this contribution will foster the building of more and better systems.
22. Vakili T, Henriksson A, Dalianis H. End-to-end pseudonymization of fine-tuned clinical BERT models: privacy preservation with maintained data utility. BMC Med Inform Decis Mak 2024;24:162. PMID: 38915012; PMCID: PMC11197357; DOI: 10.1186/s12911-024-02546-8. Research article. Cited in RCA: 0.
Abstract
Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models consist of large amounts of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are very sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems. This technique automatically identifies and replaces sensitive entities with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on the predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also find no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs.
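The core pseudonymization step (detect sensitive entities, swap in realistic surrogates) can be sketched as below. This uses spaCy's general-purpose English NER as a stand-in for the clinical de-identification models such a pipeline would rely on, and the surrogate lists are invented placeholders.

```python
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm
SURROGATES = {"PERSON": ["Alex Smith", "Sam Jones"], "GPE": ["Springfield"], "DATE": ["1 January 2019"]}

def pseudonymize(text: str) -> str:
    """Replace detected sensitive entities with realistic but non-sensitive surrogates."""
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        if ent.label_ in SURROGATES:
            out.append(text[last:ent.start_char])
            out.append(random.choice(SURROGATES[ent.label_]))
            last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(pseudonymize("Maria Andersson was admitted in Stockholm on 3 May 2021 with chest pain."))
```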
23. Moulin TC. Learning with AI Language Models: Guidelines for the Development and Scoring of Medical Questions for Higher Education. J Med Syst 2024;48:45. PMID: 38652327; DOI: 10.1007/s10916-024-02069-9. Letter. Cited in RCA: 0.
Abstract
In medical and biomedical education, traditional teaching methods often struggle to engage students and promote critical thinking. The use of AI language models has the potential to transform teaching and learning practices by offering an innovative, active learning approach that promotes intellectual curiosity and deeper understanding. To effectively integrate AI language models into biomedical education, it is essential for educators to understand the benefits and limitations of these tools and how they can be employed to achieve high-level learning outcomes.
This article explores the use of AI language models in biomedical education, focusing on their application in both classroom teaching and learning assignments. Using the SOLO taxonomy as a framework, I discuss strategies for designing questions that challenge students to exercise critical thinking and problem-solving skills, even when assisted by AI models. Additionally, I propose a scoring rubric for evaluating student performance when collaborating with AI language models, ensuring a comprehensive assessment of their learning outcomes.
AI language models offer a promising opportunity for enhancing student engagement and promoting active learning in the biomedical field. Understanding the potential use of these technologies allows educators to create learning experiences that are fit for their students' needs, encouraging intellectual curiosity and a deeper understanding of complex subjects. The application of these tools will be fundamental to provide more effective and engaging learning experiences for students in the future.
24. Hussain Z, Mata R, Wulff DU. Novel embeddings improve the prediction of risk perception. EPJ Data Sci 2024;13:38. PMID: 38799195; PMCID: PMC11111540; DOI: 10.1140/epjds/s13688-024-00478-x. Research article. Cited in RCA: 0.
Abstract
We assess whether the classic psychometric paradigm of risk perception can be improved or supplanted by novel approaches relying on language embeddings. To this end, we introduce the Basel Risk Norms, a large data set covering 1004 distinct sources of risk (e.g., vaccination, nuclear energy, artificial intelligence) and compare the psychometric paradigm against novel text and free-association embeddings in predicting risk perception. We find that an ensemble model combining text and free association rivals the predictive accuracy of the psychometric paradigm, captures additional affect and frequency-related dimensions of risk perception not accounted for by the classic approach, and has greater range of applicability to real-world text data, such as news headlines. Overall, our results establish the ensemble of text and free-association embeddings as a promising new tool for researchers and policymakers to track real-world risk perception.
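A simplified version of the embedding-based prediction pipeline (illustrative encoder, toy ratings; the cited work uses far larger norms and an ensemble with free-association embeddings):

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

risk_sources = ["nuclear energy", "vaccination", "artificial intelligence", "cycling without a helmet"]
perceived_risk = [0.71, 0.32, 0.55, 0.60]  # hypothetical mean ratings, illustration only

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # an arbitrary off-the-shelf text embedding model
X = encoder.encode(risk_sources)                   # one embedding vector per risk source
reg = Ridge(alpha=1.0).fit(X, perceived_risk)      # map embeddings to perceived-risk ratings
print(reg.predict(encoder.encode(["deep-sea diving"])))  # predicted rating for an unseen source
```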
25. Gorenstein L, Konen E, Green M, Klang E. Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications. J Am Coll Radiol 2024;21:914-941. PMID: 38302036; DOI: 10.1016/j.jacr.2024.01.012. Systematic review. Cited in RCA: 0.
Abstract
INTRODUCTION Bidirectional Encoder Representations from Transformers (BERT), introduced in 2018, has revolutionized natural language processing. Its bidirectional understanding of word context has enabled innovative applications, notably in radiology. This study aimed to assess BERT's influence and applications within the radiologic domain. METHODS Adhering to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a systematic review, searching PubMed for literature on BERT-based models and natural language processing in radiology from January 1, 2018, to February 12, 2023. The search encompassed keywords related to generative models, transformer architecture, and various imaging techniques. RESULTS Of 597 results, 30 met our inclusion criteria. The remaining were unrelated to radiology or did not use BERT-based models. The included studies were retrospective, with 14 published in 2022. The primary focus was on classification and information extraction from radiology reports, with x-rays as the prevalent imaging modality. Specific investigations included automatic CT protocol assignment and deep learning applications in chest x-ray interpretation. CONCLUSION This review underscores the primary application of BERT in radiology for report classification. It also reveals emerging BERT applications for protocol assignment and report generation. As BERT technology advances, we foresee further innovative applications. Its implementation in radiology holds potential for enhancing diagnostic precision, expediting report generation, and optimizing patient care.