1
Irfan B, Kuoppamäki S, Skantze G. Recommendations for designing conversational companion robots with older adults through foundation models. Front Robot AI 2024; 11:1363713. [PMID: 38860032 PMCID: PMC11163135 DOI: 10.3389/frobt.2024.1363713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 05/07/2024] [Indexed: 06/12/2024] Open
Abstract
Companion robots aim to mitigate loneliness and social isolation among older adults by providing social and emotional support in their everyday lives. However, older adults' expectations of conversational companionship might differ substantially from what current technologies can achieve, as well as from the expectations of other age groups such as young adults. Thus, it is crucial to involve older adults in the development of conversational companion robots to ensure that these devices align with their unique expectations and experiences. The recent advancement in foundation models, such as large language models, has taken a significant stride toward fulfilling those expectations, in contrast to the prior literature that relied on humans controlling robots (i.e., Wizard of Oz) or limited rule-based architectures that are not feasible to apply in the daily lives of older adults. Consequently, we conducted a participatory design (co-design) study with 28 older adults, demonstrating a companion robot using a large language model (LLM), and design scenarios that represent situations from everyday life. The thematic analysis of the discussions around these scenarios shows that older adults expect a conversational companion robot to engage in conversation actively in isolation and passively in social settings, remember previous conversations and personalize, protect privacy and provide control over learned data, give information and daily reminders, foster social skills and connections, and express empathy and emotions. Based on these findings, this article provides actionable recommendations for designing conversational companion robots for older adults with foundation models, such as LLMs and vision-language models, which can also be applied to conversational robots in other domains.
Affiliation(s)
- Bahar Irfan
  - Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
- Sanna Kuoppamäki
  - Division of Health Informatics and Logistics, KTH Royal Institute of Technology, Stockholm, Sweden
- Gabriel Skantze
  - Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
2
Ling Y, Cai F, Liu J, Chen H, de Rijke M. Keep and Select: Improving Hierarchical Context Modeling for Multi-Turn Response Generation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:3636-3649. [PMID: 34587100 DOI: 10.1109/tnnls.2021.3112700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Hierarchical context modeling plays an important role in the response generation for multi-turn conversational systems. Previous methods mainly model context as multiple independent utterances and rely on attention mechanisms to obtain the context representation. They tend to ignore the explicit responds-to relationships between adjacent utterances and the special role that the user's latest utterance (the query) plays in determining the success of a conversation. To deal with this, we propose a multi-turn response generation model named KS-CQ, which contains two crucial components, the Keep and the Select modules, to produce a neighbor-aware context representation and a context-enriched query representation. The Keep module recodes each utterance of context by attentively introducing semantics from its prior and posterior neighboring utterances. The Select module treats the context as background information and selectively uses it to enrich the query representing process. Extensive experiments on two benchmark multi-turn conversation datasets demonstrate the effectiveness of our proposal compared with the state-of-the-art baselines in terms of both automatic and human evaluations.
3
Shi T, Song Y. A Novel Two-Stage Generation Framework for Promoting the Persona-Consistency and Diversity of Responses in Neural Dialog Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:1552-1562. [PMID: 34460398 DOI: 10.1109/tnnls.2021.3105584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Although it is quite natural for human beings to communicate based on their own personality in daily life, it is rather challenging for neural dialog systems to do the same. This is because it is difficult for general dialog systems to generate diverse responses while at the same time maintaining consistent persona information. Existing methods basically focus on only one of these goals, yet ignoring either reduces the quality of the dialog. In this work, we propose a two-stage generation framework to promote the persona-consistency and diversity of responses. In the first stage, we propose a persona-guided conditional variational autoencoder (persona-guided CVAE) to generate diverse responses; the main difference from general CVAE-based models is that we use an additional dialog attribute to help the latent variables encode the effective information in the response and further use it as a guiding vector for response generation. In the second stage, we employ a persona-consistency checking module and a response rewriting module to mask the inconsistent words in the generated response prototype and rewrite it to be more consistent. Automatic evaluation results demonstrate that the proposed model is able to generate diverse and persona-consistent responses.
4
Peng W, Qin Z, Hu Y, Xie Y, Li Y. FADO: Feedback-Aware Double COntrolling Network for Emotional Support Conversation. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
5
Naous T, Bassyouni Z, Mousi B, Hajj H, Hajj WE, Shaban K. Open-Domain Response Generation in Low-Resource Settings using Self-Supervised Pre-training of Warm-Started Transformers. ACM T ASIAN LOW-RESO 2023. [DOI: 10.1145/3579164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Learning response generation models constitutes the main component of building open-domain dialogue systems. However, training open-domain response generation models requires large amounts of labeled data and pre-trained language generation models that are often nonexistent for low-resource languages. In this paper, we propose a framework for training open-domain response generation models in low-resource settings. We consider Dialectal Arabic (DA) as a working example. The framework starts by warm-starting a transformer-based encoder-decoder with pre-trained language model parameters. Next, the resultant encoder-decoder model is adapted to DA by employing self-supervised pre-training on large-scale unlabeled data in the desired dialect. Finally, the model is fine-tuned on a very small labeled dataset for open-domain response generation. The results show significant performance improvements on three spoken Arabic dialects after adopting the framework’s three stages, highlighted by higher BLEU and lower perplexity scores compared with multiple baseline models. Specifically, our models are capable of generating fluent responses in multiple dialects with an average human-evaluated fluency score above 4. Our data is made publicly available.
Affiliation(s)
- Tarek Naous
  - American University of Beirut, Qatar University
6
Clavel C, Labeau M, Cassell J. Socio-conversational systems: Three challenges at the crossroads of fields. Front Robot AI 2022; 9:937825. [PMID: 36591412 PMCID: PMC9797522 DOI: 10.3389/frobt.2022.937825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open
Abstract
Socio-conversational systems are dialogue systems, including what are sometimes referred to as chatbots, vocal assistants, social robots, and embodied conversational agents, that are capable of interacting with humans in a way that treats both the specifically social nature of the interaction and the content of a task. The aim of this paper is twofold: 1) to uncover some places where the compartmentalized nature of research conducted around socio-conversational systems creates problems for the field as a whole, and 2) to propose a way to overcome this compartmentalization and thus strengthen the capabilities of socio-conversational systems by defining common challenges. Specifically, we examine research carried out by the signal processing, natural language processing and dialogue, machine/deep learning, social/affective computing and social sciences communities. We focus on three major challenges for the development of effective socio-conversational systems, and describe ways to tackle them.
Affiliation(s)
- Chloé Clavel (corresponding author)
  - LTCI, Telecom-Paris, Institut Polytechnique de Paris, Paris, France
- Matthieu Labeau
  - LTCI, Telecom-Paris, Institut Polytechnique de Paris, Paris, France
- Justine Cassell
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States; Inria, Paris, France
7
Ma T, Zhang Z, Rong H, Al-Nabhan N. SPK-CG: Siamese Network based Posterior Knowledge Selection Model for Knowledge Driven Conversation Generation. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3569579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Building a human-computer conversational system that can communicate with humans is a research hotspot in the field of artificial intelligence. Traditional dialogue systems tend to produce irrelevant and uninformative responses, which reduce people’s interest in engaging in a conversation. This often leads to boring conversations. To alleviate this problem, many researchers use external knowledge to assist conversation generation. The accuracy of knowledge selection is the prerequisite to ensure the quality of knowledge conversation. This approach has worked positively to a certain extent, but generally only searches knowledge information based on entity words themselves, without considering the specific conversation context. Therefore, if irrelevant knowledge is retrieved, the quality of conversation generation will be reduced. Motivated by this, we propose a novel neural knowledge-based conversation generation model, named Siamese Network based Posterior Knowledge Selection Model for Knowledge Driven Conversation Generation (SPK-CG). We have designed a novel knowledge selection mechanism to obtain knowledge information that is highly relevant to the context of the conversation. Specifically, the posterior knowledge distribution is used as a soft label to make the prior distribution consistent with the posterior distribution in the training process. At the same time, in order to narrow the gap between prior and posterior distributions and improve the accuracy of knowledge selection, we leverage a Siamese network and design a multi-granularity matching module for knowledge selection. Compared with previous knowledge-based models, our method can select more appropriate knowledge and use the selected knowledge to generate responses that are more relevant to the conversation context. Extensive automatic and human evaluations demonstrate that our model has advantages over previous baselines.
Affiliation(s)
- Tinghuai Ma
  - Nanjing University of Information Science & Technology, China
- Zheng Zhang
  - Nanjing University of Information Science & Technology, China
- Huan Rong
  - Nanjing University of Information Science & Technology, China
8
Yan C, Bai J, Wang Y, Rong W, Ouyang Y, Xiong Z. Goal-oriented conditional variational autoencoders for proactive and knowledge-aware conversational recommender system. COMPUT SPEECH LANG 2022. [DOI: 10.1016/j.csl.2022.101468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
9
Frummet A, Elsweiler D, Ludwig B. “What Can I Cook with these Ingredients?” - Understanding Cooking-Related Information Needs in Conversational Search. ACM T INFORM SYST 2022. [DOI: 10.1145/3498330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/01/2022]
Abstract
As conversational search becomes more pervasive, it becomes increasingly important to understand the users’ underlying information needs when they converse with such systems in diverse domains. We conduct an in situ study to understand information needs arising in a home cooking context as well as how they are verbally communicated to an assistant. A human experimenter plays this role in our study. Based on the transcriptions of utterances, we derive a detailed hierarchical taxonomy of diverse information needs occurring in this context, which require different levels of assistance to be solved. The taxonomy shows that needs can be communicated through different linguistic means and require different amounts of context to be understood. In a second contribution, we perform classification experiments to determine the feasibility of predicting the type of information need a user has during a dialogue using the turn provided. For this multi-label classification problem, we achieve average F1 measures of 40% using BERT-based models. We demonstrate with examples which types of needs are difficult to predict and show why, concluding that models need to include more context information in order to improve both information need classification and assistance to make such systems usable.
10
Sequential or jumping: context-adaptive response generation for open-domain dialogue systems. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04067-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
11
Chen J, Zeng B, Du Z, Deng H, Xu M, Gan Z, Ding M. RFM: response-aware feedback mechanism for background based conversation. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04056-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
12
A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation. ACM T INFORM SYST 2022. [DOI: 10.1145/3522763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Recently, research on open-domain dialogue systems has attracted extensive interest from academic and industrial researchers. The goal of an open-domain dialogue system is to imitate humans in conversations. Previous work on single-turn conversation generation has greatly promoted the research of open-domain dialogue systems. However, understanding multiple single-turn conversations is not equal to understanding multi-turn dialogue, due to the coherent and context-dependent properties of human dialogue. Therefore, in open-domain multi-turn dialogue generation, it is essential to model the contextual semantics of the dialogue history rather than rely only on the last utterance. Previous research has verified the effectiveness of the hierarchical recurrent encoder-decoder framework on open-domain multi-turn dialogue generation. However, using an RNN-based model to hierarchically encode the utterances to obtain the representation of the dialogue history still faces the vanishing gradient problem. To address this issue, in this paper, we propose a static and dynamic attention-based approach to model the dialogue history and then generate open-domain multi-turn dialogue responses. Experimental results on the Ubuntu and OpenSubtitles datasets verify the effectiveness of the proposed static and dynamic attention-based approach on automatic and human evaluation metrics in various experimental settings. Meanwhile, we also empirically verify the performance of combining the static and dynamic attentions on open-domain multi-turn dialogue generation.
13
Ling Y, Cai F, Liu J, Chen H, de Rijke M. Generating Relevant and Informative Questions for Open-domain Conversations. ACM T INFORM SYST 2022. [DOI: 10.1145/3510612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Recent research has highlighted the importance of mixed-initiative interactions in conversational search. To enable mixed-initiative interactions, information retrieval systems should be able to ask diverse questions, such as information-seeking, clarification, and open-ended ones. Question generation (QG) in open-domain conversational systems aims at enhancing the interactiveness and persistence of human-machine interactions. The task is challenging because of the sparsity of QG-specific data in conversations. Current work is limited to single-turn interaction scenarios. We propose a context-enhanced neural question generation (CNQG) model that leverages the conversational context to predict question content and pattern, and then performs question decoding. A hierarchical encoder framework is employed to obtain the discourse-level context representation. Based on this, we propose Review and Transit mechanisms to respectively select contextual keywords and predict new topic words to further construct the question content. Conversational context and the predicted question content are used to produce the question pattern, which in turn guides the question decoding process implemented by a recurrent decoder with a joint attention mechanism. To fully utilize the limited QG-specific data to train our question generator, we perform multi-task learning with three auxiliary training objectives, i.e., question pattern prediction, Review, and Transit mechanisms. The required additional labeled data is obtained in a self-supervised way. We also design a weight decaying strategy to adjust the influences of various auxiliary learning tasks. To the best of our knowledge, we are the first to extend the application of QG to the multi-turn open-domain conversational scenario. Extensive experimental results demonstrate the effectiveness of our proposal and its main components on generating relevant and informative questions, with robust performance for contexts with various lengths.
Affiliation(s)
- Yanxiang Ling
  - Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, China and College of Information and Communication, National University of Defense Technology, China
- Fei Cai
  - Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, China
- Jun Liu
  - Department of Computer Science and Technology, Xi'an JiaoTong University, China
- Honghui Chen
  - Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, China
14
Li J, Liu C, Tao C, Chan Z, Zhao D, Zhang M, Yan R. Dialogue History Matters! Personalized Response Selection in Multi-Turn Retrieval-Based Chatbots. ACM T INFORM SYST 2021. [DOI: 10.1145/3453183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Existing multi-turn context-response matching methods mainly concentrate on obtaining multi-level and multi-dimension representations and better interactions between context utterances and response. However, in real-world conversation scenarios, whether a response candidate is suitable depends not only on the given dialogue context but also on other background information, e.g., wording habits and user-specific dialogue history content. To fill the gap between these up-to-date methods and real-world applications, we incorporate user-specific dialogue history into the response selection and propose a personalized hybrid matching network (PHMN). Our contributions are two-fold: (1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information; (2) we perform hybrid representation learning on context-response utterances and explicitly incorporate a customized attention mechanism to extract vital information from context-response interactions so as to improve the accuracy of matching. We evaluate our model on two large datasets with user identification, i.e., personalized Ubuntu dialogue Corpus (P-Ubuntu) and personalized Weibo dataset (P-Weibo). Experimental results confirm that our method significantly outperforms several strong models by combining personalized attention, wording behaviors, and hybrid representation learning.
Affiliation(s)
- Juntao Li
  - Wangxuan Institute of Computer Technology and Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University, Haidian Qu, Beijing Shi, China
- Chang Liu
  - Wangxuan Institute of Computer Technology and Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University, Haidian Qu, Beijing Shi, China
- Chongyang Tao
  - Wangxuan Institute of Computer Technology, Peking University, Haidian Qu, Beijing Shi, China
- Zhangming Chan
  - Wangxuan Institute of Computer Technology, Peking University, Haidian Qu, Beijing Shi, China
- Dongyan Zhao
  - Wangxuan Institute of Computer Technology, Peking University, Haidian Qu, Beijing Shi, China
- Min Zhang
  - Soochow University, Suzhou, Jiang Su, China
- Rui Yan
  - Gaoling School of Artificial Intelligence, Renmin University of China, Haidian Qu, Beijing Shi, China
15
Kopp S, Krämer N. Revisiting Human-Agent Communication: The Importance of Joint Co-construction and Understanding Mental States. Front Psychol 2021; 12:580955. [PMID: 33833705 PMCID: PMC8021865 DOI: 10.3389/fpsyg.2021.580955] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/07/2020] [Accepted: 02/10/2021] [Indexed: 11/29/2022] Open
Abstract
The study of human-human communication and the development of computational models for human-agent communication have diverged significantly throughout the last decade. Yet, despite frequently made claims of “super-human performance” in, e.g., speech recognition or image processing, so far, no system is able to lead a half-decent coherent conversation with a human. In this paper, we argue that we must start to re-consider the hallmarks of cooperative communication and the core capabilities that we have developed for it, and which conversational agents need to be equipped with: incremental joint co-construction and mentalizing. We base our argument on a vast body of work on human-human communication and its psychological processes that we reason to be relevant and necessary to take into account when modeling human-agent communication. We contrast those with current conceptualizations of human-agent interaction and formulate suggestions for the development of future systems.
Affiliation(s)
- Stefan Kopp
  - Social Cognitive Systems Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Nicole Krämer
  - Department of Social Psychology, Media and Communication, University of Duisburg-Essen, Duisburg, Germany
16
Abstract
Open-domain generative dialogue systems have attracted considerable attention over the past few years. Currently, how to automatically evaluate them is still a big challenge. As far as we know, there are three kinds of automatic evaluations for open-domain generative dialogue systems: (1) Word-overlap-based metrics; (2) Embedding-based metrics; (3) Learning-based metrics. Due to the lack of systematic comparison, it is not clear which kind of metrics is more effective. In this article, we first measure systematically all kinds of metrics to check which kind is best. Extensive experiments demonstrate that learning-based metrics are the most effective evaluation metrics for open-domain generative dialogue systems. Moreover, we observe that nearly all learning-based metrics depend on the negative sampling mechanism, which obtains extremely imbalanced and low-quality samples to train a score model. To address this issue, we propose a novel learning-based metric that significantly improves the correlation with human judgments by using augmented POsitive samples and valuable NEgative samples, called PONE. Extensive experiments demonstrate that PONE significantly outperforms the state-of-the-art learning-based evaluation method. Besides, we have publicly released the codes of our proposed metric and state-of-the-art baselines.
Affiliation(s)
- Tian Lan
  - Beijing Institute of Technology, Haidian, Beijing, China
- Xian-Ling Mao
  - Beijing Institute of Technology, Haidian, Beijing, China
- Wei Wei
  - Huazhong University of Science and Technology, Wuhan, China
- Xiaoyan Gao
  - Beijing Institute of Technology, Haidian, Beijing, China
- Heyan Huang
  - Beijing Institute of Technology, Haidian, Beijing, China