1.1. Overview
Explain components, properties, scopes, techniques, and assessments of dialogue systems.
Components
Genre
Conversation: interactive communication between two or more people.
Dialogue: a conversation, often between two people, with a specific goal in mind.
Dialog: a window that appears on a screen in computing contexts (e.g., dialog box).
Application
Dialogue System: a computer system that interacts with humans in natural language.
Conversational Agent: a dialogue system that interprets and responds to user statements.
Virtual Assistant: a dialogue system that performs tasks or services for user requests.
Chatbot: a dialogue system that simulates and processes human conversation.
Chatbots are typically understood to follow pre-defined dialogue flows for open-domain conversations without using sophisticated artificial intelligence technology.
Intelligence
Dialogue Management: a process of controlling the state and flow of the dialogue to conduct contextual communications.
Conversational AI: a type of Artificial Intelligence (AI) that enables a dialogue system to understand user inputs and respond to them appropriately, often by using machine learning models.
What are examples of dialogue systems currently used in practical applications?
Are there applications that would greatly benefit from adopting dialogue systems?
Properties
Unit
Turn: a single contribution from one speaker to the dialogue.
Utterance: a natural unit of speech bounded by breaths or pauses.
For a text-based conversation, each turn is often considered an utterance.
Context
Speech Act: the action, either explicitly or implicitly, expressed by an utterance (e.g., answering, advising, greeting; see Switchboard Dialog Act Corpus).
Intent: the user's goal expressed by an utterance within the context of a conversation (e.g., making an appointment, requesting information).
Topic: the matter dealt with in an utterance (e.g., movie, family, midterm).
It is possible that one utterance expresses multiple speech acts and intents and also deals with various topics.
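The units and context labels defined above can be pictured as a small data structure. The following is a minimal sketch, not a standard annotation scheme: the class names Turn and Utterance, their fields, and the label strings are all hypothetical and chosen only for illustration (they are not the Switchboard tag set). It shows one utterance carrying multiple speech acts at once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    """One utterance with the context labels described above (hypothetical fields)."""
    text: str
    speech_acts: List[str] = field(default_factory=list)
    intents: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)


@dataclass
class Turn:
    """A single contribution from one speaker; it may hold several utterances."""
    speaker: str
    utterances: List[Utterance] = field(default_factory=list)


# Illustrative example: a single utterance that both thanks and requests.
turn = Turn(
    speaker="user",
    utterances=[
        Utterance(
            text="Thanks, and can you book the 7 pm showing of the movie?",
            speech_acts=["thanking", "request"],
            intents=["book_ticket"],
            topics=["movie"],
        )
    ],
)
```

Keeping the labels as lists rather than single values reflects the point that one utterance may express several speech acts, intents, and topics.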
Classify each of the following utterances from Friends S1E1 using the dialogue acts: http://compprag.christopherpotts.net/swda.html
Ross: Hi.
Joey: This guy says hello, I wanna kill myself.
Monica: Are you okay, sweetie?
Ross: I just feel like someone reached down my throat, grabbed my small intestine, pulled it out of my mouth and tied it around my neck...
Chandler: Cookie?
Monica: Carol moved her stuff out today.
Joey: Ohh.
Monica: Let me get you some coffee.
Ross: Thanks.
Scopes
Task-oriented
Task-oriented dialogue systems have specific tasks to be accomplished:
The Second Dialog State Tracking Challenge, Henderson et al., SIGDIAL, 2014 (dataset).
Conditional Generation and Snapshot Learning in Neural Dialogue Systems, Wen et al., EMNLP, 2016 (dataset).
Learning End-to-End Goal-Oriented Dialog, Bordes et al., ICLR, 2017 (dataset).
Key-Value Retrieval Networks for Task-Oriented Dialogue, Eric et al., SIGDIAL, 2017 (dataset).
MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling, Budzianowski et al., EMNLP, 2018 (dataset).
Entity-Consistent End-to-end Task-Oriented Dialogue System with KB Retriever, Qin et al., EMNLP, 2019 (dataset).
Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset, Rastogi et al., AAAI, 2020 (dataset).
Open-domain
Open-domain dialogue systems aim to talk about any topic without a specific end goal:
OpenAI ChatGPT (demo; requires login)
What kind of tasks are presented in the above task-oriented datasets?
Try the demos of BlenderBot and ChatGPT. What are their limitations?
What are the challenges in building task-oriented vs. open-domain dialogue systems?
Techniques
State Machine
A dialogue flow can be designed as a finite-state machine. Most commercial dialogue systems take this approach because it gives greater control over how the systems behave. Several platforms are available to facilitate the development of state machine-based dialogue systems.
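To make the state machine idea concrete, here is a minimal sketch in plain Python. It does not use any of the platforms mentioned and is not the framework adopted later in the course; the coffee-ordering flow, the FLOW table, and the functions next_state and run are hypothetical names invented for this illustration. Each state holds a system prompt and a set of keyword-triggered transitions.

```python
# Hypothetical coffee-ordering flow.
# Each state maps to (system prompt, {keyword in user input: next state}).
FLOW = {
    "start": ("Hi! Would you like coffee or tea?", {"coffee": "size", "tea": "size"}),
    "size": ("Small or large?", {"small": "done", "large": "done"}),
    "done": ("Your order is placed. Goodbye!", {}),
}


def next_state(state, user_input):
    """Return the next state, or stay in the current one if no keyword matches."""
    _, transitions = FLOW[state]
    for keyword, target in transitions.items():
        if keyword in user_input.lower():
            return target
    return state


def run(user_inputs):
    """Simulate a dialogue and return the system prompts in order."""
    state = "start"
    prompts = [FLOW[state][0]]
    for user_input in user_inputs:
        state = next_state(state, user_input)
        prompts.append(FLOW[state][0])
        if not FLOW[state][1]:  # terminal state: no outgoing transitions
            break
    return prompts


for prompt in run(["Coffee, please", "Large"]):
    print(prompt)
```

Because every transition is written out explicitly, the designer can see and control each path the conversation may take. The cost is that any input not anticipated by a transition simply repeats the current prompt.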
End-to-End
Recent research focuses on developing end-to-end dialogue systems using sequence-to-sequence (S2S) models, which are a type of encoder-decoder model:
Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS, 2014.
The current state-of-the-art S2S models use transformers such as BERT as their encoders:
Attention is All you Need, Vaswani et al., NeurIPS, 2017.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL, 2019.
Three of the open-domain dialogue systems above, Meta BlenderBot, OpenAI ChatGPT, and Google LaMDA, are end-to-end systems based on S2S models.
Implementing an end-to-end system is beyond the scope of this course. Thus, we will use the state machine approach to develop dialogue systems, starting from Chapter 2.
Assessments
The primary objective of both task-oriented and open-domain dialogue systems is to satisfy users by communicating with them. For task-oriented systems, users are generally satisfied if their tasks are accomplished efficiently. For open-domain systems, however, user satisfaction is often highly subjective, so a proper conversational analysis may be needed.
Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols, Finch and Choi, SIGDIAL, 2020.
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems, Finch et al., arXiv, 2022.
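For the task-oriented case described above, one rough way to quantify "accomplished efficiently" is to measure how often the task succeeds and how many turns successful dialogues take. The sketch below is only an illustration under that assumption and is not the evaluation protocol of the papers listed; the function task_metrics, the log format, and the numbers in logs are hypothetical.

```python
def task_metrics(dialogues):
    """Return (success rate, average number of turns over successful dialogues)."""
    successes = [d for d in dialogues if d["success"]]
    success_rate = len(successes) / len(dialogues) if dialogues else 0.0
    avg_turns = (
        sum(d["turns"] for d in successes) / len(successes) if successes else 0.0
    )
    return success_rate, avg_turns


# Hypothetical dialogue logs: whether the task was completed and in how many turns.
logs = [
    {"success": True, "turns": 6},
    {"success": True, "turns": 10},
    {"success": False, "turns": 14},
    {"success": True, "turns": 8},
]

rate, avg = task_metrics(logs)
print(f"success rate: {rate:.2f}, average turns: {avg:.1f}")
```

Measures of this kind do not carry over directly to open-domain systems, where there is no task to complete and satisfaction has to be judged in other ways.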