1.1. Overview

Explain components, properties, scopes, techniques, and assessments of dialogue systems.

Components

Genre

Conversation: interactive communication between two or more people.
Dialogue: a conversation, often between two people, with a specific goal in mind.

Dialog: a window that appears on a screen in computing contexts (e.g., dialog box).

Application

Dialogue System: a computer system that interacts with humans in natural language.
Conversational Agent: a dialogue system that interprets and responds to user statements.
Virtual Assistant: a dialogue system that performs tasks or services for user requests.
Chatbot: a dialogue system that simulates and processes human conversation.

Chatbots are typically understood to follow pre-defined dialogue flows for open-domain conversations without using sophisticated artificial intelligence technology.

Intelligence

Dialogue Management: a process of controlling the state and flow of the dialogue to conduct contextual communications.
Conversational AI: a type of Artificial Intelligence (AI) for a dialogue system to understand user inputs and respond properly to them, often processed by machine learning models.

What are examples of dialogue systems currently used in practical applications?
Are there applications that would greatly benefit from adopting dialogue systems?

Properties

Unit

Turn: a single contribution from one speaker to the dialogue.
Utterance: a natural unit of speech bounded by breaths or pauses.

For a text-based conversation, each turn is often considered an utterance.

Context

Speech Act: the action expressed, either explicitly or implicitly, by an utterance (e.g., answering, advising, greeting; see Switchboard Dialog Act Corpus).
Intent: the user's goal expressed by an utterance within the context of a conversation (e.g., making an appointment, requesting information).
Topic: the matter dealt with in an utterance (e.g., movie, family, midterm).

It is possible that one utterance expresses multiple speech acts and intents and also deals with various topics.
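As a rough illustration of this point, the sketch below stores one utterance with several speech acts, intents, and topics at once. The `Utterance` class, its field names, and the labels are assumptions made for this example; they are not taken from the Switchboard Dialog Act Corpus or from any particular toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One utterance with (possibly multiple) context labels."""
    speaker: str
    text: str
    speech_acts: list[str] = field(default_factory=list)
    intents: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)


# Hypothetical annotation: the labels are illustrative, not corpus tags.
utterance = Utterance(
    speaker="user",
    text="Thanks for the movie tip! Can you also book a table for two tonight?",
    speech_acts=["thanking", "requesting"],
    intents=["making a reservation"],
    topics=["movie", "restaurant"],
)

print(len(utterance.speech_acts), len(utterance.topics))  # prints: 2 2
```

Keeping each label type as a list, rather than a single value, is what lets one utterance carry more than one speech act or topic.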

Classify each of the following utterances from Friends S1E1 using the dialogue acts: http://compprag.christopherpotts.net/swda.html

Ross: Hi.
Joey: This guy says hello, I wanna kill myself.
Monica: Are you okay, sweetie?
Ross: I just feel like someone reached down my throat, grabbed my small intestine, pulled it out of my mouth and tied it around my neck...
Chandler: Cookie?
Monica: Carol moved her stuff out today.
Joey: Ohh.
Monica: Let me get you some coffee.
Ross: Thanks.

Scopes

Task-oriented

Task-oriented dialogue systems have specific tasks to be accomplished:

The Second Dialog State Tracking Challenge, Henderson et al., SIGDIAL, 2014 (dataset).
Conditional Generation and Snapshot Learning in Neural Dialogue Systems, Wen et al., EMNLP, 2016 (dataset).
Learning End-to-End Goal-Oriented Dialog, Bordes et al., ICLR, 2017 (dataset).
Key-Value Retrieval Networks for Task-Oriented Dialogue, Eric et al., SIGDIAL, 2017 (dataset).
MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling, Budzianowski et al., EMNLP, 2018 (dataset).
Entity-Consistent End-to-end Task-Oriented Dialogue System with KB Retriever, Qin et al., EMNLP, 2019 (dataset).
Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset, Rastogi et al., AAAI, 2020 (dataset).

Open-domain

Open-domain dialogue systems aim to talk about any topic without specific end goals:

Alexa Prize Socialbot Grand Challenge (Emora demo)
Meta BlenderBot (demo)
OpenAI ChatGPT (demo; requires login)
Google LaMDA (article; interview)

What kind of tasks are presented in the above task-oriented datasets?
Try the demos of BlenderBot and ChatGPT. What are their limitations?
What are the challenges in building task-oriented vs. open-domain dialogue systems?

Techniques

State Machine

A dialogue flow can be designed as a finite-state machine. Most commercial dialogue systems take this approach because it gives greater control over how the systems behave. Several platforms are available to facilitate the development of state machine-based dialogue systems.
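To make the state machine idea concrete, here is a minimal sketch in plain Python that does not depend on any particular platform. The coffee-ordering flow, the state names, and the keyword matching are all assumptions made for illustration; real systems typically use far richer input matching than a keyword lookup.

```python
# Hypothetical ordering flow: each state has a system prompt and
# keyword-triggered transitions. State names and keywords are made up.
STATES = {
    "start": {
        "prompt": "Hi! Would you like coffee or tea?",
        "transitions": {"coffee": "size", "tea": "size"},
    },
    "size": {
        "prompt": "Small or large?",
        "transitions": {"small": "end", "large": "end"},
    },
    "end": {"prompt": "Your order is placed. Bye!", "transitions": {}},
}


def next_state(state: str, user_input: str) -> str:
    """Return the next state; stay in the current state if nothing matches."""
    words = user_input.lower().split()
    for keyword, target in STATES[state]["transitions"].items():
        if keyword in words:
            return target
    return state


def run(user_inputs: list[str]) -> list[str]:
    """Simulate a dialogue and return the sequence of system prompts."""
    state = "start"
    prompts = [STATES[state]["prompt"]]
    for user_input in user_inputs:
        if not STATES[state]["transitions"]:  # terminal state reached
            break
        state = next_state(state, user_input)
        prompts.append(STATES[state]["prompt"])
    return prompts


for prompt in run(["A coffee please", "hmm", "large"]):
    print(prompt)
```

In this run the unmatched input "hmm" keeps the machine in the `size` state, so the system repeats its question. This shows the control the approach offers: the designer decides exactly what happens for every input in every state.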

End-to-End

Recent research focuses on developing end-to-end dialogue systems using sequence-to-sequence (S2S) models, which are a type of encoder-decoder model:

Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS, 2014.

The current state-of-the-art S2S models use transformers such as BERT as their encoders:

Attention is All you Need, Vaswani et al., NeurIPS, 2017.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL, 2019.

Three of the open-domain dialogue systems above, Meta BlenderBot, OpenAI ChatGPT, and Google LaMDA, are end-to-end systems based on S2S models.

Implementing an end-to-end system is beyond the scope of this course. Thus, we will use the state machine approach to develop dialogue systems, starting from Chapter 2.

Assessments

The primary objective of both task-oriented and open-domain dialogue systems is to satisfy users by communicating with them. For task-oriented systems, users are generally satisfied if their tasks are accomplished efficiently. For open-domain systems, however, user satisfaction is highly subjective, so careful conversational analysis is often needed to assess it.

Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols, Finch and Choi, SIGDIAL, 2020.
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges, Mehri et al., arXiv, 2022.
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems, Finch et al., arXiv, 2022.
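For task-oriented systems, the idea that tasks should be "accomplished efficiently" could be quantified in many ways. The sketch below shows one simple possibility: the task success rate and the average number of turns in successful dialogues. The `DialogueLog` record and both function names are hypothetical, the logs are made-up numbers, and these are not necessarily the measures used in the papers listed above.

```python
from dataclasses import dataclass


@dataclass
class DialogueLog:
    """Hypothetical record of one finished task-oriented dialogue."""
    num_turns: int
    task_completed: bool


def success_rate(logs: list[DialogueLog]) -> float:
    """Fraction of dialogues in which the task was accomplished."""
    return sum(log.task_completed for log in logs) / len(logs)


def average_turns_when_successful(logs: list[DialogueLog]) -> float:
    """Mean number of turns over successful dialogues (fewer is more efficient)."""
    successful = [log.num_turns for log in logs if log.task_completed]
    return sum(successful) / len(successful)


# Made-up logs for illustration; both functions assume non-empty input.
logs = [
    DialogueLog(6, True),
    DialogueLog(10, True),
    DialogueLog(12, False),
    DialogueLog(8, True),
]
print(success_rate(logs))                   # 0.75
print(average_turns_when_successful(logs))  # 8.0
```

Measures like these do not carry over directly to open-domain systems, where there is no single task whose completion can be checked.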
