Chatbot Overview

Generally, there are two kinds of dialogue systems: task-oriented dialogue agents, which help users complete tasks, and chatbots, which are designed for open-ended conversation.

Chatbot

Some Basic Concepts

A dialogue is a sequence of turns, each a single contribution to the dialogue. Some related concepts:

  • Endpointing (endpoint detection): a spoken dialogue system must detect when the user has finished speaking, so that it can process the utterance and respond.
  • Grounding: the hearer acknowledges that they have understood the speaker.
  • Initiative: which participant is driving the conversation. Sometimes a conversation is completely controlled by one participant.
  • Implicature: a particular class of licensed inferences, where the speaker communicates more information than seems to be present in the uttered words.

Rule-based Systems

The most famous rule-based system is ELIZA. It uses a large set of hand-written rules: it extracts keywords from the user’s sentence, then applies the transform rule associated with the matched pattern to generate the final response. The trick behind ELIZA is that it imitates a Rogerian psychotherapist, drawing the patient out by reflecting the patient’s statements back at them.

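For example, one pattern is (0 YOU 0 ME), where 0 matches any sequence of words,
and its transform rule is (WHAT MAKES YOU THINK I 3 YOU), where 3 refers to the third component of the matched pattern (the words between YOU and ME).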

So, given the sentence, You hate me,
ELIZA would apply this rule and generate WHAT MAKES YOU THINK I HATE YOU
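
The rule above can be sketched with a regular expression. This is only a minimal illustration, not how ELIZA itself was implemented: the function name `eliza_respond` and the regex encoding of the pattern are assumptions made for this example.

```python
import re
from typing import Optional

# Assumed regex encoding of the pattern (0 YOU 0 ME):
# component 1 = any words, 2 = YOU, 3 = any words, 4 = ME.
# The regex captures components 1 and 3 as groups 1 and 2.
RULE = re.compile(r"^\s*(.*?)\s*\bYOU\b\s*(.*?)\s*\bME\b\s*$", re.IGNORECASE)


def eliza_respond(sentence: str) -> Optional[str]:
    """Apply the single transform rule; return None if the pattern does not match."""
    # Drop punctuation so that "You hate me." matches as well.
    cleaned = re.sub(r"[^\w\s]", "", sentence)
    match = RULE.match(cleaned)
    if match is None:
        return None
    # "3" in the transform rule is the third pattern component,
    # i.e. the words between YOU and ME (regex group 2).
    third_component = match.group(2).upper()
    return f"WHAT MAKES YOU THINK I {third_component} YOU"


print(eliza_respond("You hate me"))
```

A real rule-based system would hold many such pattern/transform pairs and rank them by keyword; this sketch covers only the one rule from the example.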

Information Retrieval Systems

Systems of this kind mine large datasets of human-human conversations and use information retrieval to find a suitable response in them.

The two simplest methods are:

1. Return the response to the most similar turn. This yields a response that answers the most similar question.
2. Return the most similar turn. This yields a response that is very similar to the question itself.

In our corpus, we have a conversation:
- Do you like Taylor?
- Yes! I love her songs!

And given a sentence, Do you like Lady Gaga?
The first approach would return Yes! I love her songs!
While the second one returns Do you like Taylor?
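
Both approaches can be sketched in a few lines. The similarity measure here (cosine over bag-of-words counts) is just one simple choice assumed for illustration, and the second corpus entry and all function names are hypothetical. For brevity the sketch only compares the query against the first turn of each pair.

```python
import math
import re
from collections import Counter

# Toy corpus of (turn, response) pairs. The first pair is the example above;
# the second is a made-up distractor.
CORPUS = [
    ("Do you like Taylor?", "Yes! I love her songs!"),
    ("What is the weather today?", "It is sunny."),
]


def bag_of_words(text):
    """Lowercased word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar_pair(query):
    """Find the corpus pair whose turn is most similar to the query."""
    q = bag_of_words(query)
    return max(CORPUS, key=lambda pair: cosine(q, bag_of_words(pair[0])))


def respond_with_response(query):
    """Approach 1: return the response to the most similar turn."""
    return most_similar_pair(query)[1]


def respond_with_turn(query):
    """Approach 2: return the most similar turn itself."""
    return most_similar_pair(query)[0]


print(respond_with_response("Do you like Lady Gaga?"))
print(respond_with_turn("Do you like Lady Gaga?"))
```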

Encoder-decoder models

The idea behind these systems comes from phrase-based machine translation: the user turn is “translated” into a system response. In practice, it is a seq2seq model trained on a large conversation dataset: given one turn, it predicts the turn that follows it.
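
As a rough formalization, a decoder of this kind typically picks each response token by conditioning on the user turn and on the part of the response generated so far. The symbols are notation assumed here, not taken from the text above: $q$ is the encoded user turn, $r_1, \dots, r_{t-1}$ are the response tokens generated so far, and $V$ is the vocabulary.

$$ \hat{r}_t = \operatorname{argmax}_{w \in V} P(w \mid q, r_1, \dots, r_{t-1}) $$

Greedy argmax decoding is only one option; sampling and beam search are common alternatives.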

Task-oriented agents

GUS Architecture

The architecture is built around frames. A frame is a knowledge structure representing the kinds of intentions the system can extract from user sentences; it consists of a collection of slots, each of which can take a set of possible values. The set of frames makes up the domain ontology.

The control architecture’s goal is to fill the slots in the frame with the fillers the user intends, and then perform the relevant action for the user. To do this, the system asks questions of the user, filling any slot that the user specifies.

This involves several Natural Language Understanding tasks:

  • Domain classification
  • Intent determination
  • Slot filling
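
The slot-filling control loop described above can be sketched as follows. The flight-booking frame, its slot names and questions, and the function names are all hypothetical; the sketch also assumes that a separate NLU component has already turned the user's sentence into a slot-to-value mapping.

```python
from typing import Dict, Optional

# Hypothetical frame definition: slot name -> question the system asks to fill it.
FLIGHT_FRAME_QUESTIONS = {
    "origin": "What city are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day would you like to leave?",
}


def new_frame() -> Dict[str, Optional[str]]:
    """Create a frame with every slot empty."""
    return {slot: None for slot in FLIGHT_FRAME_QUESTIONS}


def update_frame(frame: Dict[str, Optional[str]], extracted: Dict[str, str]) -> None:
    """Fill any slot the user specified; ignore slots that are not in the frame."""
    for slot, value in extracted.items():
        if slot in frame:
            frame[slot] = value


def next_question(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the question for the first unfilled slot, or None if the frame is complete."""
    for slot, question in FLIGHT_FRAME_QUESTIONS.items():
        if frame[slot] is None:
            return question
    return None  # All slots filled: the system can now perform the action.


frame = new_frame()
# Assumed NLU output for a sentence like "I want to fly to Boston on Friday".
update_frame(frame, {"destination": "Boston", "date": "Friday"})
print(next_question(frame))
```

Because the user filled two slots in one sentence, the system only needs to ask about the remaining one.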

Evaluation

It’s hard to evaluate chatbot responses with automatic metrics like BLEU, because in open-domain dialogue many different responses can be acceptable for the same turn; human scoring is still the best approach.

For task-oriented agents, we can use the Slot Error Rate:

$$ \text{Slot Error Rate} = \frac{\text{number of inserted/deleted/substituted slots}}{\text{number of total reference slots for the sentence}} $$
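
A minimal sketch of this computation, under the simplifying assumption that the reference and the system output are each a mapping from slot name to value (the slot names and values below are made up):

```python
def slot_error_rate(reference: dict, hypothesis: dict) -> float:
    """(inserted + deleted + substituted slots) / number of reference slots."""
    if not reference:
        raise ValueError("reference must contain at least one slot")
    # Inserted: slots the system produced that are not in the reference.
    inserted = sum(1 for slot in hypothesis if slot not in reference)
    # Deleted: reference slots the system missed.
    deleted = sum(1 for slot in reference if slot not in hypothesis)
    # Substituted: slots present in both, but with a different value.
    substituted = sum(
        1
        for slot in reference
        if slot in hypothesis and hypothesis[slot] != reference[slot]
    )
    return (inserted + deleted + substituted) / len(reference)


reference = {"origin": "Boston", "destination": "Seattle", "date": "Friday", "time": "morning"}
hypothesis = {"origin": "Boston", "destination": "Seattle", "date": "Monday"}
print(slot_error_rate(reference, hypothesis))  # 1 deleted + 1 substituted over 4 slots = 0.5
```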

Chuanrong Li
