A dialogue manager (DM) is the central component of a spoken dialogue system (illustrated below) and is responsible for the state, as well as the flow, of the conversation. Though there are many types of DM that fulfil different roles, every DM has one thing in common: it is stateful.
The DM accepts input from the automatic speech recognition (ASR) and natural language understanding components, interacts with external resources and knowledge bases, produces the output message, and controls the general flow of the dialogue.
The DM's input originates as a human utterance, which the natural language understanding (NLU) component typically converts into a system-specific semantic representation. For example, in a system for booking flight tickets, an input might resemble the structure ORDER(from=TLV, to=JER, date=2021-01-02).
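To make the idea of a semantic representation concrete, the following Python sketch parses the flat ORDER(...) notation above into a small data structure. The `DialogueAct` class, the `parse_act` function and the assumption that acts are written as `NAME(slot=value, ...)` with values that contain no commas are all hypothetical; real NLU components define their own output formats.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    """Hypothetical semantic frame: an act type plus slot-value pairs."""

    act: str
    slots: dict[str, str] = field(default_factory=dict)


# Matches strings such as "ORDER(from=TLV, to=JER, date=2021-01-02)".
ACT_PATTERN = re.compile(r"^\s*([A-Z_]+)\s*\((.*)\)\s*$")


def parse_act(text: str) -> DialogueAct:
    """Parse a flat ACT(slot=value, ...) string into a DialogueAct."""
    match = ACT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a dialogue act: {text!r}")
    act, body = match.groups()
    slots = {}
    for pair in body.split(","):
        if not pair.strip():
            continue  # tolerate empty argument lists such as "HELLO()"
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed slot: {pair!r}")
        slots[key.strip()] = value.strip()
    return DialogueAct(act, slots)


act = parse_act("ORDER(from=TLV, to=JER, date=2021-01-02)")
print(act)
# DialogueAct(act='ORDER', slots={'from': 'TLV', 'to': 'JER', 'date': '2021-01-02'})
```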
The DM's output is usually a list of instructions for other parts of the dialogue system to carry out, in particular the natural language generation (NLG) component. These instructions are typically expressed semantically, for example OFFER(flight-num=422, flight-time=13:00), and the NLG component then converts them back into human language.
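On the output side, the simplest kind of NLG component fills a hand-written template for each act type. The sketch below assumes a hypothetical `TEMPLATES` table and `realize` function, keyed on the OFFER act from the example; production systems may instead use richer grammars or trained generators.

```python
import re

# Hypothetical surface templates keyed by act type; each {slot} placeholder
# is replaced with the matching slot value from the semantic instruction.
TEMPLATES = {
    "OFFER": "I can offer flight {flight-num}, departing at {flight-time}.",
}

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def realize(act: str, slots: dict[str, str]) -> str:
    """Turn a semantic act into text by filling the act's template."""
    template = TEMPLATES[act]

    def fill(match: re.Match) -> str:
        name = match.group(1)
        if name not in slots:
            raise KeyError(f"template for {act} needs slot {name!r}")
        return slots[name]

    return PLACEHOLDER.sub(fill, template)


print(realize("OFFER", {"flight-num": "422", "flight-time": "13:00"}))
# I can offer flight 422, departing at 13:00.
```

Raising an error on a missing slot, rather than leaving the placeholder in the text, keeps a malformed instruction from reaching the user unnoticed.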
A dialogue management process is typically viewed through two main tasks: dialogue state tracking and dialogue control.
Nisar Shah (2018) Introduction to Dialogue Systems (Part 1)
Dialogue state tracking provides the information that dialogue control will need to take action. It gathers information and may include a record of what has been said throughout the dialogue (such as the entities being discussed). Another type of information is the task record (also called a form, frame or template), which tells the system what information it has yet to gather. This record often consists of slots that are filled with values obtained over the course of the conversation.
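A minimal sketch of a state tracker for the flight-booking example, assuming a hypothetical frame with `from`, `to` and `date` slots. The `DialogueState` class keeps both kinds of information described above: a history of the acts seen so far and the task record, whose empty slots tell the system what it still has to gather.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical task record for the flight-booking example.
REQUIRED_SLOTS = ("from", "to", "date")


@dataclass
class DialogueState:
    """Tracked state: the task record (frame) plus a history of user acts."""

    # Every slot starts empty (None) until a user turn supplies a value.
    frame: dict[str, Optional[str]] = field(
        default_factory=lambda: {slot: None for slot in REQUIRED_SLOTS}
    )
    # Each entry is (act type, slot values) as produced by the NLU component.
    history: list[tuple[str, dict[str, str]]] = field(default_factory=list)

    def update(self, act: str, slots: dict[str, str]) -> None:
        """Record the turn and copy any values for known slots into the frame."""
        self.history.append((act, dict(slots)))
        for name, value in slots.items():
            if name in self.frame:
                self.frame[name] = value  # a later value overwrites an earlier one

    def missing_slots(self) -> list[str]:
        """Slots that still have to be gathered, in frame order."""
        return [name for name, value in self.frame.items() if value is None]


state = DialogueState()
state.update("ORDER", {"from": "TLV", "to": "JER"})
print(state.missing_slots())  # ['date']
state.update("INFORM", {"date": "2021-01-02"})  # INFORM is a hypothetical act name
print(state.missing_slots())  # []
```

Overwriting a slot with its latest value is one simple way to handle a user who changes their mind; other trackers keep several candidate values with scores.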
Dialogue control, the second task, involves deciding what to do next in the current dialogue state, once the system has all of the information gathered by state tracking.
Possible decisions include prompting the user for more input, clarifying earlier input, or outputting information.
How does the system know which decision to take? The choice can be predefined, based on metrics such as confidence levels associated with the quality of the input. For example, if the confidence level reaches a set threshold, the system can treat the input as understood and move on to the next step. If the confidence is low, however, it must work on interpreting the input, which could involve asking the user to repeat an utterance.
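The decisions above can be combined into a small rule-based control function. This is a sketch under stated assumptions: the threshold of 0.7, the act names `REQUEST_REPEAT`, `REQUEST` and `OFFER`, and the `search` callable standing in for an external flight database are all hypothetical. Low confidence triggers a request to repeat, an incomplete frame triggers a prompt for the next missing slot, and a complete frame leads to outputting information.

```python
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value; real systems tune this
REQUIRED_SLOTS = ("from", "to", "date")  # hypothetical flight-booking frame

Act = tuple[str, dict[str, str]]


def next_action(
    frame: dict[str, Optional[str]],
    confidence: float,
    search: Callable[[dict[str, str]], dict[str, str]],
) -> Act:
    """Pick the next system act from the tracked frame and the NLU confidence."""
    if confidence < CONFIDENCE_THRESHOLD:
        # Input not reliably understood: clarify by asking the user to repeat.
        return ("REQUEST_REPEAT", {})
    missing = [slot for slot in REQUIRED_SLOTS if frame.get(slot) is None]
    if missing:
        # Input understood but the task record is incomplete: prompt for a slot.
        return ("REQUEST", {"slot": missing[0]})
    # Everything gathered: consult an external resource and output the result.
    filled = {slot: frame[slot] for slot in REQUIRED_SLOTS}
    return ("OFFER", search(filled))


def fake_search(query: dict[str, str]) -> dict[str, str]:
    """Stub in place of a flight database; returns the OFFER values from the text."""
    return {"flight-num": "422", "flight-time": "13:00"}


print(next_action({"from": "TLV", "to": "JER", "date": None}, 0.4, fake_search))
# ('REQUEST_REPEAT', {})
print(next_action({"from": "TLV", "to": "JER", "date": None}, 0.9, fake_search))
# ('REQUEST', {'slot': 'date'})
print(next_action({"from": "TLV", "to": "JER", "date": "2021-01-02"}, 0.9, fake_search))
# ('OFFER', {'flight-num': '422', 'flight-time': '13:00'})
```

Checking confidence first means the system does not act on input it has not understood. Hand-written rules like these are one option; many systems learn the control policy from data instead.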