Given a language model simply represented by P(Y|X), there’re several methods we can use to generate a sentence.
Sampling & Argmax
Sampling: generate a random sentence according to the probility distribution
Argmax: generate the sentence with the highest probility
Greedy Search
Greedy Search selects the most likely word at each step in the output sequence, that is, pick the single highest-probility word one by one. $$y_i = argmax P(y_i | X, y_1, y_2, …, y_{i-1})$$ The benefit is its speed, it can generate sentences vary fast, but the drawback is also obvious, it will choose easy and common word first because they’re more frequent.
Beam Search
The local beam search algorithm keeps track of k states rather than just one. It begins with k randomly generated states. At each step, all the successors of all k states are generated. If any one is a goal, the algorithm halts. Otherwise, it selects the k best successors from the complete list and repeats. *Here, k = 2 and we keep 2 most possible candidates at every time step
Basically, beam search is a heuristic search algorithm, and it can use breadth-first search to build its search tree.
For the traditional beam search, the k is fixed and set at the beginning. While there are several optimization strategies for beam search. For example, the size of candidate may vary at each time step depending on the candidate scores[1], and we can add additional penalizing term[2].
References
CMU CS 11-747
How to Implement a Beam Search Decoder for Natural Language Processing
Artificial Intelligence: A Modern Approach (3rd Edition)
[1]Beam Search Strategies for Neural Machine Translation
[2]A Simple, Fast Diverse Decoding Algorithm for Neural Generation