Algorithms for Natural Language Processing

According to Wikipedia, natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages. These days NLP covers information extraction, parsing, machine translation, dialogue systems, question answering, and more.

This blog post is intended to be a collection of algorithms for NLP and to give an overview of the field. It will be updated often as my learning continues.

Language Models

A language model is a probability distribution over sequences of words or sentences. LMs use the chain rule of probability to compute the probability of an entire sequence, as shown below.
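
In symbols, the chain rule factors the joint probability of a sequence into a product of conditionals:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```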

Bag of Words: a text is represented as the bag of its words, disregarding grammar and word order
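
A minimal sketch of a bag-of-words representation using only the standard library; the toy sentence is illustrative:

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as word counts, ignoring grammar and word order."""
    return Counter(text.lower().split())

print(bag_of_words("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```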

N-Gram Models

An N-gram is a contiguous sequence of N words. The intuition of the N-gram model is that instead of computing the probability of a word given its entire history, we can approximate the history by just the last few words; a bigram model, for example, conditions on only the previous word (see the sketch after this list).
Kneser-Ney Smoothing: discounts observed N-gram counts and backs off to a continuation probability based on how many distinct contexts a word appears in
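
A minimal maximum-likelihood bigram model in plain Python; the boundary markers and toy corpus are illustrative:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """MLE bigram model: P(w | prev) = count(prev, w) / count(prev).

    sentences: iterable of token lists; <s> and </s> mark sentence boundaries.
    """
    unigram = Counter()   # counts of each word used as a left context
    bigram = Counter()    # counts of adjacent word pairs
    for tokens in sentences:
        tokens = ["<s>"] + tokens + ["</s>"]
        unigram.update(tokens[:-1])
        bigram.update(zip(tokens, tokens[1:]))
    return {pair: c / unigram[pair[0]] for pair, c in bigram.items()}

probs = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(probs[("the", "cat")])   # 0.5
```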

Maximum Entropy Models

Linear models, typically regularized via L1 or L2 terms in the likelihood objective, avoid the need for the kinds of backoff or mixture weights used in smoothed N-gram language models; this makes them more statistically efficient.
Related to the Maximum Entropy Principle (a sketch follows below).
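
A maximum entropy model is equivalent to multinomial logistic regression. A minimal sketch, assuming scikit-learn is available; the context features and training pairs are toy assumptions:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative training data: context features -> next word.
contexts = [{"prev": "the", "prev2": "<s>"},
            {"prev": "cat", "prev2": "the"},
            {"prev": "the", "prev2": "saw"}]
next_words = ["cat", "sat", "dog"]

vec = DictVectorizer()
X = vec.fit_transform(contexts)

# Multinomial logistic regression = maximum entropy model; the L2
# penalty replaces the backoff/mixture weights of smoothed N-gram LMs.
maxent = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
maxent.fit(X, next_words)
print(maxent.predict(vec.transform([{"prev": "the", "prev2": "<s>"}])))
```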

Neural Network Models

Learn a distributed representation for words with neural networks.
Word Embedding: maps each word into a low-dimensional real-valued vector via a shared weight (lookup) matrix.
RNN: uses its internal state (memory) to process sequences of inputs.
LSTM: uses several gates to control how that state changes (see the sketch below).
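
A minimal sketch of a neural language model combining these pieces, assuming PyTorch; the class name and layer sizes are illustrative, not a reference implementation:

```python
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # shared lookup matrix
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)   # (batch, seq, embed_dim)
        h, _ = self.lstm(x)      # (batch, seq, hidden_dim), gated state updates
        return self.out(h)       # logits over the next word at each position
```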

Parsing

Parsing is a combination of recognizing an input string and assigning a structure to it; that structure is usually a tree.

Parse trees are directly useful in applications such as grammar checking; they also serve as an important intermediate representation for semantic analysis and play an important role in applications like question answering and information extraction.

Tagging

POS Tagging: useful as a pre-processing step for parsing
MEMM: makes left-to-right local decisions, conditioning on previous tags and also the entire input
Conditional Random Fields: a global discriminative tagger that normalizes over entire tag sequences rather than over individual decisions

Syntactic Parsing

CKY Parsing: bottom-up dynamic programming over spans for grammars in Chomsky Normal Form (see the sketch after this list)
Agenda-Based Parsing: chart parsing driven by a prioritized agenda of items, e.g. best-first exploration
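
A minimal CKY recognizer sketch in plain Python, assuming the grammar is already in Chomsky Normal Form; the dictionary encoding of rules and the toy grammar are illustrative:

```python
def cky_recognize(words, grammar, lexicon, start="S"):
    """CKY recognizer for a grammar in Chomsky Normal Form.

    grammar: maps (B, C) -> set of parents A, for binary rules A -> B C
    lexicon: maps word -> set of preterminals A, for rules A -> word
    """
    n = len(words)
    # chart[i][j] holds the nonterminals that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):          # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # try every split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= grammar.get((B, C), set())
    return start in chart[0][n]

grammar = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}}
lexicon = {"the": {"Det"}, "dog": {"N"}, "barks": {"VP"}}
print(cky_recognize(["the", "dog", "barks"], grammar, lexicon))  # True
```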

Statistical Parsing

Dependency Parsing

Speech Recognition

Noisy Channel Model

Recover the intended word given an observation whose letters have been scrambled or corrupted in some manner. This framework is not restricted to speech recognition; it is a general pattern in NLP (spelling correction and machine translation fit it as well).
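
In symbols, given a noisy observation o, decoding picks the source w that maximizes the posterior, which Bayes' rule splits into a channel model and a language-model prior:

```latex
\hat{w} = \operatorname*{arg\,max}_{w} P(w \mid o)
        = \operatorname*{arg\,max}_{w} \underbrace{P(o \mid w)}_{\text{channel model}}\; \underbrace{P(w)}_{\text{language model}}
```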

Emission Model

Hidden Markov Model: the sequence is assumed to be generated by a Markov process with unobserved (hidden) states
Viterbi Algorithm: a dynamic programming algorithm for finding the most likely sequence of hidden states, especially in HMMs (see the sketch after this list)
Gaussian Mixture Model: models continuous emission densities as a weighted sum of Gaussians
EM: the Expectation-Maximization algorithm, used to fit models with latent variables such as HMMs and GMMs
Beam search: a heuristic search algorithm that explores a graph by expanding only the most promising nodes within a limited set (the beam)
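
A minimal NumPy sketch of Viterbi decoding; the argument shapes and naming conventions are assumptions documented in the docstring:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for an HMM.

    obs: list of observation indices
    pi:  (S,) initial state probabilities
    A:   (S, S) transitions, A[s, t] = P(next=t | current=s)
    B:   (S, V) emissions,   B[s, o] = P(obs=o | state=s)
    """
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))            # best score of a path ending in state s at time t
    back = np.zeros((T, S), dtype=int)  # back-pointers for path recovery
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[s, s'] = delta[s] * P(s' | s)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):            # follow back-pointers from the end
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```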

Machine Translation

Evaluation Metrics

BLEU: modified n-gram precision against reference translations, with a brevity penalty (see the sketch after this list)
ROUGE: recall-oriented n-gram and subsequence overlap, common in summarization
METEOR: unigram matching that also credits stems and synonyms
CIDEr: consensus-based n-gram similarity, originally proposed for image captioning
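
A simplified sentence-level BLEU sketch (single reference, with a small floor instead of proper smoothing), for intuition rather than real evaluation use:

```python
import math
from collections import Counter

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions * brevity penalty."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n
    # brevity penalty punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_prec)
```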

Word Alignment

IBM Models: a series of generative alignment models of increasing complexity; Model 1 uses only word-translation probabilities (see the sketch after this list)
HMM Model: alignment positions follow a Markov chain, so alignments tend to stay close to their neighbors
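
A minimal sketch of EM training for IBM Model 1 in plain Python; the NULL source word and sentence-pair encoding follow the usual convention, but this is illustrative rather than production code:

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM estimation of word-translation probabilities t(f | e).

    pairs: list of (source_words, target_words) sentence pairs;
           each target word f is generated by some source word e (or NULL).
    """
    t = defaultdict(lambda: 1.0)        # unnormalized uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)      # expected counts c(e, f)
        total = defaultdict(float)      # expected counts c(e)
        for e_sent, f_sent in pairs:    # E-step: fractional alignment counts
            e_sent = ["NULL"] + e_sent
            for f in f_sent:
                z = sum(t[(e, f)] for e in e_sent)   # normalize over alignments
                for e in e_sent:
                    p = t[(e, f)] / z
                    count[(e, f)] += p
                    total[e] += p
        for (e, f) in count:            # M-step: re-normalize
            t[(e, f)] = count[(e, f)] / total[e]
    return t
```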

Modern Approaches

Seq2Seq: an encoder network reads the source sentence and a decoder network generates the translation
Attention Mechanism: lets the decoder look back at all encoder states at each step instead of a single fixed vector (see the sketch below)
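
A minimal NumPy sketch of dot-product attention, the simplest scoring choice; Bahdanau-style attention would replace the score with a small feed-forward network:

```python
import numpy as np

def dot_product_attention(query, keys, values):
    """Attend over encoder states with a dot-product score.

    query:  (d,)    current decoder state
    keys:   (T, d)  encoder states
    values: (T, d)  encoder states (often identical to keys)
    """
    scores = keys @ query                  # alignment scores, shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over source positions
    return weights @ values                # context vector, shape (d,)
```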

Chuanrong Li
