Word Representation

Motivation of Word Representation Unlike image data, which has a long tradition of being represented as vectors of pixels, natural language text had no unified representation for a long time. It was usually treated as discrete atomic symbols, where each word was assigned a unique ID. In recent years, a popular idea in modern machine learning has been to represent words by vectors. Brief History of Word Representation: dictionary lookup, one-hot encoding, word embeddings (distributional semantic models), distributed word representations (word2vec, GloVe), and contextual word representations (CoVe, ELMo, BERT). Dictionary Lookup The most straightforward way to represent a word is to create a dictionary and assign every word a unique ID. »
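As a minimal sketch of the dictionary-lookup and one-hot ideas mentioned above (the toy corpus and the one_hot helper are illustrative assumptions, not code from the post):

```python
# Minimal sketch: dictionary lookup and one-hot encoding over a toy corpus.
# The corpus and helper names here are illustrative, not from the original post.

corpus = ["the cat sat on the mat", "the dog sat on the log"]

# Dictionary lookup: assign every distinct word a unique integer ID.
vocab = {}
for sentence in corpus:
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)

print(vocab["cat"])  # the unique ID assigned to "cat"

# One-hot encoding: a vector of length |vocab| with a single 1 at the word's ID.
def one_hot(word, vocab):
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(one_hot("cat", vocab))
```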

Attention Mechanism

First introduced in computer vision, visual attention was used to explain, in some sense, how creatures perceive the world. The first time I encountered attention was while reading the paper Show and Tell: A Neural Image Caption Generator, during work related to image captioning. In NLP, attention really makes a difference in tasks like machine translation and speech recognition. It started out combined with the Seq2Seq model, and newer models like the Transformer later moved away from that recurrent setup. »

Chinese Topic Modeling Hands On Practice

Task Specification Topic models are a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovering hidden semantic structures in a text body. About LDA Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora; it explains why some parts of the data are similar. It is widely used in tasks like topic modeling, text classification, and collaborative filtering. »
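Since this post is a hands-on practice, here is a plausible minimal sketch of Chinese topic modeling using jieba for word segmentation and gensim's LdaModel; the sample documents and parameter values (num_topics, passes) are assumptions for illustration, not the post's actual setup:

```python
# Minimal sketch of Chinese topic modeling: jieba for word segmentation,
# gensim for the LDA model. Documents and hyperparameters are illustrative.
import jieba
from gensim import corpora, models

docs = [
    "机器学习是人工智能的一个分支",
    "深度学习在图像识别中表现出色",
    "股票市场今天大幅上涨",
]

# Segment each document into words (Chinese text has no whitespace word boundaries).
tokenized = [jieba.lcut(doc) for doc in docs]

# Build the word<->id mapping and the bag-of-words corpus.
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]

# Train LDA with an assumed number of topics.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

for topic in lda.print_topics():
    print(topic)
```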

LDA Explained

Brief Definition Latent Dirichlet Allocation (LDA) is a generative probabilistic model, a three-level hierarchical Bayesian model. In the model, each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. Process Overview LDA assumes the following generative process for each document w in a corpus D: Choose N ∼ Poisson(ξ). »
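To make the generative process concrete, here is a small NumPy sketch that simulates a single document under assumed values for the Dirichlet prior α, the per-topic word distributions β, and the Poisson parameter ξ (all of the numbers below are illustrative, not from the post):

```python
# Toy simulation of LDA's generative process for a single document.
# alpha, beta, and xi below are made-up values for illustration.
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 10           # number of topics, vocabulary size
alpha = np.ones(K)     # symmetric Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # per-topic word distributions
xi = 8                 # Poisson parameter for document length

# 1. Choose document length N ~ Poisson(xi).
N = rng.poisson(xi)

# 2. Choose topic proportions theta ~ Dirichlet(alpha).
theta = rng.dirichlet(alpha)

# 3. For each of the N words: pick a topic z_n ~ Multinomial(theta),
#    then pick a word w_n ~ Multinomial(beta[z_n]).
doc = []
for _ in range(N):
    z = rng.choice(K, p=theta)
    w = rng.choice(V, p=beta[z])
    doc.append(w)

print(doc)  # word IDs of the generated document
```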

Text Generation Explained

When it comes to text generation, there are always so many questions: Where do the words come from? Does the model learn language the way a child does? Characteristics of Text Text data has a time dimension: the order of words is highly important. Text, or language, is rule-based, which we call grammar. Text data is discrete; characters and words are not continuous, which makes the data harder to represent than image pixels. »
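As a small, hedged illustration of the discreteness point, the sketch below contrasts how text and pixels enter a model: characters become arbitrary integer indices that carry no notion of closeness, while pixels are already continuous numbers (the example string and values are made up):

```python
# Illustration of discrete text vs. continuous pixels (toy values).
text = "hello"

# Text: each character becomes an arbitrary integer index; the numbers carry
# no notion of similarity ('h' vs 'e' is not "closer" than 'h' vs 'o'),
# and order along the sequence (the time dimension) matters.
chars = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(chars)}
encoded = [char_to_id[c] for c in text]
print(encoded)

# Pixels: already continuous intensities, so nearby values mean similar colors.
pixels = [0.12, 0.13, 0.98]
print(pixels)
```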