Chinese Topic Modeling Hands On Practice

Task Specification: Topic models are a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovering hidden semantic structures in a body of text. About LDA: Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora; it explains why some parts of the data are similar. It is widely used in tasks such as topic modeling, text classification, and collaborative filtering. »
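To make "generative probabilistic model" concrete, here is a minimal sketch of how LDA imagines a single document being produced: draw a topic mixture for the document, then for each word position draw a topic and then a word from that topic. The vocabulary, the two topic-word distributions, and the `alpha` values are hypothetical toy choices, and the topic-word distributions are fixed by hand here rather than drawn from their own Dirichlet prior as in full LDA.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word_probs, vocab, alpha, length, rng):
    """Generate one document following LDA's generative story."""
    # 1. Draw this document's topic mixture theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Pick a topic for this word position according to theta.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # 3. Pick a word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topic_word_probs[z])[0])
    return theta, words

# Hypothetical toy setup: two topics over a six-word vocabulary.
vocab = ["股票", "银行", "利率", "球队", "比赛", "进球"]
topic_word_probs = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a finance-like topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a sports-like topic
]
rng = random.Random(0)
theta, doc = generate_document(topic_word_probs, vocab, alpha=[0.5, 0.5], length=10, rng=rng)
print(theta, doc)
```

Fitting LDA to real text runs this story in reverse: given only the words, it infers the topic mixtures and topic-word distributions that plausibly generated them.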

Chinese Text Clustering Hands On Practice

Task Specification: In contrast to classification, clustering is generally considered an unsupervised approach and is usually applied to unlabeled data. After all, most of the data in the world is unlabeled. Text clustering is a way to explore and group text data for further analysis, and it can be applied to many tasks such as document classification, organization, and browsing. General Steps Overview: text cleaning; text representation (feature engineering); clustering algorithms; analysis of »
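The first two steps listed above, cleaning and representation, can be sketched with the standard library alone. This is an illustrative assumption rather than the post's actual pipeline: it represents each text as a bag of character bigrams (which avoids word segmentation) and computes cosine similarity, the kind of quantity a clustering algorithm would then group on. The sample sentences are made up.

```python
import math
import re
from collections import Counter

def clean(text):
    """Keep only CJK characters, ASCII letters and digits; drop punctuation and spaces."""
    return re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]", "", text)

def char_bigrams(text):
    """Represent a text as a bag of character bigrams."""
    cleaned = clean(text)
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy documents: two about the stock market, one about sports.
docs = [
    "股票市场今天大涨。",
    "今天股票市场下跌!",
    "球队赢得了比赛。",
]
vectors = [char_bigrams(d) for d in docs]
sim_01 = cosine(vectors[0], vectors[1])  # the two stock-market sentences
sim_02 = cosine(vectors[0], vectors[2])  # stock market vs. sports
print(sim_01, sim_02)
```

The two stock-market sentences share several bigrams and score well above the unrelated pair, which shares none.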

Chinese Text Classification Hands On Practice

Task Specification: Text classification is one of the fundamental tasks in Natural Language Processing. It aims to assign text documents to predefined categories based on their content. It has many potential real-world uses, from spam email detection to intent recognition in chatbots. In this practice I’ll use various ways to tackle this task. This is not a comprehensive study but a general illustration »

Chinese Named Entity Recognition Hands On Practice

Task Specification: In this post I’m going to implement a bidirectional LSTM model with a CRF layer on top to tackle the NER task. Unlike in English, where every word is naturally separated by a space, entities in Chinese text are not so straightforward: the boundaries between characters and words are not very clear, and word segmentation itself is a critical task in Chinese NLP. To avoid the extra segmentation, »
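The excerpt is cut off before it says how segmentation is avoided. One common option, assumed here purely for illustration, is to label every character rather than every word using the BIO scheme, so a sequence tagger such as a BiLSTM-CRF can consume the raw character sequence. The sentence, the entity spans, and the `spans_to_bio` helper below are hypothetical.

```python
def spans_to_bio(text, entities):
    """Turn (start, end, label) entity spans into one BIO tag per character.

    `end` is exclusive; characters outside every span are tagged "O".
    """
    tags = ["O"] * len(text)
    for start, end, label in entities:
        tags[start] = "B-" + label          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining characters of the entity
    return tags

# Hypothetical example: a person name and a location.
sentence = "王小明在北京工作"
entities = [(0, 3, "PER"), (4, 6, "LOC")]
tags = spans_to_bio(sentence, entities)
for ch, tag in zip(sentence, tags):
    print(ch, tag)
```

With labels in this form, entity boundaries are recovered from the tag sequence itself, so no separate word segmenter is needed before tagging.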