General Ensemble Method

For a complex question, instead of relying on a single classifier or predictor, we can consult several of them. An aggregated answer is often better than any individual one, as everyday experience suggests. Popular ensemble methods today include bagging, boosting and stacking, and Random Forest is also widely used.

Voting Classifier

We start with a simple intuition: build a voting system over different classifiers, such as Logistic Regression, an SVM classifier, a Decision Tree, and so on. We then collect each classifier's prediction to form an overall prediction. Taking the majority vote is called hard voting. Averaging the classifiers' predicted class probabilities (optionally giving each classifier a weight) and choosing the class with the highest average is called soft voting.
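Both voting rules can be sketched in a few lines of pure Python. This is a minimal illustration, assuming each classifier has already produced a label and a per-class probability dictionary for a single sample. The function names `hard_vote` and `soft_vote` and the three classifiers' outputs are hypothetical, chosen so that the two rules disagree.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the class labels predicted by each classifier."""
    # most_common(1) returns [(label, count)]; ties go to the label seen first
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities, weights=None):
    """Weighted average of per-class probabilities, then pick the best class.

    probabilities: one dict per classifier, mapping class label -> probability.
    weights: optional per-classifier weights (defaults to equal weights).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    averaged = {}
    for probs, w in zip(probabilities, weights):
        for label, p in probs.items():
            averaged[label] = averaged.get(label, 0.0) + w * p / total
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three classifiers (e.g. LR, SVM, tree) for one sample
labels = ["cat", "dog", "cat"]
probs = [
    {"cat": 0.55, "dog": 0.45},
    {"cat": 0.10, "dog": 0.90},
    {"cat": 0.60, "dog": 0.40},
]

print(hard_vote(labels))  # "cat": two of three classifiers vote "cat"
print(soft_vote(probs))   # "dog": average P(dog) is about 0.583 vs 0.417 for "cat"
```

Here a single very confident classifier outweighs two lukewarm ones under soft voting, which is exactly the extra information that hard voting throws away.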

Frequent Pattern

Suppose we have the following shopping records:

| transaction | items |
| :-: | :-: |
| T1 | baguette, croissant |
| T2 | baguette, croissant, jam |
| T3 | madeleine, croissant, baguette, jam |

The first concept is support, a measure of absolute frequency.

| item | count | support |
| :-: | :-: | :-: |
| baguette | 3 | 0.333 |
| croissant | 3 | 0.… |
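The support counts for these three transactions can be computed with a short sketch. It counts, for every itemset of a given size, how many transactions contain it (the absolute frequency), so the single-item counts match the count column. The dictionary layout and the `support_counts` helper are hypothetical, for illustration only.

```python
from collections import Counter
from itertools import combinations

# The shopping records above: one set of items per transaction
transactions = {
    "T1": {"baguette", "croissant"},
    "T2": {"baguette", "croissant", "jam"},
    "T3": {"madeleine", "croissant", "baguette", "jam"},
}


def support_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for items in transactions.values():
        # sorted() gives each itemset one canonical tuple form
        for itemset in combinations(sorted(items), size):
            counts[itemset] += 1
    return counts


singles = support_counts(transactions, 1)
pairs = support_counts(transactions, 2)
print(singles[("baguette",)])            # 3
print(singles[("jam",)])                 # 2
print(pairs[("baguette", "croissant")])  # 3
```

Enumerating every combination is fine for a toy example. Algorithms such as Apriori exist precisely to avoid this brute force on real data.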

Cluster

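Clustering is the process of partitioning a set of data objects into multiple groups, or clusters, so that objects within the same cluster are highly similar to one another, while objects in different clusters are highly dissimilar.

| Method | General characteristics |
| :-: | :-: |
| Partitioning methods | - Find… |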
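k-means is the textbook example of a partitioning method. The sketch below is a minimal pure-Python version of Lloyd's k-means for 2-D points, chosen here as an illustration; the `kmeans` signature, its seeded random initialization, and the sample points are assumptions rather than the author's code.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition 2-D points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers, clusters


# Hypothetical data: two well-separated groups of points
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centers, clusters = kmeans(points, k=2)
print(centers)
print(clusters)
```

Each iteration can only lower the total within-cluster squared distance, which is why the loop settles down; the result can still depend on the initial centers, hence the fixed seed.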

Decision Trees

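Decision trees can be used for both classification and regression. Classification and regression are two forms of prediction in machine learning, and both fall under supervised learning. The main differences are summarized in the table below:

| Property | Classification | Regression |
| :-: | :-: | :-: |
| Output | … | … |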
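The classification/regression difference shows up directly in how a tree picks a split. The sketch below is a minimal illustration on one numeric feature, using Gini impurity for class labels and variance for continuous targets (CART-style criteria, chosen here for illustration; the post may use others, such as information gain). The function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (classification criterion)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def variance(values):
    """Mean squared deviation from the mean (regression criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys, impurity):
    """Find the threshold on one feature that minimizes weighted child impurity."""
    best = (None, float("inf"))
    for t in sorted(set(xs))[:-1]:  # splitting at the max would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


xs = [1, 2, 3, 4, 5, 6]
# Classification: the target is a discrete label
print(best_split(xs, ["no", "no", "no", "yes", "yes", "yes"], gini))  # (3, 0.0)
# Regression: the target is a continuous value
print(best_split(xs, [1.0, 1.2, 0.9, 5.0, 5.1, 4.9], variance))
```

The split search is identical in both cases; only the impurity function changes, which is why one tree algorithm can serve both tasks.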

Finding Similar Items - LSH

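Finding similar items is one of the most important tasks in data mining. Whether the goal is measuring text similarity or building a recommender system, similarity plays a central role. Jaccard similarity provides a way to measure the similarity between sets.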
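Jaccard similarity is defined as J(A, B) = |A ∩ B| / |A ∪ B|. A minimal sketch follows, applying it both to plain sets and to documents represented as sets of character k-shingles. The `shingles` helper, the choice k = 3, and the convention that two empty sets score 1.0 are assumptions for illustration.

```python
def shingles(text, k=3):
    """Set of all length-k character substrings (k-shingles) of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


print(jaccard({1, 2, 3, 4}, {3, 4, 5, 6}))  # 2 shared of 6 total, about 0.333
print(jaccard(shingles("data mining"), shingles("data minding")))
```

Computing this exactly for every pair of documents is quadratic in the number of documents, which is the cost that techniques like MinHash and LSH are designed to avoid.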