0x42 SML

Supervised Learning

Unsupervised Learning


Ensemble learning combines multiple classifiers to obtain better results than any single classifier alone.


Bagging (Bootstrap Aggregating) aggregates the results of multiple classifiers, each trained on a different bootstrap sample of the training data.

  • Variance reduction method (increases the stability of the classifier)
  • Classifiers can be trained in parallel
Reference: https://hackernoon.com/how-to-develop-a-robust-algorithm-c38e08f32201

Bagging with CART


  • randomly create subsets of the training data by sampling with replacement (bootstrap samples)
  • train a CART on each subset
  • average the predictions of all trees for test data (or take a majority vote for classification)
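A minimal sketch of the three steps above, using scikit-learn's CART implementation (`DecisionTreeRegressor`) and NumPy for the bootstrap; the data is a made-up toy regression problem:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy data: noisy sine curve (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=200)

n_trees = 25
trees = []
for _ in range(n_trees):
    # step 1: bootstrap sample — draw indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # step 2: train one CART per bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# step 3: average the predictions of all trees
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
y_pred = np.mean([t.predict(X_test) for t in trees], axis=0)
```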

Bagging with Random Forest

  • Same procedure as bagging with CART; only the splitting algorithm is different
  • When splitting a node, the candidate features are only a random subset (e.g. $\sqrt{dim}$) of the entire feature set. This creates a variety of decorrelated trees
  • It looks like dropout to me.
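Assuming scikit-learn, the per-split feature subsampling described above corresponds to the `max_features="sqrt"` setting:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# max_features="sqrt": at each split, only sqrt(dim) randomly chosen
# features are considered as split candidates
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0)
rf.fit(X, y)
print(rf.score(X, y))
```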


Boosting improves the model's prediction ability iteratively by using the errors made by previous classifiers.

  • Bias reduction method
  • Classifiers are trained sequentially
Reference: https://hackernoon.com/how-to-develop-a-robust-algorithm-c38e08f32201



  • each classifier has a coefficient $\alpha$ (something like a confidence score), and each data point has a weight (something like a punishment score)
  • in each iteration, train a new weak classifier by minimizing the error weighted by the data-point weights
  • compute the confidence score $\alpha$ of this classifier from this weighted error
  • use the confidence score to re-weight each data point (a wrong answer from a high-confidence classifier gives a higher weight, i.e. more punishment)
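The iteration above matches AdaBoost; a minimal from-scratch sketch with decision stumps as the weak classifiers (function names and toy data are my own):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """Labels y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                    # data-point weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        # minimize the error weighted by the current data weights
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        # confidence score of this classifier from its weighted error
        alpha = 0.5 * np.log((1 - err) / err)
        # re-weight: wrongly classified points get higher weight
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)

# toy usage: two Gaussian blobs labeled -1 / +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
stumps, alphas = adaboost_fit(X, y)
acc = np.mean(adaboost_predict(stumps, alphas, X) == y)
```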



Imbalanced Models


  • over-sampling: replicate minority-class observations; easy to overfit
  • under-sampling: remove majority-class observations; easy to underfit
  • SMOTE: up-sample by generating synthetic points sampled from the line segments between minority-class points and their nearest neighbors
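A minimal from-scratch sketch of the SMOTE sampling step described above (function name and parameters are my own; `imbalanced-learn` provides a production implementation):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from the minority class X_min."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)   # column 0 is the point itself
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))            # random minority point
        j = idx[i][rng.integers(1, k + 1)]      # one of its k neighbors
        lam = rng.uniform()                     # position on the segment
        samples.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(samples)

# toy usage on made-up 2-D minority points
X_min = np.random.default_rng(1).normal(size=(20, 2))
X_new = smote(X_min, n_new=10)
```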

Reinforcement Learning


Markov decision process is defined by a tuple of $(S, A, P, R)$

  • S: a finite set of states
  • A: a finite set of actions
  • $P(s_{t+1}=s' \mid s_t = s, a_t = a)$: transition probability
  • $R(r \mid s_{t+1}=s', s_t = s, a_t = a)$: reward probability

Dynamic Programming

Policy Iteration

Value Iteration
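Value iteration repeatedly applies the Bellman optimality backup $V(s) \leftarrow \max_a \big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s') \big]$ until convergence. A minimal sketch on a hypothetical two-state, two-action MDP (numbers are made up; the reward is simplified to an expected reward $R(s,a)$ rather than a distribution):

```python
import numpy as np

# P[s, a, s'] = transition probability, R[s, a] = expected reward
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * (P @ V)    # Q[s, a] = R(s,a) + gamma * E[V(s')]
    V = Q.max(axis=1)          # Bellman optimality backup
policy = Q.argmax(axis=1)      # greedy policy w.r.t. converged values
print(V, policy)
```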

Model Free Prediction

Model Free Control