0x505 Data
Data-related Note
Pseudo labeling
Pseudo labeling (Lee 2013) assigns each unlabeled example the class with the maximum predicted probability, then trains the model on both the true labels and these pseudo-labels.
\[L = \frac{1}{n} \sum_m \sum_i L(y^m_i, f_i^m) + \alpha(t) \frac{1}{n'} \sum_m \sum_i L(y_i^{'m}, f_i^{'m})\]
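where the first term is the supervised loss over the \(n\) labeled examples and the second term is the same loss over the \(n'\) unlabeled examples, computed against their pseudo-labels \(y'\). Here \(m\) indexes examples, \(i\) indexes classes, \(f\) is the network output, and \(\alpha(t)\) is a coefficient that balances the two terms and is increased gradually during training.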
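The loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: it assumes \(L\) is the cross-entropy, and the function names (`cross_entropy`, `pseudo_labels`, `alpha_schedule`, `pseudo_label_loss`) as well as the schedule parameters `t1`, `t2`, and `alpha_f` are hypothetical choices made for this example.

```python
import math


def cross_entropy(targets, probs, eps=1e-12):
    """Average over samples of the cross-entropy summed over classes."""
    total = 0.0
    for t_row, p_row in zip(targets, probs):
        total -= sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
    return total / len(targets)


def pseudo_labels(probs):
    """One-hot vector marking the max-probability class of each sample."""
    labels = []
    for row in probs:
        k = row.index(max(row))
        labels.append([1.0 if i == k else 0.0 for i in range(len(row))])
    return labels


def alpha_schedule(t, t1=100, t2=600, alpha_f=3.0):
    """Assumed linear ramp: 0 before t1, alpha_f from t2 onward."""
    if t < t1:
        return 0.0
    if t < t2:
        return alpha_f * (t - t1) / (t2 - t1)
    return alpha_f


def pseudo_label_loss(y, f, f_unlabeled, t):
    """Supervised term plus alpha(t) times the pseudo-label term."""
    y_pseudo = pseudo_labels(f_unlabeled)
    supervised = cross_entropy(y, f)
    unsupervised = cross_entropy(y_pseudo, f_unlabeled)
    return supervised + alpha_schedule(t) * unsupervised
```

Early in training `alpha_schedule` returns 0, so only the labeled data drives the loss. The pseudo-label term is phased in later, once the predictions on unlabeled data are more likely to be reliable.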
This approach is motivated by minimum entropy regularization, which favors low-density separation between classes, i.e. decision boundaries that pass through low-density regions of the input space. It achieves this by minimizing the conditional entropy of the class probabilities on the unlabeled data: \(H(y|x') = -\frac{1}{n'} \sum_m \sum_i P(y_i^m = 1 | x^{'m}) \log P(y_i^m=1|x^{'m})\)
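To make the entropy term concrete, here is a small sketch that evaluates it on a batch of predicted class probabilities. The function name `conditional_entropy` and the `eps` guard against `log(0)` are assumptions made for this illustration.

```python
import math


def conditional_entropy(probs, eps=1e-12):
    """Average over samples of -sum_i p_i * log(p_i) for predicted probabilities."""
    total = 0.0
    for row in probs:
        total -= sum(p * math.log(p + eps) for p in row)
    return total / len(probs)


# Predictions near a decision boundary vs. confident predictions.
uncertain = [[0.5, 0.5], [0.55, 0.45]]
confident = [[0.99, 0.01], [0.02, 0.98]]

h_uncertain = conditional_entropy(uncertain)
h_confident = conditional_entropy(confident)
```

Confident predictions give a lower value than near-uniform ones, so minimizing this quantity pushes unlabeled points away from the decision boundary.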