Big Self-Supervised Models are Strong Semi-Supervised Learners

A nice video explanation of the paper.

The proposed learning paradigm:

  1. Self-supervised pretraining
  2. Supervised finetuning
  3. Distillation: train a student to match the teacher's output rather than the true label (see the sketch after this list).
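
As a rough illustration of step 3, below is a minimal distillation-loss sketch in PyTorch. The temperature value and the function/variable names are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy between the teacher's (softened) predicted distribution
    and the student's predicted distribution; no ground-truth labels are used."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Minimize -sum_y P_teacher(y|x) * log P_student(y|x), averaged over the batch.
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Hypothetical usage: `teacher` is the fine-tuned model (kept frozen),
# `student` is the model being trained on unlabeled images.
#
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits)
# loss.backward()
```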

The paper reaches a rather counterintuitive conclusion: labeled data do not always help, or rather, using too much labeled data during training does not necessarily help.
