Big Self-Supervised Models are Strong Semi-Supervised Learners

A nice video explanation of the paper.

The proposed learning paradigm:

  1. Self-supervised pretraining
  2. Supervised finetuning
  3. Distillation: train a student network to match the outputs of the finetuned teacher rather than the true labels.
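The distillation step can be expressed as a loss function. The following is a minimal sketch, assuming the common form in which the student minimizes the cross-entropy between the teacher's temperature-scaled softmax output (the soft labels) and its own temperature-scaled softmax output. No ground-truth labels appear in the loss. The function names and the `temperature` parameter are illustrative assumptions, not the paper's code.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed with the max-shift trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(teacher_logits: List[List[float]],
                      student_logits: List[List[float]],
                      temperature: float = 1.0) -> float:
    """Mean cross-entropy between teacher soft labels and student predictions.

    Illustrative sketch: each row is one (typically unlabeled) example, and the
    teacher's distribution replaces the one-hot true label as the target.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        # Soft targets produced by the (frozen) teacher.
        p_teacher = [math.exp(v) for v in log_softmax(t_row, temperature)]
        # Student log-probabilities at the same temperature.
        log_p_student = log_softmax(s_row, temperature)
        total -= sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))
    return total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0]]
    # A student that agrees with the teacher gets a lower loss than one that disagrees.
    print(distillation_loss(teacher, [[2.0, 0.5, -1.0]]))
    print(distillation_loss(teacher, [[-1.0, 0.5, 2.0]]))
```

Because the cross-entropy H(p, q) is minimized when q equals p, the loss is lowest when the student reproduces the teacher's full output distribution, not just its top-1 prediction.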

The paper reaches a rather counterintuitive conclusion: labeled data do not always help, and using more labeled data during training does not necessarily lead to better results.


Copyright OU-Tulsa Lab of Image and Information Processing 2021