Radioactive data: tracing through training

September 6, 2020 by samuel.cheng@ou.edu

Motivation

We want to verify whether someone has used our dataset to train their model

Idea

Introduce an extra (artificial) feature into the data. For example, add a "cat" feature to cat images and a "dog" feature to dog images
Verify whether the dataset was used for training with a hypothesis test
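
To make the two steps above concrete, here is a toy sketch in plain Python. It is not the paper's procedure; everything in it is an assumption for illustration. It works directly on synthetic feature vectors instead of images, the "classifier" is a simple nearest-centroid rule, and the p-value uses a normal approximation (the cosine between a fixed vector and an independent random unit vector in d dimensions has mean 0 and variance about 1/d). The names unit_vector, make_features, detection_p_value and run_experiment, and the parameters dim, n and strength, are all hypothetical.

```python
import math
import random


def unit_vector(rng, dim):
    """Draw a random direction on the unit sphere (used as a class mark)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_features(rng, class_mean, n, mark=None, strength=0.0):
    """Synthetic features for one class; optionally shifted along the mark."""
    rows = []
    for _ in range(n):
        row = [m + rng.gauss(0.0, 1.0) for m in class_mean]
        if mark is not None:
            row = [x + strength * u for x, u in zip(row, mark)]
        rows.append(row)
    return rows


def centroid(rows):
    """Per-coordinate mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def detection_p_value(weights, mark):
    """One-sided test of H0: the weights are unrelated to the mark.

    Assumption: under H0, cosine * sqrt(dim) is roughly standard normal.
    """
    z = cosine(weights, mark) * math.sqrt(len(mark))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def run_experiment(marked, seed=0, dim=256, n=100, strength=2.0):
    rng = random.Random(seed)
    means = [unit_vector(rng, dim) for _ in range(2)]  # genuine class signal
    marks = [unit_vector(rng, dim) for _ in range(2)]  # one secret mark per class
    data = [
        make_features(rng, means[c], n, marks[c] if marked else None, strength)
        for c in range(2)
    ]
    # Nearest-centroid classifier: the class-0 decision direction is c0 - c1.
    c0, c1 = centroid(data[0]), centroid(data[1])
    weights = [a - b for a, b in zip(c0, c1)]
    return detection_p_value(weights, marks[0])


p_marked = run_experiment(marked=True)   # trained on marked data
p_clean = run_experiment(marked=False)   # trained on unmarked data
print("p-value (marked):", p_marked)
print("p-value (clean): ", p_clean)
```

A small p-value means the learned weights are aligned with the secret mark more than chance would explain, which is evidence that the marked data was used. In this sketch the mark lives in the same feature space the classifier is trained on, which is exactly the favorable case; it says nothing about what happens when a different feature extractor is used.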

Comments

It seems that the same classifier is used for both training and testing. If a different classifier is trained on the marked data, I am not sure the method will actually work
The “radioactive” images actually look very bad
The idea does not seem to be new, and the execution is quite doubtful. It reminds me of the watermarking techniques that were popular in the early 2000s, which never seemed to take off because they did not really work.

Ref

Video, paper