- Kernelized transformers are not on par with the original transformer in quality, but can be much faster (up to ~1000× for some applications).
- A kernelized transformer can be formulated as an RNN, which can further speed up inference.
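The two points above can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's implementation: it assumes the common $\mathrm{elu}(x)+1$ feature map, and shows both the linear-time form $\phi(Q)\,(\phi(K)^\top V)$ and the causal variant written as an RNN carrying a running state.

```python
import numpy as np

def feature_map(x):
    # Kernel feature map phi; elu(x) + 1 is one common choice (an assumption here).
    return np.where(x > 0, x + 1.0, np.exp(x))

def kernelized_attention(Q, K, V):
    """Linear-time attention: phi(Q) (phi(K)^T V) instead of softmax(Q K^T) V."""
    phi_q, phi_k = feature_map(Q), feature_map(K)   # (n, d)
    kv = phi_k.T @ V                                # (d, d_v): cost is linear in n
    z = phi_k.sum(axis=0)                           # (d,) normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]      # (n, d_v)

def causal_kernelized_attention(Q, K, V):
    """Causal variant viewed as an RNN: carry a running state S and normalizer z."""
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    z = np.zeros(d)
    out = np.empty_like(V)
    for t in range(n):
        k, q = feature_map(K[t]), feature_map(Q[t])
        S += np.outer(k, V[t])                      # accumulate key-value state
        z += k                                      # accumulate normalizer
        out[t] = (q @ S) / (q @ z)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
full = kernelized_attention(Q, K, V)
causal = causal_kernelized_attention(Q, K, V)
```

The last causal output sees the whole sequence, so it matches the last row of the non-causal version; during autoregressive decoding, only `S` and `z` need to be kept, giving O(1) cost per generated token.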
tl;dr: the routing mechanism here seems quite similar to the one in capsule networks. The authors emphasize that a slot should learn more than one type of object. The main trick seems to be to first route image features to slots; the slots are then trained to fit more than one object type, unlike in a capsule network.
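The routing step described above can be sketched as follows. This is a simplified, Slot-Attention-style update written from the description (not the authors' code): each input feature softmax-competes over slots, and each slot is then updated as the attention-weighted mean of the features assigned to it. The learned projections, GRU update, and LayerNorm are omitted.

```python
import numpy as np

def route_to_slots(inputs, slots, n_iters=3):
    """Iteratively route input features (n, d) to slots (k, d).

    Softmax is taken over the slot axis, so slots compete for each
    input feature; slots are then set to the weighted mean of their
    assigned features. A sketch under stated assumptions.
    """
    d = slots.shape[1]
    for _ in range(n_iters):
        logits = inputs @ slots.T / np.sqrt(d)            # (n, k)
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)           # softmax over slots
        w = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = w.T @ inputs                              # weighted mean per slot
    return slots, attn

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 4))       # e.g. flattened image features
init_slots = rng.normal(size=(3, 4))   # randomly initialized slots
slots, attn = route_to_slots(feats, init_slots)
```

Because the competition is over slots rather than over object classes, nothing ties a slot to a single object type, which matches the point above.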
The proposed learning paradigm leads to a rather counterintuitive conclusion: labeled data do not always help, or at least using too much labeled data during training does not help.
A very nice presentation (as always) of CornerNet by Yannic Kilcher. The key ideas are the push-pull losses on corner embeddings and corner pooling. The ideas are simple and intuitive but very well executed. The authors include an ablation study showing the gain from corner pooling. Their result is competitive with other one-stage approaches: better than YOLOv3 but worse than YOLOv4. They also show that when ground-truth corner detections are used, their AP almost doubles, which illustrates that corner detection is their main bottleneck.
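The push-pull idea can be made concrete with a small sketch. CornerNet's embeddings are one number per corner; the sketch below (an illustration written from the description, not the paper's code) pulls each object's two corner embeddings toward their mean and pushes the means of different objects at least `delta` apart.

```python
import numpy as np

def push_pull_loss(top_left_emb, bottom_right_emb, delta=1.0):
    """Push-pull loss on 1-D corner embeddings, one entry per object.

    pull: each object's two corner embeddings should agree (be near
          their mean), so matching corners can be grouped.
    push: means of different objects should be at least `delta` apart,
          so corners of different objects are not confused.
    """
    means = (top_left_emb + bottom_right_emb) / 2.0
    pull = np.mean((top_left_emb - means) ** 2 + (bottom_right_emb - means) ** 2)
    n = len(means)
    push = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                push += max(0.0, delta - abs(means[i] - means[j]))
    push /= max(n * (n - 1), 1)
    return pull, push

# Two objects whose corner pairs already agree and are well separated:
tl = np.array([0.0, 3.0])
br = np.array([0.0, 3.0])
pull, push = push_pull_loss(tl, br)
```

For this ideal assignment both terms are zero; mismatched corner pairs raise `pull`, and objects with nearby embeddings raise `push`.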
A wonderful explanation of why $q p q^{-1}$ is a rotation of $p$ by $\theta$ around ${\bf v}$, where $q = \cos(\theta/2) + \sin(\theta/2)\,{\bf v}$ is most easily interpreted with the complex-number-like structure of quaternions, with ${\bf v}$ normalized (i.e., $\|{\bf v}\| = 1$). Note that ${\bf v}^2$ is then just $-1$ (since ${\bf v}$ acts just like the imaginary $i$ for complex numbers).
Basically, quaternions can be visualized via 4D stereographic projection back to 3D space. When multiplying by a quaternion $q$ from the left, points parallel to ${\bf v}$ are translated along ${\bf v}$, and points perpendicular to ${\bf v}$ are rotated around ${\bf v}$ (following the right-hand rule). Similarly, when multiplying from the right by the same quaternion, points parallel to ${\bf v}$ are translated in the same direction, but points perpendicular to ${\bf v}$ are rotated in the opposite direction (following the left-hand rule). So if we multiply by $q$ from the left and by $q^{-1}$ from the right, the translation effects cancel exactly and the rotation is doubled. That is why the angle in $q$ should be halved.
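The half-angle formula is easy to check numerically. A minimal sketch using the Hamilton product and the $(w, x, y, z)$ convention, with the inverse of a unit quaternion being its conjugate:

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(p, axis, theta):
    """Rotate 3-vector p by theta around a unit axis via q p q^{-1},
    with the HALF angle inside q = cos(theta/2) + sin(theta/2) v."""
    v = np.asarray(axis, dtype=float)
    v /= np.linalg.norm(v)
    q = np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * v])
    q_inv = q * np.array([1.0, -1.0, -1.0, -1.0])   # conjugate = inverse for unit q
    p_quat = np.concatenate([[0.0], p])             # embed p as a pure quaternion
    return quat_mul(quat_mul(q, p_quat), q_inv)[1:]

# Rotating the x-axis by 90 degrees around z should give the y-axis.
result = rotate(np.array([1.0, 0.0, 0.0]), [0, 0, 1], np.pi / 2)
```

With the full angle inside $q$ instead of $\theta/2$, the same check would come out rotated by $180°$ rather than $90°$, confirming the doubling described above.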
This is old but more relevant than ever. Everyone should read it.
A great discussion on population-based search, especially how it connects with goal switching and multi-task learning.