tl;dr: the routing mechanism here seems quite similar to the one in capsule networks. The authors emphasize that a slot should not be tied to just one type of object. The main trick seems to be to first route image features to slots; the slots are then trained to fit more than one type of object, unlike capsules in a capsule network.
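To make the routing step concrete, here is a simplified NumPy sketch of slot-attention-style routing. It omits the learned projections, GRU update, and MLP of the actual model, and the function name and hyperparameters are my own illustrative choices, not the authors' implementation:

```python
import numpy as np

def slot_attention_routing(feats, num_slots=4, num_iters=3, rng=None):
    """Simplified sketch of slot-attention-style routing (illustrative only)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = feats.shape
    # Slots start as random vectors; in the paper they are sampled from a
    # learned Gaussian, so no slot is tied to a particular object type.
    slots = rng.normal(size=(num_slots, d))
    for _ in range(num_iters):
        # Attention logits between every input feature and every slot.
        logits = feats @ slots.T / np.sqrt(d)           # (n, num_slots)
        # Softmax over SLOTS: slots compete for each input feature,
        # which is the routing step reminiscent of capsule networks.
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn = attn / attn.sum(axis=1, keepdims=True)   # (n, num_slots)
        # Each slot is updated as a weighted mean of the features it claimed.
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = weights.T @ feats                        # (num_slots, d)
    return slots

# Example: route 100 random "image features" into 4 slots.
features = np.random.default_rng(1).normal(size=(100, 64))
print(slot_attention_routing(features).shape)  # (4, 64)
```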
A comeback of signal processing? There is a video from the authors, one from Yannic, the arXiv paper, and a nice website. The 3D reconstruction is very convincing. Btw, another video presentation of implicit representations.
Some remarks:
- Represent signals as functions instead of as multidimensional data (not a completely new idea, as the authors point out)
- Match not just the signal itself but also its derivative
- Use a sinusoidal activation function. This keeps the derivative of the network well-defined (and the derivative is itself still a SIREN).
A nice video explanation of the paper.
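As a rough illustration of the sinusoidal layers, here is a minimal NumPy sketch. The w0=30 scaling and the initialization bounds follow my reading of the paper and its released code; treat the details as assumptions rather than an exact reproduction:

```python
import numpy as np

def siren_layer(x, w, b, w0=30.0):
    """One SIREN-style layer: a linear map followed by a sine activation.
    Derivatives stay well-defined because d/dx sin(w0*(Wx+b)) is a cosine,
    i.e. another shifted sinusoid."""
    return np.sin(w0 * (x @ w + b))

def init_siren_weights(fan_in, fan_out, w0=30.0, first_layer=False, rng=None):
    """Initialization roughly following the paper: uniform in
    [-1/fan_in, 1/fan_in] for the first layer and
    [-sqrt(6/fan_in)/w0, sqrt(6/fan_in)/w0] for later layers."""
    rng = np.random.default_rng(0) if rng is None else rng
    bound = 1.0 / fan_in if first_layer else np.sqrt(6.0 / fan_in) / w0
    w = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    b = rng.uniform(-bound, bound, size=(fan_out,))
    return w, b

# Map a 2D coordinate to a scalar "pixel value": f(x, y) -> intensity.
w1, b1 = init_siren_weights(2, 64, first_layer=True)
w2, b2 = init_siren_weights(64, 1)
coords = np.array([[0.1, -0.3]])
hidden = siren_layer(coords, w1, b1)
out = hidden @ w2 + b2  # linear output layer
print(out.shape)  # (1, 1)
```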
The proposed learning paradigm:
- Self-supervised pretraining
- Supervised finetuning
- Distillation: train a student to learn the output of the teacher rather than the true label.
The paper reaches a rather counterintuitive conclusion: labeled data do not always help, or rather, using too much labeled data during training does not help.
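To make the distillation step concrete, here is a minimal NumPy sketch of a soft-label distillation loss. The temperature value and function names are my own illustrative choices, not taken from the paper:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened predictions and the
    student's softened predictions; no ground-truth labels are used."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    return -np.mean(np.sum(t * np.log(s + 1e-12), axis=-1))

# Toy example: the student is trained to match the teacher's outputs.
teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[2.5, 1.5, 0.2]])
print(distillation_loss(student, teacher))
```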
A very nice presentation (as always) of CornerNet by Yannic Kilcher. The key ideas are the push-pull losses on corner embeddings and corner pooling. The ideas are simple and intuitive but very well executed. The authors include an ablation study for the gain from corner pooling. Their result is competitive with other one-stage approaches: better than YOLOv3 but worse than YOLOv4. They also tested that when ground-truth corner detections are used, their AP almost doubles, which illustrates that corner detection is their main bottleneck.
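To illustrate the push-pull idea, here is a minimal NumPy sketch of pull/push losses on 1-D corner embeddings. The function name, margin value, and exact normalization are assumptions for illustration; see the paper for the precise formulation:

```python
import numpy as np

def corner_embedding_losses(top_left_emb, bottom_right_emb, delta=1.0):
    """Pull/push losses on 1-D corner embeddings, roughly in the spirit of
    the associative-embedding losses used in CornerNet.

    top_left_emb, bottom_right_emb: arrays of shape (N,) holding the
    embedding of each object's two corners.
    """
    centers = (top_left_emb + bottom_right_emb) / 2.0
    n = len(centers)
    # Pull: the two corners of the same object should have similar embeddings.
    pull = np.mean((top_left_emb - centers) ** 2 + (bottom_right_emb - centers) ** 2)
    # Push: embeddings of different objects should be at least `delta` apart.
    diffs = np.abs(centers[:, None] - centers[None, :])
    margin = np.maximum(0.0, delta - diffs)
    np.fill_diagonal(margin, 0.0)
    push = margin.sum() / max(n * (n - 1), 1)
    return pull, push

# Toy example with three objects.
tl = np.array([0.1, 1.0, 2.2])
br = np.array([0.2, 0.9, 2.0])
print(corner_embedding_losses(tl, br))
```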
A wonderful explanation of why $latex q p q^{-1}$ is a rotation of $latex p$, where $latex q$ is most easily interpreted with the complex-number-like structure $latex \cos(\theta/2) + \sin(\theta/2)(a i + b j + c k)$ with $latex a i + b j + c k$ normalized (i.e., $latex a^2+b^2+c^2=1$). Note that $latex q^{-1}$ is then just $latex \cos(\theta/2) - \sin(\theta/2)(a i + b j + c k)$ (since $latex (a i + b j + c k)^2 = -a^2 - b^2 - c^2 = -1$, just like the imaginary unit $latex i$ for complex numbers).
Basically, quaternions can be visualized through a stereographic projection of 4D back to 3D space. When multiplying by a quaternion $latex \cos(\alpha) + \sin(\alpha) {\bf v}$ from the left, points parallel to $latex \bf v$ are translated along $latex \bf v$ and points perpendicular to $latex \bf v$ are rotated around $latex \bf v$ (following the right-hand rule). Similarly, when multiplying from the right by the same quaternion, points parallel to $latex \bf v$ are translated in the same direction, but points perpendicular to $latex \bf v$ are rotated in the opposite direction (following the left-hand rule). So if we multiply by $latex q$ and $latex q^{-1}$ from the left and right respectively, the translation effects cancel exactly and the rotation is doubled. That is why the angle in $latex q$ should be halved.
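As a quick numerical check of the half-angle claim, here is a small NumPy sketch (the function names are mine) that rotates a point via $latex q p q^{-1}$:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(p, axis, theta):
    """Rotate 3D point p by angle theta around a unit axis via q p q^{-1}.
    Note the half angle: the left and right multiplications each contribute
    half of the rotation, as discussed above."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    q = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * axis))
    q_inv = q * np.array([1.0, -1.0, -1.0, -1.0])  # conjugate of a unit quaternion
    p_quat = np.concatenate(([0.0], p))
    return quat_mul(quat_mul(q, p_quat), q_inv)[1:]

# Rotating (1, 0, 0) by 90 degrees around the z-axis gives ~(0, 1, 0).
print(rotate(np.array([1.0, 0.0, 0.0]), axis=[0, 0, 1], theta=np.pi / 2))
```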
The conversation between Jeff Hawkins and Lex Fridman was very interesting. More info here.
Project | Authors | Links
------- | ------- | -----
DDDQN on Gym | Robert Estes | code, report
Open AI Talk | Robert Estes | code
This is old but is more relevant than ever. Everyone should read this.
A great discussion on population-based search, especially how it connects with goal switching and multi-task learning.