Linformer

Video and paper.

Remarks:

  • Project the embedding to a lower dimension to save computation and memory (see the sketch after this list)
  • Some gain in speed, but it doesn’t look too significant; the tradeoff in performance seems larger than claimed
  • Theorem 1, based on the JL lemma, does not use any property of attention itself. The same argument seems applicable anywhere (not just to attention), so the theorem feels like a bit of a stretch
  • With the same goal of speeding up the transformer, the “kernelized transformer” appears to be the stronger work
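To make the first remark concrete, here is a minimal single-head sketch of the low-rank projection idea (my own illustration, not the paper’s code; the class name, the simple random initialization of the projections, and the default k=64 are assumptions):

```python
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Single-head self-attention with a Linformer-style low-rank projection.

    Keys and values of length seq_len are projected down to k, so the attention
    map is (seq_len x k) instead of (seq_len x seq_len).
    """

    def __init__(self, dim, seq_len, k=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Learned projections E, F: seq_len -> k (simple random init for illustration)
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.scale = dim ** -0.5

    def forward(self, x):                               # x: (batch, seq_len, dim)
        q = self.q(x)
        k, v = self.kv(x).chunk(2, dim=-1)
        k = torch.einsum('ks,bsd->bkd', self.E, k)      # (batch, k, dim)
        v = torch.einsum('ks,bsd->bkd', self.F, v)      # (batch, k, dim)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                                  # cost O(n*k) rather than O(n^2)
```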

SIREN

A comeback of signal processing? A video from the authors and one from Yannic, the arXiv paper, and a nice website. The 3D reconstruction is very convincing. By the way, there is also another video presentation of implicit representations.

Some remarks:

  1. Represent signals as functions instead of multidimensional data (not a completely new idea, as the authors pointed out)
  2. Match not just the signal itself but also its derivatives
  3. Use a sinusoidal activation function, so that the derivative of the network is still well-defined (and is itself still a SIREN); see the sketch after this list
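A minimal sketch of such a layer (my own illustration of the idea, not the authors’ code; omega_0 = 30 and the initialization bounds follow the commonly cited SIREN scheme, and the layer/model names are made up):

```python
import numpy as np
import torch
import torch.nn as nn


class SineLayer(nn.Module):
    """One SIREN-style layer: x -> sin(omega_0 * (W x + b)).

    The derivative w.r.t. the input is omega_0 * W^T cos(...), i.e. a phase-shifted
    sine of the same pre-activation, so gradients of the network remain well-defined
    sinusoidal networks themselves.
    """

    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        # Initialization bounds as suggested for SIREN
        with torch.no_grad():
            bound = 1.0 / in_features if is_first else np.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))


# Usage: an implicit image representation mapping (x, y) coordinates to RGB
siren = nn.Sequential(SineLayer(2, 256, is_first=True),
                      SineLayer(256, 256),
                      nn.Linear(256, 3))
```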


CornerNet

A very nice presentation (as always) of CornerNet by Yannic Kilcher. The key ideas are the push-pull losses on corner embeddings and corner pooling (see the sketch below). The ideas are simple and intuitive but very well executed. The authors include an ablation study for the gain from corner pooling. Their result is competitive with other one-stage approaches, better than YOLOv3 but worse than YOLOv4. They also show that when ground-truth corner detections are used, their AP almost doubles, which illustrates that corner detection is their main bottleneck.
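As a rough illustration of corner pooling (not the authors’ CUDA implementation; just my own cumulative-max sketch for the top-left case):

```python
import torch


def top_left_corner_pool(x):
    """Corner pooling for top-left corners on a feature map x of shape (B, C, H, W).

    Each location takes the running max over everything to its right and everything
    below it, then the two maps are summed, so a top-left corner can "see" the
    object extending toward its bottom-right.
    """
    # Max over all columns at or to the right: flip, cumulative max, flip back
    right_max = torch.flip(torch.cummax(torch.flip(x, dims=[3]), dim=3).values, dims=[3])
    # Max over all rows at or below
    bottom_max = torch.flip(torch.cummax(torch.flip(x, dims=[2]), dim=2).values, dims=[2])
    return right_max + bottom_max
```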

Quaternion explanation

A wonderful explanation of why $latex q p q^{-1}$ is a rotation of $latex p$, where $latex q$ is most easily interpreted through the complex-number-like structure $latex \cos(\theta/2) + \sin(\theta/2)(a i + b j + c k)$ with $latex a i + b j + c k$ normalized (i.e., $latex a^2+b^2+c^2=1$). Note that $latex q^{-1}$ is then just $latex \cos(\theta/2) - \sin(\theta/2)(a i + b j + c k)$, since $latex (a i + b j + c k)^2 = -a^2 - b^2 - c^2 = -1$, just like the imaginary unit $latex i$ for complex numbers.
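To spell out the inverse claim, write $latex u = a i + b j + c k$ so that $latex u^2 = -1$; then $latex q \bar{q} = (\cos(\theta/2) + \sin(\theta/2)\, u)(\cos(\theta/2) - \sin(\theta/2)\, u) = \cos^2(\theta/2) - \sin^2(\theta/2)\, u^2 = \cos^2(\theta/2) + \sin^2(\theta/2) = 1$, so the conjugate is indeed the inverse for a unit quaternion.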

Basically, quaternions act on 4D space, which we can visualize by stereographically projecting back to 3D space. When multiplying by a quaternion $latex \cos(\alpha) + \sin(\alpha) {\bf v}$ from the left, points parallel to $latex {\bf v}$ are translated along $latex {\bf v}$, while points perpendicular to $latex {\bf v}$ rotate around $latex {\bf v}$ (following the right-hand rule). Similarly, when multiplying from the right by the same quaternion, points parallel to $latex {\bf v}$ are translated in the same direction, but points perpendicular to $latex {\bf v}$ are rotated in the opposite direction (following the left-hand rule). So if we multiply by $latex q$ and $latex q^{-1}$ from the left and right respectively, the translation effects cancel exactly and the rotation doubles. That is why the angle in $latex q$ should be halved.
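For a quick numerical sanity check of the $latex q p q^{-1}$ recipe with the half angle, here is a small sketch (the function names are mine; it uses the standard Hamilton product):

```python
import numpy as np


def quat_mult(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])


def rotate(p, axis, theta):
    """Rotate a 3D point p by angle theta around a unit axis via q p q^{-1}.

    Note the half angle: q = cos(theta/2) + sin(theta/2) * (a i + b j + c k).
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    q = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * axis))
    q_inv = q * np.array([1.0, -1.0, -1.0, -1.0])   # conjugate = inverse for a unit quaternion
    p_quat = np.concatenate(([0.0], p))              # embed p as a pure quaternion
    return quat_mult(quat_mult(q, p_quat), q_inv)[1:]


# Example: rotating (1, 0, 0) by 90 degrees about the z-axis gives approximately (0, 1, 0)
print(rotate(np.array([1.0, 0.0, 0.0]), [0, 0, 1], np.pi / 2))
```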
