Transformers are RNNs (paper) and (video):
Remarks:
- The kernelized transformer does not quite match the original transformer's accuracy, but it can be much faster (up to 1000x for some applications).
- The kernelized transformer can be formulated as an RNN, which further speeds up autoregressive inference: each step only updates a fixed-size state instead of attending over the whole history.
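
The two views above can be sketched in NumPy: a causal linear-attention pass over the full sequence, and the equivalent RNN formulation that carries only a fixed-size running state. This is a minimal sketch assuming the elu(x)+1 feature map from the paper; the function names and shapes are illustrative, not from any library.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, the positive feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    # Full-sequence causal linear attention in O(N * d * d_v)
    # (cumulative sums play the role of the causal mask).
    Qp, Kp = feature_map(Q), feature_map(K)
    S = np.cumsum(Kp[:, :, None] * V[:, None, :], axis=0)  # (N, d, d_v)
    z = np.cumsum(Kp, axis=0)                              # (N, d)
    num = np.einsum('nd,ndv->nv', Qp, S)
    den = np.einsum('nd,nd->n', Qp, z)
    return num / den[:, None]

def rnn_attention(Q, K, V):
    # Same computation expressed as an RNN: one timestep at a time,
    # carrying only the state (S, z) -- constant memory per step.
    N, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    z = np.zeros(d)
    out = np.zeros_like(V)
    for t in range(N):
        q, k, v = feature_map(Q[t]), feature_map(K[t]), V[t]
        S += np.outer(k, v)  # accumulate key-value summary
        z += k               # accumulate normalizer
        out[t] = (q @ S) / (q @ z)
    return out
```

Both functions produce identical outputs; the RNN form is what makes constant-memory autoregressive generation possible, since the state (S, z) replaces the growing key/value cache of a standard transformer.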