Transformers are RNNs (paper) and (video):
Remarks:
- The kernelized transformer does not quite match the original transformer's accuracy, but it can be much faster (up to 1000x for some applications).
- The kernelized transformer can be formulated as an RNN, which further speeds up autoregressive inference: each step only updates a fixed-size state instead of attending over the whole history.
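
The two views above can be sketched in NumPy: a causal linear-attention pass over the full sequence, and the equivalent RNN formulation that carries only a fixed-size running state. This is a minimal sketch assuming the elu(x)+1 feature map from the paper; the function names and shapes are illustrative, not from any library.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, the positive feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    # Full-sequence causal linear attention in O(N * d * d_v)
    # (cumulative sums play the role of the causal mask).
    Qp, Kp = feature_map(Q), feature_map(K)
    S = np.cumsum(Kp[:, :, None] * V[:, None, :], axis=0)  # (N, d, d_v)
    z = np.cumsum(Kp, axis=0)                              # (N, d)
    num = np.einsum('nd,ndv->nv', Qp, S)
    den = np.einsum('nd,nd->n', Qp, z)
    return num / den[:, None]

def rnn_attention(Q, K, V):
    # Same computation expressed as an RNN: one timestep at a time,
    # carrying only the state (S, z) -- constant memory per step.
    N, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    z = np.zeros(d)
    out = np.zeros_like(V)
    for t in range(N):
        q, k, v = feature_map(Q[t]), feature_map(K[t]), V[t]
        S += np.outer(k, v)  # accumulate key-value summary
        z += k               # accumulate normalizer
        out[t] = (q @ S) / (q @ z)
    return out
```

Both functions produce identical outputs; the RNN form is what makes constant-memory autoregressive generation possible, since the state (S, z) replaces the growing key/value cache of a standard transformer.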