# Transformer-Pytorch

When building deep learning models, the transformer architecture comes up constantly, in particular multi-head attention (MHA). This repo exists so I don't have to retype everything whenever I want to use MHA in a project. Implemented in PyTorch.
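
Below is a minimal sketch of what an MHA module along these lines can look like. The class name `MultiHeadAttention` and the `d_model` / `num_heads` arguments are illustrative, not necessarily this repo's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttention(nn.Module):
    """Multi-head scaled dot-product attention (illustrative sketch)."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Separate projections for queries, keys, values, and the output.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, mask=None):
        # query/key/value: (batch, seq_len, d_model)
        batch_size = query.size(0)

        def split_heads(x):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return x.view(batch_size, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.q_proj(query))
        k = split_heads(self.k_proj(key))
        v = split_heads(self.v_proj(value))

        # Scaled dot-product attention scores: (batch, heads, seq_q, seq_k)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (self.d_head ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(F.softmax(scores, dim=-1))

        # Weighted sum of values, then merge heads back: (batch, seq_q, d_model)
        out = torch.matmul(attn, v).transpose(1, 2).contiguous()
        out = out.view(batch_size, -1, self.num_heads * self.d_head)
        return self.out_proj(out)


if __name__ == "__main__":
    mha = MultiHeadAttention(d_model=64, num_heads=8)
    x = torch.randn(2, 10, 64)   # (batch, seq_len, d_model)
    y = mha(x, x, x)             # self-attention: same tensor as query, key, and value
    print(y.shape)               # torch.Size([2, 10, 64])
```

PyTorch also ships a built-in `torch.nn.MultiheadAttention`; rolling your own like this is mostly useful when you want to tweak the internals or keep the code easy to read.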