- An encoder LSTM turns the input sequences into two state vectors (we keep the last LSTM state and discard the outputs).
- A decoder LSTM is trained to turn the target sequences into the same sequence but ...
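
The encoder's role described above (keep the final state, discard the per-step outputs) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the helper names `init_lstm`, `lstm_step`, and `encode`, the random weights, and the toy dimensions are hypothetical and are not taken from the original code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, seed=0):
    """Build small random weights for the four LSTM gates (illustrative only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: {"W": matrix(hidden_dim, input_dim + hidden_dim),
               "b": [0.0] * hidden_dim}
        for gate in ("i", "f", "g", "o")
    }


def lstm_step(params, x, h, c):
    """Advance the LSTM by one time step and return the new (h, c)."""
    z = x + h  # list concatenation of the input and the previous hidden state

    def affine(gate):
        p = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(p["W"], p["b"])]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell values
    o = [sigmoid(a) for a in affine("o")]    # output gate
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(params, sequence, hidden_dim):
    """Run the encoder over a sequence and keep only the final state pair."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        # Per-step outputs are overwritten, i.e. discarded.
        h, c = lstm_step(params, x, h, c)
    return h, c  # the two state vectors handed to the decoder


params = init_lstm(input_dim=3, hidden_dim=4)
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state_h, state_c = encode(params, sequence, hidden_dim=4)
print(len(state_h), len(state_c))
```

In Keras, the same idea is usually expressed by creating the `LSTM` layer with `return_state=True` and keeping only the two returned state tensors.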
(§3.1) independently, printing every shape so a student can verify that the dimensions line up with Fig. 1 of the paper.

    encoder = Encoder(num_layers, d_model, num_heads, d_ff, dropout=0.0)
    decoder = ...
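
As a companion to the shape printing described above, the following sketch computes the shapes one would expect at each stage, assuming the standard Transformer encoder-decoder layout with multi-head attention. It does not call the `Encoder` or `Decoder` classes, whose interfaces are not shown here. The function name `trace_shapes`, the stage labels, and the numeric values are hypothetical choices for illustration.

```python
def trace_shapes(batch, src_len, tgt_len, d_model, num_heads, d_ff):
    """Return (stage, expected shape) pairs for one encoder-decoder pass.

    Assumes each head works on d_model // num_heads features and that
    every sub-layer maps back to d_model.
    """
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    return [
        ("encoder input", (batch, src_len, d_model)),
        ("encoder Q/K/V per head", (batch, num_heads, src_len, d_head)),
        ("encoder self-attention scores", (batch, num_heads, src_len, src_len)),
        ("encoder concatenated heads", (batch, src_len, d_model)),
        ("encoder feed-forward hidden", (batch, src_len, d_ff)),
        ("encoder output", (batch, src_len, d_model)),
        ("decoder input", (batch, tgt_len, d_model)),
        ("decoder self-attention scores", (batch, num_heads, tgt_len, tgt_len)),
        ("cross-attention scores", (batch, num_heads, tgt_len, src_len)),
        ("decoder output", (batch, tgt_len, d_model)),
    ]


# Illustrative hyperparameters; substitute the values used in your own run.
shapes = trace_shapes(batch=2, src_len=5, tgt_len=7,
                      d_model=512, num_heads=8, d_ff=2048)
for name, shape in shapes:
    print(f"{name:32s} {shape}")
```

Comparing these expected tuples against the shapes printed by the real modules makes a mismatch (for example, a swapped source and target length in cross-attention) easy to spot.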