The difference between the predicted and actual values is captured by the loss function and back-propagated into the network.
This is the difference between this approach and our delta LSTMs. Our delta LSTMs treat the target as a word in the vocabulary (because of the one-hot encoding), while this approach treats it as a numerical value and uses the difference between prediction and target to compute the loss.
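
To make the contrast concrete, here is a rough sketch in plain NumPy. It is not taken from either model: the vocabulary values, the logits, and the choice of cross-entropy for the one-hot case and squared error for the numerical case are all assumptions made for illustration. The point it tries to show is that with a one-hot target, a wrong guess that is numerically close costs the same as one that is far away, while a numerical loss scales with the distance.

```python
import numpy as np

def cross_entropy_loss(logits, target_index):
    """Classification view: the target is one 'word' out of a vocabulary."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target_index]

def squared_error_loss(prediction, target_value):
    """Regression view: the target is a number; loss grows with the distance."""
    return (prediction - target_value) ** 2

# Hypothetical vocabulary of delta values (assumed, for illustration only).
vocab = np.array([1.0, 2.0, 3.0, 100.0])
target_index = 1  # the true delta is vocab[1] == 2.0

# Two wrong predictions with equal confidence:
# one favours 3.0 (numerically near), the other favours 100.0 (far).
near_logits = np.array([0.0, 0.0, 5.0, 0.0])
far_logits = np.array([0.0, 0.0, 0.0, 5.0])

ce_near = cross_entropy_loss(near_logits, target_index)
ce_far = cross_entropy_loss(far_logits, target_index)

se_near = squared_error_loss(3.0, vocab[target_index])
se_far = squared_error_loss(100.0, vocab[target_index])

print("cross-entropy, near vs far:", ce_near, ce_far)
print("squared error, near vs far:", se_near, se_far)
```

Under these assumptions the two cross-entropy values come out equal, because the one-hot target carries no notion of how far apart two classes are, whereas the squared errors differ by several orders of magnitude.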