the LSTM are additive with respect to time, alleviating the gradient vanishing problem. Gradient exploding is still an issue, though in practice simple optimization strategies (such as gradient clipping) work well.
How is this problem of vanishing or exploding gradient related to eigenvalues of the W operator? Is there any research on this?
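(Not part of the original question — a minimal numerical sketch of the standard argument, studied in Bengio et al. (1994), "Learning long-term dependencies with gradient descent is difficult," and Pascanu et al. (2013), "On the difficulty of training recurrent neural networks.") The idea: backpropagation through time multiplies the gradient by W^T once per time step, so over T steps its norm behaves roughly like rho(W)^T, where rho(W) is the spectral radius (largest eigenvalue magnitude) of the recurrent matrix W. The toy below assumes a linear RNN h_t = W h_{t-1} (nonlinearities only shrink the factors further); the function name and the choices of T and n are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50   # number of time steps to backpropagate through
n = 10   # hidden size

def gradient_norm_through_time(spectral_radius):
    # Random recurrent matrix rescaled so its largest |eigenvalue| equals spectral_radius.
    W = rng.standard_normal((n, n))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    # Linear RNN h_t = W h_{t-1}: each backward step multiplies the
    # upstream gradient dL/dh_t by W^T, so after T steps the norm
    # scales roughly like spectral_radius ** T.
    grad = np.ones(n)  # some arbitrary upstream gradient dL/dh_T
    for _ in range(T):
        grad = W.T @ grad  # one step of backpropagation through time
    return np.linalg.norm(grad)

for rho in (0.9, 1.0, 1.1):
    print(f"spectral radius {rho}: |grad| after {T} steps = "
          f"{gradient_norm_through_time(rho):.3e}")
```

Running this, the gradient norm shrinks toward zero when the spectral radius is below 1 (vanishing) and blows up when it is above 1 (exploding), which is exactly the eigenvalue connection the question asks about; gradient clipping addresses only the exploding side.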