the LSTM are additive with respect to time, alleviating the gradient vanishing problem. Gradient exploding is still an issue, though in practice simple optimization strategies (such as gradient clipping) work well.
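As a rough illustration of the gradient clipping mentioned above, here is a minimal sketch of clipping by global L2 norm. The function name `clip_by_global_norm` and the `max_norm` parameter are hypothetical names chosen for this example, and gradients are represented as a flat list of floats rather than framework tensors.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm does not exceed max_norm.

    `grads` is a flat list of floats (an assumption for this sketch).
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Small gradients pass through unchanged.
        return list(grads), total_norm
    # Large gradients keep their direction but are shrunk to length max_norm.
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm_before, clipped)
```

Only the length of the gradient vector changes, not its direction, which is one reason this simple strategy tends to be sufficient against exploding gradients.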
How is this problem of vanishing or exploding gradients related to the eigenvalues of the W operator? Is there any research on this?
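One way to see the connection is a toy case, sketched here under a strong simplifying assumption: a purely linear recurrence h_t = W h_{t-1} with no nonlinearity, so that backpropagating through t steps multiplies the gradient by the transpose of W a total of t times. The function name `backprop_norms` and the choice of W as a scaled 2x2 rotation (whose eigenvalues both have modulus `rho`) are illustrative choices, not taken from the discussion.

```python
import math

def backprop_norms(rho, theta, steps):
    """Track the gradient norm while repeatedly multiplying by W^T.

    W = rho * rotation(theta), so both eigenvalues have modulus rho.
    """
    c, s = math.cos(theta), math.sin(theta)
    w = [[rho * c, -rho * s],
         [rho * s,  rho * c]]
    g = [1.0, 0.0]  # gradient arriving at the last time step
    norms = []
    for _ in range(steps):
        # One step back in time: g <- W^T g
        g = [w[0][0] * g[0] + w[1][0] * g[1],
             w[0][1] * g[0] + w[1][1] * g[1]]
        norms.append(math.hypot(g[0], g[1]))
    return norms

shrinking = backprop_norms(rho=0.9, theta=0.3, steps=50)
growing = backprop_norms(rho=1.1, theta=0.3, steps=50)
print(shrinking[-1], growing[-1])
```

In this toy setting the norm scales as `rho ** steps`: eigenvalue moduli below 1 make the gradient vanish geometrically, and moduli above 1 make it explode. With nonlinear activations the per-step Jacobian also includes the activation derivative, so this is only a guide to the behaviour of a real network.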
The mathematics of almost all eigenvalue problems encountered in wave physics is essentially the same, but the richest source of such problems is quantum mechanics, where the eigenvalues are the energies of stationary states ("levels"), rather than frequencies as in acoustics or optics, and the operator is the Hamiltonian.
