The reason for this low utilization of LSTM RNNs, compared with CNNs, lies in differences in both their high-level structure and the types of dominant layers (operators) used in their computation. From a high-level perspective, the computation graph of an LSTM RNN has a recurrent structure that processes one input at a time, which limits the amount of model parallelism.
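The dependency pattern described above can be illustrated with a minimal Python sketch. The source contains no code, so this is not the authors' method. It assumes a scalar LSTM cell (hidden size 1) with hypothetical, hand-picked weights, and uses a 1-D convolution as a stand-in for a CNN layer. The sketch shows only the data dependencies and makes no claim about hardware performance. Each LSTM step needs `h` and `c` from the previous step. Each convolution output reads only the input, so its positions can be computed in any order.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell (hidden size 1); w is a hypothetical weight dict."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def lstm_forward(xs, w):
    h, c = 0.0, 0.0
    hs = []
    for x in xs:
        # Step t cannot start until step t-1 has produced h and c,
        # so this loop is inherently sequential over time.
        h, c = lstm_step(x, h, c, w)
        hs.append(h)
    return hs


def conv1d_forward(xs, kernel):
    n = len(xs) - len(kernel) + 1
    out = [0.0] * n
    # Each output position reads only the input, never another output,
    # so positions can be computed in any order (deliberately reversed here)
    # or all at once on parallel hardware.
    for t in reversed(range(n)):
        out[t] = sum(k * xs[t + j] for j, k in enumerate(kernel))
    return out


if __name__ == "__main__":
    w = {name: 0.5 for name in
         ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]}
    print(lstm_forward([1.0, 0.0, 0.0], w))
    print(conv1d_forward([1.0, 2.0, 3.0, 4.0], [1.0, 1.0]))
```

Real LSTM layers operate on vectors and weight matrices, so each time step still offers parallelism inside its matrix-vector products. The step-to-step dependency shown in `lstm_forward` remains, however.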
Important part - LSTM's cons