A limitation of the present work is that we assume preferences are fixed, though they are likely to be updated dynamically throughout the task. Future work should aim to characterize preference-update mechanisms by dynamically changing both the reward and transition-probability structure of the task, either gradually or chunk-wise (i.e., in separate experimental blocks with different task structures).
Let's try to make the limitations not too extensive. LLMs (when reviewing) actually pick up on these and present them as independent evidence that something's wrong with the paper.