a clockwork watch may be ticking normally, such that it’s very hard to tell that it is likely to break down next month, but opening up the watch and looking inside can reveal mechanical weaknesses that allow you to figure it out.
for - AI - progress trap - interpretability testing - deception - Is it possible that AI could even change their node behavior as a deceptive move?




