If you want a problem that is fairly easy for humans but hard for LLMs, it should have a solution that requires iteratively applying the same steps a few times, perhaps conditionally. I predict that LLMs, even with chain-of-thought, will drop the ball after just a few iterations.
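To make that concrete, here is a minimal sketch of the kind of task I mean (the specific rule is just an arbitrary example I picked, not anything special): a short conditional rule applied over and over. A human can trace it by hand, but every extra iteration is another chance for a model to lose the state.

```python
def run_steps(x: int, n: int) -> list[int]:
    """Apply a simple conditional rule n times and return every intermediate value."""
    trace = [x]
    for _ in range(n):
        # Conditional step: halve if even, otherwise triple and add one.
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        trace.append(x)
    return trace

if __name__ == "__main__":
    # A prompt built from this could read:
    # "Start with 7. Repeat 6 times: if the number is even, halve it;
    #  otherwise multiply it by 3 and add 1. What is the final number?"
    print(run_steps(7, 6))  # [7, 22, 11, 34, 17, 52, 26] -> answer: 26
```

The nice thing about this shape of problem is that you can dial the difficulty by just increasing the number of iterations, without making any single step harder.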