Okay, now try typing on a smartphone keyboard that animates an enlarged bubble for every key the user taps.
The user thinks 'type, "hello"' and muscle memory flits between 'h', 'e', 'l', 'l', 'o' with far less time between each letter than the user's reaction time. If they mistype a letter and want to correct it, they'll probably continue for one or two more strokes and then either navigate back or tap backspace several times.
I originally had "except in cases where the user can plausibly predict the change" in my comment, but deleted it for being verbose. Yes, of course this UI guideline is not a hard rule. (Also, in the case of typing it's actually not an issue. The bubble reveals the key's label, but you could never tap the bubbles, so the functionality of the button didn't change. A better example is a moving target in a video game.)