Is context drift worse in Claude than in ChatGPT?
Claude generally handles long contexts better than most models. But the attention drift phenomenon affects all transformer models. No current model maintains perfectly uniform attention across 200K tokens.