AI context window explained

An AI context window is the amount of text a model can hold in active working memory at once, measured in tokens. Everything in the current conversation (your messages, the model's replies, and any files you pasted) has to fit inside it. Understanding the context window explains almost every "the AI forgot what I said" problem, and it points to the fix.

What a token is, in plain terms

A token is a chunk of text, very roughly three quarters of a word in English. So a 200,000-token window holds around 150,000 words. The whole conversation counts against it, not just your latest message, which is why long sessions fill up even when each message is short. Window sizes vary widely between models: GPT-3.5 handles around 16,000 tokens, GPT-4o around 128,000, Claude 200,000, and Gemini up to 1 million.
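The arithmetic above can be sketched in a few lines of Python. This is only a rough estimator built on the three-quarters-of-a-word rule of thumb, not a real tokenizer; the names WORDS_PER_TOKEN, estimate_tokens, conversation_tokens, and words_that_fit are made up for this illustration, and actual counts depend on the model.

```python
import math

# Assumption: the rule of thumb from the text, one token is about 3/4 of an English word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate tokens from the whitespace-separated word count (rough, not a tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def conversation_tokens(messages: list[str]) -> int:
    """Every message in the thread counts against the window, not only the latest one."""
    return sum(estimate_tokens(message) for message in messages)


def words_that_fit(window_tokens: int) -> int:
    """Approximate number of English words a window of the given size can hold."""
    return int(window_tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(words_that_fit(200_000))  # 150000, matching the figure in the text
```

Summing over every message is the point of conversation_tokens: many short messages add up the same way one long message does.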

What happens when the window fills

One of two things happens, and they feel similar. The first is a hard limit: the conversation cannot accept more input, and the tool tells you so. The second, and far more common, is silent dropping or drift, where the oldest messages get pushed out of the window or simply receive less of the model's attention. You are not warned. Responses just start ignoring things you established early on.
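To make silent dropping concrete, here is a minimal sketch of one way a chat tool might trim a conversation to fit the window: keep the newest messages and discard the oldest. This is an assumption for illustration only. Real products differ in how they trim, and fit_to_window and its count_tokens parameter are hypothetical names, not any vendor's API.

```python
from collections import deque
from typing import Callable


def fit_to_window(
    messages: list[str],
    window_tokens: int,
    count_tokens: Callable[[str], int],
) -> tuple[list[str], list[str]]:
    """Keep the most recent messages that fit in the window.

    Returns (kept, dropped). Nothing signals the drop to the user,
    which is why it feels like the model "forgot".
    """
    kept: deque[str] = deque()
    used = 0
    # Walk backwards from the newest message towards the oldest.
    for index in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[index])
        if used + cost > window_tokens:
            # This message and everything older no longer fits.
            return list(kept), messages[: index + 1]
        kept.appendleft(messages[index])
        used += cost
    return list(kept), []


if __name__ == "__main__":
    def count_words(text: str) -> int:
        # Stand-in counter for the demo: one token per word.
        return len(text.split())

    thread = ["my name is Ada", "short reply", "another short reply here"]
    kept, dropped = fit_to_window(thread, window_tokens=6, count_tokens=count_words)
    print(dropped)  # ['my name is Ada']
```

In the demo, the instruction established first is the one that falls out of the window, which mirrors the "ignoring things you established early on" symptom.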

How a clean handover fixes it

You cannot expand the window, but you can decide what goes into it. A good handover summary keeps the important information small, strips out repeated back-and-forth, preserves the decisions and outputs that matter, and gives the next chat a focused starting point instead of a bloated transcript. That is the difference between a new chat that picks up smoothly and one that has lost the plot.
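As a rough illustration of what such a handover might look like in structured form, the sketch below collects the goal, decisions, outputs, and open questions, removes repeats, and renders a compact block to paste into a new chat. The Handover class and its field names are hypothetical choices for this example, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Handover:
    """Hypothetical container for the parts of a chat worth carrying forward."""

    goal: str
    decisions: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render a compact summary, skipping empty sections and repeated items."""
        lines = [f"Goal: {self.goal}"]
        sections = (
            ("Decisions", self.decisions),
            ("Outputs", self.outputs),
            ("Open questions", self.open_questions),
        )
        for title, items in sections:
            unique = list(dict.fromkeys(items))  # drop repeats, keep first-seen order
            if unique:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in unique)
        return "\n".join(lines)


if __name__ == "__main__":
    summary = Handover(
        goal="Launch page copy",
        decisions=["Use British spelling", "Use British spelling"],
        outputs=["Headline v3"],
    )
    print(summary.render())
```

The design choice is to carry forward conclusions rather than the dialogue that produced them, so the new chat spends its tokens on what was decided instead of how the decision was reached.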

Frequently asked questions

Does a bigger context window mean I will never lose context?

No. A bigger window delays the hard limit, but attention drift still affects very long conversations, even in a window as large as Gemini's 1 million tokens. The practical fix is the same: summarise and start fresh when a thread gets unwieldy.

Do files and images use up the context window?

Yes. Pasted documents, long code, images, and other attachments all consume tokens, sometimes taking a large share of the window before the conversation even gets going. That is why uploading a big PDF can make a model lose track quickly.