The context window is one of the most important practical limits of LLMs and agents. It explains why an AI assistant may seem smart at the beginning of a task, then gradually become less reliable as the conversation gets longer.
Key concepts
Context window refers to the maximum amount of “information” an LLM can process in a single conversation (we will talk about what “information” means here in a sec). Here, “window” means “window of visibility”: the LLM can’t work reliably once the amount of information exceeds it. The context window is measured in “tokens”, so we first need to quickly understand what those are.
You may have seen this word floating around. A token is a unit of language, just like a “word” or a “letter”. For practical purposes, you can think of tokens as “sub-words”. For example, the phrase “Readiness coach”, when given to an LLM, is automatically broken down into 3 tokens: “Read”, “iness”, and “coach”, and those 3 tokens are what the LLM actually sees.
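If you want to see this in action, OpenAI’s open-source `tiktoken` library lets you tokenize text yourself. A quick sketch (the exact split depends on the tokenizer, so the pieces you get may differ from the example above):

```python
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many OpenAI models

tokens = enc.encode("Readiness coach")
print(len(tokens))                        # how many tokens the phrase becomes
print([enc.decode([t]) for t in tokens])  # the sub-word pieces themselves
```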
This is also what people mean when they say “ah man I used so many tokens today”: tokens are the unit of usage for LLMs, and they are how LLM providers charge you. On pricing pages such as this one, you would see e.g. “Price per 1M input tokens = $5, price per 1M output tokens = $30”. Here, “input tokens” means tokens the LLM receives (e.g. messages you send), and “output tokens” means tokens the LLM generates. Output tokens are typically more expensive than input tokens. So sending 1 million tokens of input would cost $5, and receiving 1 million tokens of output would cost $30.
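To make the arithmetic concrete, here is a tiny cost calculator using the example prices above (real prices vary by model and provider, so check the actual pricing page):

```python
# Example prices from the text above: $5 per 1M input tokens, $30 per 1M output tokens.
PRICE_PER_1M_INPUT = 5.00
PRICE_PER_1M_OUTPUT = 30.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_PER_1M_INPUT + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

# e.g. a conversation that sent 200k tokens and got back 10k tokens:
print(f"${cost_usd(200_000, 10_000):.2f}")  # $1.30
```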
Great. Back to the context window.
You will see phrases like “GPT-5.5 is a model with a 1 million token context window” (GPT-5.5 page). This means the model is designed to hold at most 1 million tokens in a single conversation: roughly 4 million characters, 750,000 words, or ~2,000 pages of text.
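Those conversions come from common rules of thumb (about 4 characters, or about 0.75 English words, per token). A quick back-of-the-envelope check:

```python
# Rough conversions using the ~4 chars/token and ~0.75 words/token rules of thumb.
context_window = 1_000_000  # tokens

chars = context_window * 4          # ~4,000,000 characters
words = int(context_window * 0.75)  # ~750,000 words
pages = words // 375                # ~2,000 pages at ~375 words/page
print(chars, words, pages)
```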
So what happens if a conversation exceeds that limit? Practically, the model provider (e.g. OpenAI) will prevent you from continuing, typically by returning a “token limit exceeded” error.
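For example, with OpenAI’s Python SDK, an over-the-limit request comes back as an API error you can catch. A rough sketch (the exact error class and wording vary by provider and SDK version, and `very_long_conversation` here is just a stand-in for a history that exceeds the window):

```python
# Requires: pip install openai (and OPENAI_API_KEY set in the environment).
from openai import OpenAI, BadRequestError

client = OpenAI()

very_long_conversation = [
    {"role": "user", "content": "..."},  # imagine hundreds of long messages here
]

try:
    response = client.chat.completions.create(
        model="gpt-4o",                   # any chat model; shown for illustration
        messages=very_long_conversation,  # assume this exceeds the context window
    )
except BadRequestError as e:
    # In recent SDK versions, exceeding the window surfaces as a 400 error.
    # Typical recovery: drop or summarize older messages, then retry.
    print("Request rejected, likely over the token limit:", e)
```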
Hang on, 1 million tokens is a LOT of text, right? So why would one ever need to switch to a new conversation before hitting the 1 million limit? The main reason is model intelligence. A key property of LLMs is that, in general, the more tokens consumed in a conversation, the “dumber” the model becomes.
This kind of makes sense: the more tokens in a conversation, the more likely the LLM is to be “distracted” by earlier, possibly irrelevant information.
While 1 million tokens is the upper limit, studies have shown that model intelligence starts dropping once a conversation uses up ~50,000 tokens. The exact point where intelligence starts dropping depends on the model and the specific use case; 50k–100k tokens is a ballpark.
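If you want an early warning rather than a hard failure, you can count the tokens in your conversation yourself. A rough sketch using `tiktoken`, with the ~50k soft budget above as an assumed, tunable threshold:

```python
import tiktoken

SOFT_BUDGET = 50_000  # assumed threshold; tune per model and use case
enc = tiktoken.get_encoding("cl100k_base")

def conversation_tokens(messages: list[dict]) -> int:
    # Approximate: counts message text only, ignoring per-message formatting overhead.
    return sum(len(enc.encode(m["content"])) for m in messages)

messages = [{"role": "user", "content": "..."}]  # your chat history
if conversation_tokens(messages) > SOFT_BUDGET:
    print("Consider summarizing and starting a fresh conversation.")
```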
Yes, it can be annoying to re-explain the context to the LLM/agent in a new conversation. This is one of the key limitations of LLMs today. One practical approach: ask the LLM to summarize your existing long conversation, then paste that summary into your new conversation.
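Here is what that summarize-and-carry-over trick might look like with OpenAI’s Python SDK (`old_messages` stands in for your long conversation, and the prompt wording is just illustrative):

```python
from openai import OpenAI

client = OpenAI()

old_messages = [{"role": "user", "content": "..."}]  # your long conversation history

# Flatten the history into a transcript and ask the model to compress it.
transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old_messages)
summary = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Summarize this conversation so I can continue it in a new chat. "
                   "Keep key facts and decisions:\n\n" + transcript,
    }],
).choices[0].message.content

# Seed the new conversation with the summary instead of the full history.
new_messages = [{"role": "user", "content": "Context from a previous chat:\n" + summary}]
```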
Note: a model with a smaller context window is not necessarily a worse model. It’s like how a person’s ability to memorize things is not necessarily related to how intelligent they are.