Why does OpenClaw fill up context so fast even on simple tasks?

You sent your agent a simple, direct request. Short conversation, nothing complicated. But it is already running slowly, warning you about context, or compacting and losing track of what you were doing. The problem is almost never the conversation itself. It is everything that loads before you type the first message. This article shows you exactly what is consuming your context budget, how to measure the breakdown precisely, and which changes recover the most space, in the right order.

TL;DR

Context fills fast because your workspace files, tool definitions, and memory files load before every single turn, not just at session start. In a typical setup with several plugins and a well-developed memory file, more than half the context window can already be consumed before you say anything. The three highest-impact fixes: prune your memory file, disable plugins you are not actively using, and check whether your contextTokens ceiling is set lower than the model actually supports.

Every indented block in this article is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal, edit any files manually, or navigate any filesystem.

What a context window actually is

Your agent can only hold a certain amount of information in active memory at one time. That limit is the context window. Think of it as a fixed-size whiteboard. Everything the agent needs to work with has to fit on that whiteboard at the moment it generates a response. When the whiteboard fills up, the oldest content gets erased to make room. That erasure is compaction.

The important thing to understand is that the whiteboard is not empty when you start a conversation. Your agent’s instruction files, your workspace files, the descriptions of every tool it can use, and any memory recall results all load onto the whiteboard before your first message. By the time you type anything, you may already be using half your context budget.

    At the start of this session, before we have done much work, what does my current context usage look like? How many tokens are already loaded, and what is taking up the most space? Break it down by category: system prompt, workspace files, tool definitions, conversation history.

What loads on every single turn

One of the most misunderstood things about context in OpenClaw is that workspace files do not load once at session start and then disappear. They load on every turn. Every time your agent generates a response, it sends the full context to the model again: system prompt, workspace files, tool definitions, conversation history so far. The workspace files are part of every single API call.

This means the overhead is not a one-time cost. It is a per-turn cost. A workspace file that consumes 5,000 tokens costs 5,000 tokens on turn 1, another 5,000 on turn 2, another 5,000 on turn 3, and so on. That overhead compounds directly into your API costs and into how quickly the context window fills.
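A quick sketch of how that per-turn cost compounds. The 5,000-token figure and the turn count are illustrative, not measured from any real setup:

```python
# Constant overhead is resent on every turn, so total input tokens
# grow linearly with turn count rather than being a one-time cost.
def cumulative_overhead(tokens_per_turn: int, turns: int) -> int:
    return tokens_per_turn * turns

# A 5,000-token workspace file over a 20-turn session:
total = cumulative_overhead(5_000, 20)
print(total)  # 100000 tokens of input spent on that one file alone
```

The same arithmetic applies to every other constant contributor: memory file, tool definitions, system prompt. Each one multiplies by the number of turns.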

    List all the files in my workspace that load automatically at the start of every session. For each one, tell me approximately how many tokens it contributes to my context on every turn. Then add up the total constant overhead before any conversation history.

The four biggest contributors, ranked by typical impact

1. The memory file

If you have been using OpenClaw for more than a few weeks and have memory enabled, your memory file is almost certainly the single largest source of context overhead. Every fact, preference, note, and instruction that was ever stored is still there unless it has been explicitly removed. Memory files grow without bounds by default. A memory file from a well-used agent can be 3,000-10,000 tokens or more, all of which loads on every turn.

The compounding problem is that most of what is in a memory file from two months ago is no longer relevant. Your priorities change. Your setup changes. Instructions that mattered in March are stale by May. But they are still consuming context every single turn because no one pruned them.

    Read my memory file. Which entries are older than 30 days? Of those, which ones are still relevant to how I currently use you, and which are outdated? Do not delete anything yet. Show me the candidates for removal with a brief reason why each one is probably stale.

Review before deleting

Memory files often contain things that feel outdated but still carry useful context. An old infrastructure note might seem irrelevant but could contain a credential or config value your agent still needs. Review each candidate before removing it. A five-minute review is faster than debugging a problem caused by removing something that was still load-bearing.

2. Tool definitions from unused plugins

Every plugin you have installed and enabled adds tool definitions to your context on every turn. A plugin with five tools might add 1,000-3,000 tokens per call. If you have several plugins enabled and only actively use two or three of them, you are paying a constant context overhead tax for the unused ones.

This is one of the easiest wins available. Disabling a plugin removes its tool definitions from every future call immediately. No other changes required.
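To make the "tax" concrete, here is a rough tally of per-turn tokens paid for plugins you no longer use. The plugin names and token counts are made up for the example, not real OpenClaw figures:

```python
# Approximate per-turn token cost of each enabled plugin's tool
# definitions (illustrative numbers), and which ones you actually use.
plugins = {"web-search": 2_400, "calendar": 1_100, "image-gen": 1_800, "crypto-prices": 900}
active = {"web-search"}

# Tokens paid on every single turn for plugins that do nothing for you.
wasted = sum(tokens for name, tokens in plugins.items() if name not in active)
print(wasted)  # 3800 tokens per turn of pure overhead
```

Disabling the three unused plugins in this sketch would cut 3,800 tokens from every future call, with no content at risk.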

    List every plugin I currently have enabled. For each one, tell me: when was the last time I actually used a tool from this plugin? Which plugins have I not touched in two weeks or more? I want to identify the ones that are pure overhead.

3. Bloated workspace files

SOUL.md, AGENTS.md, INFRASTRUCTURE.md, USER.md: the configuration files that define your agent’s behavior and context. These are valuable. They are also often significantly longer than they need to be, accumulating redundant instructions, outdated sections, detailed documentation of things that no longer apply, and notes written for human reference that do not help the agent do its job.

A workspace file review is worth doing every few months. The question for each section is not “is this correct?” but “does the agent need this to do its job?” Content that is correct but unnecessary is still overhead.

    Read my workspace configuration files (SOUL.md, AGENTS.md, and any other files that load every session). Which sections are redundant, outdated, or unlikely to affect your actual behavior? I am not asking you to delete anything. I want to understand where the fat is before I decide what to trim.

4. Large tool outputs earlier in the session

When your agent uses a tool and gets a result, that result goes into the conversation history and stays there for the rest of the session. A web search that returns 5,000 tokens of content. A file read of a long document. A memory recall that matches 50 entries. Each one adds to the running total, and none of it comes out until compaction fires.

This category is harder to control proactively because it depends on what you ask for. The lever here is awareness: if you know you are about to ask for something that will return a lot of content, consider whether you need all of it or whether you can ask for a summary instead.

    Over the last few turns in this session, which tool results added the most tokens to our conversation history? Are any of them results I no longer need access to for what we are currently working on?

The context window ceiling setting

OpenClaw has a contextTokens setting that controls how large the context window is allowed to grow before compaction fires. If this is set too low, your agent compacts aggressively even when the model you are using supports a much larger window.

The default contextTokens value in older OpenClaw versions was 16,000 tokens. Claude Sonnet supports 200,000 tokens. If your contextTokens is set to 16k on a model that supports 200k, you are artificially triggering compaction at 8% of the model’s actual capacity.
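The 8% figure falls out of a one-line calculation, sketched here so you can plug in your own ceiling and model:

```python
def ceiling_utilization(context_tokens: int, model_max: int) -> float:
    """Fraction of the model's real window your compaction ceiling allows."""
    return context_tokens / model_max

# The old 16k default ceiling on a 200k-token model:
print(f"{ceiling_utilization(16_000, 200_000):.0%}")  # 8%
```

Anything well below 100% here means compaction fires long before the model actually runs out of room.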

    What is my current contextTokens setting? What model am I using, and what is that model’s maximum context window? Am I setting contextTokens lower than the model’s actual capacity? If so, what should it be set to?

The /new rule after config changes

OpenClaw caches the contextTokens setting in the session entry when a session first starts. Changing contextTokens in your config and then checking the result in the same session will show the old value, not the new one. After any change to contextTokens or compaction settings, start a new session with /new before verifying that the change took effect. This is a common source of confusion: the config is correct but the running session still behaves as if it is not.

Measuring your constant overhead precisely

Before pruning anything, get a precise measurement of your current overhead. This gives you a baseline so you can verify that your changes actually had impact.

    Start a fresh session and immediately (before we discuss anything else) tell me: how many tokens are currently loaded in context? Break it down by: system prompt, each workspace file by name, tool definitions total, and conversation history. I want the numbers so I can use them as a baseline.

Write down or save those numbers. After you make changes (prune the memory file, disable unused plugins, trim workspace files), start another fresh session and run the same check. The difference is the overhead you recovered.
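The before/after comparison is simple enough to sketch. The category names and token counts below are illustrative placeholders, not a required format:

```python
# Two fresh-session baseline measurements (tokens by category),
# taken before and after a round of pruning. Numbers are illustrative.
before = {"system_prompt": 2_000, "workspace": 9_500, "tools": 6_000, "memory": 7_500}
after  = {"system_prompt": 2_000, "workspace": 6_000, "tools": 2_500, "memory": 3_000}

# Per-category and total overhead recovered by the pruning pass.
recovered = {category: before[category] - after[category] for category in before}
print(recovered)
print("total recovered:", sum(recovered.values()))  # 11500 tokens per turn
```

The per-category breakdown also tells you where your next pruning pass should focus: whichever category recovered the least relative to its size.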

A realistic target for a well-optimized setup is constant overhead under 20% of your model’s context window. For a model with a 100,000-token contextTokens ceiling, that means under 20,000 tokens before any conversation. If your constant overhead is above 40%, you have significant room to reduce it through the pruning steps above, and the improvement in session length and cost will be immediately noticeable.

Which changes to make first

Not all of these fixes are equal. Here is the recommended order based on impact and reversibility:

  1. Check contextTokens first (5 minutes). If it is set below the model’s capacity, increasing it is the fastest possible win and requires no content review. A setting change plus a /new is all it takes.
  2. Disable unused plugins (10 minutes). Pull up the plugin list, identify the ones you have not used, disable them. Fully reversible. No content at risk.
  3. Prune the memory file (20-40 minutes depending on size). Review and remove stale entries. Higher risk of accidentally removing something useful, so take more care here. Do it in batches.
  4. Trim workspace configuration files (30-60 minutes). Requires careful judgment about what the agent actually needs. Lower risk than memory pruning because workspace files are easier to restore from git history, but more time-consuming to review.

    Based on everything we have looked at, what is the single change that would recover the most context overhead for the least risk right now? Walk me through making that change first.

After the highest-impact change is made, start a fresh session and run the baseline measurement again. Compare it to the original number. Then decide whether to continue with the next change or whether the first change was enough to solve the immediate problem. Not every setup needs to go through all four steps. Stop when your constant overhead is under 20% of your model’s context window and compaction is no longer firing prematurely.

The most common outcome: operators who run through steps 1 and 2 (contextTokens adjustment plus plugin cleanup) find that they recover more than enough space without touching their memory file at all. The memory file review is worth doing anyway as a maintenance habit, but it is often not the bottleneck people assume it is. Check the numbers after each step before assuming you need to do more.

Preventing context bloat going forward

The pruning session fixes the immediate problem. Preventing it from coming back requires building a few habits into how you use OpenClaw.

Monthly memory file review

Once a month, review your memory file for entries that are more than 30 days old and no longer relevant. This takes 15-20 minutes and prevents the file from growing back to its bloated state. Ask your agent to flag candidates rather than doing the full review manually.

    Set up a monthly reminder on the first of each month: review my memory file for entries older than 30 days that are no longer relevant. Flag them for my review and send me a Telegram message with the list. Do not remove anything without my approval.

Install plugins for use, not for availability

A common pattern is installing a plugin because it sounds useful, trying it once, and then leaving it enabled because disabling it feels like a step backward. Resist this. If you have not used a plugin in two weeks, disable it. You can re-enable it in 30 seconds if you need it again. The right mental model for plugins is not “installed means available.” It is “enabled means you are paying for it on every turn.” A plugin that sits unused but enabled is a recurring overhead tax with no benefit.

The same principle applies to the core tools your agent has access to. If you never use the exec tool, the file write tool, or the image generation tool, consider whether they need to be enabled at all. Each tool adds its definition to every API call. Enabling only what you actually use keeps the constant overhead lean.

Keep workspace files functional, not comprehensive

Workspace configuration files are not documentation. They are instructions that run on every turn. Every sentence in a workspace file costs tokens on every single call for the lifetime of your OpenClaw instance. Write them to be functional: tell the agent exactly what it needs to behave correctly, and nothing else. If you find yourself writing context that explains the history of a decision rather than what to do, that content belongs in a notes file that does not auto-load.

A useful discipline: after writing any new section in a workspace file, read it from the agent’s perspective and ask “does this change how I behave in a specific situation?” If the answer is no, the section is not earning its token cost. Cut it or move it out of the auto-loading files. The goal is a set of workspace files that are dense with operational instructions and free of background context, history, and explanations that only matter to a human reader.

Context window sizes by model

The actual context budget available to you depends entirely on which model you are using. Using the wrong contextTokens value for your model is one of the most common reasons context fills faster than expected.

As of March 2026, the practical context windows for the most common OpenClaw models:

  • Claude Sonnet 4 / Opus 4: 200,000 tokens. If your contextTokens is set below 100,000, you are using less than half the available window.
  • Claude Haiku: 200,000 tokens. Same ceiling, lower per-token cost.
  • DeepSeek Chat / Reasoner: 64,000 tokens.
  • GPT-4o: 128,000 tokens.
  • Ollama local models (llama3.1:8b, phi4): 8,000-32,000 tokens depending on the model and the hardware running it. Local models have the smallest effective context windows by far.
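The figures above can be collected into a small lookup with a helper that reports how much of the window an undersized ceiling leaves unused. The model keys here are illustrative labels, not necessarily the exact identifiers OpenClaw uses:

```python
# Practical context windows from the list above, in tokens.
MODEL_WINDOWS = {
    "claude-sonnet-4": 200_000,
    "claude-haiku": 200_000,
    "deepseek-chat": 64_000,
    "gpt-4o": 128_000,
    "llama3.1:8b": 8_000,  # local; varies with hardware
}

def unused_capacity(model: str, context_tokens: int) -> int:
    """Tokens left on the table when the ceiling is below the model max."""
    return max(MODEL_WINDOWS[model] - context_tokens, 0)

print(unused_capacity("claude-sonnet-4", 16_000))  # 184000
```

A nonzero result means raising contextTokens is pure free capacity; zero means your ceiling already matches the model and the fix lies elsewhere.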

    What model am I currently using? What is its maximum context window in tokens? What is my current contextTokens setting? Am I leaving context capacity unused because the setting is below the model’s actual limit?

Local models and context

If you use local Ollama models for any tasks, be aware that their effective context window is much smaller than cloud models. A heartbeat task or simple cron job that runs on llama3.1:8b has roughly 8,000 tokens to work with. If your workspace files and tool definitions already consume 6,000 tokens, those tasks are running in a very tight space. For tasks you route to local models, consider whether all your standard workspace files are necessary or whether a stripped-down context would be better for short, focused tasks.

Strategies for compressing high-overhead sections

Once you know which sections are consuming the most context, you can apply targeted compression. The goal is not to remove useful information but to express the same information in fewer tokens.

Compress instruction prose into structured lists

Prose instructions are token-expensive. A paragraph explaining a behavior in narrative form uses more tokens than a bulleted list of rules expressing the same behavior. Compare:

Prose version (high token cost): “When the user asks about a topic you are unsure about, rather than guessing or fabricating information, you should acknowledge your uncertainty directly and offer to search for the current information. Make sure to present the search results clearly and cite the sources you found so the user can verify them.”

List version (lower token cost):

  • If uncertain: say so, then search
  • Present search results with source citations

The list version communicates the same rule in roughly 60% fewer tokens. Over a 50-turn session, that difference adds up.

    Read my workspace configuration files. Find the sections with the most verbose prose instructions. For each one, suggest a compressed version that uses a structured list or shorter phrasing to express the same behavior. Show me the original and the proposed replacement side by side.

Remove explanatory context that the agent does not use operationally

Workspace files often contain explanations of why a rule exists. “We chose this approach because…” or “The reason we structure it this way is…” This context is useful for a human reading the file but the agent does not need to understand the reasoning to follow the rule. Remove the explanations and keep the rules.

Archive historical decisions to a separate non-loading file

Decisions made months ago that shaped how your setup works are valuable context for a human reviewing the workspace. They are not useful operational context for the agent on every single turn. Move historical decision notes to a file like workspace/history/decisions.md that does not auto-load. It is available if you ask the agent to read it, but it does not consume context on every call.

    In my workspace files, identify any sections that record historical decisions, background context, or explanations that a human would find useful but that you do not actually need to follow your operating instructions. Suggest which sections could be moved to an archive file that does not auto-load.

Tracking context overhead over time

A single measurement gives you a snapshot. Tracking over time shows you whether the pruning is holding or whether bloat is creeping back.

    Set up a monthly check that runs on the first of each month: at the start of a fresh session, measure my constant context overhead (tokens before any conversation) and log it to workspace/metrics/context-overhead-log.md with the date and breakdown by category. If the total is more than 20% higher than the previous month, send me a Telegram alert.

The log gives you a baseline for each month. If overhead creeps up more than 20% month over month, something has grown that needs pruning. The Telegram alert means you find out before it becomes a problem rather than after it starts affecting session quality or API costs. Over a year of monthly checks, you will have a clear record of how your setup has changed and exactly which additions drove the overhead growth. That record is useful both for maintenance and for understanding how your agent usage has evolved over time.
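The month-over-month comparison is a two-line check. The log structure below (date string, total tokens) is an assumption for illustration, not an OpenClaw-defined format:

```python
# Compare the two most recent overhead measurements and flag growth
# above a threshold (default 20%, matching the alert described above).
def growth_alert(log: list[tuple[str, int]], threshold: float = 0.20) -> bool:
    if len(log) < 2:
        return False  # not enough history to compare
    (_, prev), (_, curr) = log[-2], log[-1]
    return (curr - prev) / prev > threshold

history = [("2026-01-01", 14_000), ("2026-02-01", 14_500), ("2026-03-01", 18_200)]
print(growth_alert(history))  # True: 14,500 -> 18,200 is ~26% growth
```

The threshold is a judgment call: 20% catches a plugin install or a burst of memory autoCapture within a month, while ignoring normal drift.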

Managing tool output that accumulates mid-session

Static overhead (workspace files, tool definitions) is predictable and measurable. Dynamic overhead from tool outputs is harder to control because it depends on what you ask your agent to do. But there are patterns that help.

Ask for summaries instead of full outputs

When you know a tool is going to return a large result, ask for a condensed version upfront rather than the full output. Instead of “search for recent news on AI regulation,” ask “search for recent news on AI regulation and give me a five-bullet summary of the key developments.” The agent will still run the search, but the result stored in conversation history is the summary, not the full search output. The difference can be thousands of tokens per tool call.

    For the next time I ask you to search the web, read a long file, or run a memory recall: by default, give me a condensed output (maximum 500 words or 10 bullet points) unless I explicitly ask for the full result. Confirm you understand this and will apply it going forward.

Request file reads only when necessary

File reads are one of the highest-overhead tool calls. A 10,000-word document read in full adds 10,000+ tokens to conversation history. If you need to reference a long document repeatedly throughout a session, consider asking your agent to read a specific section rather than the whole file, or to extract only the information relevant to the current task.

Recognize when mid-session context is spiking

If you notice your agent getting slower mid-session or warning about context, check what happened in the last few turns. Large tool outputs from searches, file reads, or memory recalls are the most likely cause. The spike is often traceable to a single tool call that returned unexpectedly large results.

    Context just spiked. What tool calls have we made in the last three turns, and which one added the most tokens to our conversation history? Is there any output I no longer need that you could summarize in place of the full original?

You cannot remove content from context mid-session

Once a tool result is in the conversation history, it stays there until compaction fires. You cannot selectively remove a specific tool result from the middle of the conversation. What you can do is ask your agent to summarize a long earlier result and confirm you no longer need the original detail, which reduces what compaction needs to preserve. But the original is still there until compaction runs. Prevention (asking for condensed outputs upfront) is more effective than cleanup after the fact.

Common questions

How do I know if my context overhead is unusually high?

Compare your constant overhead (the token count before any conversation) to your model’s context window. Under 15%: well-optimized. 15-30%: normal for a well-used setup. 30-50%: room to improve. Over 50% before the conversation starts: you almost certainly have a bloated memory file or too many plugins enabled. Run the baseline measurement command from above and compare.
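Those bands translate directly into a quick classifier you can run against your own baseline numbers:

```python
# The overhead bands from the answer above, as a quick classifier.
def overhead_band(constant_overhead: int, window: int) -> str:
    pct = constant_overhead / window
    if pct < 0.15:
        return "well-optimized"
    if pct < 0.30:
        return "normal for a well-used setup"
    if pct < 0.50:
        return "room to improve"
    return "likely bloated memory file or too many plugins"

# 55k tokens of constant overhead against a 100k-token ceiling:
print(overhead_band(55_000, 100_000))
```

Note the denominator is your effective window (the lower of the model maximum and your contextTokens setting), since that is what compaction actually fires against.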

Will reducing workspace files make my agent worse?

Only if you remove instructions the agent genuinely uses. Before removing any section from a workspace file, ask: “If this section were missing, would my agent do something differently in a way that would bother me?” If the answer is no, it is safe to remove. If you are unsure, archive the section in a separate file that does not auto-load rather than deleting it outright. You can always add it back.

What happens to context after compaction fires?

Compaction summarizes or removes older portions of the conversation history to bring context usage back below the threshold. Your workspace files and tool definitions are not compacted: they reload every turn regardless. What gets compressed or removed is the conversation history itself, typically the oldest messages first. After compaction, your agent may have less detail about what was discussed early in the session.

My context fills up within 5 turns even after I pruned the memory file. What else could it be?

Check contextTokens first. If it is still at 16,000 or another low value, the ceiling is your problem rather than the content size. After that, look at tool output in those five turns: if you asked for a web search or file read early in the session, the result may be consuming a large portion of the available space. Try the same five turns with only text questions (no tool calls) and see if the rate of context consumption changes. That tells you whether the culprit is static overhead (workspace files) or dynamic tool outputs.

Is there a way to see a real-time graph of context usage as a session progresses?

OpenClaw does not currently provide a built-in visualization for context growth over time. You can get a per-turn snapshot by asking your agent “What is my current context usage?” at any point in a session. For a more systematic view, ask at the start of the session and again every five turns, writing the numbers down. The pattern of growth tells you whether it is the constant overhead (high from turn 1, stays steady) or tool outputs (lower at turn 1, spikes when tools are used).

How do I prevent memory autoCapture from filling my memory file with low-value entries?

If memory autoCapture is enabled, your agent automatically stores facts and observations throughout the session without your explicit instruction. This is convenient but can fill the memory file with entries that are not worth the context overhead long-term. Options: disable autoCapture and only store things you explicitly ask to be remembered; set a shorter expiry on automatically captured entries so they age out faster; or run the monthly memory review more frequently (weekly) to catch low-value entries before they accumulate.

Does the order of workspace files affect context usage?

The order affects which files have the best chance of benefiting from prompt caching, but not the total token count. All auto-loaded workspace files consume their full token count regardless of order. Where order matters is for the cache hit rate: stable files loaded early have a better chance of being cached, which reduces the effective cost per turn even if the nominal token count is the same. See the prompt caching article for more on this.

I trimmed my workspace files and the token count barely changed. Why?

A few possibilities: the trimming removed sections that were short in token terms even if they were long in line count. Or the majority of your overhead is coming from elsewhere (memory file, tool definitions) rather than the workspace files you trimmed. Re-run the baseline measurement to see the current breakdown by category. If workspace files are not the top contributor, focus your next pruning effort on whichever category is.

Does the number of tools matter more than the size of tool descriptions?

Both matter, but description size per tool has more variance. A plugin with three short tool descriptions might cost 500 tokens. A plugin with three verbose tool descriptions might cost 3,000 tokens. The quantity of tools is the lever you control by disabling plugins; the description size per tool is set by the plugin author and harder to change. When auditing plugin overhead, look at which plugins have the most tokens in tool definitions, not just which plugins have the most tools. Your agent can estimate the token cost of each plugin’s tool definitions by reading the OpenClaw config.

Will trimming my SOUL.md or AGENTS.md make my agent less effective?

Only if you remove instructions the agent needs to behave correctly. The test: for each section you are considering removing, ask “what would my agent do differently on a typical task if this section were not here?” If the answer is “nothing I would notice,” it is safe to remove. If the answer is “it would stop doing something I care about,” keep it. Most workspace files built over time have at least 20-30% content that has been superseded by later instructions or that was written for edge cases that never actually occurred.

I enabled prompt caching. Why is my context still filling up fast?

Prompt caching reduces the effective cost of the stable portions of your context but does not change how fast context fills. The context window ceiling is still the same. The conversation history still grows at the same rate. What caching does is reduce the API cost of processing the stable sections, not the count of tokens those sections occupy. If context is filling up faster than expected, that is a sizing or overhead problem. If costs are high despite low context usage, that is a caching problem. The two are related but distinct.

What is the difference between context window size and context window usage?

The size is the ceiling: how many tokens the model can handle at once (set by the model) and how many tokens OpenClaw will allow before triggering compaction (set by your contextTokens config). Usage is how many of those tokens are currently occupied. You want the size to be close to the model’s actual maximum, and you want the usage to be as low as possible given what you actually need loaded. A large size with low usage gives you long sessions before compaction. A small size with high usage gives you frequent compaction even in short sessions.

How do I know if context overhead is causing my API costs to be higher than expected?

Multiply your constant overhead (tokens before conversation) by your per-token input cost and by the number of turns per session. If you have 10,000 tokens of constant overhead, are running 20 turns per session, and paying $3 per million input tokens, the overhead alone costs: 10,000 tokens x 20 turns x $0.000003 = $0.60 per session. Run this calculation with your actual numbers and compare it to what you expected to pay. If the overhead cost exceeds the conversation cost, reducing workspace file size has more leverage than reducing how much you talk to your agent.
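The worked example above, as a small function you can rerun with your own numbers:

```python
# Cost of constant overhead per session:
# overhead tokens resent on every turn, priced at the input rate.
def overhead_cost(overhead_tokens: int, turns: int, usd_per_million: float) -> float:
    return overhead_tokens * turns * usd_per_million / 1_000_000

# 10,000 tokens of overhead, 20 turns, $3 per million input tokens:
print(overhead_cost(10_000, 20, 3.0))  # 0.6 dollars per session
```

Note this is before prompt caching; caching discounts the stable portion of this cost but does not change the token count being resent.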

Can I load workspace files conditionally based on what task I am doing?

Not natively as of March 2026. All auto-loaded workspace files load every turn regardless of task type. The workaround is to structure your workspace files as lean defaults and ask the agent to read additional context files on-demand when needed for specific tasks. For example, instead of loading a detailed infrastructure reference file on every turn, keep it in a non-auto-loading file and have your agent read it when a task specifically requires it. This reduces constant overhead while keeping the reference available.

My memory file is 8,000 tokens. Is that too large?

It depends on your model and contextTokens setting. On Claude Sonnet with a 100,000-token contextTokens ceiling, 8,000 tokens is 8% of your budget, which is acceptable. On a local model with an 8,000-token context window, an 8,000-token memory file would consume the entire window before a single message. The absolute size of the memory file matters less than its proportion of the model’s context budget. Calculate the percentage and use that as your guide: under 10% is fine, 10-20% is worth reviewing, over 20% is actively problematic and worth pruning.


Brand New Claw

Context sizing, compaction config, and workspace file structure

Every setting that controls how much context your agent keeps, how compaction fires, and how to structure workspace files so the constant overhead stays small. Drop it into your agent and it handles the configuration.

Get Brand New Claw for $37 →

Keep Reading:

  • Why does OpenClaw keep compacting even on short sessions? If compaction is firing before the session gets long, the context window setting is probably too small. How to check and fix it.
  • Why is OpenClaw so slow? It is probably your context window. Response time is directly tied to how much context the model processes on every turn. How to reduce it.
  • Your agent forgets instructions mid-session. Compaction is why. What gets dropped first when context fills, and how to protect the instructions that matter most.