OpenClaw Compaction: How to Stop Your Agent from Losing Context
Every OpenClaw user has felt it. You are six hours into a complex session. The agent has been helping you build a multi-step workflow, debug a tricky integration, or compose a detailed report. Then suddenly, it asks a question you answered two hours ago. It forgets a key instruction you gave it in the opening message. It starts generating output that contradicts decisions the two of you made earlier in the conversation.
Your agent is not getting dumber. It is running into a fundamental constraint of large language models: the context window. Every model has a maximum amount of text it can hold in memory at once. When a conversation exceeds that limit, the system has to make room. And that process – called compaction – can silently strip away the details your agent needs to stay coherent.
This article explains exactly how OpenClaw compaction works, how to recognize when it is distorting your session, and five practical strategies to prevent context loss. If you have ever had a long-running OpenClaw agent “forget” something important, these techniques will solve it.
What Context Window Compaction Is
Every large language model has a fixed context window. This is the maximum number of tokens – roughly, chunks of text – that the model can process at once. The context window includes the system prompt, any workspace files loaded at session start, and the entire conversation history up to the current moment.
When a conversation reaches the model’s context window limit, the system cannot simply append new messages. It must make room. The solution most LLM-powered agents use is called compaction: the system summarizes older parts of the conversation and replaces the original messages with that summary. The conversation continues with a shorter, compressed version of the earlier discussion prepended to the active context.
Compaction is a practical necessity. Without it, every conversation would be limited to a few dozen exchanges before hitting the wall. But the tradeoff is real: summarization is lossy. Important details, specific instructions, numbers, dates, file paths, and nuanced decisions can be distorted or omitted when the system distills thousands of tokens into a few hundred.
How OpenClaw Handles Compaction
OpenClaw’s compaction system is automatic and designed to be invisible. When the current conversation approaches the model’s context window limit, OpenClaw generates a summary of the earlier conversation messages. It then replaces the original messages with this summary and prepends it as context for the continuing conversation.
The summary generation is done by the LLM itself. OpenClaw passes the older messages to the model with a prompt asking it to produce a concise but faithful summary. The resulting summary is inserted at the top of the conversation, and the original messages are discarded.
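To make the mechanism concrete, here is a minimal sketch of a compaction step in Python. The function names, the 90% threshold, and the "summarize the older half" policy are illustrative assumptions, not OpenClaw's actual implementation:

```python
def maybe_compact(messages, count_tokens, summarize, limit, threshold=0.9):
    """Summarize the oldest messages once the window is nearly full.

    count_tokens and summarize are stand-ins for the tokenizer and the
    LLM summarization call; the threshold and split point are assumptions.
    """
    total = sum(count_tokens(m["content"]) for m in messages)
    if total < limit * threshold:
        return messages  # plenty of room, nothing to do

    # Summarize the older half of the conversation (lossy step)...
    cut = len(messages) // 2
    summary = summarize(messages[:cut])

    # ...and replace those messages with a single summary message.
    compacted = [{"role": "system",
                  "content": "Summary of earlier conversation: " + summary}]
    return compacted + messages[cut:]
```

Everything after the threshold check is where the loss happens: thousands of tokens of specific instructions and decisions get funneled through one summarization call.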
What gets lost depends heavily on the content. Straightforward request-response exchanges compress well. Complex multi-step instructions with interdependent conditions compress poorly. If you told the agent at the start of a session “always format code blocks with line numbers” or “never use em dashes in output,” those instructions are vulnerable. They sit in the oldest messages, which are the first to be summarized. In the compression, a specific formatting instruction can become “follow the user’s formatting preferences” – a pale shadow of the original directive.
Similarly, decisions reached iteratively – “we decided to use SQLite instead of PostgreSQL because the user has no server infrastructure” – can become “we discussed database options” in the summary. The reasoning and the specific choice are gone.
Warning Signs: How to Know Compaction Is Distorting Your Session
Compaction does not announce itself. The agent does not say “I have summarized the earlier part of our conversation.” You have to recognize the symptoms. Here are the most common warning signs:
- Contradicts earlier instructions. You told it at the start to always use British English spelling. Now it is writing “color” and “center.” The original instruction was lost in compaction.
- Loses track of a multi-step task. You asked it to complete a five-phase analysis. It finishes phase two and then asks “what should I do next?” – even though phases three through five were clearly specified earlier.
- Repeats questions it already asked. The agent asked for clarification on a requirement two hours ago. Now it asks the same question again, having no memory of the earlier exchange.
- Generates output inconsistent with earlier context. It proposed using API A earlier, you agreed, and it built code around that decision. Now it proposes API B with completely different architecture, as if the earlier decision never happened.
- Stops referencing session-start context. If the agent no longer mentions instructions, constraints, or goals that were set in the initial messages, those likely got compacted away.
The earlier you catch these signs, the less work you will lose. If you notice any of them, pause the current session. Do not continue piling on more messages hoping the agent will “snap out of it.” It will not. The lost context is gone.
The Cache Boundary: What’s Protected from Compaction
OpenClaw provides a specific mechanism to protect content from compaction. In workspace files – SOUL.md, AGENTS.md, MEMORY.md, and any other project files – a special HTML comment serves as a cache boundary:
```
<!-- OPENCLAW_CACHE_BOUNDARY -->
```
Content above this line in any workspace file is part of the static cached context. It is loaded once when the session starts and is not subject to compaction during the session. The model treats it as permanent read-only context that persists regardless of how long the conversation grows.
Content below this line is loaded fresh each session and may be subject to session-length compaction. It can still be very useful – MEMORY.md content below the boundary is reloaded every session start, so it survives between sessions even if it gets compacted out of a long-running current session.
The practical takeaway: put your most critical, unchanging instructions above the cache boundary. Your identity, core rules, and behavioral invariants belong there. Put dynamic content – session notes, logs, recent data – below it, where it can be refreshed each session.
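As a rough illustration of the split, here is how a session loader might partition a workspace file at the marker. Only the marker string itself comes from OpenClaw's documented behavior; the surrounding logic, including the no-boundary default, is an assumption:

```python
BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->"

def split_workspace_file(path):
    """Return (cached, dynamic): text above the boundary is static cached
    context; text below is reloaded fresh each session."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if BOUNDARY not in text:
        # Assumption: a file with no boundary is treated as all-dynamic;
        # check your OpenClaw version's documented default.
        return "", text
    cached, dynamic = text.split(BOUNDARY, 1)
    return cached.strip(), dynamic.strip()
```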
Strategy 1: Keep Workspace Files Concise
SOUL.md and AGENTS.md are the first things loaded into every session’s context window. They sit above the cache boundary and are protected from compaction, but they still consume tokens from the context window. If these files are bloated – tens of thousands of tokens of identity prose, elaborate backstory, or exhaustive rule lists – they eat into the space available for conversation history before compaction kicks in.
Keep both files under 2,000 tokens each. Target 1,500 tokens if possible. Every token you save in your workspace files is a token available for conversation history before the next compaction event. Concise files also survive compaction better: if a summary of your SOUL.md is ever generated (for instance, in a subagent spawn), a compact source document compresses more faithfully than a sprawling one.
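A quick way to audit this is a rough budget check. The sketch below uses the common characters-divided-by-four approximation rather than a real tokenizer, so treat the numbers as estimates:

```python
import os

BUDGET = 2000  # the per-file target suggested above

def estimate_tokens(path):
    """Rough estimate: ~4 characters per token for English text."""
    with open(path, encoding="utf-8") as f:
        return len(f.read()) // 4

for name in ("SOUL.md", "AGENTS.md", "MEMORY.md"):
    if os.path.exists(name):
        tokens = estimate_tokens(name)
        flag = "OK" if tokens <= BUDGET else "OVER BUDGET"
        print(f"{name}: ~{tokens} tokens [{flag}]")
```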
Review your workspace files periodically. If you have added paragraphs of detail that could live in a reference document instead of the core identity file, move them. Put reference material in separate files that the agent can read on demand using the read tool – not in the always-loaded workspace files.
Strategy 2: Use MEMORY.md for Persistent Facts
MEMORY.md is designed to survive between sessions. It is loaded fresh at every session start, which means important facts written to MEMORY.md are available even if they were compacted out of the previous session’s conversation.
Use MEMORY.md as your agent’s long-term memory. After completing a significant task, append a structured note to MEMORY.md summarizing what was done, what decisions were made, and what state the work is in. Use a consistent format so the agent can scan it quickly:
```
### 2026-04-27 - Database Migration Complete
- Migrated UserService from MySQL to PostgreSQL
- Schema v7 deployed to production
- Rollback plan saved to /home/node/rollback-v7.sh
- Next phase: migrate AnalyticsService (estimated 4 hours)
```
Write critical decisions, file paths, and status updates to MEMORY.md – and if you must record credentials, store a pointer to where they live rather than the secret itself. When a new session starts – or when compaction has stripped context from a current session – the agent can re-read MEMORY.md and re-establish its bearings. This is the single most effective defense against long-term context loss.
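If you want to automate the habit, a minimal helper like the following keeps the format consistent. It assumes MEMORY.md is a plain appendable file in the working directory; nothing here is OpenClaw-specific:

```python
from datetime import date

def append_memory_note(title, bullets, path="MEMORY.md"):
    """Append a dated, structured note in the format shown above."""
    lines = [f"\n### {date.today().isoformat()} - {title}"]
    lines += [f"- {b}" for b in bullets]
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

append_memory_note(
    "Database Migration Complete",
    ["Migrated UserService from MySQL to PostgreSQL",
     "Schema v7 deployed to production"],
)
```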
Strategy 3: Explicit Checkpoints During Long Sessions
In a session that runs for hours or involves dozens of exchanges, you cannot rely on the agent remembering the full arc of the conversation. Build explicit checkpoints into your workflow.
At regular intervals – every 30-60 minutes, or after completing a significant subtask – tell the agent to write a summary of the current state to a file:
"Write a checkpoint to /home/node/session-checkpoint.json with:
1. The current task and what has been completed
2. Any decisions made so far
3. Open questions or blockers
4. Next steps
5. Key files or data references"
This gives you a recoverable snapshot regardless of what compaction does to the conversation history. If you detect compaction distortion, you can start a new session, have the agent read the latest checkpoint, and continue from there with a full context window. The checkpoint file serves as an external memory that compaction cannot touch.
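For reference, one possible shape for that checkpoint file is sketched below. The field names mirror the five points in the prompt, but the structure and contents are otherwise illustrative:

```python
import json

checkpoint = {
    "task": "Migrate services from MySQL to PostgreSQL",
    "completed": ["UserService migrated", "schema v7 deployed"],
    "decisions": ["Cut over during the Sunday maintenance window"],
    "blockers": ["AnalyticsService owner has not confirmed downtime"],
    "next_steps": ["Migrate AnalyticsService (~4 hours)"],
    "references": ["/home/node/rollback-v7.sh"],
}

with open("/home/node/session-checkpoint.json", "w", encoding="utf-8") as f:
    json.dump(checkpoint, f, indent=2)
```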
For particularly critical work – financial calculations, production deployments, legal analysis – do not rely on the agent’s in-session memory at all. Have every major decision written to a checkpoint file immediately after it is made.
Strategy 4: Break Long Tasks Into Subagent Sessions
This is the most powerful strategy in the OpenClaw ecosystem. Instead of running one massive session that accumulates enough history to trigger multiple compaction events, break complex multi-step tasks into separate subagent sessions. Each subagent spawns with a fresh, full context window. It completes its work without any context pressure and reports back to the parent session.
OpenClaw’s subagent architecture is designed for exactly this use case. The parent agent spawns a subagent with a precise task brief, waits for the subagent to complete, and receives the results. The subagent’s context window is never more than a few dozen exchanges, so compaction never distorts its work. The parent’s session accumulates only the subagent’s returned outputs – summaries of what was done – rather than the entire detailed conversation.
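In code, the pattern looks roughly like the sketch below. spawn_subagent is a hypothetical stand-in for however your deployment dispatches subagents, not a documented OpenClaw API; the point is the shape of the loop:

```python
def spawn_subagent(brief: str) -> str:
    """Hypothetical stand-in: dispatch one bounded task to a fresh-context
    subagent and return only its summary. Wire this to your actual
    dispatch mechanism."""
    raise NotImplementedError

phases = [
    "Inventory all MySQL-specific queries in UserService",
    "Rewrite those queries for PostgreSQL",
    "Write and run the migration script against staging",
]

# Each phase runs in a fresh context window; the parent keeps only the
# short returned summaries, so its own history stays far from the limit.
reports = [spawn_subagent(brief) for brief in phases]
```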
This is not theoretical. High-volume OpenClaw deployments use subagent-based decomposition as their standard operating pattern. Each complex task – content generation, code refactoring, data analysis, multi-step research – is dispatched to a subagent with a clear, bounded brief. The subagent works in isolation, compacts little or not at all, and returns clean results.
The pattern works because compaction is a function of session length. Short sessions do not reach the context limit. Subagents keep sessions short by design.
Strategy 5: Re-Read Files On Demand Instead of Carrying Content
A common mistake that accelerates compaction is loading large file contents into the conversation. When you paste the contents of a 5,000-line configuration file, a 20,000-word document, or a database dump into the chat, those tokens fill the context window rapidly. A few such operations and the window is full, triggering compaction on everything that came before – including your original instructions.
Instead, use the read tool. OpenClaw agents can read files on demand without loading the contents into the visible conversation history. The file contents are available to the agent for processing, but they do not persist in the token count of the conversation messages. The savings are dramatic.
If you need the agent to reference a large document, tell it to read the file. Do not paste the content. If the agent needs to compare multiple files, have it read each one separately rather than concatenating them into the conversation. The difference in context window usage can be an order of magnitude.
This same principle applies to reference materials. Instead of including a long specification or style guide in your initial instructions, put it in a file and tell the agent to read it on demand. The instruction to read the file takes maybe 20 tokens. The file itself could be 10,000 tokens. Reading on demand saves 9,980 tokens of context window space.
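The arithmetic is worth spelling out, treating the paragraph's figures as assumptions about typical file and instruction sizes:

```python
spec_tokens = 10_000          # a long style guide pasted into the chat
read_instruction_tokens = 20  # e.g. "Read the style guide file first"

saved = spec_tokens - read_instruction_tokens
print(f"Context window tokens saved per session: {saved}")  # 9980
```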
Context Window Sizes by Model (April 2026)
Different models have different context window capacities. Knowing your model’s limit helps you estimate how much conversation you can have before compaction kicks in. The following table shows the maximum context window sizes for popular models as of April 2026:
| Model | Context Window (tokens) |
|---|---|
| Claude Sonnet 4.6 | 200,000 |
| DeepSeek V3 | 128,000 |
| DeepSeek R1 | 64,000 |
| Ollama local models (most) | 8,000 – 32,000 |
Claude Sonnet 4.6’s 200K token window gives ample room for very long conversations before compaction. DeepSeek V3’s 128K window is also generous. But local models running through Ollama have dramatically smaller windows. If you are running a local model with an 8K token limit, compaction can trigger after just a handful of exchanges, especially if workspace files or loaded documents consume a significant portion of the window.
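To turn those numbers into intuition, here is a back-of-envelope estimate of how many exchanges fit before compaction. The workspace overhead and per-exchange token cost are assumptions for illustration, not measured values:

```python
def exchanges_before_compaction(window, workspace_overhead=4_000,
                                tokens_per_exchange=800):
    """Exchanges that fit after workspace files claim their share."""
    return (window - workspace_overhead) // tokens_per_exchange

for model, window in [("Claude Sonnet 4.6", 200_000),
                      ("DeepSeek V3", 128_000),
                      ("Ollama local (8K)", 8_000)]:
    print(f"{model}: ~{exchanges_before_compaction(window)} exchanges")
```

Under these assumptions, an 8K local model fits roughly five exchanges before compaction begins, while a 200K model fits a couple of hundred.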
If you use local models, the strategies in this article are not optional. They are essential for maintaining coherent behavior beyond the first few minutes of a session.
Sources
This article is based on documented OpenClaw behavior, analysis of context window mechanics across multiple LLM providers, and operational experience from production OpenClaw deployments.