Compaction fires when the context window fills to a threshold. On short sessions, that should not happen. If it is happening anyway, the context window is either smaller than you think, or something is filling it faster than you expect. Here is how to find which one and fix it.
What triggers compaction
Compaction runs when the total token count in context reaches the compaction threshold. The threshold is a percentage of the configured context window ceiling. It is not a turn count. A two-turn session can hit the compaction threshold if those two turns involved heavy tool use, large file reads, or a long system prompt loading at startup.
Read my openclaw.json. What is my configured context window size in tokens? What is my compaction threshold set to? At what token count does compaction actually fire given those two values?
Most operators do not know these numbers. If your context window is set to 32,000 tokens and compaction fires at 75%, it triggers at 24,000 tokens. That is not much headroom for a session with a large system prompt and a few tool calls. The system prompt alone can consume 15,000 tokens, leaving only 9,000 tokens for the actual conversation and tool outputs before compaction fires.
The calculation:
compaction trigger (tokens) = contextTokens × (threshold / 100)
Example: 32,000 × 0.75 = 24,000 tokens
If your system prompt alone is 15,000 tokens, you have 9,000 tokens of headroom before compaction fires. That is roughly 3 to 5 turns of normal conversation with no tool use, and 1 to 2 turns with active tool calls.
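The arithmetic is simple enough to script. A minimal sketch of the trigger and headroom math, using the article's illustrative values rather than anything read from a real config:

```python
# Compaction trigger math from the example above.
# All numbers are the article's illustrative values, not real config reads.

def compaction_trigger(context_tokens: int, threshold_pct: float) -> int:
    """Token count at which compaction fires."""
    return int(context_tokens * threshold_pct / 100)

def headroom(context_tokens: int, threshold_pct: float,
             system_prompt_tokens: int) -> int:
    """Tokens left for conversation and tool output before compaction."""
    return compaction_trigger(context_tokens, threshold_pct) - system_prompt_tokens

print(compaction_trigger(32_000, 75))  # 24000
print(headroom(32_000, 75, 15_000))    # 9000
```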
The system prompt is bigger than you think
Every session starts by loading the system prompt. For most operators this includes SOUL.md, AGENTS.md, USER.md, workspace context files, injected memories, and any plugin context blocks. All of it loads before your first message. On a setup with several loaded files and an active memory plugin, the system prompt alone consumes 15,000 to 40,000 tokens.
Estimate how many tokens are in my current system prompt. Include everything that loads at session start: all workspace files, injected memories, plugin context, and the base system instructions. Give me an approximate total and a breakdown by source.
If that number is close to your compaction threshold token count, you have very little headroom. A single tool call with a large output can push you over. This is why compaction appears to fire on short sessions: the system prompt is already using most of the headroom before the conversation even begins.
The three most common causes of system prompt bloat, in order of frequency:
Instruction files that grew over time
AGENTS.md and SOUL.md accumulate content across sessions as new protocols, rules, and notes are added. Over weeks of active use, these files can grow from 2,000 tokens to 10,000 or more. Auditing and trimming them periodically reduces baseline context cost on every session.
Read AGENTS.md and SOUL.md. Estimate the token count of each. Are there sections that are redundant, outdated, or could be trimmed without losing important instructions? Flag the top 3 candidates for removal or condensing.
Memory injection loading too many memories
If autoRecall is enabled and the similarity threshold is loose, a large block of recalled memories is prepended to every turn. With 500 stored memories and a threshold that surfaces 20 results per query, each turn adds 3,000 to 6,000 tokens of injected context. After 3 turns, you have added 10,000 to 18,000 tokens that were not there at session start.
Read my memory plugin config. How many memories are returned per autoRecall query? What is my similarity threshold? Estimate how many tokens the injected memory block adds to context per turn.
Plugin context blocks at session start
Some plugins inject context at session initialization. A plugin that loads its documentation, configuration summary, or state into context at startup adds tokens that persist for the entire session. If you have multiple plugins doing this, the overhead compounds. Check which plugins are active and what context each injects.
List all enabled plugins in my openclaw.json. For each one, does it inject any context at session start? If yes, how many tokens does that injection add? Which plugin adds the most context overhead?
Tool outputs accumulate
Every tool call appends its output to the context. The outputs stay there until compaction collapses them. After ten tool calls, you have added 5,000 to 20,000 tokens without noticing. On a 32k window with a large system prompt, this is what pushes context over the threshold.
The biggest contributors by type:
- File reads: return the full file content. A 500-line file at 4 characters per token adds roughly 2,000 to 3,000 tokens per read.
- Web searches: return 3 to 5 results with titles, URLs, and snippets. Each search adds roughly 500 to 1,500 tokens.
- exec outputs: return the full command output. A command that prints 200 lines adds 1,000 to 2,000 tokens.
- Memory recall: returns the matched memories with scores. Ten memories at 200 tokens each add 2,000 tokens per recall call.
Estimate how many tokens have been added to the current context by tool outputs since this session started. What are the three largest individual tool output contributors?
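The per-output estimates above follow the rough heuristic of about 4 characters per token for English text. A sketch of that estimate; this is an approximation, not a real tokenizer, and actual counts vary by model:

```python
def estimate_tokens(text: str) -> int:
    """~4 characters per token; a crude heuristic for English text."""
    return max(1, len(text) // 4)

# A 500-line file at ~20 characters per line is ~10,500 characters,
# which lands in the 2,000-3,000 token range quoted above.
fake_file = ("x" * 20 + "\n") * 500
print(estimate_tokens(fake_file))  # 2625
```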
If tool output accumulation is the primary driver, the fix is not to avoid tool use. It is to tune compaction so it fires and collapses those outputs before they fill the window. A lower compaction threshold (firing earlier, say at 65%) with a lower retain setting collapses tool outputs faster.
The context window may be set lower than the model supports
The default context window in OpenClaw is set conservatively. If your model supports 128k tokens but OpenClaw is configured for 32k, compaction fires at a fraction of what the model could handle. This is a common misconfiguration that looks like compaction misbehaving but is actually a window ceiling problem.
Read my openclaw.json. What model am I using as my primary? What is the maximum context window that model supports? What is my configured contextTokens? Is there a significant gap between them?
If there is a gap, raising contextTokens to a more appropriate value for your workload gives compaction more room before it fires. See the companion article on context window sizing for guidance on choosing the right ceiling.
How to tune compaction directly
If the window is sized appropriately and something is still filling it too fast, tuning the compaction settings themselves is the right move.
The threshold setting
The threshold controls when compaction fires as a percentage of the context ceiling. A lower threshold (65%) fires compaction earlier, keeping the active context tighter. A higher threshold (85%) gives more room before compaction fires, allowing longer uninterrupted sessions but at the cost of heavier context toward the end.
Read my openclaw.json compaction config. What is my current threshold? What would it be as a raw token count given my contextTokens? Suggest whether I should raise or lower it based on my workload (I will describe it: [describe your typical session]).
The retain setting
The retain setting controls how much context is kept after compaction fires, expressed as tokens or as a percentage. A high retain setting means compaction collapses less. The active context stays large after compaction. A low retain setting means more content is compacted, giving more headroom for the next phase of the session. If compaction fires repeatedly in rapid succession, retain is set too high.
What is my current compaction retain setting? After compaction fires, how much context is left? Is compaction firing multiple times in rapid succession in my recent sessions? If so, is the retain setting leaving too much behind each time?
The turns preserved setting
Some compaction configurations allow specifying how many of the most recent turns to preserve verbatim before the compaction summary. If this is set high (10 to 15 turns), those turns plus the compaction summary plus the system prompt can easily fill most of the window. A setting of 4 to 6 recent turns is sufficient for conversational continuity.
Like every compaction setting, this requires a fresh session (/new) to take effect. OpenClaw caches these values at session start. Always verify with /status in the new session.
What compaction actually does to your context
Understanding the mechanism helps you tune it. When compaction fires:
- OpenClaw sends the older conversation turns to a compaction model (configured separately from your main agent model).
- The compaction model produces a structured summary of those turns.
- The original turns are removed from the active context.
- The summary replaces them. The summary is shorter but not zero. It adds a new block to context.
- The most recent N turns (per the turns preserved setting) are kept verbatim alongside the summary.
After compaction, the context contains: system prompt + compaction summary + N recent turns. The total is smaller than before compaction, but the summary and preserved turns persist. If compaction fires again quickly, another summary layer accumulates. Over a very long session, multiple compaction cycles produce layered summaries that themselves take up meaningful space.
Has compaction fired in this session? If yes, how many times? What does the current context look like: how much is system prompt, how much is compaction summary content, and how much is recent turns?
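A toy model of a single compaction cycle, following the mechanism above. All token counts here are illustrative assumptions:

```python
def post_compaction_size(system_prompt: int, summary: int,
                         recent_turns: list[int]) -> int:
    """Context after compaction = system prompt + summary + preserved turns."""
    return system_prompt + summary + sum(recent_turns)

after = post_compaction_size(
    system_prompt=15_000,
    summary=1_500,                      # shorter than the turns it replaced, but not zero
    recent_turns=[400, 600, 350, 500],  # 4 preserved recent turns
)
print(after)  # 18350
```

The summary and preserved turns are what persist into the next phase of the session; only the older raw turns are gone.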
Compaction and the memory pipeline interaction
Compaction and autoCapture are designed to work together. autoCapture extracts memories from conversation turns before compaction collapses them, so the extracted content is preserved in the memory store even after the raw turns are gone. This is the right behavior.
The edge case: if compaction fires before autoCapture has finished extracting from the current turn, some content is compacted before extraction runs on it. In practice this is rare and not critical. The compacted summary still contains the key points, and extraction from the summary produces similar memories. But it can explain why you sometimes see slightly different memory content than you expected from a specific session.
One practical consequence: do not rely on lowering the compaction threshold as a substitute for a working memory pipeline. Compaction is not a memory system. It is a context management system. Memories need to be extracted and stored explicitly by autoCapture. Compaction summaries are not the same as stored memories.
When compaction disrupts mid-task work
The most frustrating compaction failure mode is when it fires in the middle of a multi-step task and the agent loses track of where it was. This happens when the task accumulates a lot of context (tool outputs, large file reads, intermediate results) that pushes the window to the threshold mid-task.
Three approaches to prevent this:
Raise the context window ceiling
The simplest fix if the model supports a larger window. Give the task more room to accumulate context before compaction fires.
Checkpoint state before heavy tool use
Instruct the agent to write a checkpoint (key state, current step, decisions made) before running a sequence of tool-heavy operations. If compaction fires mid-task, the agent can read the checkpoint to restore its working state. This is the approach in AGENTS.md: write context checkpoints every 5 turns and before any task involving 5+ tool calls.
Break large tasks into smaller sessions
If a task regularly fills the context window before completion, it is too large for a single session. Split it into phases, each ending with a written handoff to the next session. The memory pipeline handles continuity between sessions; the context window handles continuity within a session. Use each for what it is designed for.
Before we start this next task, write a brief checkpoint: what we are doing, where we are in the process, and the key decisions made so far. If compaction fires mid-task, I want you to read this checkpoint to restore your working state before continuing.
How to measure compaction impact on your workflow
Before tuning compaction settings, it helps to know how frequently compaction is actually firing and what effect it has on your sessions. The easiest way is to ask your agent directly:
Look at my last 10 sessions. How many times did compaction fire in each session? At what turn number did it first fire? What was the average token count when compaction fired? Give me a summary.
If compaction is firing at turn 3 or 4 consistently, the context window is too small for your system prompt plus early tool use. If it is firing at turn 15 or later, the window is sized appropriately and compaction is doing its job.
Another metric: how much context is left after compaction? If compaction fires and leaves 80% of the window still full, the retain setting is too high. If it leaves only 30% of the window in use, the retain setting is too low (compaction is collapsing too much, potentially losing important context).
In the current session, if compaction fired right now, how many tokens would be left after compaction based on my current retain setting? Is that enough headroom for the next 5 turns of typical work?
The relationship between context window and compaction threshold
These two settings are coupled. Changing one without considering the other produces unexpected behavior.
The compaction threshold is a percentage of the context ceiling. If you increase contextTokens from 32,000 to 64,000 but keep the threshold at 75%, compaction now fires at 48,000 tokens instead of 24,000. That is a 24,000 token difference in absolute terms. Your sessions will run longer before compaction fires.
If you reduce contextTokens from 100,000 to 40,000 with the same threshold, compaction fires at 30,000 instead of 75,000. That is a 45,000 token difference in the opposite direction. Your sessions will hit compaction much sooner.
When adjusting either setting, recalculate the absolute trigger point:
trigger = contextTokens × (threshold / 100)
Example: 40,000 × 0.75 = 30,000 tokens
Then compare that trigger to your system prompt size plus typical early-turn tool output. If the trigger is less than the sum, compaction will fire early in every session.
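That comparison can be expressed as a quick check. All inputs here are illustrative:

```python
def fires_early(context_tokens: int, threshold_pct: float,
                system_prompt: int, early_tool_output: int) -> bool:
    """True if compaction will fire before the session really gets going."""
    trigger = context_tokens * threshold_pct / 100
    return trigger < system_prompt + early_tool_output

# 40k window, 75% threshold -> 30,000-token trigger: enough room
print(fires_early(40_000, 75, system_prompt=20_000, early_tool_output=8_000))  # False
# 32k window, same workload -> 24,000-token trigger: fires early
print(fires_early(32_000, 75, system_prompt=20_000, early_tool_output=8_000))  # True
```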
Compaction model selection and cost
The model that performs compaction is configured separately from your main agent model. Using a frontier model (Sonnet, Opus, GPT-4o) for compaction is unnecessary and expensive. A cheaper model handles the summarization task just as well.
Common compaction model choices:
- DeepSeek V3: $0.27 per million tokens. Fast, cheap, good quality for summarization.
- Local phi4 (Ollama): Free, runs on your hardware. Slower but zero API cost.
- GPT-3.5 Turbo: $0.50 per million tokens. Slightly more expensive than DeepSeek but still cheap.
If you are using a local model for compaction, ensure it is available and responsive. A slow local model can cause compaction to take 30+ seconds, during which the session is blocked. If compaction appears to be causing long pauses, check the compaction model’s response time.
Read my openclaw.json. What model is configured for compaction? Is it a frontier model? If yes, suggest a cheaper alternative and estimate the cost difference per compaction cycle.
Compaction and the /new rule
Changes to compaction settings require a fresh session to take effect. This is the same rule that applies to context window changes. OpenClaw caches the compaction threshold and retain values at session start. If you update the config and continue in the same session, the old values remain in effect.
Always start a new session after changing compaction settings, and verify with /status that the new values are active. Some versions of OpenClaw show compaction settings in the status output; others do not. If not visible, ask your agent to read the config and confirm the active session is using the new values.
The full verification sequence: (1) change the compaction settings in openclaw.json, (2) start a new session with /new, (3) run /status to see if compaction settings are shown, (4) ask your agent to read the config and confirm the active session is using the new values, (5) run a test session that would previously have triggered compaction early and see if it now fires later (or not at all).
When compaction is not the problem
The symptom (compaction firing early) is sometimes correct, but the root cause is not a compaction setting. Two other possibilities:
Memory injection volume is too high
If autoRecall is returning 20+ memories per turn, each at 200 tokens, that is 4,000 tokens injected per turn. After 5 turns, you have added 20,000 tokens of memory content alone. That can fill a small context window quickly. Tighten the memory recall limit and similarity threshold before adjusting compaction.
Tool outputs are not being collapsed
If your workflow involves many large tool outputs (file reads, web searches) and compaction is not firing early enough to collapse them, the context fills with raw tool output that stays until the threshold is reached. Lowering the compaction threshold causes it to fire earlier and collapse those outputs sooner.
Distinguish between these by checking what is filling context fastest:
In the current session, what is using the most context tokens? Break it down: system prompt, conversation turns, tool outputs, memory injections. Which category is largest and growing fastest?
Advanced: layered compaction summaries
In very long sessions, compaction can fire multiple times. Each compaction cycle adds a summary layer. After three compactions, the context contains: system prompt + summary1 + summary2 + summary3 + recent turns. The summary layers themselves take up space.
If you notice context filling faster later in a session even though tool use has slowed, layered summaries are likely the cause. The fix is to lower the retain setting so each compaction collapses more, leaving less summary content behind. Alternatively, break the session into smaller sessions with handoffs between them.
Layered summaries are a sign that the session has outgrown the context window’s capacity for the workload. Either increase the window, or accept that very long sessions will accumulate summary overhead.
If compaction has fired multiple times in this session, how many summary layers are currently in context? How many tokens do those summary layers occupy? Is that a significant portion of the current context size?
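The overhead is easy to estimate. The per-layer summary size here is an assumed figure; real summaries vary:

```python
def summary_share(cycles: int, summary_tokens: int = 1_500,
                  context_tokens: int = 32_000) -> float:
    """Fraction of the window held by summary layers after n compaction cycles."""
    return cycles * summary_tokens / context_tokens

print(round(summary_share(3), 3))  # 0.141 -- ~14% of a 32k window gone to summaries
```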
Platform-specific notes
macOS (local Ollama compaction model)
On Apple Silicon, Ollama runs on unified memory. If you are using a local model for compaction (phi4, llama3.1:8b), ensure the model is loaded and ready before compaction fires. The first compaction call after a period of idle time triggers a model load, which can take 10 to 30 seconds on slower hardware. This appears as a long pause in the session. To avoid this, set OLLAMA_KEEP_ALIVE=-1 in the Ollama environment to keep models loaded.
Linux VPS (CPU-only)
On a CPU-only VPS, local compaction models are slow. A 7B model summarizing 10,000 tokens of context can take 45 to 90 seconds. If compaction appears to hang the session, switch to an API model (DeepSeek V3) for compaction. The API cost per compaction cycle is low ($0.002 to $0.005) and the response is near-instantaneous.
Docker deployments
If OpenClaw is running in a container and the compaction model is on the host (Ollama), ensure the container can reach the Ollama endpoint. The default 127.0.0.1:11434 inside the container points to the container, not the host. Use the host’s Docker bridge IP or host.docker.internal (Docker Desktop).
Common mistakes when tuning compaction
Setting threshold too low (compaction fires too frequently)
A threshold of 50% or 60% causes compaction to fire early and frequently. This can break conversational flow and increase cost (each compaction cycle uses tokens). Unless you have a specific reason for aggressive compaction, keep the threshold between 70% and 85%.
Setting retain too high (compaction does not clear enough)
A retain setting of 80% or 90% leaves most of the context intact after compaction. The next turn pushes back over the threshold, causing compaction to fire again immediately. This creates a loop of rapid-fire compactions that accomplish little. Lower retain to 40% to 60% for meaningful headroom.
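The loop condition can be sketched as follows, assuming retain is applied as a percentage of the context size at fire time (illustrative numbers):

```python
def refires_next_turn(context_tokens: int, threshold_pct: float,
                      retain_pct: float, turn_tokens: int) -> bool:
    """True if one typical turn pushes the retained context back over the trigger."""
    trigger = context_tokens * threshold_pct / 100
    retained = trigger * retain_pct / 100  # context left right after compaction
    return retained + turn_tokens > trigger

# retain 85%: a single 4,000-token turn (one heavy tool call) refires immediately
print(refires_next_turn(32_000, 75, retain_pct=85, turn_tokens=4_000))  # True
# retain 50%: real headroom before the next fire
print(refires_next_turn(32_000, 75, retain_pct=50, turn_tokens=4_000))  # False
```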
Not accounting for system prompt size
The most common oversight. If your system prompt is 20,000 tokens and your context window is 32,000 with a 75% threshold, compaction fires at 24,000 tokens. You have only 4,000 tokens of headroom before compaction. Always subtract system prompt size from the trigger calculation.
Changing settings but not starting a new session
The /new rule applies to compaction settings just as it does to context window. If you change threshold or retain and continue in the same session, the old values remain cached. Always start fresh after compaction config changes.
How to test your new compaction settings
After adjusting threshold and retain, run a test session that mimics your typical workload. Note when compaction fires and how much headroom remains after. The goal is a single compaction fire mid-session (not early, not late) that leaves enough headroom for the rest of the session without another fire.
We are going to simulate a typical session. I will ask you to perform a sequence of tasks that matches my normal workload. Track the context token count after each turn. Tell me when compaction would fire given my current settings, and how much headroom would remain after.
If the test shows compaction firing too early or too late, adjust the threshold and repeat. This iterative approach is faster than guessing and waiting for real sessions to reveal problems.
When to leave compaction at defaults
If your sessions are not experiencing early compaction, and you are not hitting context limits mid-task, the default settings are likely fine. Compaction exists to prevent hard context limits from breaking sessions. If your sessions never approach those limits, compaction is not a problem you need to solve.
Signs that default compaction is working correctly:
- Compaction fires only in sessions that run 20+ turns with active tool use
- After compaction, the session continues without confusion or lost context
- You never see the “context window full” error
- Response times remain consistent throughout sessions
If all four are true, your compaction settings are appropriate for your workload. Do not tune what is not broken.
What the compaction model sees
The compaction model receives a prompt that includes the older conversation turns and instructions on how to summarize them. The exact prompt varies by OpenClaw version, but the general structure is: “Summarize the following conversation turns, preserving key decisions, facts, and preferences. Keep the summary concise.”
You cannot directly edit this prompt in most configurations, but you can influence it indirectly. If your system prompt (SOUL.md, AGENTS.md) includes instructions about what is important to preserve across turns, the compaction model reads those instructions and is more likely to include that type of content in the summary. This is why clear, structured instruction files improve compaction quality over time.
Read my system prompt (AGENTS.md, SOUL.md). Are there clear instructions about what should be preserved across turns? If not, suggest a brief addition that would help the compaction model understand what is important to keep.
Complete fix
Brand New Claw
The complete production configuration guide. Every setting that matters, what it does, and what breaks if you leave it at default. Drop it into your agent and it audits your current config and fixes what needs fixing.
FAQ
Is there a way to disable compaction entirely?
Yes, but it is not recommended for most setups. Without compaction, sessions accumulate context indefinitely until they hit the hard context window limit, at which point no more input is accepted. Compaction exists to prevent this. If you want to effectively disable it, set the threshold very high (95%) and the retain very low. Compaction fires rarely and clears aggressively when it does. That is different from disabling it, but produces a similar experience for short sessions.
Does compaction lose information?
Yes, always; summarization is inherently lossy. The compaction model makes decisions about what is important enough to include in the summary. Fine-grained details, exact phrasing, and specific numbers from early in the session are the most likely to be lost. If you need precise details from early conversation to be available later, write them to a file or to memory explicitly before compaction fires. Do not rely on the compaction summary to preserve them.
How do I know if compaction has fired in the current session?
Ask your agent directly: “Has compaction fired in this session? If yes, when and how many times?” Some versions of OpenClaw also surface this in the /status output or in session logs. If you notice the agent losing context of something discussed earlier in the session, compaction has likely fired.
Can I control which model does the compaction?
Yes. The compaction model is configured separately from the main agent model. For cost reasons, use a cheaper model for compaction. It does not need frontier capability. DeepSeek V3 or a local phi4 model handles compaction well. Using Sonnet or Opus for compaction is unnecessary and expensive.
My agent acts confused after compaction. What is happening?
The compaction summary replaced the verbatim conversation turns. If the summary missed a key detail or decision, the agent is now working from an incomplete picture. The fix: instruct the agent to read a context checkpoint file (if one exists) after compaction fires. The checkpoint preserves state that compaction summaries could lose. If no checkpoint exists, establish the practice of writing one before any multi-step task.
Compaction fires once and then fires again almost immediately after. Why?
The retain setting is too high. After the first compaction, the remaining context (system prompt + summary + preserved turns) is still close to the threshold. The next turn pushes it back over. Lower the retain setting so more is collapsed in each compaction cycle, creating more headroom before the next fire.
Does compaction affect my API bill?
Compaction itself uses tokens from the compaction model. Those calls are billed separately. However, compaction reduces the size of the context sent on subsequent turns, which reduces input token costs for those turns. Whether compaction saves or costs money overall depends on how expensive your compaction model is versus how much it reduces subsequent input token costs. Using a cheap compaction model (DeepSeek, local phi4) almost always results in net savings.
Can I tell compaction to preserve specific things?
Not directly in the compaction config, but you can influence it. The compaction model uses a prompt to generate the summary. If your workflow instructions (in AGENTS.md or SOUL.md) mention what is important to preserve, the compaction model reads those instructions and is more likely to include that information in the summary. Explicitly stating what matters in your system prompt indirectly influences compaction quality.
What is the difference between compaction threshold and retain?
Threshold controls when compaction fires (as a percentage of the context ceiling). Retain controls how much context remains after compaction fires. Threshold is about timing. Retain is about depth. Tuning both together gives you precise control: fire late but compact aggressively (high threshold, low retain) for sessions where uninterrupted work matters more than keeping context light. Fire early but retain more (low threshold, high retain) for sessions where keeping recent context verbatim is important.
Go deeper
Context window sizing by use case
How to find the right ceiling for your workload and verify the change actually took effect in a new session.
What actually fills your OpenClaw context window
A detailed breakdown of what takes up space: system prompt, tool outputs, memory injections, compaction summaries.
OpenClaw compaction is losing important context: how to fix it
When compaction fires and the agent loses something important, this is how to prevent it from happening again.
