Both settings are on. The config looks right. The agent still forgets everything. This is almost never a config problem. It is a pipeline execution problem. The settings are enabled but somewhere between the conversation ending, the extraction model running, and the memory store writing, the chain broke. Nothing surfaces this in the conversation. Here is how to find where it broke, and what to do about it.
What the memory pipeline looks like when it is working
Before diagnosing failures, it helps to know what success looks like. Here is the full sequence of events in a session where autoCapture and autoRecall are working correctly.
Session start
When a new session starts, autoRecall fires before the first turn. It takes your initial context (the contents of SOUL.md, AGENTS.md, and any startup files) and uses it as a query to search the memory store. Relevant memories matching that context are retrieved and injected as a block before the conversation begins. This is why your agent should already know things from previous sessions before you say anything in the current one.
During each turn
autoRecall fires again before each turn, using the current conversation context as the query. As the session progresses and different topics come up, different memories surface. This is the active recall mechanism: memories relevant to what you are currently discussing are injected into the context just before the agent responds.
After each turn
autoCapture fires after each turn ends. It takes the recent conversation content, sends it to the extraction model, and the extraction model outputs structured memory objects: facts, preferences, decisions, entities. These are embedded and written to the memory store. The turn that just completed is now represented in memory and available for future recall.
Across sessions
When a new session starts, the memories written by autoCapture in all previous sessions are available for recall. This is how continuity works: the agent does not remember the conversation directly (that is gone at session end), but it remembers the facts, preferences, and decisions that were extracted from those conversations and stored.
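The whole loop can be sketched in a few lines. This is a hypothetical model for intuition only: the store, `recall`, and `capture` below are stand-ins for the real LanceDB store, embedding search, and extraction model, not OpenClaw's actual implementation.

```python
# Hypothetical sketch of the recall/capture cycle. Word overlap stands in
# for embedding similarity; raw text storage stands in for extraction.

store = []  # persists across sessions; each entry is an extracted fact

def recall(query):
    # autoRecall: surface stored memories relevant to the current context.
    return [m for m in store if any(w in query.lower() for w in m.lower().split())]

def capture(turn):
    # autoCapture: a real plugin sends the turn to an extraction model and
    # writes structured facts; this sketch stores the raw text.
    store.append(turn)

def run_turn(user_message):
    injected = recall(user_message)   # fires before the agent responds
    response = f"(answers using {len(injected)} recalled memories)"
    capture(user_message)             # fires after the turn ends
    return injected, response

# Session 1: a fact is captured.
run_turn("my favorite color is deep teal")

# Session 2: the conversation itself is gone, but the extracted fact recalls.
injected, _ = run_turn("what is my favorite color?")
print(injected)
```

The key property the sketch shows: session 2 never sees session 1's conversation, only the facts that were captured from it.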
How to test the pipeline end to end
Rather than diagnosing a broken pipeline in an existing session with an unknown state, sometimes it is faster to run a clean end-to-end test in a fresh session. This takes less than five minutes and immediately tells you whether the pipeline is fundamentally broken or whether the issue is specific to an existing session’s state.
Start a memory pipeline test. First, run memory_stats and tell me the current count. Then I am going to tell you a specific fact: my favorite color is deep teal. Confirm you have heard this, then end this turn. On the next turn I will check whether it was captured.
After the above exchange, start a fresh session and run:
Without me telling you anything in this session: do you know what my favorite color is? Check your recalled memories and report what you find. If you do not know, tell me that clearly.
If the agent knows your favorite color in the fresh session, the full pipeline is working: autoCapture extracted it, the embedding model stored it, and autoRecall surfaced it. If the agent does not know, use the step-by-step diagnostic sequence below to find where the chain broke.
How compaction affects the pipeline
Compaction and autoCapture are designed to work together, though most documentation treats them as independent features. Understanding how they coordinate prevents a class of problems that looks like broken memory but is actually a timing interaction. When compaction fires, it replaces older conversation turns with a compressed summary. autoCapture extracts from the raw turns before compaction, so memories are written from actual conversation content, not from summaries.
The timing edge case: if compaction fires during a long session while an extraction cycle is in progress, the extraction payload may include turns that get compacted in the same cycle. autoCapture then processes the compacted summary in the next cycle, potentially extracting the same information again in a slightly different form. This produces duplicate or near-duplicate memories over time.
This is rarely critical but explains why memory stores accumulate duplicates in long active sessions. Enable deduplication in your plugin if it is available. A deduplication threshold of 0.92 to 0.95 removes near-identical memories while preserving genuinely distinct ones. Setting it too low (below 0.80) aggressively merges memories that are related but not identical.
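What that threshold does can be sketched as a greedy pass over cosine similarities. This assumes the plugin compares embedding vectors pairwise; `dedupe` is an illustrative function, not the plugin's API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedupe(vectors, threshold=0.92):
    # Keep a vector only if it is not near-identical to one already kept.
    kept = []
    for v in vectors:
        if all(cosine(v, k) < threshold for k in kept):
            kept.append(v)
    return kept

# Two near-duplicate memories and one distinct one (toy 3-dim vectors).
memories = [
    [1.0, 0.0, 0.1],   # "favorite color is teal"
    [1.0, 0.02, 0.1],  # near-duplicate extracted again after compaction
    [0.0, 1.0, 0.0],   # unrelated fact
]
print(len(dedupe(memories, threshold=0.92)))  # the near-duplicate is dropped
```

At 0.92 the near-duplicate pair (similarity ~0.9998) collapses to one memory while the unrelated vector survives; push the threshold toward 1.0 and even near-duplicates are kept.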
Check my memory plugin config. Is deduplication enabled? If yes, what similarity threshold is it using? If no, is it available in this plugin version and how do I enable it?
How autoCapture interacts with captureAssistant
Some memory plugin configurations include a captureAssistant setting. When this is enabled, autoCapture also extracts from the assistant’s own responses, not just from user messages. This doubles the extraction surface and increases the volume of memories written per turn.
With captureAssistant: true, memories are written from both what you say and what the agent says. This is useful for capturing agent decisions and reasoning. It also increases extraction time (larger payloads) and memory store growth (more writes per turn). If your extraction is timing out and payload sizes are large, check whether captureAssistant is enabled and whether disabling it reduces the payload size enough to fix the timeout. Disabling captureAssistant means agent responses are no longer extracted into memory. That is a tradeoff worth knowing before making the change.
Read my memory plugin config. Is captureAssistant enabled? What is it set to? Based on my current hardware and extraction model, do you think it is contributing to extraction timeouts?
Monitoring memory health over time
A working pipeline today can silently degrade. Extraction model updates, Ollama restarts, config changes after plugin updates: any of these can break extraction without surfacing an error. Build a lightweight check into your routine:
Run memory_stats. How many memories are in the store? When was the most recent one written? If the most recent write is more than 24 hours old and I have had active sessions since then, flag this as a potential extraction failure and run the extraction diagnostic.
You can also build this check into a cron job that fires daily and sends a summary to your messaging channel. If the memory count has not increased since the previous day’s check, the cron job reports it. Catching silent failures before they accumulate a week’s worth of lost context is much cheaper than diagnosing them after the fact.
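A minimal version of that check, assuming you can pull the count and last-write timestamp out of memory_stats yourself; the function and its inputs are illustrative, not a plugin interface.

```python
import time

def memory_health(count_today, count_yesterday, last_write_epoch, now_epoch,
                  had_sessions=True, max_age_hours=24):
    # Flag a potential silent extraction failure: no new writes despite
    # active sessions, or a last write older than the allowed window.
    stale = (now_epoch - last_write_epoch) > max_age_hours * 3600
    no_growth = count_today <= count_yesterday
    if had_sessions and (stale or no_growth):
        return "ALERT: possible silent extraction failure"
    return "OK"

now = time.time()
print(memory_health(120, 120, now - 2 * 86400, now))  # no growth, stale write
print(memory_health(135, 120, now - 3600, now))       # healthy
```

Drop something like this into the daily cron job and route the "ALERT" line to your messaging channel.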
Platform-specific notes
macOS (local installation)
On macOS, Ollama runs as a background application rather than a systemd service. If Ollama quits or crashes (common after system updates or sleep/wake cycles), your local extraction model and embedding model become unreachable. autoCapture will fail silently until Ollama is restarted. If you notice memory writes stopping on macOS, check that the Ollama menu bar icon is active. Run ollama list in Terminal to verify the service is running and models are available.
Linux VPS (systemd)
On a VPS running Ollama as a systemd service, verify Ollama is set to restart on failure: systemctl status ollama. If it is not configured with Restart=always, a crash leaves your extraction model unreachable until the next manual restart or server reboot. Also confirm OLLAMA_KEEP_ALIVE=-1 is set in the Ollama service environment. Without this, Ollama unloads models after a period of inactivity, causing the first extraction call after idle time to time out while the model reloads.
Docker deployments
If OpenClaw is running in a Docker container and Ollama is on the host, confirm network routing between the container and the host. The Ollama endpoint inside the container is typically host.docker.internal:11434 on Docker Desktop (macOS/Windows) or the host’s Docker bridge IP on Linux (check with docker network inspect bridge | grep Gateway). If your memory plugin config has 127.0.0.1:11434 as the embedding endpoint, it will fail inside a container because 127.0.0.1 refers to the container, not the host.
Windows (WSL2)
On Windows with WSL2, Ollama can run either on the Windows host or inside WSL2. If Ollama is on the Windows host and OpenClaw is in WSL2, the endpoint is the WSL2 gateway IP (typically 172.x.x.1:11434), not 127.0.0.1. Run ip route show default | awk '/default/ {print $3}' inside WSL2 to find the gateway IP. If both are in WSL2, 127.0.0.1 works correctly.
After fixing: what to expect
Once the pipeline is working, memory accumulation is gradual. Do not expect the agent to immediately recall everything after a single post-fix session. The memory store builds over many sessions. After the first week of active use with a working pipeline, the agent will start surfacing preferences and facts you mentioned days ago without prompting. After a month, the context it brings to new sessions is noticeably richer.
What to check at 24 hours post-fix:
- memory_stats count has increased from your sessions that day
- A fresh session surfaces at least one memory from a previous session without you prompting it
- The most recent write timestamp in memory_stats is from today
If all three are true, the pipeline is healthy. Set the daily monitoring cron described above and let the store grow. After several days of active use with the pipeline working, you will see the agent pulling in specific context from earlier sessions without being prompted. That is the intended end state: a session that starts already knowing where you left off.
When to wipe and restart the memory store
Sometimes the fastest fix is to start fresh. If your memory store has accumulated months of low-quality or incorrect memories from a broken extraction pipeline, cleaning house is more productive than trying to fix individual bad records.
Signs that a reset makes sense:
- The store has hundreds or thousands of memories but recall is consistently poor
- The embedding model was changed after memories were written (old vectors are incompatible with new queries)
- You changed the extraction model and the old model was producing low-quality extractions
- The store is full of duplicates or near-duplicates from a misconfigured extraction pipeline
I want to evaluate whether to reset my memory store. Show me: the total count, a sample of 10 random memories, the oldest and most recent write timestamps, and the current embedding model. Based on that, give me a recommendation on whether a reset is worth it.
If you do reset, do it cleanly: stop OpenClaw, confirm the exact database path with your agent first, delete that specific directory, restart OpenClaw, and confirm the plugin reinitializes correctly before your first session. Check gateway logs for successful LanceDB initialization. Run memory_stats immediately after startup to confirm the count is zero and the store is fresh. Keep a backup of the old database if you want to inspect it later.
Before deleting, run cp -r ~/.openclaw/memory ~/.openclaw/memory.backup.$(date +%Y%m%d). This preserves the old store in a dated directory while letting the plugin reinitialize a clean one.
API cost of the memory pipeline
If you are using API models for extraction or embedding, the memory pipeline adds to your API cost. Understanding the rough cost helps you decide whether to use local models instead.
Using DeepSeek V3 for extraction at $0.27 per million tokens:
- Average extraction payload per turn: 3,000 tokens
- Extraction output (the extracted memories): 200 tokens
- Cost per extraction: 3,200 × $0.00000027 = $0.00086 (less than a tenth of a cent)
- 50 turns per day: $0.043/day or $1.30/month
Using OpenAI text-embedding-3-small at $0.02 per million tokens for embeddings:
- Average memory text size: 150 tokens
- Cost per memory embedding: $0.000003
- 100 new memories per day: $0.0003/day or $0.009/month
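The arithmetic behind those figures can be reproduced directly so you can swap in your own token counts and rates. `monthly_cost` is a throwaway helper, and like the estimate above it prices input and output tokens at the same rate.

```python
def monthly_cost(tokens_per_call, rate_per_million, calls_per_day, days=30):
    # Returns (cost per call, cost per day, cost per month) in dollars.
    per_call = tokens_per_call * rate_per_million / 1_000_000
    return per_call, per_call * calls_per_day, per_call * calls_per_day * days

# DeepSeek V3 extraction: ~3,000 input + ~200 output tokens at $0.27/M
c, d, m = monthly_cost(3200, 0.27, 50)
print(f"extraction: ${c:.5f}/call  ${d:.3f}/day  ${m:.2f}/month")

# text-embedding-3-small: ~150 tokens per memory at $0.02/M
c, d, m = monthly_cost(150, 0.02, 100)
print(f"embedding:  ${c:.6f}/memory  ${d:.4f}/day  ${m:.3f}/month")
```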
The memory pipeline is cheap even with API models. Most operators using a combination of local extraction and API embedding spend under $0.50/month on memory-related API calls. The main cost risk is if extraction is misconfigured to run on a frontier model instead of a cheap or local one. Running extraction on Claude Sonnet instead of DeepSeek V3 increases the per-turn extraction cost by 10x. Always verify which model is handling extraction before assuming the memory pipeline is cost-free. The model routing matters here just as much as it does for conversation.
Read my memory plugin config. What model is configured for extraction? What model is configured for embedding? Are either of these frontier models (Sonnet, GPT-4o, Opus)? If yes, flag this as unnecessary cost and suggest cheaper alternatives.
The diagnostic sequence
The diagnostic sequence matters. Start at the bottom of the stack (backing store) and work up. Checking recall quality before confirming writes are actually occurring wastes time on the wrong layer.
Step 1: Check the backing store
Before checking autoCapture or autoRecall, verify the memory plugin actually initialized successfully. A plugin can report as loaded while the backing store failed silently. A missing dependency, a bad file path, or a permissions issue on the database directory will all do this.
Check the gateway logs from the last startup. Show me any errors related to memory, LanceDB, the embedding model, or plugin initialization. If there are no errors, confirm the memory plugin loaded successfully.
On systemd, run journalctl -u openclaw --since "1 hour ago" | grep -i memory. On Docker, run docker logs openclaw 2>&1 | grep -i memory. Look for errors about LanceDB, the embedding model, or missing dependencies.
Also verify the memory directory exists. If OpenClaw is running on a VPS, SSH in first. Then run:
ls -la ~/.openclaw/
ls -la ~/.openclaw/memory/ 2>/dev/null || echo "memory directory does not exist"
If it prints “memory directory does not exist,” the backing store never initialized. The plugin loaded but had nowhere to write. If the directory is there, confirm the path in your config matches it.
Read my memory plugin config in openclaw.json. What path is the database configured to use? Does that path exist? Does the openclaw process have write permissions to it?
If the path does not exist, first confirm what path the plugin expects (from the blockquote above), then create it: mkdir -p /the/path/from/config. Do not create a generic path. Create the exact path the config specifies. If permissions are wrong, fix them. The OpenClaw process user (node or openclaw) needs write access to the directory.
Step 2: Verify autoCapture is writing
autoCapture runs after each turn ends. It sends recent conversation content to an extraction model, which pulls out facts and preferences, then writes them to the memory store. If the extraction model call fails or times out, nothing is written. No error surfaces in the conversation.
Use the memory_stats tool and the memory_list tool. Show me: how many memories are currently stored, when the most recent one was written, and what its content is.
If the count is zero or the most recent memory is from days ago, autoCapture is failing silently. The most common causes:
- Extraction model timing out. Small local models can take 30+ seconds on some hardware. If the plugin’s timeout is shorter, extraction fails every time with no error surfaced.
- Extraction model not available. If it is a local Ollama model, check that Ollama is running and the model is pulled.
- extractMinMessages set too high. If set to 5, extraction only runs after 5 turns. Short sessions never trigger it.
Read my memory plugin config. What model is running extraction? What is the timeout for extraction LLM calls? What is extractMinMessages set to? Is the extraction model currently reachable?
Why the extraction model times out
This is the most common extraction failure. When autoCapture fires, it assembles a payload of recent conversation turns and sends it to the extraction model asking it to pull out facts, preferences, and decisions. The model must respond within the configured timeout. If it does not, the call is abandoned and nothing is written.
Extraction payloads are large. A single turn can generate 2,000 to 5,000 tokens of input. On hardware without a GPU, an 8B parameter model processes this in 25 to 40 seconds. The default timeout in most memory plugins is 30 seconds. The math does not work. Extraction fails consistently with no error.
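You can check whether your own numbers work with a rough lower bound on processing time. The throughput figure below is illustrative for CPU-only inference; measure yours rather than trusting it.

```python
def extraction_fits(payload_tokens, tokens_per_second, timeout_seconds):
    # Lower bound: time to process the input payload vs. the plugin timeout.
    # Ignores output generation time, so the real call is slower than this.
    return payload_tokens / tokens_per_second <= timeout_seconds

# ~3,500-token payload on CPU-only hardware at ~100 tokens/s (illustrative)
print(extraction_fits(3500, 100, 30))  # 35s of input processing blows a 30s timeout
print(extraction_fits(3500, 100, 90))  # a 90s timeout leaves headroom
```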
Read my memory plugin config. What is the extraction LLM timeout set to? Is my extraction model running on local hardware or an API? Based on that, is the timeout long enough?
Edit ~/.openclaw/openclaw.json. Find the memory plugin config section. Look for llm.timeoutMs or extractionTimeoutMs. Set it to 90000 (90 seconds). Save and restart OpenClaw. If the field is not configurable in your plugin version, this requires a source patch.
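The shape of the change, sketched against an assumed config layout. The key path and model name here are illustrative; inspect your real openclaw.json before editing, since the field may be llm.timeoutMs or extractionTimeoutMs depending on plugin version.

```python
import json

# Illustrative structure only -- confirm yours before writing anything.
config = {
    "plugins": {
        "memory": {
            "llm": {"model": "llama3.1:8b", "timeoutMs": 30000}
        }
    }
}

# Raise the extraction timeout to 90 seconds.
config["plugins"]["memory"]["llm"]["timeoutMs"] = 90000

print(json.dumps(config["plugins"]["memory"]["llm"], indent=2))
```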
Run memory_stats and tell me the current count. Then have a 4-turn conversation with me about any topic, then run memory_stats again. Did the count increase? If not, describe what the plugin logs show for those turns.
Step 3: Verify autoRecall is injecting
autoRecall runs at context assembly time before each turn. It queries the memory store and injects relevant memories into the context as a block. If the agent appears to ignore recalled memories, check whether the injection is actually happening:
At the start of this turn, were any memories injected into your context? If yes, show me the exact injected block including all memories and their scores. If no, tell me why no memories were injected.
If the agent confirms injection happened but still does not act on the recalled memories, the issue is placement. The injected block lands at the start of the context. After compaction, it is no longer in the active window. The agent processes what is in its current context window, not the full session history.
If no memories are being injected despite the store having content, there is a scope mismatch.
Step 4: Check for scope mismatch
This is the most common cause of “nothing works” when the pipeline is actually running fine. Memories were written with one scope, recall is querying a different scope. The pipeline executes correctly. It just queries the wrong bucket.
Scopes are namespaces. Memories written to scope A are invisible to a recall query targeting scope B. There is no overlap and no fallback. The scope you write with and the scope you query with must be identical, including capitalization and any prefix formatting.
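A scope mismatch in miniature, with a plain dictionary standing in for the store; the scope strings are illustrative, not defaults your plugin necessarily uses.

```python
store = {}  # scope -> list of memories; scopes are isolated namespaces

def write(scope, memory):
    store.setdefault(scope, []).append(memory)

def recall(scope):
    # No fallback and no fuzzy matching: an unknown scope returns nothing,
    # silently, with no error surfaced anywhere.
    return store.get(scope, [])

write("agent:main", "user prefers dark mode")  # e.g. autoCapture's default

print(recall("agent:main"))   # same scope: the memory surfaces
print(recall("Agent:Main"))   # capitalization differs: empty, no error
print(recall("global"))       # different scope entirely: empty, no error
```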
Scope mismatch is easy to introduce by accident:
- Default scope drift. autoCapture writes to a default scope (often global or agent:main). You configure autoRecall with an explicit scope. If they differ, recall finds nothing.
- Plugin update changed the default. A new version changed the default scope. Existing memories are in the old scope; new writes and recall queries use the new scope.
- Manual memory_store calls used a different scope. Manual stores with an explicit scope and autoCapture's default scope are two separate buckets.
Read my memory plugin config. What scope is autoCapture writing to? What scope is autoRecall querying? Are they the same value? If different, explain what that means for recall.
Run memory_stats with no scope argument, then memory_list with no scope argument. Show me the total count and the scope breakdown (which scopes have memories and how many are in each).
When recall injects but the agent ignores the memories
The pipeline is working, memories are stored and injected, but the agent still asks for information it should have. Three causes:
Compaction pushed the injected block out of context
autoRecall injects memories near the beginning of the context. After compaction, that block is no longer in the active window. Check whether your plugin re-injects on every turn or only at session start. Re-injection every turn keeps memories in context regardless of compaction.
Injection format blends into conversation content
Some plugins inject memories as plain text with no structural marker. If the block blends into conversation content, the model does not treat it as authoritative context. Plugins that wrap injections in XML tags or clearly labeled blocks produce more reliable results.
The stripAutoCaptureInjectedPrefix bug
Some memory plugin versions have a known issue where autoRecall injections are stripped before the agent processes the turn rather than after. Memories appear in the context when inspected but are absent during response generation. If you see injected memories in a raw context dump but the agent lacks the information, check your plugin version’s changelog for this issue. It has been patched in most current releases.
Show me exactly where in your current context the recalled memories block appears. Is it before or after the most recent conversation turns? Is it re-injected each turn or only at session start? What format wraps the injection?
Checking the embedding model
The embedding model converts memory text into vectors for storage and search. If it is unavailable or producing low-quality vectors, recall fails to surface relevant memories even when they exist in the store.
- nomic-embed-text (Ollama): free, local, 768 dimensions. Fast, lightweight, good for most use cases.
- mxbai-embed-large (Ollama): free, local, 1024 dimensions. Better semantic understanding for abstract or nuanced memories.
- OpenAI text-embedding-3-small: API-based, low cost, reliable quality.
- Jina jina-embeddings-v3: API-based, task-aware LoRA, strong on complex recall tasks.
What embedding model is configured for my memory plugin? Is it currently reachable? Run a recall test: search for a phrase I give you and show me the results including similarity scores.
If the embedding model is local, run ollama list to confirm it is installed. Embedding models must be pulled separately from chat models. Run ollama pull nomic-embed-text to install the most commonly used one (a 274MB download).
The full diagnostic sequence
Steps 1 through 4 above are the sequence, and the deep-dive sections alongside them cover the most common failure modes at each step. In summary:
- Check the backing store exists and is writable. If not, nothing else matters.
- Run memory_stats to confirm writes are happening. If count is zero, the problem is in extraction.
- If writes are zero: check extraction model availability and timeout. Is the model reachable? Is the timeout long enough?
- If writes exist but recall surfaces nothing: check scope alignment. Are capture and recall using the same scope?
- If scope is aligned but agent ignores recalled content: check context placement. Is the injected block re-injected each turn? Has compaction pushed it out?
Most setups fail at step 2 (extraction timeout) or step 4 (scope mismatch). Steps 1 and 3 are quicker to rule out and worth confirming first. Run each check before moving to the next.
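The ordering logic can be sketched as a short-circuiting walk down the checklist; each check here is a placeholder for the corresponding step above, not a real probe.

```python
def diagnose(checks):
    # Bottom of the stack first; stop at the first failing layer, because
    # everything above it is untestable until that layer works.
    for name, passed in checks:
        if not passed:
            return f"broken at: {name}"
    return "pipeline healthy"

# Example run: writes are happening, but capture and recall use different scopes.
checks = [
    ("backing store writable", True),       # step 1
    ("writes occurring", True),             # step 2
    ("extraction model reachable", True),   # step 2, continued
    ("scopes aligned", False),              # step 4: the actual failure
    ("injection placement correct", True),  # never reached in this run
]
print(diagnose(checks))
```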
Run a full memory pipeline diagnostic. For each of the following, report pass, fail, or unknown: (1) memory plugin loaded and backing store accessible, (2) memory_stats count and most recent write timestamp, (3) extraction model name, timeout value, and reachability, (4) extractMinMessages value, (5) autoCapture scope vs autoRecall scope, (6) embedding model name and reachability, (7) whether autoRecall re-injects on every turn or only at session start.
Complete fix
Ultra Memory Claw
The complete memory configuration guide. Every layer of the pipeline, how they interact, and how to tune each one for your workload. Drop it into your agent and it audits your current memory setup and fixes what needs fixing.
FAQ
How do I check if my extraction model is timing out?
Ask your agent: “Did autoCapture run after the last turn? If so, what did it extract and how long did it take?” If the agent says extraction ran but nothing was written, or cannot confirm it ran at all, check the timeout value in your memory plugin config against how long your extraction model actually takes to respond. Local models on slower hardware commonly exceed a 30-second default timeout.
I can see memories in memory_list but my agent acts like they do not exist. What is happening?
Most likely a scope mismatch or a similarity threshold that is too tight. The memories exist but recall is not finding them. Run a direct recall query for a specific phrase you know is in one of the memories. If it does not surface, adjust the threshold. If it surfaces but the agent ignores it, check where in the context the injected block lands and whether compaction has pushed it out of the active window.
Does autoCapture run on every turn or only at session end?
autoCapture runs after each turn ends, not at session end. If the session ends before a turn completes (crash, kill, timeout), the last turn is not captured. This is expected. The extractMinMessages setting controls how many messages must accumulate before extraction triggers. Very short sessions with fewer turns than that setting never fire extraction even if autoCapture is on.
autoRecall injects memories but they are always the same ones. How do I fix this?
The recall query is based on your current context. If the same topics dominate your sessions, the same memories surface. This is correct behavior. If genuinely different queries keep returning the same irrelevant memories, the similarity threshold is too loose or the embedding model is producing low-discrimination vectors that match too broadly.
What is the difference between autoCapture and autoRecall?
autoCapture writes memories from conversation content. It runs after each turn, extracts facts and preferences, and stores them. autoRecall reads memories from the store and injects them into the context. It runs before each turn, queries for relevant memories, and adds them as a block. They are separate processes that can be enabled independently, but having one without the other is not useful for normal operation.
Can I use autoCapture without autoRecall?
Yes, but memories are never surfaced to the agent without autoRecall. The only valid reason is if you are building a separate memory query interface outside OpenClaw. For normal use, both should be on.
What happens if I have multiple memory plugins installed?
OpenClaw loads all enabled memory plugins. They can conflict. The most common conflict is tool registration: two plugins try to register the same memory tool names, causing one to fail silently. Check your gateway logs for tool registration errors. Only one memory plugin should be active at a time.
Does autoRecall work with subagents?
Yes, if the subagent session shares the same memory scope. Subagents spawned by your main agent run in separate sessions but can query the same memory store. Whether they do depends on the plugin config and whether the subagent session is set up to use memory tools. Some plugins restrict memory tool access to the main agent session only.
My memories exist but recall never surfaces them, and the scopes match. What else could cause this?
Check the similarity threshold. If it is set too high (say, 0.95), almost nothing qualifies as a match. Try lowering it to 0.7 and running the same recall query. Also check the embedding model: if it was changed after memories were written, the existing vectors and the new query vectors are from different model spaces and will not match correctly. Re-embedding existing memories with the new model is required in that case.
Go deeper
How to choose the right embedding model for OpenClaw memory
Once the pipeline is running, embedding model quality is the biggest lever on recall precision.
My OpenClaw agent keeps mixing up memories from different sessions
The pipeline is running and recall is working, but memories from one context surface in another. Scope design is the fix.
OpenClaw memory is on but it keeps recalling the wrong things
If the pipeline is running but recall quality is bad, the problem is in the embedding model or retrieval settings.
