The Memory Problem

OpenClaw memory looks like it is working. The tools appear. The commands run. Nothing errors. But if you have not verified each layer of the stack individually, there is a good chance it is silently failing and you would not know it. This guide covers every layer of the memory pipeline, how to test each one, and what breaks where.

TL;DR: Install the plugin, set scope to agent:main on every tool call, verify the pipeline by storing and immediately recalling a test entry, and switch your embedding model to nomic-embed-text via Ollama so memory operations cost nothing. If recall returns empty after an explicit store, check scope and the embedding model first. If autoCapture stores nothing at all, your LLM extraction is timing out; fix the timeout before anything else.

Why OpenClaw memory silently fails

The memory pipeline has five layers. All five have to work for memories to persist and surface correctly. The problem is that four of them fail silently. No error in the logs, no visible output, just nothing stored or recalled.

  1. The plugin must be installed and enabled. If you have both memory-lancedb and memory-lancedb-pro installed, the stock version shadows tool registrations and breaks the pro version. Only one memory plugin should be active at a time.
  2. The LLM extraction call reads the conversation and decides what to store as a long-term memory. This call has a hardcoded 30-second timeout. Local models routinely take longer than 30 seconds to complete. When the timeout fires, extraction fails silently and nothing is stored.
  3. The embedding model converts stored text to vectors for similarity search. If the embedding model is unavailable or misconfigured, stores appear to succeed but recall produces no results.
  4. The scope on every memory tool call determines which bucket entries go into and come out of. A store to agent:main will not surface on a recall without a scope, or a recall with a different scope. Scope mismatch is the most common cause of “I stored it but recall found nothing.”
  5. The retrieval configuration (hybrid weight, reranker, result count) determines whether relevant memories surface in the right order. A working pipeline with poor retrieval config returns results, just not the right ones.

Run a memory pipeline diagnostic. Check: (1) which memory plugins are installed and which are enabled, (2) what LLM model is configured for extraction and what its timeout setting is, (3) what embedding model is configured and whether it is reachable, (4) what scope is set as the default. Report the current state of each layer and flag any problems.

Layer 1: Plugin selection

Two memory plugins exist for OpenClaw as of March 2026: memory-lancedb (the stock version) and memory-lancedb-pro (the extended version with hybrid retrieval and reranking). They register the same tool names. If both are installed and enabled simultaneously, one shadows the other and you get unpredictable behavior.

The stock plugin handles basic use cases. The pro version adds hybrid retrieval (combining vector similarity and keyword search), a reranker for result ordering, and configurable extraction behavior. For most personal deployments, the pro version is worth using.

List all installed memory plugins. Tell me which ones are enabled and which are disabled. If both memory-lancedb and memory-lancedb-pro are enabled simultaneously, disable memory-lancedb (the stock version) and leave only the pro version active. Show me the config change.

Known conflict: The stock memory-lancedb plugin, when enabled alongside memory-lancedb-pro, shadows the pro plugin’s tool registrations. The memory tools appear to work (they respond to calls) but they are routing through the stock plugin’s logic, not the pro plugin’s. Disable the stock plugin even if you installed it first.
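If your gateway reads plugin state from a JSON config, the fix is a one-line flip. This is a hypothetical sketch: the actual key names depend on your OpenClaw version, so check the plugin README before applying it.

```json
{
  "plugins": {
    "memory-lancedb": { "enabled": false },
    "memory-lancedb-pro": { "enabled": true }
  }
}
```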

Layer 2: LLM extraction timeout

When autoCapture is enabled, the memory plugin reads each conversation turn and calls an LLM to extract facts worth storing. That LLM call has a hardcoded 30-second timeout in the stock plugin code. If you are using a local model for extraction (llama3.1:8b, phi4:latest, or similar), that model will frequently take longer than 30 seconds to process a full conversation turn, especially on a loaded server.

When the 30-second timeout fires, the extraction fails with no log entry and no error visible in chat. The conversation continues normally. Nothing is stored. This is the most common cause of “memory is installed but nothing is being remembered.”

Check my memory plugin configuration. What model is set for LLM extraction? What is the timeoutMs value? If timeoutMs is not set or is set below 90000, update it to 90000 (90 seconds). Show me where in the config to make this change.

Manual fallback: If your memory plugin supports a llm.timeoutMs config field, set it to 90000. If it does not, the timeout may be hardcoded in the plugin source. In that case, find the timeoutMs: 30000 line in the plugin’s index.ts and change it to 90000. This patch survives normal usage but needs to be reapplied after plugin updates.
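Assuming your plugin version exposes the llm.timeoutMs field described above, the config fragment might look like the following. Both the nesting and the model key name are assumptions about the schema, not confirmed field names:

```json
{
  "plugins": {
    "memory-lancedb-pro": {
      "llm": {
        "model": "ollama/phi4:latest",
        "timeoutMs": 90000
      }
    }
  }
}
```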

Choosing the right extraction model

The extraction model does not need to be your primary model. It just needs to be good enough to read a conversation turn and identify facts worth storing. In practice, phi4:latest (14B parameters, quantized) handles this reliably and is available locally via Ollama at zero API cost. llama3.1:8b works for simple extractions but misses nuance on complex technical conversations.

What model is currently configured for memory extraction? If it is an API model (deepseek, claude, openai), switch it to ollama/phi4:latest to eliminate extraction costs. If Ollama is not running locally, tell me what the cheapest API alternative would be for extraction-quality tasks.

Layer 3: Embedding model

The embedding model converts stored text into a vector representation that enables similarity search. When you recall a memory by query, the query is also embedded and compared against stored vectors to find the closest matches. If the embedding model is unavailable, misconfigured, or changed after memories were stored, recall produces empty results even though the memories are in the database.
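The mechanics of vector recall can be shown in a few lines. This is a toy sketch, not the plugin's implementation: the 3-dimensional vectors stand in for the 768-dimensional output of a real model like nomic-embed-text.

```python
import math

def cosine(a, b):
    # Similarity of two embedding vectors; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall(query_vec, stored, k=5):
    # stored: list of (text, vector) pairs; returns the top-k closest texts.
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" standing in for real model output.
stored = [("server uses nginx", [0.9, 0.1, 0.0]),
          ("favorite color is blue", [0.0, 0.2, 0.9])]
print(recall([0.8, 0.2, 0.1], stored, k=1))  # → ['server uses nginx']
```

If the embedding model is down, the plugin never gets a query vector to compare with, which is why stores can succeed while recall returns nothing.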

Check my embedding model configuration. What model is being used for memory embeddings? Is it running and reachable right now? Run a test: store a memory entry with a unique phrase, then recall by that phrase. If recall returns empty, the embedding pipeline is broken.

Local vs. API embeddings

As of March 2026, the best zero-cost option for local embeddings is nomic-embed-text via Ollama. It produces 768-dimensional vectors, handles technical content well, and is fast enough on any server with 512MB free RAM. At zero per-call cost, it is the right default for any personal OpenClaw deployment.

API embedding options (Jina, OpenAI) produce higher-dimensional vectors and sometimes higher recall precision, but they cost money on every memory operation. For a setup with autoCapture enabled, every conversation turn triggers an embedding call. At moderate usage, that adds up to $5-15 per month just for embeddings. Switching to nomic-embed-text eliminates that cost entirely with minimal quality impact for most use cases.

Check whether nomic-embed-text is available via my local Ollama instance. If it is, show me how to switch my memory plugin to use it as the embedding model. If nomic-embed-text is not installed, give me the command to pull it and then show me the config change.

Important: If you switch embedding models after memories are already stored, existing memories will not surface in recall. The stored vectors were generated by the old model and are not compatible with the new one. After switching embedding models, either re-embed existing memories or accept that older memories will not surface until they are re-stored. The full embedding model guide covers the migration path.
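One cheap safeguard is a dimension check before recall: vectors from different models usually differ in length (nomic-embed-text emits 768 dimensions; other models emit 1024 or 1536), so a model switch is often detectable before it silently degrades results. A sketch, not part of any shipped plugin:

```python
def check_embedding_dims(query_vec, stored_vecs):
    # Stored vectors keep the dimensionality of the model that produced them.
    # A mismatch means the query was embedded by a different model.
    dims = {len(v) for v in stored_vecs}
    if dims and len(query_vec) not in dims:
        raise ValueError(
            f"query dim {len(query_vec)} does not match stored dims {sorted(dims)}; "
            "re-embed stored memories with the current model")
    return "ok"
```

Note that same-dimension models can still be incompatible; matching dimensions is necessary but not sufficient.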

Layer 4: Scope configuration

Scope is the namespace for memory storage and retrieval. Every memory_store call stores into a scope. Every memory_recall call retrieves from a scope. If these do not match, the recall finds nothing even though the memory is in the database.

The recommended scope for a single-agent personal deployment is agent:main. It keeps memories tied to your primary agent session and prevents cross-contamination from other sessions, test runs, or subagent sessions.
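Scope behaves like a dictionary key over separate buckets. The sketch below uses substring matching where the real plugin ranks by embedding similarity, but the silent-miss behavior on a scope mismatch is the same:

```python
from collections import defaultdict

store = defaultdict(list)  # scope -> list of memory texts

def memory_store(text, scope):
    store[scope].append(text)

def memory_recall(query, scope):
    # Real recall ranks by embedding similarity; substring match keeps the sketch small.
    return [t for t in store[scope] if query.lower() in t.lower()]

memory_store("Preferred model is phi4", scope="agent:main")
print(memory_recall("preferred model", scope="agent:main"))  # → ['Preferred model is phi4']
print(memory_recall("preferred model", scope="agent:test"))  # → [] (scope mismatch, silent miss)
```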

Check my memory plugin scope configuration. What is the default scope? Are there any memories stored under different scopes from past sessions or test runs? Run memory_list with scope="agent:main" and tell me how many memories are in that scope. Then check if there are memories in any other scopes.

Passing scope on every call

Even with a default scope configured, passing scope explicitly on every memory tool call is the safest approach. It eliminates any ambiguity about which scope a memory goes into or comes from, and it protects against default scope changes during plugin updates.

Store a test memory: memory_store(text="Test entry for scope verification", scope="agent:main"). Then immediately recall it: memory_recall(query="scope verification test", scope="agent:main"). If recall returns the entry, the scope pipeline is working. If it returns empty, tell me what the last step was that failed.

Layer 5: Retrieval configuration

Even with all four previous layers working correctly, poor retrieval configuration produces results that are technically accurate but practically useless: memories that are in the database but not surfacing when relevant, or memories surfacing in the wrong order.

Hybrid retrieval

The memory-lancedb-pro plugin supports hybrid retrieval, which combines vector similarity search with BM25 keyword matching. Pure vector search performs poorly on technical content with specific terms (model names, config field names, error messages) because these terms may be semantically distant from the query even when they are an exact match. Hybrid retrieval catches exact matches that vector search misses.
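The blend can be illustrated with a toy scoring function. The real plugin uses BM25 for the keyword side; the term-overlap fraction here only approximates it:

```python
def hybrid_score(vec_sim, keyword_score, w_vec=0.7, w_kw=0.3):
    # Blend vector similarity with a keyword score at the recommended 0.7/0.3 split.
    return w_vec * vec_sim + w_kw * keyword_score

def keyword_overlap(query, text):
    # Crude stand-in for BM25: fraction of query terms present verbatim in the text.
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0

# An exact term like "timeoutMs" can score low on vectors but high on keywords,
# so the keyword component rescues the match.
score = hybrid_score(0.2, keyword_overlap("timeoutMs setting", "set timeoutMs to 90000"))
print(round(score, 2))  # → 0.29
```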

Check my memory retrieval configuration. Is hybrid retrieval enabled? What is the vector-to-keyword weight ratio? The recommended setting is 0.7 vector / 0.3 BM25. If hybrid retrieval is not enabled, show me how to enable it.

Result count and reranking

The default result count for memory recall is typically 5. For a setup with months of stored memories on a specific topic, 5 results may not surface the most relevant entry. Increasing the candidate count and using a reranker to select the best results from a larger pool improves precision significantly.
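The candidate-then-rerank pattern looks like this in miniature. A production reranker is typically a cross-encoder model; plain term overlap here just illustrates the two-stage shape of retrieving a wide pool and selecting a narrow result set:

```python
def rerank(query, candidates, final_k=5):
    # candidates: (text, vector_score) pairs, e.g. the top 20 from vector search.
    # Re-order by exact-term overlap with the query, breaking ties on vector score.
    q = set(query.lower().split())
    def key(item):
        text, vec_score = item
        overlap = len(q & set(text.lower().split()))
        return (overlap, vec_score)
    return [text for text, _ in sorted(candidates, key=key, reverse=True)[:final_k]]

candidates = [("notes about servers", 0.9), ("set timeoutMs to 90000", 0.5)]
print(rerank("timeoutMs", candidates, final_k=1))  # → ['set timeoutMs to 90000']
```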

Check my memory recall configuration. What is the default result limit? Is a reranker configured? If the result limit is below 10 and no reranker is configured, recommend the change that would most improve recall precision for my setup.

autoCapture and autoRecall

These two settings are the most misunderstood part of OpenClaw memory. They sound like they should work together automatically. They actually conflict with each other in specific plugin versions, and most documentation does not explain why.

autoCapture runs the extraction pipeline after each conversation turn. It reads what was said, extracts facts worth storing, and writes them to memory. It operates on the outgoing message (what the agent just said).

autoRecall injects relevant memories at the start of each conversation turn before the agent responds. It queries memory based on the incoming message and prepends the results as context.

The conflict: autoRecall injects a block of recalled memories into the turn, and autoCapture later processes that same message. The extraction LLM sees both the user’s words and the injected memory block, and sometimes treats the injected block as new content to capture, creating noise and circular captures.
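The fix amounts to stripping the injected block before capture runs. The marker strings below are hypothetical; the real plugin uses its own delimiters:

```python
# Hypothetical markers; the actual plugin's delimiters may differ.
RECALL_START = "[recalled memories]"
RECALL_END = "[/recalled memories]"

def strip_injected_block(message: str) -> str:
    # Remove the autoRecall block so the capture LLM only sees the user's words.
    start = message.find(RECALL_START)
    if start == -1:
        return message
    end = message.find(RECALL_END, start)
    if end == -1:
        return message[:start].strip()
    return (message[:start] + message[end + len(RECALL_END):]).strip()

msg = "[recalled memories]Server is nginx[/recalled memories]\nWhat port does it use?"
print(strip_injected_block(msg))  # → What port does it use?
```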

Check my memory plugin version and settings. Are both autoCapture and autoRecall enabled? If so, verify that the plugin version includes the stripAutoCaptureInjectedPrefix fix. If it does not, tell me which setting to disable to prevent circular capture behavior.

The fix: In memory-lancedb-pro version 1.1.0-beta.9 and later, the stripAutoCaptureInjectedPrefix function at line 782 strips injected autoRecall blocks before the capture pipeline processes the message. If you are on an earlier version, either upgrade or disable autoRecall and run manual recalls instead.

Verifying the full pipeline end to end

After configuring all five layers, run a full pipeline verification. This takes about 3 minutes and confirms that every layer is actually working, not just configured. The distinction matters because each layer can be “configured” in the sense that the config file has a value, while still failing at runtime due to an unreachable model, a wrong path, or a version-specific bug. The verification checks actual runtime behavior, not config file contents.

Run a full memory pipeline verification: (1) Store a test entry: memory_store(text="Pipeline verification test: stored at [current timestamp]", scope="agent:main"). (2) Immediately recall it: memory_recall(query="pipeline verification", scope="agent:main"). (3) If it surfaces, confirm the stored text matches what was written. (4) List memory stats to confirm the total count increased. (5) Report pass or fail for each step with the actual output.
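The steps in that verification can be sketched as a small harness. The store_fn, recall_fn, and list_count_fn callables are stand-ins for whatever wraps your actual memory tool calls; nothing here is an OpenClaw API:

```python
import time

def verify_pipeline(store_fn, recall_fn, list_count_fn):
    # Runs store -> recall -> count-check and reports pass/fail per step.
    results = {}
    marker = f"Pipeline verification test: stored at {time.time()}"
    before = list_count_fn()
    results["store"] = store_fn(marker, scope="agent:main")
    hits = recall_fn("pipeline verification", scope="agent:main")
    results["recall"] = any(marker in h for h in hits)
    results["count_increased"] = list_count_fn() > before
    return results
```

A unique marker (here, a timestamp) matters: recalling a generic phrase could match an older test entry and mask a broken store.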

Common memory failure patterns

Pattern: store succeeds, recall returns empty

Almost always a scope mismatch or an embedding model problem. Check scope first: confirm the scope on the store call matches the scope on the recall call. If scopes match, test the embedding model directly: ask your agent to embed a short phrase and confirm the embedding model returns a non-empty vector. If the embedding model is unreachable, fix that before anything else.

I stored a memory but recall returns empty. Diagnose: (1) What scope was used on the store? What scope am I using on the recall? (2) Is the embedding model running and responding? Test by trying to store and recall a very short phrase: "test123". (3) Are there any errors in the memory plugin logs? Report what you find.

Pattern: autoCapture is on but nothing is being captured automatically

The LLM extraction timeout is firing before the model finishes. Check the timeoutMs setting. If it is 30000 (30 seconds) or unset, that is the cause. Increase it to 90000. Then have a short conversation and check whether new memories appear in memory_list. If timeoutMs is already high and nothing is being captured, check whether the extraction LLM is running and responding within the extended timeout.

autoCapture is enabled but my agent is not storing any memories automatically. Check: (1) What is my timeoutMs setting for LLM extraction? (2) What model is doing the extraction and how long does it typically take to respond? (3) Run memory_list after a few messages and tell me whether the count is changing.

Pattern: recall surfaces memories from other sessions or old context

Scope bleed. Memories from other sessions, test runs, or subagent sessions were stored under different scopes and are contaminating recalls. Run memory_list without a scope filter to see all memories across all scopes. If you see memories from sessions that should not be in your main scope, clean them up with targeted memory_forget calls, then verify each scope has only the memories you intended.

Run memory_stats to see how many memories are stored across all scopes. Then run memory_list for each scope that has entries. Tell me: are there any memories in scopes I did not intentionally create? Are there test entries, duplicate entries, or entries from previous sessions I no longer want?

Memory hygiene

A memory store with hundreds of entries from months of sessions accumulates noise: outdated facts, superseded decisions, test entries that were never cleaned up, and duplicate entries that create conflicting recall results. Periodic hygiene keeps recall quality high.
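Duplicate detection is the easiest part of the audit to automate. A sketch that groups entries by normalized text, assuming entries arrive as (id, text) pairs from a memory listing:

```python
from collections import defaultdict

def find_duplicates(entries):
    # Group entries whose text matches after trivial normalization
    # (lowercasing and whitespace collapsing).
    groups = defaultdict(list)
    for entry_id, text in entries:
        groups[" ".join(text.lower().split())].append(entry_id)
    return [ids for ids in groups.values() if len(ids) > 1]

entries = [(1, "Preferred model is phi4"),
           (2, "preferred model is  phi4"),
           (3, "Server runs nginx")]
print(find_duplicates(entries))  # → [[1, 2]]
```

This only catches near-verbatim duplicates; semantically redundant entries phrased differently still need an LLM pass or a manual look.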

Run a memory audit. List all memories in scope agent:main. Identify: (1) entries that are clearly outdated (references past decisions that have since been reversed), (2) duplicate entries covering the same fact, (3) test entries that were never deleted, (4) entries with low confidence or high ambiguity. Flag each category and ask me which ones to delete before proceeding.

Updating a memory entry

The memory_update tool does not accept a scope parameter on some plugin versions. Attempting to update a memory by ID fails with "outside accessible scopes." The workaround: memory_forget the old entry (passing scope explicitly), then memory_store the updated version. This is not elegant but it is reliable.

I need to update an existing memory. Check whether memory_update works with my current plugin version by attempting to update a test entry by ID with scope="agent:main". If it fails with a scope error, use the workaround: forget the old entry, then store the updated version. Show me the commands for whichever path works.

Designing what to remember

Not every fact that passes through a conversation deserves permanent storage. Storing too much creates noise that degrades recall precision. Storing too little means the agent re-learns the same things every session. The decision of what to capture is as important as the technical pipeline that captures it.

What is worth storing

Facts that change rarely and are relevant across many future sessions are the highest-value memory candidates. Examples: your preferred model for different task types, your workspace structure decisions, recurring preferences about output format, names and identifiers for things you reference frequently (project names, server addresses, key contacts), decisions made about persistent infrastructure, and constraints or rules the agent should always respect.

Facts that are session-specific or short-lived are not worth storing. Examples: the specific output of a one-time task, intermediate steps in a completed process, or anything that will be outdated within a week. Storing these adds noise to future recalls without adding value.

Review my current stored memories and classify each one: (1) high-value persistent facts that should stay, (2) session-specific entries that can be deleted, (3) outdated entries where the fact has since changed. Give me a count in each category and flag the ones most likely to cause bad recall if they stay.

Explicit stores vs. autoCapture

autoCapture stores what the extraction LLM decides is worth storing. This captures things you did not think to save, which is valuable. It also stores things that are not worth storing, which creates noise. For high-signal memories (explicit preferences, decisions, infrastructure facts), use explicit memory_store calls. For background capture of incidental facts, use autoCapture. The two approaches complement each other.

I want to store a set of high-value facts explicitly. Store each of the following as a separate memory with category and importance set appropriately: my preferred model for different task types, my workspace structure, any recurring preferences I have expressed in this session. Use scope="agent:main" for all of them.

Memory in multi-project setups

If you work on multiple distinct projects with your OpenClaw agent, you have two options for memory organization: use a single scope for everything and rely on the retrieval system to surface the right memories, or use per-project scopes and recall explicitly from the relevant scope when switching projects.

Single scope is simpler to manage but can produce cross-contamination on recall. A query about one project may surface memories from another if they share vocabulary. Per-project scopes require explicit scope management on every call but give complete separation.

I work on multiple projects. Show me how to set up per-project memory scopes. For each active project I describe, create a scope name and show me the memory_store and memory_recall commands with the correct scope. Also show me how to query across all scopes at once when I need to find something I am not sure which project it belongs to.

What memory cannot do

Understanding the limits of OpenClaw memory prevents overbuilding the pipeline and setting wrong expectations.

Memory is not a conversation log. It stores extracted facts, not raw conversation history. If you need the exact words from a past conversation, memory recall will not help. Use session history (LCM or session archives) for that.

Memory does not replace context. For a task that depends on recent decisions in the current session, the agent’s active context window is more reliable than memory recall. Memory is for facts that need to survive across sessions, not for keeping track of what happened 10 messages ago.

Memory recall is probabilistic, not exact. A fact stored as “the server IP is 46.62.191.46” may not surface on a query for “what is my VPS address” if the embedding distance between those phrases is not close enough. For high-stakes facts that must always be available, store them in a reference file that is loaded into context directly rather than relying on recall.

Review my current memory setup and identify any facts I am relying on memory recall for that would be better stored in a reference file loaded directly into context. These are facts that (1) are always relevant, (2) must be accurate every time, and (3) are short enough to fit in context without a large cost impact. Recommend which ones to move.

Memory after a server migration

When you move OpenClaw to a new server, the memory database needs to move with it. LanceDB stores data as a directory of files at the configured dbPath. Copying that directory to the same path on the new server restores all memories exactly as they were.

What fails silently after migration: if the embedding model on the new server is different from the one used to generate the stored vectors, recall will return poor results or nothing. The vectors in the database are tied to the model that generated them. Same model, same path, same results. Different model requires re-embedding.
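A minimal migration sketch, assuming the LanceDB directory can be copied byte-for-byte (it can, per the paragraph above) and that you want the embedding model recorded alongside it so the new server can be checked against it. The MIGRATION.json manifest is my own convention, not part of LanceDB:

```python
import json
import shutil
from pathlib import Path

def migrate_memory(db_path: str, dest: str, embedding_model: str):
    # Copy the LanceDB data directory exactly, then write a manifest
    # recording which embedding model produced the stored vectors.
    shutil.copytree(db_path, dest)
    manifest = {"embedding_model": embedding_model, "source": db_path}
    Path(dest, "MIGRATION.json").write_text(json.dumps(manifest, indent=2))
```

On the new server, compare the manifest's embedding_model against the configured one before trusting recall results.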

I am migrating to a new server. What do I need to transfer to preserve my memories? Tell me: the database path, the embedding model currently in use, any other memory-related config. Then give me a migration checklist that ensures memory works identically on the new server.

Does the quality of the extraction LLM matter for what gets remembered?

Yes, noticeably. A weaker extraction model (llama3.1:8b, for example) tends to capture surface-level facts and miss nuanced preferences or implicit decisions. A stronger model (phi4:latest, deepseek-chat) captures more of the context around a fact, which improves how well it surfaces on future recalls. The tradeoff is extraction time and cost. For most personal deployments, phi4:latest via Ollama is the right balance: strong enough to capture nuance, local and free, and fast enough with a 90-second timeout to handle complex conversation turns.

Monthly memory review routine

Memory quality degrades over time without maintenance. A 15-minute monthly review keeps the store accurate, prevents recall noise, and ensures the pipeline is still functioning correctly after any plugin updates or config changes.

Run the monthly memory review: (1) Run memory_stats to see total count and scope breakdown. (2) Run memory_list with limit=50 and look for: outdated facts, test entries, duplicate entries, and entries that conflict with current state. (3) Store a test entry and immediately recall it to verify the pipeline is still working. (4) Check the timeoutMs setting is still 90000 (plugin updates sometimes revert it). Report findings and ask me which entries to delete.

Set this up as a monthly cron job using a local model so it costs nothing to run. The output lands in your Telegram or Discord as a brief summary with a list of candidates for deletion. You confirm which to remove, or ignore the report if everything looks clean.

Create a monthly cron job that runs on the first of each month using ollama/phi4:latest. It should: check memory_stats, store and recall a test entry to verify the pipeline, and send me a Telegram message with the total memory count, any scope anomalies, and whether the pipeline test passed or failed.

Setup sequence summary

The right order prevents spending time debugging a layer that looks broken when the real problem is a layer below it. Plugin conflicts produce the same symptoms as timeout failures. Scope mismatches look identical to embedding model failures. Working through the layers in sequence from bottom to top eliminates false diagnoses.

  1. Disable conflicting memory plugins (keep only one active)
  2. Configure extraction LLM and set timeoutMs to 90000
  3. Configure embedding model (nomic-embed-text if local Ollama is available)
  4. Set default scope to agent:main
  5. Run pipeline verification test (store, recall, confirm match)
  6. Enable autoCapture and autoRecall after verification passes
  7. Configure hybrid retrieval weights
  8. Set up monthly review cron

Frequently asked questions

Is memory built into OpenClaw or does it require a plugin?

Memory requires a plugin. OpenClaw core ships with memory tool stubs (the tool names appear in the tool list) but without a memory plugin installed and configured, those tools do nothing. The most common symptom is that memory_store returns success and memory_recall returns empty results: the tool call succeeded but nothing was actually stored because there is no backend. Install and configure a memory plugin before assuming memory is operational.

How many memories can I store before performance degrades?

With nomic-embed-text and a LanceDB backend, performance stays acceptable up to approximately 10,000-50,000 entries on a typical VPS. Beyond that, query latency increases noticeably (from under 100ms to several seconds). Most personal deployments never reach this limit with normal use. If you run autoCapture on every message across many active sessions, you can accumulate entries faster. The practical recommendation: run a quarterly memory audit to prune outdated entries and keep the total count under 5,000 for responsive recall.

Do memories persist across sessions automatically?

Yes, if the plugin and LanceDB backend are configured correctly. Memories are stored on disk at the configured dbPath (typically /home/node/.openclaw/memory.db). They persist across session restarts, gateway restarts, and server reboots as long as that database is not deleted. autoRecall (if enabled) injects relevant memories at the start of each new session automatically. If memories are not persisting across sessions, the most common cause is that the database path is set to a temporary directory that gets cleared on restart.

How do I back up my memories before a migration?

The LanceDB database is a directory of files at your configured dbPath. The simplest backup is a copy of that entire directory. For a more portable export, ask your agent to list all memories with memory_list, export them to a JSON or markdown file, and commit that file to your workspace git repository. The full memory backup guide covers both approaches with agent commands.
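The portable-export path can be sketched as a plain JSON round trip. The field names in the entry dicts are assumptions about what a memory listing returns, not a documented schema:

```python
import json

def export_memories(entries, path):
    # entries: a list of dicts as returned by a memory listing,
    # e.g. {"id": ..., "text": ..., "scope": ...} (field names assumed).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)

def import_memories(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The JSON export is model-independent: after a restore, re-storing each entry through memory_store re-embeds it with whatever model is currently configured, which sidesteps the vector-compatibility problem entirely.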

What does memory actually cost per month to run?

With local embeddings (nomic-embed-text) and a local extraction model (phi4:latest via Ollama), memory operations cost nothing in API fees; the only cost is server resources. nomic-embed-text uses approximately 274MB of RAM, and phi4:latest (14B quantized) uses approximately 8-10GB when loaded, so on a server with 8GB of RAM it will not fit alongside a running OpenClaw instance plus operating system overhead. If your server has less than 10GB of RAM available for Ollama, keep nomic-embed-text locally for embeddings and route extraction to a cheap API model like deepseek/deepseek-chat (approximately $0.14 per million input tokens). With moderate autoCapture usage, that runs under $1 per month.

My agent has no memory when I start a new session. Why?

Either autoRecall is disabled, the plugin is not loaded at session start, or the recall at session start is querying the wrong scope. A fourth possibility: the gateway restarted between sessions and the Ollama embedding service was not running yet when the first autoRecall fired. Ollama with OLLAMA_KEEP_ALIVE=-1 stays loaded permanently and prevents this. Without that setting, models unload after a period of inactivity and take 5-20 seconds to reload on the next request. If autoRecall fires before the model is ready, it times out silently. Check three things: (1) is autoRecall set to true in the plugin config, (2) is the plugin listed in the startup sequence in your SOUL.md or AGENTS.md, and (3) when you run memory_recall manually in a new session with scope="agent:main", do results appear? If manual recall works but autoRecall does not, the plugin's autoRecall feature is disabled or broken in your version.

Complete fix

Ultra Memory Claw

The complete memory setup guide. Every layer of the stack, in order. Drop it into your agent and it installs the right plugin, sets the config, patches the extraction timeout, and verifies the full pipeline is running. Your agent remembers from that point on.

Get it for $37 →


Keep Reading:

Ultra Memory Claw: autoCapture and autoRecall are both enabled but my agent still forgets everything. Why they conflict, which plugin version fixed it, and how to verify the fix is actually in effect.

Ultra Memory Claw: How to choose the right embedding model for OpenClaw memory. nomic-embed-text vs. Jina vs. OpenAI. Tradeoffs in quality, cost, and what breaks when you switch.

Ultra Memory Claw: My OpenClaw agent keeps mixing up memories from different sessions. Scope bleed explained. How to find contaminated scopes, clean them, and prevent it from happening again.