Memory databases accumulate noise. After months of active use, your OpenClaw instance may hold thousands of memories, many of them outdated, duplicated, or irrelevant. Deleting them one by one is impractical. But bulk deletion carries its own risk: delete too broadly and you wipe context that shapes how your agent behaves. This article covers how to audit what you have, build a targeted deletion list, execute it safely with verification checkpoints, and prevent the same noise from accumulating again. The whole process runs through your agent: no direct database access, no scripts, no terminal required.
TL;DR
- Audit first: Get a full breakdown by category and importance score before deleting anything.
- Delete by filter: Use category, importance threshold, or keyword to target deletions precisely rather than deleting by ID.
- Backup before bulk delete: A database copy takes under a minute and is your only recovery path if you delete something you needed.
Why memories accumulate and what goes wrong
OpenClaw memory with autoCapture enabled extracts memories from every conversation. Most extractions are useful. Some are not. The common categories of noise that build up over time:
- Duplicates. The same fact stored multiple times with slightly different phrasing. “User prefers direct answers” and “Ghost prefers short responses without filler” are the same preference stored twice. Duplicates dilute recall by returning the same signal multiple times, which can push genuinely different memories out of the top results.
- Stale facts. Old server IPs, old API keys, old project names, old role titles. These were accurate when stored and are now wrong. Stale facts cause the agent to confidently apply outdated context.
- Low-signal extractions. The extraction model pulls out facts that are technically true but not useful. “User asked about the weather today” is not a memory worth keeping. These have low importance scores but still consume space and appear in recall results.
- Conflicting memories. An update to a preference or decision stored as a new memory rather than replacing the old one. The old version is still in the database and recall returns both, leaving the agent to guess which one is current.
None of these problems cause hard failures. They degrade recall quality gradually, the way noise degrades a signal. The effect is subtle: the agent gives responses that feel slightly off, applies outdated preferences, or returns too-generic recall results. A memory audit and cleanup fixes this.
Step 1: Back up before you delete anything
Bulk deletion is irreversible. There is no undo. Back up the database before you start.
Stop the OpenClaw gateway, copy my memory database directory to /tmp/memory-backup-pre-cleanup-$(date +%Y%m%d), then restart the gateway. Confirm the backup size and that it matches the original directory size.
If you delete something you needed and want to recover it, the backup is your only path. The full restore procedure is covered in the memory backup article. For a targeted recovery of one or two memories, you can start the old database in parallel on a different port, recall the specific memories, and re-store them in the live database.
Step 2: Audit your memory database
Run a full audit before deciding what to delete. You need to know what you are working with.
Run a full memory audit. Tell me: (1) total memory count, (2) breakdown by category, (3) breakdown by importance score range (0.0-0.3, 0.3-0.6, 0.6-0.8, 0.8-1.0), (4) the 10 oldest memories with their text and dates, (5) any memories that appear to be near-duplicates of each other. List everything.
The importance score distribution tells you a lot. If 60% of your memories are below 0.4, the extraction model is being too liberal and you have a significant noise problem. If almost everything is above 0.7, either the extraction is well-tuned or the model is not distinguishing between important and unimportant facts.
The oldest memories audit tells you whether stale facts are a problem. Memories from 6 months ago about a project you finished, a role you left, or a server you decommissioned are candidates for deletion.
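Conceptually, the audit the agent runs reduces to a grouping pass over the memory records. A sketch, assuming a hypothetical record shape with `category` and `importance` fields (the real plugin's schema may differ):

```python
from collections import Counter

# Importance bands matching the audit prompt above.
BANDS = [(0.0, 0.3), (0.3, 0.6), (0.6, 0.8), (0.8, 1.0)]

def audit(memories: list[dict]) -> dict:
    """Summarize total count, category breakdown, and importance distribution."""
    by_band: Counter = Counter()
    for m in memories:
        for lo, hi in BANDS:
            # The top band is inclusive of 1.0 so a perfect score is counted.
            if lo <= m["importance"] < hi or (hi == 1.0 and m["importance"] == 1.0):
                by_band[f"{lo}-{hi}"] += 1
                break
    return {
        "total": len(memories),
        "by_category": dict(Counter(m["category"] for m in memories)),
        "by_importance": dict(by_band),
    }
```

The distribution this produces is the same one used in the next step to pick a deletion threshold.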
Step 3: Identify deletion targets by category
Work through deletion targets category by category. Do not try to identify everything in one pass. Each category has its own deletion logic.
Low-importance memories
List all memories with an importance score below 0.3. For each one, show the text, category, importance score, and approximate age. Do not delete anything yet. I want to review them first.
Review this list carefully before acting. Low importance scores do not always mean the memory is worthless. The extraction model assigns importance based on how significant the fact seemed in context. Some genuinely important facts get low scores because they were mentioned casually rather than emphasized.
After reviewing, decide on a threshold. Common thresholds:
- Delete everything below 0.2: Safe for most setups. Only the lowest-signal extractions. Low risk of losing something important.
- Delete everything below 0.3: Moderate cleanup. May catch some edge-case facts that turn out to matter. Review the list before committing.
- Delete everything below 0.4: Aggressive cleanup. Likely removes some useful context. Only do this if recall quality is seriously degraded and you want a clean slate on the lower tier.
Stale facts
Search my memories for anything that looks like infrastructure details: server IPs, hostnames, port numbers, API endpoints, deployment paths. List each one with its text, ID, and approximate age. Flag any that look outdated based on what you know about my current setup.
Search my memories for API keys, tokens, credentials, or secrets. List each one. I need to review whether any of these should still be in memory or whether they should be removed.
Credentials in memory are a specific concern. The memory database is not encrypted. Anyone with access to the database file can read the stored text. If your memory database contains active API keys, rotate them and delete the memories. Storing credentials in memory is convenient but creates unnecessary exposure.
Duplicates and near-duplicates
Search my memories for the topic “user preferences and working style.” List every memory returned with its text and ID. Identify any that are duplicates or near-duplicates of each other. For each duplicate set, recommend which one to keep (the most specific or most recent) and which to delete.
Run this same search for your most common topic areas: project context, infrastructure details, communication preferences, tool configurations. Duplicates cluster around topics you discuss frequently because the extraction model captures the same fact every time it comes up in conversation.
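For intuition, here is what a first-pass duplicate check looks like using a lexical similarity ratio. This is only a crude proxy: "prefers direct answers" and "prefers short responses" are semantic duplicates a text ratio can miss, which is why the agent's semantic grouping is the primary tool. The record fields are hypothetical:

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicate_pairs(memories: list[dict], threshold: float = 0.75) -> list[tuple]:
    """Flag memory pairs whose wording is suspiciously similar (lexical only)."""
    pairs = []
    for a, b in combinations(memories, 2):
        ratio = SequenceMatcher(None, a["text"].lower(), b["text"].lower()).ratio()
        if ratio >= threshold:
            pairs.append((a["id"], b["id"], round(ratio, 2)))
    return pairs
```

Anything this flags is almost certainly a duplicate; what it misses, the agent's semantic pass should catch.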
Step 4: Execute deletions safely
Delete in batches by category, not all at once. This gives you checkpoints where you can stop and verify the results before proceeding.
Deleting by importance threshold
Delete all memories with an importance score below 0.25. Before deleting, show me the count of memories that will be affected. After I confirm, delete them and report the new total memory count.
The confirmation step before deletion is important. A count of 12 when you expected 200 means the threshold query is not working as expected. A count of 800 when you expected 50 means the threshold is set too broadly. Check the count before confirming.
Deleting specific memories by ID
Delete the memories with these IDs: [list IDs here]. Confirm each deletion and report the final count.
ID-based deletion is the most precise method. Use it for targeted removals: specific stale facts, specific duplicates, specific credentials. When you know exactly which memories to remove, ID-based deletion carries the least risk.
Deleting by keyword or topic
Search my memories for anything containing [old project name / old server name / old role title]. List all matches with their IDs and text. After I review, I will confirm which ones to delete.
Always list before deleting when using keyword-based targeting. A keyword may appear in memories you want to keep as well as the ones you want to remove. Review the list and identify the specific IDs to delete rather than deleting everything that matches the keyword.
Step 5: Verify after deletion
After each batch deletion, run a recall check to confirm the right things were removed and the right things were kept.
Run memory stats and give me the current total count and category breakdown. Then recall three things: (1) my most important working preferences, (2) the current primary model I use, (3) the name of the active project I am working on. Confirm each returned result looks correct.
The three-point recall check after each batch confirms the deletions did not wipe anything critical. If a recall returns empty or wrong results, stop the deletion process, check the backup, and restore any memories that were removed incorrectly before continuing.
Step 6: Rebuild the index after large deletions
LanceDB maintains index structures that make search fast. After deleting a large percentage of memories (more than 30% of the total), the index can become fragmented. Fragmented indexes do not cause errors but they make recall slower and may reduce result quality.
After bulk deletion, check whether the memory plugin supports an index rebuild or compaction operation. If it does, run it now. If not, report what the current index state looks like and whether recall performance appears to have changed.
For the memory-lancedb-pro plugin, index compaction runs automatically on a schedule. A manual trigger may be available depending on your version. If not, the index will compact on its own over the next few sessions as the plugin runs its background maintenance cycle.
Preventing noise from building up again
Bulk deletion is a one-time fix. Preventing the same buildup from recurring requires tuning the extraction settings so the signal-to-noise ratio stays acceptable without manual cleanup.
Raise the extraction importance threshold
Read my openclaw.json. What is the current minimum importance threshold for memory storage in my memory plugin config? If there is no minimum threshold set, what is the default? Show me how to set a minimum importance threshold of 0.35 so that low-signal extractions are discarded before storage.
Tune extraction categories
If most of your noise falls into specific categories (low-value “other” extractions, transient event memories that are not useful after the fact), configure the plugin to ignore those categories or reduce their storage priority.
Read my memory plugin config. Are there any settings that control which categories of memory are stored automatically? I want to stop storing memories in the “other” category unless the importance score is above 0.5. Show me the config change needed.
Schedule periodic audits
Set up a monthly cron job that runs on the first of each month and does the following: (1) Reports my total memory count and category breakdown to me via Telegram. (2) Lists the 10 oldest memories so I can review for staleness. (3) Lists any memories with importance below 0.3 that have not been accessed in over 30 days. Do not delete anything. Just send me the report so I can review and decide. Show me the cron configuration before creating it.
A monthly report rather than automated deletion keeps you in the loop. Automated deletion is tempting but risks removing context that looks low-value by metric but is genuinely important. Manual review after a report is slower but safer, especially for preference and decision memories where the importance score may not reflect how much the agent actually uses that context.
What to do if you deleted something you needed
If you deleted a memory and need it back, you have two options depending on whether you made a backup.
Recovery from backup
If you took the backup in Step 1, restore specific memories from it rather than restoring the whole database:
- Stop the gateway and temporarily rename your current database directory.
- Rename the backup directory to the database path and start the gateway.
- Recall the specific memory you need.
- Stop the gateway again, restore the current database, and restart.
- Re-store the recovered memory.
This is a multi-step process, but it is cleaner than a full database rollback, which would discard any new memories stored after the cleanup.
Recovery without backup
If you did not make a backup, check whether the memory appears in any conversation history, daily memory markdown files in your workspace, or git commit history for your workspace. Search your workspace git log for the memory text:
Search my workspace git history for any mention of [the text from the deleted memory]. Check the last 90 days of commits. Also search my daily memory markdown files in memory/ for the same text. Report what you find.
If the memory text appears in a git commit or markdown file, re-store it manually. If it does not appear anywhere, it is gone. This is why the backup step exists.
Common mistakes during bulk deletion
Deleting by category without reviewing the contents first
The “other” category sounds like it should be safe to bulk delete. It is not. Many important memories end up in “other” because the extraction model could not confidently assign a more specific category. Always list the contents of a category before deleting everything in it.
Using a search query as the deletion target directly
Asking the agent to “delete all memories about the old server” is less precise than asking it to list memories about the old server, reviewing the list, identifying specific IDs, and then deleting by ID. Natural language queries for deletion targets can match memories you did not intend to remove. List first, delete by ID.
Not verifying after each batch
Running a single recall check after completing all deletions misses problems that appeared mid-way through the process. Verify after each batch. A recall check takes 30 seconds and catches accidental over-deletion before the next batch compounds the damage.
Deleting preference and decision memories too aggressively
Preference and decision memories are the most valuable category. They are also the most likely to appear as duplicates because preferences come up repeatedly in conversation. When you find preference duplicates, keep the most specific and most recent version and delete the others. Do not delete all copies of a preference because they appear redundant. Having the preference stored at all is what matters.
Understanding importance scores before you act
Importance scores are assigned by the extraction LLM at the time a memory is created. They range from 0 to 1. The score reflects how significant the extraction model judged the fact to be at the time of extraction, based on context. A few things to understand about them before using them as a deletion criterion:
- They are not updated after creation. A memory stored at importance 0.3 six months ago may now be one of the most important facts in your database because circumstances changed. The score reflects the extraction context, not the current relevance.
- They reflect emphasis, not value. Facts mentioned casually get lower scores than facts stated emphatically. “By the way, my API rate limit is 1000 per day” may get a 0.2 score even though it is operationally critical.
- They vary by extraction model. A setup using llama3.1:8b for extraction scores differently from one using deepseek-chat. If you switched extraction models partway through your setup, older memories may have systematically different score distributions than newer ones.
The practical implication: use importance scores as a first-pass filter to identify candidates for review, not as an automatic deletion trigger. Always look at what is actually in the low-importance set before deleting.
Look at my memories with importance scores between 0.2 and 0.35. List the 20 most recently stored ones with their text and category. I want to assess whether the low scores reflect genuinely low-value content or whether my extraction model is systematically underscoring certain fact types.
Category-by-category deletion guide
Different memory categories have different deletion risk profiles. Here is how to approach each one:
Preference memories
The highest value category. Deletions here directly affect agent behavior. When cleaning up preference memories, only delete duplicates, never the last copy of a preference. If you find five near-identical “Ghost prefers direct answers” memories, keep the most specific and most recently stored one. Delete the others. Running recall after every preference deletion is especially important here.
Fact memories
High value, but stale facts are common. Infrastructure facts (IP addresses, API endpoints, server names) go stale when environments change. Project facts go stale when projects complete. Role facts go stale when employment changes. Fact memories are safe to delete when you can confirm they no longer apply. When in doubt, check whether the fact is still true before deleting.
Decision memories
High value, often sparse. Decisions represent choices the agent should continue to respect. “We decided to use DeepSeek as the primary model” shapes behavior every session. Delete decision memories only when the decision has been reversed and a new decision memory has been stored in its place. Never delete a decision memory without replacing it.
Entity memories
Variable value. Entity memories capture information about people, projects, tools, and systems. Old entity memories about systems you no longer use, people you no longer work with, or projects that completed are safe to delete. Current entity memories (your infrastructure, your active projects, your collaborators) should be kept and kept accurate.
Reflection memories
Low deletion priority. Reflection memories capture observations about patterns and behavior. They are useful for long-term continuity even when they feel generic. “The agent tends to over-explain when the user is frustrated” is a reflection that may be dormant for weeks but valuable when the situation arises. Delete reflections only when they are clearly wrong or superseded by a better reflection.
Other memories
Highest noise proportion. The “other” category catches everything the extraction model could not confidently classify. It has the most duplicates and the lowest average value. A conservative starting point: delete all “other” memories below 0.4 importance after reviewing a sample of 20 to 30 to confirm the pattern. Keep any “other” memories above 0.6 regardless of category mismatch, since the high importance score suggests the extraction model found them significant even if it could not classify them.
Systematic deduplication
Duplicates are the most common form of memory bloat on active autoCapture setups. They are also the safest to delete: removing one copy of a fact you have stored three times does not lose information; it consolidates it.
The challenge is identifying duplicates reliably. Near-duplicates are not exact matches; they are semantically equivalent facts stored with different phrasing. The agent can identify them better than a text search can:
List all my memories in the “preference” category. Group them by semantic topic: working style, communication preferences, technical preferences, tool preferences, schedule/timing. Within each group, identify any memories that express the same underlying preference. For each duplicate set, recommend the one to keep (most specific wording, most recent if otherwise equal) and list the IDs to delete. Format as a table: Keep [ID] | Delete [IDs] | Topic.
Run this same analysis for “fact” and “decision” categories. Preferences and facts are where duplicates cluster most densely. The table output from the agent makes it easy to copy the delete IDs directly into the deletion command without re-reading each one.
After the agent provides the deduplication table, validate a random sample before executing:
Show me the full text of the memories you recommended keeping for the first 5 duplicate sets in the table. I want to confirm they contain the most complete information before I delete the others.
Once you have confirmed a sample looks right, delete the identified duplicates by ID. The delete-by-ID approach ensures you are removing exactly what you reviewed, not anything else.
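The keep/delete recommendation the agent produces follows a simple rule: most specific wording first, most recent on a tie. A sketch of that rule, using text length as a rough proxy for specificity (the `created` field can be any comparable timestamp):

```python
def choose_keeper(duplicate_set: list[dict]) -> dict:
    """Pick the memory to keep: most specific wording (text length as a
    rough proxy), most recently stored on a tie."""
    return max(duplicate_set, key=lambda m: (len(m["text"]), m["created"]))

def dedup_plan(duplicate_sets: list[list[dict]]) -> list[dict]:
    """Turn duplicate sets into the Keep/Delete table described above."""
    plan = []
    for dupes in duplicate_sets:
        keeper = choose_keeper(dupes)
        plan.append({"keep": keeper["id"],
                     "delete": [m["id"] for m in dupes if m["id"] != keeper["id"]]})
    return plan
```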
Large-scale cleanup: when to consider a full reset
For databases with more than 10,000 memories, selective cleanup may be less practical than a structured reset: export the memories worth keeping, wipe the database, and re-import the curated set. This is the right approach when:
- More than 50% of memories are low-quality or stale and selective deletion would take hours
- The extraction model was misconfigured for an extended period and produced a large volume of unusable memories
- A scope misconfiguration stored memories under the wrong scope and recall is mixing contexts from different projects
- The database has performance problems and compaction alone has not resolved them
I am considering a full memory reset. Before we proceed: (1) Export all memories with importance above 0.6 to /tmp/memories-high-value.json. (2) Export all preference and decision memories regardless of importance to /tmp/memories-preferences-decisions.json. (3) Report the total count in each export file and estimate how long a full re-import would take. Do not delete anything yet.
The two-file export captures the highest-value memories and the behaviorally critical memories separately. After a full reset, import the preferences and decisions first (they shape agent behavior most directly), then the high-value facts, then stop. Do not re-import everything. The point of the reset is to start with a curated set, not to reproduce the bloated database with an extra step.
Cleaning up memories stored under the wrong scope
Scopes separate memories by context. A common problem after config changes: memories that were stored under the wrong scope because the plugin defaulted to a different scope than expected, or because a scope misconfiguration ran for an extended period before being caught.
What scopes do I have memories stored under? List every scope and the count of memories in each. Are there any memories stored under a scope that looks incorrect for my setup (e.g., “default” instead of “agent:main”, or a test scope from an old session)?
Memories stored under the wrong scope are not retrieved by recall queries that use the correct scope. They sit in the database taking up space and never appearing in results. They are safe to delete if you confirm they are not the only copies of important facts.
To move memories from a wrong scope to the right one rather than deleting them:
List all memories stored under the scope [wrong scope name]. For each memory, show the text, category, and importance. After I review, I want to re-store the important ones under the correct scope (agent:main) and then delete the originals from the wrong scope.
This is a manual process, but it is the right approach for scope migration. Automated scope reassignment is not supported in current OpenClaw memory plugin versions. For a large number of mis-scoped memories, run the re-store in batches to avoid hitting rate limits on the extraction model.
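The batching pattern for the re-store looks like this. `store_fn` stands in for whatever call re-stores one memory under the correct scope; it, the batch size, and the pause are all assumptions to tune against your actual rate limits:

```python
import time

def migrate_in_batches(memories: list[dict], store_fn, batch_size: int = 20,
                       pause_seconds: float = 2.0) -> int:
    """Re-store memories in small batches with a pause between batches,
    so the extraction model's rate limit is not hit. store_fn is a
    placeholder for the real re-store call."""
    stored = 0
    for i in range(0, len(memories), batch_size):
        for m in memories[i:i + batch_size]:
            store_fn(m)
            stored += 1
        if i + batch_size < len(memories):
            time.sleep(pause_seconds)
    return stored
```

Only delete the originals from the wrong scope after the batch migration completes and a recall check against the correct scope confirms the re-stored copies are retrievable.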
Measuring the impact of cleanup
After a bulk cleanup, recall quality should improve. The metric is not just memory count: it is whether the right memories surface for the right queries. Run a before-and-after comparison using a fixed set of recall queries.
Before the cleanup, run these queries and note what they return:
Run recall on these four queries and show me the top 3 results for each with their text and importance score: (1) “how I prefer to work and communicate”, (2) “current infrastructure and server setup”, (3) “active projects and what is in progress”, (4) “decisions made about tools and models”. Save these results so I can compare them after cleanup.
After the cleanup, run the same four queries. Compare the results:
- More relevant top results: Cleanup improved recall. The noise that was diluting the signal has been removed.
- Similar results, higher average importance: The same memories are surfacing but the low-quality duplicates are gone. Cleanup worked as intended.
- Missing results for a query that used to return results: A deletion removed a load-bearing memory. Check which batch removed it and restore from backup if needed.
- No change: The deleted memories were not being recalled in practice. The cleanup was cosmetic rather than functional, but it still reduces database size and keeps future recalls faster.
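The before/after comparison is a set diff per query. A sketch, where each side is a mapping from query text to the list of top-result memory IDs you noted down:

```python
def compare_recall(before: dict, after: dict) -> dict:
    """Diff before/after top-result IDs per query, surfacing the outcomes
    above: kept results, gone results (possibly load-bearing), new results."""
    report = {}
    for query, ids in before.items():
        b, a = set(ids), set(after.get(query, []))
        report[query] = {"kept": sorted(b & a),
                         "gone": sorted(b - a),
                         "new": sorted(a - b)}
    return report
```

A non-empty "gone" list for a query is the signal to check which deletion batch removed those IDs before moving on.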
Why not to automate bulk deletion
The most common question after reading this article is: can I automate this and just run a cleanup cron job that deletes low-importance memories automatically every month?
The short answer is: possible, but not recommended without careful safeguards.
The problem with automated bulk deletion is that the importance scores used as deletion criteria are static snapshots from the extraction model at creation time. They do not reflect current relevance. An automated job that deletes all memories below 0.3 importance will, over time, delete some memories that were genuinely important but captured in contexts where they were stated casually rather than emphatically.
The safer automation is the monthly report cron described in the prevention section. Report, review manually, delete in the same session. The review takes 15 minutes. The cost of a bad automated deletion is hours of trying to reconstruct lost context, assuming you even notice it was deleted. The asymmetry argues for manual review.
If you do want to automate deletion, add at least two safeguards:
Set up an automated cleanup that runs monthly with these constraints: (1) Never delete preference or decision memories regardless of importance score. (2) Never delete any memory stored in the last 30 days regardless of importance score. (3) Only delete memories in the “other” category with importance below 0.2 that have not been accessed in 60 days. (4) Take a database backup before running the deletion. (5) Send a Telegram report with the count of memories deleted and a sample of 5 randomly selected deleted memories for spot-checking. Show me the complete cron and cleanup configuration before creating anything.
These constraints protect the highest-value memories (preferences, decisions) and the newest memories (which may still be developing in importance) while still removing the lowest-signal noise automatically. A memory database with 300 high-quality memories is more useful than one with 5,000 mixed-quality memories, and an automated deletion job with the right constraints gets you there without manual intervention. The spot-check report gives you visibility into what the automation is removing without requiring a full manual review every month.
Common questions
How do I know if a memory is safe to delete without reviewing it?
You do not. Every bulk deletion should start with a sample review. Even low-importance memories (score below 0.4) sometimes contain useful facts that were mis-scored at capture time. The importance threshold filter is a starting point for finding candidates, not a guarantee of safety. Sample at least 10 memories from any batch before deleting.
What happens to my agent’s behavior after I delete a large batch of memories?
Recall results will be more precise but may miss some context the agent previously relied on. Monitor the first few sessions after a large cleanup for any unexpected gaps: questions the agent handles less well, context it seems to have lost, preferences it no longer applies. Most cleanups do not cause noticeable behavior changes, but large deletions of preference memories occasionally do.
I accidentally deleted memories I needed. Is there any recovery option?
If you backed up or exported before deletion, restore the specific memories from that copy by re-storing each one. If you did not, check whether your server has a LanceDB snapshot or file system backup from before the deletion. If neither is available, the memories are gone. This is why the backup step in this article is non-negotiable before any bulk deletion.
How often should I run a memory cleanup?
When memory count exceeds 500-800 or when recall quality starts declining. For most operators running autoCapture, a cleanup every two to three months keeps the database lean. If autoCapture is aggressive and writes many memories per session, monthly cleanups may be needed.
Is there a safe maximum memory count?
Not a hard limit, but recall quality tends to degrade noticeably past 1,000 memories in a single scope. At that point, either implement proper scoping to split the pool, or run a cleanup to reduce count. Memory databases can hold tens of thousands of entries, but recall precision at that scale depends heavily on scope structure and embedding model quality.
