Your agent stored a memory six months ago about which model you were using, which project was active, or what your preferences were. That memory is still there. The agent is still acting on it, and it is wrong. OpenClaw’s memory system does not have built-in expiry. Memories persist until you delete them. This article covers how to audit for stale memories, how to build a refresh workflow, and how to think about memory lifecycle so your agent stays accurate as your setup evolves.
TL;DR
OpenClaw has no built-in memory expiry. Stale memories accumulate silently and cause the agent to act on outdated context. The fix is periodic audits using memory_list filtered by category, targeted deletion of outdated entries, and a scheduled refresh cron that prompts the agent to verify and update key facts. This takes about 10 minutes to set up and runs automatically after that.
How the agent actually uses stale memories in practice
Understanding what staleness does requires understanding how recall works. When you send a message, the memory plugin runs a hybrid search against the stored database: a vector similarity search (semantic) plus a keyword search (BM25). The top results are injected into the context before the model sees your message. The model treats injected memories as authoritative facts about your setup and preferences.
This means a stale memory does not sit there passively. It actively shapes every response that triggers a recall. If you stored a memory saying your active project is “the Q1 launch” and it is now Q3, every conversation that touches project context will have the agent referencing Q1 work as if it is current. The model has no mechanism to doubt injected memories. It treats them the same way it treats everything else in its context.
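The article does not document OpenClaw's exact ranking math, but the mechanics matter for understanding why staleness is invisible to recall. Here is a minimal sketch of how a hybrid recall like this could combine the two scores, assuming normalized inputs and illustrative weights (none of these numbers are OpenClaw's actual values):

```python
def hybrid_score(vector_score, bm25_score, importance,
                 vector_weight=0.6, keyword_weight=0.4):
    """Blend semantic and keyword relevance, then weight by importance.

    All inputs are assumed normalized to [0, 1]; the weights are
    illustrative, not OpenClaw's actual values.
    """
    relevance = vector_weight * vector_score + keyword_weight * bm25_score
    return relevance * importance

# A stale memory with high importance can outrank a fresh, accurate one:
stale = hybrid_score(vector_score=0.7, bm25_score=0.6, importance=0.9)
fresh = hybrid_score(vector_score=0.8, bm25_score=0.7, importance=0.4)
assert stale > fresh  # age appears nowhere in the ranking
```

The point of the sketch: importance multiplies relevance, and age appears nowhere. A stale memory with a high importance score can outrank a fresh, accurate one on every query.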
The specific damage stale memories cause depends on the category and the importance score of the stale entry:
- Stale preferences: The agent uses the wrong model, wrong output format, or wrong communication style. The behavior is off but not obviously wrong, so it persists for weeks before you notice.
- Stale infrastructure facts: The agent tries to reference a path, IP, or configuration that no longer exists. Commands fail, lookups return nothing, and the agent attempts debugging based on a picture of the server that is months out of date.
- Stale project context: The agent references completed projects as active, tries to continue work that is done, or applies context from a previous project to a new one with a similar name.
- Stale entity information: The agent references the wrong version of a tool, applies a deprecated configuration pattern, or avoids a feature it believes is not yet available that has since shipped.
I want to understand what stale memories might be affecting my current sessions. Recall all memories related to my current project context, active configurations, and preferences. For each recalled memory, tell me when it was stored and whether it matches my current setup as you understand it from our recent conversations.
Why memories go stale and which categories are most at risk
Memory staleness is not a bug. It is a natural consequence of a system that stores information persistently without an expiry mechanism. The problem is that stale memories are indistinguishable from accurate ones from the agent’s perspective. When the agent recalls that you prefer DeepSeek Chat, it does not know whether that preference was recorded last week or eight months ago before you switched providers.
The categories most prone to going stale are the ones that change most often:
- Preferences: Model choices, output format preferences, communication style, verbosity settings. These change when your use case changes.
- Facts about your setup: Which server you are on, which plugins are enabled, which API keys are active. These change with migrations, updates, and provider switches.
- Project context: Which project is active, what phase it is in, what the current priorities are. These change constantly.
- Entity information: Information about tools, services, or people. Versions change, teams change, products pivot.
The categories least prone to going stale are facts with no time dimension: who you are, what your timezone is, constants about your workflow that have not changed in years. These rarely need refreshing.
List all my stored memories grouped by category. For each category, tell me how many memories are there and what the oldest entry in each category is. I want to understand the age distribution of my memory database.
Auditing for stale memories
The audit is the first step. Before setting up any refresh workflow, you need to know what is actually stale. The principle is the same as debugging memory failures on a VPS install: those have several common causes that each require a different fix, and changing things without knowing which cause you have leads to config drift and makes the problem harder to diagnose. The same logic applies here: changing or deleting memories without knowing which ones are actually wrong makes the database less reliable, not more.
List my stored memories with category=preference. Show me all of them, including when each was stored. For each one, tell me whether the information in it is likely still accurate based on what you know about my current setup. Flag anything that looks outdated.
List my stored memories with category=fact. Show me all of them with their stored dates. Flag any facts about software versions, server configuration, API keys, or tool choices that may have changed since the memory was stored.
After running both, you will have a list of candidates. The goal is not to delete everything old, it is to delete or update anything that is both old and likely wrong. Age alone is not enough: a preference that has not changed in two years is fine. A fact about your current model default that was recorded before you switched providers needs to go.
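If you can get memories out as a JSON list (field names below are hypothetical; adjust to whatever your export actually contains), the "old and in a volatile category" filter is easy to script as a pre-audit pass:

```python
from datetime import datetime, timedelta, timezone

# Volatile categories and how old an entry can get before review.
# Thresholds are illustrative; tune them to your own churn rate.
REVIEW_AFTER = {
    "preference": timedelta(days=90),
    "fact": timedelta(days=30),
    "entity": timedelta(days=180),
}

def audit_candidates(memories, now=None):
    """Return entries past their category's review threshold.

    Each memory is a dict with hypothetical keys:
    text, category, importance, stored_at (ISO 8601).
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for m in memories:
        threshold = REVIEW_AFTER.get(m["category"])
        if threshold is None:
            continue  # stable categories skip the age check
        stored = datetime.fromisoformat(m["stored_at"])
        if now - stored > threshold:
            flagged.append(m)
    # Review high-importance stale entries first: they do the most damage.
    return sorted(flagged, key=lambda m: m["importance"], reverse=True)
```

Age alone only nominates candidates; the accuracy check on each flagged entry is still yours to make.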
What to look for in the audit
- Contradictions: Two memories that say opposite things. The agent is acting on both, which means it is sometimes acting on the wrong one.
- Version references: Any memory mentioning a specific software version. Versions change. If the memory says “using OpenClaw v1.2” and you are on v2.1, the memory is wrong.
- Project status: Memories about what project is active or what phase it is in. These are almost always stale within a month.
- Infrastructure facts: Server IPs, database paths, plugin versions. These change with migrations and updates.
- Resolved issues: Memories about workarounds for bugs or limitations that have since been fixed. The agent may still be applying the workaround after the underlying issue is gone.
Identifying and resolving memory contradictions
A specific form of staleness is contradiction: two memories that say opposite things are both present in the database. The agent may surface either one depending on which search query hits more strongly. This produces inconsistent behavior that is difficult to diagnose because the agent is not wrong consistently, it is wrong unpredictably.
Search my memory database for any contradictions. Look for pairs of memories that make opposing claims about the same topic: model preferences, server configuration, project status, tool choices. List every contradiction you find with both memory texts and their stored dates.
When you find a contradiction, the resolution is straightforward: delete both and store one accurate replacement. Do not try to determine which one is correct from the memory text alone. The more recently stored one is usually more accurate, but not always. If you are not sure which is current, check the actual state of your system before deciding.
I found a contradiction between two memories about [topic]. Delete both. Then store a new memory with the accurate current state: [accurate description]. Category=[category], importance=[score].
Why contradictions are hard to catch without an audit
Recall queries retrieve the top matches, not all matches. If two contradictory memories both match a query, the one that scores higher gets injected. The other one may never surface in that query but will surface in a related one. The agent appears to have inconsistent beliefs because it does have inconsistent beliefs, and different queries trigger different ones. An audit that reads all memories in a category is the only reliable way to catch this.
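True contradiction detection needs semantics, which is why the prompt above delegates it to the agent. But a crude token-overlap heuristic can at least surface same-topic pairs for human review. A sketch, with the threshold purely illustrative:

```python
def topic_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets, a crude same-topic signal."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not (wa and wb):
        return 0.0
    return len(wa & wb) / len(wa | wb)

def contradiction_candidates(memories, min_overlap=0.3):
    """Pairs of same-category memories about the same topic.

    This cannot detect actual contradiction, only topical collision;
    a human (or the agent) still has to read each pair.
    """
    pairs = []
    for i, a in enumerate(memories):
        for b in memories[i + 1:]:
            if a["category"] != b["category"]:
                continue
            if topic_overlap(a["text"], b["text"]) >= min_overlap:
                pairs.append((a["text"], b["text"]))
    return pairs
```

Run this over an export before an audit and you get a shortlist of pairs to read, instead of hoping the right query surfaces both sides of a contradiction.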
Deleting stale memories
Once you have identified stale memories from the audit, deletion is straightforward. The memory_forget tool accepts a search query that targets the relevant entries.
Delete the memory about [specific outdated fact]. After deleting, confirm it is gone by searching for it again and showing me the result.
For a larger cleanup where multiple related memories are stale, target by category and theme rather than one at a time:
I want to clean up stale project context memories. List everything in category=other or category=fact that mentions project names or phases. For each one, tell me whether it reflects current reality. Then delete the ones that are outdated, one at a time, confirming deletion before moving to the next.
Do not bulk-delete an entire category
Deleting all memories in a category removes accurate ones along with stale ones. Work through the list and delete targeted entries. The time cost is small compared to the cost of re-establishing context that was actually correct.
Updating instead of deleting
Some memories are not wrong, they are just incomplete. The model preference memory might be correct but missing the reasoning behind it. The infrastructure fact might be mostly accurate but have one detail wrong. In these cases, delete and re-store rather than leaving a partial truth in place:
Delete the memory about my current model preference and store a new one: [accurate current preference with full context]. Use category=preference and importance=0.9.
Memory lifecycle planning by category
Different categories have different natural lifespans. Treating all memories as equally permanent is the root cause of most staleness problems. Explicitly thinking about how long each category stays valid helps you set appropriate importance scores and audit cadences.
Preferences (typical lifespan: 3 to 12 months)
Model preferences, output format preferences, verbosity, communication style. These change when your use case changes or when better options become available. Review quarterly. If you switch a major preference, update the memory immediately rather than waiting for the next audit. The cost of acting on a stale preference is high because it affects every interaction.
Facts about current setup (typical lifespan: 1 to 6 months)
Server configuration, plugin versions, active API keys, installed tools. These change with migrations and updates. Review after every significant infrastructure change. Store with a timestamp so the age is visible at a glance. The “as of [date]” pattern in the memory text is the cheapest form of expiry tracking available without building a full TTL system.
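The "as of [date]" pattern has a side benefit: it is machine-checkable. A sketch that extracts the date and computes the memory's age, assuming the text uses "as of March 2026" or "as of 2026-03" phrasing (the regex is illustrative, not part of any OpenClaw API):

```python
import re
from datetime import datetime, timezone

# Matches "as of March 2026" or "as of 2026-03" (pattern is illustrative).
AS_OF = re.compile(
    r"as of\s+(?:(\w+)\s+(\d{4})|(\d{4})-(\d{2}))", re.IGNORECASE)

MONTHS = {m.lower(): i for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

def as_of_age_days(text, now=None):
    """Days since the 'as of' date in a memory, or None if absent."""
    now = now or datetime.now(timezone.utc)
    m = AS_OF.search(text)
    if not m:
        return None
    if m.group(1):  # "as of March 2026" form
        month = MONTHS.get(m.group(1).lower())
        if month is None:
            return None
        year = int(m.group(2))
    else:  # "as of 2026-03" form
        year, month = int(m.group(3)), int(m.group(4))
    stored = datetime(year, month, 1, tzinfo=timezone.utc)
    return (now - stored).days
```

A memory with no "as of" note returns None, which is itself a useful audit signal: it tells you the entry has no visible age at all.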
Project context (typical lifespan: days to weeks)
Active project name, current phase, immediate priorities. These change constantly. Do not rely on memory for active project context. Maintain a live context file (like a .context-checkpoint.md) that the agent reads at session start instead of trying to recall project state from memory. Memory is for persistent facts, not for active session state.
Entity information (typical lifespan: 6 to 24 months)
Facts about tools, services, frameworks. These change with major version releases. Store with a version number in the text: “As of OpenClaw v2.1, the compaction threshold setting is at agents.defaults.contextTokens.” When a major version ships, search for memories referencing the old version and update them.
Reflections and decisions (typical lifespan: indefinite)
Why you made a particular architectural choice, what you learned from a failed approach, what you decided not to do and why. These do not go stale in the same way. The reasoning was correct at the time even if the decision has since changed. Keep these but add an “archived” note if the context has passed: “Archived 2026-06: decision no longer active, project complete.”
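The lifespans above condense into a simple review schedule. A sketch mirroring the categories in this section; the cadence values are the ones suggested above, not anything OpenClaw enforces:

```python
from datetime import datetime, timedelta

# Review cadence per category, mirroring the lifespans above.
# None means no scheduled review: project context belongs in a live
# context file, and reflections get archived rather than refreshed.
LIFECYCLE = {
    "preference": timedelta(days=90),   # review quarterly
    "fact": timedelta(days=30),         # review monthly, plus after changes
    "entity": timedelta(days=180),      # refresh on major version releases
    "project": None,
    "reflection": None,
}

def next_review(category, stored_at):
    """When a memory is next due for review, or None if it never expires."""
    cadence = LIFECYCLE.get(category)
    return stored_at + cadence if cadence else None

# Example: a fact stored on Jan 1 is due for review on Jan 31.
due = next_review("fact", datetime(2026, 1, 1))
```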
Setting up an automatic refresh workflow
Manual audits work but require you to remember to do them. A cron job that runs a refresh prompt on a schedule is more reliable. The refresh does not need to be sophisticated. A simple prompt that asks the agent to verify a set of key facts and flag anything that does not match current reality is enough.
Weekly preference review
Set up a weekly cron job (Sundays at 9am my timezone) that runs this prompt: “Review my stored preference memories. List all of them. For each one, check whether it still reflects how I actually work. Flag any that look potentially outdated based on our recent conversations. Do not delete anything automatically. Just produce a report and send it to me via Telegram.”
The Sunday report lands in Telegram. You read it in 2 minutes. If something is flagged, you ask the agent to update it. The total time cost per week is under 5 minutes and you stay current.
Weekly preference review cron: exact setup
Set up a cron job with this schedule: every Sunday at 9am America/New_York. Use model ollama/phi4:latest. The prompt should be: “Run memory_list with category=preference and scope=agent:main. For each memory returned, evaluate: (1) Is this preference likely still active based on our recent conversations? (2) Is the importance score appropriate for how foundational this preference is? (3) Is there a timestamp in the memory text, and if so, is the information likely still current? Produce a report with: memories that look potentially stale (flagged with reason), memories that look accurate and current (confirmed), and any apparent contradictions. Send the report to Ghost via Telegram.” Set sessionTarget=isolated.
Do not auto-delete in the cron
The audit cron should report, not act. Automatic deletion based on a cron prompt risks removing memories that the agent incorrectly flags as stale. Keep the deletion step human-triggered. The cron saves you the effort of running the audit manually. You still make the deletion decisions.
Monthly infrastructure fact audit cron
Set up a cron job with this schedule: first day of each month at 8am America/New_York. Use model ollama/phi4:latest. The prompt should be: “Run memory_list with category=fact and scope=agent:main. Also run memory_list with category=entity. For each memory returned, check: (1) Does it reference a software version, server address, or configuration value? If so, is that version, address, or value still current? (2) Does it have an ‘as of [date]’ timestamp? If the timestamp is more than 3 months ago, flag it for review. (3) Does it reference a project, tool, or service that may no longer be active? Produce a report grouped by: needs immediate review, likely stale, looks current. Send to Ghost via Telegram.” Set sessionTarget=isolated.
Cron model selection
Memory audit cron jobs do not need a powerful model. They are reading a list and comparing it to recent conversation context, a low-complexity task. Use a local model like ollama/phi4:latest or ollama/llama3.1:8b for these jobs to keep the cost at zero. Reserve API models for the actual updates if any are needed.
Triggering a refresh on infrastructure changes
The weekly and monthly crons catch drift over time. But the biggest source of stale memories is a specific event: a migration, an update, or a major config change. These events make multiple memories stale at once. The pattern is to run a targeted refresh immediately after any significant change rather than waiting for the next scheduled audit.
I just migrated to a new VPS. Review all my stored memories about infrastructure: server IPs, data directory paths, installed software and versions, plugin configurations, and any facts about the server environment. Tell me which ones need to be updated based on what you know about the new setup. Then update them one at a time, confirming each change before moving to the next.
The same prompt works for any significant change. Replace “migrated to a new VPS” with “updated OpenClaw”, “switched primary model”, “changed API providers”, or whatever the actual event was. The prompt structure is the same: review the relevant category, flag what needs updating, update one at a time.
If you run this prompt within the same session where the change was made, the agent has both the old context (from earlier in the session) and the new state (from the change you just made) available. That is the best possible moment to do the update, because the comparison is easy and nothing has been lost yet to context compaction.
Using importance scores to prioritize which memories to refresh first
Importance scores (0 to 1) affect how memories are weighted in recall. High-importance memories surface more readily and have more influence on the agent’s responses. This means a stale high-importance memory causes more damage than a stale low-importance one. During an audit, prioritize reviewing high-importance memories first.
List my stored memories sorted by importance score, highest first. Show me the top 20. For each one, tell me the importance score, the stored date, and whether the content looks current or potentially stale.
The high-importance memories that are also old are your highest-priority refresh targets. A memory with importance 0.95 that was stored 9 months ago and describes your current project is almost certainly wrong and is actively misleading your agent on every relevant query.
After a refresh, reconsider the importance score. A memory that was high-importance when the project was active may be low-importance historical context now. Adjust:
Update the importance score for the memory about [topic] from 0.9 to 0.3. It is historical context now, not active guidance.
Preventing staleness at the point of storage
The best time to think about memory lifecycle is when the memory is being stored, not six months later when you are trying to figure out why the agent is behaving strangely. Two habits prevent most of the staleness problem.
Include a timestamp in time-sensitive memories
When storing a fact that you know will change, include the date and context. Instead of “current primary model is DeepSeek Chat”, store “as of March 2026, switched to DeepSeek Chat from Claude Sonnet because of cost. DeepSeek Chat handles most tasks at roughly 10x lower cost per token.” The timestamp tells you immediately how old the information is when you review it, and it gives the agent context about when the fact was current, which prevents it from treating historical facts as present truth. The additional context also means the memory stays useful even after it goes stale: you can see what you switched from, when, and why, which helps you decide whether to update or archive it when the situation changes again.
Store a memory: “As of [today’s date], my primary model is [model name] and I am using [provider] as the main API. This reflects a switch from [previous setup].” Category=preference, importance=0.85.
Use lower importance for anything that changes frequently
Importance 0.9 is for stable, foundational facts. Things that change quarterly should be stored at 0.5 to 0.6. Things that change monthly should be stored at 0.3 to 0.4. This limits the damage when the memory goes stale, because lower-importance memories are weighted less in recall and have less influence on the agent’s responses. The maintenance cost of a low-importance stale memory is much smaller than a high-importance one.
A practical way to assign importance scores at storage time: ask yourself how much incorrect agent behavior this memory would cause if it were wrong. If the answer is “a lot, on every relevant task”, that is 0.8 or above. If the answer is “occasionally, on specific tasks”, that is 0.5 to 0.7. If the answer is “rarely, and it would be obvious when it happened”, that is 0.3 or below. Apply this test when storing new memories and you will naturally avoid the pattern of everything being 0.9 by default, which is the most common cause of high-impact staleness. Most memories should be in the 0.4 to 0.7 range. Reserve 0.85 or above for the handful of facts that are truly foundational to how you use the agent every day.
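The blast-radius test can be written down as a tiny scoring function, useful as a checklist when storing new memories. The bands come from the paragraph above; the function and argument names are hypothetical:

```python
def importance_from_blast_radius(frequency: str,
                                 obvious_when_wrong: bool) -> float:
    """Map the 'damage if wrong' test to an importance score.

    frequency: how often a wrong version of this memory would misdirect
    the agent: "every_task", "specific_tasks", or "rarely".
    The returned values mirror the bands in the text; tune to taste.
    """
    if frequency == "every_task":
        return 0.85   # foundational: wrong on every relevant task
    if frequency == "specific_tasks":
        return 0.6    # mid-band: occasional, task-specific damage
    # Rare impact: score even lower if the failure would be obvious.
    return 0.2 if obvious_when_wrong else 0.3
```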
When a full reset is the right call
There is a point of diminishing returns on selective cleanup. If the audit consistently returns 30 to 40 flagged memories, if contradictions keep reappearing after you clean them up, or if the agent’s behavior remains inconsistent despite targeted deletions, the database may have enough accumulated drift that a selective approach is slower than a reset.
A reset does not mean deleting everything permanently. Export the memories first:
Export all my stored memories to a file at workspace/memory-export-[today’s date].json. Include the full text, category, importance score, and stored date for each entry. Then tell me the total count and the breakdown by category.
Then clear the database, review the export manually, and re-store only the memories that are accurate and still relevant. This is faster than it sounds: most memory databases for active setups have 50 to 200 entries. A manual review takes 20 to 30 minutes. Work through the export in order of importance score, highest first. The high-importance entries need the most scrutiny. Low-importance entries from early in the database are often safe to skip. What you get back is a clean, accurate database with intentional entries instead of an accumulated pile of mixed-quality facts where the important ones are buried under months of low-signal auto-captured context.
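The review-by-importance pass over the export can be scripted as a triage step. A sketch assuming the export is a JSON-style list of dicts with an importance field (thresholds illustrative):

```python
def triage_export(memories):
    """Split an export into review tiers, highest scrutiny first.

    Thresholds are illustrative: high-importance entries get a full
    accuracy check, mid-importance a skim, and low-importance entries
    are often safe to drop during a reset.
    """
    tiers = {"scrutinize": [], "skim": [], "skip": []}
    for m in sorted(memories, key=lambda x: x["importance"], reverse=True):
        if m["importance"] >= 0.7:
            tiers["scrutinize"].append(m)
        elif m["importance"] >= 0.4:
            tiers["skim"].append(m)
        else:
            tiers["skip"].append(m)
    return tiers
```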
The full reset is also the right call when switching from one memory plugin to another. The export gives you a portable list that you can re-store into the new plugin. Do the export before uninstalling the old plugin, since you cannot read the old database after the plugin is gone. The re-store process is also an opportunity to intentionally set importance scores on every memory rather than inheriting whatever the auto-extraction assigned. A deliberate re-store produces a cleaner database than any amount of incremental cleanup.
Building memory hygiene into your daily workflow
The cron jobs handle the scheduled audits. They are the safety net. But the fastest way to prevent accumulation is not catching up on a schedule, it is building a habit of updating memories at the point of change. The difference between a well-maintained memory database and a stale one is usually not the audit frequency, it is whether the operator updates memories when things actually change. Three situations warrant an immediate memory update rather than waiting for the next audit:
- You switch a primary preference: New model, new output format, new working style. Update the memory the same day.
- You complete a project: Archive the project context memories before starting the next one. Five minutes now saves an hour of cleanup later when the new project’s memories start mixing with the old ones.
- You make a significant infrastructure change: New VPS, new plugin, new API provider. Run the infrastructure refresh prompt before closing the session where the change was made.
I just completed the [project name] project. Archive all memories related to that project by updating their text to prefix with “Historical context (archived [today’s date]):” and lower their importance to 0.15. Then confirm the count of archived memories.
The archive pattern, rather than the delete pattern, is the right default for completed work. Deletion is irreversible, and the first time you need to reference a past decision and find the memory gone, you will wish you had archived instead. Storage is cheap, and the cost of an archived low-importance memory is essentially zero in terms of recall accuracy: at 0.15 importance it almost never surfaces unless you ask for it directly. You may need to reference past decisions, or onboard someone new who needs context. Archived memories are out of the active recall pool but available if you go looking for them deliberately.
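The archive pattern is a pure transform on the memory's text and score, which makes it easy to sketch. Field names here are hypothetical placeholders for whatever your memory entries actually contain:

```python
from datetime import date

def archive(memory, archived_on=None):
    """Return an archived copy: prefixed text, importance dropped to 0.15.

    Mirrors the archive prompt above; does not mutate the original dict.
    """
    archived_on = archived_on or date.today().isoformat()
    return {
        **memory,
        "text": f"Historical context (archived {archived_on}): {memory['text']}",
        "importance": 0.15,
    }
```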
Reading the recall log to understand what is actually being surfaced
One of the most useful diagnostic moves for stale memory problems is reading what was actually recalled in a problematic session, not what you think was recalled. The memory plugin logs injected memories when the gateway log level is set to debug or verbose. Checking what was injected tells you exactly which memories are driving incorrect behavior.
Check the gateway logs for the last session. Were any memories injected into the context? Show me the exact memory texts that were recalled, including their importance scores and stored dates. I want to know what context the agent was working with.
If the logs show a memory stored eight months ago with an importance of 0.85 being injected at the top of every relevant query, you have found your problem. The fix is straightforward: update or delete that specific memory. The log tells you exactly which one to target rather than requiring you to review the entire database.
If recall logging is not available
Not all plugin versions expose recall logs at the gateway level. If the logs do not show injected memories, use the direct approach: before a session where you suspect stale context is causing problems, run memory_recall explicitly with the query you expect the agent to be running internally, and check what comes back. That tells you what would be injected even if the automatic injection logs are not available.
Frequently asked questions
The questions below cover the failure modes that do not fit cleanly into the sections above but come up consistently in operator setups.
How often should I audit my memories?
Preferences: monthly at minimum, weekly if your workflow changes frequently. Facts about infrastructure and software: after every significant change (migration, major update, provider switch) plus monthly. Project context: as projects start and end. The cron setup in this article covers preferences weekly and facts monthly, which is the right baseline for most setups. Adjust the cadence based on how fast your environment changes.
Can I set an automatic expiry date on a memory when I store it?
Not natively. OpenClaw’s memory plugin does not have a built-in TTL or expiry field as of March 2026. This is a known gap that has been requested in the community but has not shipped. The maintainers have noted that TTL semantics are difficult to implement correctly in a hybrid vector/keyword store without degrading recall quality on queries near the expiry boundary. The workaround is the cron-based audit approach in this article. If expiry-by-date is important to your setup, you can include an “expires” note in the memory text itself (“expires 2026-06-01”) and configure your audit cron to check for that pattern and delete expired entries. It requires the audit cron to parse the text, but it works with a well-structured prompt.
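If you adopt the "expires" note, the check itself does not have to rely on the audit prompt's parsing. A sketch assuming an "expires YYYY-MM-DD" convention in the memory text (the convention and regex are the workaround described above, not an OpenClaw feature):

```python
import re
from datetime import date

# Matches the "expires 2026-06-01" convention suggested above.
EXPIRES = re.compile(r"expires\s+(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE)

def is_expired(text, today=None):
    """True if the memory text carries an 'expires YYYY-MM-DD' note
    whose date has passed. No note means the memory never expires."""
    today = today or date.today()
    m = EXPIRES.search(text)
    if not m:
        return False
    return date(*map(int, m.groups())) < today
```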
My agent keeps recalling outdated information even after I deleted the stale memory. Why?
Two possible causes. First, there may be a duplicate memory that survived the deletion: a memory stored twice under slightly different text that the targeted deletion missed. Run memory_list with no filter and check for duplicates manually. Second, the outdated information may be in the model’s training data or in the conversation context rather than in stored memory. If the agent says something incorrect and you cannot find a memory that explains it, the source may not be memory at all. Ask the agent explicitly: “Where is that information coming from? Check whether it is from a stored memory or from your training data.”
Should I delete all my memories and start fresh if I have too many stale ones?
Almost never. A full wipe removes accurate memories that took months to build. The agent loses context about your preferences, your infrastructure, your working style, and your ongoing projects. Starting fresh is fast but you pay for it over the next several months as the agent re-learns everything. The targeted audit approach takes longer upfront but preserves what is correct. The only case where a full wipe makes sense is if the memory database is so corrupted or contradictory that it is actively damaging the agent’s behavior, and targeted cleanup has failed to fix it.
The weekly audit cron keeps flagging the same memories as potentially stale but they are actually correct. How do I stop the false positives?
Add a “verified [date]” note to the memory text after each audit confirms it is still accurate. “Primary model: DeepSeek Chat. Verified March 2026.” The next audit will see the verification date and flag it as recently reviewed rather than potentially stale. This approach also tracks when you last checked each memory, which is useful for identifying memories that have never been reviewed since they were originally stored.
My autoCapture is storing lots of memories automatically. How do I keep those from going stale faster than I can manage them?
Lower the importance score in the autoCapture config. If autoCapture is storing at 0.8 by default, lower it to 0.5. Higher-importance memories surface more in recall and do more damage when stale. Auto-captured memories are often conversational context rather than foundational facts, so a lower importance score is appropriate. Also set a higher extractMinMessages threshold so extraction runs less frequently, which reduces the volume of auto-captured memories. Review auto-captured memories in the weekly audit and raise the importance score manually for any that turn out to be genuinely important facts worth keeping current.
I have memories about an old job or old project that I want to keep for historical reference but not have influence current responses. How do I handle this?
Lower the importance score to 0.1 or 0.2 and update the memory text to prefix it with “Historical context (archived [date]):” For example, “Historical context (archived 2026-03): [old project details].” The low importance score means it rarely surfaces in recall. The “Historical context” prefix means that if it does surface, the model treats it as background information rather than current guidance. This gives you an archive without the noise of high-importance stale entries competing with current ones.
That combination of scheduled cron audits, immediate updates on change events, and intentional importance scoring at storage time is what keeps a stale-memory or memory-expiry problem from developing in the first place.
