How to choose the right embedding model for OpenClaw memory

Your memory pipeline is running. Writes are happening. Recall is working. But your agent keeps surfacing things that are only loosely related to what you asked, and misses things that should be obvious matches. The embedding model is almost always why. Most operators use whatever the default was when they installed the plugin and never touch it. The model you choose determines how well your agent understands what a memory means, not just what words it contains.

TL;DR

The embedding model converts your memories into vectors that represent their meaning. A better model produces vectors that capture semantic similarity more accurately, making recall more relevant. The default free option (nomic-embed-text via Ollama) is good enough for most setups. If recall quality matters more than cost, switch to a paid API model like OpenAI text-embedding-3-large or Jina v3. Switching models invalidates existing memories because vectors are not compatible across models. You need to re-embed or start fresh after a change.

Every indented block in this article is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal, edit any files, or navigate any filesystem.

What the embedding model actually does

When your agent stores a memory, it does two things: it saves the text, and it converts that text into a vector, a list of numbers that represents the meaning of the text. When you ask the agent something later, it converts your question into the same kind of vector and finds stored memories with similar vectors. That similarity search is how recall works.

The embedding model is what does the converting. A better model produces vectors that more accurately reflect meaning. “The API key expired” and “authentication stopped working” end up close together in a good model. In a weak one, they might not match at all because the words are different even though the meaning is the same.
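
That closeness is usually measured with cosine similarity. A minimal Python sketch, using made-up four-dimensional vectors (real models produce hundreds or thousands of dimensions), shows the idea:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (illustrative only).
api_key_expired = [0.9, 0.1, 0.4, 0.0]
auth_stopped    = [0.8, 0.2, 0.5, 0.1]  # similar meaning, similar vector
grocery_list    = [0.0, 0.9, 0.0, 0.8]  # unrelated meaning, distant vector

print(cosine_similarity(api_key_expired, auth_stopped))  # high, near 1.0
print(cosine_similarity(api_key_expired, grocery_list))  # low
```

A good embedding model is precisely one that places "similar meaning" texts at high cosine similarity even when the words differ.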

What embedding model is my memory plugin currently using? Is it a local model via Ollama or an API model? What is the vector dimension size? Show me the exact config line.

The three categories of embedding models

Embedding models fall into three categories based on where they run and what they cost. The right choice depends on how much you value recall quality versus cost.

Local models (free, zero API cost)

These run on your own hardware via Ollama. No external API calls, no per-request cost. Quality ranges from baseline to very good depending on the model.

  • nomic-embed-text: The default for most OpenClaw memory plugins. 768 dimensions, solid baseline quality, runs on modest hardware. Good enough for most setups where recall does not need to be perfect.
  • mxbai-embed-large: 1024 dimensions, noticeably better semantic understanding, still runs locally. If you have the RAM, this is the free upgrade from nomic-embed-text.
  • all-minilm-l6-v2: 384 dimensions, very fast, lower quality. Only use if you are on extremely constrained hardware and recall quality is not a priority.

Check whether I have Ollama installed and which embedding models are available locally. If I do not have mxbai-embed-large, what is the Ollama pull command to install it? Show me the command before running it.

API models (paid, higher quality)

These run on external services and cost per request. Quality is generally higher than local models, especially for nuanced semantic understanding.

  • OpenAI text-embedding-3-large: 3072 dimensions, currently the gold standard for semantic similarity. Excellent quality, relatively expensive per request.
  • Jina jina-embeddings-v3: 1024 dimensions, strong multilingual support, good for technical content. Often cheaper than OpenAI for comparable quality.
  • Cohere embed-english-v3.0: 1024 dimensions, strong for English content, good for long-form text.

API models make sense when recall quality directly impacts your work and the cost is justified by the value of better results. For most personal automation setups, local models are sufficient.

If I wanted to switch to an API embedding model, which one would you recommend based on my usage? What would the approximate monthly cost be given my current memory write and recall frequency? Show me the cost calculation before making any change.

Hybrid models (local with API fallback)

Some memory plugins support a hybrid setup: use a local model for most operations, fall back to an API model for high-stakes or complex queries. This reduces cost while still getting high-quality results when it matters most.

Does my memory plugin support hybrid embedding models (local for writes, API for certain recalls)? If not, is there a way to implement a similar pattern manually? Explain the tradeoffs.

Testing your current recall quality

Before changing anything, test whether your current embedding model is actually the problem. Poor recall can also be caused by vague memory text, too many memories in one scope, or a misconfigured similarity threshold.

Run a recall quality test. Pick three specific queries that should have clear matches in my memory. For each query: run the recall, show the top five results, and rate each result as relevant, partially relevant, or irrelevant. Give me a summary of how many results were genuinely relevant.

If the recall test shows mostly relevant results, your embedding model is fine and the problem is elsewhere. If it shows mostly irrelevant results, the embedding model is a likely suspect.

The semantic similarity test

A more direct test of embedding quality is to check whether the model correctly groups related concepts. Write a few test memories with similar meaning but different wording, then query with a related phrase and see if they surface.

Write three test memories about the same topic with different wording. Then query with a phrase that is semantically related but uses none of the same words. Do the test memories surface? This tells us whether the embedding model is capturing meaning or just word matching.

Switching embedding models: the re-embedding problem

The biggest hurdle to switching embedding models is that vectors from different models are not compatible. A memory embedded with nomic-embed-text cannot be meaningfully compared to a query embedded with mxbai-embed-large. The numbers represent different semantic spaces.

When you switch models, you have two options:

  1. Re-embed all existing memories: Read each memory text, re-embed it with the new model, and replace the vector in the database. This preserves your memory history but requires a migration script.
  2. Start fresh: Keep the old memory database as an archive, create a new database with the new model, and let new memories accumulate. This is simpler but loses access to old memories in recall.
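
Option 1 amounts to a simple loop. A hedged sketch, where the memory shape, the embedder, and the database are hypothetical stand-ins for whatever your plugin actually exposes:

```python
# Sketch of option 1: re-embed every memory with the new model.
# `embed_with_new_model` and the record shapes are assumptions,
# not the API of any specific plugin.

def migrate(old_memories, embed_with_new_model, new_db):
    migrated = 0
    for memory in old_memories:
        vector = embed_with_new_model(memory["text"])  # new semantic space
        new_db.append({**memory, "vector": vector})    # old vector discarded
        migrated += 1
    return migrated

# Toy run with a fake length-based embedder (illustration only).
memories = [{"id": 1, "text": "API key expired"},
            {"id": 2, "text": "use port 8080"}]
new_db = []
fake_embed = lambda text: [len(text), text.count(" ")]
migrate(memories, fake_embed, new_db)
```

The important property is that every stored vector ends up in the new model's semantic space; mixing old and new vectors in one database breaks recall.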

If I switch from my current embedding model to a different one, what would be involved in re-embedding my existing memories? How many memories would need to be processed, and roughly how long would it take? Show me the migration steps before I decide.

Export before switching

Before any model change, export your current memories to a plain text file. This gives you a backup in case the migration goes wrong and preserves the memory text even if you decide to start fresh.

Before considering an embedding model switch, export all my current memories to workspace/memory-export-YYYY-MM-DD.md. Include memory ID, scope, category, and full text. I want a recoverable snapshot.

Calculating the cost of API embedding models

API embedding models charge per token. A typical memory might be 50-200 tokens. At OpenAI’s pricing (around $0.00013 per 1,000 tokens for text-embedding-3-large), embedding 1,000 memories of about 100 tokens each costs roughly $0.013, a little over a cent. The cost is low per operation but adds up with frequent writes and recalls.

The monthly cost depends on three factors:

  • Memory writes per day: Each new memory triggers an embedding call.
  • Recall queries per day: Each recall query is embedded.
  • Average memory length in tokens: Longer memories cost more to embed.
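
Those three factors combine into one multiplication. A small Python estimator, assuming the pricing above, a 30-day month, and one embedding call per write and per recall:

```python
def monthly_embedding_cost(writes_per_day, recalls_per_day,
                           avg_tokens=100, price_per_1k_tokens=0.00013):
    # Every memory write and every recall query triggers one embedding call.
    calls_per_month = (writes_per_day + recalls_per_day) * 30
    tokens_per_month = calls_per_month * avg_tokens
    return tokens_per_month / 1000 * price_per_1k_tokens

# Example: 20 writes and 50 recalls a day at 100 tokens each.
print(f"${monthly_embedding_cost(20, 50):.4f}/month")  # about $0.03
```

Even fairly heavy usage stays in the cents-per-month range; the cost question usually only matters at large batch re-embedding scale.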

Estimate my monthly embedding API cost if I switched to OpenAI text-embedding-3-large. Use my current memory write and recall frequency, and assume an average memory length of 100 tokens. Show me the calculation and the final monthly estimate.

When API cost is worth it

API embedding models are worth the cost when recall quality directly impacts revenue or productivity. For a content operation where the agent needs to recall specific source details accurately, the cost of a few dollars per month is trivial compared to the value of correct information. For personal automation where approximate recall is fine, local models are the right choice.

Based on how I use my agent, would an API embedding model provide enough quality improvement to justify the cost? Or is my usage such that a local model is perfectly adequate? Give me a recommendation with reasoning.

Vector dimension size and what it means

Embedding models produce vectors of a fixed dimension size: 384, 768, 1024, 3072, etc. Higher dimensions can represent more nuanced meaning but require more storage and compute for similarity search.

For most OpenClaw setups, 768 dimensions (nomic-embed-text) or 1024 dimensions (mxbai-embed-large) is sufficient. The jump to 3072 dimensions (OpenAI’s large model) provides measurable quality improvement but at a cost in storage and API expense.

What is the vector dimension size of my current embedding model? If I switched to a model with higher dimensions, would I need to adjust any memory plugin settings for storage or performance?

Storage implications

Each dimension is typically stored as a 32-bit float (4 bytes). A 1024-dimensional vector uses about 4KB of storage per memory. With 1,000 memories, that is 4MB of vector storage. For 3072 dimensions, it is 12KB per memory, 12MB for 1,000 memories. Storage is rarely the limiting factor, but it is worth knowing the scaling.
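
The scaling is simple multiplication:

```python
def vector_storage_bytes(dimensions, memory_count, bytes_per_float=4):
    # Each memory stores one vector of `dimensions` 32-bit floats.
    return dimensions * bytes_per_float * memory_count

print(vector_storage_bytes(1024, 1000))  # 4,096,000 bytes, about 4 MB
print(vector_storage_bytes(3072, 1000))  # 12,288,000 bytes, about 12 MB
```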

Rerankers: the second stage of recall quality

Some memory plugins add a reranker stage after the initial vector similarity search. The reranker takes the top N results from the vector search and reorders them based on a more sophisticated understanding of relevance. This improves precision at the cost of an additional API call.

Common reranker models include Jina Reranker v3 and Cohere Rerank. These are typically API models that charge per request. The quality improvement is noticeable when your initial vector search returns many somewhat relevant results and you need the most relevant one at the top.
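
The two-stage shape looks like this in miniature. Word overlap and substring matching below are crude stand-ins for the real vector search and reranker models, which would be model or API calls:

```python
def two_stage_retrieve(query, memories, cheap_score, rerank_score, n=5, k=2):
    # Stage 1: a fast, cheap search narrows many memories to n candidates.
    candidates = sorted(memories, key=lambda m: cheap_score(query, m),
                        reverse=True)[:n]
    # Stage 2: the expensive reranker reorders only those n, keeping top k.
    return sorted(candidates, key=lambda m: rerank_score(query, m),
                  reverse=True)[:k]

# Toy scoring functions (stand-ins for embedding and reranker models).
cheap = lambda q, m: len(set(q.split()) & set(m.split()))
rerank = lambda q, m: cheap(q, m) + (1 if q in m else 0)

memories = ["the api key expired", "api docs updated", "the key to the office"]
print(two_stage_retrieve("api key", memories, cheap, rerank, n=3, k=1))
```

The structure is what matters: the reranker never sees the whole database, only the short candidate list, which is why it can afford a more expensive relevance judgment.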

Does my memory plugin have a reranker configured? If so, what model is it using and what is the cost per request? If not, would adding one improve my recall quality enough to justify the additional API calls?

When a reranker is worth it

A reranker makes sense when you have a large memory database (thousands of memories) and need high precision in the top results. For small databases with hundreds of memories, the vector search alone is usually sufficient. The reranker is an optimization for quality-critical use cases, not a necessity for most personal setups.

Multilingual embedding models

If your memories include text in multiple languages, you need an embedding model that handles cross-language similarity well. Some models are trained primarily on English and perform poorly on other languages. Others are explicitly multilingual.

Good multilingual options include Jina Embeddings v3 (supports 100+ languages) and OpenAI’s text-embedding-3-large (strong multilingual performance). For local models, mxbai-embed-large has decent multilingual capability.

Do any of my memories contain non-English text? If so, is my current embedding model likely handling them well, or should I consider switching to a multilingual model? Check a sample of memories and flag any that appear to be in a language other than English.

Embedding models for technical content

Technical content (code, configuration, error messages, API documentation) has different semantic patterns than natural language. Some embedding models are better at capturing technical similarity than others.

For technical content, models trained on code or technical documentation often perform better. OpenAI’s text-embedding-3-large handles technical content well. For local models, mxbai-embed-large is a good choice. If your memories are heavily technical and recall quality matters, consider testing a few models to see which one groups technical concepts most accurately.

What percentage of my memories are technical (code snippets, configuration, error messages, API references)? If it is high, should I prioritize an embedding model known for good technical understanding? Recommend a model based on my content mix.

Evaluating embedding models before switching

Before committing to a model switch, run a structured evaluation. This prevents switching to a model that is theoretically better but actually performs worse on your specific content.

The evaluation process

  1. Create a test set: Export 50-100 representative memories from your current database.
  2. Define test queries: Write 10-20 queries that should match specific memories in the test set.
  3. Run recalls with each candidate model: For each model, embed the test memories and run the queries, recording which memories surface in the top results.
  4. Score relevance: For each query, score how many of the top 5 results are genuinely relevant.
  5. Compare scores: The model with the highest average relevance score is the best fit for your content.
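
Step 4’s scoring is usually expressed as precision at k. A sketch using toy result lists rather than real recalls:

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    # Fraction of the top-k results that are genuinely relevant.
    top = ranked_ids[:k]
    hits = sum(1 for r in top if r in relevant_ids)
    return hits / len(top) if top else 0.0

def evaluate_model(query_results, k=5):
    # query_results maps each query to (ranked result ids, relevant id set).
    scores = [precision_at_k(ranked, relevant, k)
              for ranked, relevant in query_results.values()]
    return sum(scores) / len(scores)

# Toy comparison: model A surfaces 4 of 5 relevant results, model B only 2.
model_a = {"q1": ([1, 2, 3, 4, 9], {1, 2, 3, 4})}
model_b = {"q1": ([9, 8, 1, 7, 2], {1, 2, 3, 4})}
print(evaluate_model(model_a), evaluate_model(model_b))  # 0.8 0.4
```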

Help me design an embedding model evaluation for my setup. What would be a good test set size, what queries should I use, and how should I score the results? I want to compare my current model against at least one alternative before deciding whether to switch.

Memory plugin compatibility with different models

Not all memory plugins support all embedding models. Some plugins are hardcoded to use specific models or have limited configuration options. Before planning a switch, check what your plugin actually supports.

What embedding models does my memory plugin support? Check the plugin documentation or config schema. Are there any limitations (maximum vector dimensions, required model format, API key requirements) that would restrict my choices?

Common plugin limitations

  • Vector dimension mismatch: The plugin expects a specific dimension size and may not work correctly with models that produce different sizes.
  • API key configuration: Some plugins require API keys to be configured in a specific format or location.
  • Local model path: For local models, the plugin may expect the model to be at a specific Ollama endpoint or local file path.

Performance considerations for local models

Local embedding models run on your hardware. The performance impact depends on the model size and your hardware specs.

RAM requirements: nomic-embed-text needs a few hundred MB of RAM when loaded. mxbai-embed-large needs roughly twice that. If you are running on a VPS with limited RAM, the larger model may cause swapping and slow down other processes.

CPU vs GPU: Most local embedding models run on CPU by default. If you have a GPU with enough VRAM, some models can be configured to use it for faster inference. This is an advanced optimization that is rarely necessary for typical memory volumes.

Check my system resources. How much RAM is available, and how much is currently in use? Would running mxbai-embed-large instead of nomic-embed-text cause memory pressure? Show me the current memory usage before recommending a larger local model.

A hybrid approach for cost-conscious quality

If you want better recall quality but cannot justify the full cost of an API model for all operations, consider a hybrid approach:

  • Use a local model for writes: All new memories are embedded with a free local model.
  • Use an API model for critical recalls: When you run a recall where quality really matters, use a separate tool to embed the query with an API model and compare against your local vectors (this requires custom scripting).
  • Periodic quality upgrade: Once a month, re-embed your most important memories with an API model to improve their vector quality for future recalls.

This approach is more complex to implement but can provide better quality where it matters most while keeping costs low.

Would a hybrid embedding approach make sense for my usage? Estimate the cost savings versus a full API model approach, and the quality improvement versus a full local model approach. Is the complexity worth it for my setup?

Domain-specific embedding models

Some embedding models are trained on specific types of content (legal documents, medical text, code, etc.) and perform better on that domain than general-purpose models. For most OpenClaw usage, general-purpose models are fine. If your work involves highly technical or domain-specific language, a specialized model might improve recall.

Review a sample of my stored memories. Are they mostly general conversational text, or do they contain significant technical terminology, code snippets, or domain-specific jargon? Would a domain-specific embedding model likely improve recall quality for my content?

Hybrid retrieval: vector + keyword search

Many memory plugins support hybrid retrieval: combining vector similarity with traditional keyword (BM25) search. This approach catches memories that are semantically related but also ensures exact keyword matches surface when they exist. Hybrid retrieval often produces better results than either method alone.
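
One simple fusion scheme is a weighted sum of the two normalized scores. Real plugins may use BM25 scoring and fancier fusion (reciprocal rank fusion, for example); the weight below is an arbitrary illustration:

```python
def hybrid_score(vector_score, keyword_score, alpha=0.7):
    # Weighted blend: alpha weights semantic similarity, the remainder
    # weights exact keyword matching. Both inputs normalized to 0..1.
    return alpha * vector_score + (1 - alpha) * keyword_score

# A memory with a strong exact-keyword match can outrank one that is
# only semantically close.
semantic_only = hybrid_score(vector_score=0.80, keyword_score=0.0)
exact_match   = hybrid_score(vector_score=0.60, keyword_score=1.0)
print(semantic_only, exact_match)
```

This is why hybrid retrieval helps with things like error codes and identifiers: the keyword side guarantees an exact match surfaces even when the embedding model does not consider it semantically close.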

Is my memory plugin using hybrid retrieval (vector + keyword)? If not, would enabling it likely improve recall quality for my type of memories? What is the configuration change needed?

Troubleshooting poor recall step by step

When recall quality is poor, follow this diagnostic sequence before changing your embedding model:

  1. Check memory text quality: Are memories specific and well-written, or vague and generic?
  2. Check scope structure: Are too many memories in one scope, causing dilution?
  3. Run the semantic similarity test: Does the embedding model correctly group related concepts?
  4. Check for duplicate memories: Duplicates can push relevant results down the list.
  5. Verify hybrid retrieval is enabled: If available, ensure both vector and keyword search are active.
  6. Consider adding a reranker: If precision is critical and you have many memories.

My recall quality has declined. Walk me through the six-step diagnostic. For each step, run the check and report whether it reveals a problem. Stop at the first step that shows an issue.

The complete migration playbook

If you decide to switch embedding models, follow this playbook to minimize disruption and data loss.

Step 1: Export existing memories

Create a complete backup of all memories with their text, scope, and category. This is your recovery point.

Export all memories to workspace/memory-export-before-migration-YYYY-MM-DD.md. Include memory ID, scope, category, and full text. Verify the file exists and contains all memories before proceeding.

Step 2: Test the new model

Create a test database with the new model, write sample memories, and verify recall works as expected.

Set up a test memory database with the new embedding model. Write 10 test memories covering different topics. Run recall queries and confirm the results are at least as good as with the old model.

Step 3: Configure the new model

Update your memory plugin configuration to use the new model. This may require adding API keys, changing model names, or adjusting vector dimension settings.

Show me the exact config changes needed to switch from my current embedding model to [new model]. Include any API key additions, model parameter changes, and whether a gateway restart or new session is required for the change to take effect.

Step 4: Re-embed or start fresh

Decide whether to re-embed existing memories or start fresh. For most setups with under 1,000 memories, re-embedding is worth the effort to preserve history.

Based on my memory count, recommend whether to re-embed or start fresh. If re-embedding, show me the migration script that will read each memory from the export file, embed it with the new model, and write it to the new database.

Step 5: Verify the migration

After migration, run recall tests to confirm the new setup works correctly.

After migration, run the same recall quality test as before. Compare the results to the pre-migration baseline. Confirm that recall quality has not degraded and ideally has improved.

Common questions

How do I know if my embedding model is outdated?

Embedding models do not “expire” in the traditional sense. A model trained in 2023 is still effective in 2026 for most purposes. The reason to upgrade is not age but capability: newer models often have better semantic understanding, higher dimensions, or better multilingual support. If your current model works well, there is no need to upgrade just because a newer version exists.

Can I use multiple embedding models simultaneously for different types of memories?

Not in standard configurations. The memory plugin typically uses one embedding model for all memories. Some advanced setups maintain separate vector databases with different models and route queries based on memory type, but this is complex and rarely necessary for OpenClaw usage.

My embedding API calls are failing with rate limits. What should I do?

First, check whether you are hitting provider rate limits or quota limits. Reduce the frequency of memory writes and recalls, or implement client-side rate limiting in your memory plugin config. If the problem persists, consider switching to a local model for writes and keeping the API model only for high-priority recalls (if your plugin supports this hybrid pattern).

How do I benchmark different embedding models on my specific content?

Create a test set of 50-100 representative memories from your actual usage. Embed them with each candidate model, then run a set of standardized queries and measure precision (how many of the top results are relevant). The model with the highest precision for your content type is the best choice. This is more reliable than generic benchmarks.

Create a benchmark test for my current embedding model versus mxbai-embed-large. Use 50 of my actual memories as the test set. Run five standardized queries and measure precision for each model. Show me the results so I can decide whether switching is worth it.

What is the environmental impact of running local versus API embedding models?

Local models running on your own hardware have a carbon footprint based on your electricity source. API models run in data centers that may use renewable energy. For most individual OpenClaw instances, the difference is negligible. If environmental impact is a concern, choose a model that balances quality with efficiency: nomic-embed-text is more efficient than mxbai-embed-large, which is more efficient than running large API models for every operation.

Can I fine-tune an embedding model on my specific content?

Fine-tuning embedding models is possible but requires significant technical expertise and computational resources. For OpenClaw usage, it is almost never worth the effort. A better approach is to improve the quality of the memory text you write, which has a larger impact on recall quality than fine-tuning the model.

Monitoring embedding quality over time

Embedding model performance can degrade if the model service changes or if your content evolves in ways the model handles poorly. Setting up periodic quality checks catches degradation before it becomes a problem.

Monthly quality check

Once a month, run a standard recall quality test with a fixed set of queries. Record the relevance scores and compare them to previous months. A declining trend indicates a problem.

Set up a monthly embedding quality check cron job. On the first of each month, run three standard recall queries, score the top five results for relevance, and store the scores in a workspace file. Alert me if the average score drops by more than 20% from the previous month.
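
The 20% threshold check the prompt describes is a few lines of arithmetic:

```python
def quality_alert(previous_avg, current_avg, threshold=0.20):
    # Alert when the average relevance score falls by more than
    # `threshold` relative to the previous month's average.
    if previous_avg == 0:
        return False
    drop = (previous_avg - current_avg) / previous_avg
    return drop > threshold

print(quality_alert(0.80, 0.75))  # small dip: False
print(quality_alert(0.80, 0.60))  # 25% relative drop: True
```

Using a relative drop rather than an absolute score means the check keeps working even if your baseline score is naturally modest for your content type.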

Content drift detection

If your work shifts to new domains (e.g., from technical documentation to marketing copy), your embedding model may become less effective. Monitor the types of memories being written and flag significant shifts that might warrant re-evaluating your model choice.

Analyze the last 100 memories written. Has the content type shifted significantly from earlier memories? If so, does my current embedding model handle the new content type well, or should I consider a model better suited to the new domain?

Cost optimization for API models

If you are using an API embedding model, these strategies reduce cost without significantly impacting quality.

  • Batch writes: Instead of embedding each memory as it is written, collect memories and embed them in batches. This reduces the number of API calls.
  • Cache query embeddings: If the same query is used frequently, cache its embedding vector rather than re-embedding it each time.
  • Prune low-value memories: Delete memories that are unlikely to be useful in future recalls. Fewer memories mean fewer vectors to store and search.
  • Use shorter memory text: Embedding cost scales with token count. Write concise memories that capture the essence without unnecessary detail.
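
Query caching can be as simple as a memoized wrapper. The embed function below is a fake stand-in for a paid API call, used only to count how often the "API" is actually hit:

```python
from functools import lru_cache

calls = {"count": 0}

@lru_cache(maxsize=1024)
def embed_query(text):
    # Stand-in for a paid API call; real code would call the provider here.
    calls["count"] += 1
    return (len(text), text.count(" "))

embed_query("deploy checklist")
embed_query("deploy checklist")  # served from cache, no second call
embed_query("api key expired")
print(calls["count"])  # 2
```

For recurring scheduled recalls (heartbeats, cron jobs) that reuse identical queries, this eliminates most of the query-side embedding cost.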

I am using an API embedding model. Analyze my memory writing patterns and suggest specific cost optimizations. Which of the four strategies would save the most money given my usage?

Final recommendation framework

Use this decision tree to choose the right embedding model for your setup:

  1. Is recall quality critical to your work? If no, use nomic-embed-text (free). If yes, continue.
  2. Do you have budget for API costs? If no, use mxbai-embed-large (free, better quality). If yes, continue.
  3. Is your content mostly English? If yes, consider OpenAI text-embedding-3-large. If multilingual, consider Jina Embeddings v3.
  4. Is your content heavily technical? If yes, test both OpenAI and Jina models to see which handles your technical content better.
  5. Do you need the absolute best quality regardless of cost? If yes, use OpenAI text-embedding-3-large with a reranker.
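
The five questions map directly onto a function. A sketch, with model names taken from the sections above:

```python
def choose_embedding_model(quality_critical, has_budget, mostly_english,
                           heavily_technical, best_at_any_cost):
    # Mirrors the five-question decision tree; returns a recommendation.
    if not quality_critical:
        return "nomic-embed-text"
    if not has_budget:
        return "mxbai-embed-large"
    if best_at_any_cost:
        return "text-embedding-3-large + reranker"
    if heavily_technical:
        return "test text-embedding-3-large vs jina-embeddings-v3"
    return "text-embedding-3-large" if mostly_english else "jina-embeddings-v3"

print(choose_embedding_model(True, False, True, False, False))
# mxbai-embed-large
```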

Run me through the embedding model decision tree. Based on my answers to the five questions, which model should I use? Show me the reasoning step by step.

More common questions

How do I know if my embedding model is the problem versus other memory issues?

Run the semantic similarity test described earlier. If memories with similar meaning but different wording do not surface together, the embedding model is likely the issue. If they do surface together but the results are still not what you expect, the problem is elsewhere: memory text quality, scope structure, or recall configuration.

Can I use different embedding models for different scopes?

Most memory plugins do not support per-scope embedding models. The embedding model is a global configuration that applies to all memories in the database. If you need different quality levels for different contexts, you would need separate memory plugin instances or a custom setup, which is complex and rarely worth the effort.

My recall was working fine and suddenly got worse. Could the embedding model have changed?

If you did not change the config, the embedding model did not change. More likely, the quality of new memories being written has declined (vague text, duplicate content) or the shared scope has grown so large that relevant results are getting diluted. Check memory quality and scope structure before suspecting the embedding model.

Is there a way to test a new embedding model without migrating all my memories?

Yes. Create a separate test memory database with the new model, write a few test memories, and run recall queries to compare results with your main database. This tells you whether the new model would improve recall quality without committing to a full migration. Some memory plugins support multiple database connections for this purpose.

How often should I reconsider my embedding model choice?

Re-evaluate when recall quality becomes a noticeable problem, not on a schedule. If your work has evolved and now requires more precise recall than before, that is a reason to consider upgrading. Otherwise, a working embedding model should be left alone. The “if it ain’t broke, don’t fix it” principle applies strongly here because of the re-embedding complexity.

Can I use a local model for writes and an API model for recalls?

Not directly in standard configurations. The vectors used for writes and recalls must come from the same model to be comparable. A hybrid approach would require storing two vectors per memory (one from each model) and switching between them based on query type, which is not supported out of the box. The simpler approach is to pick one model that meets your needs for both writes and recalls.

My memory plugin does not seem to support changing the embedding model. What are my options?

If the plugin has hardcoded model support, you have three options:

  1. Fork the plugin and modify it to support your desired model (advanced, requires coding).
  2. Switch to a different memory plugin that does support model configuration.
  3. Accept the default model and improve recall quality through other means: better memory text, scope structure, similarity thresholds.

For most operators, option 3 is the most practical unless recall quality is critical to your workflow.

How do I know if my embedding model is running on GPU?

Check the Ollama logs or use the Ollama API to list running models and their device placement. For most local embedding models, GPU acceleration is not enabled by default even if a GPU is present. Enabling it requires configuring Ollama to use the GPU for that specific model, which is documented in the Ollama project but not typically necessary for memory embedding workloads unless you have very high volume.

Can I use multiple embedding models in parallel for different types of memories?

Not in a standard single-database setup. The memory database stores one vector per memory. Using multiple models would require storing multiple vectors per memory or having separate databases for different memory types, which is complex to manage and query. For most use cases, picking one good model that handles all your content types well is simpler and more effective.

My recall results include memories that are completely unrelated. Could this be an embedding model problem?

Yes, if the model is producing poor vectors. But first check: are the unrelated memories in the same scope as the query? If they are, and they have text that superficially matches some words in the query, the model might be over-indexing on word matching rather than semantic meaning. Try the semantic similarity test to confirm. If unrelated memories are from different scopes, the problem is scope configuration, not the embedding model.

How long does it take to re-embed 1,000 memories?

With a local model, about 2-5 seconds per memory depending on model size and hardware, so 30-80 minutes total. With an API model, the limiting factor is rate limits and network latency, typically 10-30 seconds per memory, so 3-8 hours total. For large migrations, batch processing with pauses between batches is recommended to avoid hitting rate limits or overloading your local system.
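A batched re-embedding loop with pauses might look like the sketch below. Both `embed_batch` and the default timings are assumptions for illustration, not a real plugin API; tune the batch size and pause to your provider's rate limits.

```python
import time

def reembed_in_batches(memories, embed_batch, batch_size=50, pause_s=1.0):
    """Re-embed memories in batches, pausing between batches.

    embed_batch is whatever batch call your provider exposes
    (an assumption for this sketch, not a real API).
    """
    vectors = []
    for i in range(0, len(memories), batch_size):
        batch = memories[i:i + batch_size]
        vectors.extend(embed_batch(batch))
        # Back off between batches so rate limits (API) or system load
        # (local model) stay manageable.
        if i + batch_size < len(memories):
            time.sleep(pause_s)
    return vectors
```

For 1,000 memories at a batch size of 50, that is 20 batch calls rather than 1,000 individual ones, with a pause between each.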

Is there a way to gradually migrate to a new embedding model without downtime?

Yes, with a dual-write approach: write new memories with both the old and new models (storing two vectors), and gradually re-embed old memories in the background. This requires custom database schema and query logic that most memory plugins do not support out of the box. For most operators, accepting a brief period of reduced recall quality during migration is simpler than implementing a gradual migration system.
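Under that dual-write assumption, each record would carry two vectors until the migration finishes. The sketch below is illustrative only: `embed_old` and `embed_new` are hypothetical stand-ins for the two models, and no memory plugin ships this schema out of the box.

```python
# Hypothetical stand-ins for the outgoing and incoming models.
def embed_old(text):
    return [float(len(text))]        # old model: 1-dim toy vector

def embed_new(text):
    return [float(len(text)), 0.0]   # new model: different dimensionality

def write_memory(store, text):
    """Dual-write: store a vector from both models per memory."""
    store.append({
        "text": text,
        "vec_old": embed_old(text),  # queried until migration completes
        "vec_new": embed_new(text),  # queried after the cutover
    })

store = []
write_memory(store, "API key expired")
```

Note that the two vectors even differ in dimensionality here, which is exactly why vectors from different models can never be compared directly: each model must be queried against its own column.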

What happens if my API embedding model becomes unavailable or changes pricing?

Your memory writes and recalls will fail until you switch to a different model. This is a risk of depending on external services. Mitigate it by keeping a local model as a fallback option in your config, and by exporting your memories regularly so you can rebuild with a different model if needed. For critical setups, consider using a local model as the primary and an API model only for quality-critical operations.


Ultra Memory Claw

Embedding model comparison, migration scripts, and cost calculator pre-built

Side-by-side quality tests for nomic-embed-text, mxbai-embed-large, and OpenAI text-embedding-3-large. The re-embedding migration script for switching models without losing memories. The monthly cost calculator based on your memory volume. Everything from this article ready to run.

Get Ultra Memory Claw for $37 →

Keep Reading:

  • Ultra Memory Claw: How do I know if my embedding model is hurting my recall quality? Testing recall quality and switching embedding models without losing existing memories.
  • Ultra Memory Claw: How to design memory scopes for a multi-project OpenClaw setup. Scope architecture for operators running multiple contexts from one agent instance.
  • Ultra Memory Claw: Memories from one project keep showing up in a different one. Why scope bleed happens and how to stop it with the two-layer scope pattern.