OpenClaw keeps switching models mid-task and I don’t know why

You asked your agent to do one thing. Halfway through, the response changed tone, lost context, or started making mistakes it was not making before. The reason: OpenClaw model switching happened silently in the middle of the task. This article explains every reason that happens, how to see it happening, and how to stop it.

Before You Start

  • Your OpenClaw agent is reachable and responding
  • You have noticed a change in response quality, tone, or capability mid-session
  • You want to understand what triggered the switch and how to prevent it

TL;DR

OpenClaw switching models mid-task happens for five reasons: rate limit hit on the primary model, provider-side error, fallback chain stepping down, a plugin or cron job running a different model, or compaction using a separate summarization model. Each one has a different fix. The agent can tell you which one happened if you ask it to check the logs.

Estimated time: 15 minutes to diagnose, 5 minutes to fix

Jump to the cause

First: find out which model your agent is using right now

The first question to answer when the OpenClaw wrong model problem occurs: what model is the agent actually using right now, and what model was it supposed to be using? The gap between those two answers is the diagnosis. It also tells you whether this is a genuine mid-task switch or the wrong model was configured from the start.

What model are you currently using for this conversation? What is the default model configured in my openclaw.json? Are they the same? If not, explain what caused the switch and when it happened.

The agent will tell you the model it is actively using (the runtime model) and the model your config says it should use (the default). If they differ, a switch occurred. The agent can check the gateway logs to tell you exactly when and why.

Check the gateway logs for any model switch events in the last hour. Show me each switch: what model was active before, what model it switched to, the timestamp, and the reason. If the logs do not contain model switch events, check for any error messages from model providers in the same timeframe.

Manual check (chat and SSH)

Run /status in your OpenClaw chat. It shows the current model and the default. If they differ, a switch occurred.
From SSH (systemd): journalctl -u openclaw --since "1 hour ago" | grep -i "model\|fallback\|rate.limit\|error"
Non-systemd: check the log file configured in your openclaw.json logging.file setting.

Cause 1: Rate limit on the primary model

Every model provider enforces rate limits. When your agent sends too many requests in too short a window, the provider returns a 429 error. OpenClaw sees this and, instead of failing the task entirely, steps to the next model in your fallback chain. This is by design. It keeps the agent working. But it also means the agent is now on a model you did not choose, with different capabilities and different cost.

Rate limits are the most common cause of unexpected OpenClaw model switching. They happen more often than expected because OpenClaw makes multiple API calls per task: the main response, any tool calls, memory operations, and follow-up turns. What looks like one question to you is often five or six API calls to the provider.
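
To make the mechanism concrete, here is a toy sketch of what a 429-driven step-down looks like. This is not OpenClaw source; the names (`FALLBACK_CHAIN`, `call_model`, `RateLimited`) are invented for illustration:

```python
# Illustrative sketch of a 429-driven fallback step-down.
# None of these names are real OpenClaw internals.

FALLBACK_CHAIN = ["claude-sonnet", "deepseek-chat", "llama3.1:8b"]

class RateLimited(Exception):
    """Stands in for a provider returning HTTP 429."""

def call_model(model: str, prompt: str, limited: set) -> str:
    # Pretend provider: raises if this model is currently rate-limited.
    if model in limited:
        raise RateLimited(model)
    return f"{model}: ok"

def complete_with_fallback(prompt: str, limited: set) -> str:
    for model in FALLBACK_CHAIN:
        try:
            return call_model(model, prompt, limited)
        except RateLimited:
            continue  # the silent step-down: the task keeps going on the next model
    raise RuntimeError("all models unavailable")

# With the primary rate-limited, the task quietly lands on the first fallback:
print(complete_with_fallback("status check", limited={"claude-sonnet"}))
# -> deepseek-chat: ok
```

The task succeeds, which is why nothing looks broken in the moment; only the output quality changes.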

Check the gateway logs for any 429 or rate limit errors from model providers in the last 24 hours. For each one, tell me: which provider returned the 429, at what time, how many requests were sent in the preceding minute, and what model OpenClaw switched to as a result. Then tell me the current rate limits for each configured provider.

What makes rate limits worse

Several things increase the frequency of rate limit hits beyond your actual conversation usage:

  • Heartbeat polling: a heartbeat configured to run every 30 or 60 seconds is an API call against your rate limit on every tick. On a provider with a low requests-per-minute cap (or tokens-per-minute cap; some providers limit by tokens, not request count), heartbeats alone can consume a significant portion of your quota.
  • Cron jobs using the same model: scheduled tasks that fire on the same primary model compete for the same rate limit window as your active conversation.
  • Multi-step tool use: tasks involving file reads, shell commands, web searches, and decision-making generate many API calls per task. A single “check my server status and fix any problems” request can trigger 10 or more calls.
  • Compaction passes: when context reaches the compaction threshold, OpenClaw runs a summarization pass. If that pass uses the primary model, it competes with your conversation for rate limit quota.
  • Memory extraction: some memory plugins run extraction on every message, adding a call against your quota for each turn.
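
You can do the same estimate the last prompt asks the agent for by hand. This back-of-envelope sketch uses illustrative numbers; substitute your own intervals and call counts:

```python
# Rough estimate of total request pressure on one provider.
# All inputs are illustrative; plug in your own config values.

def requests_per_minute(heartbeat_interval_s: float,
                        cron_jobs_per_hour: int,
                        calls_per_user_turn: int,
                        user_turns_per_minute: float) -> float:
    heartbeat = 60 / heartbeat_interval_s        # one API call per tick
    cron = cron_jobs_per_hour / 60               # averaged over the hour
    conversation = calls_per_user_turn * user_turns_per_minute
    return heartbeat + cron + conversation

# 30s heartbeat, 6 cron jobs/hour, ~6 calls per turn, one turn a minute:
print(round(requests_per_minute(30, 6, 6, 1), 1))  # -> 8.1
```

Against a provider cap of 10 requests per minute, that setup has almost no headroom before a burst triggers a 429.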

Show me everything that is making API calls against my primary model right now. Include: heartbeat frequency and model, each active cron job and its model, compaction model, memory extraction model, and any plugins that make their own API calls. Estimate the total requests per minute I am generating at peak usage.

How to fix rate limit switching

Move background tasks to a local model. Heartbeats, cron jobs, compaction, and memory extraction do not need your primary frontier model. Moving them to a local Ollama model (completely free, no rate limit) reserves your primary model’s quota entirely for your active conversations.
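
What that looks like in openclaw.json depends on your version's schema, so treat this as a sketch of the idea rather than exact syntax (the key names here are illustrative): one explicit, cheap model per background context, with the primary reserved for conversation.

```json
{
  "model": "anthropic/claude-sonnet",
  "heartbeat": { "model": "ollama/llama3.1:8b", "interval": "5m" },
  "compaction": { "model": "ollama/llama3.1:8b" },
  "memory": { "extraction_model": "ollama/llama3.1:8b" }
}
```

Ask the agent for the exact keys your installed version uses before applying anything.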

Check my config for heartbeat, cron jobs, compaction, and memory extraction settings. For any that use my primary model, suggest a local Ollama model that can handle that task. Show me the complete config change. Do not apply anything yet.

Request a rate limit increase. If you are on a paid tier and consistently hitting limits, most providers increase your cap on request. Check your provider’s dashboard or email support. It is free and takes a day or two to process.

Add a pause between bursts. If you run multi-step tasks that fire many calls in quick succession, adding a pacing instruction to your agent prompt prevents burst-triggered rate limits even when the per-minute average is within bounds.
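
The idea behind pacing is simple enough to show in a few lines. This is an illustrative sketch, not an OpenClaw feature: space calls evenly so a burst stays under a requests-per-minute cap.

```python
# Illustrative pacing: spread a burst of calls evenly across the rate window
# instead of firing them back to back.
import time

def paced(calls, max_per_minute: int):
    gap = 60.0 / max_per_minute
    results = []
    for i, call in enumerate(calls):
        if i:
            time.sleep(gap)  # wait between calls so the burst stays under the cap
        results.append(call())
    return results
```

A per-minute average of 6 calls fired in a 2-second burst can still trip some limiters; the same 6 calls spaced 10 seconds apart will not.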

Cause 2: Provider error triggering fallback

Sometimes the model switch is not a rate limit but a provider-side error. A 500 (internal server error), 503 (service unavailable), or network timeout makes OpenClaw treat the primary model as temporarily unavailable and step to the fallback chain. Unlike rate limits, these errors are outside your control.

Check the gateway logs for any 500, 502, 503, or timeout errors from model providers in the last 24 hours. For each one, tell me which provider had the error, what the exact error message was, and whether it was transient (once) or recurring. Then check whether the affected provider’s status page shows any ongoing incidents.

Common provider errors and what they mean

  • 401 Unauthorized: API key is invalid, expired, or not set. Config problem, not a provider outage.
  • 402 Payment Required: account balance is zero or billing is suspended. Add credits.
  • 429 Too Many Requests: rate limit (covered above).
  • 500 Internal Server Error: provider-side problem. Wait or use a fallback.
  • 503 Service Unavailable: provider is overloaded or in maintenance. Same as 500.
  • Timeout: request took too long. Happens with very large context windows or slow inference. OpenClaw has a default timeout; if the provider does not respond in time, it steps to the next fallback.
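
The decision logic behind that list can be sketched as a small classifier. This is illustrative, not the gateway's actual source: the point is that only some errors are worth a fallback, while auth and billing errors need a config fix.

```python
# Sketch of the retry-vs-fallback decision per provider error.
# Mirrors the list above; not OpenClaw source code.

def on_provider_error(status) -> str:
    if status in (401, 402):
        return "fix-config"   # bad key or billing: a fallback will not help
    if status == 429:
        return "fallback"     # rate limit: step down the chain
    if status in (500, 502, 503) or status is None:  # None stands for a timeout
        return "fallback"     # provider-side problem: try the next model
    return "fail"             # anything else: surface the error

print(on_provider_error(429))   # -> fallback
print(on_provider_error(401))   # -> fix-config
```

Notice that a 401 routed through fallbacks would silently mask a broken API key, which is why it is treated as a config problem instead.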

For a single transient error, no action is needed. OpenClaw switched to the fallback, completed the task, and will try the primary model again on the next request. For recurring errors, check that your API key is valid and billing is current before assuming it is the provider’s fault.

Run a test call against each of my configured model providers to verify the API key is valid and the provider is responding. For any key that fails, tell me the exact error. Also check that billing is current for each paid provider.

Cause 3: OpenClaw model fallback stepping to a weaker model

OpenClaw has a fallback chain: a list of models it tries in order when the primary is unavailable. The fallback chain is the right feature working correctly. The problem is when the chain drops to a dramatically weaker model at step two or three, and the quality change looks like a malfunction.

A chain that goes from Claude Sonnet to DeepSeek Chat to Llama 3.1 8B is functional. But the gap between the second and third model is steep enough that the agent’s behavior changes visibly. Users who do not know a switch happened think the agent broke. In practice, the fallback fired and landed on a small local model.

Show me my current fallback chain. For each model in the chain, tell me the provider, the model name, approximate cost per million tokens, and a rough quality comparison to my primary model. Flag any model that is a significant quality downgrade from the one before it in the chain.

How to build a fallback chain that does not cause visible quality drops

Fallback chain design principles

  • Same model, different provider first: if your primary is Claude Sonnet on Anthropic, your first fallback should be Claude Sonnet via OpenRouter. Same model, different endpoint. Protects against single-provider outages with zero quality change.
  • Same-tier, different model second: after same-model fallbacks, move to a model of comparable capability from a different provider. DeepSeek Chat is a strong fallback for Sonnet-class tasks at a fraction of the cost.
  • Local model last: a local Ollama model is the last resort, not the second option. It keeps the agent functional when all providers are down, but users will notice the quality difference.
  • No tier skipping: going from a frontier model directly to a 7B local model in one step is too steep. The quality change is dramatic enough that users report it as the agent being broken.
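
Applied to a Sonnet-primary setup, those principles produce a chain shaped like this. The key name and model identifiers are illustrative; check your own openclaw.json schema before copying:

```json
{
  "model": "anthropic/claude-sonnet",
  "fallbacks": [
    "openrouter/anthropic/claude-sonnet",
    "deepseek/deepseek-chat",
    "ollama/llama3.1:8b"
  ]
}
```

Step one changes nothing but the endpoint, step two changes the model but not the tier, and the local model only fires when everything else is down.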

Based on my current primary model, design an optimal fallback chain that avoids steep quality drops. Include at least one same-model-different-provider fallback and one same-tier-different-model fallback before any local models. Show me the exact config change to implement it.

Cause 4: A plugin or cron job is using a different model

Some OpenClaw features specify their own model independently of the primary. When these run, the agent temporarily operates on a different model for that task. The most common sources:

  • Heartbeat: the heartbeat config has its own model setting. If set to a local model, the heartbeat runs on that local model. This is intentional cost-saving behavior.
  • Cron jobs: each cron job can specify a model. A cron job firing during your conversation runs on its configured model, not your primary.
  • Compaction: uses a separate model to summarize context when the conversation gets long. May differ from your primary model.
  • Memory extraction: memory plugins run fact extraction on a separate model, adding API calls that consume quota.
  • Per-session override: the /model command sets a model override for the current session only. If a session was opened with a model override (by you, a cron job, or a subagent), it stays on that model until the session ends or the override is cleared.

List every model override currently active in my setup. Check: heartbeat model, each cron job and its model, compaction model, memory extraction model, and any active per-session model overrides. For each one, tell me what model it uses and whether it is competing with my primary model for rate limit quota.

The fix depends on whether the override is intentional. A heartbeat running on a free local model is working as designed. A cron job accidentally using the primary model and eating your rate limit quota is a misconfiguration.

For every background task that is using my primary model when it should use something cheaper, show me the config change to move it to a local Ollama model. Do not apply changes yet, just show me the plan.

Cause 5: Compaction changing behavior mid-conversation

When your conversation gets too long, OpenClaw compresses older messages to keep the context within the model’s window. This is compaction. It uses a model to summarize the older content, and that model can differ from your primary.

The symptom of compaction is not quite the same as a model switch. With a model switch, the agent’s tone and capabilities change. With compaction, the agent’s tone stays the same but it suddenly forgets details from earlier in the conversation, repeats questions you already answered, or loses track of a multi-step task. The cause is the same: the agent’s input changed because older context was compressed into a summary.
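
A toy model makes the "input changed" point concrete. This is not OpenClaw's compaction algorithm, just an illustration: once the transcript exceeds a budget, older messages collapse into a summary and only the newest turns survive verbatim.

```python
# Toy compaction: above the budget, keep the newest turns verbatim (up to
# `retain` characters, standing in for tokens) and summarize the rest.
# Illustrative only; real compaction uses a model to write the summary.

def compact(messages, budget: int, retain: int):
    size = sum(len(m) for m in messages)
    if size <= budget:
        return messages                      # below threshold: nothing happens
    kept, used = [], 0
    for m in reversed(messages):             # newest first
        if used + len(m) > retain:
            break
        kept.append(m)
        used += len(m)
    summary = f"[summary of {len(messages) - len(kept)} older messages]"
    return [summary] + list(reversed(kept))
```

Everything inside the summary placeholder is now only as good as the summarization model that wrote it, which is why details from early in the conversation go missing.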

Check my compaction settings. What model is used for compaction? What are the trigger thresholds? How many tokens are retained after compaction runs? Has compaction run in this session, and if so, when and how much context was compressed?

How to reduce compaction disruption

Increase retained tokens. More tokens kept after compaction means more of your conversation survives verbatim. Increasing the retained token count from the default to 60,000 or more significantly reduces information loss per compaction pass.

Use a capable compaction model. The model doing the summarization matters. A weak model produces lossy summaries. A capable model (even a slightly more expensive one) preserves more detail and causes fewer mid-task disruptions.

Start fresh sessions for long tasks. If a session has been running for hours, starting a new session with /new before a complex task is cleaner than letting compaction remove context mid-way through. All your memories survive across sessions; only the raw conversation history is lost.

Show me my current compaction settings and suggest improvements that reduce information loss while keeping costs reasonable. Specifically: what should the retained token count be, and what model should handle compaction? Show me the config changes without applying them.

How to tell which model produced which response

After a suspected switch, you want to know which responses came from which model. This helps you evaluate whether the fallback degraded your results or whether the quality was acceptable.

For my last 10 messages in this session, tell me which model produced each response. If the model changed between any two consecutive responses, highlight where the switch happened, what caused it, and whether that fallback model is still active now.

For ongoing visibility, add this as a standing instruction in your AGENTS.md or system prompt:

Standing instruction to add to AGENTS.md

If you switch to a fallback model for any reason, announce it in your reply: “Note: switched to [model name] because [reason].” If you return to the primary model after a fallback, announce that too.

How to prevent all automatic model switching

If predictable model behavior matters more to you than reliability, you can configure OpenClaw to stay on the primary model and fail instead of falling back. This is a deliberate tradeoff: any provider outage or rate limit makes the agent unresponsive until the issue clears.

Show me the exact config change to disable all model fallback behavior so that OpenClaw uses my primary model only and returns an error if that model is unavailable. Also show me how to re-enable fallbacks if I change my mind.

WRITE, TEST, THEN IMPLEMENT

Disabling fallbacks means any provider outage or rate limit makes the agent completely unresponsive. Only do this if you have a high rate limit on your primary model and the provider has strong uptime. For most setups, a well-configured fallback chain is better than no fallbacks.

A better middle ground: keep fallbacks enabled but remove any model from the chain you would not want the agent using under any circumstances. A shorter, more deliberate fallback chain gives you the reliability of fallbacks without the quality-cliff problem.

Show me my current fallback chain. Remove any model that is more than one quality tier below my primary. Leave the remaining models as the chain. Show me the config change.

What model switching is actually costing you

Model switching is not just a quality problem. It is a cost problem that cuts both ways. Most operators assume that switching to a fallback model saves money because fallbacks are supposed to be cheaper. That is not always true, and when it is true, you are still potentially paying more than you should because the fallback model takes more turns to complete tasks than your primary model would have.

Here is the math that matters: if your primary model costs $3 per million tokens and your fallback costs $5 per million tokens, every switch to the fallback is spending 67% more per token. If the switch happens because of a rate limit hit on a cron job that was accidentally using the primary model, you are paying a premium for a background task that should have been on a free local model in the first place.
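
That arithmetic generalizes to any pair of models:

```python
# Percent more (or less) you pay per token after a switch.
def fallback_premium(primary_per_mtok: float, fallback_per_mtok: float) -> float:
    return (fallback_per_mtok / primary_per_mtok - 1) * 100

print(round(fallback_premium(3.0, 5.0)))   # -> 67  (the example above)
print(round(fallback_premium(3.0, 0.5)))   # -> -83 (a genuinely cheaper fallback)
```

A negative result means the fallback really is cheaper per token, though the next section shows why that still is not the whole bill.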

Show me the approximate cost per million input tokens and per million output tokens for every model currently in my config, including my primary model, every fallback model, my compaction model, and my heartbeat model. Rank them from cheapest to most expensive. Flag any fallback model that costs more per token than my primary.

The hidden cost of fallback inefficiency

Even when a fallback model is cheaper per token, it costs more in total if it takes more turns to complete a task. A smaller or less capable model that needs three back-and-forth rounds to accomplish what your primary model does in one round burns three times the tokens. The per-token price was lower but the total bill was higher.

This is especially visible in complex tasks: research, multi-step analysis, writing that requires iteration. A frontier model completes the task in two turns. A fallback small model completes it in six. The six-turn version was cheaper per token and more expensive in total. Operators who track only per-token cost without tracking tokens-per-task miss this entirely.
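
The turns-versus-price tradeoff is easy to quantify. The figures below are illustrative, mirroring the scenario above: a fallback at half the per-token price that needs three times the turns still loses.

```python
# Total task cost = price x turns x tokens per turn. Illustrative numbers.

def task_cost(price_per_mtok: float, turns: int, tokens_per_turn: int) -> float:
    return price_per_mtok * turns * tokens_per_turn / 1_000_000

primary_total  = task_cost(3.0, turns=2, tokens_per_turn=20_000)  # frontier model
fallback_total = task_cost(1.5, turns=6, tokens_per_turn=20_000)  # half the price/token

print(primary_total, fallback_total)  # -> 0.12 0.18
```

The fallback was 50% cheaper per token and 50% more expensive for the task, which is exactly the gap that per-token tracking misses.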

For my last 10 tasks that involved more than one back-and-forth turn, tell me: which model handled each task, how many total turns it took to complete, and an estimate of the total tokens used. For any task that used a fallback model, estimate what it would have cost if the primary model had handled it in fewer turns.

The cheapest model is not always the right fallback

A common mistake is putting the cheapest available model at every fallback position. The problem is that a model that cannot complete a task at all costs more than a slightly more expensive model that completes it in one turn. If your fallback fails at tool use, multi-step reasoning, or following complex instructions, the user either gives up (the agent did not complete the task) or retries (doubling the token cost). Neither outcome is cheaper than using a capable model the first time.

The right fallback model is the cheapest model that can still complete the class of tasks your agent handles. For a research and writing agent, that bar is higher than for a simple Q&A agent. The model routing checklist below helps you find the right threshold.

Model routing cost audit checklist

  • List every model in your config (primary, fallbacks, compaction, heartbeat)
  • Note the cost per million tokens for each
  • Note which models are capable of tool use vs. text only
  • Identify every background task (heartbeat, cron, compaction, memory extraction) and which model it uses
  • Move any background task that does not need tool use to a free local model
  • Move any background task that does not need reasoning or writing quality to a free local model
  • Keep the fallback chain limited to models capable of tool use if your primary tasks involve tool use
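
To see what one line of that checklist is worth, price a single background task before and after moving it. The numbers are illustrative:

```python
# Daily cost of one routed task, before and after moving it to a local model.
# All inputs are illustrative; use your own call counts and prices.

def daily_cost(price_per_mtok: float, calls_per_day: int, tokens_per_call: int) -> float:
    return price_per_mtok * calls_per_day * tokens_per_call / 1_000_000

# A 60-second heartbeat on a $3/Mtok model, ~1,500 tokens per tick:
before = daily_cost(3.0, 24 * 60, 1_500)   # 1,440 calls/day
after  = daily_cost(0.0, 24 * 60, 1_500)   # local Ollama: free

print(round(before, 2), "->", after)  # -> 6.48 -> 0.0
```

That one change is roughly $6 a day, and it also frees 1,440 requests of rate limit quota for your actual conversations.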

Run a full model routing cost audit on my current config. For every model in use (primary, fallbacks, compaction, heartbeat, cron jobs, memory extraction), calculate what I am paying per day at my current usage level. Then identify the three changes that would reduce my daily cost the most without noticeably reducing response quality for my main use cases.

OpenClaw model routing: building a setup that does not surprise you

Model switching surprises happen when the system is making decisions you are not aware of. The way to eliminate the surprises is to make all the decisions explicit. Every model used in every context should be a deliberate choice, not an inherited default.

Here is the full audit to get your config into an explicit, deliberate state:

Read my openclaw.json and list every place a model is specified or implied: primary model, fallback chain, heartbeat model, each cron job model, compaction model, memory extraction model, any subagent defaults. For any context where no model is explicitly set and a default will be inherited, tell me what default will be used. I want to see every model decision in my config, explicit and implicit.

After you have the full list, the decision for each context is straightforward:

  • Background tasks that do not interact with users: free local model (Ollama llama3.1:8b or equivalent). No API cost, no rate limit, no quality requirement beyond “can it follow simple instructions.”
  • Compaction: a capable model that produces dense, accurate summaries. Losing context here costs more (in retries and confusion) than spending slightly more on a good summarization model.
  • Memory extraction: a capable instruction-following model. Poor extraction means poor recall, which means the agent needs more turns to remember context it should already have.
  • Fallback chain: same-model-different-provider first, then same-tier-different-model, then local last resort.
  • Primary conversational model: the most capable model your budget supports for your main use cases.

Based on the model routing audit, create the optimal config for my setup. Set every model context explicitly. Do not leave any context using an inherited default. Show me the complete config diff before applying anything.

Quick-reference: which cause matches your symptom

If you are not sure which of the five causes applies to your situation, this table maps symptoms to causes.

What you observed | Most likely cause | Where to look
Agent changed tone or writing style mid-conversation | Model switch (rate limit or provider error) | Gateway logs, Cause 1 or Cause 2
Agent forgot instructions it had at the start of the session | Compaction removed earlier context | Cause 5
Responses became slower, then quality dropped | Primary model was slow (timeout), switched to faster fallback | Cause 2
Happened at the same time every day or every hour | Cron job firing and using a different model | Cause 4
API bill spiked even though you used the agent less | Fallback switched to a more expensive model | Cost section
Agent stopped using tools correctly mid-task | Fell back to a model that does not support tool use | Cause 3
Quality dropped only on cron job or heartbeat responses | Heartbeat or cron is set to a weak model | Cause 4

After matching your symptom to a cause, jump to that section and run the diagnostic paste. Most of these diagnoses take under two minutes, and the fix takes another five.

Stop paying for the wrong model by accident.

Cheap Claw: $17

The complete model routing and cost control guide for OpenClaw operators: fallback chain design, background task routing, heartbeat cost reduction, compaction tuning, and the rate limit audit checklist. Everything formatted to paste directly into your agent.

Get Cheap Claw

Questions people actually ask about this

My agent keeps ending up on a local Ollama model. How do I stop it?

Your primary model is hitting either a rate limit or a provider error, and the fallback chain is stepping all the way down to your local model. Check the logs for 429 errors from your primary provider. If you see them, the fix is reducing background API call traffic (heartbeats, cron jobs, compaction) that competes with your conversation for the rate limit quota.

Check the gateway logs for 429 errors from my primary provider in the last 12 hours. Count how many rate limit hits occurred per hour and tell me what triggered each one. Then show me what changes would eliminate the most rate limit hits.

How do I know if the model switch is costing me money?

Fallback models are not always cheaper than your primary. Some fallback configurations switch from a cheap model to a more expensive one. Always check the per-token cost of every model in your fallback chain. If the fallback is more expensive than the primary, a rate limit hit on the primary is actually costing you more per token than if the request had just waited and retried.

Show me the cost per million input tokens and per million output tokens for each model in my fallback chain, in chain order. Are any fallback models more expensive than my primary model? If so, flag them.

Can I set different fallback chains for different types of tasks?

Not directly through the fallback chain config, which applies globally. But you can control model routing per task by specifying the model in cron job configs, heartbeat config, and compaction config. For conversation-level control, use the /model command to override the model for a specific session or ask your agent to route specific task types to specific models as a standing instruction.

The agent switched models and now it will not switch back. How do I reset it?

If the primary model is still unavailable (still hitting rate limits or still erroring), the agent will stay on the fallback. Wait for the rate limit window to reset (usually 1 minute, sometimes 1 hour depending on the provider) or fix the underlying provider error. Once the primary model is available again, the next conversation turn should attempt it. If the agent stays on the fallback even after the primary recovers, start a new session with /new. New sessions always start with the default model from your config.

My primary model is available again but the agent is still on the fallback model. What do I need to do to get it back to the primary? Check whether there is a session-level model override active, and if so, clear it.

How do I see model switching in real time as it happens?

Add an instruction to your AGENTS.md: “Whenever you switch to a fallback model for any reason, announce it at the start of your response with the model name and the reason. Do the same when returning to the primary model.” This turns silent switching into visible behavior. You will always know when a switch happened and why.

Can I set up alerts when the primary model goes down?

Yes. Set up a cron job that runs a simple test call against your primary model on a schedule (every 5 minutes, for example) and sends a Telegram or Discord notification if the call fails. This gives you proactive notification of provider outages rather than discovering them after the agent has been on a fallback for an hour.
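
A sketch of the probe script such a cron job could run is below. The Anthropic endpoint and headers follow that provider's public Messages API; the model ID, `TG_BOT_TOKEN`, and `TG_CHAT_ID` are example values you would replace with your own.

```shell
#!/bin/sh
# Sketch of a primary-model health probe for a cron job.
# Assumes ANTHROPIC_API_KEY, TG_BOT_TOKEN, and TG_CHAT_ID are set in the environment.

probe_primary() {
  # Minimal one-token request; prints only the HTTP status code.
  curl -s -o /dev/null -w '%{http_code}' \
    https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d '{"model":"claude-sonnet-4-20250514","max_tokens":1,"messages":[{"role":"user","content":"ping"}]}'
}

alert_needed() {
  # Any 4xx/5xx status (or curl's 000 on connection failure) warrants an alert.
  if [ "$1" -ge 400 ] || [ "$1" -eq 0 ]; then echo yes; else echo no; fi
}

# Cron job body (shown as comments so the functions above can be tested alone):
#   status=$(probe_primary)
#   if [ "$(alert_needed "$status")" = yes ]; then
#     curl -s "https://api.telegram.org/bot${TG_BOT_TOKEN}/sendMessage" \
#       -d chat_id="$TG_CHAT_ID" -d "text=Primary model probe failed: HTTP $status"
#   fi
```

Every probe tick is itself an API call against your rate limit, so keep the schedule modest; every 5 minutes is plenty.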

Create a cron job that tests my primary model every 5 minutes and sends me a Telegram message if the test fails. The test should be a minimal API call, not a full conversation turn. Show me the cron config before creating it.


Go deeper

  • Cost: Choosing a model based on your actual workload — How to match model capability to task type and avoid overpaying for simple requests.
  • Cost: How much does it actually cost to run OpenClaw for a month? — A real breakdown of API costs by usage pattern, with the config changes that cut bills in half.
  • Cost: OpenClaw is making API calls I never asked for — Why background API calls happen and how to identify and control each source.