OpenClaw Model Routing: How to Use Different LLMs for Different Tasks

If you have been running OpenClaw for more than a few days, you already know the problem. You set up a single model in your config. It handles everything. Drafting a newsletter. Reasoning through a multi-step analysis. Summarizing a web page. Running a cron job that checks your server health. All on the same model, at the same price point, at the same speed.

That one-model approach works. But it is expensive when it does not need to be, and slow when speed matters most. The alternative is model routing: using different large language models for different tasks, and having your agent choose intelligently based on the work at hand.

OpenClaw supports multiple LLM providers out of the box. You can configure a cheap, fast model for 90 percent of your daily workloads and reserve expensive, high-capability models for the handful of tasks that actually need them. This article walks through every layer of model routing available in OpenClaw as of April 2026, from the default model setting to per-session overrides, instruction-based routing, cron job model selection, and reasoning mode toggling.

What Model Routing Is and Why It Matters

Model routing is the practice of matching a task to the most appropriate language model rather than running everything through a single model. Think of it as putting the right tool on the workbench for each job rather than using a sledgehammer for every nail.

Language models vary dramatically in cost, speed, and capability. DeepSeek V3 costs $0.14 per million input tokens and handles most standard agent tasks — writing, analysis, tool use, summarization — with excellent quality. Claude Sonnet 4.6 costs roughly 21 times as much at $3.00 per million input tokens but offers stronger reasoning reliability and better performance on complex multi-step instructions. DeepSeek R1, at $0.55 per million input tokens, is slower but produces deeper reasoning chains for tasks that benefit from extended deliberation.

There are three concrete benefits to routing well. First, cost reduction. If you route 90 percent of your agent’s work to a cheap model and 10 percent to an expensive one, your total API spend drops by roughly 80 to 90 percent compared to running everything on the expensive model. Second, speed. Cheap models are often faster in practice because they have lower inference latency and lower queue contention. Third, reliability. A model that is good at creative writing may produce garbled JSON when you ask it to format structured data. Routing lets you send structured data tasks to a model that handles them well and writing tasks to a model that writes well.

The cost asymmetry between models is only growing wider. Frontier models get more expensive as they add reasoning capabilities and longer context windows. Commodity models get cheaper as inference optimizations improve. Routing captures the best of both worlds: you pay premium prices only when premium capability is genuinely required.

The OpenClaw Model Tier Map (April 2026)

OpenClaw does not restrict which models you can use. Any model accessible through an API provider or a local runtime can be configured. However, the ecosystem has settled into four clear tiers based on real-world usage by the OpenClaw community. These tiers are defined by cost per million tokens, capability profile, and typical use cases.

| Tier | Model | Cost per MTok | Best For | Worst For |
|------|-------|---------------|----------|-----------|
| Fast/Cheap | DeepSeek V3 (ds-v3) | $0.14 | Writing, analysis, summarization, standard tool use, data extraction, newsletter drafting, content generation | Complex multi-step reasoning, high-stakes code review, sensitive communications |
| Balanced | Claude Sonnet 4.6 (sonnet) | $3.00 | Complex reasoning, sensitive communications, code review, structured output with strict formatting, tasks requiring high reliability | High-volume bulk processing (too expensive), rapid iterative tasks (too slow per call) |
| Reasoning | DeepSeek R1 (ds-reason) | $0.55 | Multi-step analysis, mathematical reasoning, chain-of-thought tasks, research synthesis, tasks that benefit from extended deliberation | Real-time chat, high-throughput tool use, latency-sensitive operations |
| Local/Free | Ollama models | $0.00 | Private data processing, offline operation, prototyping, high-volume experimentation, air-gapped environments | Complex reasoning with small models, tasks requiring up-to-date knowledge, high-quality creative writing |

These tiers are not rigid. Some users run Claude Sonnet as their primary model because they value reliability above cost. Others run DeepSeek V3 for everything and find it sufficient. The point of routing is not to force a particular split but to give you the tools to make that split yourself based on your actual workloads and budget.

Setting a Default Model in openclaw.json

The foundation of any model routing setup is the default model. This is the model that handles every request unless an override is in effect. You set it in the model field of your openclaw.json configuration file.

Here is a minimal configuration with DeepSeek V3 as the default:

{
  "model": "ds-v3",
  "apiKeys": {
    "deepseek": "sk-your-deepseek-api-key",
    "anthropic": "sk-ant-your-anthropic-api-key"
  }
}

Every agent task that does not have a model override will use DeepSeek V3 by default. This is the workhorse tier: cheap enough to run continuously, capable enough to handle the vast majority of daily agent work.

If you prefer to start with Claude Sonnet as the default and route cheap tasks down, the configuration is similar:

{
  "model": "sonnet",
  "apiKeys": {
    "anthropic": "sk-ant-your-anthropic-api-key",
    "deepseek": "sk-your-deepseek-api-key"
  }
}

The key principle is the same regardless of which model you choose as default. Your default handles everything that is not explicitly routed elsewhere. Make it the model that handles your most common task profile well at the lowest acceptable cost.
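A quick sanity check can catch the most common misconfiguration: setting a default model whose provider has no API key. The sketch below is a hypothetical helper, not part of OpenClaw; the model-to-provider mapping mirrors the aliases used in this article and is an assumption, not OpenClaw's canonical list.

```python
import json

# Hypothetical model-to-provider mapping (assumed, based on the aliases
# used in this article -- not OpenClaw's authoritative list).
MODEL_PROVIDERS = {
    "ds-v3": "deepseek",
    "ds-reason": "deepseek",
    "sonnet": "anthropic",
}

def check_config(raw: str) -> list[str]:
    """Return a list of problems found in an openclaw.json document."""
    cfg = json.loads(raw)
    problems = []
    model = cfg.get("model")
    if model is None:
        problems.append("no default model set")
    elif model not in MODEL_PROVIDERS:
        problems.append(f"unknown model alias: {model}")
    elif MODEL_PROVIDERS[model] not in cfg.get("apiKeys", {}):
        problems.append(
            f"default model {model} needs an API key for {MODEL_PROVIDERS[model]}"
        )
    return problems

print(check_config('{"model": "ds-v3", "apiKeys": {"deepseek": "sk-test"}}'))  # []
print(check_config('{"model": "sonnet", "apiKeys": {"deepseek": "sk-test"}}'))
```

Running this against your config before deploying is cheaper than discovering a missing key when the first cron job fires.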

Per-Session Model Override: The /model Command

Sometimes you need a different model for a single session without editing any configuration files. OpenClaw supports this through the /model slash command, which can be typed directly in the chat interface.

To switch to a specific model for the current session:

/model sonnet

This overrides the default model for every interaction within that session until you reset it or the session ends. The override applies to all messages, tool calls, and subagent tasks spawned during the session.

To reset back to the default model defined in openclaw.json:

/model default

To see which model is currently active:

/session_status

The /session_status output displays the active model identifier and indicates whether reasoning mode is enabled. This is useful for confirming that your override took effect before starting a task that requires a specific model capability.

Common use cases for per-session overrides include: dropping into a reasoning model for a one-time complex analysis, switching to a cheaper model for a long-running batch task, or testing a new model before committing to it in your configuration.

Instruction-Based Routing: Teaching Your Agent When to Escalate

The most powerful routing mechanism in OpenClaw does not involve commands or configuration changes at all. It lives in your agent instructions. By specifying model preferences directly in your agent’s SOUL.md or AGENTS.md files, you teach the agent to choose the right model based on the nature of the task it is performing.

This is instruction-based routing. The agent reads its own instructions and makes real-time decisions about which model to invoke for each subtask. It works because OpenClaw agents can spawn sub-agents or delegate subtasks to specific models when instructed to do so.

Here is an example from an AGENTS.md file that implements instruction-based routing:

## Model Routing Instructions

Use ds-v3 for:
- Drafting newsletter content
- Summarizing web pages and documents
- Data extraction and formatting
- Standard tool calls and API operations
- Writing Slack messages and responses
- Generating analysis summaries

Use sonnet for:
- Complex multi-step reasoning tasks
- Code review and debugging
- Writing sensitive communications
- Structured output with strict schema requirements
- Any task where the user explicitly requests higher quality

Use ds-reason for:
- Multi-step analytical workflows
- Mathematical or logical problem solving
- Research synthesis across multiple sources
- Tasks requiring chain-of-thought deliberation

The agent follows these instructions when deciding how to execute tasks. If a user asks for a newsletter draft, the agent routes to DeepSeek V3. If a user asks for a complex code audit, the agent escalates to Claude Sonnet 4.6. The routing happens automatically, without any manual intervention.

More sophisticated agents can include fallback logic:

## Escalation Rules

If ds-v3 fails to produce a satisfactory result (e.g., output contains errors, fails schema validation, or the task requires iterative refinement), retry with sonnet automatically before reporting failure to the user.

This approach keeps the routing logic version-controlled alongside your agent’s behavior and does not require any changes to openclaw.json or the agent’s runtime configuration.
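To make the routing and escalation logic concrete, here is a deliberately simplified sketch. In OpenClaw the routing decision is made by the LLM reading its own instructions, not by keyword matching; the keyword lists and the `fake_run` callable here are illustrative assumptions only.

```python
# Toy keyword router mirroring the AGENTS.md rules above. Real instruction-based
# routing is done by the agent interpreting its instructions, not string matching.
ROUTES = [
    ("ds-reason", ("research synthesis", "mathematical", "multi-step analysis")),
    ("sonnet", ("code review", "debugging", "sensitive", "schema")),
]

def pick_model(task: str) -> str:
    lowered = task.lower()
    for model, keywords in ROUTES:
        if any(k in lowered for k in keywords):
            return model
    return "ds-v3"  # everything else goes to the cheap default tier

def run_with_escalation(task, run):
    """Escalation rule sketch: retry on sonnet if the first model fails.
    'run' is any callable(model, task) returning a result or None on failure."""
    model = pick_model(task)
    result = run(model, task)
    if result is None and model != "sonnet":
        result = run("sonnet", task)  # automatic retry before reporting failure
    return result

print(pick_model("Draft the weekly newsletter"))           # ds-v3
print(pick_model("Code review for the payments service"))  # sonnet
```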

Cron Job Routing: Different Models for Different Scheduled Tasks

OpenClaw supports cron jobs for scheduled agent tasks, and each cron entry can specify a different model. This is useful when you have recurring tasks with very different capability and cost profiles.

A single cron setup that routes different tasks to different models looks like this:

{
  "cron": [
    {
      "schedule": "0 6 * * *",
      "task": "daily-newsletter-draft",
      "model": "ds-v3"
    },
    {
      "schedule": "0 8 * * 1",
      "task": "weekly-security-audit",
      "model": "sonnet"
    },
    {
      "schedule": "0 7 * * 3",
      "task": "monthly-financial-analysis",
      "model": "ds-reason"
    },
    {
      "schedule": "*/15 * * * *",
      "task": "server-health-check",
      "model": "ds-v3"
    }
  ]
}

The newsletter draft runs on DeepSeek V3 at $0.14 per million tokens because it is a high-volume task that does not need frontier-model reasoning. The weekly security audit runs on Claude Sonnet 4.6 because it needs reliable tool use and careful code inspection. The monthly financial analysis runs on DeepSeek R1 because it benefits from extended deliberation over multiple data points. The server health check runs every 15 minutes on the cheapest model because it is high-frequency and low-complexity.

If your cron entries do not specify a model, they inherit the default model from openclaw.json. You can also omit the model field from cron entries that match your default and only specify it for the exceptions.

You can also route cron tasks through instruction-based routing by omitting the model field from cron and relying on the agent’s instructions to select the appropriate model based on the task description. This is slightly less explicit but more flexible if your agent’s routing logic is sophisticated.
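The precedence rule for cron entries reduces to a one-line lookup: the entry's own model if present, otherwise the default from openclaw.json. This sketch restates that behavior; the function name is hypothetical.

```python
def model_for_cron_entry(entry: dict, config: dict) -> str:
    """Resolve which model a cron entry runs on: the entry's 'model' field
    wins, otherwise fall back to the openclaw.json default (the inheritance
    behavior described above)."""
    return entry.get("model", config["model"])

config = {"model": "ds-v3"}
print(model_for_cron_entry({"task": "weekly-security-audit", "model": "sonnet"}, config))  # sonnet
print(model_for_cron_entry({"task": "server-health-check"}, config))                        # ds-v3
```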

Reasoning Mode: When to Enable Extended Thinking

Beyond model selection, OpenClaw supports a reasoning mode toggle that enables extended thinking for models that support it. Reasoning mode is not the same as switching to a reasoning-optimized model like DeepSeek R1. It is a per-session flag that tells the model to spend more inference compute on each response, producing deeper chain-of-thought reasoning before generating the final output.

To enable reasoning mode:

/reasoning

This toggles the mode on or off for the current session. When reasoning mode is active, the model uses extended thinking on every response. When it is off, the model responds normally.

Use reasoning mode when:

  • The task requires careful multi-step logic
  • You need the model to show its work before arriving at a conclusion
  • The cost of an incorrect answer is high enough to justify additional inference spend
  • You are working on mathematical, analytical, or research tasks

Disable reasoning mode when:

  • Speed matters more than depth
  • The task is straightforward (summarization, drafting, simple tool calls)
  • You are doing high-volume processing and cost is a concern
  • The model you are using does not benefit from extended thinking

Reasoning mode interacts with model routing in an important way. If you route a task to a cheap model and enable reasoning mode, you may negate some of the cost benefit because extended thinking increases token consumption. A better approach is to route simple tasks to cheap models without reasoning mode, and reserve reasoning mode for expensive models where the additional inference cost is justified by the task value.
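The interaction can be made concrete with a back-of-envelope estimate. The 3x token multiplier for extended thinking below is an illustrative assumption, not a measured OpenClaw figure; actual overhead varies by model and task.

```python
def effective_cost(tokens_m: float, price_per_mtok: float,
                   reasoning: bool, thinking_multiplier: float = 3.0) -> float:
    """Rough cost estimate. The default 3x multiplier for extended thinking
    is an assumption for illustration only."""
    multiplier = thinking_multiplier if reasoning else 1.0
    return tokens_m * price_per_mtok * multiplier

# 10M tokens on ds-v3 with reasoning on vs. sonnet with reasoning off:
print(round(effective_cost(10, 0.14, reasoning=True), 2))   # 4.2 -- still cheap, but 3x the tokens
print(round(effective_cost(10, 3.00, reasoning=False), 2))  # 30.0
```

Under this assumption, reasoning mode on the cheap tier triples its cost; it remains cheaper than the expensive tier, but the multiplier is why the article recommends reserving reasoning mode for tasks whose value justifies it.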

Cost-Based Routing Strategy: The 90/10 Rule

The most practical model routing strategy for OpenClaw users is the 90/10 split. Route 90 percent of your agent’s tasks to a cheap model and 10 percent to an expensive, high-capability model. This gives you roughly 80 to 90 percent cost reduction compared to running everything on the expensive model while maintaining quality on the tasks that genuinely need it.

Here is a worked cost example for a typical OpenClaw setup running 100 million tokens per month.

Scenario A: All Sonnet

100 million tokens x $3.00 per million = $300.00 per month

Scenario B: 90/10 Split (ds-v3 / sonnet)

90 million tokens x $0.14 per million = $12.60
10 million tokens x $3.00 per million = $30.00
Total: $42.60 per month

Scenario C: 90/10 + 5 percent reasoning reserve (ds-v3 / sonnet / ds-reason)

90 million tokens x $0.14 per million = $12.60
5 million tokens x $3.00 per million = $15.00
5 million tokens x $0.55 per million = $2.75
Total: $30.35 per month

Going from Scenario A to Scenario B saves $257.40 per month — an 86 percent reduction. Going from Scenario A to Scenario C saves $269.65 per month — a 90 percent reduction. For higher-volume setups running 500 million tokens per month, the savings scale linearly: $1,500 per month on all-Sonnet drops to roughly $151.75 per month on the three-tier split.
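The three scenarios above can be reproduced with a small calculator, using the per-MTok prices quoted earlier in this article:

```python
# $/MTok figures as quoted in this article (April 2026).
PRICES = {"ds-v3": 0.14, "sonnet": 3.00, "ds-reason": 0.55}

def monthly_cost(split_mtok: dict) -> float:
    """Total monthly spend given a {model: millions-of-tokens} split."""
    return sum(PRICES[m] * mtok for m, mtok in split_mtok.items())

a = monthly_cost({"sonnet": 100})                             # Scenario A
b = monthly_cost({"ds-v3": 90, "sonnet": 10})                 # Scenario B
c = monthly_cost({"ds-v3": 90, "sonnet": 5, "ds-reason": 5})  # Scenario C
print(round(a, 2), round(b, 2), round(c, 2))  # 300.0 42.6 30.35
```

Swapping in your own token volumes and split ratios gives a quick estimate of what routing would save on your workload.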

These savings come with a caveat: the 90/10 split assumes you have correctly identified which tasks belong in which tier. If you route complex reasoning to the cheap model, you will get poor results and waste time redoing work. If you route simple tasks to the expensive model, you waste money. The routing must be accurate for the economics to work.

Start conservative. Route 80 percent to the cheap model and 20 percent to the expensive model until you have enough data to tighten the split. Monitor task completion quality and adjust as your agent’s routing instructions improve.

When NOT to Use Model Routing

Model routing is not free. It adds complexity to your agent configuration, increases the surface area for bugs, and requires ongoing maintenance as models change prices and capabilities. There are cases where a single model is the better choice.

Simple, single-purpose agents. If your agent has one job — monitoring server health, sending daily reminders, or processing a single data pipeline — a single model is simpler and more reliable. Routing adds no value when the task profile is uniform.

Evaluation and testing. When you are evaluating a new model or testing agent behavior, a single model removes a variable. Add routing after you have validated the core behavior.

Low-volume usage. If you run fewer than 10 million tokens per month, the cost difference between models is small enough that routing complexity may not be worth it. Ten million tokens on Sonnet costs $30. On DeepSeek V3 it costs $1.40. The savings are real but the absolute dollar amount may not justify the configuration overhead.

Experimental or prototype agents. Routing adds a dependency on having multiple API keys configured and tested. For prototypes, just pick one model and move fast.

Deterministic workflows on local models. If you run exclusively on Ollama with local models, there is no API cost differential between models on your hardware (they all consume the same GPU cycles). Routing by cost does not apply, and routing by capability only matters if you run multiple local models of different sizes.

The rule of thumb: add routing when you can point to a specific pain point it solves. Too much spend? Routing. Too many wrong answers from a weak model? Routing. Configuration getting complicated? Maybe you did not need it in the first place.

Sources

Pricing data for DeepSeek and Anthropic models sourced from official API pricing pages as of April 2026. OpenClaw configuration behavior verified against the OpenClaw gateway source code and documentation. The /model, /session_status, and /reasoning commands are built into the OpenClaw gateway and documented in the project README and command reference.
