How to Cut OpenClaw API Costs by 80%: The Three Settings Nobody Changes

If you run an OpenClaw agent on default settings, you are almost certainly overpaying for API calls by 3x to 5x. The defaults are not wrong — they are optimized for quality and convenience, not cost. But once your agent runs daily, those settings quietly burn hundreds of dollars a year on tokens you do not need to spend.

This is the practical guide to fixing that. Three settings — model routing, compaction threshold, and LanceDB memory — can cut your OpenClaw API costs by 80% with zero loss in output quality. Real numbers, specific configurations, and changes you can make today.

Last updated April 27, 2026.

Setting 1: Model Routing — Stop Using Claude for Everything

The single biggest cost driver in OpenClaw is the model you use. If you are running Claude Sonnet 4.6 (approximately $3.00 per million input tokens) for every single task, you are spending premium prices on routine operations that cheaper models handle just as well.

The Cost Comparison

As of April 2026, here are the relevant API pricing tiers for OpenClaw users:

  • Claude Sonnet 4.6 — ~$3.00/MTok input. Best-in-class reasoning, instruction following, and code generation.
  • Claude Haiku 3.5 — ~$0.80/MTok. Fast, good for summarization and straightforward tasks.
  • GPT-4o — ~$2.50/MTok. Comparable to Sonnet for many tasks.
  • GPT-4o Mini — ~$0.15/MTok. Surprisingly capable for simple queries.
  • DeepSeek V3 — ~$0.14/MTok. Excellent for structured tasks, tool use, and code with clear instructions.
  • Qwen3 (72B) — ~$0.18/MTok. Strong multilingual performance and general reasoning.
  • Llama 3.1 (70B) — ~$0.20/MTok via most providers. Solid all-rounder for routine work.

The gap between DeepSeek V3 at $0.14/MTok and Claude Sonnet 4.6 at $3.00/MTok is roughly 21x on a per-token basis. If 80% of your tasks can use the cheaper model, your blended rate drops dramatically.
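To make the blended-rate math concrete, here is a quick back-of-the-envelope sketch. The 80/20 split is an assumption; plug in your own task mix:

```javascript
// Blended $/MTok input rate for a two-tier routing split.
// Prices are the April 2026 figures quoted above.
const CHEAP_RATE = 0.14;   // DeepSeek V3, $/MTok input
const PREMIUM_RATE = 3.00; // Claude Sonnet 4.6, $/MTok input

function blendedRate(cheapShare) {
  return cheapShare * CHEAP_RATE + (1 - cheapShare) * PREMIUM_RATE;
}

console.log(blendedRate(0.8).toFixed(2)); // "0.71" at an 80/20 split
```

Moving the split from 80/20 to 90/10 drops the blended rate to roughly $0.41/MTok, so every routine task you move off the premium tier compounds.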

How to Configure Model Routing in OpenClaw

OpenClaw supports two approaches to model routing:

1. Per-task model override via sessions_spawn. When spawning a subagent for routine work, pass the model parameter directly:

// In your agent code or configuration
sessions_spawn({
  task: "Summarize today's logs",
  model: "deepseek/deepseek-chat" // or anthropic/claude-3-5-haiku, etc.
})

2. Model routing rules in openclaw.json. For a more permanent setup, define routing rules in your OpenClaw configuration file:

{
  "agents": {
    "defaults": {
      "model": "deepseek/deepseek-chat"
    },
    "overrides": {
      "code-review": {
        "model": "anthropic/claude-sonnet-4-20250514"
      },
      "first-impression": {
        "model": "anthropic/claude-sonnet-4-20250514"
      }
    }
  }
}

This approach assigns a low-cost default model for routine operations and reserves the premium model for the named agents that need it.

Which Tasks Warrant Which Tier

  • Summarization, log parsing, status checks: DeepSeek V3 or GPT-4o Mini. These are pattern-matching tasks; a $0.14 model handles them perfectly.
  • Content drafting, email triage, simple tool calls: DeepSeek V3 or Claude Haiku. Cheaper models follow instructions well for defined formats.
  • Multi-step reasoning, code generation, strategic planning: Claude Sonnet 4.6 or GPT-4o. Complex chains benefit from stronger reasoning; the cost premium is worth it.
  • Security-sensitive operations, credential handling: Claude Sonnet 4.6. Trust matters; do not route sensitive work through untested providers.
  • First-impression user conversations: Claude Sonnet 4.6. Quality directly affects user trust; do not cheap out here.

Setting 2: Compaction Threshold — Stop Paying for Old Context

Every turn your agent takes adds tokens to its context window. After 100 turns, even a simple agent has accumulated tens of thousands of tokens of conversation history. You are paying to re-read every one of those tokens on each subsequent call.

OpenClaw addresses this with compaction — a process that compresses older conversation history into a condensed summary, discarding the raw token stream. The setting that controls when this happens is the compaction threshold.

How Compaction Works

By default, OpenClaw compacts conversation history when it exceeds a certain token count. The default threshold is high — designed to preserve full context as long as possible. That is great for quality, expensive for long-running sessions.

A session running 100 turns without compaction might hold 15,000 to 25,000 tokens of raw history. At Claude Sonnet pricing, each turn costs approximately $0.05 to $0.08 just in input tokens. Across 100 turns, that is $5.00 to $8.00 for history that could have been compressed into a 500-token summary.

How to Configure It

Set the compaction threshold lower in your OpenClaw configuration:

{
  "session": {
    "compactionThreshold": 8000,
    "compactionTarget": 2000
  }
}

This triggers compaction when context exceeds 8,000 tokens, compressing it to a 2,000-token summary. The agent still remembers what happened — it just does not carry the raw transcript anymore.

For agents that work in short, discrete tasks (check email, write summary, done), you can set the threshold even lower — 4,000 tokens with a 1,000-token target. For complex research agents that need to reference earlier steps precisely, 12,000 tokens with a 3,000-token target is a reasonable balance.
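For example, the short-task profile described above would look like this in openclaw.json (same keys as the earlier example; verify the exact schema against your OpenClaw version):

```json
{
  "session": {
    "compactionThreshold": 4000,
    "compactionTarget": 1000
  }
}
```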

The Cost Impact

A compacted session uses roughly 80-90% fewer input tokens per turn after compaction kicks in. On a session that runs 200 turns, this alone can cut costs by 40-60% compared to running uncompacted.
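Where does that reduction come from? A rough cost model makes it visible. The per-turn growth rate here is an illustrative assumption, not a measured value:

```javascript
// Compare input-token spend on conversation history across a 200-turn
// session: uncompacted growth vs capped at a 2,000-token summary.
const TURNS = 200;
const GROWTH = 150;   // tokens added to history each turn (assumption)
const SUMMARY = 2000; // compactionTarget-sized summary

let rawTokens = 0;
let compactedTokens = 0;
for (let t = 0; t < TURNS; t++) {
  const history = t * GROWTH;                // raw history re-read this turn
  rawTokens += history;
  compactedTokens += Math.min(history, SUMMARY);
}

console.log(rawTokens, compactedTokens);     // 2985000 385650
console.log(((1 - compactedTokens / rawTokens) * 100).toFixed(0) + "% fewer history tokens");
```

At Claude Sonnet's $3.00/MTok, that is roughly $8.96 of history input tokens uncompacted versus about $1.16 compacted, an 87% cut on the history component alone under these assumptions.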

Setting 3: LanceDB Memory — Stop Loading MEMORY.md Every Turn

By default, MEMORY.md is loaded into every agent turn. For a new agent with a short memory file, this is negligible. For a mature agent that has been running for weeks and accumulated a 5,000 to 10,000 word MEMORY.md, you are burning thousands of tokens on every single API call just to load memories that may not be relevant to the current task.

OpenClaw supports LanceDB as an alternative memory backend. Instead of loading the full memory file into every context window, LanceDB stores memories as vector embeddings and retrieves only the most relevant snippets based on semantic similarity to the current query.

Switching to LanceDB

Enable LanceDB in your OpenClaw configuration:

{
  "memory": {
    "backend": "lancedb",
    "lancedb": {
      "path": "/home/node/.openclaw/lancedb",
      "topK": 5,
      "minScore": 0.7
    }
  }
}

The topK parameter controls how many memories are retrieved per turn. With topK: 5 and an average memory snippet size of ~200 tokens, you are loading 1,000 tokens of memory per turn instead of 2,000+ tokens of full MEMORY.md. On a mature agent with a 10,000-word memory file (roughly 13,000-15,000 tokens), the savings are much larger: well over 10,000 tokens saved per turn.

Hybrid Approach

You do not have to abandon MEMORY.md entirely. A common pattern is to keep a short (500 words or less) MEMORY.md for critical operational instructions and session identity, while offloading long-term knowledge to LanceDB. Split your memory file:

<!-- MEMORY.md -- keeps only session-critical context -->
# Midas Memory
- Current objectives: [short list]
- Active proposals: [names only]
- Key contacts: [names only]

<!-- All detailed knowledge goes to lancedb ingest -->

Savings Estimate

On a mature agent with 10,000 words of memory (approximately 13,000-15,000 tokens), switching to LanceDB with topK: 5 reduces memory context from ~15,000 tokens to ~1,000 tokens per turn. That is a 93% reduction in memory-related tokens. At 100 turns per day, that saves approximately 1,400,000 input tokens daily — roughly $4.20/day on Claude Sonnet, or $0.20/day on DeepSeek V3.
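Those figures reduce to a few lines of arithmetic (sizes are the estimates used above; substitute your own memory file's token count):

```javascript
// Daily memory-token savings from switching MEMORY.md to LanceDB retrieval.
const FULL_MEMORY = 15000;   // tokens, ~10,000-word MEMORY.md
const TOP_K = 5;
const SNIPPET = 200;         // average tokens per retrieved memory
const TURNS_PER_DAY = 100;

const perTurn = TOP_K * SNIPPET;                       // 1,000 tokens loaded
const savedPerDay = (FULL_MEMORY - perTurn) * TURNS_PER_DAY;

console.log(savedPerDay);                              // 1400000 tokens/day
console.log((savedPerDay * 3.00 / 1e6).toFixed(2));    // "4.20" $/day on Sonnet
console.log((savedPerDay * 0.14 / 1e6).toFixed(2));    // "0.20" $/day on DeepSeek V3
```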

Bonus Saves: lightContext and runTimeoutSeconds

Two additional settings that cut costs with minimal configuration.

lightContext in sessions_spawn reduces the bootstrap context for subagents. When spawning a subagent for a focused task, pass lightContext: true to skip loading the full project context and system prompt overhead:

sessions_spawn({
  task: "Check the weather and report back",
  model: "deepseek/deepseek-chat",
  lightContext: true
})

This can save 500-2,000 tokens per spawn, depending on your context size. If you spawn 50 subagents per day, that is 25,000-100,000 tokens saved — approximately $0.08 to $0.30 per day on Claude Sonnet, or pennies on DeepSeek.

runTimeoutSeconds prevents runaway subagents. A subagent that gets stuck in a loop or encounters an unexpected error can burn through thousands of tokens before you notice. Set a reasonable timeout:

sessions_spawn({
  task: "Classify these 10 support tickets",
  model: "deepseek/deepseek-chat",
  runTimeoutSeconds: 120
})

If the subagent does not complete within two minutes, it is killed automatically. Without this, a runaway agent could run for 10+ minutes and consume 50,000+ tokens. Even one runaway per week can add $5-15 to your monthly bill.

The Real Numbers: What This Saves Per Month

Let's walk through a worked example for a moderately active OpenClaw agent that runs daily.

Baseline setup (defaults):

  • Model: Claude Sonnet 4.6 ($3.00/MTok input)
  • Full MEMORY.md loaded every turn (15,000 tokens memory)
  • No compaction (150 turns/day, 20,000 tokens history per turn average)
  • No lightContext or runTimeoutSeconds

At 150 turns/day, average 25,000 input tokens per turn (including memory and history), that is 3,750,000 input tokens per day. At $3.00/MTok, that is $11.25/day, or $337.50/month.

Optimized setup:

  • Model: DeepSeek V3 for 80% of turns ($0.14/MTok), Claude Sonnet for 20% ($3.00/MTok) = blended rate of $0.71/MTok
  • LanceDB memory with topK: 5 (1,000 tokens memory per turn instead of 15,000)
  • Compaction at 8,000 tokens, reducing history to ~3,000 tokens average per turn
  • lightContext on subagents, runTimeoutSeconds on all spawns

With compaction and LanceDB, average input per turn drops to approximately 5,000 tokens (system prompt + memory + compacted history + current turn). At 150 turns/day, that is 750,000 input tokens. At the blended rate of $0.71/MTok, that is $0.53/day, or $15.90/month.

Total savings: $321.60/month — a 95% reduction.

Even if you prefer GPT-4o Mini ($0.15/MTok) over DeepSeek V3 for the cheap tier, the blended rate is similar. The math holds regardless of which budget model you choose.

For a lighter agent running 50 turns/day, the savings are smaller in absolute terms but the percentage remains roughly the same — expect $90-120/month savings on a baseline of ~$112/month.
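The worked example compresses into a few lines (all token sizes and the 80/20 split are the assumptions stated above; the $15.90 figure in the text rounds the daily cost to $0.53 before multiplying by 30, so the unrounded result lands near $16):

```javascript
// Monthly input-token cost: baseline defaults vs optimized setup.
const DAYS = 30;
const TURNS_PER_DAY = 150;

// Baseline: Claude Sonnet for everything, ~25,000 input tokens per turn.
const baseline = TURNS_PER_DAY * 25000 / 1e6 * 3.00 * DAYS;

// Optimized: ~5,000 input tokens per turn at the 80/20 blended rate.
const blended = 0.8 * 0.14 + 0.2 * 3.00; // ~$0.71/MTok
const optimized = TURNS_PER_DAY * 5000 / 1e6 * blended * DAYS;

console.log(baseline.toFixed(2));  // "337.50" per month
console.log(optimized.toFixed(2)); // ~"16.02" per month
console.log(((1 - optimized / baseline) * 100).toFixed(0) + "% saved");
```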

What Not to Cheap Out On

Cost optimization is valuable. Stupid cost optimization is self-defeating. Here is where you should keep the expensive model:

Complex multi-step reasoning. If your agent is debugging a production issue, analyzing financial data, or generating code that will run in production, use Claude Sonnet or GPT-4o. The cost of a wrong answer from a cheaper model — in time, errors, or debugging — far exceeds the token savings.

Security-sensitive operations. Do not route credential handling, authentication flows, or sensitive data processing through models whose providers you do not explicitly trust. Claude and GPT-4o have established security postures; newer models may not.

First-impression conversations. The first interaction a user has with your agent shapes their entire perception. If that response is mediocre because you routed it through a budget model, you save $0.02 and lose a user. Always use a top-tier model for onboarding and initial interactions.

Instruction-following for critical formats. If your agent must produce output in a specific JSON schema, structured format, or API-compatible shape, cheaper models are more likely to deviate from instructions. Test thoroughly before routing format-sensitive tasks to budget models.
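One cheap mitigation is a shape check with a premium-model fallback: validate the budget model's output against the required keys and only re-run on the expensive tier when it fails. This is a generic sketch; the key names are placeholders, not an OpenClaw API:

```javascript
// Accept a budget model's JSON reply only if it parses and contains
// every required key; otherwise escalate to the premium model.
function parseOrNull(text) {
  try { return JSON.parse(text); } catch { return null; }
}

function matchesSchema(obj, requiredKeys) {
  return obj !== null && requiredKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(obj, k)
  );
}

const required = ["ticket_id", "category"];      // placeholder schema
const cheapReply = '{"ticket_id": 42, "category": "billing"}';

console.log(matchesSchema(parseOrNull(cheapReply), required)); // true: accept
console.log(matchesSchema(parseOrNull("oops"), required));     // false: escalate
```

The guard itself costs nothing in tokens, so the premium model only runs on the small fraction of replies that actually break the format.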

Monitoring Your Costs with session_status

OpenClaw’s session_status command provides token usage and estimated cost data. Run it in your terminal:

openclaw session_status

# Output includes:
# - Total tokens used (input + output)
# - Estimated cost at current model prices
# - Session duration
# - Turn count

Check this weekly. A sudden spike in token usage often indicates a subagent that is looping, a compaction setting that is not working, or a model routing rule that was accidentally removed during a configuration update.

For automated monitoring, pipe session_status into a simple script that alerts you if daily costs exceed a threshold:

#!/bin/bash
# Simple cost alert -- run via cron daily
COST=$(openclaw session_status --json | jq '.estimatedCost // 0')  # default to 0 if the field is missing
if (( $(echo "$COST > 5.0" | bc -l) )); then
  curl -s -X POST "$ALERT_WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"OpenClaw cost alert: \$$COST today\"}"
fi

Set the alert threshold based on your optimized baseline. If you expect $0.50/day after optimization, a $5.00 day means something is wrong.

Sources

Model pricing sourced from official provider pages and OpenClaw documentation as of April 2026. Token estimates based on OpenAI’s tokenizer (cl100k_base) and Anthropic’s advertised tokenization rates. Compaction and LanceDB behavior verified against OpenClaw source configuration defaults.
