How much does it actually cost to run OpenClaw for a month?

Nobody publishes a straight answer to this because it depends on your model choices more than anything else. The difference between a well-configured setup and a poorly configured one is not 20 percent. It is an order of magnitude. This article uses real, current pricing to show you what each component actually costs and why local models change the math completely.

TL;DR

  • Model choice is the entire ballgame. Claude Sonnet costs $3/M input and $15/M output tokens. DeepSeek V3 costs $0.28/M input and $0.42/M output. Local Ollama models cost nothing per token.
  • Background processes are a hidden cost center. Heartbeat, memory extraction, and compaction all make model calls. On Sonnet, heartbeat alone can run $15 to $20/month without a single user conversation.
  • The right split: Local models for all background work, DeepSeek V3 for routine conversations, Kimi K2 for tasks that need near-Sonnet quality at a fraction of the price, Sonnet only when it genuinely cannot be avoided.
  • A well-configured setup: Under $10/month for active daily use. A poorly configured one: $100 or more for the same workload.

Throughout this article you will see indented blocks like the ones below. Each one is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal or edit any files manually.

Prices as of March 2026, verified from official provider documentation. All figures are per million tokens.

Prices are per million tokens. RAM figures apply to local models only.

Cloud API models (pay per token)

  • Claude Sonnet 4.6: $3.00 input / $15.00 output. Best for complex multi-tool tasks and long-context work with citations. Closed weights; API only; cannot be self-hosted.
  • Claude Haiku 4.5: $1.00 input / $5.00 output. Best for Anthropic ecosystem tasks where Sonnet is overkill. Closed weights; API only; cannot be self-hosted.
  • Kimi K2 (Moonshot API): $0.55 input ($0.15 cached) / $2.20 output. Best for sub-agents, research tasks, and near-Sonnet quality at lower cost; verbose on open-ended outputs. Open weights. Self-hosting requires ~375GB RAM+VRAM (Q2 quant) across multiple high-end GPUs; practical only at data center scale.
  • Kimi K2.5 (OpenRouter): $0.45 input / $2.20 output. Same role as K2, with a 262K context window and a slightly lower price via OpenRouter. Open weights; same self-host requirements as K2 (~375GB RAM+VRAM). Use the API unless you have serious infrastructure.
  • DeepSeek V3 (DeepSeek API): $0.28 input ($0.028 cached) / $0.42 output. Default model for most conversations, research, writing, and tool use; the best cost-to-quality ratio for daily use. Open weights (MIT); 671B MoE. Self-hosting requires ~380GB VRAM minimum (Q4 quant), multiple H100s recommended; practical only at data center scale.

Local models via Ollama (free per token, hardware required)

  • Llama 3.1 8B: ~6GB RAM. Best for heartbeat checks, status queries, simple yes/no decisions, and anything that doesn’t need the internet. CPU: yes; runs fine on a modern laptop or low-cost VPS. Slower than GPU but fully usable for background tasks.
  • Phi-4 14B: ~10GB RAM. Best for summaries, drafts, compaction, and sub-agents; better quality than 8B. For memory extraction (autoCapture), results vary: some operators find a stronger API model produces meaningfully better captures, so start here and upgrade if recall quality disappoints. CPU: yes, but slow; works with enough RAM, though responses take noticeably longer. Best on a machine with 16GB+ RAM. Acceptable for background tasks, not ideal for real-time conversation.
  • Qwen 2.5 Coder 7B: ~5GB RAM. Best for code generation, script writing, and debugging. CPU: yes; one of the more CPU-friendly options. Smaller than the chat models and purpose-built for code, so it tends to give focused answers without burning through tokens.
  • nomic-embed-text: ~300MB RAM. Memory embeddings for the LanceDB memory plugin; not a chat model, and it runs automatically in the background. CPU: yes, no issue; extremely lightweight, with no noticeable impact on any hardware.

Ollama RAM figures are approximate for Q4 quantized versions (the default when you run ollama pull). Full-precision weights require more. CPU inference works but is slower than GPU. Expect a few seconds per response for 8B models, longer for 14B. Prices as of March 2026. Check provider docs for current rates.

The ratio between Sonnet and DeepSeek V3 on input tokens is roughly 10 to 1. On output tokens it is about 35 to 1. Kimi K2 sits in the middle: better quality than DeepSeek V3 for complex tasks, about 5 to 6 times cheaper than Sonnet on input. The catch is output verbosity, covered below. The local models cost nothing per token regardless of how often they run, which is why routing background work to them has an outsized impact on the monthly total.
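Those ratios follow directly from the list prices. A quick check in Python, with prices hard-coded from the table above (verify against current provider rates before relying on them):

```python
# Per-million-token list prices from the table above (March 2026; verify before use).
PRICES = {
    "sonnet":   {"in": 3.00, "out": 15.00},
    "deepseek": {"in": 0.28, "out": 0.42},
    "kimi_k2":  {"in": 0.55, "out": 2.20},
}

sonnet, deepseek, k2 = PRICES["sonnet"], PRICES["deepseek"], PRICES["kimi_k2"]

print(f"Sonnet vs DeepSeek V3, input:  {sonnet['in'] / deepseek['in']:.1f}x")   # ~10.7x
print(f"Sonnet vs DeepSeek V3, output: {sonnet['out'] / deepseek['out']:.1f}x") # ~35.7x
print(f"Sonnet vs Kimi K2, input:      {sonnet['in'] / k2['in']:.1f}x")         # ~5.5x
```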

Ollama lets you run open-source models locally on your own hardware. If you are running OpenClaw on a VPS, Ollama runs on the same server. Every call that goes to a local model costs zero in API fees. The only cost is the VPS you are already paying for.

This matters most for the processes that run constantly in the background. Heartbeat, memory extraction, and compaction are not doing complex reasoning. They are doing structured tasks on short inputs: check if anything needs attention, extract a few facts from a conversation snippet, summarize some old turns. A capable local model handles all three reliably.

The practical split for most setups:

  • Local models (free): Heartbeat, memory extraction via autoCapture, compaction summarization, simple formatting tasks, status checks, file reads
  • DeepSeek V3 ($0.28/M input): Most conversations, routine research, writing tasks, anything that needs internet access or tool use but not frontier reasoning
  • Kimi K2 / K2.5 ($0.45 to $0.55/M input): Sub-agents and background tasks where you need near-Sonnet quality but want to avoid Sonnet pricing. A useful middle tier when DeepSeek V3 falls short and Sonnet feels like overkill.
  • Sonnet or equivalent ($3.00/M input): Complex multi-tool tasks, long-context work with citations, anything where a cheaper model has failed or you judge it will

The Kimi K2 verbosity problem

Kimi K2 and K2.5 benchmark near Claude Opus on reasoning and coding tasks while costing roughly a fifth of what Sonnet does on input. On paper that looks like the obvious default model. In practice there is a catch: these models are verbose. Where Sonnet might answer in 200 output tokens, K2 often uses 1,000 or more for the same task. Output tokens on K2 cost $2.20 per million. A task that looks cheap based on input token count can end up costing more than you expect once output is factored in.

This does not mean K2 is a bad choice. It means it is the right choice for the right tasks. Use it where thoroughness matters and token efficiency does not: sub-agent tasks, complex research passes, anything where a partial answer is worse than a long one. Avoid it for heartbeat, extraction, compaction, and any high-frequency background work where output length is unpredictable.

    If I switched my default model to Kimi K2, which of my current tasks would likely generate long output responses? Based on my usage patterns, would K2 end up cheaper or more expensive than my current setup?

    Read my OpenClaw config. Tell me: what model is set as default, what model handles heartbeat, what model handles memory extraction, and what model handles compaction. For each one that is set to a paid API model, flag it.

These estimates use real pricing and realistic token counts. Your actual numbers will vary based on session length and how many background processes are active.

Heartbeat

Heartbeat fires on a schedule to keep the session alive and check for pending tasks. A typical heartbeat call uses around 500 input tokens (system prompt plus a short check prompt) and produces around 50 output tokens.

At a 5-minute interval, that is roughly 8,600 heartbeat calls per month.

  • On Sonnet: around $15 to $20 per month, just for heartbeat
  • On DeepSeek V3: around $1 to $2 per month
  • On a local Ollama model: $0

Heartbeat is the fastest win in OpenClaw cost reduction. Route it to a local model and it disappears from your bill entirely.
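The heartbeat arithmetic can be reproduced in a few lines. A sketch using the assumptions above (500 input and 50 output tokens per call, one call every 5 minutes):

```python
CALLS_PER_MONTH = (60 // 5) * 24 * 30  # one call every 5 minutes = 8,640/month

def monthly_cost(calls, in_tokens, out_tokens, in_price, out_price):
    """Monthly API cost in dollars; prices are per million tokens."""
    return calls * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Prices per million tokens from the table above.
sonnet   = monthly_cost(CALLS_PER_MONTH, 500, 50, 3.00, 15.00)
deepseek = monthly_cost(CALLS_PER_MONTH, 500, 50, 0.28, 0.42)

print(f"Heartbeat on Sonnet:      ${sonnet:.2f}/month")    # ~$19.44
print(f"Heartbeat on DeepSeek V3: ${deepseek:.2f}/month")  # ~$1.39
print("Heartbeat on local Ollama: $0.00/month")
```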

Memory extraction (autoCapture)

If autoCapture is on, a memory extraction call fires after each conversation turn. It sends a snippet of recent conversation to a model and asks it to pull out facts and preferences. A typical extraction call uses around 1,500 to 2,000 input tokens and produces around 200 output tokens.

For a setup with 50 turns per day across all sessions:

  • On Sonnet: around $12 to $15 per month
  • On DeepSeek V3: around $1 per month
  • On a local Ollama model: $0

Extraction quality on a 14B local model like phi4 is good enough for most setups. Facts and preferences from short conversation snippets do not need a frontier model.
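The extraction figures follow from the same kind of arithmetic. A sketch assuming 50 calls per day at roughly 1,750 input and 200 output tokens each:

```python
# Memory extraction: 50 calls/day for 30 days.
CALLS = 50 * 30

def cost(in_price, out_price):
    """Monthly extraction cost in dollars; prices per million tokens."""
    return CALLS * (1750 * in_price + 200 * out_price) / 1_000_000

print(f"Sonnet:      ${cost(3.00, 15.00):.2f}/month")  # ~$12.38
print(f"DeepSeek V3: ${cost(0.28, 0.42):.2f}/month")   # ~$0.86
```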

Conversations

This is the part most operators think about, and it is typically not the most expensive part once background processes are handled. A typical active session turn uses around 3,000 input tokens (system prompt plus conversation history plus the user message) and around 500 output tokens.

At 20 turns per day for a month:

  • On Sonnet: around $10 to $12 per month
  • On Kimi K2: around $3 to $8 per month. The range is wide because K2 output verbosity varies by task type: constrained tasks (yes/no decisions, short summaries) land at the low end; open-ended research or writing tasks can hit the high end.
  • On DeepSeek V3: under $1 per month

If you are using Sonnet as your default model for all conversations, you are spending $10 to $12 per month on conversation tokens alone and another $25 to $35 on background processes. That is a $35 to $50 per month setup for what a well-routed setup handles for under $5.
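The conversation figures work out the same way. A sketch using the turn sizes above; the K2 verbosity multiplier is an illustrative assumption, not a measured value:

```python
def monthly_conv_cost(turns_per_day, in_tok, out_tok, in_price, out_price, days=30):
    """Monthly conversation cost in dollars; prices per million tokens."""
    turns = turns_per_day * days
    return turns * (in_tok * in_price + out_tok * out_price) / 1_000_000

# 20 turns/day, 3,000 input and 500 output tokens per turn (from the text above).
sonnet   = monthly_conv_cost(20, 3000, 500, 3.00, 15.00)
deepseek = monthly_conv_cost(20, 3000, 500, 0.28, 0.42)
# Kimi K2 with an assumed 2x to 4x output-verbosity multiplier.
k2_low   = monthly_conv_cost(20, 3000, 500 * 2, 0.55, 2.20)
k2_high  = monthly_conv_cost(20, 3000, 500 * 4, 0.55, 2.20)

print(f"Sonnet:      ${sonnet:.2f}/month")    # ~$9.90
print(f"DeepSeek V3: ${deepseek:.2f}/month")  # ~$0.63
print(f"Kimi K2:     ${k2_low:.2f} to ${k2_high:.2f}/month")
```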

Compaction

Compaction fires when the context window fills. Each compaction call summarizes old conversation content. It is roughly the same cost profile as a medium-length conversation turn. If your context window is small and you run long sessions, compaction fires frequently.

  • On Sonnet with frequent compaction: $5 to $15 per month
  • On a local model: $0

Cron jobs and automation

Each cron run opens a session, runs the task, and closes. A simple daily task (fetch some data, write a summary, send a notification) might use 5,000 to 10,000 tokens total. A complex research task could use 50,000 or more.

At one moderate daily cron job using 8,000 tokens:

  • On Sonnet: around $3 to $5 per month
  • On DeepSeek V3: under $0.30 per month

Infrastructure

A VPS capable of running OpenClaw and Ollama (with a 7B to 14B model) starts around $6 to $12 per month. This is a fixed cost regardless of usage. It is also what makes all local model routing possible, so it effectively pays for itself if you route even one background process away from a paid API.

Typical monthly totals by usage level:

  • Light (occasional use, no automation): $20 to $35/month poorly configured; $2 to $6/month well configured
  • Active (daily sessions, memory on): $50 to $90/month poorly configured; $6 to $15/month well configured
  • Heavy (overnight cron, multi-agent): $150 to $300+/month poorly configured; $15 to $40/month well configured

“Well configured” means: local models for heartbeat, extraction, and compaction; DeepSeek V3 as the default conversation model; Sonnet or equivalent as an on-demand escalation only when the task needs it.

    Based on my current config, estimate my approximate monthly API spend across all components: heartbeat, memory extraction, conversations, cron jobs, and compaction. Flag anything that could be routed to a cheaper or local model.

Routing background processes to local models and switching to a cheap default API model are the two changes that produce the largest spend reductions. The complete playbook for doing both, including fallback chain setup and per-task model routing, is in Cheap Claw.

Cheap Claw

Every cost lever in OpenClaw, ranked by impact. Drop it into your agent and it reads the guide and makes the changes.

Get it for $17 →


Common questions

Is OpenClaw itself free?

OpenClaw is open source and free to run. You pay for the API models it calls and for any server you run it on. There is no OpenClaw subscription fee.

Can I run OpenClaw for free?

Yes, if you run entirely on local Ollama models. Local models have no per-token cost. The tradeoff is quality on complex tasks. For background processes and routine work, local models handle things well. For complex reasoning, long-context analysis, or multi-tool tasks, most operators use a paid API model selectively.

Does Ollama require different hardware than what OpenClaw runs on?

Ollama runs on the same server as OpenClaw. A VPS with 8GB of RAM can run a 7B model comfortably. 16GB gets you into 14B territory, which is noticeably better for extraction and summarization tasks. If you are already running OpenClaw on a decent VPS, you likely have enough headroom for at least a 7B local model.

How do I set a hard cap so I never get a surprise bill?

Set a monthly spend limit in your API provider’s dashboard first. Then configure OpenClaw’s spend limit setting to stop agent activity when a daily threshold is hit. Both layers together mean a runaway cron job or misconfigured background process cannot generate an unexpected bill while you are not looking.

When should I use Kimi K2 instead of DeepSeek V3?

When you need near-Sonnet reasoning quality but cannot justify Sonnet pricing. Kimi K2 benchmarks strongly on coding, analysis, and multi-step tasks. The tradeoff is output verbosity: K2 generates more tokens per response than most models, so tasks with open-ended outputs cost more than the input price suggests. Use K2 for sub-agents and complex background tasks, not for high-frequency work like heartbeat or extraction where output length is unpredictable.

Are these prices stable?

Model pricing changes. The figures in this article reflect official provider documentation and aggregator data as of March 2026. Check the Anthropic, DeepSeek, and Moonshot pricing pages directly before making decisions based on specific numbers. The relative differences between model tiers tend to be more stable than the absolute prices.

Tracking spend over time

A snapshot cost estimate is useful when setting up. Tracking actual spend over time tells you whether costs are stable, trending up, or drifting due to configuration changes.

    Set up a weekly cost tracking cron job. Every Sunday evening, read my current API usage stats from my provider dashboard (or check the monthly invoice summary). Write the spend-to-date for the current month to workspace/spend-tracking/YYYY-MM.md. After three months of data, I want to see whether my monthly spend is stable, increasing, or decreasing.

What a cost spike usually means

Sudden cost increases almost always come from one of three places:

  • A new cron job that fires more often than expected. Check the interval and confirm the job is actually running on a local model, not an API model.
  • A session that ran much longer than usual. Long sessions accumulate context; compaction events on large contexts can cost 10-20x a normal session compaction.
  • A model routing failure. If your primary model is unavailable, the fallback chain may route to a more expensive model. Check your fallback configuration and whether any fallback events occurred.
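A weekly spend log, like the one the tracking cron above writes, makes spikes easy to flag automatically. A minimal sketch; the one-line-per-week "YYYY-MM-DD amount" format is an assumption, so adjust the parsing to whatever your files actually contain:

```python
# Flag week-over-week spend spikes from a simple spend log.
SPIKE_THRESHOLD = 1.5  # flag any week costing 1.5x or more of the week before

def find_spikes(lines, threshold=SPIKE_THRESHOLD):
    """Return (date, ratio) pairs for weeks whose spend jumped past the threshold."""
    entries = [(date, float(amount)) for date, amount in (line.split() for line in lines)]
    spikes = []
    for (_, prev), (date, cur) in zip(entries, entries[1:]):
        if prev > 0 and cur / prev >= threshold:
            spikes.append((date, cur / prev))
    return spikes

# Hypothetical weekly entries: date and dollars spent that week.
log = ["2026-03-01 4.20", "2026-03-08 4.55", "2026-03-15 9.80", "2026-03-22 5.10"]
print(find_spikes(log))  # the 2026-03-15 week jumped ~2.2x
```

A flagged week tells you when to look; the three causes above tell you where.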

    My API spend was higher than expected this month. Help me diagnose the cause. Check: which cron jobs ran most frequently, whether any sessions were unusually long, and whether any fallback model routing occurred. Show me the three most likely sources of the spike.

Model pricing reference (March 2026)

Prices change. These are current as of the article date and should be verified against provider pricing pages before making budget decisions.

Model: input / output per 1M tokens (cache discount on input)

  • DeepSeek V3: $0.28 / $0.42 (~90%; $0.028 cached input)
  • Claude Sonnet 4.6: $3.00 / $15.00 (~90%)
  • Claude Opus 4: $15.00 / $75.00 (~90%)
  • Kimi K2: $0.55 / $2.20 (varies; $0.15 cached input)
  • phi4:latest (local): $0 / $0 (N/A)
  • llama3.1:8b (local): $0 / $0 (N/A)

    Given my current model configuration, estimate my monthly cost using current pricing. For each task category, show me which model is handling it and what the monthly cost would be if I switched to the cheapest model that can handle that task category reliably.

Infrastructure costs worth knowing

The model API spend is not the only cost. These infrastructure line items affect the total.

VPS hosting

A minimal VPS that runs OpenClaw with Ollama needs 16GB RAM for phi4:latest and 4+ vCPUs for responsive inference. At common providers (Hetzner, DigitalOcean, Vultr): $15-25/month. An 8GB VPS running a 7B model instead lands in the cheaper $6 to $12 range mentioned earlier. The jump to 32GB RAM (for running multiple large local models simultaneously) costs $35-55/month.

Storage

OpenClaw workspace, memory database, and session archives are typically under 5GB. Standard VPS storage is usually sufficient. If you are running large research archives or storing PDFs, you may need supplemental storage: $1-5/month for 100GB at most providers.

Domain and SSL

If running a public-facing setup (not recommended without proper authentication): domain registration ~$10-15/year, SSL via Let’s Encrypt is free.

Total infrastructure per month

For a solo operator on a single VPS: $15-25 for the server plus whatever you spend on API models. A well-optimized setup with local models as the default can run the full stack for $20-30/month total.

Cost breakdown by usage pattern

The monthly cost varies dramatically based on how you use OpenClaw. These three usage profiles cover most operators.

Profile A: Automation-first (minimal interactive use)

Running 10-15 cron jobs daily on local models, with occasional interactive sessions for planning and review. API calls go primarily to compaction and complex reasoning tasks.

  • Cron jobs (Ollama): $0/month
  • Compaction (API): ~$3-8/month depending on session frequency
  • Interactive sessions (API): ~$5-15/month at moderate use
  • Memory extraction (DeepSeek): ~$1-3/month
  • Total: $9-26/month

Profile B: Interactive-heavy (daily working sessions)

Using OpenClaw as a primary work assistant for 2-4 hours daily. Research, writing, analysis. Most work on API models.

  • Interactive sessions (DeepSeek V3): ~$15-35/month
  • Cron and automation (local): $0/month
  • Compaction: ~$5-12/month
  • Memory extraction: ~$2-5/month
  • Total: $22-52/month

Profile C: Agent team (multiple specialized agents)

Running 3-5 specialized agents with dedicated crons, heavy tool use, and frequent API calls.

  • Agent sessions (API): ~$30-60/month
  • Automation (mixed local/API): ~$10-20/month
  • Compaction across all sessions: ~$10-20/month
  • Infrastructure (VPS): ~$10-20/month
  • Total: $60-120/month
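The profile totals above are just the sums of their line-item ranges, which is easy to sanity-check:

```python
# Line-item (low, high) monthly dollar ranges from the three profiles above.
PROFILES = {
    "A: automation-first":  [(0, 0), (3, 8), (5, 15), (1, 3)],
    "B: interactive-heavy": [(15, 35), (0, 0), (5, 12), (2, 5)],
    "C: agent team":        [(30, 60), (10, 20), (10, 20), (10, 20)],
}

for name, items in PROFILES.items():
    low = sum(lo for lo, _ in items)
    high = sum(hi for _, hi in items)
    print(f"Profile {name}: ${low}-{high}/month")
```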

    Based on how I currently use OpenClaw, which profile most closely matches my usage? Estimate my actual monthly cost based on my real session frequency, cron job count, and model configuration.

Cost reduction quick wins in order of impact

  1. Switch cron jobs to Ollama local models. If you have any background tasks running on API models, switching to phi4:latest costs nothing and typically produces the same results for structured tasks.
  2. Enable prompt caching. Set caching to “short” in your config. For repeated system prompts, this cuts token cost by 60-80% on cache hits.
  3. Tune compaction thresholds. Compaction triggers an API call. Raising the threshold means fewer compaction events per session. Each compaction on a long session costs meaningful tokens.
  4. Route interactive work to DeepSeek V3 as default. DeepSeek V3 costs about 10x less than Claude Sonnet for equivalent quality on most tasks. The switch from Sonnet to V3 as default is the single highest-impact cost reduction available.
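To see what the caching claim means in dollars, here is an illustrative calculation using the DeepSeek V3 cached-input rate from the pricing table; the prompt size, call count, and hit rate are assumptions:

```python
# Illustrative cache savings on DeepSeek V3 (rates from the pricing table above).
FULL_RATE, CACHED_RATE = 0.28, 0.028  # dollars per 1M input tokens

def prompt_cost(system_tokens, calls, hit_rate):
    """Input cost of a repeated system prompt at a given cache hit rate."""
    hits, misses = calls * hit_rate, calls * (1 - hit_rate)
    return system_tokens * (hits * CACHED_RATE + misses * FULL_RATE) / 1_000_000

# Assumed: a 2,000-token system prompt repeated across 1,000 calls.
no_cache = prompt_cost(2000, 1000, hit_rate=0.0)
cached   = prompt_cost(2000, 1000, hit_rate=0.8)
print(f"No caching:   ${no_cache:.2f}")  # ~$0.56
print(f"80% hit rate: ${cached:.2f}")    # ~$0.16, roughly 72% less
```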

    Check my current config for each of these four quick wins: local model routing for crons, prompt caching setting, compaction threshold, and default interactive model. Tell me which ones I have not implemented yet and estimate the monthly savings for each.

Cost comparison: self-hosted versus managed

Some operators consider managed OpenClaw hosting rather than self-hosting. The tradeoffs are worth understanding before committing either direction.

Self-hosted (VPS): $15-25/month infrastructure, plus API costs. Full control over config, models, and data. Requires initial setup time of two to four hours and occasional maintenance. Local models are available for zero marginal cost after setup. For operators who run local models as the default for background and automation work, the API spend is dramatically lower than on a managed setup where local models may be unavailable or usage-throttled. This is where most of the cost advantage comes from.

Managed hosting: Higher per-month cost (typically $30-80/month depending on provider), zero setup time, automatic updates handled by the provider. Local models may not be available or may have usage limits built into the plan tier. API costs either pass through at standard rates or are bundled into the monthly fee. The time savings is real but the flexibility tradeoffs are significant for power users who want full control over their model routing and tool permissions.

For operators who intend to run local models as the default for most tasks, self-hosting pays for itself relatively quickly over the course of a few months. For operators who want a completely hands-off setup and plan to use API models exclusively, managed hosting may be worth the premium for the time it saves on maintenance and updates.
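A rough version of that 12-month comparison, with illustrative numbers drawn from the ranges above; substitute your own figures:

```python
# 12-month self-hosted vs. managed comparison under stated assumptions.
MONTHS = 12

def yearly_total(infra_per_month, api_per_month):
    return MONTHS * (infra_per_month + api_per_month)

# Assumptions (illustrative, from the ranges above):
self_hosted = yearly_total(infra_per_month=20, api_per_month=8)   # local-first routing
managed     = yearly_total(infra_per_month=50, api_per_month=25)  # API-heavy routing

print(f"Self-hosted: ${self_hosted}/year")
print(f"Managed:     ${managed}/year")
```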

    Based on my current usage and model configuration, calculate the 12-month total cost for self-hosting versus managed hosting. Include API costs for both scenarios and factor in the time I spend on maintenance at a reasonable hourly rate.

Keep Reading:


OpenClaw is making API calls I never asked for. Here’s what’s causing them.

The four background processes that run without any user action and how to route them to free local models.

Read →

Setting spend limits so your agent stops at night

Hard caps so a runaway task cannot wreck your budget while you are not watching.

Read →

I woke up to a $300 OpenClaw bill and had no idea what caused it

How to audit exactly where your spend is going before you start cutting.

Read →