You checked your API bill and it is noticeably higher than expected. You did not run any major tasks. You were not even at your computer for most of the day. Something is making API calls in the background, without your knowledge and without appearing in your session history. This article covers how to identify what is running, why it is generating more calls than you intended, and how to stop or reduce it without breaking the automation you actually need.
TL;DR
Unexpected background API calls in OpenClaw almost always trace back to one of four sources: cron jobs you set up and forgot about, memory reflection that fires automatically after every session, heartbeat tasks running more frequently than you realized, or a skill or plugin that makes its own background calls. The fastest diagnostic is asking your agent to list all active cron jobs and which ones ran in the last 24 hours; that alone identifies the source most of the time. The second check is your memory plugin configuration. Between the two, you will usually find the culprit in under five minutes.
Every indented block in this article is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal, edit any files, or navigate any filesystem.
The four most common sources of unexpected API calls
Before running any diagnostic, it helps to know where to look. The vast majority of unexpected API spend in OpenClaw traces back to one of these four sources.
Cron jobs running more often than you remember
A cron job set to run every 15 minutes makes 96 API calls per day. Most operators set up cron jobs when they first configure their instance and stop thinking about them. Weeks later, the bill is higher than expected and nobody remembers that the morning brief is actually running four times an hour because someone fat-fingered the interval. Check your cron jobs first; they are the most likely cause.
List all my active cron jobs. For each one: what does it do, how often does it run, when did it last run, and roughly how many API calls does a single run make? Calculate the daily call volume for each job and give me a total.
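The arithmetic behind these estimates is simple enough to sanity-check yourself. A minimal sketch in Python (the function name and job list are illustrative, not part of OpenClaw):

```python
# Estimate daily API call volume for interval-based cron jobs.
# The job list below is illustrative; substitute your own audit output.
def daily_calls(interval_minutes: int, calls_per_run: int = 1) -> int:
    """API calls per day for a job that fires every `interval_minutes`."""
    return (24 * 60 // interval_minutes) * calls_per_run

jobs = {
    "morning-brief (every 15 min, misconfigured)": daily_calls(15),
    "heartbeat (every 5 min)": daily_calls(5),
    "queue-processor (every 30 min)": daily_calls(30),
}
for name, calls in jobs.items():
    print(f"{name}: {calls} calls/day")
print("total:", sum(jobs.values()))
```

Running this reproduces the figures used throughout this article: 96 calls per day at a 15-minute interval, 288 at 5 minutes, 48 at 30 minutes.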
Memory reflection after every session
If your memory plugin is set to reflect after every session, it is making additional API calls at the end of each conversation to extract and store memories. On a well-configured instance with a local model handling reflection, this is free. On an instance where reflection uses an API model, it adds a call to every single session end. With an active setup running multiple sessions per day, those reflection calls add up fast.
What is my memory plugin configuration? Is autoCapture or reflection enabled? What model does it use for extraction? Is that model a local model or an API model? How many times per day is reflection typically running based on my session activity?
Heartbeat tasks checking in too frequently
A heartbeat task that runs every 5 minutes and uses an API model for even a simple status check makes 288 API calls per day before you do any actual work. Check what model your heartbeat is using and how frequently it fires.
What is my heartbeat configuration? How often does it run and what model does it use? If it is using an API model for a simple heartbeat check, switch it to a local model (ollama/llama3.1:8b) instead. Show me the change before applying it.
Plugin background calls
Some plugins make their own API calls independently of the main agent. A plugin that syncs data, polls a service, or runs background processing contributes calls that never appear in your interactive session history. Check which plugins are installed and whether any have background activity configured.
List all installed and enabled plugins. For each one: does it make any background API calls? Does it have its own scheduling or polling configuration? Is there any documentation or config that shows what it does between interactive sessions?
Running the full API call audit
The fastest way to identify unexpected API calls is a structured audit that checks all four sources in order. This takes about five minutes and usually identifies the cause immediately.
Run a full API usage audit for me. Check: (1) all active cron jobs with their schedules and models, (2) memory plugin settings including extraction model and frequency, (3) heartbeat configuration and model, (4) all installed plugins and whether any have background activity. For each item that is using an API model for a routine task, flag it and show me what it would cost if I switched to a local model instead. Give me a total estimated daily API call count from background activity alone.
The output of that audit gives you a prioritized list of changes. Start with the highest-volume item. A cron job that runs 96 times per day is more impactful to fix than a reflection pass that runs twice.
Auditing your cron jobs in detail
Cron jobs are the most common source of unexpected API spend. An honest cron job audit covers four questions for each job: what model it uses, how many tokens a typical run consumes, how frequently it runs, and whether it is still serving a purpose.
For each of my cron jobs, tell me: the model it uses (from the job payload or global config default), the approximate number of input and output tokens per run based on the job description and typical outputs, and the estimated daily cost at current model pricing. Sort by estimated daily cost, highest first.
The goal is to find jobs where an expensive model is doing a cheap task. A job that reads a file and sends a one-sentence Telegram notification does not need Claude Sonnet. It needs llama3.1:8b or phi4, either of which costs nothing to run locally.
Which of my cron jobs are using an API model when a local model would be sufficient? For each one: what does the job actually do, is there any reasoning or complex instruction-following required, or is it a simple read-and-send task that a local model handles just as well? Show me the proposed model change for each.
Duplicate cron jobs
A duplicate cron job doubles the API calls for that task with no benefit. Duplicates are created when you ask your agent to set up a job, it partially succeeds, you ask again, and both versions end up enabled. Check for jobs with the same or nearly identical descriptions and schedules.
Are there any duplicate or near-duplicate cron jobs in my config? Look for jobs with the same schedule, the same task description, or the same output destination. If you find duplicates, show me both entries so I can confirm which one to keep before disabling the other.
Jobs set to run far more often than needed
A morning brief that runs every 15 minutes is not a morning brief. A queue processor set to run every minute for a queue that has new tasks once per hour is overkill. Review each job’s interval against what it actually needs.
For each of my cron jobs, is the current run frequency appropriate for the task? A morning brief should run once per day. A queue processor should run at an interval that matches how often new tasks are added. A memory cleanup should run once per week. Flag any jobs whose run frequency is much higher than the task actually needs.
Understanding memory reflection costs
Memory reflection is one of the most overlooked sources of API spend in OpenClaw. When reflection is enabled, the memory plugin processes recent conversation content to extract facts, preferences, and events worth storing. This extraction uses a language model, and if that model is an API model, every session end triggers a paid API call.
The fix is straightforward: use a local model for extraction. A well-configured local model (phi4 or llama3.1:8b) handles memory extraction competently for most use cases. The extraction prompt is not complex reasoning; it is structured information extraction from recent conversation text. Local models do this reliably.
What model is my memory plugin using for extraction? If it is using an API model, what would I need to change in my config to switch it to ollama/phi4:latest instead? Show me the exact config change before applying it.
Reflection frequency
Beyond the model used, the frequency of reflection matters. If extractMinMessages is set to 1, reflection runs after every single message exchange. For most setups, reflecting after every 5-10 messages is sufficient to capture what matters without running extraction on trivial exchanges.
What is my memory plugin’s extractMinMessages setting? If it is set to 1, what would happen if I increased it to 5? Would I lose any important memory captures, or would the same information still be captured with less frequent extraction? Recommend a value for my usage pattern.
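To see why this setting matters, note that the extraction-call count per session is roughly the message count divided by the threshold. A rough sketch (the function name is illustrative, and the real plugin's batching may differ):

```python
# Rough estimate of reflection (extraction) calls per session.
# Assumes one extraction pass per `extract_min_messages` exchanges;
# the actual plugin's batching behavior may differ.
def reflection_calls(messages_per_session: int, extract_min_messages: int) -> int:
    return messages_per_session // extract_min_messages

# A 20-message session under different thresholds:
print(reflection_calls(20, 1))  # every message: 20 calls
print(reflection_calls(20, 5))  # every 5 messages: 4 calls
```

Raising the threshold from 1 to 5 cuts extraction calls by roughly 80% for a typical session, which is why it is usually the first knob to turn.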
Comparing API costs across providers
The same task routed to different API providers can vary by 10x or more in cost. Background tasks that run dozens of times per day are where this difference compounds most. A 10x cost difference on a task that runs 100 times per day is significant on a monthly bill.
What is my current default model for cron jobs and background tasks? What are the three cheapest API model options available in my config that could handle routine background tasks? Compare their per-token costs and flag any tasks currently on an expensive model that could be moved to a cheaper one without quality loss.
DeepSeek as the default for background tasks
As of March 2026, DeepSeek Chat (deepseek/deepseek-chat) is substantially cheaper than Claude Sonnet or GPT-4o for the same task. For any background job that does not require Claude’s specific capabilities, DeepSeek is the cost-effective default. Routine tasks like queue processing, data formatting, file reads with simple summaries, and Telegram notifications are all well within DeepSeek’s capability at a fraction of the cost.
Review my cron job and background task configuration. Which tasks are currently using Claude Sonnet or GPT-4o? For each one: does the task actually require Claude-level reasoning, or is it a routine task that DeepSeek Chat or a local model could handle? Show me a recommended model assignment for each background task.
Stopping specific unexpected API calls
Once you have identified which job or plugin is making the unexpected calls, the fix depends on the source.
To stop a cron job temporarily
Disable it rather than deleting it. Disabling preserves the config so you can re-enable it later without recreating it from scratch. Verify the job actually stops by checking the run history after one full scheduling interval has passed.
Disable the cron job [name]. Show me the config change before applying it. After disabling, confirm the job is in disabled state and will not run at its next scheduled time.
To change a cron job’s model
Add a model override to the job payload so it uses a specific cheaper model instead of the global default. This change takes effect on the next run without requiring a gateway restart or new session.
Update my cron job [name] to use ollama/phi4:latest instead of its current model. Add the model override directly to the job payload so this job specifically uses the local model while other jobs continue using the default. Show me the change before applying it.
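To make the change concrete, here is a hypothetical before/after of a job payload, expressed as Python dicts for illustration. The field names (model, everyMs) mirror ones mentioned in this article, but the exact schema depends on your OpenClaw version; let your agent make the change rather than hand-editing.

```python
# Hypothetical cron job payload before and after a per-job model override.
# Field names are illustrative; check your actual job store schema.
job_before = {
    "name": "queue-processor",
    "everyMs": 30 * 60 * 1000,  # every 30 minutes
    # no "model" key: this job falls back to the global default (an API model)
}
job_after = {
    **job_before,
    "model": "ollama/phi4:latest",  # local model override for this job only
}
print(job_after["model"])
```

The key design point: the override lives on the job, not in global config, so other jobs keep using the default model unchanged.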
To reduce a cron job’s frequency
Change the schedule expression or interval. For an every-type job, increase the everyMs value. For a cron-type job, update the expression. After changing the frequency, verify the next scheduled run time is what you expect.
Change my cron job [name] from running every [current interval] to running every [new interval]. Show me the updated schedule and confirm when the next run will be after the change.
Monitoring API usage going forward
Running this audit once solves the immediate problem. The question is how to catch unexpected spend earlier next time.
Create a weekly API cost check task. Every Sunday at 8pm America/New_York, read my cron job list and calculate estimated weekly API call volume from background tasks only. Send me a Telegram message with: total estimated calls this week, top 3 jobs by call volume, any jobs that changed frequency or model since last week. I want to catch unexpected increases before they show up on my monthly bill.
A weekly background-task audit is particularly valuable after any config change, new plugin install, or new cron job creation. Those are the moments when unexpected background activity is most likely to appear.
Understanding what each background task actually costs
Most operators think about API costs in terms of the work they do interactively. Background tasks are invisible in that mental model, which is exactly why they accumulate unnoticed. The math is straightforward once you run it.
A typical cron job that reads a workspace file and sends a brief summary to Telegram might consume 2,000 input tokens and 200 output tokens per run. At DeepSeek Chat pricing (roughly $0.00027 per 1,000 input tokens, $0.0011 per 1,000 output tokens as of March 2026), that is about $0.0008 per run. Running four times per hour, that is $0.077 per day or $2.30 per month. Not alarming in isolation. Multiply by five similar cron jobs and you are at $11.50 per month in background task costs before any interactive use.
The same job routed to Claude Sonnet costs approximately 50x more per token. The same five cron jobs would cost around $575 per month. That is the difference between using a capable API model for routine background tasks versus a cost-appropriate one.
For my three highest-frequency cron jobs, estimate the monthly cost at my current model versus the monthly cost if I switched to DeepSeek Chat. Use realistic token counts based on the job descriptions. Show me the comparison as a table with current cost and projected cost after the switch.
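The per-run arithmetic from above is easy to adapt as a sketch. The prices are the March 2026 figures quoted earlier in this article; treat them as assumptions and verify against current provider pricing.

```python
# Monthly cost of a background job at given token counts and pricing.
# Prices are per 1,000 tokens, taken from the figures quoted above;
# verify against current provider pricing before relying on them.
def monthly_cost(runs_per_day, in_tokens, out_tokens,
                 in_price, out_price, days=30):
    per_run = (in_tokens / 1000) * in_price + (out_tokens / 1000) * out_price
    return per_run * runs_per_day * days

deepseek = monthly_cost(96, 2000, 200, 0.00027, 0.0011)
print(f"DeepSeek, one job: ${deepseek:.2f}/month")
print(f"DeepSeek, five jobs: ${5 * deepseek:.2f}/month")
print(f"Sonnet (~50x per token), five jobs: ${5 * deepseek * 50:.2f}/month")
```

This yields about $2.19 per job per month unrounded; rounding the per-run cost up to $0.0008, as the article does, gives the quoted $2.30, $11.50, and roughly $575 figures.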
The compaction multiplier
When a session runs long enough to trigger compaction, the compaction model processes the full conversation history to generate a summary. If your compaction model is set to an expensive API model, a single overnight session that compacts twice costs as much as two additional full-length sessions. Check your compaction model setting and consider whether it needs to be as capable as your primary model.
What model is configured for LCM compaction? How often does compaction typically trigger in my sessions? If the compaction model is the same as my primary model, would switching to a cheaper model for compaction alone reduce my monthly spend, or does compaction happen rarely enough that it is not worth optimizing?
Embedding model costs
Memory systems that use vector embeddings make an API call every time a memory is written, searched, or recalled. If your embedding model is a paid API service rather than a local model, this is a hidden per-operation cost that adds up silently.
What embedding model is my memory system using? Is it a local model like nomic-embed-text via Ollama, or is it calling an external embedding API? How many embedding calls are made per day based on my memory read and write frequency? If it is an external API, what would it cost to switch to nomic-embed-text locally?
Nomic-embed-text running locally via Ollama produces embeddings at zero API cost with quality comparable to commercial embedding APIs for most use cases. If your memory plugin is calling an external API for embeddings, switching to a local embedding model is one of the highest-return cost optimizations available.
Hidden costs in fallback chains
OpenClaw supports fallback model chains: if the primary model fails or is rate-limited, the request automatically retries with the next model in the chain. This is a useful reliability feature, but it can produce unexpected costs if the fallback chain includes expensive models that get triggered more often than you realize.
What is my current model fallback chain? How often do requests fall through to a fallback model versus completing on the primary? If fallbacks are triggering frequently, what is the cause and is the fallback model more expensive than the primary?
A common pattern that increases costs: the primary model is a cheap option that occasionally rate-limits during peak hours, and the fallback is a more expensive model that handles the overflow. During busy periods, a significant fraction of requests end up on the expensive fallback without the operator realizing the rate-limiting is happening at all.
The per-session baseline cost
Every session in OpenClaw starts with a fixed baseline cost: all workspace files that auto-load, all tool definitions, the system prompt. This baseline is paid as input tokens on the first turn of every session. On a large, well-configured instance, this baseline can be 20,000-50,000 tokens per session start.
If you are starting many sessions per day (each time you open OpenClaw and start a new conversation), the baseline accumulates. Five sessions per day on an instance with a 40,000-token baseline costs 200,000 input tokens per day in session starts alone, before any actual work.
How many tokens does a new session start cost me in baseline input tokens? Include: all workspace files that auto-load, tool definitions, and the system prompt. What is the approximate cost of starting one new session at my current model pricing? How many new sessions do I typically start per day?
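The baseline overhead from the example above is a two-line multiplication. A sketch with the article's placeholder numbers (substitute your own baseline and session count):

```python
# Daily input-token overhead from session starts alone.
# Numbers match the worked example above; replace with your own audit output.
baseline_tokens = 40_000   # auto-loaded workspace files + tool defs + system prompt
sessions_per_day = 5
daily_overhead = baseline_tokens * sessions_per_day
print(daily_overhead)  # 200000 input tokens/day before any work

# At an assumed input price of $0.00027 per 1k tokens (DeepSeek figure above):
print(f"${daily_overhead / 1000 * 0.00027:.3f}/day")
```

On a cheap model this overhead is pennies; on an expensive primary model the same multiplication can dominate a monthly bill, which is why trimming the baseline matters most for API-model users.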
Reducing session start costs
The two most effective ways to reduce per-session baseline costs are trimming workspace files that load automatically and reducing the number of enabled plugins. A workspace file that has not been relevant in months is still loading on every session start and consuming tokens. An enabled plugin that you are not actively using is contributing tool definitions to every session’s token count.
Review all workspace files that load automatically at session start. For each one: how large is it in tokens, is it still actively relevant to my current work, and is there any content in it that could be archived or removed without affecting functionality? Flag any files that could be trimmed or removed to reduce my per-session baseline cost.
Building a cost-awareness habit
The operators who avoid surprise API bills are not the ones who monitor costs obsessively. They are the ones who run a quick audit after any change that touches cron jobs, plugins, or memory configuration. Change something, check the cost implications immediately rather than waiting for the next bill.
The two-minute check
After any of the following, run the API cost audit: new cron job created, existing cron job frequency changed, new plugin installed, memory plugin config changed, model routing config changed. Two minutes of checking after each change catches cost surprises before they accumulate over a full billing cycle.
I just made a config change. Do a quick cost impact check: what changed, does it affect any background API calls, and is the estimated cost impact positive or negative? Give me a one-paragraph assessment I can read in 30 seconds.
The three routing rules that cover 90% of background tasks
Rather than auditing model assignments case by case, three routing rules handle the most common background task types correctly:
- Heartbeats, file reads, status checks, simple notifications: local model (llama3.1:8b). Free. No API cost. These tasks require no reasoning, just reading and reporting.
- Queue processing, summaries, drafts, memory cleanup: phi4 locally or DeepSeek Chat. Very cheap. These tasks need more capability than llama3.1:8b but not flagship-model capability.
- Complex reasoning, multi-step planning, long-context synthesis: DeepSeek Chat or Claude Sonnet. Reserve expensive models for tasks that actually need them. A queue check is not one of those tasks.
Apply the three routing rules to all my current cron jobs and background tasks. Categorize each one (heartbeat/status, routine processing, or complex reasoning) and assign the appropriate model. Show me the full proposed model assignment before I decide whether to apply it.
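The three rules above amount to a small routing table. A sketch (the category names and model strings are the ones used in this article; adapt them to whatever your config actually exposes):

```python
# Route a background task category to a cost-appropriate model,
# following the three rules above. Model IDs mirror the article's
# examples; substitute your own config's model names.
ROUTES = {
    "heartbeat": "ollama/llama3.1:8b",      # status checks, simple notifications
    "routine": "ollama/phi4:latest",        # queue processing, summaries, cleanup
    "reasoning": "deepseek/deepseek-chat",  # multi-step planning, synthesis
}

def route(category: str) -> str:
    """Return the model for a task category; fail loudly on unknown input."""
    try:
        return ROUTES[category]
    except KeyError:
        raise ValueError(f"unknown task category: {category!r}")

print(route("heartbeat"))  # ollama/llama3.1:8b
```

Failing loudly on an unknown category is deliberate: a silent fallback to an expensive default is exactly the kind of hidden cost this article is about.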
A worked example: finding and fixing unexpected spend in 10 minutes
Here is a concrete walkthrough of how the audit plays out on a typical instance. The goal is to show what the conversation looks like in practice so you know what to expect when you run it yourself.
Step 1: Pull the cron job list. You ask your agent to list all active cron jobs with schedules and models. The output shows six cron jobs. Two were set up intentionally and are running as expected. Three were set up during initial configuration and then forgotten. One is a duplicate of another job that was created when the original setup partially failed and you ran the setup command again.
Step 2: Calculate the duplicate cost. The duplicate job runs every 30 minutes and uses DeepSeek Chat. Not expensive per run, but 48 times per day doubles the call volume for that specific task. Disabling the duplicate cuts those calls by 50% immediately with zero change to functionality.
Step 3: Address the forgotten jobs. Two of the three forgotten jobs are no longer relevant: one was set up for a specific project that finished two months ago, and one monitors a URL you stopped using. Disabling both removes 30 or more calls per day with zero impact on anything currently in use. The third forgotten job is actually useful but running at a higher frequency than needed. Reducing its frequency from every 15 minutes to every 2 hours cuts its daily call volume by 87%.
Step 4: Check the memory reflection model. The memory plugin is using DeepSeek Chat for extraction with extractMinMessages set to 2. On a session with 20 messages, that is roughly 10 extraction calls per session. Switching to phi4 locally costs nothing and produces comparable extraction quality for the types of facts being stored in this setup.
Total time: about 10 minutes of conversation with the agent. Total impact: background API call volume reduced by roughly 65%, with zero loss of functionality for any task that is actually still needed.
Walk me through the full background API audit right now. List all cron jobs, check the memory plugin config, check the heartbeat setup, and check plugin background activity. After the audit, give me a prioritized list of changes sorted by estimated cost reduction. I will review and approve each change before you apply it.
Common questions
How do I know which provider is being billed for a specific API call?
Check the model used for the task. Each model maps to a specific provider: deepseek-chat is billed by DeepSeek (or OpenRouter if you are routing through them), claude-sonnet is billed by Anthropic, gpt-4o is billed by OpenAI. Your OpenClaw config shows the provider for each model under models.providers. If a task is using the default model, find what that default resolves to in your config and trace it to the provider.
My API bill is high but I only see a few cron jobs. Where else could the calls be coming from?
Check four places that are easy to miss: (1) the compaction model: if your LCM compaction is set to use an API model, every compaction event makes a paid call; (2) the embedding model for memory: if it is using an API embedding service instead of nomic-embed-text locally, every memory write and recall makes a paid call; (3) the fallback model chain: if your primary model is hitting rate limits, calls may be falling through to a more expensive fallback; (4) any plugin that registers its own tool and makes calls independently of the main agent prompt.
I switched a cron job to a local model but the bill did not go down. Why?
The change only affects that specific job going forward. If the billing period started before you made the change, the high-cost runs from the first half of the period are already counted. Give it a full billing cycle to see the reduction. Also verify the change actually took effect: ask your agent to confirm the current model for that cron job. If the change was applied to the wrong job or not saved correctly, the job is still using the API model.
Is there a way to set a hard spend limit so API calls stop automatically at a threshold?
As of March 2026, Anthropic offers usage alerts but not hard cut-offs at the API level. OpenAI offers hard limits. DeepSeek offers alerts. For a hard stop within OpenClaw regardless of provider, you need to implement it as a monitoring cron job: have your agent track estimated spend and stop triggering other tasks when a threshold is reached. This is the in-app equivalent of a provider hard limit. For a true hard stop, set the limit at the provider dashboard for providers that support it.
My overnight cron jobs ran fine but my morning bill is much higher than the overnight job costs explain. What am I missing?
Look at the session that started when you opened OpenClaw in the morning. Long context windows loaded at session start (large workspace files, many auto-loaded documents) consume input tokens on every single turn. A context-heavy session that runs 20 turns pays that large input cost 20 times over before you have done any meaningful work. Check the size of your workspace files that auto-load and whether any of them have grown significantly recently. Also check whether the morning session triggered any compaction events, which are an additional cost.
Can I get a real-time view of what API calls are being made right now?
Not natively in OpenClaw as of March 2026. The closest equivalent is asking your agent to list all currently active cron jobs and confirm their next scheduled run times, then watching the activity log for entries. For provider-level real-time monitoring, each API provider has a usage dashboard that updates with a short delay. For ongoing visibility without manual checking, the weekly audit cron job described earlier in this article is the practical solution.
I cut my cron job frequency in half but my bill barely changed. What else should I look at?
The cron frequency reduction helps, but if the bill is still high the cost is coming from somewhere else. Work through this checklist: check whether your primary interactive model is set to something expensive like Claude Opus (which should only be used for specific high-value tasks); check whether your memory plugin is using an API model for extraction with extractMinMessages set low; check whether your session start baseline is large (many auto-loaded workspace files, many enabled plugins); and check whether compaction is running frequently in long sessions. Any one of these can dwarf the cron job costs.
My DeepSeek API calls went up even though I have not been using it more. Why?
Check whether it is being used as a fallback. If your primary model (Sonnet, GPT-4o) is rate-limiting more than usual during busy periods, requests are falling through to DeepSeek in your fallback chain. The increase in DeepSeek calls is not because you are using it more directly; it is because your primary model is unavailable more often. Check the fallback trigger rate in your session logs and consider whether switching the primary to DeepSeek directly (rather than relying on fallback) would be more cost-effective than the current setup.
I disabled all my cron jobs but I am still seeing API calls. What is running?
Three possibilities: memory reflection is still running after each session (check autoCapture setting in your memory plugin config); the heartbeat is still active even with no tasks configured (check HEARTBEAT.md and heartbeat cron schedule); or a plugin is making independent calls (check each installed plugin for background polling or sync behavior). If all three are clean and you are still seeing calls, check whether another OpenClaw instance is running on the same API keys (for example, a dev and production instance sharing credentials).
How do I estimate my total monthly API cost before the bill arrives?
Add three components: background task cost (cron jobs multiplied by daily frequency, times 30 days), interactive session cost (typical session token count times number of sessions per day, times 30 days), and baseline overhead (session start tokens times sessions per day times 30 days). Your agent can calculate all three from your current config and usage patterns if you ask. The estimate will not be exact but it will be within 20-30% of actual, which is enough to catch surprises before they land.
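The three components can be summed in a few lines. A sketch with placeholder numbers (every figure here is an assumption; replace each with your own audit output):

```python
# Back-of-envelope monthly API cost from the three components above.
# All inputs are placeholders; replace with your audit numbers.
def monthly_estimate(bg_calls_per_day, cost_per_bg_call,
                     session_tokens, sessions_per_day, cost_per_1k_tokens,
                     baseline_tokens, days=30):
    background = bg_calls_per_day * cost_per_bg_call * days
    interactive = (session_tokens / 1000) * cost_per_1k_tokens * sessions_per_day * days
    baseline = (baseline_tokens / 1000) * cost_per_1k_tokens * sessions_per_day * days
    return background + interactive + baseline

est = monthly_estimate(
    bg_calls_per_day=150, cost_per_bg_call=0.0008,
    session_tokens=60_000, sessions_per_day=5, cost_per_1k_tokens=0.003,
    baseline_tokens=40_000,
)
print(f"~${est:.2f}/month")
```

As the article notes, the result is a 20-30% estimate, not a bill prediction; its value is catching an order-of-magnitude surprise early.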
A plugin I just installed seems to be making a lot of API calls. How do I check?
Disable the plugin, run your normal workflow for one hour, and note the API call volume. Then re-enable the plugin, run the same workflow for one hour, and compare. If the call volume is significantly higher with the plugin enabled, the plugin is the source. Check the plugin documentation or source code for any background polling, periodic sync, or automatic processing behavior. If the plugin does not disclose its background call behavior clearly, consider whether you need it badly enough to accept the cost.
What is a reasonable monthly background task cost for a well-configured OpenClaw instance?
A well-optimized instance running all background tasks on local models should have near-zero background API cost. The only unavoidable background API calls are tasks that specifically require a cloud model’s capabilities: complex reasoning, synthesis of long documents, tasks where local model quality is genuinely insufficient. For a typical personal automation setup, background API costs should be under $5 per month if local models are used appropriately. $10-20 per month in background costs alone is a signal that something is using an API model for routine work that a local model could handle.
Will switching cron jobs to local models affect the quality of their output?
For most routine background tasks, the quality difference is not noticeable in practice. Tasks like sending a daily brief, checking a file and reporting its status, running a queue processor, and writing a log entry do not require Claude-level capability. The quality difference between phi4 and Claude Sonnet becomes meaningful for complex reasoning, nuanced writing, and tasks that require synthesizing large amounts of context. Start by switching the simplest background tasks (heartbeats, status checks, simple notifications) to local models and verify they produce the expected output. Then progressively move more complex background tasks to local models based on observed output quality.
Cheap Claw
Cut your OpenClaw API spend by 60-80% without changing what your agent does
Model routing rules for every task type, local model assignments for background work, memory reflection config, and the weekly audit cron job pre-built. Everything from this article configured and ready to drop in.
