Setting spend limits so your agent stops at night

You left your OpenClaw agent running overnight and woke up to an API bill that does not match what you expected. This article shows you exactly how overnight costs compound, the four settings that stop it, and how to add self-monitoring alerts so your agent tells you before the damage is done rather than after.

TL;DR

Overnight bill spikes almost always come from one of four things: a failing task retrying endlessly, an expensive model being used for a cheap task, context windows growing larger each run, or a task running more frequently than you realized. Paste the audit command below into your OpenClaw chat to see which one you have, then follow the section for that cause. All four fixes take under ten minutes each.

Every indented block in this article is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal, edit any files manually, or navigate any filesystem. That is the whole point.

Start here: find out what actually ran

Before changing anything, find out what ran. The cause of an overnight bill spike is almost never what you think it is on first look. The task you remember setting up is rarely the one that ran twenty times. The model you thought was routing to cheap calls often routed to expensive ones. The fastest path to a fix is a quick audit, not a guess.

Run a full audit of my overnight activity. I want to know: what tasks ran between 10pm and 8am last night? For each task, which model did it use? Approximately how many API calls did each task make? Are there any tasks that retried more than once? Are there any tasks that are scheduled to run on a shorter interval than I might have intended?

What you are looking for in the output:

  • Any task with a retry count above 1: that is almost always the culprit when the bill is much higher than expected. One failed task retrying through the night multiplies cost by the retry count.
  • Any task using a flagship model (deepseek-chat, claude-sonnet, gpt-4o) for something routine like a queue check, a summary, or a memory cleanup. These tasks work equally well on a local model at zero API cost.
  • Any task running more often than you remembered: a task set to every 30 minutes runs 18-20 times overnight. At even modest per-call costs, that adds up.
  • Growing response sizes: if a task reads its own output and feeds it back in, context compounds across runs. The second run is more expensive than the first. By the tenth run, you are processing a very large context window.

Find which of these is happening before reading the sections below. The right fix depends on the cause.

Cause 1: retry loops

A retry loop is when a task fails, then tries again, fails again, tries again, and repeats through the night. Without a maximum retry count, this goes on indefinitely. Every retry costs the same as a normal run. A task that fails 40 times overnight costs 40x its normal run cost.

OpenClaw does not set a retry limit by default as of March 2026. The retry behavior for cron jobs is controlled by the job configuration and by any retry instructions in the agent system prompt or task processor prompt. If neither of those specifies a limit, the task retries as many times as the scheduler allows.

How to check if you have a retry loop right now

Look at my cron job history for the last 24 hours. For each job that ran, how many times did it run? Are there any jobs that ran significantly more than their schedule would suggest? If a job was scheduled to run every hour, did it actually run once per hour or did it run multiple times per hour due to retries or overlap?

If the output shows a job running more times than scheduled, that is a retry loop. Here is how to add a limit.

Adding a retry limit to a cron job

The retry limit goes in the cron job’s task processor prompt. This is the instruction that tells your agent what to do when a task runs. The limit is a behavioral instruction, not a config setting.

Look at my task processor prompt or the prompts for my recurring cron jobs. I want to add a retry limit to all of them. The instruction should be: if this task fails, retry it once after 5 minutes. If it fails a second time, mark it as failed, send me a Telegram message with the task name and the error, and do not retry until I explicitly tell you to. Show me where in each prompt this instruction should go and what it should say.

Two retries, not infinite

One retry is usually enough to handle transient failures like a brief API timeout. Two retries covers most real-world cases. More than three retries on a failing task means the task has a structural problem that a retry will not fix. The retry limit exists to alert you, not to fix the underlying issue.

The Telegram alert on failure is important. Without it, a task that hits its retry limit silently stops. You learn about it the next time you check, which may be hours later. With the alert, you find out within minutes of the second failure.
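The two-retry policy above can be sketched as plain logic. This is a minimal sketch, assuming hypothetical `task` and `notify` callables; in OpenClaw itself the limit lives in the prompt as a behavioral instruction, not in code you write:

```python
import time

MAX_RETRIES = 1            # one retry after the first failure, then stop
RETRY_DELAY_SECONDS = 300  # 5 minutes between the first failure and the retry

def run_with_retry_limit(task, notify, delay=RETRY_DELAY_SECONDS, sleep=time.sleep):
    """Run a task; on failure retry once after a delay, then mark FAILED and alert."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task()
        except Exception as exc:
            if attempts > MAX_RETRIES:
                # Second failure: stop, alert, and wait for a human decision.
                notify(f"Task failed {attempts} times: {exc}. Marked FAILED, not retrying.")
                return None
            sleep(delay)
```

The important property is the bounded loop: without the `MAX_RETRIES` check, the `while True` is exactly the overnight retry loop this section is about.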

If you use a queue file instead of cron directly

Queue-based task handling

If your recurring tasks run through a QUEUE.md file rather than direct cron jobs, the retry limit goes in the queue processor prompt, not the individual task. Add this instruction to your processor: “If a task fails, increment its retry count in the queue file. If the retry count exceeds 2, set the status to FAILED, send a Telegram alert, and skip it until it is manually reset to PENDING.” The Queue Commander guide covers this setup in full.
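The bookkeeping that instruction describes is simple. Here is a sketch over an invented `status | retries | task` line format; this is an illustration of the logic, not the format the Queue Commander guide prescribes:

```python
MAX_QUEUE_RETRIES = 2

def record_failure(line: str) -> str:
    """Increment a queue line's retry count; past the limit, flip it to FAILED."""
    status, retries, task = [part.strip() for part in line.split("|")]
    retries = int(retries) + 1
    if retries > MAX_QUEUE_RETRIES:
        status = "FAILED"  # skipped until manually reset to PENDING
    return f"{status} | {retries} | {task}"
```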

Cause 2: expensive models on cheap tasks

An API model is a model that charges per token for every call. A local model runs on your machine and costs nothing per call. The difference in overnight cost between routing a nightly summary to a flagship API model versus a local model can be significant, especially if the task runs multiple times.

A “flagship API model” in the context of OpenClaw in March 2026 means models like claude-sonnet-4-6, deepseek-chat, gpt-4o, or similar. These cost between $0.50 and $15 per million tokens, depending on the model. A local model like ollama/llama3.1:8b or ollama/phi4:latest costs zero per call once downloaded.
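The gap between those price points is easy to put into numbers. A back-of-envelope sketch using the top of the flagship range above; the run and token counts are illustrative, not measured:

```python
def overnight_cost_usd(runs: int, tokens_per_run: int, price_per_million: float) -> float:
    """Total overnight cost of one recurring task at a per-million-token price."""
    return runs * tokens_per_run * price_per_million / 1_000_000

# An hourly summary: 9 runs a night at ~5,000 tokens each.
flagship = overnight_cost_usd(9, 5_000, 15.0)  # top of the flagship range
local = overnight_cost_usd(9, 5_000, 0.0)      # local Ollama model: zero per call
```

At $15 per million tokens that hourly summary costs about $0.68 per night, roughly $20 per month, for a task a local model handles for free.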

Most overnight tasks do not need a flagship model. A nightly summary, a queue check, a memory cleanup, a status report, a Telegram notification. These tasks work just as well on a local model. The only tasks that genuinely require a flagship model overnight are ones that involve complex multi-step reasoning, nuanced writing, or handling edge cases that smaller models fail on consistently.

Check which model your overnight tasks are using

List all my cron jobs and recurring tasks. For each one, tell me: what model is it currently configured to use? If no model is specified, what is the default model being used? Which of these tasks do you think could be switched to a local model like ollama/llama3.1:8b or ollama/phi4:latest without a meaningful drop in output quality? Explain your reasoning for each one.

Any task your agent flags as suitable for a local model is a candidate for zero-cost overnight operation. Switch those first. The impact on cost is immediate.

Specifying a model in a cron job or task

In OpenClaw as of March 2026, the model for an isolated cron job (sessionTarget: “isolated”) is set in the job’s payload configuration. For a systemEvent cron (sessionTarget: “main”), the model follows the session default unless the task processor prompt specifies otherwise.

I want to switch my overnight tasks to use a local model. For each cron job I have scheduled to run overnight, update the model to ollama/phi4:latest if the task is a summary, queue check, or status report. Keep the current model for any task that requires complex reasoning or external research. Show me the changes you plan to make before applying them.

Test before the overnight run

Switch one task to a local model and trigger it manually before leaving it to run overnight. Read the output. If it is acceptable, leave it. If the local model is producing noticeably worse results for that specific task, keep the API model for that one task and move on to the next candidate. The switch only makes sense where output quality is preserved.

Cause 3: context window growth across runs

The context window is the amount of text your agent processes in a single conversation. Every token in the context window costs something when using an API model. When a task reads its own previous output and includes it in the next run, the context window grows with each run. What started as a 2,000-token task becomes an 8,000-token task by the fourth run, and a 20,000-token task by the tenth.

This is the most subtle of the four causes and the hardest to spot without knowing what to look for. The first few runs of the day look normal. By run six or seven, each run costs significantly more than run one did.
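The numbers above follow from linear context growth. A sketch of the arithmetic, assuming each run appends roughly 2,000 tokens of output that every later run re-reads:

```python
def context_for_run(run: int, tokens_per_run: int = 2_000) -> int:
    """Context size at run N when each run re-reads all previous output."""
    return run * tokens_per_run

def cumulative_tokens(runs: int, tokens_per_run: int = 2_000) -> int:
    """Total tokens processed across all runs — what the bill actually reflects."""
    return sum(context_for_run(r, tokens_per_run) for r in range(1, runs + 1))
```

Under that assumption, run 4 processes 8,000 tokens and run 10 processes 20,000, but the bill reflects the cumulative total: 110,000 tokens across ten runs, not ten runs at the size of run one.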

How to check if your tasks are compounding context

Look at my recurring cron jobs. For each one: does the task read any file or memory that grows over time? Does it include the output of a previous run in its current run? Does it read a log file, a queue file, or a memory file that gets longer each time the task runs? If yes, what would I need to change to prevent the context from growing across runs?

The fix for compounding context depends on the task:

  • For summary tasks: summarize and delete rather than summarize and append. After each run, write only the summary to the output file, not the full source content. The next run processes the summary, not everything.
  • For queue tasks: archive completed tasks instead of keeping them in the active queue file. A queue file that only contains PENDING tasks stays the same length regardless of how many tasks have completed.
  • For log-reading tasks: rotate logs. After a task reads a log, truncate or rotate it so the next run starts from a clean log, not the accumulated history.
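The log-rotation fix in the last bullet is a read-then-truncate pattern. A minimal sketch of the idea; your agent applies this behaviorally when you give it the instruction below, and the file path here is hypothetical:

```python
from pathlib import Path

def consume_log(path: Path) -> str:
    """Read a log for processing, then truncate it so the next run starts clean."""
    text = path.read_text() if path.exists() else ""
    path.write_text("")  # truncate: the next run sees only entries written after this
    return text
```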

For my overnight recurring tasks that read or write to files: after each successful run, truncate or rotate those files so the next run starts fresh. For queue files, move DONE tasks to an archive file instead of leaving them in the main queue. Show me what needs to change and apply the changes.

Isolated sessions and compounding

Cron jobs with sessionTarget: “isolated” start fresh each run with no prior conversation history. They are not immune to compounding if they read a growing file as part of their task, but they do not inherit context from previous runs. Cron jobs that run in the main session (sessionTarget: “main”) do carry context from prior runs. If your overnight tasks run in the main session, that is a compounding risk worth addressing by switching to isolated sessions.

Cause 4: tasks running more often than intended

A cron expression that you set once and never reviewed may be running more often than you remember. A task set to “every 30 minutes” runs 18 times in a 9-hour overnight window. A task set to “every 15 minutes” runs 36 times. Even if the per-run cost is modest, the overnight total is still significant at those frequencies.
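Those run counts fall straight out of the interval arithmetic:

```python
def overnight_runs(interval_minutes: int, window_hours: int = 9) -> int:
    """How many times a fixed-interval task fires in an overnight window."""
    return (window_hours * 60) // interval_minutes
```

A 30-minute interval gives 18 overnight runs, a 15-minute interval gives 36, and an hourly task gives 9. Multiply by the per-run cost from the earlier sketch to estimate each task's overnight total.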

This cause is the easiest to fix once you find it. Longer intervals, time-bounded windows, or consolidating into a single end-of-night run all reduce run count immediately.

List all my cron jobs with their current schedule expressed in plain English, not cron syntax. Tell me how many times each job runs in a typical overnight period between 10pm and 7am. Flag any job that runs more than 4 times overnight. For those jobs, suggest whether a longer interval or a single end-of-night run would cover the same need.

Consolidating frequent overnight runs into one

The highest-ROI change for most overnight setups is replacing hourly or half-hourly tasks with a single end-of-night run. Instead of checking the queue every hour through the night, run one queue processor at 6am that handles everything. You get the same output, typically faster, with a fraction of the API calls.

I want to consolidate my overnight recurring tasks. Instead of running them every hour through the night, I want a single cron job at 6am that processes everything that would have run overnight: queue items, memory cleanup, status checks, any summaries. Build the consolidated 6am job. Tell me which existing jobs it replaces so I can disable them.

When you actually need hourly overnight runs

Some tasks genuinely need to run frequently overnight: monitoring tasks that alert on external events, intake queues for time-sensitive requests, or integrations where the external service pushes new data through the night. For those, keep the frequency but add the retry limit and local-model routing from earlier sections. The combination keeps the run count you need while minimizing the cost per run.

Adding a self-monitoring alert before it gets expensive

The four fixes above prevent the most common causes. A self-monitoring alert catches anything that slips through. The idea is simple: your agent tracks how much activity it has generated overnight and alerts you when it crosses a threshold you set, before the session ends.

This does not require any external tool or spending API. It is a behavioral instruction in your overnight task processor. Your agent counts its own API calls or estimates its own spend based on what it has processed, and sends a Telegram message if the number is higher than expected.

I want to add a self-monitoring alert to my overnight operations. Here is what I want: if any single overnight session makes more than 15 API calls, or if any single task runs more than 3 times in one night, send me a Telegram message at that point with: the task name, how many times it ran, and what it was doing. Do not stop the task. Just notify me. Show me where in my setup this instruction should go.

The threshold you set depends on your normal overnight activity. If your agent normally makes 8-10 API calls overnight across all tasks, set the alert at 20. If it normally makes 2-3, set it at 8. The goal is a threshold that only fires when something is actually wrong, not one that fires on normal nights.
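The alert behavior amounts to a counter with a one-shot threshold. A sketch of that logic; the class and method names are hypothetical illustrations, not an OpenClaw API, since the agent implements this from the prompt instruction above:

```python
class OvernightMonitor:
    """Count API calls and fire a single alert once a threshold is crossed."""

    def __init__(self, threshold: int, notify):
        self.threshold = threshold
        self.notify = notify
        self.calls = 0
        self.alerted = False

    def record_call(self, task_name: str) -> None:
        self.calls += 1
        if self.calls > self.threshold and not self.alerted:
            self.alerted = True  # alert once, do not spam every subsequent call
            self.notify(f"{self.calls} API calls so far tonight; last task: {task_name}")
```

The one-shot flag matters: without it, a runaway night would also flood your Telegram with one alert per extra call.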

Pair the alert with a morning summary

Combine the self-monitoring alert with a morning summary that tells you what ran last night. The alert wakes you up if something went wrong. The summary tells you what happened either way. Together they close the loop on overnight operations without requiring you to check logs manually. This pairing is the minimum viable monitoring setup for any OpenClaw operator who runs tasks overnight. The alert is reactive: it fires when something exceeds a threshold. The summary is proactive: it gives you the full picture every morning regardless of whether anything went wrong. Having both means you are never in the dark about what your agent did while you were asleep.

I want a morning summary cron job at 7am that sends me a Telegram message with: how many tasks ran overnight, which ones succeeded and which failed, the approximate number of API calls made, and any tasks that triggered the self-monitoring alert. Keep it under 10 lines. Use ollama/phi4:latest for this task.

Restricting overnight activity to a window

If you want a hard boundary on when your agent does expensive work, set a time window. Any API-model task outside the window either waits until the window opens or runs on a local model instead. This is a more aggressive approach than alerting, but it is the right choice when overnight bill spikes are a recurring problem and the other fixes have not fully solved it.

A time window works by adding a conditional model check to your task processor prompt. The processor checks the current hour before running any task and routes to a local model if the current time falls outside the window. This is not the same as disabling your cron jobs at night. The jobs still run on their normal schedule. They just use a different model depending on the time. Your agent wakes up, checks the time, routes the task accordingly, and runs it. The output goes to the same destination regardless of which model handled it.

The practical benefit is that you can leave your full cron schedule in place without worrying about what runs at 2am. Anything that fires outside your working hours uses a local model at zero cost. Anything inside your working hours uses the model you specified for that task. You get full overnight autonomy without the overnight API bill.
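The routing decision described above reduces to a time check. A sketch of the logic, assuming a 9pm-7am window and the local model named in the prompt below:

```python
OVERNIGHT_START, OVERNIGHT_END = 21, 7  # 9pm to 7am
LOCAL_MODEL = "ollama/phi4:latest"

def route_model(hour: int, task_model: str, time_sensitive: bool = False) -> str:
    """Pick a model for a run: local inside the overnight window, otherwise the task's own."""
    overnight = hour >= OVERNIGHT_START or hour < OVERNIGHT_END
    if overnight and not time_sensitive:
        return LOCAL_MODEL
    return task_model
```

Note the window wraps midnight, so the check is an `or` of two comparisons rather than a single range test.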

I want to restrict API model usage to a daily window. Between 9pm and 7am, any task that would normally use an API model should use ollama/phi4:latest instead, unless the task is flagged as time-sensitive. Tasks flagged as time-sensitive should still send me a Telegram alert before using an API model outside the window. Show me how to implement this in my setup.

Time-bounded cron expressions

OpenClaw cron jobs as of March 2026 support cron expressions with hour-based restrictions. A job scheduled to run “0 7 * * *” runs once at 7am. A job scheduled to run “0 6-22 * * *” runs every hour between 6am and 10pm. You can restrict when a job runs at the cron level, so it simply does not fire outside those hours. Use this for API-model tasks that do not need to run overnight at all.
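For intuition, here is how an hour-restricted expression like “0 6-22 * * *” gates a run. This is a deliberately simplified sketch that handles only plain hours, comma lists, and ranges in the hour field, not step values or the rest of cron syntax:

```python
def hour_allowed(cron_expr: str, hour: int) -> bool:
    """Check whether a cron expression's hour field covers a given hour."""
    hour_field = cron_expr.split()[1]  # fields: minute, hour, day, month, weekday
    if hour_field == "*":
        return True
    for part in hour_field.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= hour <= hi:
                return True
        elif int(part) == hour:
            return True
    return False
```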

What a local model is and how to use one

A local model is an AI model that runs entirely on your machine. It does not make API calls to an external provider. It does not cost anything per token. Once it is downloaded and running, every call it handles is free, regardless of how many times it runs overnight or how large the context window is.

OpenClaw supports local models through a tool called Ollama. Ollama is a separate piece of software that runs on your server and makes local AI models available to OpenClaw via a simple API on your local network. OpenClaw treats Ollama models exactly like API models from the perspective of task routing. The only difference is the cost: zero.

As of March 2026, the most practical local models for overnight OpenClaw tasks are:

  • ollama/phi4:latest (14.7B parameters): best quality for drafts, summaries, queue processing, memory cleanup. Handles multi-step instructions reliably. Slower than llama3.1:8b on weak hardware but produces better output for anything involving writing or reasoning.
  • ollama/llama3.1:8b: the fastest of the three. Best for simple yes/no decisions, status checks, heartbeat pings, Telegram notifications, and anything that does not require nuanced output. Essentially instant on modern hardware.
  • ollama/qwen2.5-coder:7b: optimized for code. If any of your overnight tasks involve reading, writing, or modifying scripts or config files, this model handles that better than the other two.

Checking if Ollama is installed and running

Check if Ollama is available on this system. Run a command to verify the Ollama service is running and list the local models currently downloaded. If Ollama is not running or not installed, tell me what I would need to do to set it up.

If Ollama is not installed, you have two options: install it on your server (the setup guide at ollama.com covers this for Linux, macOS, and Windows in about 10 minutes), or continue using API models for overnight tasks and rely on the retry limits, model routing, and self-monitoring alerts from the earlier sections to control costs. The in-agent protections in this article work whether or not you have local models available.

Hardware requirements for local models

phi4:latest at 14.7B parameters requires approximately 10GB of VRAM if run on a GPU, or about 12GB of RAM if run on CPU. llama3.1:8b requires approximately 6GB of VRAM or 8GB of RAM. On a standard VPS with 4GB RAM and no GPU, llama3.1:8b will run but slowly (20-40 seconds per response). phi4 will likely be too slow for practical use on that hardware. If your server has limited resources, use llama3.1:8b for overnight tasks or consider a hosted API provider with a spending cap instead.
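Those memory figures follow a rough rule of thumb for quantized models. A sketch, assuming roughly 4-5 bit quantization (about 0.6 bytes per parameter) plus a fixed allowance for runtime overhead and context; actual usage varies with the quantization level and context size you run:

```python
def approx_memory_gb(params_billion: float, bytes_per_param: float = 0.6,
                     overhead_gb: float = 1.5) -> float:
    """Rough memory footprint of a quantized local model (rule of thumb, not exact)."""
    return params_billion * bytes_per_param + overhead_gb
```

Under these assumptions, a 14.7B model lands around 10GB and an 8B model around 6GB, consistent with the figures above.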

Routing a specific task to a local model

When you specify a model in a cron job or task prompt, you use the full model identifier including the prefix. For Ollama models, the prefix is “ollama/”. For API models, the prefix identifies the provider: “anthropic/”, “deepseek/”, “openrouter/”, and so on.

I want to test a local model for one of my overnight tasks before switching all of them. Pick my most frequently-running overnight cron job and run it once right now using ollama/phi4:latest instead of whatever model it normally uses. Show me the output. Then tell me whether you think the output quality is acceptable for that task.

That test run tells you whether the output quality is acceptable for that specific task before you commit to the switch. Do not switch all tasks at once. Switch one, review the output, then move to the next. The whole process takes 20 minutes and saves you from discovering at 3am that a critical overnight task is producing unusable output on a model it was not designed for.

The checklist before leaving your agent overnight

Running through this checklist before your first long overnight session saves you from finding out about a problem after the money is already spent.

Before I leave my agent running overnight, run through this checklist for me and tell me the result for each item: 1. Do all recurring tasks have a retry limit set? 2. Are overnight tasks using local models where possible? 3. Are there any tasks that read files which grow over time, creating compounding context? 4. Is there a Telegram alert configured to notify me if overnight activity exceeds my expected threshold? 5. What is the expected total API call count for tonight based on current schedules?

The last item is the one that most operators skip. Asking your agent to estimate tonight’s expected call count before you leave gives you a baseline. If you wake up to a much higher count than the estimate, the delta tells you exactly what went wrong.

What OpenClaw cannot do for you at the provider level

OpenClaw does not control your API provider’s spending limits. It cannot cap your Anthropic, DeepSeek, or OpenAI spend from inside the configuration. Those caps are set in your provider dashboard directly.

As of March 2026, Anthropic does not offer hard spending caps. They offer usage limits that trigger alerts. OpenAI offers hard limits that stop API calls once the limit is reached. DeepSeek offers configurable daily spending thresholds. If you want a hard cap at the provider level, check your provider’s console and set one there alongside the in-agent protections covered in this article.

Tell me which API providers I currently have configured in my OpenClaw. For each one, do you know if that provider offers a hard spending cap or only alert-based limits? What would I need to do in each provider’s dashboard to set a monthly spending limit?

Both layers together

The in-agent protections in this article and the provider-level limits serve different purposes. Provider limits are a last resort that stops all API calls once you hit the cap. In-agent protections prevent you from hitting the cap in the first place by catching retry loops, routing to local models, and alerting when something runs unexpectedly. Set both. The provider cap is insurance. The in-agent protections are prevention.

Common questions

How do I know if a task is running on a local model or an API model?

Ask your agent directly. Paste “List all my cron jobs and tell me which model each one is configured to use. For any job that does not have an explicit model set, tell me what model it will default to.” Your agent will check the job configurations and the session default model setting and give you a clear list. If a job shows “ollama/” in its model field, it is local. If it shows an API provider prefix like “anthropic/”, “deepseek/”, or “openrouter/”, it is an API model with a per-token cost.

Can I set a hard daily spending cap inside OpenClaw?

As of March 2026, OpenClaw does not have a native hard spending cap setting. There is no single config field that says “stop all API calls when spend reaches $X.” The self-monitoring approach in this article (having your agent count its own calls and alert you) is the current in-app equivalent. For a hard stop, you need to set it at the provider level in your Anthropic, OpenAI, or DeepSeek dashboard. That stops all API calls from any source once the cap is hit, not just OpenClaw.

What happens if my agent hits its retry limit and sends me a Telegram alert, but I am asleep?

The task stops retrying and sits in FAILED status until you reset it. Nothing continues running. When you wake up and see the alert, you decide whether to manually trigger the task again, fix whatever caused the failure first, or leave it until the next scheduled run. The key is that the task does not keep charging you while you sleep. Failed status is free. Retrying is not.

My agent ran a task successfully but the cost was still higher than expected. What causes that?

Successful tasks with unexpectedly high costs usually fall into one of two categories. First, the task succeeded but processed much more context than usual: a log file that grew, a queue with a long history, a memory read that pulled a large block. Second, the task called a more expensive model than expected because the default model changed (for example, after an update) or because the task’s output required a follow-up call that routed differently. Paste “What did my [task name] job actually process in its last run? How large was the context window it used, and which model handled each part of the task?” to trace the exact source of the cost.

Can I pause all overnight API calls without disabling my cron jobs?

Yes. Add a time-window check to your task processor: “If the current time is between 11pm and 6am and this task would use an API model, switch to ollama/phi4:latest instead and note the switch in the task output.” This keeps all your cron jobs active on their normal schedule but routes overnight API calls to local models automatically. The cron jobs themselves never pause. Only the model routing changes based on the time.

What is the fastest single change I can make to reduce overnight costs right now?

Switch your most-frequently-running overnight cron job to a local model. That one change often cuts overnight API costs by 40-60% because the most frequent task is usually something simple: a queue check, a status ping, a memory housekeeping job. It does not need a flagship model. Once that switch is made and tested, apply the same to the next most frequent job. The retry limit is the second-most-impactful change, but only if you actually have a failing task. The model routing change applies to every overnight run regardless of whether anything is failing.

I do not use Telegram. Can the alerts go somewhere else?

Yes. Discord works the same way. Replace “send me a Telegram message” with “send a message to my Discord channel” in any of the prompts in this article. If you have both configured, you can send to both. If you use neither and just want the agent to log the alert somewhere you will see it, “write a note to my memory with the alert details and mark it as urgent” is a workable alternative that you will see the next time your agent does a memory recall at session start.

Will switching to local models for overnight tasks affect output quality noticeably?

It depends on the task. For summaries, queue processing, status checks, memory cleanup, and simple Telegram notifications, local models like phi4:latest produce results that are indistinguishable from API model output in practice. For tasks that involve complex reasoning, multi-step research, nuanced writing, or handling edge cases that the task prompt does not fully specify, local models will produce noticeably worse results and you should keep the API model for those. The safest approach is to switch one task at a time, review the output after the first run, and only keep the switch if the quality is acceptable.

My agent sent me a Telegram alert that something ran too many times, but I cannot tell if it was a real problem or a false alarm. How do I check?

Ask your agent to reconstruct what happened. Paste: “I got an alert last night that [task name] ran more than expected. Pull the session history or activity log for that task from last night. Tell me: did each run succeed or fail? Was there a clear pattern to when the extra runs happened? Did any run produce an error? Based on what you can see, was this a retry loop on a failed task, a legitimate burst of activity due to external events, or something else?” Your agent will trace the run history and tell you whether the alert was a real problem or a burst of normal activity. If it was a burst of normal activity, adjust your alert threshold upward. If it was a retry loop, apply the retry limit fix from earlier in this article.

How often should I review my overnight cron job setup?

Once a month is enough for most operators. The things that change over time are: tasks you added and forgot about, files that have grown larger than you expected and are now compounding context, and model pricing changes that shift the cost calculus for which tasks justify an API model. A monthly review takes about five minutes. Paste: “Review all my cron jobs. Are there any that have not run successfully in the last 30 days? Are there any that are running on more expensive models than they need? Are there any files that my overnight tasks read that have grown significantly in the last 30 days?” That single prompt surfaces everything worth reviewing.


Cheap Claw

Every cost lever, ranked by impact

The complete spend reduction playbook for OpenClaw operators. Model routing, prompt caching, context window sizing, compaction settings, and the full fallback chain. Drop it into your agent and it reads the guide and makes the changes. Operators report 60-80% spend reduction within a week.

Get Cheap Claw for $17 →

Keep Reading:

  • Cheap Claw: I woke up to a $300 OpenClaw bill and had no idea what caused it. How to trace exactly which task, model, or setting drove the spike and fix it so it does not happen again.
  • Cheap Claw: OpenClaw is using your most expensive model for everything, even simple tasks. How to set model routing so cheap tasks use cheap models automatically, without changing anything task by task.
  • Queue Commander: My OpenClaw agent failed overnight and I did not find out until morning. Setting up failure alerts so a broken task notifies you immediately instead of silently retrying through the night.