OpenClaw keeps hitting rate limits every day

Your OpenClaw agent hits a rate limit at some point every day. Responses slow down, tool calls fail, or the agent switches to a fallback model mid-task. It was fine when you first set it up. Now it is a daily problem. This article explains where OpenClaw rate limits come from, which ones you are hitting, and the config changes that stop it from happening.

Before You Start

  • Your OpenClaw agent is hitting rate limits regularly, at least daily
  • You are not sure which provider is rate limiting you or why
  • You want to stop the limits without paying for a higher tier

TL;DR

OpenClaw rate limits hit for four reasons: too many requests per minute to one provider, too many tokens per minute, daily token caps on free or low tiers, or a background process (cron job, heartbeat, memory extraction) consuming quota silently. The fix is almost never to upgrade your plan. It is to spread load across providers, move background tasks to free local models, and configure the fallback chain correctly so limits degrade gracefully instead of failing hard.

Estimated time: 15 minutes to diagnose, 20 minutes to fix. No paid plan upgrade required.


Diagnose which OpenClaw 429 error you are hitting

Before fixing anything, identify the exact limit. OpenClaw rate limits are provider-side. They come from the API you are calling, not from OpenClaw itself. Each provider enforces different limits on different dimensions: requests per minute, tokens per minute, requests per day, or tokens per day. The error message in the logs tells you which one.

Check the gateway logs for rate limit errors in the last 24 hours. Show me each error with the timestamp, the provider it came from, and the exact error message or HTTP status code. Then tell me what model was active when the limit hit, what the agent was trying to do at the time, and whether the agent fell back to another model or failed hard.

The error codes that indicate rate limiting (log these for your own monitoring):

  • HTTP 429: too many requests. This is the standard rate limit response from every major provider.
  • HTTP 529: overloaded. Anthropic-specific. Means the provider is under load, not your account limit, but effectively the same outcome.
  • “rate_limit_exceeded” in the response body: explicit account-level limit.
  • “tokens_per_minute_exceeded” or “requests_per_minute_exceeded”: tells you exactly which dimension you hit.

Run this and show me the output: journalctl -u openclaw --since "24 hours ago" | grep -i "429\|rate.limit\|quota\|overload\|tokens_per" | head -30. If you are not running systemd, check the log file configured in your openclaw.json logging settings instead.

The four limit types and what they mean

  • Requests per minute (RPM). Triggered by too many API calls in a 60-second window; resets after 60 seconds. Typical fix: add delay between calls; use a local model for low-priority tasks.
  • Tokens per minute (TPM). Triggered by too many tokens sent or received in a 60-second window; resets after 60 seconds. Typical fix: reduce prompt size; stagger large tasks; use a different provider.
  • Requests per day (RPD). Triggered when daily call volume is exceeded; resets at midnight (provider timezone). Typical fix: move background tasks to local models; split load across providers.
  • Tokens per day (TPD). Triggered when the total token budget for the day is exceeded; resets at midnight (provider timezone). Typical fix: trim prompts; move non-critical tasks to free local models.

RPM limits: too many requests per minute

RPM limits are the most common source of OpenClaw rate limits for operators running active cron jobs. Each cron job fires a request. Each heartbeat fires a request. Each memory extraction fires a request. If several of these overlap within the same 60-second window, the combined request count hits the per-minute ceiling.

Free tier limits are tight. Anthropic free tier: 5 RPM. Groq free tier: 30 RPM. DeepSeek free tier: 60 RPM. A single active OpenClaw session with heartbeats, a cron job, and an active conversation can hit 5 RPM on an Anthropic free account in under a minute.

List all active cron jobs, their schedules, and the model they use. Also show me the heartbeat interval and model. Then estimate how many API requests per minute these generate at peak overlap (e.g., when a cron fires at the same time as a heartbeat and an active conversation). Is this likely to exceed the RPM limit for my current primary provider?

How to reduce RPM without losing functionality

The fastest fix for RPM limits is offloading background tasks to a free local model. Heartbeats, status cron jobs, and simple file-read tasks do not need a frontier API model. Running them on ollama/llama3.1:8b costs nothing and removes them from your RPM count entirely.

Show me every cron job and the model it uses. For each one, tell me whether the task requires an API model or whether a free local model (like ollama/llama3.1:8b) could handle it. For any cron job that does not require tool use or complex reasoning, suggest the config change to move it to the local model. Show me the exact config. Do not apply it yet.

The second fix is staggering cron schedules. If three jobs all fire at the top of the hour, they generate three simultaneous requests. Offsetting them by 5 minutes each spreads the load across the minute window and avoids the RPM spike.

A concrete example: a morning brief cron at 8:00 AM, a session archive cron at 8:00 AM, and a memory synthesis cron at 8:00 AM is three simultaneous requests at the same minute. If your RPM limit is 5, and a typical conversation turn takes 2 requests (user message + response), those three crons consume 3 of your 5 available requests in that minute before you have typed anything. Shift them to 8:00, 8:05, and 8:10 and you spread them across three separate minute windows, leaving your full RPM budget available for each one.
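The arithmetic above can be sketched in a few lines. The numbers are the example values from this section (a 5 RPM limit, three crons, two requests per conversation turn); substitute your own schedules and limits.

```python
# Busiest-minute request count: three crons aligned at 8:00 vs staggered.
RPM_LIMIT = 5                  # e.g. Anthropic free tier

aligned = {"8:00": 3}                           # all three crons fire together
staggered = {"8:00": 1, "8:05": 1, "8:10": 1}   # 5-minute offsets

turn = 2                       # one conversation turn = user message + response

worst_aligned = max(aligned.values()) + turn      # 5 of 5: zero headroom
worst_staggered = max(staggered.values()) + turn  # 3 of 5: room left for retries

print(f"aligned: {worst_aligned}/{RPM_LIMIT}, staggered: {worst_staggered}/{RPM_LIMIT}")
```

The staggered schedule never puts more than one cron in the same minute window, so the worst case drops from the full limit to well under it.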

The same applies to crons that fire every hour at :00. 0 * * * * means every cron fires simultaneously at the top of every hour. 0 * * * *, 5 * * * *, and 10 * * * * staggers them 5 minutes apart while maintaining the same hourly frequency. This single change eliminates RPM spikes for most operators running three or more cron jobs.
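If your cron jobs live in openclaw.json, the staggered version might look like the sketch below. The `cron` array and its field names are illustrative assumptions, not the confirmed OpenClaw schema; adapt this to however your install defines jobs.

```json
{
  "cron": [
    { "name": "morning-brief",    "schedule": "0 * * * *" },
    { "name": "session-archive",  "schedule": "5 * * * *" },
    { "name": "memory-synthesis", "schedule": "10 * * * *" }
  ]
}
```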

Show me all cron schedules as cron expressions. Are any of them aligned to the same minute or hour? Suggest a staggered schedule for each one that maintains the same approximate frequency but distributes the firing times so no two fire within the same 5-minute window.

TPM limits: too many tokens per minute

TPM limits are less intuitive than RPM limits because the constraint is invisible until you hit it. You might send only 3 requests per minute (well within the RPM limit) but each request contains an 8,000-token system prompt plus a 3,000-token context window. That is 33,000 tokens per minute on just 3 requests. If the TPM limit is 20,000, you exceed it on the second request.

TPM limits hit when you are sending large prompts repeatedly within a short window. The most common causes in OpenClaw: reading a large file and passing its full contents to the model, running compaction on a long session, or loading multiple large workspace files (AGENTS.md, SOUL.md, INFRASTRUCTURE.md) as system context on every request.

The system prompt token count matters here. Every request includes your full system prompt. If AGENTS.md + SOUL.md + injected memories = 8,000 tokens per request, and you make 10 requests per minute, that is 80,000 tokens per minute in system prompt alone before a single word of your actual message is counted.
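Putting the example numbers from the last two paragraphs together, the estimate looks like this; swap in your own prompt sizes and request rates.

```python
# TPM estimate from the example numbers in this section.
system_prompt = 8_000    # AGENTS.md + SOUL.md + injected memories, sent on every request
per_message = 3_000      # context/response tokens on top of the system prompt

total_tpm = (system_prompt + per_message) * 3   # 3 requests/min -> 33,000 TPM
overhead_tpm = system_prompt * 10               # 10 requests/min -> 80,000 TPM of overhead alone

print(total_tpm, overhead_tpm)
```

Note that the overhead term scales with request rate even if your messages are tiny, which is why trimming the system prompt is the highest-leverage TPM fix.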

Estimate the total tokens in my current system prompt including all injected workspace files and active memories. Then estimate the average tokens per response for the last 10 turns. Multiply by my typical requests per minute to get a tokens-per-minute estimate. Compare this to the TPM limit for my current primary provider.

How to reduce system prompt token count

Large workspace files loaded on every request are the highest-leverage place to cut. AGENTS.md at 5,000 tokens, SOUL.md at 3,000 tokens, and injected memories at 1,000 tokens is 9,000 tokens of overhead on every single request. Trimming these files reduces TPM linearly: cut the system prompt by 50% and you cut your token consumption roughly in half.

Specific approaches that work without losing functionality:

  • Move reference material out of always-loaded files. Infrastructure details, product specs, and long protocol descriptions do not need to be in AGENTS.md. Move them to separate files and load them on demand when a task needs them.
  • Trim redundant instructions. If the same behavioral rule appears in both AGENTS.md and SOUL.md, pick one location and remove the duplicate.
  • Cap memory injection. If autoRecall is on and injecting 20+ memories per request, cap it. Most operators find 5 to 8 injected memories sufficient for continuity without the token cost.
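As a sketch, a memory section with a cap on injection might look like the fragment below. The `autoRecall` setting is referenced above; the cap field name `maxInjectedMemories` is a hypothetical placeholder, so check it against your actual openclaw.json schema before using it.

```json
{
  "memory": {
    "autoRecall": true,
    "maxInjectedMemories": 6
  }
}
```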

Read AGENTS.md and SOUL.md. Count the total tokens in each file. Identify the sections that are loaded on every request versus sections that are only relevant for specific task types. Suggest which sections could be moved to a separate reference file that is only loaded when relevant. Give me an estimate of how many tokens this would save per request.

OpenClaw API limit: daily and monthly caps

Daily caps are the hardest limit to work around without changing providers. Once you hit the daily cap, every request fails until the reset, typically at midnight UTC. If you hit it at 3pm, you are down for nine hours. There is no config change that restores a spent daily cap. The only options are to wait for the reset, switch to a secondary provider, or fall back to a local model.

The insidious thing about daily caps is that the agent does not warn you when you are approaching them. You get no signal at 50%, 75%, or 90% of your daily budget. The first indication is a hard 429 with a message that the daily limit is exceeded. Building a monitoring habit around the logs prevents the surprise.

Daily caps are common on free tiers and low-tier paid accounts. Anthropic free tier: no daily cap (RPM enforced instead). Groq free tier: 14,400 requests per day, 500,000 tokens per day. DeepSeek: no hard daily cap on paid tier, enforced by spend limit.

Check the gateway logs for the last 7 days. Is the rate limit happening at the same time each day? If it is, this points to a daily cap being exhausted. What time does the limit first appear each day, and does the agent recover at midnight or another fixed time?

Managing daily caps with multiple providers

The practical solution for daily caps is provider rotation in the fallback chain. When provider A exhausts its daily cap, fallback to provider B. When B exhausts, fallback to the local Ollama model. Configure this explicitly rather than relying on OpenClaw’s default fallback behavior, which is designed for transient errors rather than sustained daily-cap exhaustion.

Free tier daily caps reset at different times

Groq resets at midnight UTC. DeepSeek resets at midnight China Standard Time (UTC+8). If you are running on both free tiers, your effective coverage is shaped by the offset between their reset times: one provider's quota refreshes hours before the other's. Build your fallback chain with this in mind.

Read my openclaw.json. Show me the current primary model and fallback chain. Then tell me: if the primary provider hits its daily cap at 6pm, what happens? Does the fallback chain handle it gracefully? If not, suggest a fallback sequence that includes at least one provider without a daily cap (or a free local model as the last resort) so I am never fully down.

Background task quota drain

This is the most common surprise for operators who have been running OpenClaw for a few weeks. When you first set it up, you had one model and used it for conversations. Then you added cron jobs, a heartbeat, memory extraction, and a morning brief. Each of these fires API calls continuously throughout the day. By the time you sit down to use your agent, it has already spent most of its daily quota on background tasks you set up and forgot about.

Estimate the total API tokens consumed per day by background tasks only: cron jobs, heartbeats, memory extraction, and compaction. Do not count interactive conversation turns. Break it down by task type. Then compare this to the daily token budget implied by my typical API spend. What percentage of my daily quota is going to background tasks?

The background task audit

To see the scale of the problem, do the math before you audit. A heartbeat every 5 minutes is 288 requests per day. Memory extraction on every turn, at 10 turns per hour for 8 active hours, is 80 extraction requests per day. A session archive cron every 15 minutes is 96 requests per day. Add those together: 464 background requests per day before you have a single interactive conversation. If your provider has a 500 requests per day free tier limit, you are spending 93 percent of your quota on automation before you say a word.
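The audit math above, as a sketch you can rerun with your own intervals:

```python
# Daily background request count using the intervals from the text.
heartbeat = 24 * 60 // 5      # every 5 minutes  -> 288 requests/day
extraction = 10 * 8           # 10 turns/hour over 8 active hours -> 80/day
archive = 24 * 60 // 15       # every 15 minutes -> 96/day

background = heartbeat + extraction + archive   # 464 requests/day
daily_cap = 500                                 # example free tier limit

print(background, f"{background / daily_cap:.0%} of the daily cap")
```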

Go through each background process and ask one question: does this task produce output I actually look at? A heartbeat that runs every 5 minutes and checks HEARTBEAT.md is useful only if HEARTBEAT.md has tasks in it. If the file is empty, the heartbeat is spending quota to confirm nothing needs doing.

Common background tasks that operators run unnecessarily on paid API models:

  • Heartbeats on paid models with nothing in HEARTBEAT.md. Switch to a local model or increase the interval to 30+ minutes.
  • Memory extraction on every turn. If extractMinMessages is set to 1, the extraction model fires after every single conversation turn. Most operators get equivalent recall quality with extractMinMessages set to 5 or 10.
  • Compaction on short sessions. If compaction fires at 50% context usage and your sessions are short, you are paying for compaction on sessions that would have been fine without it.
  • Session archiving cron jobs on fast intervals. A cron job that archives session files every 5 minutes generates 288 API calls per day. Push it to hourly or daily if session archiving is not time-critical for you.

List every automated background process currently active: cron jobs with their schedules and models, heartbeat interval and model, memory extraction settings (extractMinMessages, model), and compaction settings. For each one, tell me whether it is running on a paid API model and whether the task requires a paid model or could run on a free local model. Flag any that are firing more frequently than necessary.

Spreading load across providers

OpenClaw is not locked to a single provider. You can have five API keys configured simultaneously and route different task types to different providers. This is the cleanest solution to daily rate limits because no single provider sees your full request volume.

A practical multi-provider setup for cost-sensitive operators:

  • Primary conversation: DeepSeek Chat (deepseek/deepseek-chat). Generous rate limits, low cost per token, good tool use.
  • Background tasks that need tool use: Groq + LLaMA 3.1 70B. Fast, usable free tier, handles tool calls.
  • Memory extraction: DeepSeek Chat. Consistent output format, fast enough for extraction latency.
  • Heartbeats and simple cron jobs: Ollama local model. Completely free, zero API calls, no rate limits at all.
  • Complex multi-tool tasks or planning: Anthropic Claude Sonnet. Pay per use, reserved for tasks that actually need it.
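Expressed as config, that routing might look like the sketch below. The `taskModels` key and its field names are assumptions for illustration, not a documented OpenClaw setting, and the model identifier strings should be verified against your provider configuration.

```json
{
  "model": "deepseek/deepseek-chat",
  "taskModels": {
    "cron": "groq/llama-3.1-70b",
    "memoryExtraction": "deepseek/deepseek-chat",
    "heartbeat": "ollama/llama3.1:8b",
    "planning": "anthropic/claude-sonnet"
  }
}
```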

Read my openclaw.json models configuration. Which providers do I have API keys configured for? For each active task type (conversation, cron jobs, heartbeat, memory extraction, compaction), tell me which provider is currently handling it. Then suggest a routing change that spreads the load across my available providers to reduce the per-provider request volume. Show the suggested config changes. Do not apply them.

Configuring the fallback chain correctly

OpenClaw’s fallback chain is designed to handle transient provider errors: a model going down briefly, a network hiccup, a provider outage. When configured correctly it also handles rate limits: a 429 response triggers the fallback, and the agent continues on the next model in the chain.

The default fallback behavior is not always the right one for rate limit scenarios. A transient error warrants a fast retry on the same model. A rate limit warrants switching providers immediately and staying on the fallback for the rest of the session (or until the rate limit resets). These are different failure modes and benefit from different handling.

Show me my current fallback chain configuration. When I hit a 429 rate limit on the primary model, what happens? Does the agent fall back immediately or retry first? Does it stay on the fallback model for subsequent requests or return to the primary? Is there a local model at the end of the chain as a final fallback if all API providers are rate limited simultaneously?

Building a rate-limit-resilient fallback chain

The key principle of a rate-limit-resilient chain is that each tier should sit on a different provider with independently tracked limits. Two models from the same provider do not help: hitting the daily cap on Claude Haiku and falling back to Claude Sonnet still draws from the same Anthropic account quota. Tier 1 and Tier 2 must be different providers entirely for the fallback to provide genuine rate limit protection.

A well-configured fallback chain for operators hitting daily rate limits has three tiers:

Tier 1: Primary API model. Your main provider (DeepSeek Chat, Claude Sonnet, or whatever you prefer for quality). This handles the majority of requests on normal days.

Tier 2: Secondary API provider. A different provider with its own separate rate limits. When Tier 1 exhausts its daily cap, Tier 2 takes over. This should be a provider whose limits are unlikely to be exhausted on the same day as Tier 1.

Tier 3: Local model. Ollama running locally. No API, no rate limits, no cost. Quality is lower but the agent stays functional. For users who hit Tier 3, the degradation in response quality is obvious, but a degraded response is better than a hard failure that takes the agent offline until midnight.
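A three-tier chain expressed through the fallbackModels setting might look like the sketch below. The shape of the fragment and the exact model identifier strings are assumptions; verify both against your openclaw.json and provider docs.

```json
{
  "model": "deepseek/deepseek-chat",
  "fallbackModels": [
    "anthropic/claude-sonnet",
    "ollama/llama3.1:8b"
  ]
}
```

The local Ollama model goes last so it only takes over when every API provider in the chain is rate limited.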

Read my openclaw.json. Show me the current fallbackModels setting. If it does not include at least two API providers and one local Ollama model, show me the config change needed to add them. Make sure the local model is last in the chain. Do not apply the change. Just show me what it should look like.

Quick-reference: rate limit type to fix

  • Symptom: rate limit hits at the same time every day. Likely cause: daily cap exhausted. Fix: add a second provider to the fallback chain; move background tasks to local models.
  • Symptom: rate limit during burst activity (multiple tasks at once). Likely cause: RPM or TPM exceeded. Fix: stagger cron schedules; move low-priority tasks to local models.
  • Symptom: rate limit even when you are not actively using the agent. Likely cause: background tasks draining quota. Fix: audit cron jobs, heartbeat, memory extraction; move to local models where possible.
  • Symptom: rate limit only on large requests (file reads, long context). Likely cause: TPM limit on prompt size. Fix: trim system prompt; cap memory injection; load large files on demand only.
  • Symptom: agent switches models mid-session without explanation. Likely cause: fallback triggered by RPM limit. Fix: check logs for 429 errors; configure the fallback chain explicitly.
  • Symptom: agent fully unavailable until midnight. Likely cause: no fallback configured and daily cap hit on the only provider. Fix: add a local Ollama model as Tier 3 fallback immediately.

Monitoring before OpenClaw hits rate limit: tracking your quota

The most effective fix for daily rate limits is knowing you are approaching them before you hit them, not after. Most operators discover their quota situation when the agent stops responding. At that point the only option is to wait for the reset or fall back to a degraded local model. Monitoring gives you time to act: slow down background tasks, switch to a secondary provider, or defer a large job until after midnight.

Set up a daily quota check. Each morning, check the OpenClaw gateway logs for total API requests made in the last 24 hours, broken down by provider. Compare this to the known limits for each provider. If any provider is above 70 percent of its daily limit, flag it. Run this check now and show me the current state.
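A minimal sketch of that morning check, assuming you have pulled the per-provider request counts out of your gateway logs by hand. The caps are the free tier numbers from this article and the usage figures are made-up examples.

```python
# Flag any provider above 70% of its daily request cap.
daily_caps = {"groq": 14_400, "deepseek": None}   # None = no hard daily cap
used_today = {"groq": 11_000, "deepseek": 3_200}  # example counts from your logs

flags = {}
for provider, cap in daily_caps.items():
    if cap is None:
        flags[provider] = "no hard daily cap"
        continue
    frac = used_today[provider] / cap
    flags[provider] = "FLAG" if frac > 0.70 else "ok"
    print(f"{provider}: {frac:.0%} of daily cap ({flags[provider]})")
```

A flagged provider means you still have hours to act: slow the crons down, shift load to a secondary provider, or defer the big job until after the reset.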

Provider rate limit reference (early 2026)

Rate limits change with tier upgrades and provider policy updates. The numbers below reflect typical limits as of early 2026. Always verify against your provider’s current documentation.

  • Anthropic (Claude): 5 RPM and 20,000 TPM on the free tier; no stated daily cap; limits reset per minute.
  • DeepSeek (direct): 60 RPM, no hard TPM; spend limit instead of a daily cap; reset is spend-based.
  • Groq (free): 30 RPM, 6,000 TPM; 14,400 requests per day; resets at midnight UTC.
  • OpenRouter: limits vary by model and underlying provider.
  • Ollama (local): no RPM, TPM, or daily limits; runs on your own hardware.

The practical implication: if you are running primarily on Anthropic’s free tier (5 RPM), a single active conversation with heartbeats and one cron job will hit the limit immediately. Anthropic free tier is not viable for autonomous agent use with background tasks. It is viable only for interactive-only use with no cron jobs and no heartbeat. Moving to the paid Tier 1 (1,000 RPM) solves the RPM problem but not necessarily the TPM problem for large context sessions.

What provider am I currently using as my primary model? Look up the rate limits for that provider and tier based on my current API key configuration. Then tell me: given my typical usage pattern (background tasks, cron frequency, average session length), am I currently operating within those limits, near them, or over them? Be specific about which dimension is the constraint.

Stop losing your agent to rate limits.

Cheap Claw: $17

The complete cost control guide for OpenClaw operators. Covers multi-provider routing, background task optimization, fallback chain configuration, system prompt trimming, and the daily quota math that tells you exactly when you will hit your limit. Everything as paste-ready agent prompts.

Get Cheap Claw

Questions people actually ask about this

Why did I only start hitting rate limits after a few weeks of use?

You added more automation over time. The first week you used OpenClaw conversationally. Then you added a morning brief cron job, then a heartbeat, then memory extraction. Each addition was small. The combined daily request volume crossed your provider’s limit gradually. The spike that finally breaks it is usually the last thing you added, but the problem is the accumulated total from everything running together.

I am on a paid plan. Why am I still hitting rate limits?

Paid plans have limits too, just higher. Tier 1 paid on Anthropic enforces 1,000 RPM and 80,000 TPM. A busy OpenClaw instance with active cron jobs, memory extraction, and long context sessions can still hit those. The fix is the same: spread load across providers and move background tasks to local models. Upgrading to a higher paid tier is rarely the right answer when the root cause is background task accumulation.

Is running a local Ollama model really good enough as a fallback?

For simple conversation and status checks: yes. For complex tool use or long planning tasks: no. The right framing is that the local model keeps the agent functional rather than completely offline. A degraded response from llama3.1:8b is more useful than a hard failure. Configure it as the last fallback, not the primary, and the quality tradeoff is acceptable.

Will adding a second API provider actually help if my rate limits are on tokens per minute?

Yes, because each provider tracks tokens per minute independently. A 429 from DeepSeek on TPM does not affect your Anthropic TPM budget. Routing large requests to a different provider when you are near the TPM limit on the primary effectively doubles your token throughput for high-volume bursts. The fallback chain handles this automatically when configured with multiple providers.

How do I know what my actual rate limits are for each provider?

Anthropic: check your Usage page in the Anthropic console. Limits are shown per model and per tier. DeepSeek: limits are documented in the API docs per tier and enforced by spend limits rather than hard request caps on paid accounts. Groq: limits are shown in the GroqCloud console under your account. OpenRouter: limits depend on the underlying model provider. Check the model card for each model you route through OpenRouter.

My agent hits the rate limit and then recovers on its own. Do I still need to fix this?

Depends on whether the fallback model is acceptable for your workflow. If the agent recovers by switching to a local model and the quality is acceptable, the fallback chain is working as intended. If it recovers by switching to a more expensive model, you are paying more per task than you should. If it recovers by retrying and succeeding after a delay, the limit is a nuisance rather than a problem. Only fix it if the current behavior is costing you money or causing quality issues.


Go deeper

  • Model Routing: OpenClaw keeps switching models mid-task and I don’t know why. How rate limits trigger automatic model switching and how to control it.
  • Cost: I switched to a cheaper model and my agent got worse. When the cheaper model costs more in total because of extra turns and retries.
  • Configuration: OpenClaw is ignoring my model override. Every reason a model override fails to take effect, including fallback chain conflicts.