An OpenClaw agent that loops on the same task is one of the more disorienting failures to diagnose. The agent looks busy and keeps producing output, but it is running in circles rather than making progress. This article covers the causes, how to identify which one you have, and how to stop each type of loop cleanly.
TL;DR
- Most loops are caused by a cron job firing repeatedly, a heartbeat prompt being misinterpreted, or a tool call failing and retrying endlessly.
- Stop the loop first by identifying and pausing the trigger before diagnosing the cause.
- Then fix the root cause: misconfigured cron, bad heartbeat config, missing loop protection, or a stuck retry.
- Add loop protection to prevent the same loop from recurring.
Throughout this article you will see indented blocks like the ones below. Each one is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal or edit any files manually.
Step 1: Identify what kind of loop you have
Before you can stop the OpenClaw agent loop, you need to identify what is driving it. Stopping the wrong thing first either does not work or makes the diagnosis harder. There are four main categories of agent loops in OpenClaw, each with a distinct trigger and a different fix.
Cron loop: A cron job is firing more frequently than intended, or a cron job is triggering agent turns that produce output the cron logic interprets as a trigger for another run. The loop is driven externally by the scheduler.
Heartbeat loop: The heartbeat prompt is configured to check something, the agent finds that thing needs attention, and each response produces output that triggers another heartbeat check immediately. The loop is driven by the heartbeat mechanism misinterpreting its own output as new input.
Retry loop: A tool call is failing and the agent is retrying it repeatedly. The agent is not confused about what to do; it is stuck trying to do the right thing and hitting the same failure every time. The loop is driven by a missing stop condition on retries.
Context loop: The agent has lost track of what it already did and is re-running completed steps. This usually happens after a compaction event or a session restart where the agent reconstructed state incorrectly. The loop is driven by incorrect state reconstruction.
I think my agent is stuck in a loop. Describe exactly what you are doing right now. Are you in the middle of a task? What was the last instruction you received? What tool calls have you made in the last 5 minutes? I need to understand what is driving the repeated behavior before I can stop it.
If the OpenClaw agent is actively looping when you send this, the answer it gives will identify the loop type. If the looping has already stopped because you interrupted it, because it hit a rate limit, or because the spend threshold triggered a pause, check the gateway logs for the sequence of events that triggered the repeated turns and use that to identify the loop type retroactively.
Step 2: Stop the loop before diagnosing
Diagnosing an active OpenClaw agent loop is significantly harder than diagnosing a stopped one. Each new turn that fires while you are investigating adds noise and may overwrite state you are trying to read. Stop the loop first, then investigate. The stopping method depends on which loop type you have, but the immediate first step for any loop is to pause whatever is triggering new turns.
For a cron loop, disable or pause the cron job that is driving the turns:
List all my cron jobs and their schedules. Which cron jobs have run in the last 10 minutes? Disable any cron jobs that have run more than twice in the last 10 minutes. Show me the job IDs and schedules before disabling anything so I can confirm which ones to pause.
For a heartbeat loop, clear the HEARTBEAT.md file so the next heartbeat ping returns HEARTBEAT_OK immediately:
Show me the current contents of HEARTBEAT.md. If it contains any active tasks or instructions that could be triggering repeated heartbeat responses, I want to clear it to stop the loop. What is currently in HEARTBEAT.md?
For a retry loop, identify the failing tool call and the error it is hitting:
What tool call is failing and causing repeated retries? Show me the exact error message from the last 3 failed attempts. What is the tool, the parameters, and the error? I need the exact error text before I can fix the underlying issue.
For a context loop, a session restart is the cleanest stop. Start a new session so the agent initializes from the checkpoint rather than from the incorrect reconstructed state it is working from.
Diagnosing and fixing a cron loop
A cron loop almost always has one of two causes: the cron schedule is wrong (too frequent), or the cron job’s payload is producing output that the job logic treats as a trigger to fire again.
Check the schedule first:
Show me the schedule for cron job [JOB_ID]. What is the everyMs or cron expression? Convert it to a human-readable interval. Was this schedule intentional, or does it look like it might be misconfigured? For example: an everyMs of 60000 fires every minute, not every hour.
A common mistake is supplying a value in seconds or minutes when everyMs expects milliseconds. 60000 is 1 minute, not 1 hour. 3600 is 3.6 seconds, not 1 hour. The value is always in milliseconds.
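The conversion arithmetic is easy to get right with a small helper. The field name everyMs comes from the article's description of OpenClaw cron config; the helper itself is an illustrative sketch, not part of OpenClaw:

```python
# Compute an everyMs value from a human-readable interval.
UNIT_MS = {
    "s": 1_000,        # seconds
    "m": 60_000,       # minutes
    "h": 3_600_000,    # hours
    "d": 86_400_000,   # days
}

def every_ms(value: float, unit: str) -> int:
    """Convert an interval like (1, 'h') into milliseconds."""
    if unit not in UNIT_MS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(value * UNIT_MS[unit])

# The two mistakes from the text, made visible:
print(every_ms(1, "h"))   # 3600000 -- one hour is NOT 3600
print(every_ms(1, "m"))   # 60000   -- one minute, often mistaken for an hour
```

Running the intended interval through a converter like this before saving the job catches the seconds-versus-milliseconds mistake before it becomes a loop.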
Update cron job [JOB_ID] to fire at the correct interval. The intended schedule is [INTENDED_FREQUENCY, e.g., once per hour]. Calculate the correct everyMs value and apply the update. Confirm the new schedule before saving.
If the schedule is correct but the job is still looping, the issue is in the job payload or delivery config. A cron job with a systemEvent payload injects text into the main session. If that injected text produces a response that looks like a cron trigger, and the delivery config routes that response back to the cron system, a loop can form.
Show me the full config for cron job [JOB_ID] including the payload type, payload text, delivery mode, and any delivery channel config. I want to see if the job’s output could be getting routed back as a new trigger.
Diagnosing and fixing a heartbeat loop
The OpenClaw heartbeat mechanism fires on a regular interval and sends a standard fixed prompt to the agent. If HEARTBEAT.md contains an active task or instruction, the agent responds with a full task response rather than the simple HEARTBEAT_OK acknowledgment, which, depending on how the heartbeat delivery is configured, can trigger another check or create a rapid response cycle.
Read HEARTBEAT.md. Does it contain anything other than empty content or comments? If it contains active tasks or instructions that would cause me to respond with more than HEARTBEAT_OK, show me exactly what is there. That content may be causing the heartbeat loop.
The correct behavior for a heartbeat response when nothing needs attention is a single line: HEARTBEAT_OK. If HEARTBEAT.md contains a task that always needs attention (for example, a standing check that always finds something to report), every heartbeat ping triggers a full response, which can drive unexpected API spend and create the appearance of a loop.
The fix is structural: tasks placed in HEARTBEAT.md should always have a clear completion condition that removes them from the file once met. A task that runs once, completes, and removes itself from HEARTBEAT.md is the correct pattern. A standing task with no completion condition that never resolves on its own does not belong in HEARTBEAT.md. Move any standing recurring tasks to proper cron jobs with explicit schedules and delivery configs instead.
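The decision rule is simple enough to sketch: a heartbeat response should be HEARTBEAT_OK whenever the file holds nothing actionable. This sketch assumes blank lines, headings, and HTML comments count as inert content; the exact conventions your HEARTBEAT.md uses may differ:

```python
def heartbeat_response(heartbeat_md: str) -> str:
    """Return HEARTBEAT_OK when HEARTBEAT.md holds nothing actionable.

    Illustrative sketch: blank lines, markdown headings, and HTML
    comments are treated as inert; any other line is an active task.
    """
    for line in heartbeat_md.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "<!--")):
            return f"active task found: {stripped}"
    return "HEARTBEAT_OK"

print(heartbeat_response("# notes only\n"))            # HEARTBEAT_OK
print(heartbeat_response("Check the deploy status"))   # active task found: ...
```

A task with a completion condition removes its own line once met, so this check naturally returns to HEARTBEAT_OK; a standing task keeps the file permanently "active", which is exactly the loop condition described above.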
Clear HEARTBEAT.md of any active tasks that are causing repeated heartbeat responses. Move any legitimate recurring tasks to proper cron jobs with explicit schedules. Show me the proposed cron job configs for any tasks that need to be migrated before making changes.
Diagnosing and fixing a retry loop
A retry loop in an OpenClaw agent occurs when a tool call fails and the agent retries it without an explicit stop condition that halts further attempts. SOUL.md and AGENTS.md both include explicit loop protection rules: if the same command fails twice in a row with the same error, stop and report to Ghost via Telegram, and do not attempt a third retry without new instructions. If those loop protection rules are not in the active context at the time of the failure (because compaction pushed them out, or because they were never written), the agent falls back to model training defaults, which lean toward retrying rather than stopping, because stopping feels like failing.
What is the exact error from the failing tool call? Is this a transient error (network timeout, rate limit) or a permanent error (invalid credentials, missing file, wrong path)? How many times have you retried it? Show me the error type and the retry count.
Transient errors (timeouts, rate limits, temporary service unavailability) are worth one or two retries with backoff. Permanent errors (invalid API key, file not found, permission denied) are not worth retrying at all. If the agent is retrying a permanent error, it needs the loop protection rule applied: stop after two failures, report the exact error, wait for instruction.
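The classify-then-cap policy above can be sketched in a few lines. The error marker strings are hypothetical placeholders (real error text varies by tool and provider), and the two-attempt cap mirrors the loop protection rule from SOUL.md and AGENTS.md:

```python
import time

# Hypothetical markers; substitute the actual error text your tools emit.
TRANSIENT_MARKERS = ("timeout", "rate limit", "503", "temporarily unavailable")
PERMANENT_MARKERS = ("invalid api key", "not found", "permission denied")

def classify(error_text: str) -> str:
    text = error_text.lower()
    if any(m in text for m in PERMANENT_MARKERS):
        return "permanent"
    if any(m in text for m in TRANSIENT_MARKERS):
        return "transient"
    return "unknown"

def call_with_loop_protection(tool_call, max_attempts=2, base_delay=1.0):
    """Apply the article's rule: two failed attempts, then stop and report."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return tool_call()
        except RuntimeError as exc:
            last_error = exc
            if classify(str(exc)) == "permanent":
                raise  # permanent errors are never worth retrying
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))  # backoff before retry
    raise RuntimeError(f"loop protection: stopped after {max_attempts} attempts: {last_error}")

print(classify("connection timeout"))   # transient
print(classify("permission denied"))    # permanent
```

The key property is that the function always terminates: permanent errors propagate on the first attempt, and transient errors get exactly one backed-off retry before the stop-and-report exception fires.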
Rate limit loops burn through your budget fast
A common retry loop is hitting an API rate limit and retrying on a short interval. Rate limit responses often include a Retry-After header specifying how long to wait. An agent that ignores the Retry-After and retries immediately will hit the rate limit again on every attempt. The correct behavior is to wait the specified duration before retrying. If you are seeing repeated rate limit errors in close succession, check whether the agent is respecting the Retry-After value.
Stop retrying the failing tool call. Apply loop protection: the command has failed twice, so stop and report. Tell me: the exact tool call that failed, the exact error message, whether this is transient or permanent, and what I need to do to resolve the underlying issue. Then wait for my instruction before attempting anything further.
Diagnosing and fixing a context loop
A context loop is the hardest OpenClaw agent loop type to diagnose because the agent genuinely believes it has not done the work yet. After a compaction event, the agent reconstructs state from the checkpoint and summary. If the checkpoint was not updated before compaction, or if the summary omitted completed steps, the agent starts over from an earlier point in the task.
Read .context-checkpoint.md. What does it say the active task is and what steps have been completed? Now compare that to what I can see in the recent conversation. Are there steps marked as incomplete in the checkpoint that I can see were actually already done? If yes, update the checkpoint to reflect the true current state.
If the checkpoint is outdated, updating it is the fix. An accurate checkpoint means the next session (or the next compaction recovery) starts from the correct point rather than re-running completed steps.
Update .context-checkpoint.md to accurately reflect the current state. Mark all completed steps as done. Set the active task to the correct next step, not a step that was already completed. Show me the updated checkpoint before writing it so I can confirm it is accurate.
After updating the checkpoint, start a new session. The new session reads the corrected checkpoint on startup and begins from the right place rather than re-running from the beginning.
Adding loop protection to prevent recurrence
The best time to add loop protection is after you have experienced a loop, because you now know which specific pattern caused it. Loop protection rules in SOUL.md and AGENTS.md give the agent an explicit stop condition for each loop type.
Check whether AGENTS.md contains a Loop Protection section. If it does, show me what it says. If it does not, I want to add one. The rules should cover: (1) stop after 2 consecutive failures of the same command, (2) stop if the same tool call is being made with the same parameters more than 3 times in a session, (3) stop if HEARTBEAT.md is driving repeated full responses rather than HEARTBEAT_OK. Show me the current content before any changes.
OpenClaw agent loop protection rules work because they give the model an explicit override instruction that takes priority over its trained default behavior of trying to help by continuing. The model is trained to make progress and views stopping as a last resort. The explicit loop protection rule tells it that stopping and reporting the exact error IS the correct progress-making behavior when the alternative is burning through API budget in a circle with no forward movement.
Add a Loop Protection section to AGENTS.md with these rules: (1) If the same tool call fails twice in a row with the same error: STOP. Report the exact error to Ghost via Telegram and wait for instruction. Do not retry. (2) If the same sequence of tool calls has been made more than 3 times in the current session: STOP. Something is wrong. Checkpoint, report, wait. (3) If a heartbeat response is longer than 3 sentences: it is probably not a simple status check. Review whether HEARTBEAT.md contains a task that belongs in a cron job instead.
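Rule (2) above amounts to counting identical tool-call signatures. This is an illustrative sketch of that mechanism, not an OpenClaw API; the class name and threshold are assumptions:

```python
from collections import Counter

class LoopGuard:
    """Count identical (tool, params) signatures and trip past a threshold."""

    def __init__(self, max_identical: int = 3):
        self.max_identical = max_identical
        self.counts = Counter()

    def record(self, tool: str, params: dict) -> bool:
        """Record a call; return True once the loop threshold is exceeded."""
        signature = (tool, tuple(sorted(params.items())))
        self.counts[signature] += 1
        return self.counts[signature] > self.max_identical

guard = LoopGuard()
for _ in range(3):
    assert not guard.record("read_file", {"path": "notes.md"})
print(guard.record("read_file", {"path": "notes.md"}))  # True: 4th identical call trips
```

Sorting the parameter items makes the signature order-independent, so the same call written with parameters in a different order still counts against the same signature.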
Understanding the cost impact of a loop
A loop that runs for an hour before being noticed can generate significant API spend. Each turn in the loop costs tokens for input and output. A retry loop hitting a failing API call costs input tokens on every attempt. A cron loop firing every minute costs a full turn every 60 seconds.
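The arithmetic is worth making concrete. The per-token prices below are placeholders, not any provider's actual rates; substitute your model's pricing:

```python
# Rough cost estimate for a loop. Prices per million tokens are
# placeholders -- substitute your model's actual rates.
PRICE_IN_PER_MTOK = 3.00    # $/1M input tokens (example rate)
PRICE_OUT_PER_MTOK = 15.00  # $/1M output tokens (example rate)

def loop_cost(turns: int, in_tokens_per_turn: int, out_tokens_per_turn: int) -> float:
    cost_in = turns * in_tokens_per_turn / 1_000_000 * PRICE_IN_PER_MTOK
    cost_out = turns * out_tokens_per_turn / 1_000_000 * PRICE_OUT_PER_MTOK
    return cost_in + cost_out

# A cron loop firing every minute for an hour, 8k input / 500 output per turn:
print(round(loop_cost(60, 8_000, 500), 2))  # about $1.89 for the hour
```

Note that input tokens usually dominate: every turn re-sends the growing context, so a loop's cost climbs even when each turn's visible output is tiny.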
Estimate the cost impact of the loop I just experienced. How many turns ran in the loop? What model was being used for each turn? Estimate the approximate token cost per turn and give me a rough total. If the total is above $0.50, flag it explicitly.
The daily spend monitoring rules in SOUL.md (notify at $2, pause all API activity at $5) are the backstop for runaway loops. But those thresholds exist for the full day, not for a single loop event. A loop that burns $3 in an hour will trigger the pause correctly but will also have already consumed most of the day’s budget before stopping.
The better protection is loop detection that fires on the number of consecutive identical actions, before the spend threshold is hit. The loop protection rules above do this. The spend thresholds are a last resort, not the primary defense.
Check my current day’s API spend. Has the loop caused spending above the $2 notification threshold? If yes, report the spend total and the model breakdown. If the total is approaching $5, pause all non-essential API calls until I explicitly approve resuming.
Reading gateway logs to trace a loop
When an OpenClaw agent loop has already completed or been interrupted, the gateway logs are the most reliable source for reconstructing what happened. Each turn is logged with a timestamp, the triggering event, and the tool calls made. A loop appears in the logs as a repeating pattern: the same event type, the same tool calls, at a high frequency.
Check the gateway logs for the last 30 minutes. Look for any patterns of repeated events, repeated tool calls, or high-frequency turn triggers. Show me a summary of how many turns ran, what triggered each turn, and which tool calls were made repeatedly. I want to understand the shape of the loop from the log data.
The pattern you are looking for is a cluster of log entries with similar or identical content at short intervals. Cron loops show up as the same cron job ID firing repeatedly. Retry loops show up as the same tool call with the same parameters and the same error code. Heartbeat loops show up as heartbeat events triggering full model responses rather than HEARTBEAT_OK acks.
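That repeated-content-at-short-intervals pattern can be detected mechanically. This sketch assumes you can reduce each log entry to a (timestamp, content) pair, sorted by time; the thresholds are illustrative:

```python
from collections import defaultdict

def find_loop_clusters(entries, min_repeats=3, max_gap_s=120):
    """Flag log contents that repeat rapidly -- the signature of a loop.

    entries: list of (timestamp_seconds, content) tuples, sorted by time.
    Returns contents appearing at least min_repeats times with every
    consecutive gap at most max_gap_s.
    """
    by_content = defaultdict(list)
    for ts, content in entries:
        by_content[content].append(ts)
    loops = []
    for content, stamps in by_content.items():
        if len(stamps) < min_repeats:
            continue
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        if all(g <= max_gap_s for g in gaps):
            loops.append(content)
    return loops

log = [(0, "cron job abc123 fired"), (60, "cron job abc123 fired"),
       (120, "cron job abc123 fired"), (3600, "daily digest sent")]
print(find_loop_clusters(log))   # ['cron job abc123 fired']
```

A legitimate recurring event (the daily digest) fires once and never clusters; the looping cron job fires three times in two minutes and is flagged.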
From the gateway logs, tell me: what was the first event that started the loop? What was the exact trigger? At what time did the loop start, and at what time did it stop or get interrupted? How many total turns ran during the loop period?
Knowing the exact start trigger is important for the fix. A cron loop that started at a specific time tells you the cron schedule is wrong. A retry loop that started when a specific API call began failing tells you when the API started having issues and what the failing endpoint is.
Loops in isolated cron sessions
Cron jobs with agentTurn payloads run in isolated sessions. These sessions are separate from your main chat session and run autonomously. If an isolated session gets into a loop, you will not see it happening in your Discord or Telegram channel unless the session has a delivery config that sends output to a channel.
List all currently running sessions, including isolated cron sessions. Are any sessions currently active that have been running for an unusually long time? What is the current status and turn count for each active session? I want to identify any isolated sessions that might be stuck in a loop.
An isolated session that has been running for 30 minutes on a task that should take 2 minutes is a strong loop signal. Kill it:
Kill isolated session [SESSION_ID] that appears to be stuck in a loop. After killing it, show me the run history for the cron job that spawned it. How many times has this cron job run in the last hour? If it has run more than its scheduled frequency suggests it should have, disable the job until I can review the config.
Isolated session loops that run without delivery config are silent. They consume API budget with no visible output. This is why checking active sessions when you suspect unusual API spend is an important diagnostic step, even if no messages are appearing in your channels.
How compaction can trigger loops
OpenClaw’s LCM compaction runs when the context window reaches a threshold. During compaction, recent conversation is summarized and older content is compressed. When the agent resumes after compaction, it reads the summaries to reconstruct working state. If the summaries do not accurately capture what was completed, the agent can re-run completed steps.
The specific scenario: the agent is partway through a multi-step task. Compaction fires. The summary says “working on step 3” but the agent actually completed steps 3, 4, and 5 before the compaction threshold was hit. After compaction, the agent reads “working on step 3” and starts step 3 again from the beginning.
Has a compaction event occurred during or just before the loop I experienced? Check the LCM database for recent compaction events and their timestamps. Compare those timestamps to when the loop started. If compaction preceded the loop, show me what the post-compaction summary said the active task was.
Prevention: the checkpoint update protocol in AGENTS.md (update the checkpoint every 5 turns) ensures the checkpoint reflects true current state at the time of compaction. If the checkpoint is current, the post-compaction reconstruction starts from the right place. The loop only happens when the checkpoint is stale.
Model behavior and why some models loop more than others
Different models have different tendencies when it comes to retrying failures and interpreting ambiguous instructions. Models trained toward helpfulness will retry failing tasks more aggressively because stopping feels like failure. Models with better instruction-following tend to respect explicit stop conditions more reliably.
Which model was running during the loop? Was it the primary model or a fallback? Did any model switches happen during the loop period (for example, a primary model hitting rate limits and falling back to a different model with different behavior)? Show me the model usage from the loop period.
A loop that starts when a model fallback occurs is a signal that the fallback model is interpreting the loop protection rules differently than the primary. The fix is to ensure the loop protection rules are explicit and unambiguous enough that any model in the fallback chain will respect them. Vague instructions like “avoid loops” are less effective than specific ones like “if the same tool call fails twice with the same error, stop and send a Telegram message.”
Local models (Ollama) are more prone to loop behavior in complex multi-step tasks because they have less instruction-following capability than larger models. If a task that normally runs on a local model gets into a loop, consider routing that specific task to a more capable model. The cost savings from local models are not worth the budget burn of a loop.
Multi-agent and subagent loops
If you use subagents, a loop can span multiple sessions. The orchestrating agent spawns a subagent, the subagent produces output, the orchestrator interprets that output as a new task trigger, spawns another subagent, and the pattern repeats.
List all subagents I have spawned in the last 30 minutes. How many are currently active? What task was each one given? Is any subagent spawning other subagents? I want to see if the loop involves multiple agents rather than just the main session.
Multi-agent loops are more expensive because each spawned session has its own context initialization cost. They are also harder to stop because you have to kill each active session individually. The safest approach is to kill all active subagents, return to the main session, and reconstruct state from the checkpoint before resuming any multi-agent work.
Kill all active subagent sessions. List them first so I can see what is being terminated. After killing all subagents, read .context-checkpoint.md and tell me the last confirmed completed step. We will resume from there after I confirm the state is accurate.
Setting up spend monitoring to catch loops early
The most practical early warning for an agent loop is a sudden spike in API spend. A loop that runs for 20 minutes at one turn per minute on a midrange model can generate $1 to $3 in spend before you notice it. Spend monitoring gives you an alert before it becomes a bigger problem.
Set up a cron job that checks my API spend every 30 minutes and sends me a Telegram alert if the spend in the last 30 minutes exceeds $0.50. Include in the alert: the total 30-minute spend, the model breakdown, and the number of turns that ran. This gives me an early warning for loops before they hit the $2 daily notification threshold.
The 30-minute interval and $0.50 threshold used above are reasonable starting points for a personal OpenClaw deployment. If your normal 30-minute API spend is around $0.30, a $0.50 threshold catches a loop within its first few minutes of runaway behavior. If your normal spend is higher due to a heavier workload, adjust the threshold to roughly 150% of your typical 30-minute spend so the alert fires on genuine anomalies rather than normal busy periods. The goal is a threshold that fires reliably on unusual spend patterns caused by loops, not on normal operation.
Combine spend monitoring with a turn-count alert for a more targeted loop signal:
Add a rule to the spend monitoring cron: if more than 20 agent turns have run in the last 30 minutes, send a Telegram alert regardless of spend amount. Twenty turns in 30 minutes is one turn every 90 seconds, which is unusually high for a personal agent. Flag it for my review even if the spend is within normal range.
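The two monitors combine into one check that the cron job runs each interval. This is a sketch of the decision logic only; how you fetch the 30-minute spend and turn count depends on your gateway, and the default thresholds are the starting points suggested above:

```python
def should_alert(spend_30min: float, turns_30min: int,
                 spend_threshold: float = 0.50, turn_threshold: int = 20):
    """Return an alert reason when either monitor fires, else None."""
    if spend_30min > spend_threshold:
        return f"spend ${spend_30min:.2f} exceeds ${spend_threshold:.2f} in 30 min"
    if turns_30min > turn_threshold:
        return f"{turns_30min} turns in 30 min (loop-like frequency)"
    return None

print(should_alert(0.30, 12))   # None -- normal operation
print(should_alert(0.30, 45))   # turn-count alert despite normal spend
```

The turn-count branch is what catches the cheap-but-fast loops: a loop on a local or low-cost model can stay under the spend threshold indefinitely while still burning turns.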
Documenting loops for future prevention
After resolving an agent loop, the most useful thing you can do is document what happened. Not an elaborate postmortem, just a few sentences in your daily memory file: what triggered the loop, what kind of loop it was, how it was stopped, and what protection was added to prevent recurrence.
Write a brief loop incident note to memory/[TODAY].md. Include: the date and time of the loop, the type of loop (cron/heartbeat/retry/context), the root cause, how it was stopped, the approximate API spend impact, and what loop protection was added afterward. Keep it under 10 lines.
The reason to document each OpenClaw agent loop incident is pattern recognition over time. If you see the same loop type recurring across multiple separate incidents, the root cause is systemic rather than situational: a consistently stale checkpoint, a cron schedule that gets reset after updates, a specific tool that retries too aggressively. Documented incidents across multiple sessions make the systemic pattern visible where a single memory entry does not have enough signal to identify it.
When to rebuild the session entirely
Most loops are fixable without a full session restart. But there are cases where the cleanest path forward is to start fresh: the agent’s working state is so corrupted by the loop that correcting it piecemeal would take longer than a clean rebuild, or the loop has left partial work scattered across multiple files and the agent cannot reliably distinguish what was completed from what was abandoned mid-loop.
Assess whether I should restart the session or try to continue from the current state. The loop affected these areas: [list what the loop was doing]. How confident are you that the current working state is accurate? Are there any partial writes or incomplete operations that could cause problems if I continue without a restart?
Indicators that a restart is better than continuing:
- The loop wrote partial data to files that are now in an inconsistent state
- The checkpoint is stale by more than 10 turns and the agent cannot reconstruct current state reliably
- The loop involved multiple subagents and it is not clear which ones completed their tasks
- The agent is expressing uncertainty about what it already did versus what still needs doing
In any of these cases, writing a full context checkpoint and then starting a new session is faster and significantly safer than trying to patch the current broken state back to consistency piece by piece.
Frequently asked questions
How do I tell if my agent is looping or just working on a long task?
The key difference is progress. A long-running task produces different output on each turn: it is reading a new file, processing a new item in a queue, or moving to the next step in a multi-step sequence. Each turn advances the state. An OpenClaw agent loop produces the same or similar output on every turn: the same tool call with the same parameters, the same error response, the same reply to the same prompt. If you see the agent making the same tool call with the same parameters more than twice in a session without a different result, that is a clear loop signal. If each turn is touching a different resource, writing to a different file, or producing a different result, it is almost certainly a legitimate long task. The distinguishing question is: did this turn move anything forward?
Can I set a maximum number of turns for a cron job?
Yes. Cron jobs with agentTurn payloads in isolated sessions have a timeoutSeconds parameter. Setting a reasonable timeout, for example 300 seconds for a five-minute task or 600 seconds for a more complex one, prevents a single cron execution from running indefinitely, even if the OpenClaw agent inside the isolated session gets stuck in a loop. The session is terminated when the timeout expires and the cron job is marked as completed regardless of whether the task finished. This does not prevent the cron from firing again on its next scheduled interval, but it caps how long any single run can consume. Combine a turn timeout on each cron job with a reasonable schedule interval and you have both a per-run cap and a per-interval frequency cap. Together they bound the maximum spend any single cron loop can generate.
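The bound described above is simple to compute. The cost-per-minute figure below is a placeholder for your worst-case burn rate; everyMs and timeoutSeconds are the cron parameters named in the text:

```python
# Upper bound on what a single looping cron job can spend per day,
# given a per-run timeout and a fixed schedule. The cost rate is a
# placeholder -- substitute your own worst-case burn rate.
def max_daily_runs(every_ms: int) -> int:
    return 86_400_000 // every_ms   # runs per 24 hours

def max_daily_spend(every_ms: int, timeout_s: int, cost_per_minute: float) -> float:
    """Worst case: every scheduled run burns its full timeout."""
    return max_daily_runs(every_ms) * (timeout_s / 60) * cost_per_minute

# Hourly job, 600 s timeout, $0.05/minute worst-case burn:
print(round(max_daily_spend(3_600_000, 600, 0.05), 2))  # 12.0
```

Even a fully stuck hourly job is capped at 24 runs of 10 minutes each; tightening either the timeout or the interval lowers the bound proportionally.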
My agent sent me a Telegram message saying it was looping. What do I do?
That means the loop protection in AGENTS.md worked correctly. The agent hit the stop condition, sent the Telegram alert, and is now waiting for instruction. Read the message carefully: it should specify the exact tool call that was failing, the exact error text, and how many times it retried before stopping. Reply with either a concrete fix (the corrected parameter, the right file path, the new API key, the clarified instruction) or explicitly tell it to abandon the task and move on. Do not reply with just “continue” or “try again” without providing the fix. That response sends it back to the same failure point immediately and you will get another loop protection alert within seconds.
Is there a way to see a log of all turns the agent took during a loop?
Yes. The gateway logs contain a record of every inbound message and every tool call. The LCM database also stores turn history before compaction. For a recent loop, the gateway logs are the most complete record of what happened turn by turn. Ask the agent to read the relevant log section and summarize the turn sequence, or SSH to the server and examine the logs directly if the agent is unresponsive.
My agent looped overnight and I woke up to hundreds of Telegram messages. How do I prevent this?
Two settings address this. First, ensure SOUL.md loop protection rules are active and include a maximum turns-per-session limit. Second, configure the Telegram channel notification settings on your phone to not push notifications for every message from the bot. Muting the bot conversation means overnight loops produce messages you can review in the morning rather than waking you up. The $5 daily spend pause is also your financial backstop: an overnight loop should hit that threshold and stop before causing serious damage.
Can a loop get my API key banned or rate limited?
A sustained loop hitting an external API heavily enough can trigger rate limiting by the provider. Most providers implement rate limiting at the request-per-minute and token-per-day level. A fast loop can exhaust the per-minute limit, resulting in 429 errors. If the loop then retries on those 429s without respecting the Retry-After header, the provider may implement more aggressive throttling. In severe cases, providers have suspended accounts for automated abuse patterns. The loop protection and spend monitoring rules in AGENTS.md and SOUL.md are designed to prevent this, but they only work if the context is active and the rules are current.
What is the difference between a loop and the agent being slow?
Progress. A slow agent is taking longer than usual on each turn but is making forward progress. Each turn completes and moves the task one step forward. A loop repeats the same step without making progress. If you send a command and you are waiting more than five minutes with no visible progress in the conversation, the agent is either working on a genuinely difficult turn or has silently crashed. If you are seeing rapid repeated messages, you have a loop. Check the cron job run history and the gateway logs to distinguish a crashed agent from a slow one.
