The Autonomy Problem

OpenClaw can run 100 tasks a week without you touching it. Most setups never get there because the wiring between the scheduler, the queue, and the completion signal is never quite right. This guide covers each piece, in order, with the exact config patterns that work.

TL;DR: Autonomy has two parts. A cron job fires on a timer. A queue file decides what runs next. Wire them together with a processor prompt that claims tasks atomically, runs them, updates status, and sends a completion notification. Once that loop is working, add error handling and task chaining. In that order. The rest is configuration.

What autonomy actually means in OpenClaw

An autonomous OpenClaw setup is not an agent that runs indefinitely on its own. It is a scheduled loop. A cron job fires at an interval. The agent wakes, reads a task file, picks the highest-priority pending task, runs it, marks it done, and goes back to sleep. The next cron fires and the loop repeats.

The pieces that make this work are simpler than they sound. A QUEUE.md file in your workspace. A processor prompt the heartbeat cron runs on every fire. A completion signal that updates task status and optionally notifies you. That is the whole thing. The complexity comes from the edge cases: what happens when a task fails, what happens when two tasks need to run in sequence, what happens when the queue is empty. Those are handled by the processor prompt and a few config patterns.

Show me my current cron jobs. Are any of them set up as a queue processor or heartbeat? If not, tell me what is needed to create one and what files in my workspace I would need to set up first.

Piece 1: The heartbeat cron job

The heartbeat is a recurring cron job that fires every N minutes. It is the engine that makes the queue run without you. The cron fires, the agent wakes, it processes whatever is waiting, and the session ends. Five minutes later it fires again.

Create a heartbeat cron job that fires every 5 minutes. The payload should be a systemEvent on my main session with this text: “Read HEARTBEAT.md if it exists in my workspace. Follow it strictly. Do not infer or repeat old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK.” Use sessionTarget: main and set the name to “heartbeat”.

Model selection matters for heartbeats: Use a local model (ollama/llama3.1:8b or ollama/phi4:latest) for the heartbeat cron. A heartbeat that fires every 5 minutes runs 288 times per day. At even $0.001 per call with an API model, that is $0.29 per day or roughly $9 per month just for the heartbeat pings. Local models run heartbeats at zero API cost.

Choosing the right interval

Five minutes is the standard starting interval for a queue processor. It balances responsiveness with resource usage. For a task queue where most tasks are low-urgency, 15 minutes works fine and halves the number of API sessions created per day. For a setup where you need near-real-time response to new queue entries, 2-3 minutes is reasonable. Below 2 minutes, you risk a task run overlapping with the next heartbeat fire, which can cause duplicate processing if your atomic claiming is not implemented correctly. Start at 5 minutes, observe actual task durations for a week, then adjust.

Check my current heartbeat cron interval. Given the typical task length for the tasks in my QUEUE.md (or the tasks I am planning to add), is 5 minutes the right interval? If a task takes longer than 5 minutes to run, what happens? Tell me whether I need to adjust the interval or add a lock mechanism to prevent overlap.

Piece 2: The QUEUE.md task file

The queue is a markdown file your agent reads on every heartbeat. Each task has an ID, a status, a priority, and a description. The processor picks the next PENDING task, runs it, and updates the status to DONE. New tasks go in as PENDING. That is the full lifecycle.
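As a concrete reference, here is one reasonable shape for the file. The layout is an assumption, not a fixed OpenClaw format; the paths and task text are illustrative:

```markdown
# QUEUE.md

Processor: see HEARTBEAT.md. Statuses: PENDING -> IN_PROGRESS -> DONE (or FAILED).

| ID   | Priority | Status  | Description                                            | Notes      |
|------|----------|---------|--------------------------------------------------------|------------|
| T001 | HIGH     | PENDING | Draft the weekly report to workspace/reports/weekly.md | notify:yes |
| T002 | MED      | PENDING | Summarize open issues to workspace/reports/issues.md   |            |
| T003 | LOW      | DONE    | Rotate logs in workspace/logs/                         | 2026-02-10 |
```

Whatever shape you pick, keep the column headers stable: the processor prompt will reference them by name.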

Create a QUEUE.md file in my workspace with this structure: a header section with the processor instructions, and a tasks table with columns for ID, Priority, Status, Description, and Notes. Add three example tasks: one HIGH priority PENDING, one MED priority PENDING, and one LOW priority DONE. Use the format the processor will expect when reading the file.

Atomic task claiming

The most important behavior in the processor is claiming a task atomically before running it. This means: the agent reads the queue, picks the highest-priority PENDING task, immediately writes the file with that task’s status changed to IN_PROGRESS, and only then starts the work. If the agent runs the work first and updates status after, a heartbeat that fires mid-task will try to claim and run the same task again.
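The claim-before-work ordering fits in a few lines. A hypothetical Python sketch, assuming the queue has already been parsed into a list of task dicts; the names `claim_next`, `PRIORITY_ORDER`, and `write_queue` are illustrative, not OpenClaw APIs:

```python
# Hypothetical sketch of atomic claiming over a parsed QUEUE.md.
PRIORITY_ORDER = {"HIGH": 0, "MED": 1, "LOW": 2}

def claim_next(tasks, write_queue):
    """Pick the highest-priority PENDING task and persist the claim
    to disk BEFORE any work starts."""
    pending = [t for t in tasks if t["status"] == "PENDING"]
    if not pending:
        return None
    task = min(pending, key=lambda t: PRIORITY_ORDER[t["priority"]])
    task["status"] = "IN_PROGRESS"
    write_queue(tasks)   # the claim hits disk first
    return task          # only now is it safe to run the work
```

The one invariant to notice: `write_queue` is called before the caller does anything with the returned task, so a heartbeat that fires mid-run sees IN_PROGRESS, not PENDING.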

Update my queue processor instructions to include atomic task claiming. The sequence should be: (1) read QUEUE.md, (2) find the highest-priority PENDING task, (3) immediately write QUEUE.md with that task’s status changed to IN_PROGRESS, (4) run the task, (5) write QUEUE.md with status DONE and add a timestamp to the Notes column. Show me the updated processor instructions.

Queue schema design

A minimal queue needs five columns: ID, Priority, Status, Description, and Notes. As your automation grows, you may want to add: a model column (which model to use for this task), a notify column (send a message on completion), a depends-on column (don’t run until another task is DONE), and a scheduled-after column (don’t run before this time). Add columns only when you have a concrete use for them. A six-column queue you actually use beats a twelve-column queue you built speculatively.

Review my current QUEUE.md schema. Based on the tasks I have queued, do I need any additional columns beyond ID, Priority, Status, Description, and Notes? Specifically: do any tasks need to run after other tasks complete (depends-on), do any tasks need a different model than the default, and do I want Telegram or Discord notifications for specific tasks only? Add columns only for the ones I actually need.

Piece 3: The completion signal

The completion signal closes the loop. After a task runs, the agent updates its status in QUEUE.md and optionally sends a notification. Without the notification, the queue is silent. You have to check the file to know what ran. With the notification, you get a message for every completed task and can stay informed without any manual checking.

Update my queue processor instructions to include a completion signal. For tasks where the Notes column includes “notify:yes”, send a Telegram message to my chat ID after the task completes. The message should include the task ID, a one-sentence summary of what was done, and the timestamp. For tasks without “notify:yes”, update QUEUE.md silently.

Error handling without human intervention

The basic queue loop works when tasks succeed. The interesting part is what happens when they fail. Without error handling, a failed task stays IN_PROGRESS forever, blocking the queue. The processor cannot skip it because it does not know the task failed. There is no error state in the schema.

The fix is a FAILED status and a retry count. When a task fails, the processor writes FAILED to the status and increments a retry counter in Notes. On the next heartbeat, the processor checks FAILED tasks before PENDING tasks: if the retry count is below the threshold, it resets the task to PENDING for another attempt. If the retry count exceeds the threshold, it sends a notification and leaves the task as FAILED for human review.
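That retry pass can be sketched as follows, assuming a numeric retry count has been parsed out of the Notes column into a `retries` field. `MAX_RETRIES`, `triage_failed`, and `notify` are illustrative names:

```python
# Hypothetical sketch of the FAILED/retry triage on each heartbeat.
MAX_RETRIES = 3

def triage_failed(tasks, notify):
    """Reset retryable FAILED tasks to PENDING; escalate exhausted ones."""
    for t in tasks:
        if t["status"] != "FAILED":
            continue
        if t.get("retries", 0) < MAX_RETRIES:
            t["status"] = "PENDING"  # gets another attempt next heartbeat
        else:
            notify(f"Task {t['id']} failed after {MAX_RETRIES} retries")
```

Run this before picking a new PENDING task, so retries take precedence over fresh work, as described above.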

Add error handling to my queue processor. When a task fails, write FAILED to the status and record the error message and retry count in the Notes column. On the next run, if a task has FAILED status and retry count is below 3, reset it to PENDING. If retry count is 3 or more, send me a Telegram message with the task ID and error, and leave it as FAILED. Show me the updated processor instructions and the QUEUE.md schema change to support retry count.

What “failure” means in practice: Most task failures are not catastrophic errors. They are API timeouts, rate limits, or network blips. Setting max retries to 3 with a 5-minute heartbeat means a transiently failing task gets three more attempts over 15 minutes before escalating to you. Most transient failures resolve within that window.

Task chaining and completion triggers

Task chaining runs task B only after task A has completed successfully. This is the depends-on column. The processor checks the depends-on value before running any PENDING task: if the dependency task is not DONE, it skips the dependent task and moves to the next one.
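The dependency check is a one-line filter on top of the usual PENDING scan. A hypothetical sketch, assuming the depends-on column has been parsed into a `depends_on` field and tasks are indexed by ID:

```python
# Hypothetical dependency gate; names are illustrative, not OpenClaw APIs.
def runnable(task, tasks_by_id):
    """A PENDING task is runnable only when its dependency (if any) is DONE."""
    if task["status"] != "PENDING":
        return False
    dep = task.get("depends_on")
    return dep is None or tasks_by_id[dep]["status"] == "DONE"
```

The processor applies `runnable` before the priority sort, so a blocked HIGH task is skipped rather than jamming the queue.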

I have a two-step workflow: task A generates a report, task B formats and sends it. Add a depends-on column to my QUEUE.md and set task B to depend on task A. Update my processor instructions to check the depends-on column before running any task. If the dependency is not DONE, skip the task on this run and try again on the next heartbeat.

Passing output between tasks

Task chaining that passes output from one task to the next requires a shared file. Task A writes its output to a known path (e.g., workspace/task-output/T001-output.md). Task B reads from that path. The processor does not need to manage this. It just runs the tasks in the right order. The task descriptions in QUEUE.md specify the read/write paths explicitly.
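In the queue file, that looks like two rows where the same path appears verbatim in both descriptions. The IDs and paths here are illustrative:

```markdown
| ID   | Priority | Status  | Description                                                                   | Notes           |
|------|----------|---------|-------------------------------------------------------------------------------|-----------------|
| T001 | HIGH     | PENDING | Generate the sales report. Write the result to workspace/task-output/T001-output.md |           |
| T002 | MED      | PENDING | Read workspace/task-output/T001-output.md, format it as an email, and send it | depends-on:T001 |
```

Because the path is spelled out in both rows, the processor never has to infer where the handoff happens.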

I want task A to write its output to workspace/task-output/ and task B to read from that path. Show me how to structure the task descriptions in QUEUE.md so the processor knows exactly where task A should write and where task B should read, without the processor needing to interpret or transform the output.

The processor prompt

The processor prompt is the instruction set the heartbeat cron fires. It is what turns a raw QUEUE.md file into an autonomous system. A good processor prompt is unambiguous about priority order, handles the atomic claim step explicitly, defines what to do when the queue is empty, and specifies error behavior.

Write me a complete queue processor prompt for my HEARTBEAT.md. It should: (1) read QUEUE.md, (2) check for any IN_PROGRESS tasks first (these indicate a crash mid-run and should be reset to PENDING), (3) pick the highest-priority PENDING task that has its dependencies met, (4) claim it atomically by updating status to IN_PROGRESS, (5) run the task, (6) update status to DONE with timestamp, (7) send a notification if notify:yes, (8) if the queue is empty, reply HEARTBEAT_OK. Include error handling as described.
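For reference, a minimal sketch of what such a HEARTBEAT.md could contain. The wording, statuses, and thresholds are assumptions to adapt to your own schema:

```markdown
# HEARTBEAT.md — queue processor (sketch; adapt columns and limits to your QUEUE.md)

1. Read QUEUE.md. If it is missing or has no tasks, reply HEARTBEAT_OK and stop.
2. If any task is IN_PROGRESS, check its claim timestamp in Notes. Older than
   30 minutes: reset it to PENDING and log the stuck state. Otherwise reply
   HEARTBEAT_OK and stop (a task is still running).
3. For each FAILED task: retry count below 3, reset to PENDING. Retry count 3
   or more, send one Telegram alert and leave it FAILED.
4. Pick the highest-priority PENDING task whose depends-on task (if any) is
   DONE. If none, reply HEARTBEAT_OK and stop.
5. Immediately write QUEUE.md with that task's status set to IN_PROGRESS.
   Only then start the work.
6. Run the task exactly as described. Write output to the file path in the
   description, never to chat.
7. Write QUEUE.md with status DONE (or FAILED plus the error and incremented
   retry count) and a timestamp in Notes.
8. If Notes contains notify:yes, send a Telegram message with the task ID,
   a one-sentence summary, and the timestamp.
```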

Scaling the queue

A queue running one task per heartbeat at a 5-minute interval processes a maximum of 12 tasks per hour, or roughly 288 tasks per day. For most personal automation setups, that is more than enough. If you have a backlog of 100+ tasks, the queue will clear them in under 9 hours at default settings without any intervention. The realistic limit for most setups is not throughput but task duration: a queue full of 20-minute tasks processes only 3 tasks per hour regardless of the heartbeat interval.

If you need higher throughput, the main lever is running multiple tasks per heartbeat fire rather than shortening the interval. The processor prompt can be modified to run up to N tasks per fire if the task queue depth exceeds a threshold. This is more efficient than shortening the interval because it avoids the overhead of session startup and teardown on every fire.
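The tradeoff between the two levers is simple arithmetic. A small hypothetical helper (not an OpenClaw API) for estimating backlog clear time:

```python
# Hypothetical backlog estimate; function name is illustrative.
def hours_to_clear(pending_tasks, interval_min=5, tasks_per_fire=1):
    """Hours to drain a backlog at a given heartbeat interval and batch size."""
    fires_needed = -(-pending_tasks // tasks_per_fire)  # ceiling division
    return fires_needed * interval_min / 60
```

At defaults, a 100-task backlog takes 100 fires, about 8.3 hours. Raising `tasks_per_fire` to 3 cuts that to under 3 hours without touching the interval, which is why batching is the preferred lever.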

How many tasks are currently in my QUEUE.md with PENDING status? Based on the typical task length and the current heartbeat interval, estimate how long it will take to clear the backlog. If it will take more than 4 hours, suggest whether to increase tasks-per-fire or shorten the interval, and explain the tradeoff for my specific setup.

Monitoring the queue without babysitting it

The goal is to know what is happening without watching it. Three patterns handle this: the activity log, the daily summary, and the exception alert.

The activity log records every task completion and failure to a file. The daily summary is a cron that fires once per day, reads the activity log for the past 24 hours, and sends you a digest. The exception alert fires immediately when a task exceeds its retry limit. Together, they give you complete visibility with zero active monitoring. The activity log is the most important of the three. Without it, the queue is a black box: you can see the current state of QUEUE.md but you have no history of what ran, when it ran, or what it produced. With the log, you can reconstruct the full execution history and diagnose any failure precisely.
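One line per event is enough for the activity log to be reconstructable. A possible format (timestamps, IDs, and paths are illustrative):

```
2026-02-11T09:05 T001 DONE    drafted weekly report -> workspace/reports/weekly.md
2026-02-11T09:10 T002 FAILED  retry 1/3: Telegram send timed out
2026-02-11T09:15 T002 DONE    retry succeeded, digest sent
```

Keeping it append-only and one event per line makes the daily summary a simple "read the last 24 hours" task.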

Set up a daily queue summary. Create a cron job that fires at 8am my local time, reads my queue activity log for the past 24 hours, and sends me a Telegram message with: total tasks completed, total tasks failed, any tasks currently stuck in IN_PROGRESS, and the three most recent task completions with their summaries. Use ollama/phi4:latest to keep it free.

Common queue failure patterns

Duplicate task runs

A task runs twice when the atomic claim step is missing or fails. If the processor reads the queue, starts the task, and then writes the IN_PROGRESS status after starting work rather than before, a heartbeat that fires mid-task will find the task still PENDING and claim it again. Fix: always write IN_PROGRESS before starting any work, not after. The cron scheduling guide covers duplicate run diagnosis in detail.

Tasks stuck IN_PROGRESS

A task gets stuck IN_PROGRESS when the agent crashes or the session ends after claiming a task but before marking it DONE. The processor prompt should handle this: on every heartbeat, check for any IN_PROGRESS tasks first. If a task has been IN_PROGRESS for longer than the maximum expected task duration (say, 30 minutes), reset it to PENDING and log the stuck state. The silent failure guide covers this pattern.
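The timeout sweep can be sketched like this, assuming the claim timestamp was recorded (as a `claimed_at` field here) when the task went IN_PROGRESS. The names and the 30-minute cutoff are illustrative:

```python
from datetime import datetime, timedelta

# Hypothetical stale-claim sweep run at the top of every heartbeat.
STUCK_AFTER = timedelta(minutes=30)

def reset_stuck(tasks, now):
    """Reset tasks claimed longer ago than STUCK_AFTER back to PENDING."""
    reset_ids = []
    for t in tasks:
        if t["status"] == "IN_PROGRESS" and now - t["claimed_at"] > STUCK_AFTER:
            t["status"] = "PENDING"
            reset_ids.append(t["id"])  # log these as stuck resets
    return reset_ids
```

Returning the reset IDs gives the processor something concrete to write to the activity log.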

Queue never clears

A queue that accumulates tasks faster than it processes them never reaches empty. Check two things: the rate at which new tasks are being added versus the processing rate, and whether any long-running tasks are blocking the queue. A single task that takes 20 minutes to run blocks 4 heartbeat cycles and slows the effective throughput to 3 tasks per hour. Break long tasks into smaller steps or move them to a separate slower-interval queue.

My queue is not clearing. Read QUEUE.md and tell me: how many PENDING tasks are there, how many IN_PROGRESS tasks (which may indicate stuck tasks), what is the oldest PENDING task, and based on the task descriptions, are any of them likely to take longer than my heartbeat interval to run?

Queue patterns for specific use cases

Content pipeline

A content pipeline queue typically has three stages: draft, review, publish. Task A drafts a piece to a file. Task B (depends on A) runs quality checks on the file. Task C (depends on B) publishes if checks pass, queues a human review task if they fail. The notify column on task C sends you a link to the published piece.
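In queue form, the three stages are just chained rows. The IDs and paths are illustrative:

```markdown
| ID   | Priority | Status  | Description                                                                          | Notes                      |
|------|----------|---------|--------------------------------------------------------------------------------------|----------------------------|
| C001 | HIGH     | PENDING | Draft the article to workspace/drafts/post.md                                        |                            |
| C002 | MED      | PENDING | Run quality checks on workspace/drafts/post.md; write pass/fail to workspace/drafts/post-check.md | depends-on:C001 |
| C003 | MED      | PENDING | If workspace/drafts/post-check.md says pass, publish; otherwise add a human-review task | depends-on:C002 notify:yes |
```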

Research and monitoring

A research queue runs searches, reads sources, and writes summaries to files. The daily summary cron then aggregates the files and sends you a digest. The key design decision: each research task writes to a dated file rather than appending to a shared file, so you can review any day’s research independently.

Operations and maintenance

An ops queue handles recurring maintenance tasks: log rotation, memory audits, usage reports, config backups. These run on a schedule rather than on demand. The pattern: use cron schedule fields to set the right cadence on each task rather than putting everything at the same priority and letting the queue decide order.

I want to set up a recurring ops queue for: weekly memory audit, monthly config backup, and daily log rotation. Show me how to structure these in QUEUE.md so they recur automatically rather than being one-shot tasks that I have to re-add after they complete. Include the right priority and any scheduling metadata.

Using isolated sessions for heavy tasks

The default queue processor runs tasks in the main session. For short tasks (under a minute), this works well. For long or resource-intensive tasks, running them in the main session has two downsides: the main session is blocked for the duration of the task, and if the task produces a very long output, it consumes context that reduces the effective working window for subsequent tasks.

The solution is isolated sessions for heavy tasks. An isolated session is a fresh agent context spawned specifically for the task and discarded when it finishes. The main session spawns it, waits for completion, reads the output file the task wrote, and continues. The main session’s context stays clean regardless of how much output the task generates.

My queue has some tasks that run for a long time and produce large outputs. For tasks where the Notes column includes “isolated:yes”, spawn an isolated session to run the task instead of running it in the main session. The isolated session should write its output to workspace/task-output/[task-id]-output.md and then exit. The main processor should read that file and update QUEUE.md when the isolated session completes. Show me how to update the processor instructions to support this.

Sub-agent model selection: Isolated sessions can use a different model than the main session. A task that requires careful reasoning can use deepseek-chat or claude-sonnet while the main processor loop uses a cheap local model. Set the model in the Notes column: “isolated:yes model:deepseek/deepseek-chat”. The processor reads this and passes it to sessions_spawn.

Debugging a queue that is not behaving

Queue problems rarely produce visible errors. The most common symptom is that tasks are not running, not completing, or running in the wrong order, with no error message explaining why. The diagnostic approach is always the same: work backward from the symptom to the component that failed.

Tasks are not running at all

Start with the heartbeat cron. Is it active and firing? A cron that was created correctly may since have been deleted, paused, or rate-limited. Check the cron list, verify the heartbeat is there, and check when it last fired. If the cron is active but tasks are not running, check HEARTBEAT.md. Does it contain the processor instructions? If HEARTBEAT.md was accidentally cleared or overwritten, the heartbeat fires but does nothing useful.

Diagnose my queue. Check: (1) is the heartbeat cron active and what was its last fire time, (2) does HEARTBEAT.md exist and contain processor instructions, (3) does QUEUE.md exist and have at least one PENDING task, (4) is there any task currently IN_PROGRESS that might be blocking processing. Report the state of each check.

Tasks are running in the wrong order

The processor picks tasks by priority. If tasks with the same priority are running in unexpected order, check whether the processor is reading the priority column correctly. A common cause: the QUEUE.md table formatting was changed and the processor is now reading a different column than intended. Verify the column header names exactly match what the processor prompt expects.

Completion notifications not arriving

Notification failures are almost always a channel configuration issue rather than a queue issue. The task completes and updates QUEUE.md correctly, but the send_message call fails silently. Check the Telegram or Discord channel configuration first. If the channel is working (test by asking your agent to send a test message directly), the issue is in how the processor is constructing the notification call. Check whether it is passing the correct chat ID and channel name.

Send me a test Telegram message right now: “Queue notification test from [timestamp]”. If that works, the channel is fine. If it does not, tell me what error occurred and what the current Telegram configuration shows.

Best practices that prevent most problems

Most queue problems come from not following a small set of invariants. These are the ones that matter.

Always claim before working. Write IN_PROGRESS to the task status before starting any task work. This is non-negotiable. A processor that writes status after the work means any interruption leaves the task in a state where it will be re-run on the next heartbeat.

Keep task descriptions self-contained. The task description in QUEUE.md should include everything the processor needs to run the task without external context: the output file path, the input source if applicable, the notification preference, any model preference. A task that requires the processor to remember something from a previous session will fail unpredictably after compaction. Think of each task description as a standalone instruction that would make sense to a fresh agent with no memory of prior sessions. If the description would confuse someone seeing it for the first time, it is not self-contained.
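A quick before-and-after, using an illustrative task row:

```markdown
Not self-contained (relies on a prior session's context):
| T010 | MED | PENDING | Finish the summary we discussed yesterday | |

Self-contained (a fresh agent can run it):
| T010 | MED | PENDING | Summarize workspace/research/2026-02-10-sources.md into workspace/summaries/2026-02-10.md, max 300 words | notify:yes model:ollama/phi4:latest |
```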

Write output to files, not to the chat. A task that produces output by writing to a file creates a durable artifact the next task can read. A task that produces output as chat text creates context overhead with no durability. The output file path should always be in the task description.

Use LOCAL models for the processor loop. The heartbeat fires constantly. Every API call it makes costs money. Use ollama/llama3.1:8b for the main processor loop. This model handles queue management tasks (reading QUEUE.md, picking a task, updating status, sending a notification) reliably and costs nothing in API fees. Reserve API models for tasks that explicitly require them, specified in the task Notes column. The processor itself never needs a frontier model.

Review my queue setup against best practices: (1) does the processor claim tasks atomically, (2) are task descriptions self-contained with output paths specified, (3) is the heartbeat using a local model, (4) do tasks write to files rather than producing chat output. Tell me which practices are missing and show me how to fix them.

When to use a queue vs. a direct cron job

Not every automation needs a queue. A direct cron job is simpler, cheaper to run, and easier to debug. Use the queue when the added complexity earns its keep. The biggest mistake in automation design is defaulting to a queue for everything. A queue built for tasks that do not need one is maintenance overhead with no return.

Use a direct cron job when: the task runs on a fixed schedule, has no dependencies, never needs to be prioritized above other tasks, and has no dynamic input. A daily briefing, a weekly usage report, a monthly config backup. These run at the same time with the same instructions every time. A queue adds nothing. If you can describe the task in a single cron schedule expression and it will never need to wait for another task, use a cron job.

Use a queue when: tasks arrive dynamically (you add them as work comes in), tasks have different urgency levels that affect run order, tasks depend on other tasks completing first, or you need retry logic when tasks fail. A content pipeline where articles move through draft/review/publish stages. A research workflow where searches feed into summaries. An inbox triage where priorities change based on what arrives.

Review the automations I have set up. For each one, tell me: should it stay as a queue task or would it be better as a direct cron job? Consider: does it need priority ordering, does it have dependencies, does it arrive dynamically, does it need retry logic. Give me a recommendation for each and the config change if I should switch it.

Setup sequence: the right order

Order matters when setting up the queue. Getting the cron running before the queue file exists means the heartbeat fires with nothing to process and you cannot tell whether it is working. Getting the queue file set up before writing the processor prompt means the processor has no instructions to follow. The right sequence:

  1. Write HEARTBEAT.md with the processor prompt
  2. Create QUEUE.md with at least one test task
  3. Run the processor manually once to confirm it works
  4. Create the heartbeat cron job
  5. Verify the cron fires and processes the test task
  6. Add error handling to HEARTBEAT.md
  7. Add real tasks to QUEUE.md

Step 3 (manual run) is the most commonly skipped. It is also the step that catches 80% of setup problems before the cron is involved. A processor that fails manually will fail on every cron fire. Running it manually once with a test task in the queue proves the basic loop before adding the complexity of scheduled firing. Common things the manual run catches: the processor instructions reference a file that does not exist yet, the task description is ambiguous enough that the processor takes the wrong action, the notification channel is misconfigured, or the atomic claim step is missing from the processor instructions.

Run the queue processor manually right now as a test. Read HEARTBEAT.md and follow the processor instructions for the current state of QUEUE.md. Report: what was the highest-priority PENDING task, did you claim it and run it, what was the result, and is the QUEUE.md status updated correctly?

Frequently asked questions

Does the heartbeat cost money every time it fires?

Only if you use an API model for it. A heartbeat using ollama/llama3.1:8b or ollama/phi4:latest runs at zero API cost. The session has startup overhead (context loading) and a small compute cost on your server, but no external API call. The recommendation: always use a local model for heartbeats. If local models are not available, use the cheapest API model in your config and set the heartbeat to fire less frequently (every 15-30 minutes) to keep costs low. At 30-minute intervals with a cheap API model, the heartbeat costs under $0.50/month.

When should I use a queue versus a direct cron job?

Use a direct cron job for tasks that run on a fixed schedule with no dependencies and no dynamic priority: daily briefings, weekly reports, monthly backups. Use a queue for tasks that: (1) have variable urgency and need priority ordering, (2) depend on other tasks completing first, (3) are added dynamically as new work arrives, or (4) need retry logic on failure. Most setups use both: cron for scheduled recurring tasks, queue for on-demand work and workflows with dependencies.

How many tasks can the queue handle before it gets slow?

The queue file is just a markdown table. Reading and writing it takes under a second at any size. Performance does not degrade with queue length. The practical limit is throughput: one task per heartbeat at a 5-minute interval processes 288 tasks per day. If you need more throughput, increase tasks-per-fire rather than shortening the interval. A queue with 1,000 pending tasks clears in under 4 days at default settings.

My queue stopped processing and I don’t know why

This is one of the most disorienting problems in autonomous setups because the agent appears to be running normally (the gateway is up, sessions start correctly) but nothing from the queue is executing. Four common causes: (1) the heartbeat cron was deleted or disabled, (2) a task is stuck IN_PROGRESS and blocking the processor, (3) HEARTBEAT.md was deleted or has a syntax error the processor cannot parse, (4) the local Ollama model the heartbeat uses is not running. Check in that order. The fastest diagnostic: ask your agent to read QUEUE.md and tell you the current state of every task, then check whether the heartbeat cron is active. The silent failure guide covers this diagnosis end to end.

Can I run multiple queues for different purposes?

Yes. Use separate queue files: QUEUE.md for general tasks, CONTENT-QUEUE.md for the content pipeline, OPS-QUEUE.md for maintenance. Each queue file gets its own heartbeat cron with its own interval: the general-purpose queue might fire every 5 minutes while the ops queue fires every 60 minutes. Each processor reads and writes only its own file, so the queues do not interfere. The downside is overhead: each queue needs its own dedicated heartbeat cron and processor instructions, and monitoring across queues means reading multiple files. Alternatively, use a single QUEUE.md with a category column and separate processors per category. The single-file approach is simpler to monitor; the multi-file approach is cleaner for high-volume setups where categories have very different priorities and processing logic.

Can different tasks use different models?

Yes. The simplest approach: include a model preference in the task Notes column (for example, “model:deepseek/deepseek-chat”). The processor reads this and adjusts the model it uses for that task. For tasks that do not specify a model, the processor defaults to the heartbeat model (which should always be a local model for cost reasons). This lets you run routine tasks cheaply on local models while reserving API models for tasks that genuinely need better output quality. The cost difference is significant: a task running on phi4:latest via Ollama costs nothing in API fees. The same task on claude-sonnet costs roughly $0.01-0.05 depending on length. Over 100 tasks per week, routing correctly saves $5-20 per week.

What happens if a task takes longer than the heartbeat interval?

The next heartbeat fires while the task is still running. If the task was properly claimed (status set to IN_PROGRESS before work started), the processor on the next heartbeat will see IN_PROGRESS and skip it. The task completes at its own pace and the processor picks up the next PENDING task on the following heartbeat. If the atomic claim was not implemented, the processor will try to run the same task again, causing duplicate execution. This is why atomic claiming is non-negotiable, not optional. See the full guide in cron job pileup prevention.

Complete fix

Queue Commander

The complete autonomous task system. Includes the full processor prompt, QUEUE.md schema, completion signal config, error handling with retry logic, and task chaining patterns. Drop it into your agent and the queue starts running on the next heartbeat.

Get it for $67 →


Keep Reading:

My OpenClaw cron job ran twice, or never ran at all: duplicate runs, missed fires, and the timing edge cases behind both. With diagnostic commands.

My OpenClaw agent failed overnight and I didn’t find out until morning: retry logic, failure states, dead letter handling. How to keep the queue moving when a task breaks.

Task B ran before Task A finished and everything broke: dependency ordering, output passing between tasks, and multi-step workflows that don’t need supervision.