OpenClaw tool calls are failing silently

Silent tool call failures are one of the hardest OpenClaw problems to catch because the agent does not stop or report an error. It continues the conversation, generates a response that looks correct, and the failure is only visible when you check whether the expected outcome actually happened. A file that was supposed to be written was not. A message that was supposed to be sent was not. A cron job that was supposed to be created is missing. This article covers how to surface these failures, diagnose their causes, and prevent them from recurring.

TL;DR

  • Silent failures happen when a tool returns an error the agent treats as non-fatal or when the agent skips verification after a tool call.
  • The Logician Gate in AGENTS.md is the primary prevention: verify the outcome, not the tool success message.
  • Check the gateway logs for tool call error codes that the agent did not surface in the conversation.
  • Most common causes: permission errors, missing dependencies, wrong paths, and tool approval policies blocking calls silently.

Throughout this article you will see indented blocks like the ones below. Each one is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal or edit any files manually.

Why OpenClaw tool calls fail silently

An OpenClaw tool call can fail silently in two distinct ways. The first: the tool is called, it returns an error code or exception, but the agent interprets the error as non-fatal and continues the conversation without surfacing it. The second: the tool is never called at all because an approval policy, security setting, or missing dependency prevents it, and the agent generates a response that implies it ran the tool and completed the task without actually doing either.

The agent is trained to be helpful and to complete tasks. When a tool call fails, the model’s default behavior is to try to work around the failure rather than stop and report it. That workaround can read as a successful response on a casual skim even when nothing was actually written, sent, or changed. The Logician Gate in AGENTS.md exists precisely to override this behavior: verify the outcome on disk or in the system before declaring the task done.

I suspect a tool call may have failed silently. Check the gateway logs for the last 15 minutes. List every tool call that returned a non-200 status, an error object, or an exception. Show me the tool name, the parameters used, and the exact error for each one. I want to see failures the agent may not have surfaced in the conversation.

The gateway logs are the ground truth for what tool calls actually did. A tool call that looks successful in the conversation can show a clear error, timeout, or block in the logs. Checking the gateway logs is the first diagnostic step whenever you suspect a silent failure. The conversation is what the agent told you. The logs are what actually happened.

The most common causes of silent tool failures

Silent tool call failures in OpenClaw cluster around a small set of root causes. Knowing which ones to check first saves time during diagnosis.

Permission and approval policy blocks: The exec tool, file write tool, and certain channel operations require approval depending on your security config. If the approval policy is set to deny a class of commands and the agent does not receive explicit approval, the call is blocked. Some approval modes return a pending state rather than an explicit error, which the agent may interpret as a success condition.

Wrong file paths: A write tool call that targets a path the agent does not have write access to, or a path that does not exist and whose parent directory was not created, returns a filesystem error. The agent may not check whether the file was actually created after the write call.

Missing environment dependencies: An exec call that runs a command requiring a binary that is not installed, a Python module that is not available, or an environment variable that is not set will fail at the system level. The error may appear in the exec output but the agent may not parse it as a failure if the exit code handling is ambiguous.
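One cheap guard against the missing-dependency case is a preflight check before the real command runs. This is a generic shell sketch, not an OpenClaw feature; the binary names below are illustrative, with one deliberately nonexistent name to show what a failure report looks like:

```shell
# Preflight: confirm each required binary exists before running the real command,
# so a missing tool is reported explicitly instead of failing mid-command.
missing=""
for bin in bash grep definitely-not-installed-xyz; do
  command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
done
if [ -n "$missing" ]; then
  echo "PREFLIGHT FAILED: missing binaries:$missing"
else
  echo "PREFLIGHT OK"
fi
```

An agent instructed to run a preflight like this before an exec task turns a mid-task "command not found" into an explicit, up-front failure report.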

API credential failures: A tool that requires an external API (web search, Telegram, Discord, memory recall) will fail silently if the credential is invalid, expired, or missing from the config. The tool returns an authentication error that the agent may treat as a temporary issue and skip past.

Rate limit suppression: When a rate-limited API returns a 429, the agent sometimes generates a response based on what it expected the tool to return rather than what it actually returned. This is particularly common with web search tools during high-frequency research tasks.

Run a tool call diagnostic. Check each of the following and report pass or fail: (1) exec tool can run a simple echo command, (2) write tool can write a test file to /tmp/test-gambit.txt, (3) web search returns results for a simple query, (4) memory recall returns any results, (5) Telegram send works by sending me a test message. These five checks cover the most common silent failure categories.

Diagnosing silent exec tool failures

The exec tool is the most common source of silent failures in OpenClaw because shell commands can fail in many ways that produce output that looks like success to a model reading it. A command that exits with code 1 has failed by convention, but if the partial output it produced before failing looks plausible, the agent may interpret it as a success and move on without flagging the failure.
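A minimal illustration of why plausible output is not success. The command below is a stand-in for any exec call: it prints believable output and then fails, and only the exit code reveals the failure:

```shell
# A command whose output looks fine but whose exit code says otherwise.
out=$(bash -c 'echo "Processed 3 records"; exit 1')
rc=$?

echo "output: $out"
echo "exit code: $rc"
# Checking only the text would pass; checking the exit code catches the failure.
if [ "$rc" -ne 0 ]; then
  echo "FAILED despite plausible output"
fi
```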

Run this diagnostic: execute the following command and show me the complete raw output including any stderr: bash -c 'echo "STDOUT_TEST" && echo "STDERR_TEST" >&2 && exit 0'. I want to confirm the exec tool is capturing both stdout and stderr and returning both to you. If you only see STDOUT_TEST, stderr is being suppressed.

A common exec failure pattern is a command that requires sudo or elevated permissions. The command runs, produces a permission error on stderr, but stdout may be empty, which the agent parses as a clean (if empty) result. The fix is to check exec output explicitly for permission denied, command not found, and non-zero exit codes.
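The empty-stdout-plus-stderr pattern can be made impossible to miss by capturing stdout, stderr, and exit code separately. A generic sketch; it uses a nonexistent path rather than a real permission error so it is self-contained, but the shape of the check is the same:

```shell
# Capture stdout, stderr, and exit code separately so an error with empty
# stdout cannot be mistaken for a clean empty result.
err_file=$(mktemp)
out=$(cat /no/such/path/at-all 2>"$err_file")
rc=$?
err=$(cat "$err_file")
rm -f "$err_file"

if [ -z "$out" ] && [ "$rc" -ne 0 ]; then
  echo "SILENT-FAILURE CANDIDATE: rc=$rc stderr=$err"
fi
```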

For any exec tool call that produces an empty stdout response, check whether the exit code was non-zero and whether stderr contains error text. From now on, when an exec call produces empty stdout, treat that as a potential failure and show me the full raw output including stderr and exit code before assuming it succeeded.

The exec approval policy is another exec failure source. If execApprovals.enabled is true and the command requires approval but no approval is given, the exec call returns a pending state. The agent should report this explicitly rather than treating it as a success, but in some model responses the pending state is swallowed.

Read my openclaw.json and show me the current execApprovals settings. Is exec approval currently enabled? What approval mode is set? Have any exec calls been blocked in the last session due to pending approval status?

Diagnosing silent file write failures

A file write tool call that fails silently is particularly damaging in an OpenClaw context because the agent believes the file exists and continues referencing it in subsequent tool calls as if the write succeeded. If the next step reads the file the failed write was supposed to create, it either gets a file-not-found error that itself gets swallowed, or it reads stale content from a previous version of the file and the agent processes outdated data as if it were current.

After every file write operation, verify the file was actually written. Run: ls -la [FILE_PATH] and confirm the file exists, has a non-zero size, and has a modification timestamp from the last few seconds. If any of these checks fail, the write did not succeed even if the write tool reported success.
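The read-back check in the prompt above can be sketched as plain shell. This is a generic verification pattern, reusing /tmp/test-gambit.txt from the earlier diagnostic as the example path:

```shell
# Write, then verify the outcome rather than trusting the write step.
target=/tmp/test-gambit.txt
echo "expected content" > "$target"

# Verification: file exists, is non-empty, and was modified within the last minute.
if [ -s "$target" ] && [ -n "$(find "$target" -mmin -1)" ]; then
  echo "VERIFIED: $target written"
else
  echo "WRITE FAILED: $target missing, empty, or stale"
fi
```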

The most common file write failures:

  • Parent directory does not exist: Writing to /home/node/.openclaw/workspace/subdir/file.md fails if subdir/ does not exist. The write tool does not create parent directories automatically on all configurations.
  • Disk full: A write to a full disk silently truncates or fails. Check available disk space if writes are failing unexpectedly.
  • Path contains special characters: Filenames with spaces, colons, or other special characters can cause write failures depending on how the path is passed to the tool.
  • File is locked: A file open in another process can block a write. This is rare but happens with log files and database files.

Check available disk space on the server. Run: df -h /home/node/.openclaw/workspace and show me the used and available space. If available space is under 500MB, that is likely causing write failures. Also check: ls -la /home/node/.openclaw/workspace to confirm the workspace directory exists and is writable.

Diagnosing silent API tool failures

Tools that call external APIs including web search, Telegram, Discord, memory, and OpenAI can fail silently when credentials are invalid, have expired, or the API endpoint is temporarily unavailable. The failure mode varies by tool and provider: some return an empty result set that looks identical to a legitimate empty response, some return an error object the agent parses as a partial success and works around, and some time out after a delay with no result returned to the agent at all.

Test my external API connections. For each of the following, run a minimal test call and report whether it succeeded or failed with the exact error: (1) web search for “openclaw”, (2) memory recall for “Gambit”, (3) Telegram send test message to my chat ID. If any fail, show me the exact error response including any HTTP status code.

Credential-related failures have a specific pattern: the tool call is made, an authentication error (401 or 403) is returned, and the agent may retry once before producing a response that implies the operation worked. Checking the gateway logs for 401 and 403 errors from any tool in the last session will surface these quickly.
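If your gateway writes plain-text logs, a grep like the one below surfaces auth errors directly. The log path and line format here are assumptions, not the real OpenClaw log schema; substitute whatever your deployment actually produces. The sketch builds a fake log so it is self-contained:

```shell
# Scan a gateway log for auth errors (401/403).
# The path and "status=" line format are illustrative assumptions.
log=/tmp/fake-gateway.log
cat > "$log" <<'EOF'
2025-01-10T09:00:01 tool=web_search status=200
2025-01-10T09:00:05 tool=telegram_send status=401 error=invalid_token
2025-01-10T09:00:09 tool=memory_recall status=403 error=forbidden
EOF

auth_errors=$(grep -E 'status=(401|403)' "$log")
echo "$auth_errors"
```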

Check the gateway logs for any 401 or 403 errors from tool calls in the last 24 hours. List the tool, the timestamp, and the endpoint that returned the auth error. If any credentials appear to be failing, I want to know which ones so I can rotate or re-enter them.

Memory recall returning empty is not always a failure

A memory recall that returns zero results is not necessarily a tool failure. It may mean the query did not match any stored memories. Verify that memories have actually been stored before treating an empty recall as a failure. Run memory_stats to confirm the database has records, then retry the recall with a broader query. Only if the database has records and a broad query still returns nothing is the tool likely failing rather than returning a legitimate empty result.

The Logician Gate: verify outcomes, not tool messages

The Logician Gate in AGENTS.md is the primary structural defense against silent tool call failures in OpenClaw. The rule is deliberately simple: done means verified on disk, in config, or in the live system. A success message from the tool is not done. These are different things, and conflating them is the root cause of most silent failure damage. Every substantive tool call needs a follow-up verification step that confirms the expected outcome actually occurred before the conversation moves on.

Check AGENTS.md for the Logician Gate section. Show me what it says. Is the verification step currently being applied after tool calls that write files, send messages, or make config changes? If the Logician Gate section is missing or incomplete, I want to add explicit verification rules for each tool category.

Logician Gate verification by tool category:

  • File write: Read the file back and confirm it exists with the expected content.
  • Config change: Run openclaw config get [path] and confirm the value is set correctly.
  • Exec command: Check the exit code, stdout, and stderr. Confirm the expected side effect (file created, service restarted, etc.).
  • Message send: Confirm delivery acknowledgment from the channel API. A 200 response from Telegram or Discord confirms the message was accepted.
  • Cron job create: List cron jobs after creation and confirm the new job appears with the correct schedule.
  • Memory store: Run memory recall for the stored content and confirm it returns.
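The per-category checks above share one shape: run the action, then run an independent check of the outcome, and only call the task done when the check passes. A minimal generic wrapper, sketched in shell under the assumption that both the action and its verification are expressible as commands:

```shell
# Logician Gate as a shell function: "done" means the verification passed,
# not that the action command exited zero.
gate() {
  action="$1"
  verify="$2"
  sh -c "$action" || { echo "ACTION FAILED: $action"; return 1; }
  sh -c "$verify" || { echo "VERIFICATION FAILED: $verify"; return 1; }
  echo "VERIFIED: $action"
}

# Example: a file write verified by reading the file back.
gate 'echo hello > /tmp/gate-demo.txt' 'grep -q hello /tmp/gate-demo.txt'
```

The same shape applies to every category in the list: the verify step is always an independent read of the system, never a re-statement of the action's own success message.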

Add explicit Logician Gate verification steps to AGENTS.md for each tool category above. After any tool call in one of these categories, I want you to run the corresponding verification step before reporting the task as complete. Never declare a task done based on the tool’s success message alone.

Tool approval policies that block silently

OpenClaw’s exec approval system can silently block tool calls when the approval mode is set to deny certain command patterns. A blocked exec call returns a denial or pending state that the agent should surface explicitly but sometimes treats as a non-event.

Show me my current exec security settings. Read openclaw.json and show me the execApprovals config and the exec security mode setting. Are there any command patterns currently blocked? Have any exec calls been silently denied in the last session? Check gateway logs for any “approval required” or “exec denied” events.

If you are seeing tool call results that imply commands ran but finding no evidence they did, a silent approval block is the most likely cause. The fix is either to adjust the approval policy to allow the commands you intend, or to explicitly approve them when the approval prompt appears rather than proceeding without approval.

For the exec calls that are being silently blocked, show me the exact command patterns being blocked and the current approval policy rule that is blocking them. I want to decide whether to adjust the policy to allow these commands or to add explicit approval for each one.

Tool call timeouts and partial results

Some tool calls have internal timeouts. A web fetch that takes too long returns a partial result or a timeout error. A memory extraction that runs on a slow local model may time out before completing. These timeout failures can produce partial data that the agent treats as complete.

Check the gateway logs for any tool call timeout events in the last 24 hours. Show me: the tool that timed out, the timeout duration, and how many times it occurred. If the memory extraction tool timed out, check whether the configured timeout in the plugin matches the actual extraction time for the model being used.

The most common timeout silent failure is memory extraction. If the local model used for extraction (llama3.1:8b or phi4) takes longer than the configured timeout to complete, the extraction returns empty or partial results without an explicit error. The AGENTS.md source patch note (90s timeout for the memory plugin) exists for exactly this reason.

Verify the memory plugin timeout patch is still applied. Read the memory plugin source file at the path documented in AGENTS.md and confirm the timeoutMs value is set to 90000 (or whatever was patched in). If the patch was overwritten by a plugin update, re-apply it now.

Making silent failures loud

The long-term fix for OpenClaw silent tool failures is a set of configuration changes and AGENTS.md rules that together make failures explicit and loud rather than silently swallowed. No single setting covers all failure modes, but the combination below pushes the agent toward explicit failure reporting across all the tool categories where silent failures are most common.

I want to add a failure reporting rule to AGENTS.md. The rule: after any tool call that returns an error, exception, empty result when a non-empty result was expected, or non-zero exit code, report the failure explicitly before continuing. Do not generate a response that implies success. State: “Tool call failed: [tool name] returned [exact error]. I cannot complete [task] without resolving this. To proceed, I need [specific requirement].” Show me the current AGENTS.md content before making changes.

This explicit failure declaration rule, added to AGENTS.md, eliminates the large majority of silent failures by making the agent report failures in the conversation rather than working around them. Combined with the Logician Gate verification requirement, the agent has two explicit checkpoints that prevent a failed tool call from being buried in a successful-looking response.

Building a verification habit into every task

The most effective long-term fix for OpenClaw silent tool failures is not a single config setting but a verification habit embedded explicitly in the agent’s operating instructions via AGENTS.md. Every substantive tool call gets a verification step. Not as an afterthought, but as part of the definition of what “done” means.

This is a meaningfully different framing from standard error handling. Error handling catches failures after they surface. A verification habit embedded in operating instructions prevents the conversation from ever moving past a step that did not actually complete, before downstream operations use the failed output as if it were valid. The distinction matters in practice: a silent failure caught immediately by a verification step costs one extra tool call. A silent failure that is not caught propagates through the next three steps of a task chain and costs far more to untangle than it would have to catch at the source.

Review my last five completed tasks. For each one, did I run a verification step after the critical tool calls? Specifically: if a file was written, did I read it back? If a cron job was created, did I list jobs to confirm it? If a config change was made, did I read the config to verify the value? Report which tasks had verification and which did not.

If verification steps were skipped, the completed tasks may have silent failures sitting in them. For any task where the outcome has not been verified, the Logician Gate requires going back and checking now rather than assuming the tool success message was accurate.

Silent failures in chained tool calls

A chained tool call sequence, where the output of one tool is the input to the next, amplifies silent failures. If step 2 in a chain fails silently, step 3 uses incorrect input. Step 4 uses step 3’s incorrect output. By the time you see the final result, the error has propagated through multiple steps and the source is hard to trace.

I am about to run a multi-step tool chain. Before proceeding, I want a verification checkpoint after each step in the chain. After each tool call, confirm the output is what was expected before using it as input to the next step. If any step produces unexpected output (empty when non-empty was expected, error when success was expected, wrong format), stop the chain and report before continuing.

The specific pattern to watch for in tool chains: a read tool call that returns empty because the file does not exist, followed by a write tool call that writes empty content to a new file, followed by a publish or send that ships the empty content. Each step succeeds technically. The chain produces garbage output. The verification checkpoint between steps 1 and 2 catches this by flagging the empty read before the chain continues.
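The empty-read guard between chain steps can be sketched in a few lines. A generic pattern; the paths are illustrative, and the upstream step is simulated so the sketch is self-contained:

```shell
# Chain guard: never feed an empty read into the next step.
src=/tmp/chain-source.txt
echo "real content" > "$src"   # simulate the upstream step that produced the file

content=$(cat "$src" 2>/dev/null)
if [ -z "$content" ]; then
  echo "CHAIN STOPPED: read of $src returned empty"
else
  echo "$content" > /tmp/chain-output.txt   # safe to continue the chain
  echo "CHAIN CONTINUED"
fi
```

Delete the simulated write on the second line and the same sketch stops the chain instead, which is exactly the behavior the verification checkpoint is meant to force.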

For any tool chain involving a file read followed by a write or publish: verify the read returned non-empty content before proceeding to the write or publish step. If the read returns empty or an error, stop immediately. Do not write an empty file or publish empty content. Report the empty read as a failure and ask for the correct file path or source before continuing.

Silent failures in cron job tool calls

Cron jobs run in isolated sessions, so their tool calls execute with no human watching. A silent failure in an isolated cron session produces no visible error in your Telegram or Discord channel unless the cron job has a delivery config that reports failures. Most default cron delivery configs only send on completion, not on error, so a failed cron run goes unnoticed.

List all my cron jobs. For each job, show me the delivery config. Which jobs have delivery configured to report on error or failure? Which jobs only report on success (or not at all)? I want to know which cron jobs could be failing silently with no notification to me.

Adding error delivery to critical cron jobs is a straightforward fix. Set the delivery mode to announce and configure the channel so failures are reported the same way successes are. A cron job that fetches data, processes it, and stores results should report both “completed successfully” and “failed: [error]” rather than silently doing nothing on failure.

Update my most critical cron jobs to include error reporting. For each job that currently has no error delivery, add delivery config that sends me a Telegram message if the job fails or produces an empty result. Show me the proposed delivery config for each job before applying the changes.

Plugin-level silent failures

Some OpenClaw plugins have internal error handling that catches exceptions and returns a default value rather than propagating the error. This is a deliberate design choice in some plugins to prevent one failing plugin from crashing the entire agent turn. But it produces silent failures when the plugin’s default value (often empty or null) is indistinguishable from a legitimate result.

Check the gateway logs for any plugin-level error events in the last 24 hours. Look for: unhandled exceptions, plugin initialization errors, plugin config validation failures, and any plugin that logged a warning but did not propagate an error. Show me the plugin name and the error or warning for each event.

Plugin initialization failures are a common silent failure source. If a plugin fails to initialize on gateway startup (bad config, missing dependency, invalid credential), it is registered but non-functional. Tool calls routed to that plugin return empty or error, and the agent may interpret the consistently empty returns as the plugin’s normal behavior rather than a failure state.

Check the gateway startup logs from the last time the gateway started. Did all plugins initialize successfully? Are there any plugin initialization warnings or errors? List every plugin that had a non-success initialization event and show me the exact error. A plugin that failed to initialize will produce silent failures on every tool call routed to it.

Running a systematic tool call audit

If you have reason to believe silent failures have been occurring over a period of time, a systematic audit of recent outcomes is the most direct way to surface them. The audit compares what the agent reported doing against what actually exists in the system.

Run a tool call audit for the last 7 days. Check the following: (1) List all files in the workspace that were created or modified in the last 7 days and confirm they have non-zero content, (2) List all cron jobs and confirm each one’s last run time matches its expected schedule, (3) Check memory_stats to confirm the record count is consistent with expected memory activity, (4) Verify the last 5 Telegram messages the agent sent were actually delivered (check delivery receipts in the gateway logs). Report any discrepancy between what was reported done and what is verified to exist.

The audit may surface silent failures that occurred days ago. For each one found, trace the root cause, fix the underlying issue, and re-run the affected task if the outcome still matters. Failures older than a few days may be irrelevant if the task context has changed, but documenting them helps identify systemic patterns.

How different models handle tool failures differently

The model driving the agent has a significant effect on whether tool failures surface explicitly or get swallowed. Models with stronger instruction-following tend to respect explicit failure-reporting rules. Models optimized for task completion tend to work around failures rather than stop and report them. Understanding this helps you choose the right model for tasks where silent failures are a serious risk.

The primary model in this deployment is deepseek-chat, which has good instruction-following for explicit rules. When AGENTS.md contains a clear failure reporting rule, it applies it consistently. The risk is during fallback: if deepseek-chat hits a rate limit and falls back to a local model like llama3.1:8b or phi4, those models have weaker instruction-following and the explicit failure reporting rules are less reliably applied.

Which model was running during any recent silent tool failures I experienced? Check the gateway logs for model switches during the relevant time period. If the failure occurred during a fallback to a local model, that may explain why the failure reporting rule was not followed. Show me any model switches in the last 24 hours and the tool calls that followed each switch.

If silent failures cluster around fallback model periods, the fix is to stop routing sensitive tasks through local models. Write tasks, config changes, and external API calls should be routed to the primary model specifically. Use local models for read-only, non-critical operations where a silent failure is detectable and recoverable. Never route a task that sends a message, modifies a file, or changes config to a local model as the primary handler.

Verifying the fix actually resolved the silent failure

After identifying and addressing a silent tool failure, the fix needs to be verified the same way the original failure should have been verified. A fix that itself fails silently is worse than no fix, because you now have incorrect confidence that the issue is resolved.

The fix for the silent tool failure I just addressed was [describe the fix]. Now verify the fix works: re-run the tool call that was failing and confirm it succeeds. Then run the Logician Gate verification: confirm the expected outcome exists on disk or in the system. Do not report the fix as complete until both the tool call succeeds AND the expected outcome is verified.

One additional check that is worth running after any silent failure fix: reproduce the original failure condition and confirm the fix prevents it. If the failure was a missing parent directory, create a test with a missing parent and confirm the agent now creates it or reports the error explicitly. If the failure was a bad credential, temporarily provide an incorrect credential and confirm the agent reports an auth error rather than a silent failure.

Testing failure conditions directly is the highest-confidence verification that the fix actually works rather than just working under ideal conditions.

Frequently asked questions

How do I know if a tool call failed silently versus just being slow?

Speed is not a reliable signal. A slow tool call that eventually succeeds looks similar to a timed-out tool call in the short term. The reliable signal is the gateway logs: a successful (if slow) tool call shows a completion entry with a result. A timed-out or failed tool call shows an error entry or a timeout entry. If the conversation implies a tool ran but you do not see a completion entry in the logs for that tool within a reasonable time window, the call failed rather than being slow.
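At the command level, GNU coreutils `timeout` makes the slow-versus-failed distinction explicit: a command killed for exceeding its time limit returns exit code 124, which is distinguishable from both success and an ordinary failure. A minimal sketch:

```shell
# Distinguish "slow" from "timed out": timeout(1) returns 124 on timeout.
timeout 1 sleep 3
rc=$?
if [ "$rc" -eq 124 ]; then
  echo "TIMED OUT (rc=124), not merely slow"
else
  echo "completed with rc=$rc"
fi
```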

My agent said it wrote a file but the file is not there. What happened?

The write tool call failed and the agent did not check the outcome before reporting it as done. This is the Logician Gate failure mode. The most common causes are: the parent directory did not exist, the agent used a relative path that resolved to a different location than intended, or disk space was insufficient. Check the gateway logs for the write tool call and look for the exact error. Then verify the intended path and parent directory exist before retrying the write.
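The relative-path cause is easy to demonstrate: the same relative filename resolves to different absolute locations depending on the working directory, which is how a "successful" write ends up somewhere other than where you looked. A generic sketch with illustrative paths:

```shell
# The same relative path resolves differently depending on the working directory.
mkdir -p /tmp/wd-a /tmp/wd-b
( cd /tmp/wd-a && echo "from A" > notes.txt )
( cd /tmp/wd-b && echo "from B" > notes.txt )

# Two different files now exist under the same relative name "notes.txt".
ls /tmp/wd-a/notes.txt /tmp/wd-b/notes.txt
```

This is why the verification step should always check the absolute path you intended, not just re-run the relative write.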

Can I get the agent to always show me raw tool output?

You can ask for it in the conversation, but the more reliable approach is adding a rule to AGENTS.md that requires showing raw exit codes and stderr for exec calls, and read-back verification for write calls. Asking in the conversation works for the current session. The AGENTS.md rule persists across sessions and applies consistently without requiring a reminder each time.

Why does memory recall return empty even though I can see memories were stored?

The most common cause is a scope mismatch. Memory was stored under the agent:main scope, but the recall query is hitting a different scope, or no scope, so the plugin is not searching where the memories live. Check the memory_stats call with --scope agent:main to confirm records exist in the correct scope, then run the recall with the scope explicitly specified. If the database shows records but recall returns empty even with the correct scope, the LanceDB index may need to be rebuilt or the embedding model is returning vectors that do not match what was stored.

The agent sent a message to Telegram and confirmed it, but I never received it. What happened?

The Telegram API accepted the message (returned 200) but either the chat ID is wrong or the bot does not have permission to message that chat. A 200 from Telegram means the API received the request, not that the message was delivered to the recipient. Check the chat ID in your config against the actual Telegram chat ID for your conversation with the bot. Also verify the bot has not been blocked or restricted in that chat. The send tool can report success on a 200 even if the message was never shown to you.

I updated OpenClaw and now tool calls that worked before are failing silently. What changed?

A tool permission config or a plugin update changed the behavior. The most common post-update silent failure causes are: a new exec security setting that denies a command class that was previously allowed, a plugin update that changed how tool errors are formatted (causing the agent to misparse the error as success), or a config migration that reset a tool-specific setting to a more restrictive default. Check the gateway logs for the tool call in question and compare the error format to what you saw before the update.

Is there a way to test all tool calls at once without sending real messages or writing real files?

Run the diagnostic block from the “most common causes” section above, which tests exec, write, web search, memory, and Telegram in sequence. For the write test, use a temp path (/tmp/test-gambit.txt) so the file can be discarded afterward. For Telegram, a test message to yourself is the cleanest option since there is no dry-run mode for message sends. The diagnostic covers the five categories that account for the large majority of silent tool failures in a typical OpenClaw deployment.


Go deeper

  • Debugging: My OpenClaw agent is looping on the same task over and over. Loop types, stop methods, and loop protection rules that prevent recurrence.
  • Security: How to lock down who can send commands to your OpenClaw agent. Exec approval policies and access control that prevent unauthorized tool calls.
  • Maintenance: How to update OpenClaw without losing your config. Pre/post update checklist to prevent tool behavior regressions after updates.