A bug report without a reproduction case is hard for anyone to act on, including yourself. When an OpenClaw issue cannot be reliably reproduced, it is nearly impossible to know whether a fix worked, whether the problem was actually a bug or a configuration issue, or whether it will recur in a future session. This article walks through the systematic process of isolating and reproducing OpenClaw issues before reporting them, whether to the OpenClaw Discord community, to the core team via GitHub, or just to yourself as a private debugging record.
TL;DR
- A good reproduction case is the minimum set of steps that consistently triggers the problem on a clean state.
- Before reporting anything, confirm it is actually a bug and not a config issue, model behavior, or expected limitation.
- Gather first: OpenClaw version, model, plugin list, exact error, log excerpt, and the steps to reproduce.
- Isolate variables one at a time: plugins first, then models, then config settings. Keep removing elements until you find the smallest failing case that still reproduces consistently.
Throughout this article you will see indented blocks like the ones below. Each one is a command you can paste directly into your OpenClaw chat. Your agent will run it and report back. You do not need to open a terminal or edit any files manually.
Step 1: Confirm it is actually a bug
Before investing time in building a reproduction case, establish that what you are seeing is genuinely unexpected behavior rather than expected behavior you did not anticipate. Many reported OpenClaw issues that look like bugs on first encounter are configuration mismatches, model behavior that falls within normal variance, or documented limitations that have been present since the feature launched.
Three questions that distinguish bugs from non-bugs:
- Did this ever work? If the feature never worked as you expected from day one, the issue is likely a misunderstanding of what it does or how it needs to be configured rather than a regression introduced in a recent version.
- Did something change before it broke? An OpenClaw update, a config change, a newly installed plugin, or a model routing adjustment that immediately preceded the issue points strongly toward a configuration root cause rather than a pre-existing software bug.
- Does the OpenClaw documentation describe this behavior as expected? Some behaviors that feel wrong to a new operator are intentional design choices that exist for security or stability reasons and are explicitly documented as such.
    I am seeing this behavior: [describe the problem]. Help me determine whether this is a bug or expected behavior. Check: (1) Is this behavior documented anywhere in the OpenClaw docs? (2) Is there a config setting that controls this behavior that I might have set incorrectly? (3) Has anything changed in my setup recently that could explain this? Review my openclaw.json and tell me what you find.
If the answers suggest it is a configuration issue rather than a software bug, fix the config first and confirm the behavior changes before doing anything else. If the behavior persists even after correcting the config to match what the documentation describes as correct, you likely have a real bug that is worth the time to reproduce properly.
Step 2: Gather environment information
A reproducible OpenClaw bug report requires precise environment information. The exact same symptom can have completely different root causes on different OpenClaw versions, operating systems, and plugin combinations. Gathering this information first, before you start changing anything to investigate, saves significant time during isolation because you know exactly what you were running at the moment the bug occurred.
    Gather my OpenClaw environment information for a bug report. Show me: (1) OpenClaw version (run openclaw version or check the gateway status), (2) Node.js version, (3) Operating system and kernel version, (4) Current model configured as primary, (5) List of all enabled plugins with their versions, (6) Whether Ollama is in use and which models are installed. Format this as a block I can paste into a bug report.
Save this output before you start changing anything. The environment at the time of the bug is the only environment that matters for reproducing it. If you update OpenClaw, install or remove a plugin, or change config values while investigating, you may not be able to get back to the exact original conditions. Capturing the environment snapshot immediately is cheap insurance against losing reproducibility during the investigation.
    Write the environment information you just gathered to a file: tmp/bug-report-env.md. Include the current timestamp. This gives me a snapshot of the environment at the time the bug occurred that I can reference even if I update or change config before reporting.
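For readers who prefer capturing the snapshot outside the chat, the same idea fits in a short script. This is a minimal sketch: the `openclaw version` subcommand is assumed from the prompt above, and the script simply records `None` for any tool that is not installed rather than failing.

```python
import json
import platform
import shutil
import subprocess
from datetime import datetime, timezone

def tool_version(cmd):
    """Return the first line of a CLI tool's version output, or None if the tool is absent."""
    if shutil.which(cmd[0]) is None:
        return None
    out = subprocess.run(cmd, capture_output=True, text=True)
    return out.stdout.strip().splitlines()[0] if out.stdout.strip() else None

def environment_snapshot():
    """Collect the environment facts a bug report needs, stamped at the moment of capture."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "os": f"{platform.system()} {platform.release()}",
        "node": tool_version(["node", "--version"]),
        # "openclaw version" is an assumption based on the prompt above;
        # adjust the invocation to match your installation.
        "openclaw": tool_version(["openclaw", "version"]),
    }

snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2))
```

Redirect the output into tmp/bug-report-env.md and you have the same timestamped snapshot the prompt produces.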
Step 3: Capture the exact error
Reproducing a bug requires knowing precisely what the failure looks like. Vague descriptions like “it did not work” or “the agent got confused” are not enough. You need the exact error message, the exact unexpected output, or the exact missing outcome.
    For the bug I am experiencing: show me the exact error message from the gateway logs. Run: tail -100 /var/log/openclaw/gateway.log or the equivalent log path for my installation. If no log file exists, check the systemd journal: journalctl -u openclaw -n 100 --no-pager. I want the exact error text, not a summary.
The gateway log is the most reliable source for exact error text because it records what actually happened at the system level rather than what the agent described in the conversation. The agent may paraphrase, summarize, or reinterpret an error before presenting it to you, sometimes in ways that obscure the actual cause. The raw gateway log shows the original error exactly as it occurred, with the component name, log level, and full message intact.
    From the gateway logs, find the log entry for the specific error I am reporting. Show me: the full log line including timestamp, log level, component name, and message. If there is a stack trace, show the full stack trace. If there are related log entries in the 30 seconds before and after the error, show those too for context.
Write the exact error to the bug report file:
    Append the exact error text and surrounding log context to tmp/bug-report-env.md. Add a section header “## Error” and paste the raw log excerpt. Do not summarize it. I need the literal text for the bug report.
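The 30-seconds-before-and-after window is easy to extract yourself if you are working from a saved log file. This sketch assumes a hypothetical ISO-style timestamp at the start of each line; adjust `parse_ts` to whatever format your gateway actually emits.

```python
from datetime import datetime, timedelta

# Assumed log layout: "2026-01-10T14:02:07 LEVEL component message..."
def parse_ts(line):
    return datetime.fromisoformat(line.split(" ", 1)[0])

def error_context(lines, window_seconds=30):
    """Return every log line within +/- window_seconds of the first ERROR line."""
    error_lines = [l for l in lines if " ERROR " in l]
    if not error_lines:
        return []
    t0 = parse_ts(error_lines[0])
    window = timedelta(seconds=window_seconds)
    return [l for l in lines if abs(parse_ts(l) - t0) <= window]

# Synthetic log for illustration; real lines come from your gateway log file.
log = [
    "2026-01-10T14:01:20 INFO gateway session started",
    "2026-01-10T14:02:00 WARN plugins slow plugin load",
    "2026-01-10T14:02:07 ERROR exec tool call failed: ENOENT",
    "2026-01-10T14:02:20 INFO gateway retrying",
    "2026-01-10T14:05:00 INFO gateway idle",
]
excerpt = error_context(log)
for line in excerpt:
    print(line)
```

Only the three lines within 30 seconds of the error survive; the session-start and idle lines fall outside the window.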
Step 4: Identify the minimal reproduction steps
The reproduction steps are the most important and most useful part of any OpenClaw bug report. A reproduction case is the minimum sequence of actions that reliably triggers the problem starting from a clean state. Minimum means the fewest steps possible that still produce the failure on every attempt. Each extra step in the sequence is an additional variable that makes it harder for anyone else to identify what is actually causing the bug and to verify that a proposed fix resolves it.
Start with what you know triggered the issue and work backward step by step, removing steps to see if the failure still occurs:
    Help me find the minimal reproduction steps for this bug. I currently believe the steps are: [list what you did before the bug occurred]. Let’s test whether each step is necessary. Start with the last step before the failure: if I skip everything before it and go directly to that step, does the bug still occur? Walk through this elimination process with me.
Common reduction paths:
- Remove plugins: Disable plugins one at a time and check whether the bug persists. If disabling a specific plugin makes the bug disappear, that plugin is involved.
- Simplify the input: If the bug involves a long prompt or complex input, gradually shorten it until you find the minimum input that still triggers the failure.
- Start from a fresh session: If the bug requires a specific session state to trigger, that required state is itself part of the reproduction case. Document exactly what state is needed.
- Use the default config: Temporarily revert to default config values for the relevant setting and test whether the bug still occurs.
    I want to test whether this bug is plugin-related. List all currently enabled plugins. Disable each non-essential plugin one at a time, testing the bug after each disable. Report whether the bug persists or disappears after each change. Start with plugins that seem most likely related to the failing behavior.
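The one-at-a-time elimination loop can be sketched abstractly. Here `reproduces` is a hypothetical stand-in for actually rerunning the candidate step list from a clean state; the toy predicate at the bottom only illustrates the mechanics.

```python
def minimize_steps(steps, reproduces):
    """Drop one step at a time; keep each drop whenever the bug still reproduces.

    `reproduces` is a predicate that runs a candidate step list from a clean
    state and returns True if the failure still occurs.
    """
    current = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if reproduces(candidate):
                current = candidate  # this step was not needed to trigger the bug
                changed = True
                break
    return current

# Toy stand-in: the "bug" only needs steps B and D to trigger.
def fake_repro(candidate):
    return "B" in candidate and "D" in candidate

minimal = minimize_steps(["A", "B", "C", "D", "E"], fake_repro)
print(minimal)  # ['B', 'D']
```

The same loop applies whether the elements being eliminated are steps, plugins, or config overrides: each removal that still reproduces shrinks the case.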
Step 5: Confirm the reproduction is reliable
A bug that occurs once is hard to act on. A bug that occurs consistently on every attempt from a fresh state is the gold standard for a bug report. Before reporting, run the reproduction steps at least three times from a clean starting state to confirm the failure is consistent rather than intermittent.
    I believe I have found the minimal reproduction steps for this bug. The steps are: [list them]. Let’s verify this is reliable. Start a new session, follow exactly these steps from a clean state, and tell me whether the bug occurs. Do this twice and report the result each time. If it fails to reproduce either time, the reproduction case needs more refinement.
If the bug reproduces consistently on every run from a clean state, you have a solid reproduction case that is ready to include in a report. If it reproduces inconsistently (sometimes but not every time), you have a timing-dependent or state-dependent bug, which is a more complex category that requires additional investigation before the reproduction case is complete.
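The consistent-versus-intermittent distinction is just a tally over repeated clean-state runs. A minimal sketch, where `run_repro` is a hypothetical callable that executes the reproduction once and returns True if the bug occurred:

```python
def reliability(run_repro, attempts=3):
    """Run the reproduction `attempts` times from a clean state and classify it."""
    results = [run_repro() for _ in range(attempts)]
    failures = sum(results)
    if failures == attempts:
        return "consistent", failures
    if failures == 0:
        return "not reproduced", failures
    return "intermittent", failures

# Deterministic stand-in for a reproduction attempt (True = bug occurred).
verdict, count = reliability(lambda: True, attempts=3)
print(verdict, count)  # consistent 3
```

Only a "consistent" verdict means the reproduction case is report-ready; "intermittent" means a variable between runs still needs to be identified and documented.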
Intermittent bugs need additional context
If the bug only occurs sometimes, the reproduction case is incomplete. Something varies between runs that affects whether the bug triggers. Common variables: model response variance (the same prompt can produce different outputs), timing (some bugs only occur when two events happen in close succession), session state (prior context affects behavior), and external dependencies (network latency, API availability). For an intermittent bug, document the conditions under which it occurs most frequently rather than claiming it always occurs.
Isolating model behavior from software bugs
A significant fraction of apparent OpenClaw bugs turn out to be model behavior variance rather than software failures when investigated carefully. The model powering the agent is a probabilistic system: the same input does not always produce exactly the same output on every run. Before reporting a behavior issue as an OpenClaw software bug, test it across multiple models to see whether the unexpected behavior is consistent across all models or specific to one particular model or model provider.
    I am seeing this behavior: [describe it]. Test whether this is model-specific. Run the same prompt using three different models: the current primary model, a local model (phi4 or llama3.1:8b), and if available, a different API model. Does the same unexpected behavior occur with all three models, or only with the current primary?
If the unexpected behavior is model-specific (only one model or one provider produces it consistently), you almost certainly have a model behavior issue rather than an OpenClaw software bug. The appropriate fix in that case is a prompt engineering change in SOUL.md or AGENTS.md, a model routing adjustment that avoids the problematic model for that task type, or acceptance that the behavior is within the model’s normal variance. If all tested models produce the same unexpected behavior, the issue is in the OpenClaw infrastructure layer and is worth reporting.
    Run the reproduction steps one more time but switch to a different primary model before running them. Use ollama/phi4:latest for this test. Does the same failure occur? If yes, the issue is in OpenClaw’s infrastructure. If no, the issue is in how the primary model interprets the relevant instruction or tool call.
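The decision rule reduces to counting which models exhibit the behavior. A minimal sketch; the model names below are illustrative, not a recommended test matrix:

```python
def classify(results):
    """results maps model name -> whether the unexpected behavior occurred."""
    failing = [model for model, failed in results.items() if failed]
    if len(failing) == len(results):
        return "likely OpenClaw infrastructure issue"
    if len(failing) == 1:
        return f"model-specific to {failing[0]}"
    if not failing:
        return "not reproduced on any model"
    return "mixed: investigate the failing subset further"

# Illustrative results from running the same prompt on three models.
label = classify({"api-primary": True, "phi4": False, "llama3.1:8b": False})
print(label)  # model-specific to api-primary
```

Because model output varies run to run, each entry should itself come from several attempts per model, not a single sample.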
Narrowing down when the bug was introduced
If you know the bug was not present in a previous OpenClaw version, version bisection helps identify when it was introduced. This is more useful for reporting than for personal debugging, but knowing the approximate introduction window helps the OpenClaw team locate the relevant change quickly.
    Check the OpenClaw changelog or release notes for changes between the last known good version and the current version. What changed in the areas relevant to this bug (plugins, model routing, exec tool, memory system, etc.)? Show me any changes that could plausibly explain the behavior I am seeing.
Even without version rollback capability, reviewing the changelog between the last known working state and the current state narrows the search. A bug that appeared after a specific plugin update is almost certainly related to that update. A bug that appeared after a config migration is almost certainly related to a setting that changed in the migration.
Writing the bug report
A complete OpenClaw bug report has six parts. All six are required for someone else to reproduce and investigate the issue efficiently. Missing any one of them reduces the usefulness of the report significantly and often results in a request for more information before anything can be investigated.
    Compile a complete bug report from the information we have gathered. The report needs: (1) Environment block (version, OS, Node, model, plugins), (2) Summary: one sentence describing what is wrong, (3) Expected behavior: what should have happened, (4) Actual behavior: what actually happened, including exact error text, (5) Reproduction steps: numbered list, minimum steps from a clean state, (6) Additional context: anything else relevant (what changed before the bug appeared, whether it is consistent or intermittent, which models reproduce it). Write this to tmp/bug-report-final.md.
The expected vs. actual behavior section is the single most important section for triage and initial investigation. Someone reading the report needs to immediately understand the gap between what the system is supposed to do and what it actually does. Vague descriptions here make triage much harder. Be specific about both sides of the gap: not “it did not work” but “the write tool should have created a file at path X with content Y, but no file was created and the gateway log shows error Z.”
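The six-part structure can be enforced mechanically before a report goes anywhere. The section names mirror the prompt above; the sample values (version numbers, paths, error) are purely illustrative:

```python
REQUIRED_SECTIONS = [
    "Environment", "Summary", "Expected behavior",
    "Actual behavior", "Reproduction steps", "Additional context",
]

def compile_report(parts):
    """Assemble the six-part report; raise if any section is missing or empty."""
    missing = [s for s in REQUIRED_SECTIONS if not parts.get(s, "").strip()]
    if missing:
        raise ValueError(f"incomplete report, missing: {missing}")
    return "\n\n".join(f"## {name}\n{parts[name].strip()}" for name in REQUIRED_SECTIONS)

# Illustrative content only; real values come from the earlier gathering steps.
report = compile_report({
    "Environment": "OpenClaw 1.4.2, Node 22, Linux 6.8, model X, plugins: none",
    "Summary": "write tool reports success but creates no file",
    "Expected behavior": "a file appears at the requested path",
    "Actual behavior": "no file created; gateway log shows EACCES",
    "Reproduction steps": "1. start with default config 2. request a file write to /opt",
    "Additional context": "consistent across 3 clean-state runs",
})
print(report.splitlines()[0])  # ## Environment
```

Raising on a missing section is deliberate: a report that silently omits one of the six parts is exactly the kind that bounces back with a request for more information.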
The value of bug reports you never send
Going through the systematic reproduction process is valuable even for OpenClaw bugs you intend to fix yourself and never report publicly. The process forces you to be precise about what is actually wrong, which turns out to be the same precision you need to verify that any fix you apply actually worked. A bug you cannot reproduce reliably is also a bug you cannot confidently verify has been fixed after you make a change. A bug you can reproduce in three clear steps from a clean starting state is a bug where you will know within minutes whether your fix resolved it or whether the underlying cause is still present.
    After I apply a fix for this bug, run the reproduction steps again from a clean state to verify the fix worked. The fix is successful only if: (1) the reproduction steps no longer trigger the failure, (2) the original expected behavior now occurs, and (3) the fix does not introduce any new unexpected behavior in the areas I changed. Run all three checks and report the results.
This closing verification step is the Logician Gate principle applied directly to bug fixes: verified on disk, in config, or in the live system. Not done based on the fix seeming logically correct in theory. A fix that has not been verified against the original reproduction case has not been confirmed to actually work.
Isolating configuration as a cause
Configuration is the most common non-bug cause of unexpected OpenClaw behavior, and it is also the easiest root cause to rule out or confirm. Before attributing a problem to a software bug, spend a few minutes confirming the current config is valid and that every relevant setting has the value you think it has. OpenClaw operators frequently discover that what looked like a software bug was a config value they changed during a tuning session weeks ago, forgot about entirely, and only rediscovered when isolating the problem systematically.
    Show me the current value of every config setting relevant to the behavior I am debugging. For a bug related to [area: exec tool / memory / model routing / cron / channel messaging], read openclaw.json and extract all settings in that area. Show me the exact values rather than summarizing them. I want to verify each setting is what I believe it is.
The most reliable config isolation technique is the clean-baseline test: start from a known working config, which for OpenClaw means the minimal defaults, and add your customizations back one at a time, restarting the gateway and testing after each addition, until the bug reappears. The last customization you added before the bug reappeared is almost certainly the cause or a contributing factor. This incremental approach is slower than guessing but far more reliable, and it generates evidence rather than hunches. It is especially valuable when multiple config changes have accumulated over months of tuning and you have lost track of what was changed when.
    List every non-default config value in my openclaw.json. What settings have I changed from the defaults? For each non-default value, tell me: what the default is, what I have set it to, and whether that setting could plausibly affect the behavior I am debugging. I want to identify which customizations to test disabling during isolation.
If reverting a config value to default makes the bug disappear, you have found the cause. The next step is determining whether your custom value is invalid (and needs correction) or whether OpenClaw is handling a valid config value incorrectly (which is an actual bug worth reporting).
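The non-default diff the prompt asks for amounts to a dictionary comparison. The setting names and default values below are hypothetical placeholders, not actual OpenClaw defaults:

```python
# Hypothetical defaults; real ones come from OpenClaw's documentation.
DEFAULTS = {"log_level": "info", "exec_timeout": 60, "auto_recall": True}

def non_default(config, defaults=DEFAULTS):
    """List every setting whose value differs from its default, with both values shown."""
    return {
        key: {"default": defaults.get(key), "current": value}
        for key, value in config.items()
        if defaults.get(key) != value
    }

# Illustrative current config; the real values live in openclaw.json.
diff = non_default({"log_level": "debug", "exec_timeout": 60, "auto_recall": True})
print(diff)  # {'log_level': {'default': 'info', 'current': 'debug'}}
```

Each entry in the diff is one candidate to revert during the clean-baseline test: revert it, restart, and check whether the bug disappears.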
Isolating session state
Some OpenClaw bugs only surface when the agent session has reached a specific state: a particular compaction level, a specific set of memories loaded via auto-recall, a certain number of prior turns in the context window, or a specific combination of prior tool call results in the session. These state-dependent bugs are harder to reproduce than stateless ones because the required state must be reconstructed precisely before running the reproduction steps. The reproduction case needs to document the required session state as explicitly as it documents the steps themselves.
    For the bug I am investigating: does it occur in a fresh new session, or only after the session has been running for some time? Test this by starting a completely new session and immediately running the reproduction steps. If the bug occurs in a fresh session, session state is not required. If it does not occur, something about the session state is part of the reproduction case.
Common state requirements that need to be documented:
- Post-compaction state: The bug only occurs after LCM compaction has run at least once in the session.
- Specific memory contents: The bug is triggered by a specific memory being present in the recall results for a given query.
- High context usage: The bug occurs when the context window is near its limit but not yet at the compaction threshold.
- Prior tool call state: The bug requires that a specific previous tool call (a file write, a config change) happened earlier in the session.
    Check the current session state: context usage percentage, whether compaction has occurred this session, and what memories were most recently loaded via recall. Write this to tmp/bug-report-session-state.md. This documents the session state at the time the bug occurs so we can reproduce it in a fresh session if needed.
Increasing log verbosity for hard-to-catch bugs
The default OpenClaw gateway log level is set to capture errors and warnings but not the full detail of every internal event. For subtle bugs involving tool call timing, plugin interaction, model-tool handoffs, or internal state transitions, increasing the log verbosity before running the reproduction case gives you a substantially more complete trace of exactly what happened during the failure.
    Check the current log level setting in my openclaw.json or gateway config. What is the current verbosity level? Is there a debug or verbose mode available? I want to increase the log level before running my reproduction steps so the logs capture more detail about what happens internally during the failing operation.
Higher log verbosity produces larger log files and can affect performance slightly, but for a debugging session the tradeoff is worthwhile. The goal is a log file that shows the full sequence of internal events during the reproduction, not just the surface-level error. After capturing the verbose log excerpt from one or two reproduction runs, restore the normal log level. Verbose logging in production accumulates large log files quickly and can affect gateway performance over time on resource-constrained servers.
    Temporarily enable verbose or debug logging for the gateway. Capture the log output during one complete reproduction of the bug. Then restore the normal log level. Save the verbose log excerpt from the reproduction to tmp/bug-report-verbose-log.md. Include only the portion of the log from 5 seconds before to 5 seconds after the failure to keep the excerpt manageable.
Final checklist before posting a bug report
Before sharing a bug report publicly, run through this checklist. Each item represents a common reason bug reports get closed without action or require multiple follow-up rounds to resolve.
    Run a final check on the bug report before I share it. Verify: (1) The OpenClaw version is included and current, (2) The reproduction steps start from a clean state with no assumed prior setup, (3) The expected vs actual behavior is described specifically with exact text, not general descriptions, (4) Any sensitive data (API keys, credentials, personal info) has been redacted, (5) I have confirmed the bug reproduces at least twice from a clean state, (6) I have tested whether it is model-specific. Report pass or fail for each item.
A report that passes all six checks is complete and actionable. A report that fails any item should be updated before posting. The few minutes it takes to complete the checklist dramatically increases the likelihood that the report gets properly investigated and resolved, rather than closed as needing more information or waiting indefinitely for a follow-up that never arrives.
Tracking bugs you find for yourself
If you run an OpenClaw instance with any regularity, you will encounter issues that are not worth the overhead of a formal public report but are absolutely worth tracking privately for your own reference. A personal bug log is a lightweight record of what has gone wrong in your specific deployment, what you tried during investigation, and what ultimately resolved each issue. Over time it accumulates into a reference for patterns specific to your setup: issues that recur under predictable conditions, issues that resolved themselves after an update, and issues that turned out to share a root cause even though they looked unrelated on first encounter.
    Create a personal bug log at memory/bug-log.md if one does not already exist. For the current bug I am investigating, add an entry with: date, one-line description, environment snapshot reference, reproduction steps, current status (investigating / resolved / workaround in place), and resolution if known. Update this entry as the investigation progresses.
The bug log also serves as a before/after record for OpenClaw updates. When you update, check whether any open entries in the log are resolved by the update. If an update introduces a new issue, you have a clear record of when it appeared. When an issue you reported publicly gets fixed in a new version, the log gives you the reproduction case you need to verify the fix works in your specific environment.
When not to file a bug report
Not every OpenClaw problem should be reported as a bug. Some issues are better handled through other channels, and filing a bug report for the wrong category of problem adds noise for the maintainers and is less likely to result in the help you need.
Do not file a bug report for:
- Model behavior you dislike: If the model is responding in a way you find unhelpful but not broken, that is a prompt engineering issue, not a bug. Adjust SOUL.md or AGENTS.md to shape the behavior you want.
- Community skill or plugin issues: File an issue with the skill or plugin author, not the OpenClaw core team.
- Expected limitations: If the documentation says a feature has a known limitation and you are hitting that limitation, that is not a bug.
- Configuration help: If you are not sure whether your config is correct and the system is behaving oddly, ask in the Discord community for config help before filing a bug.
- One-time transient failures: A failure that occurred once and has never recurred is almost certainly transient. Document it privately and monitor, but do not file a bug until it recurs.
    For the issue I am currently investigating: based on what I know so far, should this be reported as a bug, handled as a config issue, or addressed through prompt engineering? What is the strongest evidence for each possibility? Help me decide which category this falls into before I spend more time on a formal bug report.
Using community resources during investigation
The OpenClaw Discord community is a genuinely active and useful resource during the investigation phase, well before you commit to writing a formal bug report. Other operators running similar configurations may have already encountered the same issue and found a working workaround. Asking the community with a clear, specific problem description, not a full formal bug report but just the symptom, your OpenClaw version, and what you have already tried, often surfaces an existing solution or points you toward the right config change faster than continuing to investigate independently.
    Draft a short community post for the OpenClaw Discord describing the issue I am investigating. The post should be: (1) two to three sentences describing the symptom, (2) my OpenClaw version and primary model, (3) one sentence on what I have already tried, (4) a direct question asking whether anyone has seen this and found a workaround. Keep it under 150 words so people will actually read it. Show me the draft before I send it.
The community post serves a second purpose beyond getting help: if other operators confirm the same issue, you have evidence that it is a software bug rather than something specific to your environment. Multiple independent operators seeing the same issue on different setups is strong evidence for a real bug. That evidence strengthens a subsequent bug report significantly.
Frequently asked questions
What if I cannot reproduce the bug at all after it happened once?
Document what you saw as precisely as possible and move on. A one-time failure with no reproduction path is either a transient external factor (network blip, temporary API unavailability, model response variance) or a timing-dependent bug that requires specific conditions to trigger. Check the gateway logs for the timestamp of the failure and note the exact log entries. If it recurs, you will have the log from the first occurrence to compare. If it never recurs, it was probably transient and not worth further investigation.
How do I report a bug to the OpenClaw community without exposing my config?
Sanitize the bug report before sharing. Replace your actual API keys, chat IDs, domain names, and personal details with placeholders before posting. The environment block needs OpenClaw version, Node version, OS, model name, and plugin list, none of which expose sensitive data. For log excerpts, redact any lines that contain credentials or personal identifiers. The OpenClaw Discord community and GitHub issues are the standard places to report; neither requires you to share sensitive config to get help.
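Sanitization can be partly automated with pattern-based redaction before a report leaves your machine. The patterns below are examples only: the sk- prefix is a common API-key shape but not universal, so extend the list to match the credential formats your setup actually uses.

```python
import re

# Illustrative patterns; add your own key, token, and chat-ID formats.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]

def redact(text):
    """Replace credential-shaped strings with placeholders, preserving structure."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

sample = "auth failed for ops@example.com using key sk-abc123def456"
clean = redact(sample)
print(clean)  # auth failed for [REDACTED_EMAIL] using key [REDACTED_API_KEY]
```

Automated redaction is a first pass, not a guarantee: still read the final report end to end before posting, since a pattern list can never anticipate every secret format.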
Is a bug in a community skill or plugin an OpenClaw bug?
No. Community skills and plugins are not maintained by the OpenClaw team. A bug in a skill’s SKILL.md instructions or a plugin’s code should be reported to the skill or plugin author rather than OpenClaw core. The exception is if the skill or plugin is triggering unexpected behavior in OpenClaw’s own infrastructure (gateway crashes, memory corruption, tool registration conflicts). Those are worth reporting to OpenClaw even if the trigger is a community plugin.
My bug involves private data. Can I still report it?
Yes, with sanitized data. Replace any private content with realistic but fictional placeholders that preserve the structure of the input. If your bug requires a specific type of input to trigger (long content, Unicode characters, a specific file structure), describe the characteristics of the input rather than sharing the actual content. Most bugs do not require the exact private data to reproduce; the structure and size of the data matter more than the content.
The OpenClaw team asked me for a minimal reproduction case. What does that mean?
It means the shortest possible sequence of steps, starting from a default OpenClaw installation, that produces the failure. If your reproduction case requires 10 custom config changes, 5 specific plugins, and a 2000-word prompt, the team cannot test it without replicating your entire setup. A minimal case might be: start OpenClaw with default config, enable one specific plugin, send this 10-word prompt, observe this specific error. The smaller and more self-contained the reproduction case, the faster the team can investigate and fix it.
How do I report a security bug?
Security bugs should not be reported publicly in Discord or GitHub issues. Check the OpenClaw documentation for the security disclosure process, which typically involves a private report to the maintainers via email or a dedicated security reporting channel. A security bug reported publicly gives attackers the information before a fix is available. If you are unsure whether something is a security issue, err on the side of private disclosure. Responsible disclosure protects other OpenClaw operators who may be affected by the same vulnerability.
I found a bug and also the fix. How should I report it?
Include both in the report: the standard reproduction case describing the bug, and then a separate section describing the fix you found. If you are comfortable with GitHub pull requests, the most effective path is a PR with the fix plus the bug report as the PR description. If you are not comfortable with PRs, a GitHub issue with the fix described in detail is also useful. The reproduction case still matters even when you have a fix: the maintainers need to verify your fix on a clean reproduction before merging it.
