Your agent was working fine, and now it keeps going offline and coming back, over and over, without you doing anything. This article walks you through exactly how to find out why, in order of the most common causes. Paste the first diagnostic prompt into your OpenClaw agent and the answer will be in its first response.
Before You Start
- Your OpenClaw agent is currently reachable (at least intermittently, between restarts)
- You have SSH access to the server as a fallback if the agent is down when you need it
- You have not made any recent changes to openclaw.json. If you have, start with the config rollback article linked at the bottom
TL;DR
Gateway restarts have four common causes: the server ran out of memory and killed the process, a config error is causing it to crash on startup, a plugin is crashing on load, or the disk is full. The journal log tells you which one. Paste the diagnostic blockquote below and read what comes back before doing anything else.
Time to diagnose: 5 minutes
Jump to what you need
- Agent goes down silently with no error? Out-of-memory kill
- Crash happens immediately on startup? Config error on startup
- Restarts over and over in a tight loop? Crash loop and start-limit
- Started after installing or updating a plugin? Plugin crash on load
- Disk warnings or write errors in the journal? Full disk
- Not sure what type of restart you have? Identify the pattern first
- Not sure? Start here: Read the journal first
Read the journal first: the answer is almost always there
When OpenClaw restarts, systemd (the service manager on Linux) records the reason. That record is called the journal. Most operators never look at it, which is why restart loops feel mysterious. The journal is not mysterious. It tells you exactly what happened in the last 30 seconds before the service stopped, and what the process exited with.
The single most important thing you can do when your gateway keeps restarting is read the last 50 lines of the journal. Everything else in this article is about interpreting what you find there.
Show me the last 50 lines of the openclaw gateway service journal with timestamps. Then tell me: how many times has the service restarted in the last hour, and what is the last exit code or signal it exited with? Also tell me the current status of the service right now.
Your agent will show you the journal output and interpret it. Here is what to look for in the response:
What the journal tells you
- “Killed” with no other message: the kernel killed the process. This is an out-of-memory (OOM) kill. Go to the OOM section.
- A specific error about JSON, config, or a missing field: the config file has a problem. Go to the config crash section.
- The service restarting 5+ times in under 2 minutes: you are in a crash loop. Go to the crash loop section.
- An error naming a specific plugin: a plugin is crashing on load. Go to the plugin crash section.
- “No space left on device” or write errors: disk is full. Go to the disk full section.
- Nothing: the journal shows a clean startup and then silence; the service started and later stopped without logging an error. This is usually Node.js heap exhaustion. Go to the heap exhaustion section.
If your agent is not currently reachable, SSH into the server and run: journalctl -u openclaw -n 50 --no-pager. Read the output yourself using the same key above.
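If you want to count the restarts yourself over SSH, here is a rough sketch. The unit name openclaw and the "Started" wording are assumptions; journald phrases the start line using your unit's description, so adjust the pattern to match what your journal actually prints.

```shell
# Count how many times systemd logged a service start in the last hour.
# The "Started" pattern is an assumption; check your journal's exact wording.
count_starts() {
  grep -c "Started" -
}

journalctl -u openclaw --since "1 hour ago" --no-pager 2>/dev/null | count_starts || true
```

A count above 2 or 3 in a single hour means you are past "occasional blip" and into one of the restart patterns this article covers.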
Out-of-memory kill: the most common cause
The Linux kernel has a last-resort mechanism called the OOM (out-of-memory) killer. When the server runs completely out of RAM, the kernel picks a process to kill to free up memory. OpenClaw is frequently that process. The OOM killer does not produce a helpful error message. It just kills the process and writes “Killed” to the journal. Systemd then restarts the gateway because it is configured to do so, and the cycle repeats.
The reason this looks like a “restart” rather than a crash is exactly that: systemd is doing its job by restarting the dead process. The real event was a kill, not a crash.
The pattern in the journal: Main process exited, code=killed, status=9/KILL or just the word “Killed” with no stack trace or error message before it.
Check whether this server has had any OOM kills today. Look in the kernel log (dmesg) for any “Out of memory” or “oom-kill” events. Show me the full lines including which process was killed and when. Also show me current RAM usage, swap usage, and whether swap is enabled at all.
If dmesg shows OOM events targeting the openclaw process, the fix is not to restart the service more carefully. The fix is to address the memory pressure so the kill stops happening.
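If you would rather run the check yourself over SSH, here is a sketch. The grep pattern matches the kernel's usual OOM wording, but distributions vary, so treat it as a starting point rather than a definitive filter.

```shell
# Surface kernel OOM events, then show current memory pressure.
oom_events() {
  grep -iE "out of memory|oom-kill" -
}

sudo dmesg -T 2>/dev/null | oom_events || true
free -h          # RAM and swap in human-readable units
swapon --show    # no output at all means swap is not configured
```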
Fix: add swap
Most cheap VPS instances ship with no swap. Swap is disk space that Linux uses as emergency overflow when RAM fills up. Without it, the OOM killer fires the moment RAM is exhausted. Adding 2GB of swap gives the server a buffer that prevents OOM kills in all but the most extreme cases.
Check whether swap is enabled on this server. If it is not, create a 2GB swap file, enable it immediately, and add it to /etc/fstab so it persists across reboots. Show me each step before running it, then confirm the final swap status.
Manual fallback (SSH)
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Add to /etc/fstab: /swapfile none swap sw 0 0
If fallocate fails: use sudo dd if=/dev/zero of=/swapfile bs=1M count=2048 instead.
Verify: free -h should show Swap: 2.0Gi
If fallocate fails with “Operation not supported”
Some filesystems and storage backends do not support fallocate for swap files: ZFS is a common example, certain cloud block-storage volumes refuse it, and swap files on btrfs need extra setup regardless. Use dd instead: sudo dd if=/dev/zero of=/swapfile bs=1M count=2048. Then continue with the remaining steps as written.
Fix: reduce memory pressure from OpenClaw
If swap alone doesn’t stop the OOM kills, the server is under sustained memory pressure, not just occasional spikes. The three levers for reducing OpenClaw’s memory footprint are: reducing the context window size, using a lighter model for compaction, and disabling memory-heavy plugins you are not actively using.
Read my openclaw.json. What is the current context window size? What model is being used for compaction? Which plugins are enabled? For each of these, tell me what the memory impact is and what I could change to reduce RAM usage on a server with 1GB of RAM.
If you are running Ollama on the same server
Ollama keeps loaded models in RAM. A 7B parameter model uses 4 to 6GB of RAM. On a server with 1GB or 2GB total, running Ollama alongside OpenClaw will cause constant OOM kills regardless of how much swap you add. Move Ollama to a separate server or switch to API models if your VPS has less than 6GB of RAM.
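To see roughly how much RAM loaded models are holding right now, assuming the ollama CLI is installed: the awk helper below is a sketch that sums the SIZE column of ollama ps's tabular output, and may need adjusting if the column layout differs on your version.

```shell
# Estimate RAM held by loaded Ollama models by summing the SIZE column.
sum_model_ram() {
  awk 'NR > 1 {
         for (i = 1; i <= NF; i++) {
           if ($i ~ /^[0-9.]+GB$/) { v = $i; sub(/GB$/, "", v); s += v }
           else if ($i == "GB" && i > 1 && $(i-1) ~ /^[0-9.]+$/) { s += $(i-1) }
         }
       }
       END { printf "%.1f GB\n", s }'
}

free -h
ollama ps 2>/dev/null | sum_model_ram
```

Compare the total against the server's RAM from free -h: if the models alone approach or exceed physical memory, the OOM kills will not stop until Ollama moves elsewhere.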
Config error on startup: the second most common cause
When OpenClaw starts, it reads openclaw.json. If that file has a problem, the gateway process exits immediately with an error. Systemd restarts it. It reads the same broken config. It exits again. This cycle looks exactly like a “restart loop” but the cause is entirely different from OOM: the service is not being killed by the kernel, it is failing to start cleanly.
Config errors in the journal look like: JSON parse errors (“Unexpected token”), missing required fields (“gateway.bind is required”), or type errors (“Expected string, got number”). These are always visible in the last lines before the service exits.
Show me the last 100 lines of the openclaw service journal. I am looking for any error messages that appear right before the service exits. Specifically, look for JSON parse errors, missing config fields, or type validation errors. Copy the exact error text.
If the journal shows a config error, the fix is to correct the config. The fastest path is to ask your agent to read and validate the config file, then tell you exactly what is wrong.
Read my openclaw.json and validate it. Is it valid JSON? Are there any fields with wrong types or missing required values? Tell me exactly what is wrong and what the correct value should be.
WRITE, TEST, THEN IMPLEMENT
If your agent cannot start because the config is broken, you will need to fix the config via SSH. Do not edit the file and restart without first verifying the fix is correct. An invalid config edit will just keep the restart loop going. The rollback article (linked at the bottom) covers exactly how to recover from a broken config when the agent is not reachable.
Manual fallback (SSH)
Validate the config file without restarting: node -e "JSON.parse(require('fs').readFileSync(process.env.HOME+'/.openclaw/openclaw.json','utf8'))" && echo "Valid JSON"
If it prints a SyntaxError, the file has a JSON formatting problem. Find the line number in the error and fix it. If it prints “Valid JSON” but OpenClaw still won’t start, the problem is a field value, not the JSON syntax. Check the journal for the specific field name in the error.
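If node gives you trouble (wrong path, not installed), python3's built-in json.tool performs the same syntax check. The config path below is the default location assumed throughout this article; substitute yours if it differs.

```shell
# Validate the config file as JSON without restarting anything.
python3 -m json.tool ~/.openclaw/openclaw.json > /dev/null \
  && echo "Valid JSON" \
  || echo "Invalid JSON (or file not found): see the error above"
```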
OpenClaw gateway restart loop and start-limit: when systemd stops retrying
Systemd has a built-in protection against infinite restart loops. If the OpenClaw service crashes more than a set number of times within a short window (on Ubuntu 22.04 and 24.04, the default is 5 crashes within 10 seconds, but this varies by distribution and service unit configuration), systemd stops trying to restart it and marks the service as “failed.” You will see “failed (Result: start-limit-hit)” in the service status.
This is actually systemd protecting you: a service that crashes immediately every time it starts has a problem that restarting will not fix. The crash loop is a symptom. The root cause is either a config error (above) or a plugin crash (below).
Check the current status of the openclaw service. Has it hit the systemd start-limit? Show me the full status output including any “Result:” line. Then show me the last 20 lines of the journal so I can see the crash reason.
If the service has hit the start limit, the status will say “failed (Result: start-limit-hit)”. The service will not restart again on its own until you manually reset it. This is intentional: systemd is waiting for you to fix the underlying problem before trying again.
Step 1: Find the crash reason before resetting
Resetting the service without fixing the crash reason just starts the loop again. Always read the journal for the specific error first.
Show me the last 50 lines of the openclaw service journal. I need to see what error appears right before each crash. Is it the same error every time, or different errors?
Same error every time: a specific, reproducible problem; fix that one thing. Different errors every time: that points to memory pressure or a race condition, and OOM is the most likely cause.
Step 2: Reset the service after fixing the root cause
SSH only: the agent is not reachable while the service is in the start-limit-hit state
After fixing the root cause:
sudo systemctl reset-failed openclaw
sudo systemctl start openclaw
Verify: systemctl status openclaw should show “active (running)”
If it fails again immediately, read the journal again: journalctl -u openclaw -n 20 --no-pager
Adjusting the start limit
If your service keeps hitting the start limit due to legitimate transient failures (network not up on boot, database taking time to start), you can raise the limit. This is different from hiding a real crash. Ask your agent to check whether the restart pattern is transient or consistent before adjusting. Raising the limit on a service with a real crash problem means the loop runs longer before stopping, which wastes time and can fill logs faster.
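A sketch of what raising the limit looks like, assuming a unit named openclaw. The numbers are illustrative, not recommendations, and on current systemd versions these directives belong in the [Unit] section of a drop-in.

```shell
# Open a drop-in editor for the unit:
#   sudo systemctl edit openclaw
#
# In the editor, add (example values: allow 10 starts per 60 seconds):
#
#   [Unit]
#   StartLimitIntervalSec=60
#   StartLimitBurst=10
#
# Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart openclaw
```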
Plugin crash on load: what it looks like and how to isolate it
OpenClaw loads all enabled plugins during startup. If a plugin throws an unhandled error during its initialization, the gateway process exits. Systemd restarts it. The plugin loads again, crashes again. This is a crash loop with a specific pattern: the error in the journal names the plugin.
Common causes: a plugin was installed for a version of OpenClaw you are not running, a plugin has a native dependency that failed to install correctly (especially common on ARM servers), or a plugin requires a config field that is missing from your openclaw.json.
Show me the last 50 lines of the openclaw service journal. I am specifically looking for any error that names a plugin, mentions a failed require, an unhandled exception during startup, or a module not found error. If you see any, tell me exactly which plugin and what the error is.
If the journal points to a specific plugin, the fastest fix is to disable that plugin and restart. Once the service is stable, you can investigate whether the plugin can be fixed or needs to be reinstalled.
Read my openclaw.json. Show me the list of all enabled plugins. I want to temporarily disable [plugin name] by setting its enabled field to false. Show me the change before making it, then make it and restart the gateway service.
WRITE, TEST, THEN IMPLEMENT
Disabling a plugin is reversible but it will remove that plugin’s functionality immediately. If the plugin is your memory system or your Discord integration, disabling it means those features stop working until the plugin is fixed. Confirm which plugin you are disabling and what it does before proceeding.
If the service is in start-limit-hit (agent not reachable)
You cannot ask the agent to disable the plugin if it cannot start. SSH in and edit the config directly:
nano ~/.openclaw/openclaw.json
Find the plugin entry with "enabled": true and change it to "enabled": false.
Save the file, then reset and restart:
sudo systemctl reset-failed openclaw && sudo systemctl start openclaw
If you do not know which plugin is causing it
If the journal does not name a specific plugin but you suspect a plugin is involved (the crashes started right after you installed or updated one), you can isolate the problem by disabling plugins one at a time in reverse installation order, or by disabling all non-essential plugins at once and re-enabling them one by one until the crash reappears.
Read my openclaw.json. List all enabled plugins in the order they appear in the config. I want to know: which ones are essential (built-in or core functionality) and which ones are third-party? Then disable all third-party plugins and restart. Once stable, I will tell you which ones to re-enable one at a time.
Full disk: the cause that looks nothing like a disk problem
When a disk fills up completely, OpenClaw cannot write session data, LCM database entries, or log files. The process does not exit cleanly. Writes fail with “No space left on device” errors. Depending on which write fails and when, this can cause the gateway to crash in ways that look completely unrelated to disk space: database corruption errors, failed state writes, or the process simply refusing to start.
The pattern: the journal shows write errors, or the service starts and then stops immediately with a database-related error. The journal may even be incomplete or empty if there was no disk space left to write the log entries.
Check the current disk usage on this server. Show me the usage percentage for each mounted filesystem. Then check the openclaw workspace directory, the ~/.openclaw directory, and any log directories for the largest files and directories. If any filesystem is above 90% full, tell me what is taking up the most space.
If disk usage is above 90%, you need to free space before the gateway will run stably. The most common culprits are session archive files that were never pruned, log files that were never rotated, and LCM database files that grew beyond their expected size.
Find the 10 largest files and directories in the openclaw workspace, the ~/.openclaw directory, and any log directories. Sort by size, largest first. For each one, tell me whether it is safe to delete or archive, and what I would lose if I removed it.
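The manual equivalent over SSH: the default directory below is an assumption, so point the helper at wherever your workspace and logs actually live.

```shell
# Show overall filesystem usage, then the 10 largest items under a path.
largest() {
  target="${1:-$HOME/.openclaw}"
  du -ah "$target" 2>/dev/null | sort -rh | head -n 10
}

df -h
largest    # or: largest /path/to/workspace
```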
Set up log rotation to prevent this recurring
Once you have freed space, set up log rotation so this does not happen again. Ask your agent to configure logrotate for the openclaw log directories, rotating daily and keeping 7 days of logs. The $5 VPS article (linked at the bottom) covers the exact setup.
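A minimal logrotate rule as a sketch: both the rule path and the log directory below are assumptions, so confirm where your install actually writes logs before using it.

```shell
# Install a logrotate rule: daily rotation, 7 days kept, old logs compressed.
# Both paths are assumptions; adjust for your user and log directory.
sudo tee /etc/logrotate.d/openclaw > /dev/null <<'EOF'
/home/youruser/.openclaw/logs/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
EOF
# Dry-run to confirm logrotate accepts the rule:
sudo logrotate --debug /etc/logrotate.d/openclaw
```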
Node.js heap exhaustion: the silent version
Node.js manages its own memory in a region called the heap (the pool of memory the process uses to store active data). When the heap grows faster than the garbage collector can reclaim it, Node.js eventually runs out of memory within the process itself and crashes. This is different from an OOM kill: the Linux kernel did not kill the process. The process killed itself.
The journal pattern for heap exhaustion: the process exits with code 1 or signal SIGABRT. If stderr was captured, the line before the exit reads something like “FATAL ERROR: Reached heap limit Allocation failed” or “JavaScript heap out of memory.” If it was not, there is no visible error at all and the service simply stops with exit code 1.
Show me the last 100 lines of the openclaw service journal. I am looking for any mention of heap, memory allocation failure, SIGABRT, or exit code 1 without a clear error message. Also check whether Node.js has a max-old-space-size flag set anywhere in the openclaw startup config or environment.
If heap exhaustion is the cause and you do not fix it, the gateway will keep restarting on the same schedule: every time the session runs long enough to fill the heap. The fix is to set an explicit memory limit on the Node.js process so it garbage collects aggressively before the heap grows out of control. This does not reduce what the agent can do. It just makes it clean up memory more often.
Check how OpenClaw is started on this server by looking at the ExecStart line in the systemd service file. Then tell me: is a max-old-space-size flag set? What is the total RAM on this server? Based on both, what should the max-old-space-size be set to, and how do I set it for the way OpenClaw is actually being launched here?
Manual fallback (SSH)
Check how it starts: systemctl cat openclaw | grep ExecStart
If it launches via the openclaw CLI, add to the service’s [Service] section:
Environment="NODE_OPTIONS=--max-old-space-size=512" (adjust value for your RAM)
Then: sudo systemctl daemon-reload && sudo systemctl restart openclaw
Less common causes worth checking
OpenClaw was updated mid-session
If you or a cron job ran an update while the gateway was active, the process may have been stopped and restarted as part of the update. This is a single restart, not a loop. Check whether an update ran around the time the restarts started.
Check the openclaw service journal for any entries that mention “update”, “restart”, or “reload” in the last 24 hours. Also check whether any cron jobs are scheduled to run updates or restart the service. Show me the relevant lines.
An external monitor or watchdog is triggering restarts
Some server setups include external monitoring tools (UptimeRobot, Monit, custom health-check scripts) configured to restart services when health checks fail. If your gateway restarts happen on a regular schedule rather than randomly, this is likely the cause.
Are there any cron jobs, systemd timers, or monitoring scripts on this server that check the openclaw service health and restart it if it fails? Also check whether Monit or any similar process supervisor is installed and watching openclaw. List anything that could be triggering external restarts.
The gateway port is in use
If another process is using the port that OpenClaw needs to bind to, the gateway process starts, fails to bind, and exits immediately. The journal error for this is “EADDRINUSE” (address already in use). If you see that in the journal, this is your cause. This happens after server reboots when another service starts on the same port, or after a previous OpenClaw instance did not shut down cleanly and left the port bound.
Read my openclaw.json and find the gateway port. Then check whether anything else on this server is currently listening on that port. If something else is using it, tell me what it is and what my options are.
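The manual check over SSH, sketched with an assumed port of 8080; use the actual port from your openclaw.json.

```shell
# Filter `ss -ltnp` output down to sockets bound to a given port.
listening_on() {
  awk -v p="$1" 'NR > 1 && $4 ~ (":" p "$")'
}

ss -ltnp 2>/dev/null | listening_on 8080   # empty output means the port is free
```

If something shows up, the process name and PID are in the last column; either stop that process or move the gateway to a different port.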
After the OpenClaw gateway crash fix: confirm the service is stable
Once you have addressed the root cause, the gateway should stay up. Run this check to confirm the service is running cleanly and the journal shows no new errors.
Check the openclaw service status. Tell me: is it currently active and running? How long has it been up since the last restart? Show me the last 20 journal lines to confirm there are no new errors. Then check current RAM usage and disk usage to make sure there is no ongoing pressure that will cause another restart.
If the service has been up for 10 or more minutes with no restarts and the journal is clean, the problem is resolved. If it restarts again within that window, the root cause was not fully addressed.
What stable looks like:
- Service status: active (running)
- Uptime since last restart: more than 10 minutes and growing
- Journal: no errors after the last successful start
- RAM: not at 100% (some headroom available)
- Disk: below 80% usage
- Restart count in the last hour: 0 or 1 (the recovery restart)
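The checklist above can be sketched as a quick snapshot script, assuming the unit name openclaw; the awk helpers parse standard free and df output.

```shell
# Stability snapshot: service state, RAM headroom, root disk usage.
mem_line()  { awk '/^Mem:/ { print "RAM: " $3 " used of " $2 }'; }
disk_line() { awk 'NR == 2 { print "Disk: " $5 " used" }'; }

systemctl is-active openclaw 2>/dev/null || true
free -h | mem_line
df -h / | disk_line
```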
How to tell which kind of restart you have
Not all gateway restarts look the same in the journal, and the difference matters. There are three patterns. Knowing which one you have tells you where to look.
Pattern 1: Restarts infrequently, agent comes back within a few seconds
The journal shows the service stopping and starting with short gaps. No error message visible before the stop. This is almost always an OOM kill. The kernel killed the process silently. Systemd brought it back. The OOM event is in the kernel log, not the service journal.
Where to look: OOM section and kernel log (dmesg)
Pattern 2: OpenClaw service keeps stopping and restarting in a tight loop
The journal shows 5 or more stop/start cycles in under 2 minutes, then nothing. The service is in “failed (start-limit-hit)” state. There is a consistent error message right before each stop. This is a crash loop caused by a config error or plugin crash.
Where to look: Crash loop section, then config crash or plugin crash
Pattern 3: Restarts on a regular schedule or after a specific trigger
The restart happens at predictable intervals (every few hours, every morning) or right after a specific action (starting a long task, running a pipeline). This points to a scheduled trigger, a memory limit that fills up over time, or task-specific memory pressure.
Where to look: Other causes section (external monitor, memory limit)
If you are not sure which pattern you have, the diagnostic blockquote at the top of this article is the right starting point. It reads the journal and tells you which pattern you are in.
Preventing restarts before they happen
Most gateway restarts are preventable. The causes are predictable: low RAM, missing swap, unvalidated config changes, untested plugins. The only reason they surprise operators is that nobody checks the warning signs until after the first crash.
Three things prevent the majority of restart events:
Swap before you need it. Adding 2GB of swap costs nothing and prevents the most common cause of gateway restarts on cheap servers. Do it now, not after the next OOM kill.
Validate config before restarting. Any time you edit openclaw.json, ask your agent to validate the file before you apply the change. A one-line validation check takes 5 seconds. A restart loop caused by a bad JSON edit can take 30 minutes to diagnose if you do not know what to look for.
Test plugins on a stable service. Never install a third-party plugin on a server that is already under memory pressure or that has not been running stably for at least 24 hours. If the service was marginal before the plugin, the plugin will push it over.
Run a prevention check on my current OpenClaw setup. Tell me: (1) is swap enabled and what size, (2) how many times has the gateway restarted in the last 7 days, (3) are there any config fields with values that are known to cause instability, (4) are there any third-party plugins enabled that could be causing memory pressure? Give me a simple pass/fail for each item and what to do if any fail.
Stop putting out fires. Harden the setup.
Brand New Claw: $37
The complete checklist for making OpenClaw production-stable on any server: swap, systemd hardening, OOM protection, log rotation, and the monitoring setup that catches problems before they become crashes. Everything in one place, formatted to paste directly into your agent.
Questions people actually ask about this
The journal shows “Killed” but I have swap. Why is it still getting OOM killed?
Swap is not a guarantee against OOM kills. It is a buffer. If memory consumption grows faster than pages can be moved to swap (a sharp spike, or slow disk I/O on the swap device), the OOM killer can still fire. Also check: is Ollama running on the same server? Ollama’s loaded models sit in RAM and are not swappable in the same way. Check whether the OOM kill targets the openclaw process specifically or another process that then causes openclaw to fail indirectly.
Show me all OOM kill events in the kernel log from the last 24 hours. For each one, tell me which process was killed and what the total memory usage was at the time. Also tell me whether Ollama is running and how much RAM the currently loaded models are using.
The gateway restarts exactly once every few hours. What causes that pattern?
Regular-interval restarts point to a scheduled trigger, not a crash. Something is restarting the service on a schedule. Check cron jobs, systemd timers, and any external monitoring tools. Also check whether a memory limit (either from systemd or Node.js) is being hit at a predictable rate as a long-running session accumulates memory. The interval often corresponds to how long it takes the session to grow to the memory limit.
Check all cron jobs and systemd timers on this server that touch the openclaw service. Also check whether there is a MemoryMax or MemoryLimit setting in the systemd service unit. Show me the interval between the last three service restarts from the journal timestamps.
My agent is responsive but I keep seeing “gateway restarted” notifications. Is that a problem?
One restart, especially after a server reboot or a config change, is normal. Repeated restarts that happen while the agent is otherwise working fine are worth investigating even if the disruption seems minor. Each restart drops any in-progress task and resets the session state. If they happen often enough, tasks will never complete. Check the restart count over the last 24 hours and read the journal to confirm the restarts are not escalating.
How many times has the openclaw service restarted in the last 24 hours? Show me the timestamps of each restart from the journal. Is the frequency increasing, staying the same, or decreasing?
The service restarts right after I start a specific type of task. What does that mean?
Task-triggered restarts are almost always memory-related. Certain tasks, like deep research with many web searches, long pipeline runs, or context compaction on a large session, spike memory usage significantly. If the restart happens consistently during those tasks on a low-RAM server, the server cannot handle the load. The fix is either more swap, a smaller context window, or running that class of task on a more capable machine.
Check the kernel log for OOM events in the last hour. Also show me the current RAM and swap usage. If memory is above 80%, tell me what the largest memory consumers are right now and what I can do to reduce them immediately.
I cannot connect to my agent at all and I am not sure if it is a restart loop or something else. How do I tell?
SSH into the server and run two commands. First: systemctl status openclaw. This tells you whether the service is running, failed, or in a restart loop. Second: journalctl -u openclaw -n 20 --no-pager. This shows the last 20 journal lines. If the service is active (running), the issue is network/firewall, not a restart loop. If it shows “failed” or repeated “Started/Stopped” entries, that is your restart loop. The journal lines tell you why.
The restarts stopped on their own. Should I still investigate?
Yes. Restarts that stop on their own mean the root cause went away temporarily, not that it was fixed. OOM-triggered restarts stabilize if memory pressure drops after the session is reset. Disk-full restarts pause if a log rotation happened to run. Neither of these is a resolution. Find the cause while the system is stable so you can address it before it happens again.
Run a health check: show me current RAM usage, swap usage, disk usage, the number of times the openclaw service has restarted in the last 24 hours, and the last 10 journal lines. I want to confirm the system is actually healthy, not just temporarily quiet.
