OpenClaw Telegram responses slow down for fixable reasons: polling mode, model choice, context size, and compaction timing. This article covers each cause in order of impact, with paste-in prompts for each fix. If you want the fastest single fix for slow OpenClaw Telegram responses, start at Step 1.
TL;DR
The fastest single win for slow Telegram responses is switching from polling mode to webhook mode, which cuts the delivery delay from 1 to 3 seconds down to under half a second. If your agent is already on webhooks and still slow, the next fix is model routing: putting a faster local or cheaper API model on Telegram responses instead of the full-capability model you use for complex tasks.
Where the delay actually comes from
A Telegram response from your OpenClaw agent involves several steps happening in sequence. Each step adds time, and slowness can come from any of them. Understanding the full chain helps you isolate exactly which part is your bottleneck before changing anything.
The chain looks like this (openclaw slow response time always traces back to one of these steps): Telegram receives your message, then either pushes it to your agent (webhook mode) or waits for your agent to ask for it (polling mode). OpenClaw receives the message, builds the full context window from your system prompt, injected memories, and conversation history, passes all of that to the model, the model generates a response, OpenClaw receives the completed response, then sends it back to Telegram. Telegram delivers it to your chat. Each handoff in that sequence adds time, and removing even one unnecessary delay can change the feel from sluggish to responsive.
Each of those steps has a different latency profile:
- Polling delay: If you are in polling mode, your agent checks for new messages on a fixed interval. If the interval is 2 seconds and you send a message immediately after a poll, you wait up to 2 seconds before your agent even sees it. This is pure idle delay, nothing to do with your model.
- Model inference time: The time the model takes to generate a response. This varies from under 1 second for small local models to 10 to 30 seconds for large cloud models with long context. This is usually the biggest variable.
- Context assembly time: Before calling the model, OpenClaw assembles the context: system prompt, injected memories, conversation history. If this context is large, the assembly and tokenization adds time before the model call even starts.
- Compaction pause: If context compaction runs mid-conversation, it adds a several-second pause while the compaction model processes the summary. You see this as an unusually slow response on specific turns.
- Network round-trip: The time for OpenClaw to reach your model provider’s API and receive the response. For local Ollama models, this is negligible. For cloud APIs, it varies by geographic distance and provider load.
My Telegram responses are slow and I want to diagnose which part of the chain is causing the delay. Tell me: what mode is the Telegram plugin using (polling or webhook)? What is the configured polling interval if polling is active? What is the primary model I am using for Telegram responses, and is there a separate model configured for that channel? How large is my current context window setting, and roughly how many tokens are in my injected context at the start of a conversation? Are there any compaction events in my recent session logs?
Step 1: Switch from polling to webhook mode
Polling mode and webhook mode are two ways OpenClaw can receive messages from Telegram. Polling means your agent repeatedly asks Telegram “any new messages?” on a timer. Webhook mode means Telegram pushes messages directly to your agent the moment they arrive.
Polling adds a waiting period before your agent sees any message: the time between when you send it and when your agent’s next poll fires. With a 2-second polling interval, the average wait before your agent even starts processing is 1 second, and the worst case is 2 seconds. This delay compounds with model inference time: a 2-second model response already feels slow, and a 2-second polling wait on top of it feels like 4 seconds. Polling delay is also invisible in the logs, which is why it is often overlooked. The logs show when the message arrived at the agent, not when you sent it. The gap between those two timestamps is the polling delay, and the logs do not surface it as a problem because, from the agent’s perspective, the message arrived and was handled promptly.
Webhook mode eliminates this entirely. Telegram pushes the message to your agent in under 100 milliseconds. Your agent starts processing immediately. If polling delay is your bottleneck, switching to webhooks is the single biggest performance improvement available.
Check whether my Telegram plugin is configured for polling or webhook mode. If it is polling, I want to switch to webhook mode. Tell me what I need to configure: the webhook URL format for my setup, whether I need a publicly reachable domain or IP, and what the setWebhook Telegram API call looks like for my bot token. Walk me through the steps to make the switch without breaking my current integration.
Webhook mode requires a publicly reachable HTTPS URL
Telegram must be able to reach your OpenClaw gateway from the internet to deliver webhook messages. This requires either a public IP with port 443 open, or a reverse proxy (Caddy or Nginx) in front of your OpenClaw instance that handles HTTPS. If your OpenClaw runs on a home network or behind NAT without a static public IP, polling is the only viable mode unless you set up a tunnel service. In that case, skip to Step 2 below for the fastest model-side improvements.
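If you want to see the mechanics before asking your agent to do it, the switch itself is a single call to Telegram’s setWebhook method. A minimal Python sketch; the bot token and gateway URL below are placeholders for your own values:

```python
import json
import urllib.parse
import urllib.request

def set_webhook_request(bot_token: str, webhook_url: str) -> str:
    """Build the Telegram Bot API setWebhook call for a given bot and HTTPS endpoint."""
    query = urllib.parse.urlencode({"url": webhook_url})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"

def set_webhook(bot_token: str, webhook_url: str) -> dict:
    """Register the webhook and return Telegram's JSON response."""
    with urllib.request.urlopen(set_webhook_request(bot_token, webhook_url)) as resp:
        return json.load(resp)

# Example (substitute your real bot token and public gateway URL):
# set_webhook("123456:ABC-EXAMPLE", "https://example.com/telegram/webhook")
```

Telegram only accepts an HTTPS URL for a webhook, which is why the publicly reachable domain requirement above is a hard prerequisite rather than a recommendation.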
Manual path: reducing polling interval if you cannot switch to webhooks
If webhook mode is not feasible for your setup, you can reduce the polling interval. In openclaw.json, find the Telegram plugin config and look for a pollingInterval or interval field. The exact field name varies by plugin version, so ask your agent to find it. Setting it to 500 milliseconds instead of the default 2000 milliseconds cuts average polling wait from 1 second to 250 milliseconds. Be aware this increases API calls to Telegram’s servers: at 500ms interval, your agent makes 120 requests per minute to Telegram’s getUpdates endpoint. Telegram does not document a hard minimum for getUpdates polling frequency, but in practice values below 300 milliseconds tend to produce inconsistent delivery behavior. Staying at 500 milliseconds or above is the reliable practical floor for polling setups.
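To make that trade-off concrete, here is the arithmetic as a small sketch (plain Python, independent of any OpenClaw config):

```python
def polling_cost(interval_ms: int) -> dict:
    """Average and worst-case wait before the agent sees a message, plus the
    getUpdates request rate, assuming messages arrive uniformly at random."""
    return {
        "average_wait_ms": interval_ms / 2,      # uniform arrival -> half the interval
        "worst_case_wait_ms": interval_ms,       # message sent right after a poll fired
        "getUpdates_per_minute": 60_000 // interval_ms,
    }

print(polling_cost(2000))  # default: 1000ms average wait, 30 requests/minute
print(polling_cost(500))   # tuned: 250ms average wait, 120 requests/minute
```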
Step 2: Route Telegram to a faster model
Model routing is the practice of assigning different models to different tasks or channels. Your primary model, the one you use for complex research or tool-heavy tasks, is probably optimized for capability rather than speed. Routing Telegram chat specifically to a faster model makes conversational responses feel instant while keeping your primary model available for tasks that need it.
As of March 2026, the fastest response times for Telegram chat come from two categories of model: small local models running on Ollama (sub-second for short responses on modern hardware), and fast cloud API models with low time-to-first-token. The tradeoff is capability: a smaller, faster model handles conversational responses well but may struggle with complex multi-step tasks. The fix is routing: use the fast model for Telegram by default, and let the agent escalate to the full model when a task is clearly complex.
I want to set a faster model specifically for my Telegram channel to reduce response times. Check whether my openclaw.json supports channel-specific model overrides. If it does, configure the Telegram channel to use my fastest available model. Tell me what models I currently have configured, which ones are the fastest for short conversational responses, and what the config change looks like. If I have Ollama installed with a small model available, include that as an option.
If you have Ollama installed on your server, a small model like llama3.1:8b running locally can respond in a few seconds for most short Telegram messages, with no API cost and no network round-trip to an external provider (see the benchmark figures in Step 5 for realistic CPU-only numbers). For longer or more complex requests, your agent can be configured to escalate to a more capable model automatically.
Testing model response time before committing
Before setting a new model as your Telegram default, test its response time for the kind of messages you actually send. Ask your agent to run three short conversational prompts using the candidate model and report how long each one took. A model that responds in 2 seconds for a simple greeting may take 45 seconds for a task that requires tool use. Know your actual workload before optimizing.
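If the candidate model runs on Ollama, you can also time it yourself. A sketch against Ollama’s default local endpoint (`/api/generate` is Ollama’s actual API; the model name and prompts are examples to replace with your own):

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> bytes:
    """JSON body for a single non-streaming generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def time_prompt(model: str, prompt: str) -> float:
    """Wall-clock seconds for one complete response."""
    req = urllib.request.Request(OLLAMA_URL, data=build_request(model, prompt),
                                 headers={"Content-Type": "application/json"})
    start = time.monotonic()
    urllib.request.urlopen(req).read()
    return time.monotonic() - start

# Example (requires a running Ollama instance):
# for p in ["Hi, how are you?", "What's the capital of France?", "Summarize: standup moved to 3pm."]:
#     print(f"{time_prompt('llama3.1:8b', p):.1f}s  {p!r}")
```

Run the same three prompts against each candidate model and compare: the spread between the fastest and slowest candidate is usually large enough to make the choice obvious.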
Step 3: Trim context window bloat
The context window is the total amount of text OpenClaw sends to the model with every request. It includes your system prompt, any injected memories, the current conversation history, and any workspace files or skill instructions that are loaded automatically. The larger this context, the longer every model call takes, even for a simple “what time is it?” question.
Context bloat is a common cause of slow Telegram responses, and operators rarely identify it as a context problem at first. The agent feels fast for the first few exchanges in a conversation, then slows down noticeably as the conversation history grows and fills the context. The model does not care whether a token comes from your system prompt, an old message from three days ago, or the current message: it processes all of them on every request. A 50,000 token context takes the same processing time whether the current message needed 40 tokens of that context or all of it.
Check my current context window configuration and tell me how large the injected context is at the start of a fresh conversation. Break it down by section: system prompt tokens, injected memories tokens, workspace files tokens, skill instructions tokens, and conversation history. Tell me which sections are the largest, and for each section over 2000 tokens, tell me whether its size is necessary or whether it could be reduced without affecting core functionality.
The most common sources of excessive context are:
- Auto-recalled memories: If your memory plugin is injecting a large block of recalled memories on every turn, this adds tokens to every model call. For Telegram conversations, most recalled memories are irrelevant. Consider whether auto-recall is set to inject more context than your Telegram use cases actually need.
- Large SOUL.md or AGENTS.md files: Workspace files loaded automatically as context add their full character count to every context window. A 10,000 character AGENTS.md adds roughly 2,500 tokens to every request. If these files contain detail that is relevant for complex tasks but not for quick Telegram responses, consider whether they need to be loaded on every turn.
- Loaded skill instructions: Skills with large SKILL.md files add their content to the context if they are active. A skill loaded and unused costs the same tokens as one that is actively needed on every turn.
- Long conversation history: As a Telegram conversation grows, the history grows with it. Compaction is designed to trim this, but if compaction is configured conservatively, history accumulates until compaction fires, causing a large context for many turns and then a compaction pause.
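You can get a rough per-file token count yourself with the common four-characters-per-token heuristic (the same ratio behind the 10,000-character AGENTS.md estimate above). The file names in the example are assumptions; point it at whatever your workspace actually loads:

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough token count: ~4 characters per token for English prose."""
    return len(text) // 4

# Example (adjust the file list to your workspace):
# for name in ("SOUL.md", "AGENTS.md"):
#     p = Path(name)
#     if p.exists():
#         print(name, estimate_tokens(p.read_text(encoding="utf-8")), "tokens (approx)")
```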
I want to reduce the context size sent with each Telegram message to speed up responses. Without affecting core functionality, what are the top three changes I can make to reduce tokens in the injected context? For each change, tell me how many tokens it would save, what the trade-off is, and what config change or file edit would implement it.
Step 4: Fix compaction lag
Compaction is the process OpenClaw uses to summarize and trim conversation history when the context window gets full. The compaction process runs a separate model call to produce the summary. During that call, the user-facing response is delayed. If you notice that most Telegram responses are fast but specific turns are unusually slow (10 to 30 seconds instead of 2 to 3 seconds), compaction firing mid-conversation is the likely cause.
Check my compaction configuration. Tell me: at what context percentage or token count does compaction fire, what model is currently configured for compaction, and what model would be fastest for compaction without producing significantly worse summaries. Also check whether compaction has been firing frequently in my recent Telegram sessions by looking at the session logs.
Two config changes reduce compaction lag. First, routing compaction to a faster model. Compaction summaries do not need to be written by your primary model. A smaller, cheaper model produces acceptable summaries faster and at lower cost. As of March 2026, phi4:latest running locally on Ollama is a strong option for compaction: free, fast, and produces coherent summaries for typical OpenClaw conversation histories. The compaction summary does not need to be perfect: it needs to preserve the key facts from the conversation so the agent can continue functioning. A local model running in 5 to 8 seconds for compaction is substantially better than a cloud model that takes 25 to 40 seconds, even if the local model’s prose quality is lower.
Second, increasing the compaction threshold. If compaction fires too early (at 60% context usage, for example), it runs more frequently than necessary. Setting it to fire at 80% reduces how often compaction interrupts a conversation, at the cost of a slightly larger context on late turns before compaction fires.
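The effect of the threshold on compaction frequency is easy to sketch. The numbers below are illustrative assumptions, not OpenClaw defaults:

```python
def turns_before_compaction(context_limit: int, start_tokens: int,
                            tokens_per_turn: int, threshold: float) -> int:
    """How many conversation turns fit before context usage crosses the
    compaction threshold (illustrative model: linear growth per turn)."""
    budget = context_limit * threshold - start_tokens
    return max(0, int(budget // tokens_per_turn))

# 128k-token window, 8k of injected context, ~1,200 tokens added per turn:
print(turns_before_compaction(128_000, 8_000, 1_200, 0.6))  # fires after 57 turns
print(turns_before_compaction(128_000, 8_000, 1_200, 0.8))  # fires after 78 turns
```

Under these assumptions, raising the threshold from 60% to 80% buys roughly 21 extra turns between compaction pauses.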
Update my compaction config to use the fastest available model for summaries, preferring Ollama local models if available. Set the compaction threshold to 80% if it is currently lower than that. After making these changes, restart the gateway and confirm the new settings are active.
After any compaction config change, start a fresh session to verify
OpenClaw caches context window and compaction settings in the session entry when a session first starts. Config changes to compaction settings do not apply to sessions that are already running. After changing the compaction model or threshold, start a new conversation in Telegram rather than continuing an existing one. The new settings will apply from the first message in the new session.
Step 5: Tune Ollama performance for local models
Ollama is a tool that runs AI language models on your own hardware, eliminating API costs and external network delays. If you are using Ollama models for Telegram responses and finding them slow, the bottleneck is usually hardware resources or model configuration rather than anything in OpenClaw itself.
I am using Ollama for Telegram responses and they are slow. Run a quick performance check: what is my server’s available RAM, CPU count, and whether a GPU is available to Ollama. Then check which Ollama model I am using for Telegram, how large it is, and whether Ollama is configured to keep models loaded in memory between requests. If the model is being unloaded between requests, tell me how to set OLLAMA_KEEP_ALIVE so it stays loaded.
The single biggest Ollama performance factor for Telegram use is whether the model stays loaded in memory between requests. By default, Ollama unloads models after 5 minutes of inactivity. Loading a model from disk takes 10 to 30 seconds depending on model size and disk speed. If your Telegram conversations are spread out, every new message may wait for a model load. Setting OLLAMA_KEEP_ALIVE=-1 keeps models loaded indefinitely, eliminating cold-start delays. On Linux and macOS, this environment variable is set in the shell profile or systemd service file before starting Ollama. On Windows, set it in System Properties under Environment Variables before restarting the Ollama service.
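Besides the environment variable, Ollama’s HTTP API accepts a keep_alive field per request. A sketch that loads a model and pins it in memory (a generate request with no prompt loads the model without producing output; the model name is an example):

```python
import json
import urllib.request

def keep_alive_payload(model: str) -> bytes:
    """Generate request with no prompt: loads the model and, with
    keep_alive=-1, keeps it resident until Ollama restarts."""
    return json.dumps({"model": model, "keep_alive": -1}).encode()

def pin_model(model: str, url: str = "http://localhost:11434/api/generate") -> None:
    req = urllib.request.Request(url, data=keep_alive_payload(model),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()  # returns quickly; model stays loaded

# Example (requires a running Ollama instance):
# pin_model("llama3.1:8b")
```

The environment variable is the more durable option since it survives across all requests; the per-request field is useful for a one-off warm-up after a restart.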
The second factor is RAM. A model that fits entirely in RAM runs significantly faster than one that spills to disk. An 8B parameter model quantized to Q4_K_M requires roughly 5GB of RAM. If your server has less than that free when Ollama is running, the model is being paged to disk during inference and responses will be slow regardless of other settings. You can check how much RAM Ollama is actively using by asking your agent to run a memory usage check while a model is loaded. If you are running multiple Ollama models simultaneously, the combined RAM usage may be the source of the slowdown. Ollama loads each model independently, so running two models uses twice the RAM of running one.
Model size vs response speed trade-offs as of March 2026
On a VPS with 4 CPU cores and 8GB RAM (no GPU), realistic response times for Telegram-length messages:
- llama3.1:8b (Q4_K_M): 3 to 8 seconds.
- phi4:latest (14B, Q4_K_M): 8 to 20 seconds.
- qwen2.5:3b (if available): 1 to 3 seconds.
For purely conversational Telegram use, smaller and faster beats larger and capable.
Step 6: Measure and verify the improvement
After making changes, verify the improvement is real before considering the issue resolved. Response time perception is subjective. A change that reduced polling lag by 1 second may feel significant or insignificant depending on what the model inference time is. The goal is a full round-trip time (message sent to response received in Telegram) that feels acceptable for your use case, which is typically under 5 seconds for conversational exchanges.
I have made changes to speed up my Telegram responses and I want to measure whether they worked. Send three short test responses via Telegram and log the time from message received to response sent for each one. Also tell me the current polling interval or whether webhooks are active, what model is handling Telegram responses, and the approximate context size at the start of a fresh conversation. Compare this to any baseline you have from before the changes.
If the round-trip time is still above your target after working through all the steps above, there are two remaining options. The first is accepting the trade-off and setting user expectations by updating your Telegram bot description to indicate response times. The second is hardware: response time below 2 seconds for most queries reliably requires either a fast cloud API model (which costs money per request) or GPU-accelerated local inference. A GPU-enabled server or a machine with a consumer GPU (RTX 3080 or better) can run llama3.1:8b at tokens-per-second rates that produce responses in under 2 seconds for most Telegram-length messages. If you are already on a VPS with CPU-only inference and need faster responses, a fast cloud API model is the more cost-effective upgrade path compared to renting GPU compute full-time for a conversational workload. For high-volume Telegram bots where cost per message matters, DeepSeek V3 via the DeepSeek API offers near-GPT-4-class quality at a fraction of the cost, with competitive latency for interactive chat as of March 2026.
FAQ
My Telegram responses are fast for simple questions but slow for anything complex. Is that a different problem?
No, that is the expected behavior when the model is the bottleneck. Longer and more complex responses take more inference time because the model is generating more tokens. A simple “yes” takes a fraction of the time of a 500-word explanation. This is working as intended. If complex responses are unacceptably slow, the options are: route complex tasks to a cloud API model with better throughput, increase server RAM so a larger local model can run more efficiently, or add a typing indicator to your Telegram bot so users can see that the agent is working while a longer response generates.
Telegram shows the bot as “typing” but the response never arrives. Is that a speed problem?
That is not a speed problem: it is a failed response. Something in the processing chain completed enough to send the typing indicator but failed before the response was written. The most common causes are a model call that timed out, a context that exceeded the model’s maximum length, or a tool call that failed and caused the response generation to abort. OpenClaw sends the typing indicator early in the processing pipeline, before the model call completes; if the model call or a subsequent step fails, the indicator has already been sent and cannot be retracted, leaving the user seeing “typing” indefinitely. Check your gateway logs for errors in the relevant session: look for timeout errors, context length exceeded errors, or tool call failures that coincide with the timing of the stuck response. If this happens consistently on specific kinds of messages, the failure is deterministic and reproducible. Send one of those messages again while watching the logs in real time to capture the exact error.
Can I use a different model for Telegram than for Discord or other channels?
Yes, if your OpenClaw version supports channel-specific model overrides in the agent config. As of March 2026, channel-specific routing can be configured under the agent defaults or via a per-session model override in the channel plugin config. Ask your agent to check whether your current config version supports this and show you the exact config path. If channel-level routing is not supported in your version, the alternative is running two separate OpenClaw instances, each with a different primary model, connected to different channel integrations. Two instances is more infrastructure to manage but is completely reliable: each instance has its own config, its own model, and its own channel connection. There is no risk of a routing rule misclassifying a message and using the wrong model. For operators who want predictable speed on Telegram and do not want to depend on routing logic, two instances is a legitimate production setup.
Switching to webhooks broke my Telegram integration entirely. How do I roll back to polling?
To revert to polling: call Telegram’s deleteWebhook API to remove the registered webhook URL, then update your openclaw.json Telegram plugin config to remove or empty the webhook URL field, and restart the gateway. OpenClaw will fall back to polling mode when no webhook URL is configured. Ask your agent to run the deleteWebhook call and update the config in a single step. After the gateway restarts, send a test message and confirm polling resumes in the logs.
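The Telegram side of the rollback is two Bot API calls: deleteWebhook to unregister the URL, then getWebhookInfo to confirm it is gone. A sketch with a placeholder token:

```python
import json
import urllib.request

def api_url(bot_token: str, method: str) -> str:
    """Telegram Bot API endpoint for a given method."""
    return f"https://api.telegram.org/bot{bot_token}/{method}"

def rollback_to_polling(bot_token: str) -> dict:
    """Unregister the webhook, then return getWebhookInfo for verification.
    The "url" field in the result should be empty afterwards."""
    urllib.request.urlopen(api_url(bot_token, "deleteWebhook")).read()
    with urllib.request.urlopen(api_url(bot_token, "getWebhookInfo")) as resp:
        return json.load(resp)

# Example (substitute your real bot token):
# info = rollback_to_polling("123456:ABC-EXAMPLE")
# print(info["result"]["url"])  # empty string once the webhook is removed
```

Remember that the openclaw.json change and gateway restart still happen on your side; Telegram only needs the webhook unregistered.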
My openclaw telegram responses slow down after a few hours but start fast. What causes that?
Three things cause this pattern. First, conversation history growth: the context grows with each exchange, making later turns slower than earlier ones. Compaction is supposed to prevent this but may be configured to fire late or use a slow model for summaries. Second, memory plugin accumulation: if your memory plugin is injecting more recalled memories as the session progresses, the context grows over time. Third, Ollama model eviction: if Ollama evicts the model from memory during a quiet period (the default timeout is 5 minutes), the next message after a pause waits for a cold model load. Set OLLAMA_KEEP_ALIVE=-1 to prevent eviction.
I do not want to pay for a faster cloud API model. What is the fastest free option for Telegram?
The fastest free option is a small Ollama model on local hardware. As of March 2026, qwen2.5:3b produces the fastest responses on CPU-only hardware at conversational quality. llama3.1:8b is a better balance of speed and capability if your server has more than 6GB free RAM. Both are zero API cost. The real ceiling for free speed is your server’s CPU speed and available RAM. On a budget VPS with 2 CPU cores and 4GB RAM, even the smallest models will feel slow for anything beyond a short reply. If you are on limited hardware and need faster free responses, consider a hybrid: use a local model for the simplest 80 percent of Telegram messages (status checks, reminders, short answers) and route only complex requests to a cloud API. This keeps the majority of interactions fast and free while preserving quality for the tasks that need it. Ask your agent to help you identify which of your recent Telegram messages would have been handled adequately by a small local model, to get a realistic picture of how much of your traffic the hybrid approach would cover.
Does enabling prompt caching speed up Telegram responses?
Prompt caching reduces the cost of repeated context, specifically the parts of your context that do not change between requests: your system prompt, static workspace files, and stable skill instructions. For Telegram conversations, the system prompt and injected files are repeated on every turn. With caching enabled on a provider that supports it (Anthropic and OpenAI as of March 2026), subsequent turns in a conversation can be significantly faster because the static portion of the context is served from cache rather than re-processed. This has no effect on Ollama local models, which do not support the same caching mechanism.
Understanding what fast actually means
Before chasing a speed problem, it helps to know what a realistic target is. Telegram response time depends on too many variables for a single universal benchmark, but here is what you should expect at each tier of setup as of March 2026:
- Webhook + fast cloud API (GPT-4o-mini, Claude Haiku, DeepSeek V3): 1 to 4 seconds total round-trip for a short conversational response. These are low-latency models optimized for interactive use. The network round-trip to the API adds roughly 200 to 800 milliseconds depending on your geography.
- Webhook + primary cloud API (Claude Sonnet, GPT-4o): 3 to 10 seconds for a short response, more for longer outputs. These models are significantly slower than their cheaper siblings for interactive chat, though they produce better outputs for complex tasks.
- Webhook + local Ollama (llama3.1:8b on 4-core CPU VPS): 3 to 8 seconds for a short response. No network delay to an external provider, but CPU inference is slow relative to GPU or cloud hardware.
- Polling (any model): Add the polling interval (1 to 2 seconds average wait) to whichever model time above applies to your setup. Polling always adds latency on top of model time.
If your responses are within these ranges, they are performing normally for your configuration. If they are substantially outside them, there is a specific bottleneck to find. Steps 1 through 5 above help you find it.
Measure before you change anything
Speed improvements are only visible if you have a baseline to compare against. Before making any config changes, take three measurements of your current response time. Send three separate short messages in Telegram, note the time you sent each one, and note when the response arrived. The difference is your current round-trip time.
I want to benchmark my current Telegram response time before making any changes. For my next three Telegram messages, log the time each message was received and the time the response was sent. Report each round-trip time in seconds after the third message. Also tell me what model handled each response and whether polling or webhook mode was active during the test.
Write down the three times. Once you have made changes, run the same three-message test again and compare. This prevents the common outcome of making multiple changes simultaneously and not knowing which one improved things, or crediting a config change for an improvement that actually came from a different variable (a shorter message, a lighter server load).

The most useful baseline stat is not the average of the three measurements but the worst case. Users remember the slow ones. If your slowest response was 18 seconds, that is the number that matters for user experience, not the average of 4, 5, and 18 seconds. Target reducing that worst case, not just the mean.
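A tiny helper keeps the comparison honest, using the same illustrative three measurements as above:

```python
def summarize_baseline(round_trips_s: list[float]) -> dict:
    """Mean and worst-case round-trip time. Optimize the worst case:
    that is the response users remember."""
    return {"mean_s": sum(round_trips_s) / len(round_trips_s),
            "worst_s": max(round_trips_s)}

print(summarize_baseline([4.0, 5.0, 18.0]))  # mean 9.0s, worst 18.0s
```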
What your measurement tells you about the cause of slow openclaw telegram responses
The pattern of your slow responses is diagnostic. Each of these patterns points to a different cause:
- Every response is slow by a consistent amount (1 to 2 seconds extra): Polling delay. The consistency is the signature. Every message waits the same amount because the polling interval is fixed. Switch to webhooks (Step 1 above).
- Response time grows as a conversation continues: Context bloat or compaction. Short conversations are fast, long ones slow down progressively. The context is filling up and each turn adds more tokens to process. Address with Step 3 or Step 4.
- Most responses are fast, specific turns take 15 to 30 seconds: Compaction firing. The slow turns coincide with compaction runs. Address with Step 4.
- All responses are slow regardless of message length or conversation age: Model inference time. The model itself is the bottleneck. Address with Step 2.
- Responses are slow only at certain times of day: Cloud API load or network congestion. Some providers have higher latency during peak hours. Consider switching to a different provider or adding a local fallback model for peak periods.
Advanced: parallel response paths for mixed workloads
If you use your Telegram integration for both quick conversational exchanges and complex research or tool-use tasks, the speed problem is structural: a single model cannot be both fast for chat and capable for complex tasks simultaneously. The solution is explicit routing based on message content.
I want to set up model routing in my Telegram channel so that short conversational messages use a fast model and complex requests automatically escalate to my primary model. Show me how to configure this in my openclaw.json. I want the fast model to handle anything that is a simple question, status check, or short response. Complex tasks involving multiple tool calls, research, or long outputs should escalate to the capable model. Tell me the config structure and give me a practical example of how the agent decides which model to use for a given message.
This routing approach gives you the best of both worlds: Telegram feels responsive for casual use, and complex tasks still get the model capability they need. The trade-off is configuration complexity and occasional routing errors (a message that looks simple but triggers a complex task will start on the fast model and may need to hand off mid-task). For most Telegram use patterns, simple conversational routing covers the majority of messages and the hand-off edge case is rare enough to be acceptable.
Speed up openclaw telegram responses with prompt caching on the static context
If you are using Anthropic or OpenAI models for Telegram, prompt caching on the static portion of your context (system prompt, workspace files, skill instructions) reduces per-turn cost and improves response time for subsequent turns in a conversation. The first turn in a new conversation populates the cache. Every subsequent turn that reuses the same static context hits the cache and gets faster, cheaper processing for that portion. Ask your agent to confirm whether prompt caching is enabled in your config and which provider you are using, since caching behavior and pricing vary by provider.
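As a concrete sketch of what “caching the static portion” means on the provider side, here is how a cache marker looks in Anthropic’s Messages API. The `cache_control` field is Anthropic’s; whether OpenClaw sets it for you depends on your config, so treat this as illustration rather than a required change:

```python
def system_blocks(static_prompt: str) -> list[dict]:
    """System content with Anthropic's cache_control marker on the static
    portion (system prompt, workspace files, skill instructions)."""
    return [{
        "type": "text",
        "text": static_prompt,
        "cache_control": {"type": "ephemeral"},  # cached and reused across turns
    }]

blocks = system_blocks("You are a helpful Telegram assistant. ...")
print(blocks[0]["cache_control"])
```

The first turn pays full price to populate the cache; subsequent turns that send the identical static block read it back at reduced cost and latency.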
Cheap Claw: $17
Cut your OpenClaw costs without cutting your agent’s capability
The exact model routing config, caching setup, and Ollama integration that operators use to run OpenClaw at a fraction of the default API cost. Includes the speed optimization settings covered in this article.
