Connecting your agent to external data: APIs, webhooks, and file watchers

How to give your agent real-time information to work with, and what to do when the data arrives. This guide covers the patterns for connecting OpenClaw to external data sources (APIs, webhooks, RSS feeds, local files) and building automations that react to what’s happening in the world.

TL;DR

OpenClaw agents can read from any data source that’s accessible via HTTP: APIs, webhooks, RSS feeds, public datasets, internal tools. The patterns are the same: fetch, parse, act. This guide covers the exact curl commands, authentication methods, parsing instructions, and error handling for connecting to external data.

Why external data matters

An agent that only knows what’s in your workspace is useful. An agent that also knows what’s happening in the world is powerful. External data turns your agent from a personal assistant into a real-time monitoring system, a research assistant, a market intelligence tool, and an automation engine that reacts to events as they happen.

Examples of what becomes possible with external data:

  • Market monitoring: Track competitor pricing, new product launches, industry news
  • Research automation: Monitor arXiv, academic journals, specific authors for new publications
  • Infrastructure monitoring: Watch API status pages, server metrics, error logs
  • Personal tracking: Follow package deliveries, flight statuses, stock prices, weather
  • Workflow integration: Connect to Notion, Trello, Slack, Google Calendar, GitHub
  • Public data analysis: Government datasets, financial reports, regulatory filings

The common thread: the agent fetches data, parses it, and decides what to do based on what it finds. The decision logic is in plain English instructions, not code.

The three patterns for external data

All external data connections follow one of three patterns:

Pattern 1: Polling (cron-based)

A cron job runs on a schedule, fetches data from an endpoint, checks for changes or new information, and acts if something is found. This is the most common pattern: the morning brief uses it for news, and the file watcher uses it for file changes.

Create a cron that runs every 30 minutes. Task: fetch data from [API endpoint] using curl. Parse the JSON response for [specific field]. Compare the value to the previous value stored in [state file]. If the value changed, send me a Telegram message with the old and new values. Update the state file with the new value. Use ollama/phi4:latest for parsing and comparison.

Pattern 2: Webhook (push-based)

An external service sends data to your OpenClaw agent when something happens. The agent receives the webhook payload, parses it, and acts immediately. This is real-time reaction without polling.

Set up a webhook endpoint in OpenClaw that listens for POST requests at [URL]. When a request arrives, parse the JSON body for [specific fields]. Based on the values, decide what action to take: send a Telegram alert, update a file, trigger another automation. Show me the webhook configuration and the parsing instruction.

Pattern 3: File watcher (local polling)

A cron watches a local file that other processes write to (logs, exports, downloads). When the file changes, the agent reads the new content and acts. This bridges external processes that write files with OpenClaw’s ability to read and understand them.

Create a cron that runs every 5 minutes. Task: check if [file path] has been modified since the last check (store timestamp in [state file]). If modified, read the new lines added since last check. Parse them for [patterns or keywords]. If any match, send a Telegram alert with the matching lines. Update the timestamp state file.

Pattern 1 in depth: Polling APIs

The basic curl pattern

Most API polling starts with curl. The agent runs curl with appropriate headers, captures the response, and parses it:

Fetch data from this API endpoint: [URL]. Use curl with these headers: [headers]. Save the response to a temporary file. Read the file and parse the JSON for these fields: [field1, field2, field3]. Write the extracted values to [output file]. Use ollama/phi4:latest for parsing.

Authentication patterns

APIs use different authentication methods. The agent handles all of them via curl flags:

  • API key in header: curl -H "Authorization: Bearer TOKEN" [URL]
  • API key in query parameter: curl "[URL]?api_key=TOKEN"
  • Basic auth: curl -u username:password [URL]
  • OAuth 2.0: curl -H "Authorization: Bearer ACCESS_TOKEN" [URL] (token obtained separately)
  • Cookie-based: curl -b "session=COOKIE_VALUE" [URL]

Test authentication to [API endpoint]. Try these authentication methods in order: API key header, basic auth, cookie. Show me which one works and the exact curl command that succeeds. Store the working command in [config file] for future use.

Parsing JSON responses

phi4:latest at 14B parameters is capable of parsing JSON and extracting specific fields. The instruction needs to be explicit about what to look for:

Parse this JSON response: [paste sample JSON]. Extract these fields: price, timestamp, status. Write them to a file in this format: [field]: [value]. If any field is missing, write “field missing” for that line. Use ollama/phi4:latest.
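For reference, here is roughly what that extraction logic amounts to, sketched in Python. The field names and the “field missing” fallback come from the prompt above; everything else is illustrative:

```python
import json

def extract_fields(raw, fields):
    """Extract named fields from a JSON response, writing
    'field missing' for any that are absent (as the prompt specifies)."""
    data = json.loads(raw)
    lines = []
    for field in fields:
        if field in data:
            lines.append(f"{field}: {data[field]}")
        else:
            lines.append(f"{field}: field missing")
    return "\n".join(lines)

# Sample response with one expected field absent
sample = '{"price": 42.5, "timestamp": "2026-03-01T09:00:00Z"}'
print(extract_fields(sample, ["price", "timestamp", "status"]))
```

The point of spelling this out: the parsing instruction you give phi4 should be as unambiguous as this function — name every field and say exactly what to do when one is missing.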

State management between polls

Polling needs to remember what was seen last time to detect changes. The simplest state management is a file with the last-seen values:

After parsing the API response, read [state file] to get the previous values. Compare current values to previous. If any value changed by more than [threshold], send a Telegram alert with the change. Then write the current values to [state file] for the next run.

Pattern 2 in depth: Webhooks

Setting up a webhook endpoint

OpenClaw can receive webhooks via its gateway. The configuration depends on your setup, but the pattern is:

Configure a webhook endpoint at [URL] that accepts POST requests. When a request arrives, log the headers and body to [log file]. Parse the body (assume JSON) for [specific fields]. Based on the values, decide: if field X equals Y, send Telegram alert; if field A is greater than B, update [status file]; otherwise, log and ignore. Show me the webhook configuration.

Webhook security

Public webhook endpoints need verification to prevent abuse. Common methods:

  • Secret token in header: Verify X-Webhook-Token matches expected value
  • Signature verification: HMAC signature of payload with shared secret
  • IP whitelisting: Only accept from known source IPs
  • Query parameter token: ?token=SECRET in the URL
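Signature verification is the method worth understanding in detail, because getting it wrong silently accepts forged payloads. A minimal HMAC-SHA256 check in Python (the secret and payload are placeholders; the constant-time compare is the important part):

```python
import hashlib
import hmac

def verify_signature(payload: bytes, received_sig: str, secret: bytes) -> bool:
    """Verify an HMAC-SHA256 hex signature of the raw webhook payload.
    compare_digest avoids leaking information through timing."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_sig)

secret = b"shared-secret"  # placeholder; keep the real one out of cron text
body = b'{"event": "ping"}'
good = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_signature(body, good, secret))        # valid signature
print(verify_signature(body, "deadbeef", secret))  # tampered or wrong
```

Always verify against the raw request body bytes, before any JSON parsing: re-serializing the parsed body can change whitespace or key order and break the signature.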

Update the webhook endpoint to require authentication. Check for header X-Webhook-Token with value [secret token]. If missing or incorrect, respond with 401 Unauthorized and log the attempt. If correct, process the webhook as normal. Show me the updated configuration.

Webhook rate limiting and queuing

High-volume webhooks need rate limiting to avoid overwhelming the agent:

Add rate limiting to the webhook endpoint. Track incoming requests by source IP in [rate limit file]. Allow maximum 10 requests per minute per IP. If exceeded, respond with 429 Too Many Requests and log the IP. Reset counters every minute. Show me the implementation.
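The 10-per-minute-per-IP rule above is a sliding-window limiter. A sketch of the bookkeeping, with the window and limit as module constants (real values come from your prompt):

```python
import time
from collections import defaultdict, deque

WINDOW = 60.0   # seconds
LIMIT = 10      # max requests per window per IP

_requests = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip, now=None):
    """Sliding-window limiter: allow at most LIMIT requests per
    WINDOW seconds from each source IP. False means respond 429."""
    now = time.time() if now is None else now
    window = _requests[ip]
    while window and now - window[0] >= WINDOW:
        window.popleft()  # drop entries older than the window
    if len(window) >= LIMIT:
        return False
    window.append(now)
    return True
```

A sliding window avoids the burst-at-the-boundary problem of fixed per-minute counters: a client can never squeeze 2x the limit through by straddling a reset.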

Pattern 3 in depth: File watchers

Watching local files for changes

File watchers are polling crons that check file modification timestamps:

Create a cron that runs every minute. Check if [file path] exists and has been modified since the timestamp stored in [timestamp file]. If modified, read the entire file, compare to the previous content stored in [previous content file], extract only the new lines, process them, then update both state files. If not modified, do nothing.

Processing log files

Log files are a common file watcher target. The agent reads new log entries and looks for patterns:

Watch [log file path] for new entries. For each new line, check if it contains any of these patterns: [list of error patterns, keywords]. If a line matches, send a Telegram alert with the matching line and timestamp. Also append to [alert log file] for tracking.

Watching downloaded files

Automations that download files (reports, exports, data dumps) can trigger processing when the download completes:

Watch [download directory] for new files with extension .csv or .json. When a new file appears, read it, parse the contents, extract [specific data], write to [processed output file], and send a Telegram notification that processing completed. Then move the original file to [archive directory] with timestamp.

Real-world examples

Example 1: GitHub repository monitor

Track specific repositories for new commits, issues, or releases:

Create a cron that runs every hour. Fetch the GitHub API endpoint for repository [owner/repo]. Parse the response for latest commit, latest issue, latest release. Compare to previous state. If any are new, send Telegram alert with details. Store state in [state file]. Use GitHub token [token] for authentication.

Example 2: Package tracking

Monitor shipping carriers for package status changes:

Create a cron that runs every 4 hours. Call the carrier API for tracking number [number]. Parse the response for current status and location. Compare to previous status. If status changed to “delivered” or “out for delivery”, send Telegram alert. If status is “delayed” or “exception”, send urgent alert. Store previous status in [state file].

Example 3: Weather alert system

Monitor weather conditions for specific locations:

Create a cron that runs every 30 minutes. Fetch weather data for [city] from wttr.in API. Parse for temperature, conditions, and alerts. If temperature drops below [threshold] or rises above [threshold], send alert. If weather alert is issued (hurricane, storm, etc.), send urgent alert. Store previous values for comparison.

Example 4: Stock price monitor

Track specific stocks or cryptocurrencies:

Create a cron that runs every 15 minutes during market hours. Fetch stock prices for [symbols] from financial API. Calculate percentage change from previous check. If any stock moves more than [threshold]% up or down, send Telegram alert with symbol and change. Store previous prices in [state file].

Error handling and reliability

API failure handling

APIs fail. The agent needs to handle timeouts, rate limits, authentication errors, and malformed responses:

Update the API polling cron with error handling. If curl returns non-zero exit code or the response is not valid JSON, log the error to [error log], wait 5 minutes, and retry once. If the second attempt also fails, send Telegram alert “API [name] unreachable” and skip processing until next scheduled run.
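The retry-once logic above is easier to reason about when separated from the curl call itself. A sketch where `fetch` stands in for whatever actually runs curl (that injection is an assumption for testability, not part of the prompt):

```python
import json
import time

def fetch_json_with_retry(fetch, log, retry_delay=300, sleep=time.sleep):
    """Call `fetch()` (e.g. a wrapper that runs curl and returns raw text).
    If it raises or returns invalid JSON, log the error, wait, and retry
    exactly once. Returns parsed JSON, or None after two failures."""
    for attempt in (1, 2):
        try:
            return json.loads(fetch())
        except Exception as exc:
            log.append(f"attempt {attempt} failed: {exc}")
            if attempt == 1:
                sleep(retry_delay)
    return None  # caller sends the "API unreachable" alert
```

Treating “curl failed” and “response is not valid JSON” as the same failure path is deliberate: from the cron’s point of view, both mean there is nothing safe to act on this run.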

Rate limit awareness

Many APIs have rate limits. The agent should track usage and back off when approaching the limit:

Update the API polling cron with rate limit tracking. After each API call, check the response headers for rate limit information (X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After). If remaining requests are below 20% of the limit, log a warning and increase the polling interval to 2x. If a 429 response is received, wait for the duration specified in Retry-After before retrying. Track all API call counts in [rate limit log].
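The back-off decision reduces to a small pure function. This sketch also reads an `X-RateLimit-Limit` header to compute the 20% threshold — that header is an assumption on my part (the prompt only names Remaining, Reset, and Retry-After), so check what your API actually sends:

```python
def next_poll_interval(headers, base_interval):
    """Decide the next polling interval (seconds) from rate-limit headers.
    Header names vary by API; these are common conventions, not a standard."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(int(retry_after), base_interval)  # 429: obey the server
    remaining = headers.get("X-RateLimit-Remaining")
    limit = headers.get("X-RateLimit-Limit")
    if remaining is not None and limit is not None:
        if int(remaining) < 0.2 * int(limit):
            return base_interval * 2  # approaching the limit: back off
    return base_interval
```

Retry-After takes priority over everything else: once the server says to wait, any further polling before that deadline just burns quota.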

Stale data detection

An API that returns the same data every time might be broken, serving a stale cache, or simply reflecting a source that hasn’t updated. The agent should notice when data stops changing:

After each poll, check if the data is identical to the previous 3 polls (stored in [state file]). If the data has not changed in 3 consecutive polls, send a warning: “Data from [API name] hasn’t changed in [time]. Possible issue: stale cache, API outage, or source offline.” Do not send this warning more than once per day.

Security considerations for external data

Storing API credentials safely

API keys should not be hardcoded in cron instructions where they might end up in logs or workspace files that are committed to git:

Create a credentials file at /home/node/.openclaw/workspace/.credentials (add this path to .gitignore). Store API keys in this format: API_NAME=key_value (one per line). Update all cron instructions to read the key from this file instead of having it inline. Show me which crons currently have hardcoded keys and the updated instructions.

Data validation

External data is untrusted. Before acting on parsed API responses, the agent should validate that the data makes sense:

After parsing the API response, validate: (1) the expected fields exist, (2) values are in a reasonable range (e.g., a stock price should be positive, a temperature should be between -60 and 60°C), (3) timestamps are recent (not stale data from weeks ago). If validation fails, log the raw response to [debug log] and skip processing. Do not send alerts based on invalid data.
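As a Python sketch, validation is a function that returns a list of problems and an empty list on success. The field names, ranges, and the epoch-seconds timestamp are illustrative assumptions — adapt them per API:

```python
import time

def validate_response(data, max_age_seconds=86400, now=None):
    """Sanity-check a parsed API response before acting on it.
    Assumes `timestamp` is epoch seconds; field names are examples."""
    now = time.time() if now is None else now
    problems = []
    for field in ("price", "timestamp"):
        if field not in data:
            problems.append(f"missing field: {field}")
    if "price" in data and data["price"] <= 0:
        problems.append("price must be positive")
    if "timestamp" in data and now - data["timestamp"] > max_age_seconds:
        problems.append("data is stale")
    return problems  # empty list means the data passed validation
```

Collecting every problem rather than failing on the first makes the debug log far more useful: one bad response tells you everything that went wrong with it.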

Logging what goes in and out

Every external data connection should log what it receives and what it does with it. This audit trail is essential for debugging and for understanding what your agent is doing with outside information:

Update all external data crons to log each API call to /home/node/.openclaw/workspace/automations/logs/external-data.log. Format: [timestamp] | [API name] | [HTTP status] | [response size] | [action taken]. This log should grow by one line per API call. Review it weekly for anomalies.

Building a multi-source data pipeline

The real power of external data shows up when you combine multiple sources into a single analysis. A multi-source pipeline reads from several APIs, correlates the data, and produces insights that no single source provides.

Example: Competitor intelligence pipeline

A freelancer monitors three competitors. The pipeline reads pricing pages, social media activity, and job postings to build a weekly competitive picture:

Create a weekly cron (Mondays at 6 AM). For each competitor: (1) fetch their pricing page and extract current prices, (2) search for their company name in recent news, (3) check their careers page for new job postings. Write all findings to /home/node/.openclaw/workspace/market/weekly-competitive-report.md. After collecting data for all competitors, add an analysis section: “What changed this week” with a one-line summary per competitor. Use ollama/phi4:latest for all parsing. Send me a Telegram message when the report is ready.

Example: Personal finance dashboard

Combine bank transaction data (if available via API), investment values, and expense tracking into a weekly financial snapshot:

Every Sunday at 8 PM, build a weekly financial snapshot. Fetch data from these sources: [list APIs or files]. Calculate: total income this week, total expenses by category, net change, investment portfolio change. Write to /home/node/.openclaw/workspace/finance/weekly-snapshot.md. Compare to last week’s snapshot and note significant changes. Send a Telegram summary with the top-line numbers. Use ollama/phi4:latest.

Example: Content research pipeline

Writers and content creators can automate research collection. The pipeline monitors specified topics across multiple sources and builds a research file:

Create a daily cron at 5 AM. Search these topics across web search, RSS feeds, and social media: [topic list]. For each result, extract: title, source, publication date, key claims or data points. Write to /home/node/.openclaw/workspace/research/daily-findings.md. Score each finding by relevance to my current project: [describe project]. Include only findings scoring 7 or above. Use ollama/phi4:latest.

SOTA data integration tools (March 2026)

As of March 2026, the best approach for data integration depends on the data source type:

Structured APIs (JSON/XML)

curl for HTTP calls, phi4 for parsing. This covers 80% of use cases. The agent constructs the curl command, runs it, reads the response, and extracts what matters. No additional tools needed.

Web scraping (HTML pages)

The web_fetch tool extracts readable content from HTML pages and converts to markdown. For structured extraction from specific elements, the agent can also use the browser tool for JavaScript-rendered content:

Fetch the content from [URL] using web_fetch. Extract: all prices listed on the page, any product names, and any date references. Write the extracted data to [output file]. If web_fetch returns incomplete data (the page requires JavaScript), try again using the browser tool to load the page fully before extracting.

RSS/Atom feeds

curl fetches the XML, phi4 parses the entries. RSS is predictable in structure, which makes it reliable for automated parsing. Most blogs, news sites, and academic journals publish RSS feeds.

GraphQL APIs

Some modern APIs use GraphQL instead of REST. The curl command includes a JSON body with the query:

Call this GraphQL API endpoint: [URL]. Use this query: { [your GraphQL query] }. Send as a POST request with Content-Type application/json and Authorization header. Parse the response for [specific fields in the data object]. Write results to [output file].
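The wire format is the part worth seeing: a GraphQL call is an ordinary POST whose JSON body carries the query. A Python sketch of building that request (the endpoint, query, and token are placeholders; `urlopen` would actually send it):

```python
import json
import urllib.request

def graphql_request(url, query, token, variables=None):
    """Build a GraphQL POST request: the query travels as a JSON body
    with 'query' and 'variables' keys, plus a bearer token header."""
    body = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = graphql_request(
    "https://api.example.com/graphql",  # hypothetical endpoint
    '{ repository(owner: "octocat", name: "hello") { stargazerCount } }',
    "TOKEN",
)
# urllib.request.urlopen(req) sends it; the fields you want come back
# nested under the "data" key of the response JSON.
```

Unlike REST, errors often arrive with HTTP 200 and an `errors` array in the body, so the parsing instruction should check for that key before reading `data`.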

Cost analysis for external data integrations

External data connections cost $0.00 on the OpenClaw side when using phi4:latest for parsing. The cost comes from the external APIs themselves:

  • Free tier APIs (weather, some financial data, public data): $0.00
  • Freemium APIs (GitHub, most productivity tools): $0.00 within free tier limits
  • Paid APIs (premium data sources, high-frequency financial data): varies by provider
  • Agent processing cost (phi4, all parsing and analysis): $0.00
  • Agent processing cost (deepseek-chat, for high-quality output): ~$0.01-0.03 per run

Most operators run 5-10 external data integrations and pay nothing for the agent-side processing. The only costs are API subscriptions if you need premium data sources. For most professional use cases, the free tiers are more than sufficient for daily monitoring needs.

Debugging external data connections

The data isn’t arriving

Check three things in order: (1) can curl reach the endpoint from the server, (2) is authentication correct, (3) is the parsing instruction finding the right fields:

Debug the [API name] connection. Step 1: Run the curl command manually and show me the raw response. Step 2: Check authentication by inspecting the response status code and any error messages. Step 3: If the response looks correct, show me the parsing instruction and what it extracts from the actual response. Identify the failure point.

The data is wrong or garbled

Usually a parsing problem. The API response format changed, or the parsing instruction doesn’t match the actual JSON structure:

Show me the raw API response from [API name] and the current parsing instruction. Compare the response structure to what the instruction expects. If the structure changed (new fields, renamed fields, different nesting), update the instruction to match the current response format.

The alert is too noisy

If you’re getting too many alerts, tighten the threshold or add a cooldown:

Update the [API name] alert cron. Add a cooldown: after sending an alert, do not send another alert for the same condition for [duration]. Change the alert threshold from [current] to [new threshold]. Show me the updated instruction before saving.

Advanced pattern: event-driven automation chains

The most powerful external data integrations don’t just monitor. They react. An event-driven chain connects data detection to action: when a specific condition is met, the agent executes a multi-step response automatically.

Example: New customer onboarding

When a new customer signs up (detected via API or webhook), the agent creates a welcome task, sends a personalized message, updates a CRM file, and schedules a follow-up check:

When a new customer is detected in the [CRM/API] response (customer ID not in previous state file), execute this chain: (1) Add a welcome task to TASKS.md with due date today. (2) Draft a welcome message using their name and signup details. (3) Append their information to /home/node/.openclaw/workspace/customers/ACTIVE.md. (4) Create a follow-up check cron that fires in 7 days to verify their onboarding status. Send me a Telegram notification for each step completed.

Example: Error escalation pipeline

When a monitored service returns errors, the response escalates based on severity and duration:

Monitor [service URL] every 5 minutes. On first error: log it, no alert. On second consecutive error: send Telegram warning. On third consecutive error: send Telegram urgent alert and create a task in TASKS.md marked URGENT. On recovery (first successful response after errors): send Telegram “Service recovered after [count] consecutive failures over [duration].” Track consecutive failure count in [state file].
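The escalation ladder above is a small state machine over one number: the consecutive failure count. A sketch of the decision per check (the action strings are placeholders for the real alerts):

```python
def escalation_action(ok, state):
    """Decide the action for one health check. `state` is a dict that
    persists between runs (e.g. the state file) holding 'failures'."""
    if ok:
        failures = state.get("failures", 0)
        state["failures"] = 0
        if failures > 0:
            return f"recovered after {failures} failures"
        return "ok"
    state["failures"] = state.get("failures", 0) + 1
    if state["failures"] == 1:
        return "log only"
    if state["failures"] == 2:
        return "telegram warning"
    return "urgent alert + URGENT task"
```

Keeping the count in the state file (not in memory) is what makes this survive gateway restarts: the cron is stateless between runs, so the file is the only memory it has.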

Example: Content publication workflow

When a draft file reaches a specific state, the agent triggers a publication workflow:

Watch /home/node/.openclaw/workspace/content/drafts/ for files with “READY” in the first line. When a file is marked READY: (1) Run quality checks (word count, formatting). (2) If checks pass, move to /content/reviewed/ and send Telegram “Draft [filename] passed QA, ready for publish.” (3) If checks fail, send Telegram with the specific failures. Check every 15 minutes.

Testing external data integrations

Every external data connection should be tested before going live. The testing process follows a specific order that catches the most common problems:

Step 1: Test the raw connection

Before writing any parsing logic, confirm the API call works and returns data:

Run this curl command and show me the raw response: [your curl command]. Do not parse or process the response. Just show me what comes back, including HTTP status code and response headers.

Step 2: Test the parsing

With a confirmed working response, test the parsing instruction against the actual data:

Using the response from the previous step, extract these fields: [field list]. Show me what you extracted and confirm it matches what the raw response contains. If any field is missing or extracted incorrectly, show me why.

Step 3: Test the state comparison

Run the integration twice to confirm change detection works:

Run the [integration name] polling cron manually twice, 30 seconds apart. After the first run, show me the state file. After the second run, show me whether the comparison detected “no change” correctly. Then manually modify the state file to simulate a change and run again to confirm the alert fires.

Step 4: Test the failure path

Deliberately break the API call and confirm error handling works:

Test failure handling for the [integration name] cron. Change the API URL to an invalid endpoint temporarily. Run the cron and confirm: (1) it detects the failure, (2) it logs the error, (3) it sends the failure notification, (4) it does not overwrite the state file with bad data. Then restore the correct URL.

Scaling from one integration to many

One external data connection is simple. Ten is a system. Here’s how to scale without creating a maintenance burden:

Standardize the folder structure

Every integration should follow the same pattern:

  • /automations/state/[integration-name].json for state files
  • /automations/logs/[integration-name].log for per-integration logs
  • /automations/output/[integration-name]/ for output files

Create the standard folder structure for a new integration called [name]. Create the state file, log file, and output directory. Confirm all paths exist. Then show me the cron instruction template I should use for this integration.

Build a central status page

A daily cron reads all integration logs and produces a status report:

Every evening at 9 PM, read all files in /automations/logs/. For each log, find the most recent entry and check its status. Write a summary to /automations/STATUS.md: integration name, last run time, last status, and any errors in the last 24 hours. If any integration has failed in the last 24 hours, send a Telegram alert with the failures listed.

Version your integration configurations

When you update a cron instruction, save the previous version. This lets you roll back when an update breaks something:

Before updating any cron instruction, save the current instruction text to /automations/history/[integration-name]-[date].txt. Then make the update. If the updated cron fails within 24 hours, I can tell you “roll back [integration name]” and you should restore from the saved version.

Frequently asked questions

Can I connect to APIs that require OAuth 2.0 with refresh tokens?

Yes, but it requires an initial setup step. The agent stores the refresh token, uses it to obtain an access token before each API call, and handles token expiration automatically. The initial OAuth flow (getting the first token) usually requires manual browser interaction.

What happens if the API changes its response format?

The parsing will fail or produce wrong results. The stale data detection and validation steps catch most of these failures. When detected, update the parsing instruction to match the new format. Checking API changelogs or version headers helps anticipate format changes.

Can I write data back to external APIs (POST/PUT)?

Yes. The agent can send POST and PUT requests via curl. The pattern is the same: construct the curl command with appropriate headers, body, and authentication. Common write operations include creating tickets, posting messages, updating records, and triggering workflows in external tools.

How many API calls per day is reasonable?

That depends entirely on the API’s rate limits and your needs. Most free-tier APIs allow 1,000-10,000 calls per day. A cron that polls every 30 minutes makes 48 calls per day per API, well within most free tiers. If you need more frequent polling, check the API’s rate limit documentation first.

Can I connect to databases directly?

If the database is accessible from the server (same machine or network), the agent can run database queries via command-line tools like psql (PostgreSQL), mysql (MySQL), or sqlite3 (SQLite). The exec tool runs the query and the agent parses the output. For remote databases, use an API layer in front of them instead of direct connections.

Run this database query and show me the results: psql -h localhost -U [username] -d [database] -c "SELECT count(*) FROM [table] WHERE created_at > now() - interval '24 hours';". Parse the result and tell me the count. Store this count in [state file] for daily comparison.

How do I handle APIs that require pagination?

Some APIs return results in pages. The agent handles this by making multiple requests, following the pagination links until all results are collected:

Fetch all results from [API endpoint]. The API returns 50 results per page. After each request, check for a “next” link or “next_page” token in the response. If present, fetch the next page. Continue until no more pages remain. Collect all results into a single file. Show me the total count when done.
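The loop structure is worth seeing once, because the termination condition is where pagination bugs live. A sketch where `fetch_page` stands in for one curl call (the `results`/`next_page` field names follow the prompt and vary by API):

```python
def fetch_all_pages(fetch_page):
    """Follow next-page tokens until exhausted. `fetch_page(token)` makes
    one API call and returns a dict with 'results' and, while more pages
    remain, a 'next_page' token; token None means the first page."""
    results, token = [], None
    while True:
        page = fetch_page(token)
        results.extend(page["results"])
        token = page.get("next_page")
        if not token:
            return results
```

For large result sets, also give the agent a page cap in the instruction (“stop after 20 pages and tell me”) so a broken token field can’t turn into an infinite fetch loop.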

What about APIs that use XML instead of JSON?

phi4 parses XML with the same approach as JSON: read the response, identify the tags and structure, extract the values between specific tags. RSS feeds are XML, and the morning brief guide already demonstrates this pattern. For complex XML with deep nesting, the instruction needs to be specific about which nested elements to extract. Include the full tag path (e.g., “extract the text inside channel > item > title”) rather than just the tag name.

What if I need real-time data (sub-minute polling)?

OpenClaw crons fire at minute-level granularity. For sub-minute monitoring, use a webhook pattern instead of polling: have the data source push to your agent when something changes. If the source doesn’t support webhooks, run a lightweight external process (a small Python script, a shell loop) that polls at high frequency and writes changes to a file the agent watches.

