Powering AI Agents with Web Context
Give your AI agents real-time access to website content, brand data, and structured product information through Orsa's API.
The Problem
LLMs have training data cutoffs and can't access live web content. AI agents need fresh, structured data to make decisions and take actions.
The Solution
Use Orsa as the web context layer for your AI agents:
- Scrape pages for up-to-date content (markdown format, LLM-ready)
- Extract brand data for company research and analysis
- Discover products for competitive intelligence and comparison
- Query websites with natural language for custom data extraction
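As a minimal sketch of the scrape capability, the markdown endpoint from the credit table below can be called with plain `fetch`. The exact query parameter name (`url`) and the response field (`markdown`) are assumptions inferred from the tool definitions and SDK examples later in this guide:

```typescript
const ORSA_BASE = "https://api.orsa.dev/api/v1";

// Assumption: the markdown scrape route takes the target as a `url` query param.
function buildScrapeUrl(target: string): string {
  const u = new URL(`${ORSA_BASE}/web/scrape/markdown`);
  u.searchParams.set("url", target);
  return u.toString();
}

async function scrapeMarkdown(target: string, apiKey: string): Promise<string> {
  const res = await fetch(buildScrapeUrl(target), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Scrape failed: ${res.status}`);
  const body = await res.json();
  return body.markdown; // same field the SDK result exposes in the handlers below
}
```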
Template-driven scraping for agents
For search-engine, marketplace, social, and AI-tool scraping flows, use the template endpoint:
```shell
curl --request GET "https://api.orsa.dev/api/v1/web/scrape/template?template=amazon-search&query=laptop&domain=com&mode=markdown" \
  --header "Authorization: Bearer $ORSA_API_KEY"
```

This endpoint builds the target URL from the template, query, and domain, then runs through the same browser-pool pipeline as the standard scraping routes.
Common template IDs
- `google-search-ai-overview`
- `amazon-search`
- `web` (direct URL mode)
- `bing-search`
- `walmart-search`
- `target-search`
- `youtube-search`
- `reddit-subreddit`
- `chatgpt`
- `perplexity`
For provider-specific verticals, use the dashboard-only templates, which add extra parser behavior and post-processing on top of the API pipeline.
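The curl call above can be reproduced programmatically. A minimal TypeScript sketch that assembles the same template URL, with parameter names taken directly from the example (the `mode` default is the only value shown in this guide):

```typescript
// Builds the template-endpoint URL shown in the curl example above.
function buildTemplateUrl(
  template: string,
  query: string,
  domain = "com",
  mode = "markdown",
): string {
  const u = new URL("https://api.orsa.dev/api/v1/web/scrape/template");
  u.searchParams.set("template", template);
  u.searchParams.set("query", query);
  u.searchParams.set("domain", domain);
  u.searchParams.set("mode", mode);
  return u.toString();
}
```

`buildTemplateUrl("amazon-search", "laptop")` reproduces the exact URL from the curl example.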
Agent Tool Definitions
OpenAI Function Calling
```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "scrape_website",
        "description": "Scrape a website and return its content as clean markdown. Use this when you need current information from a specific URL.",
        "parameters": {
          "type": "object",
          "properties": {
            "url": {
              "type": "string",
              "description": "The URL to scrape"
            }
          },
          "required": ["url"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "get_brand_info",
        "description": "Get comprehensive brand data for a company including name, logo, colors, industry, and social links.",
        "parameters": {
          "type": "object",
          "properties": {
            "domain": {
              "type": "string",
              "description": "Company domain (e.g., stripe.com)"
            }
          },
          "required": ["domain"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "extract_custom_data",
        "description": "Extract specific data from a website using natural language. Describe what you need and the AI will find it.",
        "parameters": {
          "type": "object",
          "properties": {
            "domain": {
              "type": "string",
              "description": "Domain to analyze"
            },
            "query": {
              "type": "string",
              "description": "What data to extract (e.g., 'pricing plans with features and prices')"
            }
          },
          "required": ["domain", "query"]
        }
      }
    }
  ]
}
```

Tool Implementation
```typescript
import { Orsa } from 'orsa';

const orsa = new Orsa({ apiKey: process.env.ORSA_API_KEY });

const toolHandlers = {
  async scrape_website({ url }: { url: string }) {
    const result = await orsa.web.scrapeMarkdown({ url });
    return {
      content: result.markdown,
      title: result.title,
      wordCount: result.word_count,
    };
  },

  async get_brand_info({ domain }: { domain: string }) {
    const brand = await orsa.brand.retrieve({ domain });
    return {
      name: brand.name,
      domain: brand.domain,
      description: brand.description,
      industry: brand.industry,
      logo: brand.logos[0]?.url,
      colors: brand.colors,
      socials: brand.socials,
    };
  },

  async extract_custom_data({ domain, query }: { domain: string; query: string }) {
    const result = await orsa.ai.query({
      domain,
      dataToExtract: query,
      responseFormat: 'json',
    });
    return result.result;
  },
};
```

MCP Server Integration
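To wire these handlers into the function-calling loop, one approach is a small dispatcher that maps each `tool_calls` entry back to a handler and returns the `role: "tool"` messages the chat-completions API expects. A sketch, assuming the `ToolCall` type mirrors OpenAI's tool-call shape and `handlers` is the `toolHandlers` object defined above:

```typescript
// Minimal mirror of OpenAI's tool-call shape (assumption, not an SDK type).
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

type Handler = (args: any) => Promise<unknown>;

// Runs every tool call the model requested and packages the results
// as role:"tool" messages for the next chat-completions request.
async function dispatchToolCalls(
  toolCalls: ToolCall[],
  handlers: Record<string, Handler>,
) {
  return Promise.all(
    toolCalls.map(async (call) => {
      const handler = handlers[call.function.name];
      if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
      const result = await handler(JSON.parse(call.function.arguments));
      return {
        role: "tool" as const,
        tool_call_id: call.id,
        content: JSON.stringify(result),
      };
    }),
  );
}
```

In the agent loop, append these messages to the conversation and re-invoke the model until a response arrives with no `tool_calls`.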
For Claude Desktop and Cursor, use the Orsa MCP server to give Claude direct access to Orsa tools.
If you are using OpenClaw, follow the dedicated OpenClaw setup guide for the dashboard key-generation flow, CLI examples, and MCP config.
Use Cases
Company Research Agent
```typescript
// Agent prompt: "Research Vercel and summarize their product offering"
// Agent calls: get_brand_info("vercel.com") + extract_custom_data("vercel.com", "all products with pricing")
```

Competitive Analysis Agent
```typescript
// Agent prompt: "Compare pricing between Linear and Jira"
// Agent calls: extract_custom_data("linear.app", "pricing") + extract_custom_data("atlassian.com/jira", "pricing")
```

Content Generation Agent
```typescript
// Agent prompt: "Write a blog post about the latest updates from Stripe"
// Agent calls: scrape_website("https://stripe.com/blog") → generates content from live data
```

Credit Budget
| Tool Call | Endpoint | Credits |
|---|---|---|
| Scrape page | GET /v1/web/scrape/markdown | 1 |
| Brand data | GET /v1/brand/retrieve | 5 |
| AI query | POST /v1/brand/ai/query | 10 |
| Products | GET /v1/brand/ai/products | 10 |
Tip: Set credit budgets per agent run to prevent runaway costs. A typical research task uses 15-30 credits.
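The per-run budget tip can be enforced mechanically. A sketch of a budget guard using the credit costs from the table above; the tool names match the function definitions earlier, but the class itself is illustrative and not part of the Orsa SDK:

```typescript
// Credit costs per tool, taken from the table above.
const TOOL_COSTS: Record<string, number> = {
  scrape_website: 1,      // GET /v1/web/scrape/markdown
  get_brand_info: 5,      // GET /v1/brand/retrieve
  extract_custom_data: 10, // POST /v1/brand/ai/query
};

// Illustrative guard: call charge() before each tool invocation and
// abort the run when the next call would exceed the budget.
class CreditBudget {
  private spent = 0;
  constructor(private limit: number) {}

  charge(tool: string): void {
    const cost = TOOL_COSTS[tool] ?? 0;
    if (this.spent + cost > this.limit) {
      throw new Error(`Credit budget exceeded: ${this.spent}+${cost} > ${this.limit}`);
    }
    this.spent += cost;
  }

  get remaining(): number {
    return this.limit - this.spent;
  }
}
```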
Tips
- Prefer markdown scraping over HTML for LLM consumption — it's cleaner and uses fewer tokens.
- Cache aggressively. If your agent might query the same domain twice in one session, cache the first result.
- Use the AI Query endpoint for complex extractions instead of scraping + parsing in your agent logic.
- Set timeout budgets — Orsa calls can take 10-60s for uncached data. Plan your agent's timeout accordingly.
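The caching and timeout tips above can be combined in a small wrapper, sketched here: promise-level memoization deduplicates calls within a session, and `AbortSignal.timeout` (Node 17.3+) bounds each request. The wrapper and the `domain` query parameter are illustrative assumptions, not part of the Orsa SDK:

```typescript
// Session-scoped memoizer: the first call for a key kicks off the request;
// later calls for the same key reuse the in-flight or settled promise.
function memoizeAsync<T>(fn: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let hit = cache.get(key);
    if (!hit) {
      hit = fn(key);
      cache.set(key, hit);
    }
    return hit;
  };
}

// Example: cache brand lookups and cap each call at 60s.
// Add the Authorization header as in the earlier examples.
const getBrand = memoizeAsync((domain) =>
  fetch(`https://api.orsa.dev/api/v1/brand/retrieve?domain=${domain}`, {
    signal: AbortSignal.timeout(60_000),
  }).then((r) => r.json()),
);
```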