Powering AI Agents with Web Context

Give your AI agents real-time access to website content, brand data, and structured product information through Orsa’s API.

The Problem

LLMs have training data cutoffs and can’t access live web content. AI agents need fresh, structured data to make decisions and take actions.

The Solution

Use Orsa as the web context layer for your AI agents:

  • Scrape pages for up-to-date content (markdown by default, LLM-ready)
  • Extract brand data for company research and analysis
  • Discover products for competitive intelligence and comparison
  • Query websites with natural language for custom data extraction

Template-driven scraping for agents

For search-engine, marketplace, social, and AI-tool scraping flows, use the template endpoint:

curl --request GET "https://api.orsa.dev/api/v1/web/scrape/template?template=amazon-search&query=laptop&domain=com&mode=markdown" \
  --header "Authorization: Bearer $ORSA_API_KEY"

This endpoint builds the target URL from the template, query, and domain parameters, then runs the request through the same browser-pool pipeline as the standard scraping routes.
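To make the request shape concrete, the sketch below assembles the same query string as the curl example. `buildTemplateUrl` and `TemplateRequest` are hypothetical helpers, not part of the Orsa SDK. The parameter names (`template`, `query`, `domain`, `mode`) come from the curl call above; treating `domain` and `mode` as optional is an assumption.

```typescript
// Hypothetical helper (not part of the Orsa SDK): builds the
// template-scrape URL shown in the curl example.
interface TemplateRequest {
  template: string; // e.g. "amazon-search"
  query: string;    // search term
  domain?: string;  // e.g. "com" (assumed optional)
  mode?: string;    // e.g. "markdown" (assumed optional)
}

const TEMPLATE_ENDPOINT = "https://api.orsa.dev/api/v1/web/scrape/template";

function buildTemplateUrl(req: TemplateRequest): string {
  // URLSearchParams handles percent-encoding of the query text.
  const params = new URLSearchParams();
  params.set("template", req.template);
  params.set("query", req.query);
  if (req.domain !== undefined) params.set("domain", req.domain);
  if (req.mode !== undefined) params.set("mode", req.mode);
  return `${TEMPLATE_ENDPOINT}?${params.toString()}`;
}
```

The resulting URL can be passed to `fetch` with the same `Authorization: Bearer` header used in the curl call.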

Common template IDs

  • google-search-ai-overview
  • amazon-search
  • web (direct URL mode)
  • bing-search
  • walmart-search
  • target-search
  • youtube-search
  • reddit-subreddit
  • chatgpt
  • perplexity

Use dashboard-only templates for provider-specific verticals where the dashboard adds parser behavior and post-processing.

Agent Tool Definitions

OpenAI Function Calling

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "scrape_website",
        "description": "Scrape a website and return its content as clean markdown. Use this when you need current information from a specific URL.",
        "parameters": {
          "type": "object",
          "properties": {
            "url": { "type": "string", "description": "The URL to scrape" }
          },
          "required": ["url"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "get_brand_info",
        "description": "Get comprehensive brand data for a company including name, logo, colors, industries, and social links.",
        "parameters": {
          "type": "object",
          "properties": {
            "domain": { "type": "string", "description": "Company domain (e.g., stripe.com)" }
          },
          "required": ["domain"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "extract_custom_data",
        "description": "Extract specific data from a website using natural language. Describe what you need and the AI will find it.",
        "parameters": {
          "type": "object",
          "properties": {
            "domain": { "type": "string", "description": "Domain to analyze" },
            "query": { "type": "string", "description": "What data to extract (e.g., 'pricing plans with features and prices')" }
          },
          "required": ["domain", "query"]
        }
      }
    }
  ]
}

Tool Implementation

import Orsa from '@orsa.dev/sdk';
 
const orsa = new Orsa({ apiKey: process.env.ORSA_API_KEY! });
 
const toolHandlers = {
  async scrape_website({ url }: { url: string }) {
    // Default mode is 'markdown' — perfect for LLM consumption.
    const page = await orsa.web.scrape({ url });
    return {
      content: page.markdown,
      url: page.url,
      statusCode: page.status_code,
    };
  },
 
  async get_brand_info({ domain }: { domain: string }) {
    const brand = await orsa.brand.retrieve({ domain });
    return {
      title: brand.title,
      domain: brand.domain,
      description: brand.description,
      industries: brand.industries,
      logo: brand.logos.find((l) => l.type === 'logo')?.url,
      colors: brand.colors,
      socials: brand.socials,
      links: brand.links,
    };
  },
 
  async extract_custom_data({ domain, query }: { domain: string; query: string }) {
    // `result` is a string. Prompt for JSON if you want structured output.
    const ans = await orsa.ai.query({ domain, dataToExtract: query });
    return { answer: ans.result, usage: ans.usage };
  },
};
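One way to connect these handlers to a function-calling loop is a small dispatcher that looks up the handler by name and parses the JSON-encoded arguments. The sketch below is self-contained, so it accepts any handler map (such as `toolHandlers`). The `ToolCall` shape mirrors the `function.name` and `function.arguments` fields of an OpenAI tool call, and `dispatchToolCall` is a hypothetical name.

```typescript
// Minimal shape of an OpenAI tool call (only the fields used here).
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

type Handler = (args: any) => Promise<unknown>;

// Hypothetical dispatcher: returns a JSON string that can be sent back
// to the model as the content of a "tool" message.
async function dispatchToolCall(
  handlers: Record<string, Handler>,
  call: ToolCall,
): Promise<string> {
  const handler = handlers[call.function.name];
  if (!handler) {
    return JSON.stringify({ error: `Unknown tool: ${call.function.name}` });
  }
  try {
    // Arguments arrive as a JSON-encoded string and may be malformed.
    const args = JSON.parse(call.function.arguments);
    return JSON.stringify(await handler(args));
  } catch (err) {
    // Report failures to the model instead of crashing the agent loop.
    const message = err instanceof Error ? err.message : String(err);
    return JSON.stringify({ error: message });
  }
}
```

Returning errors as JSON rather than throwing lets the model see what went wrong and retry with different arguments.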

MCP Server Integration

For Claude Desktop and Cursor, use the Orsa MCP server to give the assistant direct access to Orsa tools.

If you are using OpenClaw, follow the dedicated OpenClaw setup guide for the dashboard key-generation flow, CLI examples, and MCP config.

Use Cases

Company Research Agent

// Agent prompt: "Research Vercel and summarize their product offering"
// Agent calls: get_brand_info("vercel.com") + extract_custom_data("vercel.com", "all products with pricing")

Competitive Analysis Agent

// Agent prompt: "Compare pricing between Linear and Jira"
// Agent calls: extract_custom_data("linear.app", "pricing") + extract_custom_data("atlassian.com", "Jira pricing")

Content Generation Agent

// Agent prompt: "Write a blog post about the latest updates from Stripe"
// Agent calls: scrape_website("https://stripe.com/blog") → generates content from live data

Credit Budget

Tool Call     Endpoint                       Credits
Scrape page   GET /v1/web/scrape/template    1
Brand data    GET /v1/brand/retrieve         10
AI query      POST /v1/brand/ai/query        20
Products      GET /v1/brand/ai/products      15

Tip: Set credit budgets per agent run to prevent runaway costs. A typical research task uses 30-60 credits.
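A per-run budget can be enforced in the agent itself before each tool call. The following is a minimal sketch, not an Orsa SDK feature: `CreditBudget` and `CREDIT_COST` are hypothetical names, and the costs are copied from the table above for the three tools defined earlier (the Products row is left out because no tool is defined for it here).

```typescript
// Credit costs per tool, copied from the Credit Budget table.
const CREDIT_COST: Record<string, number> = {
  scrape_website: 1,
  get_brand_info: 10,
  extract_custom_data: 20,
};

// Hypothetical per-run budget tracker (not part of the Orsa SDK).
class CreditBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Call before invoking a tool; throws if the run would exceed its limit.
  charge(tool: string): void {
    const cost = CREDIT_COST[tool];
    if (cost === undefined) {
      throw new Error(`No credit cost known for ${tool}`);
    }
    if (this.spent + cost > this.limit) {
      throw new Error(
        `Credit budget exceeded: ${this.spent} + ${cost} > ${this.limit}`,
      );
    }
    this.spent += cost;
  }

  get remaining(): number {
    return this.limit - this.spent;
  }
}
```

With a limit of 60, for example, a run could make one brand lookup and two AI queries (50 credits) and still have room for ten page scrapes.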

Tips

  • Prefer markdown scraping over HTML for LLM consumption: it’s cleaner and uses fewer tokens. The default mode, 'markdown', runs Readability and Turndown server-side.
  • Cache aggressively. If your agent might query the same domain twice in one session, cache the first result.
  • Use the AI Query endpoint for complex extractions instead of scraping + parsing in your agent logic.
  • Set timeout budgets. AI calls and styleguide extraction can take 30-60s for uncached data. Pass { timeout: 60_000 } in RequestOptions for those calls.
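The caching tip can be sketched as a small in-memory wrapper that lives for one agent session. `cachePerSession` is a hypothetical helper, not part of the Orsa SDK. It stores the promise itself, so two concurrent calls for the same key share a single request.

```typescript
// Hypothetical session cache: wraps an async lookup keyed by a string
// (for example a domain or URL) and reuses the first result.
function cachePerSession<T>(
  fn: (key: string) => Promise<T>,
): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key) => {
    let hit = cache.get(key);
    if (!hit) {
      hit = fn(key);
      cache.set(key, hit);
      // Drop failed lookups so a later call can retry.
      hit.catch(() => cache.delete(key));
    }
    return hit;
  };
}
```

For instance, `cachePerSession((domain) => toolHandlers.get_brand_info({ domain }))` would give the agent a brand lookup that spends credits only once per domain in a session.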