# Scrape HTML

Extract raw HTML from any URL.

**Endpoint:** `GET /v1/web/scrape/html`

**Credits:** 1 per request
## Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The URL to scrape |
| `wait_for` | string | No | CSS selector to wait for before extraction |
| `timeout` | number | No | Timeout in milliseconds (default: `30000`) |
| `javascript` | boolean | No | Enable JavaScript rendering (default: `true`) |
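If you call the endpoint directly rather than through an SDK, the parameters above go in the query string, and the target URL must be percent-encoded (its `://` would otherwise collide with query syntax). A minimal sketch; the `build_scrape_url` helper is hypothetical, and only the parameter names and base URL come from this page:

```python
import urllib.parse

BASE = "https://api.orsa.dev/v1/web/scrape/html"

def build_scrape_url(url, wait_for=None, timeout=None, javascript=None):
    """Hypothetical helper: assemble the request URL for GET /v1/web/scrape/html.

    urlencode percent-encodes the ':' and '/' inside the target URL,
    so it survives as a single query-string value.
    """
    params = {"url": url}
    if wait_for is not None:
        params["wait_for"] = wait_for
    if timeout is not None:
        params["timeout"] = timeout
    if javascript is not None:
        params["javascript"] = "true" if javascript else "false"
    return BASE + "?" + urllib.parse.urlencode(params)

print(build_scrape_url("https://example.com",
                       wait_for=".main-content", timeout=15000))
```

Send the resulting URL with an `Authorization: Bearer` header, as in the cURL example below.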
## Response Schema

```json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "status_code": 200,
    "html": "<!DOCTYPE html><html>...</html>",
    "headers": {
      "content-type": "text/html; charset=utf-8"
    },
    "metadata": {
      "title": "Example Domain",
      "description": "...",
      "load_time_ms": 842
    }
  },
  "credits_used": 1
}
```

## Code Examples
### cURL

```bash
curl -X GET "https://api.orsa.dev/v1/web/scrape/html?url=https://example.com" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### TypeScript
```typescript
const result = await client.web.scrapeHtml({
  url: 'https://example.com',
  waitFor: '.main-content',
  timeout: 15000,
});

console.log(result.html);
console.log(result.metadata.title);
```

### Python
```python
result = client.web.scrape_html(
    url="https://example.com",
    wait_for=".main-content",
    timeout=15000,
)

print(result.html)
print(result.metadata.title)
```

## Notes
- JavaScript rendering is enabled by default. Set `javascript=false` for static pages to reduce latency.
- The `wait_for` parameter is useful for SPAs that load content dynamically.
- HTML is returned as-is from the browser; no cleaning or transformation is applied.
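When working with the raw JSON rather than an SDK object, the response schema above suggests two checks before using the HTML: the top-level `success` flag (did the scrape run) and `data.status_code` (what the target site returned). A sketch under those assumptions; the `handle_response` function is hypothetical, but the field names follow the schema:

```python
def handle_response(body: dict) -> str:
    """Hypothetical helper: validate a scrape response and return its HTML.

    Field names follow the documented response schema.
    """
    if not body.get("success"):
        raise RuntimeError(f"scrape failed: {body}")
    data = body["data"]
    if data["status_code"] >= 400:
        raise RuntimeError(f"target returned HTTP {data['status_code']}")
    return data["html"]

# Sample body shaped like the documented schema.
sample = {
    "success": True,
    "data": {
        "url": "https://example.com",
        "status_code": 200,
        "html": "<!DOCTYPE html><html>...</html>",
        "headers": {"content-type": "text/html; charset=utf-8"},
        "metadata": {"title": "Example Domain", "load_time_ms": 842},
    },
    "credits_used": 1,
}

print(handle_response(sample))
```

Separating these two failure modes matters because a `success: true` response still consumes a credit even when the target site returned an error page.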