Scrape HTML

Extract raw HTML from any URL.

Endpoint: GET /v1/web/scrape/html
Credits: 1 per request

Parameters

Parameter    Type     Required  Description
url          string   Yes       The URL to scrape
wait_for     string   No        CSS selector to wait for before extraction
timeout      number   No        Timeout in milliseconds (default: 30000)
javascript   boolean  No        Enable JavaScript rendering (default: true)
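
As a rough illustration of how the parameters above map onto a query string, the sketch below builds the request URL with Python's standard library. The helper name build_scrape_url is hypothetical, the host is taken from the cURL example on this page, and sending booleans as lowercase true/false is an assumption based on the javascript=false wording in the notes.

```python
from urllib.parse import urlencode

# Host taken from the cURL example on this page.
BASE_URL = "https://api.orsa.dev/v1/web/scrape/html"


def build_scrape_url(url, wait_for=None, timeout=None, javascript=None):
    """Build the GET URL for a scrape request (hypothetical helper)."""
    params = {"url": url}  # the only required parameter
    if wait_for is not None:
        params["wait_for"] = wait_for
    if timeout is not None:
        params["timeout"] = str(int(timeout))  # milliseconds
    if javascript is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        params["javascript"] = "true" if javascript else "false"
    # urlencode percent-encodes the target URL so it is safe inside a query string.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_scrape_url("https://example.com", wait_for=".main-content", timeout=15000))
```

Percent-encoding the target URL matters once it carries its own query string; otherwise its parameters could be read as parameters of the scrape request.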

Response Schema

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "status_code": 200,
    "html": "<!DOCTYPE html><html>...</html>",
    "headers": {
      "content-type": "text/html; charset=utf-8"
    },
    "metadata": {
      "title": "Example Domain",
      "description": "...",
      "load_time_ms": 842
    }
  },
  "credits_used": 1
}
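
A minimal sketch of reading this response in Python, assuming the body has already been fetched as JSON text. The helper name extract_page is hypothetical, and treating success: false as the failure signal is an assumption, since the error shape is not documented on this page.

```python
import json

# Sample body copied from the response schema above.
SAMPLE_BODY = """
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "status_code": 200,
    "html": "<!DOCTYPE html><html>...</html>",
    "headers": {"content-type": "text/html; charset=utf-8"},
    "metadata": {"title": "Example Domain", "description": "...", "load_time_ms": 842}
  },
  "credits_used": 1
}
"""


def extract_page(payload):
    """Pull the commonly used fields out of a decoded response (hypothetical helper)."""
    # Assumption: a failed request is reported with "success": false.
    if not payload.get("success"):
        raise ValueError("scrape request did not succeed")
    data = payload["data"]
    return {
        "status_code": data["status_code"],
        "html": data["html"],
        "title": data.get("metadata", {}).get("title"),
        "credits_used": payload.get("credits_used"),
    }


page = extract_page(json.loads(SAMPLE_BODY))
print(page["status_code"], page["title"])
```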

Code Examples

cURL

curl -G "https://api.orsa.dev/v1/web/scrape/html" \
  --data-urlencode "url=https://example.com" \
  -H "Authorization: Bearer YOUR_API_KEY"

TypeScript

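// Assumes `client` is an already-initialized SDK client; setup is not covered on this page.
const result = await client.web.scrapeHtml({
  url: 'https://example.com',
  waitFor: '.main-content', // CSS selector to wait for before extraction
  timeout: 15000,           // milliseconds
});

console.log(result.html);           // raw HTML of the page
console.log(result.metadata.title); // page title from the response metadata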

Python

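# Assumes `client` is an already-initialized SDK client; setup is not covered on this page.
result = client.web.scrape_html(
    url="https://example.com",
    wait_for=".main-content",  # CSS selector to wait for before extraction
    timeout=15000,             # milliseconds
)

print(result.html)            # raw HTML of the page
print(result.metadata.title)  # page title from the response metadata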

Notes

  • JavaScript rendering is enabled by default. Set javascript=false for static pages to reduce latency.
  • Use wait_for with single-page applications (SPAs) that load content dynamically: extraction waits until an element matching the CSS selector appears.
  • HTML is returned exactly as the browser produced it, with no cleaning or transformation.
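
The first two notes can be sketched as a small parameter-selection helper. The name scrape_params and its static flag are hypothetical, and ignoring the selector when rendering is disabled is an assumption made for this illustration, not documented behavior.

```python
def scrape_params(url, static=False, wait_selector=None):
    """Choose query parameters for static vs. dynamic pages (hypothetical helper)."""
    params = {"url": url}
    if static:
        # Static page: turn off JavaScript rendering to reduce latency.
        params["javascript"] = "false"
    elif wait_selector is not None:
        # Dynamic page (e.g. an SPA): keep the default rendering and wait
        # for the selector before the HTML is extracted.
        params["wait_for"] = wait_selector
    return params


print(scrape_params("https://example.com", static=True))
print(scrape_params("https://example.com", wait_selector=".main-content"))
```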