# Scrape HTML

Extract raw HTML from any URL.

**Endpoint:** `GET /v1/web/scrape/html`

**Credits:** 1 per request
## Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The URL to scrape |
| `wait_for` | string | No | CSS selector to wait for before extraction |
| `timeout` | number | No | Timeout in milliseconds (default: `30000`) |
| `javascript` | boolean | No | Enable JavaScript rendering (default: `true`) |
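If you call the endpoint directly rather than through an SDK, the parameters above go in the query string, and the target URL must be percent-encoded (its `://` would otherwise collide with query syntax). A minimal sketch; the `build_scrape_url` helper is hypothetical, and only the parameter names and base URL come from this page:

```python
import urllib.parse

BASE = "https://api.orsa.dev/v1/web/scrape/html"

def build_scrape_url(url, wait_for=None, timeout=None, javascript=None):
    """Hypothetical helper: assemble the request URL for GET /v1/web/scrape/html.

    urlencode percent-encodes the ':' and '/' inside the target URL,
    so it survives as a single query-string value.
    """
    params = {"url": url}
    if wait_for is not None:
        params["wait_for"] = wait_for
    if timeout is not None:
        params["timeout"] = timeout
    if javascript is not None:
        params["javascript"] = "true" if javascript else "false"
    return BASE + "?" + urllib.parse.urlencode(params)

print(build_scrape_url("https://example.com",
                       wait_for=".main-content", timeout=15000))
```

Send the resulting URL with an `Authorization: Bearer` header, as in the cURL example below.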
## Response Schema

```json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "status_code": 200,
    "html": "<!DOCTYPE html><html>...</html>",
    "headers": {
      "content-type": "text/html; charset=utf-8"
    },
    "metadata": {
      "title": "Example Domain",
      "description": "...",
      "load_time_ms": 842
    }
  },
  "credits_used": 1
}
```

## Code Examples
### cURL

```bash
curl -X GET "https://api.orsa.dev/v1/web/scrape/html?url=https://example.com" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### TypeScript
```typescript
const result = await client.web.scrapeHtml({
  url: 'https://example.com',
  waitFor: '.main-content',
  timeout: 15000,
});

console.log(result.html);
console.log(result.metadata.title);
```

### Python
```python
result = client.web.scrape_html(
    url="https://example.com",
    wait_for=".main-content",
    timeout=15000,
)

print(result.html)
print(result.metadata.title)
```

## Notes
- JavaScript rendering is enabled by default. Set `javascript=false` for static pages to reduce latency.
- The `wait_for` parameter is useful for SPAs that load content dynamically.
- HTML is returned as-is from the browser; no cleaning or transformation is applied.
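When working with the raw JSON rather than an SDK object, the response schema above suggests two checks before using the HTML: the top-level `success` flag (did the scrape run) and `data.status_code` (what the target site returned). A sketch under those assumptions; the `handle_response` function is hypothetical, but the field names follow the schema:

```python
def handle_response(body: dict) -> str:
    """Hypothetical helper: validate a scrape response and return its HTML.

    Field names follow the documented response schema.
    """
    if not body.get("success"):
        raise RuntimeError(f"scrape failed: {body}")
    data = body["data"]
    if data["status_code"] >= 400:
        raise RuntimeError(f"target returned HTTP {data['status_code']}")
    return data["html"]

# Sample body shaped like the documented schema.
sample = {
    "success": True,
    "data": {
        "url": "https://example.com",
        "status_code": 200,
        "html": "<!DOCTYPE html><html>...</html>",
        "headers": {"content-type": "text/html; charset=utf-8"},
        "metadata": {"title": "Example Domain", "load_time_ms": 842},
    },
    "credits_used": 1,
}

print(handle_response(sample))
```

Separating these two failure modes matters because a `success: true` response still consumes a credit even when the target site returned an error page.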