Scrape Sitemap
Parse a domain's sitemap and return the URLs it lists, up to 10,000 per request. The endpoint automatically checks /sitemap.xml, /sitemap_index.xml, and /sitemap.txt.
Endpoint: GET /v1/web/scrape/sitemap
Credits: 1 per request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | Domain to parse sitemap for (e.g., stripe.com) |
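For callers that are not using an SDK, the request URL can be assembled with the Python standard library. This is a minimal sketch: the base URL is the one shown in the cURL example, while the helper name `build_sitemap_url` and the step that reduces a full URL to a bare hostname are illustrative assumptions, not documented API behavior.

```python
from urllib.parse import urlencode, urlsplit

# Base URL as it appears in the cURL example.
BASE_URL = "https://api.orsa.dev/v1/web/scrape/sitemap"


def build_sitemap_url(domain: str) -> str:
    """Return the GET URL for a domain (hypothetical helper).

    Assumes the API expects a bare hostname such as "stripe.com",
    so a full URL is reduced to its hostname first.
    """
    domain = domain.strip()
    if not domain:
        raise ValueError("domain is required")
    # urlsplit only fills in .hostname when a scheme or leading // is present.
    parts = urlsplit(domain if "//" in domain else f"//{domain}")
    host = parts.hostname
    if not host:
        raise ValueError(f"could not extract a hostname from {domain!r}")
    return f"{BASE_URL}?{urlencode({'domain': host})}"
```

Validating the input locally avoids sending a request that would fail with INPUT_VALIDATION_ERROR.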
Response Schema
{
"success": true,
"urls": [
"https://stripe.com",
"https://stripe.com/pricing",
"https://stripe.com/payments",
"https://stripe.com/billing",
"https://stripe.com/connect",
"https://stripe.com/terminal"
],
"count": 847,
"domain": "stripe.com",
"cached": false,
"credits_used": 1,
"request_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901"
}
Code Examples
cURL
curl -X GET "https://api.orsa.dev/v1/web/scrape/sitemap?domain=stripe.com" \
-H "Authorization: Bearer YOUR_API_KEY"
TypeScript
const result = await client.web.scrapeSitemap({
domain: 'stripe.com',
});
console.log(result.count); // 847
console.log(result.urls); // ["https://stripe.com", ...]
Python
result = client.web.scrape_sitemap(domain="stripe.com")
print(result.count) # 847
print(result.urls) # ["https://stripe.com", ...]
Error Codes
| Code | Status | Description |
|---|---|---|
| INPUT_VALIDATION_ERROR | 400 | Invalid or missing domain |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| NOT_FOUND | 404 | No sitemap found for domain |
| RATE_LIMITED | 429 | Rate limit exceeded |
| INTERNAL_ERROR | 500 | Server error during parsing |
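One way a client might act on these status codes is to retry only the ones that are plausibly transient. The sketch below uses the standard library and rests on assumptions that this page does not confirm: that 429 and 500 are worth retrying with exponential backoff, and that 400, 401, and 404 are not. The names `is_retryable`, `backoff_delay`, and `fetch_sitemap` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.orsa.dev/v1/web/scrape/sitemap"
# Assumption: RATE_LIMITED (429) and INTERNAL_ERROR (500) are transient.
RETRYABLE_STATUSES = {429, 500}


def is_retryable(status: int) -> bool:
    """True for status codes this sketch treats as transient."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def fetch_sitemap(domain: str, api_key: str, max_attempts: int = 3) -> dict:
    """GET the endpoint, retrying transient failures up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    url = f"{BASE_URL}?{urlencode({'domain': domain})}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            last_attempt = attempt == max_attempts - 1
            # 400, 401, and 404 will not succeed on retry, so re-raise at once.
            if not is_retryable(err.code) or last_attempt:
                raise
            time.sleep(backoff_delay(attempt))
    raise AssertionError("loop always returns or raises")
```

With the defaults, a rate-limited call waits 1 second and then 2 seconds before the third and final attempt.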
Notes
- Automatically follows sitemap index files (nested sitemaps) recursively.
- Supports both XML and plain-text sitemap formats.
- Returns up to 10,000 URLs per request. For larger sites, consider using the Crawl Website endpoint.
- Results are cached for 24 hours.
- URLs are automatically deduplicated.
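The behavior described in these notes (recursive index following, XML and plain-text support, the 10,000-URL cap, and deduplication) can be sketched roughly as follows. This is an illustration of the general technique, not the service's actual implementation; `collect_urls`, `local_name`, and the injected `fetch` callable are hypothetical names, and the order in which URLs are returned is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Set

MAX_URLS = 10_000  # mirrors the documented per-request cap


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    found: Optional[Dict[str, None]] = None,
    visited: Optional[Set[str]] = None,
) -> List[str]:
    """Return deduplicated page URLs, following nested sitemap indexes."""
    found = {} if found is None else found  # dict keeps order, drops duplicates
    visited = set() if visited is None else visited
    if sitemap_url in visited or len(found) >= MAX_URLS:
        return list(found)
    visited.add(sitemap_url)  # guards against index files that reference each other

    text = fetch(sitemap_url).strip()
    locs: List[str] = []
    is_index = False
    if text.startswith("<"):
        root = ET.fromstring(text)
        is_index = local_name(root.tag) == "sitemapindex"
        # Each <url> or <sitemap> entry carries its address in a direct <loc> child.
        for entry in root:
            for child in entry:
                if local_name(child.tag) == "loc" and child.text:
                    locs.append(child.text.strip())
    else:
        # Plain-text sitemap: one URL per line.
        locs = [line.strip() for line in text.splitlines() if line.strip()]

    for loc in locs:
        if is_index:
            collect_urls(loc, fetch, found, visited)  # recurse into the nested sitemap
        elif len(found) < MAX_URLS:
            found[loc] = None
    return list(found)
```

Passing `fetch` as a parameter keeps the traversal independent of any HTTP library, so it can be exercised against in-memory documents.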