Scrape Sitemap
Discover URLs declared in a domain’s sitemap. Reads /robots.txt for non-standard sitemap locations, then falls through to /sitemap.xml, /sitemap_index.xml, and /sitemap-index.xml. Sitemap-index files are walked recursively. Up to 1,000 URLs are returned, grouped by first path segment so you can quickly find what you need.
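The discovery order described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's actual implementation: `candidate_sitemaps`, `collect_urls`, and the caller-supplied `fetch` callable are hypothetical names, and real sitemaps may need extra handling (gzip, malformed XML, extension namespaces).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Fallback paths named in the description, in the order they are tried.
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml"]


def candidate_sitemaps(domain: str, robots_txt: str) -> list[str]:
    """Sitemap URLs to try: robots.txt declarations first, then the fallbacks."""
    candidates = []
    for line in robots_txt.splitlines():
        # Match "Sitemap: <url>" lines, ignoring case in the field name.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    candidates.extend(urljoin(f"https://{domain}", p) for p in FALLBACK_PATHS)
    # Drop duplicates while keeping the first occurrence of each URL.
    return list(dict.fromkeys(candidates))


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}loc' down to 'loc'."""
    return tag.rsplit("}", 1)[-1]


def collect_urls(sitemap_url, fetch, limit=1000, _seen=None):
    """Walk a sitemap (or sitemap index, recursively) and return up to `limit` URLs.

    `fetch(url) -> str` is a hypothetical callable that returns the XML body.
    """
    seen = set() if _seen is None else _seen
    if sitemap_url in seen:  # guard against index files that reference each other
        return []
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    locs = [el.text.strip() for el in root.iter()
            if local_name(el.tag) == "loc" and el.text]
    if local_name(root.tag) != "sitemapindex":
        return locs[:limit]  # plain sitemap: the <loc> values are page URLs
    urls = []
    for child in locs:  # sitemap index: each <loc> points at another sitemap
        if len(urls) >= limit:
            break
        urls.extend(collect_urls(child, fetch, limit - len(urls), seen))
    return urls
```

Injecting `fetch` keeps the sketch independent of any HTTP client; in practice it could wrap whichever client you already use.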
Endpoint: GET /v1/web/scrape/sitemap
Credits: 1 per request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | Domain to parse sitemap for (e.g., stripe.com) |
Response Schema
{
"data": {
"domain": "stripe.com",
"sitemap": "https://stripe.com/sitemap.xml",
"urls": [
"https://stripe.com",
"https://stripe.com/pricing",
"https://stripe.com/payments",
"https://stripe.com/billing"
],
"count": 847,
"groups": {
"docs": { "count": 240, "samples": ["https://stripe.com/docs/api", "https://stripe.com/docs/payments"] },
"blog": { "count": 156, "samples": ["https://stripe.com/blog/announcing-stripe-link"] },
"pricing": { "count": 4, "samples": ["https://stripe.com/pricing"] }
}
},
"_meta": { "timing": { "total_ms": 1240 }, "cache": { "hit": false } }
}
Code Examples
cURL
curl -X GET "https://api.orsa.dev/v1/web/scrape/sitemap?domain=stripe.com" \
-H "Authorization: Bearer YOUR_API_KEY"
TypeScript
const { data } = await client.web.scrapeSitemap({
domain: 'stripe.com',
});
console.log(data.count); // 847
console.log(data.sitemap); // which sitemap URL we resolved
console.log(data.groups.docs.count); // 240
console.log(data.urls.slice(0, 5));
Python
res = client.web.scrape_sitemap(domain="stripe.com")
data = res["data"]
print(data["count"]) # 847
print(data["sitemap"]) # resolved sitemap URL
print(data["groups"]["docs"]["count"])
Error Codes
| Code | Status | Description |
|---|---|---|
| INPUT_VALIDATION_ERROR | 400 | Invalid or missing domain |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| RATE_LIMITED | 429 | Rate limit exceeded |
| INTERNAL_ERROR | 500 | Server error during parsing |
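One way a caller might react to these statuses is sketched below. It assumes that 429 and 500 can be transient and are worth retrying with exponential backoff, while 400 and 401 are returned immediately; that retry policy is an assumption, not documented behavior. `call_with_retry` and the `send` callable are hypothetical helpers, not part of the client library.

```python
import time

# Assumption: RATE_LIMITED (429) and INTERNAL_ERROR (500) may succeed on retry;
# INPUT_VALIDATION_ERROR (400) and UNAUTHORIZED (401) will not.
RETRYABLE_STATUSES = {429, 500}


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable returning (status_code, parsed_body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, ...
        sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter makes the backoff easy to stub out in tests.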
Notes
- `sitemap` is the URL we actually resolved (e.g. one from `robots.txt`, or `/sitemap.xml`). `null` if no sitemap was found.
- `groups` buckets URLs by first non-empty path segment, with up to 5 sample URLs each. Useful for quickly answering “where are the docs?” or “is there a blog?” without iterating the full list.
- Returns up to 1,000 URLs per request. URLs are not deduplicated across sub-sitemaps beyond what the source declares.
- No credits are deducted if zero URLs were found.
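The `groups` bucketing described in the notes can be approximated client-side, for example to regroup a filtered subset of `urls`. The sketch below is an illustration rather than the service's code: `group_by_first_segment` is a hypothetical helper, and skipping URLs with no path segment (such as the bare domain) is an assumption.

```python
from urllib.parse import urlparse


def group_by_first_segment(urls, max_samples=5):
    """Bucket URLs by first non-empty path segment, keeping a few samples each."""
    groups = {}
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if not segments:
            continue  # Assumption: URLs with no path segment are left ungrouped.
        bucket = groups.setdefault(segments[0], {"count": 0, "samples": []})
        bucket["count"] += 1
        if len(bucket["samples"]) < max_samples:
            bucket["samples"].append(url)
    return groups
```

The result has the same shape as the `groups` object in the response schema, so `group_by_first_segment(data["urls"])["docs"]["count"]` reads the same way as the server-provided field.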