Scrape Sitemap

Parses a domain's sitemap and returns the discovered URLs (up to 10,000 per request). The endpoint automatically checks /sitemap.xml, /sitemap_index.xml, and /sitemap.txt.

Endpoint: GET /v1/web/scrape/sitemap
Credits: 1 per request

Parameters

Parameter   Type     Required   Description
domain      string   Yes        Domain to parse sitemap for (e.g., stripe.com)

Response Schema

{
  "success": true,
  "urls": [
    "https://stripe.com",
    "https://stripe.com/pricing",
    "https://stripe.com/payments",
    "https://stripe.com/billing",
    "https://stripe.com/connect",
    "https://stripe.com/terminal"
  ],
  "count": 847,
  "domain": "stripe.com",
  "cached": false,
  "credits_used": 1,
  "request_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901"
}
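
As an illustration of consuming this response, the sketch below decodes the JSON body into a typed Python structure. The names `SitemapResult` and `parse_sitemap_response` are hypothetical helpers and not part of any official SDK; the field names are taken from the sample above. The sample's `urls` list appears to be abbreviated, so the sketch does not assume that `count` equals the length of `urls`.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SitemapResult:
    """Typed view of the response body (hypothetical helper, not an SDK type)."""
    success: bool
    urls: List[str]
    count: int
    domain: str
    cached: bool
    credits_used: int
    request_id: str


def parse_sitemap_response(body: str) -> SitemapResult:
    """Decode a JSON response body; raises KeyError if a listed field is missing."""
    data = json.loads(body)
    return SitemapResult(
        success=bool(data["success"]),
        urls=list(data["urls"]),
        count=int(data["count"]),
        domain=data["domain"],
        cached=bool(data["cached"]),
        credits_used=int(data["credits_used"]),
        request_id=data["request_id"],
    )


# Abbreviated copy of the sample response shown above.
sample = """
{
  "success": true,
  "urls": ["https://stripe.com", "https://stripe.com/pricing"],
  "count": 847,
  "domain": "stripe.com",
  "cached": false,
  "credits_used": 1,
  "request_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901"
}
"""

result = parse_sitemap_response(sample)
print(result.domain, result.count, result.cached)
```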

Code Examples

cURL

curl -X GET "https://api.orsa.dev/v1/web/scrape/sitemap?domain=stripe.com" \
  -H "Authorization: Bearer YOUR_API_KEY"

TypeScript

// Assumes `client` is an already-initialized API client (setup not shown here).
const result = await client.web.scrapeSitemap({
  domain: 'stripe.com',
});

console.log(result.count);    // 847
console.log(result.urls);     // ["https://stripe.com", ...]

Python

# Assumes `client` is an already-initialized API client (setup not shown here).
result = client.web.scrape_sitemap(domain="stripe.com")

print(result.count)    # 847
print(result.urls)     # ["https://stripe.com", ...]

Error Codes

Code                     Status   Description
INPUT_VALIDATION_ERROR   400      Invalid or missing domain
UNAUTHORIZED             401      Missing or invalid API key
NOT_FOUND                404      No sitemap found for domain
RATE_LIMITED             429      Rate limit exceeded
INTERNAL_ERROR           500      Server error during parsing
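
One way a caller might act on these status codes is sketched below, using only the Python standard library and the endpoint URL from the cURL example. The retry policy is an assumption rather than documented behavior: the sketch retries 429 and 500 with exponential backoff and re-raises 400, 401, and 404 immediately. The helper names (`build_url`, `backoff_delay`, `fetch_sitemap`) are hypothetical, and the format of error response bodies is not documented here, so only the HTTP status is inspected.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://api.orsa.dev/v1/web/scrape/sitemap"

# Assumption: rate limiting and server errors may be transient, so retry them.
# 400, 401 and 404 will not succeed on retry and are raised immediately.
RETRYABLE_STATUSES = {429, 500}


def build_url(domain: str) -> str:
    """Return the request URL with the domain as a query parameter."""
    return BASE_URL + "?" + urllib.parse.urlencode({"domain": domain})


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** attempt))


def fetch_sitemap(domain: str, api_key: str, max_attempts: int = 3) -> dict:
    """Call the endpoint and return the decoded JSON body.

    Raises urllib.error.HTTPError for non-retryable statuses, or once the
    retry budget is exhausted.
    """
    request = urllib.request.Request(
        build_url(domain),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            is_last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE_STATUSES or is_last_attempt:
                raise
            time.sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

A call such as `fetch_sitemap("stripe.com", "YOUR_API_KEY")` would then either return the response body as a dictionary or raise an `HTTPError` whose `code` attribute matches one of the statuses in the table.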

Notes

  • Automatically follows sitemap index files (nested sitemaps) recursively.
  • Supports both XML and plain-text sitemap formats.
  • Returns up to 10,000 URLs per request. For larger sites, consider using the Crawl Website endpoint.
  • Results are cached for 24 hours.
  • URLs are automatically deduplicated.
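
The behavior described in these notes can be illustrated with a small local sketch. This is not the service's implementation; it only shows, under stated assumptions, what following sitemap index files recursively, accepting both XML and plain-text sitemaps, deduplicating, and capping the result might look like. The function `collect_urls` and its `fetch` callback are hypothetical, and the example documents are made up for the demonstration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Set

MAX_URLS = 10_000  # cap taken from the notes above


def _local_name(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def collect_urls(start: str, fetch: Callable[[str], str],
                 limit: int = MAX_URLS) -> List[str]:
    """Collect page URLs from a sitemap, following index files recursively.

    `fetch` maps a sitemap location to its text content (hypothetical hook,
    so the sketch can run without network access).
    """
    urls: List[str] = []
    seen_urls: Set[str] = set()
    seen_sitemaps: Set[str] = set()

    def add(url: str) -> None:
        # Deduplicate while preserving first-seen order, and respect the cap.
        if url and url not in seen_urls and len(urls) < limit:
            seen_urls.add(url)
            urls.append(url)

    def visit(location: str) -> None:
        if location in seen_sitemaps or len(urls) >= limit:
            return  # avoid index cycles and stop once the cap is reached
        seen_sitemaps.add(location)
        body = fetch(location).strip()
        if not body.startswith("<"):
            for line in body.splitlines():  # plain text: one URL per line
                add(line.strip())
            return
        root = ET.fromstring(body)
        is_index = _local_name(root.tag) == "sitemapindex"
        for entry in root:  # <sitemap> or <url> elements
            for child in entry:
                if _local_name(child.tag) == "loc" and child.text:
                    loc = child.text.strip()
                    if is_index:
                        visit(loc)  # nested sitemap: recurse
                    else:
                        add(loc)

    visit(start)
    return urls


NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
DOCS = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {NS}>"
        "<sitemap><loc>https://example.com/a.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/b.txt</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/a.xml": (
        f"<urlset {NS}>"
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/pricing</loc></url>"
        "</urlset>"
    ),
    "https://example.com/b.txt": "https://example.com/pricing\nhttps://example.com/docs\n",
}

urls = collect_urls("https://example.com/sitemap.xml", DOCS.__getitem__)
print(urls)
```

In this example the index file points to one XML sitemap and one plain-text sitemap; the duplicate `/pricing` entry is kept only once, so three URLs are collected.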