Scrape Images
Extract all images from any web page, along with each image's dimensions, alt text, and a heuristic role classification (logo, hero, product, icon, decorative).
Endpoint: GET /v1/web/scrape/images
Credits: 1 per request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to extract images from |
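Because the `url` parameter is itself a URL, it is safest to percent-encode it when building the query string by hand. The following is a minimal Python sketch using only the standard library. The base URL is the one shown in the cURL example on this page, and the helper name `build_images_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL as it appears in the cURL example on this page.
API_BASE = "https://api.orsa.dev"


def build_images_request_url(target_url: str) -> str:
    """Return the full GET URL for the images endpoint (hypothetical helper)."""
    # urlencode percent-encodes characters such as ':', '/', '?' and '&' in the
    # target, so they cannot be confused with the endpoint's own query string.
    query = urlencode({"url": target_url})
    return f"{API_BASE}/v1/web/scrape/images?{query}"


print(build_images_request_url("https://example.com/shop?page=2&sort=new"))
```

Encoding matters most when the target URL carries its own query string: without it, `&sort=new` would be read as a second parameter of the API request rather than as part of the page address.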
Response Schema
{
"data": {
"url": "https://stripe.com",
"images": [
{
"url": "https://stripe.com/img/v3/home/social.png",
"alt": "Stripe payment processing",
"width": 1200,
"height": 630,
"role": "hero"
},
{
"url": "https://stripe.com/favicon.svg",
"alt": "",
"width": null,
"height": null,
"role": "logo"
}
],
"count": 47
},
"_meta": { "timing": { "total_ms": 1820 }, "cache": { "hit": false } }
}
Code Examples
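One detail of the response is worth handling explicitly: `width` and `height` can be `null`, as in the favicon entry of the sample response. The Python sketch below works on a trimmed copy of that sample (the `count` and `_meta` fields are omitted), tallies images per role, and picks the largest image among those with known dimensions. The variable names are illustrative only.

```python
import json
from collections import Counter

# Trimmed copy of the sample response shown on this page.
sample = json.loads("""
{
  "data": {
    "url": "https://stripe.com",
    "images": [
      {"url": "https://stripe.com/img/v3/home/social.png",
       "alt": "Stripe payment processing",
       "width": 1200, "height": 630, "role": "hero"},
      {"url": "https://stripe.com/favicon.svg",
       "alt": "", "width": null, "height": null, "role": "logo"}
    ]
  }
}
""")

images = sample["data"]["images"]

# Tally how many images were assigned to each role.
role_counts = Counter(img["role"] for img in images)

# JSON null becomes None in Python, so filter before doing arithmetic.
sized = [
    img for img in images
    if img["width"] is not None and img["height"] is not None
]
largest = max(sized, key=lambda img: img["width"] * img["height"], default=None)

print(dict(role_counts))
print(largest["url"] if largest else "no image with known dimensions")
```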
cURL
curl -X GET "https://api.orsa.dev/v1/web/scrape/images?url=https://stripe.com" \
-H "Authorization: Bearer YOUR_API_KEY"
TypeScript
const { data } = await client.web.scrapeImages({
url: 'https://stripe.com',
});
console.log(data.count); // 47
console.log(data.images[0].url);
console.log(data.images[0].role); // "hero" | "logo" | "product" | "icon" | "decorative"
const logos = data.images.filter(i => i.role === 'logo');
Python
res = client.web.scrape_images(url="https://stripe.com")
data = res["data"]
print(data["count"])
print(data["images"][0]["url"])
print(data["images"][0]["role"])
logos = [i for i in data["images"] if i["role"] == "logo"]
Error Codes
| Code | Status | Description |
|---|---|---|
| INPUT_VALIDATION_ERROR | 400 | Invalid or missing URL |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| RATE_LIMITED | 429 | Rate limit exceeded |
| INTERNAL_ERROR | 500 | Server error during extraction |
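The status codes above suggest a simple client-side policy: give up immediately on 400 and 401, and retry on 429 and 500. The Python sketch below is one way to express that. Several points are assumptions rather than documented behavior: that success is reported as HTTP 200, that 500 is worth retrying, and that exponential backoff is an appropriate delay strategy (this page does not give retry guidance). The `send` callable and the helper name `call_with_retry` are hypothetical; `send` stands in for whatever HTTP client performs the request and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple

# Status codes from the table above. Treating 429 and 500 as retryable is an
# assumption; 400 and 401 will not succeed without changing the request.
RETRYABLE_STATUSES = {429, 500}


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns HTTP 200 or attempts run out (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:  # assumption: success is reported as HTTP 200
            return body
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            raise RuntimeError(f"request failed with HTTP {status}")
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a fake transport so the sketch runs without network access.
fake_responses = iter([(429, {}), (500, {}), (200, {"data": {"count": 3}})])
recorded_delays = []
result = call_with_retry(lambda: next(fake_responses), sleep=recorded_delays.append)
print(result)
print(recorded_delays)
```

Passing `sleep` as a parameter keeps the demo instantaneous; in real use the default `time.sleep` applies.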
Notes
- Images are extracted after JavaScript rendering, so dynamically loaded images are included. If the browser pool is unreachable, extraction falls back to a plain fetch without JavaScript rendering, in which case dynamically loaded images may be missing.
- The `role` field is a heuristic: `logo` for images in `<header>`/`<nav>` or with logo-like alt text, `icon` for tiny squares (≤64×64), `hero` for large above-the-fold images, `product` for images inside product/item/card containers, and `decorative` otherwise.
- `<img src>`, `<img srcset>` (highest-density variant), and `<picture><source srcset>` are all picked up. Data URIs are skipped.
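To make the `role` rules above concrete, here is a rough Python sketch of how such a classifier could be written. It illustrates the rules as described in the notes and is not the service's actual implementation. Several details are assumptions: the order in which the rules are checked, the reading of "tiny squares" as both dimensions being at most 64 pixels, the 600-pixel threshold for "large", and the boolean flags standing in for DOM context. The function name `classify_role` is hypothetical.

```python
from typing import Optional

# Assumption: the notes do not define "large", so 600px is a placeholder.
HERO_MIN_WIDTH = 600
# From the notes: icons are tiny squares of at most 64x64.
ICON_MAX_SIDE = 64


def classify_role(
    alt: str,
    width: Optional[int],
    height: Optional[int],
    in_header_or_nav: bool = False,
    in_product_container: bool = False,
    above_the_fold: bool = False,
) -> str:
    """Assign a role following the documented rules (illustrative only)."""
    # Rule 1: header/nav placement or logo-like alt text.
    if in_header_or_nav or "logo" in alt.lower():
        return "logo"
    # Dimension-based rules only apply when both dimensions are known.
    if width is not None and height is not None:
        # Rule 2: tiny images, read here as both sides <= 64px.
        if width <= ICON_MAX_SIDE and height <= ICON_MAX_SIDE:
            return "icon"
        # Rule 3: large image visible without scrolling.
        if above_the_fold and width >= HERO_MIN_WIDTH:
            return "hero"
    # Rule 4: inside a product/item/card container.
    if in_product_container:
        return "product"
    # Rule 5: everything else.
    return "decorative"


print(classify_role("Acme logo", 120, 40))
print(classify_role("", 32, 32))
print(classify_role("Summer sale", 1200, 630, above_the_fold=True))
print(classify_role("Blue mug", 300, 300, in_product_container=True))
print(classify_role("", None, None))
```

Since the classification is heuristic, callers that depend on a specific role (for example, picking a site logo) may want to apply their own checks on `alt`, `width`, and `height` as well.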