Scrape Markdown
Extract clean, readable Markdown from any URL. Automatically removes navigation, ads, and boilerplate.
Endpoint: GET /v1/web/scrape/markdown
Credits: 2 per request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape |
| include_links | boolean | No | Preserve hyperlinks in markdown (default: true) |
| include_images | boolean | No | Include image references (default: true) |
| main_content_only | boolean | No | Extract only the main content area (default: true) |
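Because the endpoint is a GET, the parameters above travel in the query string, so the target URL should be percent-encoded. The following is a minimal sketch of building such a request URL with the Python standard library. The host is taken from the cURL example; `build_scrape_url` is a hypothetical helper, not part of any SDK, and sending booleans as lowercase `true`/`false` is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.orsa.dev"  # host taken from the cURL example
ENDPOINT = "/v1/web/scrape/markdown"


def build_scrape_url(url, include_links=None, include_images=None,
                     main_content_only=None):
    """Return the full GET URL, omitting options left at their default."""
    params = {"url": url}
    options = {
        "include_links": include_links,
        "include_images": include_images,
        "main_content_only": main_content_only,
    }
    for name, value in options.items():
        if value is not None:
            # Assumption: booleans are serialized as lowercase "true"/"false".
            params[name] = "true" if value else "false"
    # urlencode percent-encodes the target URL (":" and "/" included).
    return f"{BASE_URL}{ENDPOINT}?{urlencode(params)}"


print(build_scrape_url("https://stripe.com/pricing", include_images=False))
```

Leaving an option as `None` keeps it out of the query string, so the documented defaults apply.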
Response Schema
{
  "success": true,
  "data": {
    "url": "https://example.com/blog/post",
    "markdown": "# Blog Post Title\n\nThis is the content...",
    "word_count": 1247,
    "metadata": {
      "title": "Blog Post Title",
      "author": "Jane Doe",
      "published_date": "2024-12-01",
      "load_time_ms": 1120
    }
  },
  "credits_used": 2
}
Code Examples
cURL
curl -X GET "https://api.orsa.dev/v1/web/scrape/markdown?url=https://stripe.com/pricing" \
  -H "Authorization: Bearer YOUR_API_KEY"
TypeScript
const result = await client.web.scrapeMarkdown({
  url: 'https://stripe.com/pricing',
  includeLinks: true,
  mainContentOnly: true,
});
console.log(result.markdown);
console.log(result.wordCount);
Python
result = client.web.scrape_markdown(
    url="https://stripe.com/pricing",
    include_links=True,
    main_content_only=True,
)
print(result.markdown)
print(result.word_count)
Notes
- The markdown converter strips navigation, headers, footers, ads, and cookie banners automatically.
- main_content_only=true uses heuristics to find the primary content area, which makes it ideal for blog posts and articles.
- Output is clean CommonMark-compatible markdown.
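For callers working with the raw JSON rather than an SDK, the sketch below shows one way to read the fields listed under Response Schema. `ScrapeResult` and `parse_scrape_response` are hypothetical names introduced for illustration. Treating a falsy `success` as a failure is an assumption, since the error shape is not documented here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    url: str
    markdown: str
    word_count: int
    title: Optional[str]
    credits_used: int


def parse_scrape_response(body: str) -> ScrapeResult:
    """Parse a raw JSON response body into a ScrapeResult."""
    payload = json.loads(body)
    if not payload.get("success"):
        # Assumption: failed requests set "success" to false.
        raise ValueError("scrape request did not succeed")
    data = payload["data"]
    # Metadata fields may be missing for some pages, so read them defensively.
    metadata = data.get("metadata") or {}
    return ScrapeResult(
        url=data["url"],
        markdown=data["markdown"],
        word_count=data["word_count"],
        title=metadata.get("title"),
        credits_used=payload["credits_used"],
    )


# Sample body mirroring the documented response schema.
sample_body = json.dumps({
    "success": True,
    "data": {
        "url": "https://example.com/blog/post",
        "markdown": "# Blog Post Title\n\nThis is the content...",
        "word_count": 1247,
        "metadata": {"title": "Blog Post Title", "author": "Jane Doe"},
    },
    "credits_used": 2,
})

result = parse_scrape_response(sample_body)
print(result.title, result.word_count, result.credits_used)
```

Reading `metadata` with `.get` keeps the parser from failing on pages where the title, author, or publication date could not be extracted.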