
Self-Hosting Orsa

Run Orsa on your own infrastructure for full control over your data, with no usage limits and no vendor lock-in.

Self-Hosted vs Managed Cloud

| Feature | Self-Hosted | Managed Cloud (orsa.dev) |
| --- | --- | --- |
| Data residency | Your servers, your rules | US cloud regions |
| Usage limits | None (limited only by your infra) | Credit-based billing |
| Custom proxies | Bring any proxy provider | Pre-configured providers |
| LLM providers | Any provider, including local (Ollama) | OpenAI + Anthropic |
| Browser pool | Scale to any size | Shared pool with fair-use limits |
| Updates | Manual (pull + migrate) | Automatic |
| Support | Community (GitHub Issues) | Priority support |
| SSO / Audit logs | Full access to Enterprise tables | Enterprise plan only |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Client / SDK                             │
│              (TypeScript, Python, MCP, cURL)                    │
└──────────────────────────┬──────────────────────────────────────┘
                           │ HTTPS
┌──────────────────────────▼──────────────────────────────────────┐
│                     API (Next.js on Vercel)                     │
│                     apps/api — /api/v1/*                        │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐   │
│  │ Scraping │  │  Brand   │  │    AI     │  │  Screenshot  │   │
│  │ Routes   │  │  Routes  │  │  Routes   │  │   Routes     │   │
│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────┬───────┘   │
│       │              │              │               │           │
│  ┌────▼──────────────▼──────────────▼───────────────▼───────┐   │
│  │                  @orsa/core                               │   │
│  │  Extraction engine — scraping, brand pipeline, AI, etc.  │   │
│  └──────┬──────────────────┬────────────────────┬───────────┘   │
└─────────┼──────────────────┼────────────────────┼───────────────┘
          │                  │                    │
    ┌─────▼─────┐     ┌─────▼─────┐       ┌─────▼──────┐
    │ Supabase  │     │  Upstash  │       │  Browser   │
    │ (Postgres │     │  Redis    │       │  Worker    │
    │ + Auth +  │     │ (Cache +  │       │ (Playwright│
    │  Storage) │     │  Rate     │       │  on Fly.io)│
    └───────────┘     │  Limits)  │       └─────┬──────┘
                      └───────────┘             │
                                          ┌─────▼──────┐
                                          │ Proxy Pool │
                                          │ (DC/Resi/  │
                                          │  ISP)      │
                                          └────────────┘

    ┌─────────────┐     ┌──────────────┐     ┌──────────────┐
    │ Trigger.dev │     │ Cloudflare   │     │   Stripe     │
    │ (Queues —   │     │ R2 (Asset    │     │  (Billing +  │
    │  crawl jobs)│     │  CDN/Storage)│     │   Credits)   │
    └─────────────┘     └──────────────┘     └──────────────┘

Service Boundaries

| Service | Role | Runs On |
| --- | --- | --- |
| API (apps/api) | All /api/v1/* endpoints. Auth, rate limiting, credit deduction, request routing. | Vercel (or any Node.js host) |
| Web (apps/web) | Dashboard + marketing site. User management, API key creation, usage analytics. | Vercel |
| Docs (apps/docs) | Nextra documentation site (docs.orsa.dev in managed cloud). | Vercel |
| Core (packages/core) | Extraction engine. Scraping, brand pipeline, AI extraction, screenshots, classification. Shared library, not deployed independently. | Bundled with API |
| DB (packages/db) | Supabase client, generated types, query helpers. | Bundled with API/Web |
| Browser Worker (services/browser-worker) | Playwright browser pool. Renders pages, takes screenshots, executes JavaScript. | Fly.io (or Docker) |
| Trigger Jobs (services/trigger) | Background job definitions: full-site crawls, brand extraction queues, AI queries. | Trigger.dev (cloud or self-hosted) |

Prerequisites

Required

| Dependency | Minimum Version | Purpose |
| --- | --- | --- |
| Node.js | 22.0+ | Runtime for API, Web, and all packages |
| pnpm | 9.0+ | Package manager (monorepo workspaces) |
| Docker | 24.0+ | Browser worker, Redis, local development |
| PostgreSQL | 16+ | Primary database (via Supabase) |
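
The minimum versions above can be checked in a small preflight script before installing. The following is a minimal sketch, not part of Orsa itself: `MINIMUMS`, `parseVersion`, `meetsMinimum`, and `findOutdated` are hypothetical names, and the caller is assumed to collect the installed version strings (for example from `node --version` or `pnpm --version`).

```typescript
// Minimum versions copied from the Required table above.
const MINIMUMS: Record<string, string> = {
  node: "22.0",
  pnpm: "9.0",
  docker: "24.0",
  postgres: "16",
};

// Parse "v22.3.1" or "16" into numeric parts; unparseable parts count as 0.
function parseVersion(version: string): number[] {
  return version
    .trim()
    .replace(/^v/, "")
    .split(".")
    .map((part) => parseInt(part, 10) || 0);
}

// True when `installed` is greater than or equal to `minimum`.
function meetsMinimum(installed: string, minimum: string): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(minimum);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const left = a[i] ?? 0;
    const right = b[i] ?? 0;
    if (left !== right) return left > right;
  }
  return true;
}

// Hypothetical helper: names of tools that are missing or older than required.
function findOutdated(installed: Record<string, string>): string[] {
  return Object.entries(MINIMUMS)
    .filter(([tool, minimum]) => {
      const version = installed[tool];
      return version === undefined || !meetsMinimum(version, minimum);
    })
    .map(([tool]) => tool);
}
```

In a Node.js script, `meetsMinimum(process.versions.node, MINIMUMS.node)` would check the running runtime; the other versions have to be gathered by whatever means suits your environment.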

Recommended

| Dependency | Purpose |
| --- | --- |
| Supabase CLI | Local development, migrations, type generation |
| flyctl | Browser worker deployment to Fly.io |
| Vercel CLI | API/Web/Docs deployment |
| Stripe CLI | Webhook testing in development |

System Requirements

API Server:

  • 1 vCPU, 1 GB RAM minimum
  • Scales horizontally (stateless)

Browser Worker:

  • 2 vCPU, 4 GB RAM minimum per instance
  • 512 MB shared memory (/dev/shm) for Chromium
  • Scales horizontally — each instance handles POOL_SIZE concurrent browsers
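
Because each instance handles POOL_SIZE concurrent browsers, the number of worker instances for a target concurrency follows from a ceiling division. The sketch below illustrates that arithmetic using the per-instance minimums listed above. `planBrowserWorkers` and `WorkerPlan` are hypothetical names, and the resource totals are lower bounds only: whether a minimum-sized instance can sustain a given POOL_SIZE is an assumption you should verify under load.

```typescript
// Per-instance minimums from the Browser Worker requirements above.
const MIN_VCPU_PER_INSTANCE = 2;
const MIN_RAM_GB_PER_INSTANCE = 4;

interface WorkerPlan {
  instances: number; // worker instances needed
  minVcpu: number;   // lower bound on total vCPUs
  minRamGb: number;  // lower bound on total RAM in GB
}

// Hypothetical helper: how many instances cover `targetConcurrency`
// browsers when each instance runs `poolSize` of them.
function planBrowserWorkers(targetConcurrency: number, poolSize: number): WorkerPlan {
  if (!Number.isInteger(poolSize) || poolSize < 1) {
    throw new Error("poolSize must be a positive integer");
  }
  if (!Number.isFinite(targetConcurrency) || targetConcurrency < 0) {
    throw new Error("targetConcurrency must be a non-negative number");
  }
  const instances = Math.ceil(targetConcurrency / poolSize);
  return {
    instances,
    minVcpu: instances * MIN_VCPU_PER_INSTANCE,
    minRamGb: instances * MIN_RAM_GB_PER_INSTANCE,
  };
}
```

For example, `planBrowserWorkers(20, 5)` yields 4 instances, which implies at least 8 vCPUs and 16 GB of RAM across the pool.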

Database:

  • PostgreSQL 16+ with extensions: pgcrypto, pg_trgm, vector
  • 10 GB storage minimum for brand cache
  • Supabase (managed or self-hosted) recommended
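
On a self-managed PostgreSQL instance, the three extensions usually have to be enabled explicitly (`vector` is the extension name used by pgvector). The following sketch generates the standard `CREATE EXTENSION IF NOT EXISTS` statements and reports which required extensions are absent from a list of installed names. `buildExtensionSql` and `missingExtensions` are hypothetical helpers, not part of Orsa, and running the SQL is left to your own migration tooling.

```typescript
// Extensions named in the Database requirements above.
const REQUIRED_EXTENSIONS = ["pgcrypto", "pg_trgm", "vector"] as const;

// Build one idempotent CREATE EXTENSION statement per extension name.
function buildExtensionSql(extensions: readonly string[] = REQUIRED_EXTENSIONS): string {
  return extensions
    .map((name) => `CREATE EXTENSION IF NOT EXISTS ${name};`)
    .join("\n");
}

// Given the names returned by `SELECT extname FROM pg_extension`,
// list the required extensions that are not installed yet.
function missingExtensions(installed: readonly string[]): string[] {
  const have = new Set<string>(installed);
  return REQUIRED_EXTENSIONS.filter((name) => !have.has(name));
}
```

Feeding `missingExtensions` the result of the `pg_extension` query before running migrations gives an early, readable failure instead of an error partway through a migration.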

Redis:

  • 256 MB RAM minimum
  • Persistent storage recommended (AOF enabled)

Next Steps