Configuration

Complete reference for environment variables, rate limiting, the credit system, cache TTLs, and proxy escalation.

Environment Variables

All variables are set in .env.local (development) or your deployment platform's environment settings (production).

Supabase (Required)

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| NEXT_PUBLIC_SUPABASE_URL | string | Yes | | Supabase project URL. Example: https://xxxxx.supabase.co |
| NEXT_PUBLIC_SUPABASE_ANON_KEY | string | Yes | | Supabase anonymous/public key. Safe to expose to the browser. |
| SUPABASE_SERVICE_ROLE_KEY | string | Yes | | Supabase service role key. Server-side only. Never expose to the client. Bypasses RLS. |

Redis / Upstash (Required)

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| UPSTASH_REDIS_REST_URL | string | Yes | | Upstash Redis REST API URL. Example: https://us1-xxxxx.upstash.io |
| UPSTASH_REDIS_REST_TOKEN | string | Yes | | Upstash Redis REST API token. |
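
Because the Supabase and Upstash variables are required, it can help to fail fast at startup when one is missing. The following is a minimal sketch, not part of Orsa itself: the helper names `missingRequiredVars` and `assertRequiredEnv` are hypothetical, and only the variable names come from the tables above.

```typescript
// Hypothetical startup check. Only the variable names are taken from the docs.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
] as const;

// Returns the names of required variables that are unset or blank.
function missingRequiredVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}

// Throws a single error listing every missing variable, so a misconfigured
// deployment fails at boot rather than on the first request.
function assertRequiredEnv(env: Env): void {
  const missing = missingRequiredVars(env);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }
}
```

In a Node.js process this would typically be called once as `assertRequiredEnv(process.env)` before the server starts listening.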

Stripe (Required for Billing)

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| STRIPE_SECRET_KEY | string | For billing | | Stripe secret key. sk_test_ for dev, sk_live_ for production. |
| STRIPE_WEBHOOK_SECRET | string | For billing | | Stripe webhook signing secret. whsec_... |
| NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY | string | For billing | | Stripe publishable key. pk_test_ or pk_live_. Safe for client-side. |

LLM Providers

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| OPENAI_API_KEY | string | ⚠️ | | OpenAI API key. Required for AI extraction features. |
| OPENAI_BASE_URL | string | | https://api.openai.com/v1 | OpenAI-compatible base URL. Set for Ollama, vLLM, LiteLLM, etc. |
| ANTHROPIC_API_KEY | string | | | Anthropic API key. Used as fallback or for specific tasks. |
| LLM_DEFAULT_MODEL | string | | gpt-4o | Default model for all AI tasks. |
| LLM_EXTRACTION_MODEL | string | | Value of LLM_DEFAULT_MODEL | Model for structured data extraction. |
| LLM_CLASSIFICATION_MODEL | string | | Value of LLM_DEFAULT_MODEL | Model for NAICS classification. |
| LLM_QUERY_MODEL | string | | Value of LLM_DEFAULT_MODEL | Model for free-form AI queries. |
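
The three task-specific model variables fall back to LLM_DEFAULT_MODEL, which itself defaults to gpt-4o. A minimal sketch of that fallback chain follows; the function name `resolveModel` and the `LlmTask` type are hypothetical and only illustrate the documented defaults.

```typescript
type Env = Record<string, string | undefined>;
type LlmTask = "extraction" | "classification" | "query";

// Maps each task to the environment variable documented for it.
const TASK_VARS: Record<LlmTask, string> = {
  extraction: "LLM_EXTRACTION_MODEL",
  classification: "LLM_CLASSIFICATION_MODEL",
  query: "LLM_QUERY_MODEL",
};

// Resolution order: task-specific variable, then LLM_DEFAULT_MODEL,
// then the documented default "gpt-4o". Empty strings count as unset.
function resolveModel(env: Env, task: LlmTask): string {
  return env[TASK_VARS[task]] || env["LLM_DEFAULT_MODEL"] || "gpt-4o";
}
```

With only `LLM_CLASSIFICATION_MODEL=gpt-4o-mini` set, classification would use gpt-4o-mini while extraction and queries stay on gpt-4o.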

Browser Pool

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| BROWSER_POOL_URL | string | | | URL of the browser worker. Example: https://orsa-browser-pool.fly.dev or http://localhost:3002 |
| FLY_API_TOKEN | string | | | Fly.io API token. Only needed if deploying browser worker to Fly.io. |
| BROWSER_POOL_SIZE | number | | 3 | Number of concurrent Chromium instances per worker. |
| MAX_CONCURRENT_PAGES | number | | 10 | Maximum open pages across all browser instances. |
| PAGE_TIMEOUT | number | | 30000 | Milliseconds before a page load times out. |

Trigger.dev

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| TRIGGER_SECRET_KEY | string | ⚠️ | | Trigger.dev project secret key. Required for crawl and batch jobs. |
| TRIGGER_API_URL | string | | https://api.trigger.dev | Trigger.dev API URL. Set for self-hosted Trigger.dev. |

Cloudflare R2 / Storage

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| CLOUDFLARE_R2_ACCESS_KEY | string | ⚠️ | | Cloudflare R2 access key ID. |
| CLOUDFLARE_R2_SECRET_KEY | string | ⚠️ | | Cloudflare R2 secret access key. |
| CLOUDFLARE_R2_ENDPOINT | string | ⚠️ | | R2 S3-compatible endpoint. Example: https://xxxxx.r2.cloudflarestorage.com |
| CLOUDFLARE_R2_BUCKET | string | ⚠️ | orsa-assets | R2 bucket name for stored assets. |
| STORAGE_PROVIDER | string | | r2 | Storage backend: r2, supabase, or s3. |

Proxy Providers

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| PROXY_DATACENTER_URL | string | | | Datacenter proxy URL. Format: http://user:pass@host:port |
| PROXY_RESIDENTIAL_URL | string | | | Residential proxy URL. Format: http://user:pass@host:port |
| PROXY_ISP_URL | string | | | ISP proxy URL. Format: http://user:pass@host:port |

Email

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| RESEND_API_KEY | string | | | Resend API key for transactional email. |

Application

| Variable | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| NEXT_PUBLIC_APP_URL | string | | http://localhost:3000 | Public URL of the web dashboard. Used for OAuth callbacks, email links. |
| NEXT_PUBLIC_API_URL | string | | http://localhost:3001 (dev); https://api.orsa.dev (prod) | Public API origin for browser requests from apps/web when the API is on a subdomain (e.g. api.orsa.dev). No trailing slash. |
| API_URL | string | | http://localhost:3001 | Internal API URL. Used for server-to-server communication, scripts, and load tests. |
| ORSA_BASE_URL | string | | | MCP server (@orsa-dev/mcp-server): optional API origin; unset uses https://api.orsa.dev. |
| NODE_ENV | string | | development | development, production, or test. |
| LOG_LEVEL | string | | info | Logging level: debug, info, warn, error. |

Rate Limiting

Orsa uses @upstash/ratelimit for API rate limiting. Limits are enforced per API key.

Default Limits

| Plan | Requests/Minute | Requests/Day | Concurrent Crawls |
| --- | --- | --- | --- |
| Free | 20 | 1,000 | 1 |
| Starter | 60 | 10,000 | 3 |
| Pro | 200 | 100,000 | 10 |
| Enterprise | Custom | Custom | Custom |
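
For a self-hosted fork that wants to keep per-plan limits in one place, the table can be expressed as a typed lookup. This is only a sketch: the `Plan` and `PlanLimits` types and the `limitsForPlan` helper are hypothetical and do not describe how apps/api actually stores its limits.

```typescript
type Plan = "free" | "starter" | "pro" | "enterprise";

interface PlanLimits {
  requestsPerMinute: number;
  requestsPerDay: number;
  concurrentCrawls: number;
}

// Values copied from the Default Limits table. Enterprise is "Custom",
// so it has no fixed entry and must be supplied per customer.
const DEFAULT_LIMITS: Record<Exclude<Plan, "enterprise">, PlanLimits> = {
  free: { requestsPerMinute: 20, requestsPerDay: 1_000, concurrentCrawls: 1 },
  starter: { requestsPerMinute: 60, requestsPerDay: 10_000, concurrentCrawls: 3 },
  pro: { requestsPerMinute: 200, requestsPerDay: 100_000, concurrentCrawls: 10 },
};

// Returns the limits for a plan, requiring explicit values for enterprise.
function limitsForPlan(plan: Plan, custom?: PlanLimits): PlanLimits {
  if (plan === "enterprise") {
    if (!custom) {
      throw new Error("Enterprise plans need explicit custom limits");
    }
    return custom;
  }
  return DEFAULT_LIMITS[plan];
}
```

The `requestsPerMinute` value is the number that would be passed as the first argument to `Ratelimit.slidingWindow(...)` when building a limiter for that plan.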

Configuration

Rate limits are defined in code (apps/api). To customize for self-hosting, modify the rate limit configuration:

// apps/api/src/middleware/rate-limit.ts (example)
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
 
const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
 
// Sliding window rate limiter
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(60, '1 m'),  // 60 req/min
  analytics: true,
  prefix: 'orsa:ratelimit',
});

Disabling Rate Limits

For self-hosted instances that don't need rate limiting:

// Set a very high limit or bypass entirely
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100000, '1 m'),
  prefix: 'orsa:ratelimit',
});

Credit System

API requests consume credits at a per-endpoint rate; only cache prefetching is free. The credit system tracks usage and enforces limits.

Credit Costs by Endpoint

| Endpoint | Credits | Notes |
| --- | --- | --- |
| GET /v1/web/scrape/* | 1 | HTML, Markdown, images, sitemap |
| POST /v1/web/crawl | 1/page | Charged per page crawled |
| GET /v1/brand/retrieve | 10 | Full brand data |
| GET /v1/brand/retrieve-by-name | 10 | Brand lookup by name |
| GET /v1/brand/retrieve-by-email | 10 | Brand lookup by email |
| GET /v1/brand/retrieve-by-ticker | 10 | Brand lookup by ticker |
| GET /v1/brand/retrieve-simplified | 5 | Simplified brand data |
| GET /v1/brand/screenshot | 10 | Website screenshot |
| GET /v1/brand/styleguide | 10 | Design system extraction |
| GET /v1/brand/fonts | 5 | Font detection |
| GET /v1/brand/naics | 5 | NAICS classification |
| GET /v1/brand/transaction-identifier | 10 | Merchant identification |
| POST /v1/brand/ai/query | 10 | AI-powered extraction |
| GET /v1/brand/ai/products | 10 | Product extraction |
| GET /v1/brand/ai/product | 10 | Single product extraction |
| POST /v1/brand/prefetch | 0 | Free, warms cache |
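
To estimate what a workload will cost, the table can be turned into a small lookup. The sketch below is illustrative only: `creditCost` is a hypothetical helper, and the costs are copied from the table rather than read from Orsa's code.

```typescript
// Flat per-request costs from the table above, keyed by "METHOD path".
const FLAT_COSTS: Record<string, number> = {
  "GET /v1/brand/retrieve": 10,
  "GET /v1/brand/retrieve-by-name": 10,
  "GET /v1/brand/retrieve-by-email": 10,
  "GET /v1/brand/retrieve-by-ticker": 10,
  "GET /v1/brand/retrieve-simplified": 5,
  "GET /v1/brand/screenshot": 10,
  "GET /v1/brand/styleguide": 10,
  "GET /v1/brand/fonts": 5,
  "GET /v1/brand/naics": 5,
  "GET /v1/brand/transaction-identifier": 10,
  "POST /v1/brand/ai/query": 10,
  "GET /v1/brand/ai/products": 10,
  "GET /v1/brand/ai/product": 10,
  "POST /v1/brand/prefetch": 0,
};

// Returns the credits charged for one request. For crawls, pass the number
// of pages crawled, since that endpoint is billed at 1 credit per page.
function creditCost(method: string, path: string, pagesCrawled = 1): number {
  const verb = method.toUpperCase();
  const key = `${verb} ${path}`;
  // Every scrape endpoint under /v1/web/scrape/ costs 1 credit.
  if (verb === "GET" && path.startsWith("/v1/web/scrape/")) return 1;
  if (key === "POST /v1/web/crawl") return pagesCrawled;
  const cost = FLAT_COSTS[key];
  if (cost === undefined) {
    throw new Error(`No credit cost defined for ${key}`);
  }
  return cost;
}
```

For example, a 25-page crawl followed by one full brand retrieval would come to `creditCost("POST", "/v1/web/crawl", 25) + creditCost("GET", "/v1/brand/retrieve")`, which is 35 credits.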

Database Functions

Credits are managed via PostgreSQL functions (defined in supabase/migrations/00001_initial_schema.sql):

  • deduct_credits(user_id, amount, endpoint, request_id) — Atomic deduction with balance check. Returns (success, remaining_balance).
  • refund_credits(user_id, amount, reason, request_id) — Refund on failure.
  • check_balance(user_id) — Read current balance.
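
The functions above suggest a deduct-then-refund flow around each billable request. The sketch below shows one way that flow could look, under stated assumptions: the `CreditStore` interface, `withCredits`, and `memoryStore` are hypothetical names, and the in-memory store merely stands in for the PostgreSQL functions so the example runs without a database.

```typescript
// Stand-in for the database functions. In a real deployment these would be
// RPC calls into PostgreSQL; this interface shape is an assumption.
interface CreditStore {
  deduct(userId: string, amount: number, endpoint: string, requestId: string):
    Promise<{ success: boolean; remainingBalance: number }>;
  refund(userId: string, amount: number, reason: string, requestId: string):
    Promise<void>;
}

interface Charge {
  userId: string;
  amount: number;
  endpoint: string;
  requestId: string;
}

// Deducts first, runs the work, and refunds the same amount if the work throws.
async function withCredits<T>(
  store: CreditStore,
  charge: Charge,
  work: () => Promise<T>
): Promise<T> {
  const { userId, amount, endpoint, requestId } = charge;
  const result = await store.deduct(userId, amount, endpoint, requestId);
  if (!result.success) {
    throw new Error(`Insufficient credits (balance ${result.remainingBalance})`);
  }
  try {
    return await work();
  } catch (err) {
    await store.refund(userId, amount, "request_failed", requestId);
    throw err;
  }
}

// In-memory store for local experiments only. It tracks a single balance.
function memoryStore(initial: number): CreditStore & { balance(): number } {
  let balance = initial;
  return {
    async deduct(_userId: string, amount: number) {
      if (balance < amount) return { success: false, remainingBalance: balance };
      balance -= amount;
      return { success: true, remainingBalance: balance };
    },
    async refund(_userId: string, amount: number) {
      balance += amount;
    },
    balance: () => balance,
  };
}
```

Deducting before the work starts keeps the balance check atomic with the charge, which matches the "atomic deduction with balance check" description of deduct_credits.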

New User Credits

New users who sign up through Supabase Auth automatically receive 100 free credits from the handle_new_user() trigger. To change the starting balance, edit the insert in the migration:

-- supabase/migrations/00001_initial_schema.sql
-- Change the starting credit balance:
INSERT INTO public.credit_balances (user_id, balance)
VALUES (NEW.id, 100);  -- Change 100 to your desired amount

Disabling Credits (Unlimited Usage)

For self-hosted instances that don't need credit tracking, modify the deduction function to always succeed:

CREATE OR REPLACE FUNCTION deduct_credits(
    p_user_id UUID,
    p_amount INTEGER,
    p_endpoint VARCHAR,
    p_request_id UUID
)
RETURNS TABLE(success BOOLEAN, remaining_balance BIGINT) AS $$
BEGIN
    -- Self-hosted: always succeed, don't deduct
    RETURN QUERY SELECT true, 999999::BIGINT;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

Cache Configuration

Orsa caches brand data and scrape results in Redis to reduce redundant extraction.

Cache TTLs

| Data Type | Default TTL | Env Override | Description |
| --- | --- | --- | --- |
| Brand data | 7 days | CACHE_TTL_BRAND | Full brand extraction results |
| Simplified brand | 7 days | CACHE_TTL_BRAND_SIMPLE | Simplified brand data |
| HTML scrape | 1 hour | CACHE_TTL_HTML | Raw HTML results |
| Markdown scrape | 1 hour | CACHE_TTL_MARKDOWN | Markdown conversion results |
| Screenshot | 24 hours | CACHE_TTL_SCREENSHOT | Screenshot images |
| Sitemap | 24 hours | CACHE_TTL_SITEMAP | Sitemap parse results |
| NAICS | 30 days | CACHE_TTL_NAICS | NAICS classification results |
| Fonts | 7 days | CACHE_TTL_FONTS | Font detection results |

All TTL values are in seconds. Example:

CACHE_TTL_BRAND=604800       # 7 days (default)
CACHE_TTL_HTML=3600          # 1 hour (default)
CACHE_TTL_SCREENSHOT=86400   # 24 hours (default)
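
A sketch of how a TTL override might be resolved against its default is shown below. The helper name `cacheTtlSeconds` is hypothetical, and falling back to the default on an invalid value is an assumption; the source does not say how Orsa treats malformed values.

```typescript
type Env = Record<string, string | undefined>;

// Default TTLs in seconds, converted from the Cache TTLs table.
const DEFAULT_TTLS = {
  CACHE_TTL_BRAND: 604_800, // 7 days
  CACHE_TTL_BRAND_SIMPLE: 604_800, // 7 days
  CACHE_TTL_HTML: 3_600, // 1 hour
  CACHE_TTL_MARKDOWN: 3_600, // 1 hour
  CACHE_TTL_SCREENSHOT: 86_400, // 24 hours
  CACHE_TTL_SITEMAP: 86_400, // 24 hours
  CACHE_TTL_NAICS: 2_592_000, // 30 days
  CACHE_TTL_FONTS: 604_800, // 7 days
} as const;

type TtlVar = keyof typeof DEFAULT_TTLS;

// Returns the TTL in seconds for one cache type.
function cacheTtlSeconds(env: Env, name: TtlVar): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return DEFAULT_TTLS[name];
  const parsed = Number(raw);
  // Assumption: non-integer or negative values fall back to the default.
  if (!Number.isInteger(parsed) || parsed < 0) return DEFAULT_TTLS[name];
  return parsed;
}
```

Note that a value such as `1h` is not a number of seconds, so under this sketch it would be ignored in favor of the default.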

Cache Key Format

orsa:cache:{endpoint}:{hash(params)}

Example: orsa:cache:brand:retrieve:a1b2c3d4 where the hash is derived from the normalized domain.
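
The key format can be illustrated with a short sketch. Everything here beyond the `orsa:cache:` prefix is an assumption: the source does not say which hash function or normalization rules Orsa uses, so FNV-1a (32-bit) and the hypothetical `normalizeDomain` helper are stand-ins chosen only because they are small and dependency-free.

```typescript
// FNV-1a (32-bit) as a stand-in hash. It works on UTF-16 code units, which is
// adequate for ASCII domains. Orsa's real hash function is not documented here.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical normalization: lowercase, then drop scheme, path, and "www.".
function normalizeDomain(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "") // drop scheme such as https://
    .replace(/[/?#].*$/, "") // drop path, query, and fragment
    .replace(/^www\./, "");
}

// Builds a key in the documented shape: orsa:cache:{endpoint}:{hash(params)}
function cacheKey(endpoint: string, domain: string): string {
  return `orsa:cache:${endpoint}:${fnv1aHex(normalizeDomain(domain))}`;
}
```

The point of normalizing before hashing is that `https://www.Stripe.com/` and `stripe.com` map to the same key, so equivalent requests share one cache entry.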

Cache Bypass

Clients can bypass the cache by passing cache=false as a query parameter:

curl "https://api.orsa.dev/v1/brand/retrieve?domain=stripe.com&cache=false" \
  -H "Authorization: Bearer YOUR_KEY"

With cache=false, the request skips the cache read but still writes the fresh result back to the cache.
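
The read-skip, write-through behavior can be sketched as follows. This is an illustration, not Orsa's implementation: `MemoryCache` is a hypothetical in-memory stand-in for Redis, the code is synchronous for brevity where a real Redis client would be asynchronous, and treating a TTL of 0 as "do not store" is an assumption.

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for Redis, for illustration only.
class MemoryCache<T> {
  private entries = new Map<string, Entry<T>>();

  get(key: string, now: number): T | undefined {
    const entry = this.entries.get(key);
    return entry && entry.expiresAt > now ? entry.value : undefined;
  }

  set(key: string, value: T, ttlSeconds: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }
}

// useCache=false mirrors cache=false: skip the read, still write the result.
function cachedFetch<T>(
  cache: MemoryCache<T>,
  key: string,
  ttlSeconds: number,
  useCache: boolean,
  fetchFresh: () => T,
  now: number = Date.now()
): T {
  if (useCache) {
    const hit = cache.get(key, now);
    if (hit !== undefined) return hit;
  }
  const fresh = fetchFresh();
  // Assumption: a TTL of 0 means the result is never stored.
  if (ttlSeconds > 0) cache.set(key, fresh, ttlSeconds, now);
  return fresh;
}
```

Because the bypassed request still writes, the next ordinary request for the same key is served the refreshed value rather than the stale one.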

Disabling Cache

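Set every CACHE_TTL_* variable to 0 to disable caching entirely (not recommended for production). The example shows three of them; apply the same value to the remaining variables: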

CACHE_TTL_BRAND=0
CACHE_TTL_HTML=0
CACHE_TTL_SCREENSHOT=0

Proxy Escalation

When a request fails or gets blocked, Orsa automatically escalates through proxy tiers.

Escalation Order

1. No proxy (direct request)
   ↓ on failure (e.g. 403, 429, or timeout)
2. Datacenter proxy (PROXY_DATACENTER_URL)
   ↓ on failure
3. Residential proxy (PROXY_RESIDENTIAL_URL)
   ↓ on failure
4. ISP proxy (PROXY_ISP_URL)
   ↓ on failure
5. Return error to client

Configuration

# Enable/disable proxy escalation
PROXY_ESCALATION_ENABLED=true
 
# Skip direct request, always start with datacenter
PROXY_SKIP_DIRECT=false
 
# Maximum retries per tier before escalating
PROXY_MAX_RETRIES_PER_TIER=2
 
# Timeout per request (ms) before considering it failed
PROXY_REQUEST_TIMEOUT=15000
 
# HTTP status codes that trigger escalation
PROXY_ESCALATION_STATUS_CODES=403,429,503,520,521,522,523,524
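
The escalation order and these settings can be combined into a short sketch. It is not Orsa's implementation: `buildTiers`, `escalationCodes`, and `fetchWithEscalation` are hypothetical names, the request itself is injected as an `attempt` callback, and two behaviors are assumptions: each tier gets one initial attempt plus PROXY_MAX_RETRIES_PER_TIER retries, and the status code list shown above is used when the variable is unset.

```typescript
type Env = Record<string, string | undefined>;
type TierName = "direct" | "datacenter" | "residential" | "isp";
interface Tier { name: TierName; proxyUrl?: string }
// Resolves to an HTTP status code, or "timeout" if the request timed out.
type Attempt = (tier: Tier) => Promise<number | "timeout">;

// Builds the ordered tier list, skipping proxy tiers that are not configured.
function buildTiers(env: Env): Tier[] {
  if (env["PROXY_ESCALATION_ENABLED"] === "false") return [{ name: "direct" }];
  const tiers: Tier[] = [];
  if (env["PROXY_SKIP_DIRECT"] !== "true") tiers.push({ name: "direct" });
  const candidates: Array<[TierName, string | undefined]> = [
    ["datacenter", env["PROXY_DATACENTER_URL"]],
    ["residential", env["PROXY_RESIDENTIAL_URL"]],
    ["isp", env["PROXY_ISP_URL"]],
  ];
  for (const [name, proxyUrl] of candidates) {
    if (proxyUrl) tiers.push({ name, proxyUrl });
  }
  // With no proxies configured, only direct requests are possible.
  return tiers.length > 0 ? tiers : [{ name: "direct" }];
}

// Parses the comma-separated status codes that trigger escalation.
function escalationCodes(env: Env): Set<number> {
  const raw =
    env["PROXY_ESCALATION_STATUS_CODES"] ?? "403,429,503,520,521,522,523,524";
  return new Set(
    raw.split(",").map((s) => Number(s.trim())).filter((n) => Number.isInteger(n))
  );
}

// Tries each tier in order and returns the first non-blocked response.
async function fetchWithEscalation(
  env: Env,
  attempt: Attempt
): Promise<{ tier: TierName; status: number }> {
  const codes = escalationCodes(env);
  const parsed = Number(env["PROXY_MAX_RETRIES_PER_TIER"] ?? "2");
  const retries = Number.isInteger(parsed) && parsed >= 0 ? parsed : 2;
  for (const tier of buildTiers(env)) {
    // Assumption: one initial attempt plus `retries` retries per tier.
    for (let i = 0; i <= retries; i++) {
      const outcome = await attempt(tier);
      if (outcome !== "timeout" && !codes.has(outcome)) {
        return { tier: tier.name, status: outcome };
      }
    }
  }
  throw new Error("All proxy tiers failed");
}
```

In a real worker, the `attempt` callback would issue the HTTP request through `tier.proxyUrl` and enforce PROXY_REQUEST_TIMEOUT itself.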

Disabling Proxies

If you don't need proxies (e.g., scraping only your own domains):

PROXY_ESCALATION_ENABLED=false

Or simply don't set any PROXY_*_URL variables — Orsa will make direct requests only.


Full .env.example

# ─── Supabase (Required) ────────────────────────────────────
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
 
# ─── Upstash Redis (Required) ───────────────────────────────
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
 
# ─── Stripe (Required for Billing) ──────────────────────────
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
 
# ─── LLM Providers ──────────────────────────────────────────
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_BASE_URL=http://localhost:11434/v1  # For Ollama/local
# LLM_DEFAULT_MODEL=gpt-4o
# LLM_EXTRACTION_MODEL=gpt-4o
# LLM_CLASSIFICATION_MODEL=gpt-4o-mini
# LLM_QUERY_MODEL=gpt-4o
 
# ─── Browser Pool ───────────────────────────────────────────
BROWSER_POOL_URL=http://localhost:3002
FLY_API_TOKEN=your-fly-token
# BROWSER_POOL_SIZE=3
# MAX_CONCURRENT_PAGES=10
# PAGE_TIMEOUT=30000
 
# ─── Trigger.dev ─────────────────────────────────────────────
TRIGGER_SECRET_KEY=tr_dev_...
TRIGGER_API_URL=https://api.trigger.dev
 
# ─── Cloudflare R2 ──────────────────────────────────────────
CLOUDFLARE_R2_ACCESS_KEY=your-access-key
CLOUDFLARE_R2_SECRET_KEY=your-secret-key
CLOUDFLARE_R2_ENDPOINT=https://your-account.r2.cloudflarestorage.com
CLOUDFLARE_R2_BUCKET=orsa-assets
# STORAGE_PROVIDER=r2  # r2, supabase, or s3
 
# ─── Proxy Providers ────────────────────────────────────────
PROXY_DATACENTER_URL=http://user:pass@dc-proxy:port
PROXY_RESIDENTIAL_URL=http://user:pass@res-proxy:port
PROXY_ISP_URL=http://user:pass@isp-proxy:port
# PROXY_ESCALATION_ENABLED=true
# PROXY_SKIP_DIRECT=false
# PROXY_MAX_RETRIES_PER_TIER=2
# PROXY_REQUEST_TIMEOUT=15000
 
# ─── Email ───────────────────────────────────────────────────
# RESEND_API_KEY=re_...
 
# ─── Application ─────────────────────────────────────────────
NEXT_PUBLIC_APP_URL=http://localhost:3000
API_URL=http://localhost:3001
# NODE_ENV=development
# LOG_LEVEL=info
 
# ─── Cache TTLs (seconds) ───────────────────────────────────
# CACHE_TTL_BRAND=604800
# CACHE_TTL_BRAND_SIMPLE=604800
# CACHE_TTL_HTML=3600
# CACHE_TTL_MARKDOWN=3600
# CACHE_TTL_SCREENSHOT=86400
# CACHE_TTL_SITEMAP=86400
# CACHE_TTL_NAICS=2592000
# CACHE_TTL_FONTS=604800