Configuration

Complete reference for environment variables, rate limiting, the credit system, cache TTLs, and proxy escalation.

Environment Variables

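All variables are set in .env.local (development) or your deployment platform’s environment settings (production). In the tables below, ✅ marks a required variable, ❌ an optional one, and ⚠️ one that is required only for the feature named in its description or section.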

Supabase (Required)

Variable	Type	Required	Default	Description
`NEXT_PUBLIC_SUPABASE_URL`	`string`	✅	—	Supabase project URL. Example: `https://xxxxx.supabase.co`
`NEXT_PUBLIC_SUPABASE_ANON_KEY`	`string`	✅	—	Supabase anonymous/public key. Safe to expose to the browser.
`SUPABASE_SERVICE_ROLE_KEY`	`string`	✅	—	Supabase service role key. Server-side only. Never expose to the client. Bypasses RLS.

Redis / Upstash (Required)

Variable	Type	Required	Default	Description
`UPSTASH_REDIS_REST_URL`	`string`	✅	—	Upstash Redis REST API URL. Example: `https://us1-xxxxx.upstash.io`
`UPSTASH_REDIS_REST_TOKEN`	`string`	✅	—	Upstash Redis REST API token.

Stripe (Required for Billing)

Variable	Type	Required	Default	Description
`STRIPE_SECRET_KEY`	`string`	✅	—	Stripe secret key. `sk_test_` for dev, `sk_live_` for production.
`STRIPE_WEBHOOK_SECRET`	`string`	✅	—	Stripe webhook signing secret. `whsec_...`
`NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY`	`string`	✅	—	Stripe publishable key. `pk_test_` or `pk_live_`. Safe for client-side.

LLM Providers

Variable	Type	Required	Default	Description
`OPENAI_API_KEY`	`string`	⚠️	—	OpenAI API key. Required for AI extraction features.
`OPENAI_BASE_URL`	`string`	❌	`https://api.openai.com/v1`	OpenAI-compatible base URL. Set for Ollama, vLLM, LiteLLM, etc.
`ANTHROPIC_API_KEY`	`string`	❌	—	Anthropic API key. Used as fallback or for specific tasks.
`LLM_DEFAULT_MODEL`	`string`	❌	`gpt-4o`	Default model for all AI tasks.
`LLM_EXTRACTION_MODEL`	`string`	❌	Value of `LLM_DEFAULT_MODEL`	Model for structured data extraction.
`LLM_CLASSIFICATION_MODEL`	`string`	❌	Value of `LLM_DEFAULT_MODEL`	Model for NAICS classification.
`LLM_QUERY_MODEL`	`string`	❌	Value of `LLM_DEFAULT_MODEL`	Model for free-form AI queries.

Browser Pool

Variable	Type	Required	Default	Description
`BROWSER_POOL_URL`	`string`	✅	—	URL of the browser worker. Example: `https://orsa-browser-pool.fly.dev` or `http://localhost:3002`
`FLY_API_TOKEN`	`string`	❌	—	Fly.io API token. Only needed if deploying browser worker to Fly.io.
`BROWSER_POOL_SIZE`	`number`	❌	`3`	Number of concurrent Chromium instances per worker.
`MAX_CONCURRENT_PAGES`	`number`	❌	`10`	Maximum open pages across all browser instances.
`PAGE_TIMEOUT`	`number`	❌	`30000`	Milliseconds before a page load times out.

Trigger.dev

Variable	Type	Required	Default	Description
`TRIGGER_SECRET_KEY`	`string`	⚠️	—	Trigger.dev project secret key. Required for crawl and batch jobs.
`TRIGGER_API_URL`	`string`	❌	`https://api.trigger.dev`	Trigger.dev API URL. Set for self-hosted Trigger.dev.

Cloudflare R2 / Storage

Variable	Type	Required	Default	Description
`CLOUDFLARE_R2_ACCESS_KEY`	`string`	⚠️	—	Cloudflare R2 access key ID.
`CLOUDFLARE_R2_SECRET_KEY`	`string`	⚠️	—	Cloudflare R2 secret access key.
`CLOUDFLARE_R2_ENDPOINT`	`string`	⚠️	—	R2 S3-compatible endpoint. Example: `https://xxxxx.r2.cloudflarestorage.com`
`CLOUDFLARE_R2_BUCKET`	`string`	⚠️	`orsa-assets`	R2 bucket name for stored assets.
`STORAGE_PROVIDER`	`string`	❌	`r2`	Storage backend: `r2`, `supabase`, or `s3`.

Proxy Providers

Variable	Type	Required	Default	Description
`PROXY_DATACENTER_URL`	`string`	❌	—	Datacenter proxy URL. Format: `http://user:pass@host:port`
`PROXY_RESIDENTIAL_URL`	`string`	❌	—	Residential proxy URL. Format: `http://user:pass@host:port`
`PROXY_ISP_URL`	`string`	❌	—	ISP proxy URL. Format: `http://user:pass@host:port`

Email

Variable	Type	Required	Default	Description
`RESEND_API_KEY`	`string`	❌	—	Resend API key for transactional email.

Application

Variable	Type	Required	Default	Description
`NEXT_PUBLIC_APP_URL`	`string`	✅	`http://localhost:3000`	Public URL of the web dashboard. Used for OAuth callbacks, email links.
`NEXT_PUBLIC_API_URL`	`string`	✅	`http://localhost:3001` (dev); `https://api.orsa.dev` (prod)	Public API origin for browser requests from `apps/web` when the API is on a subdomain (e.g. `api.orsa.dev`). No trailing slash.
`API_URL`	`string`	✅	`http://localhost:3001`	Internal API URL. Used for server-to-server communication, scripts, and load tests.
`ORSA_BASE_URL`	`string`	❌	—	MCP server (`@orsa.dev/mcp-server`): optional API origin; unset uses `https://api.orsa.dev`.
`NODE_ENV`	`string`	❌	`development`	`development`, `production`, or `test`.
`LOG_LEVEL`	`string`	❌	`info`	Logging level: `debug`, `info`, `warn`, `error`.
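
A startup check can catch a missing variable before the first request fails. The following is a minimal sketch, not part of Orsa: `findMissingVars` and `assertEnv` are hypothetical helper names, and the list covers only the variables the tables above mark as required without a default.

```typescript
// Variables marked required (✅) with no default in the tables above.
// Stripe keys are left out because they are only needed when billing is enabled.
const REQUIRED_VARS: readonly string[] = [
  'NEXT_PUBLIC_SUPABASE_URL',
  'NEXT_PUBLIC_SUPABASE_ANON_KEY',
  'SUPABASE_SERVICE_ROLE_KEY',
  'UPSTASH_REDIS_REST_URL',
  'UPSTASH_REDIS_REST_TOKEN',
  'BROWSER_POOL_URL',
];

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function findMissingVars(env: Env, required: readonly string[] = REQUIRED_VARS): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Hypothetical helper: throws a single error that lists every missing variable.
function assertEnv(env: Env): void {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

In a Node.js process this would typically be called once at boot as `assertEnv(process.env)`.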

Rate Limiting

Orsa uses @upstash/ratelimit for API rate limiting. Limits are enforced per API key.

Default Limits

Plan	Requests/Minute	Requests/Day	Concurrent Crawls
Free	20	1,000	1
Starter	60	10,000	3
Pro	200	100,000	10
Enterprise	Custom	Custom	Custom
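
The table above can be expressed as a typed lookup. This is an illustrative sketch only: the `Plan` type, `PLAN_LIMITS`, and `limitsForPlan` are hypothetical names, and the actual definitions in apps/api may be organized differently.

```typescript
type Plan = 'free' | 'starter' | 'pro' | 'enterprise';

interface PlanLimits {
  requestsPerMinute: number;
  requestsPerDay: number;
  concurrentCrawls: number;
}

// Values copied from the Default Limits table.
// Enterprise limits are custom, so they have no fixed entry here.
const PLAN_LIMITS: Record<Exclude<Plan, 'enterprise'>, PlanLimits> = {
  free: { requestsPerMinute: 20, requestsPerDay: 1_000, concurrentCrawls: 1 },
  starter: { requestsPerMinute: 60, requestsPerDay: 10_000, concurrentCrawls: 3 },
  pro: { requestsPerMinute: 200, requestsPerDay: 100_000, concurrentCrawls: 10 },
};

// Hypothetical helper: resolves the limits for a plan.
// Enterprise callers must supply their negotiated limits explicitly.
function limitsForPlan(plan: Plan, custom?: PlanLimits): PlanLimits {
  if (plan === 'enterprise') {
    if (!custom) {
      throw new Error('Enterprise plans need explicit custom limits');
    }
    return custom;
  }
  return PLAN_LIMITS[plan];
}
```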

Configuration

Rate limits are defined in code (apps/api). To customize for self-hosting, modify the rate limit configuration:

// apps/api/src/middleware/rate-limit.ts (example)
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
 
const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
 
// Sliding window rate limiter
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(60, '1 m'),  // 60 req/min
  analytics: true,
  prefix: 'orsa:ratelimit',
});
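
To make the sliding-window idea concrete, here is a simplified in-memory counter. It is a teaching sketch and not the @upstash/ratelimit implementation: the class name is hypothetical, state lives in a local Map rather than Redis, and the estimate weights the previous fixed window by how much of it still overlaps the sliding window.

```typescript
interface WindowState {
  start: number;    // start time (ms) of the current fixed window
  current: number;  // requests counted in the current window
  previous: number; // requests counted in the window just before it
}

// In-memory illustration only; a real deployment keeps this state in Redis.
class SlidingWindowCounter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request for `key` (e.g. an API key) is allowed at time `now` (ms).
  allow(key: string, now: number): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let state = this.windows.get(key);
    if (!state) {
      state = { start: windowStart, current: 0, previous: 0 };
      this.windows.set(key, state);
    } else if (state.start !== windowStart) {
      // Roll forward. The old count only matters if it belongs to the adjacent window.
      const adjacent = windowStart - state.start === this.windowMs;
      state.previous = adjacent ? state.current : 0;
      state.current = 0;
      state.start = windowStart;
    }
    // Weight the previous window by the fraction of it still inside the sliding window.
    const elapsedFraction = (now - windowStart) / this.windowMs;
    const estimated = state.previous * (1 - elapsedFraction) + state.current;
    if (estimated >= this.limit) {
      return false;
    }
    state.current += 1;
    return true;
  }
}
```

Compared with a fixed window, this smooths out the burst that would otherwise be possible at a window boundary, which is the usual reason to prefer a sliding window for per-key API limits.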

Disabling Rate Limits

For self-hosted instances that don’t need rate limiting:

// Set a limit high enough that requests are effectively never throttled
// (reuses the `redis` client from the snippet above)
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100000, '1 m'),
  prefix: 'orsa:ratelimit',
});

Credit System

Every API request consumes credits. The credit system tracks usage and enforces limits.

Credit Costs by Endpoint

Endpoint	Credits	Notes
`GET /v1/web/scrape/template`	1	Single-page markdown / html / text
`GET /v1/web/scrape/images`	1	Image extraction
`GET /v1/web/scrape/sitemap`	1	Sitemap discovery
`GET /v1/brand/retrieve`	10	Full brand data (Context.dev shape)
`GET /v1/brand/retrieve-by-domain`	10	Alias of `/retrieve` (legacy dict shape)
`GET /v1/brand/retrieve-by-name`	2	Fuzzy name match + candidates
`GET /v1/brand/retrieve-by-email`	10	Lookup by business email
`GET /v1/brand/retrieve-by-ticker`	10	Lookup by stock ticker
`GET /v1/brand/retrieve-by-isin`	10	Lookup by ISIN
`GET /v1/brand/retrieve-simplified`	10	Minimal payload (title, logo, primaryColor, industries)
`GET /v1/brand/screenshot`	5	Inline base64 PNG of homepage
`GET /v1/brand/styleguide`	15	W3C-DTCG design tokens + DESIGN.md
`GET /v1/brand/fonts`	5	Font families + sources
`GET /v1/brand/naics`	5	Industry classification
`GET /v1/brand/transaction-identifier`	10	Match bank descriptors to brands
`POST /v1/brand/ai/query`	20	Natural language extraction
`GET /v1/brand/ai/products`	15	Tool-use enforced product list
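
When planning a batch of calls, it can help to estimate the total credit cost up front. The sketch below copies the costs from the table above; `CREDIT_COSTS`, `creditsFor`, and `estimateCredits` are hypothetical names, not part of the Orsa codebase.

```typescript
// Credit cost per endpoint, copied from the table above.
const CREDIT_COSTS: Record<string, number> = {
  'GET /v1/web/scrape/template': 1,
  'GET /v1/web/scrape/images': 1,
  'GET /v1/web/scrape/sitemap': 1,
  'GET /v1/brand/retrieve': 10,
  'GET /v1/brand/retrieve-by-domain': 10,
  'GET /v1/brand/retrieve-by-name': 2,
  'GET /v1/brand/retrieve-by-email': 10,
  'GET /v1/brand/retrieve-by-ticker': 10,
  'GET /v1/brand/retrieve-by-isin': 10,
  'GET /v1/brand/retrieve-simplified': 10,
  'GET /v1/brand/screenshot': 5,
  'GET /v1/brand/styleguide': 15,
  'GET /v1/brand/fonts': 5,
  'GET /v1/brand/naics': 5,
  'GET /v1/brand/transaction-identifier': 10,
  'POST /v1/brand/ai/query': 20,
  'GET /v1/brand/ai/products': 15,
};

// Hypothetical helper: looks up the cost of one endpoint, e.g. "GET /v1/brand/fonts".
function creditsFor(endpoint: string): number {
  const cost = CREDIT_COSTS[endpoint];
  if (cost === undefined) {
    throw new Error(`Unknown endpoint: ${endpoint}`);
  }
  return cost;
}

// Hypothetical helper: sums the cost of a planned list of calls.
function estimateCredits(endpoints: string[]): number {
  return endpoints.reduce((total, endpoint) => total + creditsFor(endpoint), 0);
}
```

For example, one full brand retrieval, one AI query, and one sitemap lookup come to 10 + 20 + 1 = 31 credits.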

Database Functions

Credits are managed via PostgreSQL functions (defined in supabase/migrations/00001_initial_schema.sql):

deduct_credits(user_id, amount, endpoint, request_id) — Atomic deduction with balance check. Returns (success, remaining_balance).
refund_credits(user_id, amount, reason, request_id) — Refund on failure.
check_balance(user_id) — Read current balance.
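
The deduct-then-refund flow these functions imply can be sketched as a wrapper around a unit of work. Everything here is illustrative: `CreditLedger` is a hypothetical interface standing in for whatever client calls the PostgreSQL functions, and the refund reason string is an assumption.

```typescript
// Hypothetical abstraction over deduct_credits / refund_credits.
interface CreditLedger {
  deduct(userId: string, amount: number, endpoint: string, requestId: string):
    Promise<{ success: boolean; remainingBalance: number }>;
  refund(userId: string, amount: number, reason: string, requestId: string): Promise<void>;
}

interface Charge {
  userId: string;
  amount: number;
  endpoint: string;
  requestId: string;
}

class InsufficientCreditsError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InsufficientCreditsError';
  }
}

// Deducts credits first, runs the work, and refunds if the work throws.
async function withCredits<T>(
  ledger: CreditLedger,
  charge: Charge,
  work: () => Promise<T>,
): Promise<T> {
  const { userId, amount, endpoint, requestId } = charge;
  const { success } = await ledger.deduct(userId, amount, endpoint, requestId);
  if (!success) {
    throw new InsufficientCreditsError(`Insufficient credits for ${endpoint}`);
  }
  try {
    return await work();
  } catch (err) {
    // The request failed after the deduction, so give the credits back.
    await ledger.refund(userId, amount, 'request_failed', requestId);
    throw err;
  }
}
```

Deducting before the work starts, rather than after, relies on the atomic balance check in deduct_credits to stop concurrent requests from overspending the same balance.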

New User Credits

New users who sign up through Supabase Auth automatically receive 100 free credits from the handle_new_user() trigger. To change the amount, edit the migration:

-- supabase/migrations/00001_initial_schema.sql
-- Change the starting credit balance:
INSERT INTO public.credit_balances (user_id, balance)
VALUES (NEW.id, 100);  -- Change 100 to your desired amount

Disabling Credits (Unlimited Usage)

For self-hosted instances that don’t need credit tracking, modify the deduction function to always succeed:

CREATE OR REPLACE FUNCTION deduct_credits(
    p_user_id UUID,
    p_amount INTEGER,
    p_endpoint VARCHAR,
    p_request_id UUID
)
RETURNS TABLE(success BOOLEAN, remaining_balance BIGINT) AS $$
BEGIN
    -- Self-hosted: always succeed, don't deduct
    RETURN QUERY SELECT true, 999999::BIGINT;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

Cache Configuration

Orsa caches brand data and scrape results in Redis to reduce redundant extraction.

Cache TTLs

Data Type	Default TTL	Env Override	Description
Brand data	7 days	`CACHE_TTL_BRAND`	Full brand extraction results
Simplified brand	7 days	`CACHE_TTL_BRAND_SIMPLE`	Simplified brand data
HTML scrape	1 hour	`CACHE_TTL_HTML`	Raw HTML results
Markdown scrape	1 hour	`CACHE_TTL_MARKDOWN`	Markdown conversion results
Screenshot	24 hours	`CACHE_TTL_SCREENSHOT`	Screenshot images
Sitemap	24 hours	`CACHE_TTL_SITEMAP`	Sitemap parse results
NAICS	30 days	`CACHE_TTL_NAICS`	NAICS classification results
Fonts	7 days	`CACHE_TTL_FONTS`	Font detection results

All TTL values are in seconds. Example:

CACHE_TTL_BRAND=604800       # 7 days (default)
CACHE_TTL_HTML=3600          # 1 hour (default)
CACHE_TTL_SCREENSHOT=86400   # 24 hours (default)
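
One way to resolve a TTL from the environment with a fallback to the documented default is sketched below. The helper name `cacheTtlSeconds` is hypothetical, and falling back to the default on an invalid value is an assumption; Orsa's actual parsing may differ.

```typescript
// Default TTLs in seconds, matching the table above.
const DEFAULT_TTLS = {
  CACHE_TTL_BRAND: 604_800,        // 7 days
  CACHE_TTL_BRAND_SIMPLE: 604_800, // 7 days
  CACHE_TTL_HTML: 3_600,           // 1 hour
  CACHE_TTL_MARKDOWN: 3_600,       // 1 hour
  CACHE_TTL_SCREENSHOT: 86_400,    // 24 hours
  CACHE_TTL_SITEMAP: 86_400,       // 24 hours
  CACHE_TTL_NAICS: 2_592_000,      // 30 days
  CACHE_TTL_FONTS: 604_800,        // 7 days
} as const;

type TtlName = keyof typeof DEFAULT_TTLS;

// Hypothetical helper: reads a TTL override, falling back to the default when the
// variable is unset, blank, or not a non-negative integer (assumed behavior).
function cacheTtlSeconds(name: TtlName, env: Record<string, string | undefined>): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_TTLS[name];
  }
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    return DEFAULT_TTLS[name];
  }
  return parsed;
}
```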

Cache Key Format

orsa:cache:{endpoint}:{hash(params)}

Example: orsa:cache:brand:retrieve:a1b2c3d4 where the hash is derived from the normalized domain.
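
A key in this format could be built as follows. This is a sketch under stated assumptions: the document does not say which hash function or normalization rules Orsa uses, so the FNV-1a hash and the `normalizeDomain` rules below are stand-ins chosen only to produce an 8-character hex suffix like the one in the example.

```typescript
// Stand-in hash: 32-bit FNV-1a rendered as 8 hex characters.
// It works on UTF-16 code units, which is equivalent to bytes for ASCII domains.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Assumed normalization: lowercase, then strip scheme, "www." prefix, path, and trailing dot.
function normalizeDomain(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .replace(/^[a-z]+:\/\//, '')
    .replace(/^www\./, '')
    .replace(/[\/?#].*$/, '')
    .replace(/\.$/, '');
}

// Builds a key of the form orsa:cache:{endpoint}:{hash}.
function cacheKey(endpoint: string, domain: string): string {
  return `orsa:cache:${endpoint}:${fnv1a(normalizeDomain(domain))}`;
}
```

Normalizing before hashing means that `https://www.Stripe.com/pricing` and `stripe.com` map to the same key, so equivalent requests share one cache entry.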

Cache Bypass

Clients can bypass the cache by passing cache=false as a query parameter:

curl "https://api.orsa.dev/v1/brand/retrieve?domain=stripe.com&cache=false" \
  -H "Authorization: Bearer YOUR_KEY"

With cache=false, the request skips the cache read and fetches fresh data, but the fresh result is still written to the cache for later requests.
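
The bypass behavior amounts to a read-through cache whose read step is optional. The sketch below illustrates this with a hypothetical `CacheStore` interface and `cachedFetch` helper; it also assumes that a TTL of 0 means "do not store".

```typescript
// Hypothetical minimal cache interface; a real store would talk to Redis.
interface CacheStore<T> {
  get(key: string): T | undefined;
  set(key: string, value: T, ttlSeconds: number): void;
}

// Read-through cache with an optional read step.
// useCache=false mirrors cache=false: skip the read, still write the fresh value.
async function cachedFetch<T>(
  store: CacheStore<T>,
  key: string,
  ttlSeconds: number,
  useCache: boolean,
  fetchFresh: () => Promise<T>,
): Promise<T> {
  if (useCache) {
    const hit = store.get(key);
    if (hit !== undefined) {
      return hit;
    }
  }
  const fresh = await fetchFresh();
  // Assumption: a TTL of 0 disables caching, so nothing is stored.
  if (ttlSeconds > 0) {
    store.set(key, fresh, ttlSeconds);
  }
  return fresh;
}
```

Because the bypassed request still writes, a client that forces a refresh also refreshes the entry for every other client reading the same key.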

Disabling Cache

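Set every CACHE_TTL_* variable to 0 to disable caching entirely (not recommended for production). The example shows three of them; set the remaining variables from the table above the same way: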

CACHE_TTL_BRAND=0
CACHE_TTL_HTML=0
CACHE_TTL_SCREENSHOT=0

Proxy Escalation

When a request fails or gets blocked, Orsa automatically escalates through proxy tiers.

Escalation Order

1. No proxy (direct request)
   ↓ on failure (e.g. 403, 429, timeout)
2. Datacenter proxy (PROXY_DATACENTER_URL)
   ↓ on failure
3. Residential proxy (PROXY_RESIDENTIAL_URL)
   ↓ on failure
4. ISP proxy (PROXY_ISP_URL)
   ↓ on failure
5. Return error to client
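
The order above can be derived from whichever proxy URLs are configured. In this sketch, `ProxyTier` and `proxyTiers` are hypothetical names, and skipping a tier whose URL is unset is an assumption consistent with direct-only behavior when no proxy URLs are set.

```typescript
interface ProxyTier {
  name: string;
  proxyUrl: string | null; // null means a direct request
}

// Builds the escalation order: direct first, then each configured proxy tier.
function proxyTiers(env: Record<string, string | undefined>): ProxyTier[] {
  const candidates: Array<[string, string | undefined]> = [
    ['datacenter', env.PROXY_DATACENTER_URL],
    ['residential', env.PROXY_RESIDENTIAL_URL],
    ['isp', env.PROXY_ISP_URL],
  ];
  const tiers: ProxyTier[] = [{ name: 'direct', proxyUrl: null }];
  for (const [name, url] of candidates) {
    // Assumption: an unset tier is skipped rather than treated as a failure.
    if (url) {
      tiers.push({ name, proxyUrl: url });
    }
  }
  return tiers;
}
```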

Configuration

# Enable/disable proxy escalation
PROXY_ESCALATION_ENABLED=true
 
# Skip direct request, always start with datacenter
PROXY_SKIP_DIRECT=false
 
# Maximum retries per tier before escalating
PROXY_MAX_RETRIES_PER_TIER=2
 
# Timeout per request (ms) before considering it failed
PROXY_REQUEST_TIMEOUT=15000
 
# HTTP status codes that trigger escalation
PROXY_ESCALATION_STATUS_CODES=403,429,503,520,521,522,523,524
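
How these settings interact can be sketched as a retry loop over tiers. This is not Orsa's implementation: `fetchWithEscalation`, `parseStatusCodes`, and the `Attempt` callback are hypothetical, and the sketch assumes PROXY_MAX_RETRIES_PER_TIER counts total attempts per tier and that a timeout surfaces as a rejected promise.

```typescript
interface EscalationConfig {
  maxRetriesPerTier: number;       // from PROXY_MAX_RETRIES_PER_TIER
  escalationStatusCodes: number[]; // from PROXY_ESCALATION_STATUS_CODES
}

// Hypothetical callback: performs one request through the given tier.
// It should reject on timeout or network error.
type Attempt = (tier: string) => Promise<{ status: number }>;

// Parses a comma-separated list such as "403,429,503" into valid HTTP status codes.
function parseStatusCodes(raw: string): number[] {
  return raw
    .split(',')
    .map((part) => Number(part.trim()))
    .filter((code) => Number.isInteger(code) && code >= 100 && code <= 599);
}

// Tries each tier in order, retrying within a tier before escalating to the next.
async function fetchWithEscalation(
  tiers: string[],
  attempt: Attempt,
  config: EscalationConfig,
): Promise<{ tier: string; status: number }> {
  for (const tier of tiers) {
    for (let i = 0; i < config.maxRetriesPerTier; i++) {
      try {
        const response = await attempt(tier);
        if (!config.escalationStatusCodes.includes(response.status)) {
          return { tier, status: response.status };
        }
        // Blocked status: retry this tier or fall through to the next one.
      } catch {
        // Timeout or network error: counts as a failed attempt.
      }
    }
  }
  throw new Error('All proxy tiers failed');
}
```

With the defaults shown above, a URL that is blocked on every tier costs up to 2 attempts on each of 4 tiers, so 8 requests, before the error is returned to the client.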

Disabling Proxies

If you don’t need proxies (e.g., scraping only your own domains):

PROXY_ESCALATION_ENABLED=false

Or simply don’t set any PROXY_*_URL variables — Orsa will make direct requests only.

Full .env.example

# ─── Supabase (Required) ────────────────────────────────────
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
 
# ─── Upstash Redis (Required) ───────────────────────────────
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
 
# ─── Stripe (Required for Billing) ──────────────────────────
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
 
# ─── LLM Providers ──────────────────────────────────────────
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_BASE_URL=http://localhost:11434/v1  # For Ollama/local
# LLM_DEFAULT_MODEL=gpt-4o
# LLM_EXTRACTION_MODEL=gpt-4o
# LLM_CLASSIFICATION_MODEL=gpt-4o-mini
# LLM_QUERY_MODEL=gpt-4o
 
# ─── Browser Pool ───────────────────────────────────────────
BROWSER_POOL_URL=http://localhost:3002
FLY_API_TOKEN=your-fly-token
# BROWSER_POOL_SIZE=3
# MAX_CONCURRENT_PAGES=10
# PAGE_TIMEOUT=30000
 
# ─── Trigger.dev ─────────────────────────────────────────────
TRIGGER_SECRET_KEY=tr_dev_...
TRIGGER_API_URL=https://api.trigger.dev
 
# ─── Cloudflare R2 ──────────────────────────────────────────
CLOUDFLARE_R2_ACCESS_KEY=your-access-key
CLOUDFLARE_R2_SECRET_KEY=your-secret-key
CLOUDFLARE_R2_ENDPOINT=https://your-account.r2.cloudflarestorage.com
CLOUDFLARE_R2_BUCKET=orsa-assets
# STORAGE_PROVIDER=r2  # r2, supabase, or s3
 
# ─── Proxy Providers ────────────────────────────────────────
PROXY_DATACENTER_URL=http://user:pass@dc-proxy:port
PROXY_RESIDENTIAL_URL=http://user:pass@res-proxy:port
PROXY_ISP_URL=http://user:pass@isp-proxy:port
# PROXY_ESCALATION_ENABLED=true
# PROXY_SKIP_DIRECT=false
# PROXY_MAX_RETRIES_PER_TIER=2
# PROXY_REQUEST_TIMEOUT=15000
# PROXY_ESCALATION_STATUS_CODES=403,429,503,520,521,522,523,524
 
# ─── Email ───────────────────────────────────────────────────
# RESEND_API_KEY=re_...
 
# ─── Application ─────────────────────────────────────────────
NEXT_PUBLIC_APP_URL=http://localhost:3000
NEXT_PUBLIC_API_URL=http://localhost:3001
API_URL=http://localhost:3001
# NODE_ENV=development
# LOG_LEVEL=info
 
# ─── Cache TTLs (seconds) ───────────────────────────────────
# CACHE_TTL_BRAND=604800
# CACHE_TTL_BRAND_SIMPLE=604800
# CACHE_TTL_HTML=3600
# CACHE_TTL_MARKDOWN=3600
# CACHE_TTL_SCREENSHOT=86400
# CACHE_TTL_SITEMAP=86400
# CACHE_TTL_NAICS=2592000
# CACHE_TTL_FONTS=604800
