Configuration
Complete reference for environment variables, rate limiting, the credit system, cache TTLs, and proxy escalation.
Environment Variables
All variables are set in .env.local (development) or your deployment platform's environment settings (production).
Supabase (Required)
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| NEXT_PUBLIC_SUPABASE_URL | string | ✅ | — | Supabase project URL. Example: https://xxxxx.supabase.co |
| NEXT_PUBLIC_SUPABASE_ANON_KEY | string | ✅ | — | Supabase anonymous/public key. Safe to expose to the browser. |
| SUPABASE_SERVICE_ROLE_KEY | string | ✅ | — | Supabase service role key. Server-side only. Never expose to the client. Bypasses RLS. |
Redis / Upstash (Required)
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| UPSTASH_REDIS_REST_URL | string | ✅ | — | Upstash Redis REST API URL. Example: https://us1-xxxxx.upstash.io |
| UPSTASH_REDIS_REST_TOKEN | string | ✅ | — | Upstash Redis REST API token. |
Stripe (Required for Billing)
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| STRIPE_SECRET_KEY | string | ✅ | — | Stripe secret key. sk_test_ for dev, sk_live_ for production. |
| STRIPE_WEBHOOK_SECRET | string | ✅ | — | Stripe webhook signing secret. whsec_... |
| NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY | string | ✅ | — | Stripe publishable key. pk_test_ or pk_live_. Safe for client-side. |
LLM Providers
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| OPENAI_API_KEY | string | ⚠️ | — | OpenAI API key. Required for AI extraction features. |
| OPENAI_BASE_URL | string | ❌ | https://api.openai.com/v1 | OpenAI-compatible base URL. Set for Ollama, vLLM, LiteLLM, etc. |
| ANTHROPIC_API_KEY | string | ❌ | — | Anthropic API key. Used as fallback or for specific tasks. |
| LLM_DEFAULT_MODEL | string | ❌ | gpt-4o | Default model for all AI tasks. |
| LLM_EXTRACTION_MODEL | string | ❌ | Value of LLM_DEFAULT_MODEL | Model for structured data extraction. |
| LLM_CLASSIFICATION_MODEL | string | ❌ | Value of LLM_DEFAULT_MODEL | Model for NAICS classification. |
| LLM_QUERY_MODEL | string | ❌ | Value of LLM_DEFAULT_MODEL | Model for free-form AI queries. |
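The fallback chain in the table (task-specific variable, then LLM_DEFAULT_MODEL, then gpt-4o) can be sketched as follows. This is a minimal illustration, not Orsa's actual code: `resolveModel`, `LlmTask`, and `TASK_VARIABLE` are hypothetical names, and the environment is passed in as a plain object so the snippet stays self-contained.

```typescript
// Hypothetical helper: names here are illustrative, not taken from the Orsa codebase.
type LlmTask = 'extraction' | 'classification' | 'query';
type Env = Record<string, string | undefined>;

// Task-specific override variable for each kind of AI task (from the table above).
const TASK_VARIABLE: Record<LlmTask, string> = {
  extraction: 'LLM_EXTRACTION_MODEL',
  classification: 'LLM_CLASSIFICATION_MODEL',
  query: 'LLM_QUERY_MODEL',
};

// Task override first, then LLM_DEFAULT_MODEL, then the documented default.
// Empty strings are treated as unset (an assumption).
function resolveModel(task: LlmTask, env: Env): string {
  return env[TASK_VARIABLE[task]] || env['LLM_DEFAULT_MODEL'] || 'gpt-4o';
}
```

In a Node.js process the second argument would typically be `process.env`.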
Browser Pool
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| BROWSER_POOL_URL | string | ✅ | — | URL of the browser worker. Example: https://orsa-browser-pool.fly.dev or http://localhost:3002 |
| FLY_API_TOKEN | string | ❌ | — | Fly.io API token. Only needed if deploying browser worker to Fly.io. |
| BROWSER_POOL_SIZE | number | ❌ | 3 | Number of concurrent Chromium instances per worker. |
| MAX_CONCURRENT_PAGES | number | ❌ | 10 | Maximum open pages across all browser instances. |
| PAGE_TIMEOUT | number | ❌ | 30000 | Milliseconds before a page load times out. |
Trigger.dev
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| TRIGGER_SECRET_KEY | string | ⚠️ | — | Trigger.dev project secret key. Required for crawl and batch jobs. |
| TRIGGER_API_URL | string | ❌ | https://api.trigger.dev | Trigger.dev API URL. Set for self-hosted Trigger.dev. |
Cloudflare R2 / Storage
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| CLOUDFLARE_R2_ACCESS_KEY | string | ⚠️ | — | Cloudflare R2 access key ID. |
| CLOUDFLARE_R2_SECRET_KEY | string | ⚠️ | — | Cloudflare R2 secret access key. |
| CLOUDFLARE_R2_ENDPOINT | string | ⚠️ | — | R2 S3-compatible endpoint. Example: https://xxxxx.r2.cloudflarestorage.com |
| CLOUDFLARE_R2_BUCKET | string | ⚠️ | orsa-assets | R2 bucket name for stored assets. |
| STORAGE_PROVIDER | string | ❌ | r2 | Storage backend: r2, supabase, or s3. |
Proxy Providers
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| PROXY_DATACENTER_URL | string | ❌ | — | Datacenter proxy URL. Format: http://user:pass@host:port |
| PROXY_RESIDENTIAL_URL | string | ❌ | — | Residential proxy URL. Format: http://user:pass@host:port |
| PROXY_ISP_URL | string | ❌ | — | ISP proxy URL. Format: http://user:pass@host:port |
Email
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| RESEND_API_KEY | string | ❌ | — | Resend API key for transactional email. |
Application
| Variable | Type | Required | Default | Description |
|---|---|---|---|---|
| NEXT_PUBLIC_APP_URL | string | ✅ | http://localhost:3000 | Public URL of the web dashboard. Used for OAuth callbacks, email links. |
| NEXT_PUBLIC_API_URL | string | ✅ | http://localhost:3001 (dev); https://api.orsa.dev (prod) | Public API origin for browser requests from apps/web when the API is on a subdomain (e.g. api.orsa.dev). No trailing slash. |
| API_URL | string | ✅ | http://localhost:3001 | Internal API URL. Used for server-to-server communication, scripts, and load tests. |
| ORSA_BASE_URL | string | ❌ | — | MCP server (@orsa-dev/mcp-server): optional API origin; unset uses https://api.orsa.dev. |
| NODE_ENV | string | ❌ | development | development, production, or test. |
| LOG_LEVEL | string | ❌ | info | Logging level: debug, info, warn, error. |
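A startup check can catch missing variables early. The sketch below is one possible approach and is not part of Orsa: `missingVariables` and the two lists are hypothetical. It covers the variables marked ✅ that have no documented default, and it treats the Stripe variables as required only when billing is enabled (an assumption based on the "Required for Billing" heading).

```typescript
// Hypothetical startup check; the function and list names are illustrative.
type Env = Record<string, string | undefined>;

// Marked as required in the tables above, with no documented default.
const ALWAYS_REQUIRED: string[] = [
  'NEXT_PUBLIC_SUPABASE_URL',
  'NEXT_PUBLIC_SUPABASE_ANON_KEY',
  'SUPABASE_SERVICE_ROLE_KEY',
  'UPSTASH_REDIS_REST_URL',
  'UPSTASH_REDIS_REST_TOKEN',
  'BROWSER_POOL_URL',
];

// Assumed to matter only when billing is enabled.
const BILLING_REQUIRED: string[] = [
  'STRIPE_SECRET_KEY',
  'STRIPE_WEBHOOK_SECRET',
  'NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY',
];

// Returns the names of required variables that are unset or blank.
function missingVariables(env: Env, billingEnabled: boolean): string[] {
  const names = billingEnabled
    ? ALWAYS_REQUIRED.concat(BILLING_REQUIRED)
    : ALWAYS_REQUIRED;
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}
```

A caller could pass `process.env` and abort startup when the returned list is non-empty.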
Rate Limiting
Orsa uses @upstash/ratelimit for API rate limiting. Limits are enforced per API key.
Default Limits
| Plan | Requests/Minute | Requests/Day | Concurrent Crawls |
|---|---|---|---|
| Free | 20 | 1,000 | 1 |
| Starter | 60 | 10,000 | 3 |
| Pro | 200 | 100,000 | 10 |
| Enterprise | Custom | Custom | Custom |
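To make the per-key limits concrete, the sketch below encodes the table as data and applies the per-minute limit with a simplified in-memory sliding-window log. It is only an illustration: `PLAN_LIMITS` and `InMemorySlidingWindow` are hypothetical names, and the windowing logic is not how @upstash/ratelimit works internally (that library stores its state in Redis).

```typescript
// Illustrative only: a single-process stand-in for a Redis-backed limiter.
type Plan = 'free' | 'starter' | 'pro';

interface PlanLimits {
  requestsPerMinute: number;
  requestsPerDay: number;
  concurrentCrawls: number;
}

// Values copied from the Default Limits table (Enterprise is custom, so omitted).
const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { requestsPerMinute: 20, requestsPerDay: 1000, concurrentCrawls: 1 },
  starter: { requestsPerMinute: 60, requestsPerDay: 10000, concurrentCrawls: 3 },
  pro: { requestsPerMinute: 200, requestsPerDay: 100000, concurrentCrawls: 10 },
};

class InMemorySlidingWindow {
  // Timestamps (ms) of accepted requests, tracked separately per API key.
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Accepts the request if fewer than `limit` requests fall inside the window.
  allow(apiKey: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(apiKey) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(apiKey, recent);
    return allowed;
  }
}
```

For example, `new InMemorySlidingWindow(PLAN_LIMITS.free.requestsPerMinute, 60000)` accepts 20 requests per key in any 60-second window and rejects the 21st.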
Configuration
Rate limits are defined in code (apps/api). To customize for self-hosting, modify the rate limit configuration:
// apps/api/src/middleware/rate-limit.ts (example)
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
// Sliding window rate limiter
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(60, '1 m'), // 60 req/min
  analytics: true,
  prefix: 'orsa:ratelimit',
});
Disabling Rate Limits
For self-hosted instances that don't need rate limiting:
// Set a very high limit or bypass entirely
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100000, '1 m'),
  prefix: 'orsa:ratelimit',
});
Credit System
API requests consume credits, and the cost depends on the endpoint. The credit system tracks usage and enforces limits.
Credit Costs by Endpoint
| Endpoint | Credits | Notes |
|---|---|---|
| GET /v1/web/scrape/* | 1 | HTML, Markdown, images, sitemap |
| POST /v1/web/crawl | 1/page | Charged per page crawled |
| GET /v1/brand/retrieve | 10 | Full brand data |
| GET /v1/brand/retrieve-by-name | 10 | Brand lookup by name |
| GET /v1/brand/retrieve-by-email | 10 | Brand lookup by email |
| GET /v1/brand/retrieve-by-ticker | 10 | Brand lookup by ticker |
| GET /v1/brand/retrieve-simplified | 5 | Simplified brand data |
| GET /v1/brand/screenshot | 10 | Website screenshot |
| GET /v1/brand/styleguide | 10 | Design system extraction |
| GET /v1/brand/fonts | 5 | Font detection |
| GET /v1/brand/naics | 5 | NAICS classification |
| GET /v1/brand/transaction-identifier | 10 | Merchant identification |
| POST /v1/brand/ai/query | 10 | AI-powered extraction |
| GET /v1/brand/ai/products | 10 | Product extraction |
| GET /v1/brand/ai/product | 10 | Single product extraction |
| POST /v1/brand/prefetch | 0 | Free (warms the cache) |
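For budgeting, the table can be expressed as a lookup. The sketch below is illustrative rather than Orsa's implementation: `creditCost` and `FIXED_COSTS` are hypothetical names, and the scrape sub-path used in the usage note is only an example of what the `*` wildcard might match.

```typescript
// Fixed per-request costs copied from the table above.
const FIXED_COSTS: Record<string, number> = {
  'GET /v1/brand/retrieve': 10,
  'GET /v1/brand/retrieve-by-name': 10,
  'GET /v1/brand/retrieve-by-email': 10,
  'GET /v1/brand/retrieve-by-ticker': 10,
  'GET /v1/brand/retrieve-simplified': 5,
  'GET /v1/brand/screenshot': 10,
  'GET /v1/brand/styleguide': 10,
  'GET /v1/brand/fonts': 5,
  'GET /v1/brand/naics': 5,
  'GET /v1/brand/transaction-identifier': 10,
  'POST /v1/brand/ai/query': 10,
  'GET /v1/brand/ai/products': 10,
  'GET /v1/brand/ai/product': 10,
  'POST /v1/brand/prefetch': 0,
};

// Returns the credits for one request, or undefined for an endpoint not in the table.
// `pagesCrawled` only matters for POST /v1/web/crawl, which is charged per page.
function creditCost(
  method: string,
  path: string,
  pagesCrawled: number = 1,
): number | undefined {
  if (method === 'GET' && path.indexOf('/v1/web/scrape/') === 0) return 1;
  if (method === 'POST' && path === '/v1/web/crawl') return pagesCrawled;
  return FIXED_COSTS[`${method} ${path}`];
}
```

Under these assumptions, `creditCost('POST', '/v1/web/crawl', 25)` evaluates to 25, and a hypothetical `creditCost('GET', '/v1/web/scrape/markdown')` evaluates to 1.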
Database Functions
Credits are managed via PostgreSQL functions (defined in supabase/migrations/00001_initial_schema.sql):
- deduct_credits(user_id, amount, endpoint, request_id): Atomic deduction with balance check. Returns (success, remaining_balance).
- refund_credits(user_id, amount, reason, request_id): Refund on failure.
- check_balance(user_id): Read current balance.
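The deduct-then-refund pattern these functions support can be sketched with an in-memory stand-in. Everything below is hypothetical: `InMemoryLedger` only mimics the documented behavior of the three database functions, `chargeFor` is an illustrative wrapper, and real calls would go to PostgreSQL asynchronously.

```typescript
interface DeductResult {
  success: boolean;
  remainingBalance: number;
}

// In-memory stand-in for deduct_credits / refund_credits / check_balance.
class InMemoryLedger {
  private readonly balances = new Map<string, number>();

  setBalance(userId: string, balance: number): void {
    this.balances.set(userId, balance);
  }

  // Deducts only when the balance covers the amount, like the documented balance check.
  deductCredits(userId: string, amount: number): DeductResult {
    const balance = this.balances.get(userId) ?? 0;
    if (balance < amount) return { success: false, remainingBalance: balance };
    this.balances.set(userId, balance - amount);
    return { success: true, remainingBalance: balance - amount };
  }

  refundCredits(userId: string, amount: number): void {
    this.balances.set(userId, (this.balances.get(userId) ?? 0) + amount);
  }

  checkBalance(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }
}

// Charges up front, runs the work, and refunds if the work throws.
// Returns null when the balance is insufficient (an illustrative convention).
function chargeFor<T>(
  ledger: InMemoryLedger,
  userId: string,
  amount: number,
  work: () => T,
): T | null {
  if (!ledger.deductCredits(userId, amount).success) return null;
  try {
    return work();
  } catch (error) {
    ledger.refundCredits(userId, amount);
    throw error;
  }
}
```

Charging before the work runs means a failed request leaves the balance unchanged once the refund is applied.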
New User Credits
Users who sign up through Supabase Auth automatically receive 100 free credits from the handle_new_user() trigger. To change the amount, edit the migration:
-- supabase/migrations/00001_initial_schema.sql
-- Change the starting credit balance:
INSERT INTO public.credit_balances (user_id, balance)
VALUES (NEW.id, 100); -- Change 100 to your desired amount
Disabling Credits (Unlimited Usage)
For self-hosted instances that don't need credit tracking, modify the deduction function to always succeed:
CREATE OR REPLACE FUNCTION deduct_credits(
  p_user_id UUID,
  p_amount INTEGER,
  p_endpoint VARCHAR,
  p_request_id UUID
)
RETURNS TABLE(success BOOLEAN, remaining_balance BIGINT) AS $$
BEGIN
  -- Self-hosted: always succeed, don't deduct
  RETURN QUERY SELECT true, 999999::BIGINT;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Cache Configuration
Orsa caches brand data and scrape results in Redis to reduce redundant extraction.
Cache TTLs
| Data Type | Default TTL | Env Override | Description |
|---|---|---|---|
| Brand data | 7 days | CACHE_TTL_BRAND | Full brand extraction results |
| Simplified brand | 7 days | CACHE_TTL_BRAND_SIMPLE | Simplified brand data |
| HTML scrape | 1 hour | CACHE_TTL_HTML | Raw HTML results |
| Markdown scrape | 1 hour | CACHE_TTL_MARKDOWN | Markdown conversion results |
| Screenshot | 24 hours | CACHE_TTL_SCREENSHOT | Screenshot images |
| Sitemap | 24 hours | CACHE_TTL_SITEMAP | Sitemap parse results |
| NAICS | 30 days | CACHE_TTL_NAICS | NAICS classification results |
| Fonts | 7 days | CACHE_TTL_FONTS | Font detection results |
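The defaults and overrides in the table could be resolved along these lines. This is a sketch under assumptions: `ttlSeconds` and `DEFAULT_TTL_SECONDS` are hypothetical names, and falling back to the default when an override is not a non-negative integer is a choice made here, not documented Orsa behavior.

```typescript
type Env = Record<string, string | undefined>;

// Defaults from the table above, converted to seconds.
const DEFAULT_TTL_SECONDS = {
  CACHE_TTL_BRAND: 7 * 24 * 60 * 60,
  CACHE_TTL_BRAND_SIMPLE: 7 * 24 * 60 * 60,
  CACHE_TTL_HTML: 60 * 60,
  CACHE_TTL_MARKDOWN: 60 * 60,
  CACHE_TTL_SCREENSHOT: 24 * 60 * 60,
  CACHE_TTL_SITEMAP: 24 * 60 * 60,
  CACHE_TTL_NAICS: 30 * 24 * 60 * 60,
  CACHE_TTL_FONTS: 7 * 24 * 60 * 60,
};

type TtlVariable = keyof typeof DEFAULT_TTL_SECONDS;

// Uses the override when it is a non-negative integer, otherwise the default.
function ttlSeconds(name: TtlVariable, env: Env): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return DEFAULT_TTL_SECONDS[name];
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0
    ? parsed
    : DEFAULT_TTL_SECONDS[name];
}
```

An explicit `0` is passed through unchanged rather than replaced by the default.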
All TTL values are in seconds. Example:
CACHE_TTL_BRAND=604800 # 7 days (default)
CACHE_TTL_HTML=3600 # 1 hour (default)
CACHE_TTL_SCREENSHOT=86400 # 24 hours (default)
Cache Key Format
orsa:cache:{endpoint}:{hash(params)}
Example: orsa:cache:brand:retrieve:a1b2c3d4, where the hash is derived from the normalized domain.
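A key in this format could be built as sketched below. The hash function and normalization rules are assumptions, since the source does not specify them: a 32-bit FNV-1a hash stands in for whatever Orsa uses, and `normalizeDomain`, `fnv1aHex`, and `cacheKey` are hypothetical names.

```typescript
// Assumed normalization: lowercase, no scheme, no leading "www.", no path or query.
function normalizeDomain(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, '') // drop the scheme
    .replace(/^www\./, '')
    .replace(/[/?#].*$/, ''); // drop path, query, and fragment
}

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex characters.
// A stand-in only; the real hash function is not documented here.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ('0000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Builds orsa:cache:{endpoint}:{hash} from an endpoint name and a domain.
function cacheKey(endpoint: string, domain: string): string {
  return `orsa:cache:${endpoint}:${fnv1aHex(normalizeDomain(domain))}`;
}
```

With this normalization, `cacheKey('brand:retrieve', 'https://www.Stripe.com/')` and `cacheKey('brand:retrieve', 'stripe.com')` produce the same key, so equivalent requests share one cache entry.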
Cache Bypass
Clients can bypass the cache by passing cache=false as a query parameter:
curl "https://api.orsa.dev/v1/brand/retrieve?domain=stripe.com&cache=false" \
  -H "Authorization: Bearer YOUR_KEY"
With cache=false, the fresh result is still written to the cache; only the cache read is skipped.
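The bypass semantics (skip the read, keep the write) can be illustrated with a small in-memory cache. This is a sketch, not Orsa's Redis-backed implementation: `BypassableCache` and `getOrCompute` are hypothetical names, and TTL handling is left out.

```typescript
// Minimal in-memory model of "cache=false": skip the read, still do the write.
class BypassableCache<V> {
  private readonly store = new Map<string, V>();

  getOrCompute(key: string, useCache: boolean, compute: () => V): V {
    // Serve from the cache only when the caller has not asked to bypass it.
    if (useCache && this.store.has(key)) {
      return this.store.get(key) as V;
    }
    const value = compute();
    // Written even when the read was bypassed, so later requests see fresh data.
    this.store.set(key, value);
    return value;
  }
}
```

A request with `cache=false` therefore refreshes the entry that subsequent cached requests receive.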
Disabling Cache
Set all TTLs to 0 to disable caching entirely (not recommended for production):
CACHE_TTL_BRAND=0
CACHE_TTL_HTML=0
CACHE_TTL_SCREENSHOT=0
Proxy Escalation
When a request fails or gets blocked, Orsa automatically escalates through proxy tiers.
Escalation Order
1. No proxy (direct request)
↓ on failure (403, 429, timeout)
2. Datacenter proxy (PROXY_DATACENTER_URL)
↓ on failure
3. Residential proxy (PROXY_RESIDENTIAL_URL)
↓ on failure
4. ISP proxy (PROXY_ISP_URL)
↓ on failure
5. Return error to client
Configuration
# Enable/disable proxy escalation
PROXY_ESCALATION_ENABLED=true
# Skip direct request, always start with datacenter
PROXY_SKIP_DIRECT=false
# Maximum retries per tier before escalating
PROXY_MAX_RETRIES_PER_TIER=2
# Timeout per request (ms) before considering it failed
PROXY_REQUEST_TIMEOUT=15000
# HTTP status codes that trigger escalation
PROXY_ESCALATION_STATUS_CODES=403,429,503,520,521,522,523,524
Disabling Proxies
If you don't need proxies (e.g., scraping only your own domains):
PROXY_ESCALATION_ENABLED=false
Or simply leave every PROXY_*_URL variable unset; Orsa will then make direct requests only.
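The escalation order and settings described in this section can be sketched as a loop over tiers. This is an illustration under assumptions, not Orsa's implementation: `buildTiers` and `requestWithEscalation` are hypothetical, the per-tier setting is interpreted as the number of attempts per tier, a status outside the escalation list is returned immediately, and the `send` callback is synchronous to keep the example short (a real request would be asynchronous).

```typescript
type Env = Record<string, string | undefined>;
type Outcome = number | 'timeout'; // an HTTP status, or a timed-out request

interface Tier {
  name: string;
  proxyUrl?: string; // undefined means a direct request
}

interface EscalationResult {
  outcome: Outcome;
  tier: string;
  attempts: number;
}

// Builds the tier list in escalation order, skipping proxies that are not configured.
function buildTiers(env: Env): Tier[] {
  const direct: Tier = { name: 'direct' };
  if (env['PROXY_ESCALATION_ENABLED'] === 'false') return [direct];

  const tiers: Tier[] = env['PROXY_SKIP_DIRECT'] === 'true' ? [] : [direct];
  const proxies: Array<[string, string]> = [
    ['datacenter', 'PROXY_DATACENTER_URL'],
    ['residential', 'PROXY_RESIDENTIAL_URL'],
    ['isp', 'PROXY_ISP_URL'],
  ];
  for (const [name, variable] of proxies) {
    const proxyUrl = env[variable];
    if (proxyUrl) tiers.push({ name, proxyUrl });
  }
  return tiers;
}

// Tries each tier up to `attemptsPerTier` times and moves on while requests are blocked.
// Returns the last result when every tier is exhausted, or null if there are no tiers.
function requestWithEscalation(
  tiers: Tier[],
  attemptsPerTier: number,
  escalationCodes: number[],
  send: (tier: Tier) => Outcome,
): EscalationResult | null {
  let last: EscalationResult | null = null;
  let attempts = 0;
  for (const tier of tiers) {
    for (let i = 0; i < attemptsPerTier; i++) {
      attempts += 1;
      const outcome = send(tier);
      last = { outcome, tier: tier.name, attempts };
      const blocked =
        outcome === 'timeout' || escalationCodes.indexOf(outcome) !== -1;
      if (!blocked) return last;
    }
  }
  return last;
}
```

The escalation codes can be parsed from the environment with something like `value.split(',').map(Number)`.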
Full .env.example
# ─── Supabase (Required) ────────────────────────────────────
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
# ─── Upstash Redis (Required) ───────────────────────────────
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
# ─── Stripe (Required for Billing) ──────────────────────────
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
# ─── LLM Providers ──────────────────────────────────────────
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_BASE_URL=http://localhost:11434/v1 # For Ollama/local
# LLM_DEFAULT_MODEL=gpt-4o
# LLM_EXTRACTION_MODEL=gpt-4o
# LLM_CLASSIFICATION_MODEL=gpt-4o-mini
# LLM_QUERY_MODEL=gpt-4o
# ─── Browser Pool ───────────────────────────────────────────
BROWSER_POOL_URL=http://localhost:3002
FLY_API_TOKEN=your-fly-token
# BROWSER_POOL_SIZE=3
# MAX_CONCURRENT_PAGES=10
# PAGE_TIMEOUT=30000
# ─── Trigger.dev ─────────────────────────────────────────────
TRIGGER_SECRET_KEY=tr_dev_...
TRIGGER_API_URL=https://api.trigger.dev
# ─── Cloudflare R2 ──────────────────────────────────────────
CLOUDFLARE_R2_ACCESS_KEY=your-access-key
CLOUDFLARE_R2_SECRET_KEY=your-secret-key
CLOUDFLARE_R2_ENDPOINT=https://your-account.r2.cloudflarestorage.com
CLOUDFLARE_R2_BUCKET=orsa-assets
# STORAGE_PROVIDER=r2 # r2, supabase, or s3
# ─── Proxy Providers ────────────────────────────────────────
PROXY_DATACENTER_URL=http://user:pass@dc-proxy:port
PROXY_RESIDENTIAL_URL=http://user:pass@res-proxy:port
PROXY_ISP_URL=http://user:pass@isp-proxy:port
# PROXY_ESCALATION_ENABLED=true
# PROXY_SKIP_DIRECT=false
# PROXY_MAX_RETRIES_PER_TIER=2
# PROXY_REQUEST_TIMEOUT=15000
# PROXY_ESCALATION_STATUS_CODES=403,429,503,520,521,522,523,524
# ─── Email ───────────────────────────────────────────────────
# RESEND_API_KEY=re_...
# ─── Application ─────────────────────────────────────────────
NEXT_PUBLIC_APP_URL=http://localhost:3000
NEXT_PUBLIC_API_URL=http://localhost:3001
API_URL=http://localhost:3001
# NODE_ENV=development
# LOG_LEVEL=info
# ─── Cache TTLs (seconds) ───────────────────────────────────
# CACHE_TTL_BRAND=604800
# CACHE_TTL_BRAND_SIMPLE=604800
# CACHE_TTL_HTML=3600
# CACHE_TTL_MARKDOWN=3600
# CACHE_TTL_SCREENSHOT=86400
# CACHE_TTL_SITEMAP=86400
# CACHE_TTL_NAICS=2592000
# CACHE_TTL_FONTS=604800