
# How VoxDiff Works: Your Unbiased AI Comparison Engine

## The VoxDiff Vision

You submit a single question to multiple AI models at once. Their answers stream in side-by-side so you can see how they differ, not just in content but in *tone*, *structure*, *confidence*, and even proximity to their safety guardrails. Our independent judge LLM then analyzes the divergence.

**The result?** You get genuinely different perspectives, untainted by one model's influence on another.

---

## The Core Innovation: Isolated Lanes

Some AI comparison tools send your prompt to one model, then another, then another, in sequence. When those calls share a single conversation, each later model sees the earlier answers in its context and can be anchored by them. VoxDiff breaks this pattern.

### What Isolation Means

Each AI model operates in its own **isolated lane**—completely unaware of what the others are doing:

- **No crosstalk** — Each model sees only the conversation history within its own lane, not the other models' responses
- **Simultaneous streaming** — All lanes are dispatched together and stream in parallel, so no model receives another's answer before giving its own
- **Reproducible setup** — If you run the same comparison twice with the same settings, each model receives exactly the same inputs; the responses can still vary, because model sampling is not fully deterministic (especially at higher temperatures)

Think of each lane as a box that is transparent to you but opaque to the other models. Each model sees its own answer unfolding, and you see all of them, but no model can peek at its neighbors. That is what makes VoxDiff comparisons trustworthy: no answer has been influenced by another, so the differences you see come from the models themselves.
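A minimal sketch of the isolation rule, assuming each lane keeps its own message array. The `Lane` and `Message` types and `buildLaneRequest` are hypothetical names for illustration, not VoxDiff's actual code; the point is that a lane's request is assembled only from the shared system prompt, that lane's own history, and the new question.

```typescript
// Hypothetical message shape, loosely modeled on chat-style LLM APIs.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Each lane owns its history; nothing here refers to other lanes.
interface Lane {
  id: number;
  model: string;
  history: Message[];
}

// Assemble one lane's request: the shared system prompt (if any),
// that lane's own earlier turns, then the new question.
function buildLaneRequest(
  lane: Lane,
  systemPrompt: string | undefined,
  question: string,
): Message[] {
  const messages: Message[] = [];
  if (systemPrompt) {
    messages.push({ role: "system", content: systemPrompt });
  }
  messages.push(...lane.history); // this lane's history only
  messages.push({ role: "user", content: question });
  return messages;
}
```

In this sketch, crosstalk is ruled out by construction: nothing in `buildLaneRequest` can reach another lane's `history`.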

---

## How the Comparison Flow Works

### 1. **Configuration** — You Choose the Variables
You select:
- **Which models** to compare (e.g., GPT-4o vs Claude Sonnet vs Gemini)
- **Temperature** — How creative should they be? (a single slider applies to all)
- **System prompt** — An optional persona or instruction (same for all lanes)

Server-side, we validate every choice—tier restrictions, model access, lane count limits. The browser never gets to override these.
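A sketch of what that validation might look like, assuming the tier is read from the server-side session rather than from the request. Lane limits and premium-model access follow the tier table further down this page; the model IDs, the `premium` flag, `validateConfig`, and the 0 to 1 temperature range are hypothetical.

```typescript
type Tier = "red_cup" | "open_bar" | "cash_bar" | "run_a_tab";

interface TierPolicy {
  minLanes: number;
  maxLanes: number;
  premiumModels: boolean;
}

// Lane limits and premium access as listed in the tier table.
const POLICIES: Record<Tier, TierPolicy> = {
  red_cup: { minLanes: 2, maxLanes: 3, premiumModels: false },
  open_bar: { minLanes: 2, maxLanes: 3, premiumModels: false },
  cash_bar: { minLanes: 2, maxLanes: 8, premiumModels: true },
  run_a_tab: { minLanes: 2, maxLanes: 8, premiumModels: true },
};

// Hypothetical model catalog.
const MODELS: Record<string, { premium: boolean }> = {
  "standard-model-a": { premium: false },
  "standard-model-b": { premium: false },
  "standard-model-c": { premium: false },
  "premium-model-x": { premium: true },
};

interface CompareRequest {
  models: string[];
  temperature: number;
  systemPrompt?: string;
}

// Returns a list of problems; an empty list means the request may proceed.
// `tier` must come from the server-side session, never from the browser.
function validateConfig(tier: Tier, req: CompareRequest): string[] {
  const policy = POLICIES[tier];
  const errors: string[] = [];
  const lanes = req.models.length;
  if (lanes < policy.minLanes || lanes > policy.maxLanes) {
    errors.push(`lane count must be ${policy.minLanes}-${policy.maxLanes}`);
  }
  for (const id of req.models) {
    const model = MODELS[id];
    if (!model) {
      errors.push(`unknown model: ${id}`);
    } else if (model.premium && !policy.premiumModels) {
      errors.push(`model not available on this tier: ${id}`);
    }
  }
  // Assumed range for the shared temperature slider.
  if (!(req.temperature >= 0 && req.temperature <= 1)) {
    errors.push("temperature must be between 0 and 1");
  }
  return errors;
}
```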

### 2. **Streaming** — Real-Time Responses
When you hit "Send," your question travels to each model independently:

```
  ┌─────────────────────────────────────────────┐
  │  Your Question                              │
  │  + System Prompt (if any)                   │
  │  + Conversation History (per lane)          │
  └──────────────────────┬──────────────────────┘
                         │  (fanned out to N models)
       ┌───────────┬─────┴─────┬───────────┐
       ↓           ↓           ↓           ↓
     GPT-4o      Claude      Gemini      Grok
    (Lane 1)    (Lane 2)    (Lane 3)    (Lane 4)
       ↓           ↓           ↓           ↓
    [stream]    [stream]    [stream]    [stream]
       │           │           │           │
       └───────────┴─────┬─────┴───────────┘
                         ↓
            Your browser sees responses
         arrive in real time, side by side
```
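One way to implement the fan-out, assuming each provider is wrapped as an async iterable of text chunks. `Provider`, `fakeProvider`, and `fanOut` are hypothetical; `fakeProvider` replays a canned reply where a real version would call the provider's streaming API. For brevity every lane receives the same prompt string here, whereas each real lane request would also carry that lane's own history. Every lane starts before any is awaited, and `Promise.all` resolves once all of them have finished.

```typescript
// A chunk of streamed text tagged with the lane it came from.
interface Chunk {
  lane: number;
  text: string;
}

// A provider turns a prompt into a stream of text chunks.
type Provider = (prompt: string) => AsyncIterable<string>;

// Hypothetical stand-in: replays a canned reply word by word.
// A real provider would call that vendor's streaming API instead.
function fakeProvider(name: string): Provider {
  return async function* (prompt: string) {
    for (const word of `${name} answers: ${prompt}`.split(" ")) {
      await new Promise<void>((resolve) => setTimeout(resolve, 1));
      yield word + " ";
    }
  };
}

// Start every lane at once and forward chunks as they arrive.
// Each lane reads only its own stream, so no lane sees another's text.
async function fanOut(
  prompt: string,
  providers: Provider[],
  onChunk: (chunk: Chunk) => void,
): Promise<string[]> {
  const lanes = providers.map(async (provider, lane) => {
    let full = "";
    for await (const text of provider(prompt)) {
      full += text;
      onChunk({ lane, text }); // e.g. push to the browser over SSE
    }
    return full.trim();
  });
  return Promise.all(lanes);
}
```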

Each model's tokens are counted as they arrive, and energy and cost estimates are recalculated in real time from those counts.
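The estimate is simple arithmetic once per-token rates are known. A sketch with placeholder rates: the `Rates` fields and every number below are assumptions for illustration, not real provider prices or measured energy use.

```typescript
// Placeholder rates per 1,000 tokens; NOT real prices or energy figures.
interface Rates {
  usdPerKInput: number;
  usdPerKOutput: number;
  whPerKTokens: number; // watt-hours per 1,000 tokens (assumed single figure)
}

interface LaneUsage {
  inputTokens: number;
  outputTokens: number;
}

// Called again each time a lane's counts change while it streams.
function estimateLane(usage: LaneUsage, rates: Rates): { usd: number; wh: number } {
  const usd =
    (usage.inputTokens / 1000) * rates.usdPerKInput +
    (usage.outputTokens / 1000) * rates.usdPerKOutput;
  const wh = ((usage.inputTokens + usage.outputTokens) / 1000) * rates.whPerKTokens;
  return { usd, wh };
}
```

Input and output tokens are priced separately here because most LLM APIs charge different rates for each.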

### 3. **Analysis** — The Judge LLM
After all lanes finish, a dedicated judge LLM reads all the responses and produces a structured analysis:

- **Consensus** — What did all models agree on?
- **Unique points** — What did each model say that others didn't?
- **Tone** — Is each response authoritative, exploratory, cautious, prescriptive, or conversational?
- **Structure** — Prose, lists, mixed, code-heavy?
- **Divergence score** — How different were the answers? (0 = identical, 100 = completely different)
- **Controversy radar** — A visual profile of each lane's proximity to guardrail edges (based on tone assertiveness and unique-point density)
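A sketch of how the judge's structured output might be typed and checked, assuming the judge returns JSON. The tone and structure labels and the 0 to 100 divergence range come from the list above; the field names and `isJudgeAnalysis` are hypothetical, and the controversy radar data is omitted. A runtime check like this lets the server reject malformed judge output before it reaches the UI.

```typescript
const TONES = ["authoritative", "exploratory", "cautious", "prescriptive", "conversational"] as const;
const STRUCTURES = ["prose", "lists", "mixed", "code-heavy"] as const;

type Tone = (typeof TONES)[number];
type Structure = (typeof STRUCTURES)[number];

interface LaneProfile {
  lane: number;
  tone: Tone;
  structure: Structure;
  uniquePoints: string[];
}

interface JudgeAnalysis {
  consensus: string[];
  lanes: LaneProfile[];
  divergenceScore: number; // 0 = identical, 100 = completely different
}

// Minimal runtime check of parsed judge JSON (element types of the
// string arrays are not inspected, to keep the sketch short).
function isJudgeAnalysis(x: unknown): x is JudgeAnalysis {
  if (typeof x !== "object" || x === null) return false;
  const a = x as Record<string, unknown>;
  const lanes = a.lanes;
  const score = a.divergenceScore;
  if (!Array.isArray(a.consensus) || !Array.isArray(lanes)) return false;
  if (typeof score !== "number" || score < 0 || score > 100) return false;
  return lanes.every((l: unknown) => {
    if (typeof l !== "object" || l === null) return false;
    const p = l as Record<string, unknown>;
    return (
      typeof p.lane === "number" &&
      typeof p.tone === "string" &&
      (TONES as readonly string[]).includes(p.tone) &&
      typeof p.structure === "string" &&
      (STRUCTURES as readonly string[]).includes(p.structure) &&
      Array.isArray(p.uniquePoints)
    );
  });
}
```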

### 4. **Multi-Turn** — Same Question, Different Angles
You can send follow-up messages. Each follow-up goes to all lanes simultaneously, using the full conversation history from that lane. The judge re-analyzes after each turn.
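A sketch of how multi-turn history can stay isolated, assuming answers are recorded only after every lane has finished the turn. `recordTurn` and `judgeInput` are hypothetical: each lane's history gains only its own answer, while the judge is the one component that reads every lane's latest reply.

```typescript
// Hypothetical types for illustration; not VoxDiff's actual schema.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface Lane {
  model: string;
  history: Message[];
}

// Once every lane has answered, append the question and that lane's own
// answer to its history. answers[i] belongs to lanes[i].
function recordTurn(lanes: Lane[], question: string, answers: string[]): void {
  if (answers.length !== lanes.length) {
    throw new Error("exactly one answer per lane is required");
  }
  lanes.forEach((lane, i) => {
    lane.history.push({ role: "user", content: question });
    lane.history.push({ role: "assistant", content: answers[i] });
  });
}

// The judge is the only reader of every lane's latest answer.
function judgeInput(lanes: Lane[]): { model: string; answer: string }[] {
  return lanes.map((lane) => {
    const last = lane.history[lane.history.length - 1];
    return { model: lane.model, answer: last ? last.content : "" };
  });
}
```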

---

## Why This Design?

### **Unbiased Responses**
Because each model works in isolation, there's no winner-takes-all effect. OpenAI's answer doesn't "anchor" Claude's thinking. A dominant persona doesn't overshadow minority perspectives.

### **Real-Time Visibility**
Token counts and energy estimates stream in as the responses arrive. You see costs building in real time, not as a post-hoc surprise. This transparency builds trust.

### **No Account Crossover**
Red Cup (free) comparisons are session-scoped. Open Bar (verified user) comparisons don't affect your subscription tier. Cash Bar purchases are one-time and don't reset. If you use a competitor's service tomorrow, your VoxDiff account isn't touched. **Your AI activity stays within VoxDiff's isolated environment.**

---

## The Server: Your Data's Isolation Box

All comparison data lives on VoxDiff servers in **encrypted storage**. Here's what that means:

### **Encryption in Motion**
Your prompts and responses are transmitted over HTTPS (standard encrypted web traffic).

### **Encryption at Rest**
Stored comparison turns, analysis data, and your token ledger are encrypted with AES-256-CBC. We hold the decryption keys—not you, not the LLM providers.
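A sketch of AES-256-CBC encryption with Node's built-in `crypto` module. The cipher comes from the text above; everything else is an assumption. CBC mode provides confidentiality but not integrity, so this sketch adds an HMAC-SHA256 tag over the IV and ciphertext (encrypt-then-MAC); this page doesn't say whether VoxDiff does the same. Keys are generated in memory only for illustration; real keys would come from a key store, and the encryption and MAC keys should be independent.

```typescript
import { createCipheriv, createDecipheriv, createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const ENC_KEY = randomBytes(32); // 256-bit AES key (assumption: loaded from a key store in practice)
const MAC_KEY = randomBytes(32); // separate key for the integrity tag

// Output layout: iv (16 bytes) | tag (32 bytes) | ciphertext
function encrypt(plaintext: string): Buffer {
  const iv = randomBytes(16); // fresh random IV for every record
  const cipher = createCipheriv("aes-256-cbc", ENC_KEY, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = createHmac("sha256", MAC_KEY).update(iv).update(ciphertext).digest();
  return Buffer.concat([iv, tag, ciphertext]);
}

function decrypt(blob: Buffer): string {
  const iv = blob.subarray(0, 16);
  const tag = blob.subarray(16, 48);
  const ciphertext = blob.subarray(48);
  // Verify the tag before decrypting anything.
  const expected = createHmac("sha256", MAC_KEY).update(iv).update(ciphertext).digest();
  if (tag.length !== expected.length || !timingSafeEqual(tag, expected)) {
    throw new Error("integrity check failed");
  }
  const decipher = createDecipheriv("aes-256-cbc", ENC_KEY, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```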

### **Providers See Our Server, Not You**
VoxDiff sits between you and the LLM providers. OpenAI doesn't know your IP. Anthropic doesn't log that "you" made a request—it just sees our server making an API call. This is the closest thing to a **VPN for AI interactions**: you're routed through our servers, so provider APIs see only our origin.

### **No Account Linking**
VoxDiff doesn't sync with your OpenAI account, Anthropic account, or Google account. You don't log in with them; you sign up independently with VoxDiff. This isolation means:
- Anthropic's token usage dashboard shows only "VoxDiff API calls," not individual user queries
- OpenAI sees an aggregate bill from VoxDiff, not a detailed breakdown per person
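- **Providers can't trace your comparison activity back to you**: they receive the text of each prompt (they need it to generate an answer), but nothing that identifies you unless the prompt itself contains it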

---

## Token Accounting: Transparent & Atomic

Your token budget is the primary meter on your usage. Every comparison costs tokens: the combined input and output tokens from all lanes, plus the judge LLM's analysis.

### **Pre-Flight Estimation**
Before streaming starts, we estimate the token cost and check your balance. If you don't have enough, we block the request with a clear message before spending anything.
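A sketch of the pre-flight check, assuming a rough heuristic of about four characters per token plus fixed allowances for each lane's output and for the judge. `CHARS_PER_TOKEN`, `OUTPUT_ALLOWANCE`, `JUDGE_ALLOWANCE`, and `preflight` are hypothetical; this page doesn't describe VoxDiff's actual estimator.

```typescript
// Hypothetical estimator settings; VoxDiff's real values are not documented here.
const CHARS_PER_TOKEN = 4; // rough rule of thumb for English text
const OUTPUT_ALLOWANCE = 1000; // assumed output budget per lane
const JUDGE_ALLOWANCE = 2000; // assumed budget for the judge's analysis

type Preflight =
  | { ok: true; estimate: number }
  | { ok: false; estimate: number; message: string };

function estimatePromptTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// laneInputs[i] is the full text lane i will send (system prompt,
// that lane's history, and the new question).
function preflight(laneInputs: string[], balance: number): Preflight {
  let estimate = JUDGE_ALLOWANCE;
  for (const input of laneInputs) {
    estimate += estimatePromptTokens(input) + OUTPUT_ALLOWANCE;
  }
  if (estimate > balance) {
    return {
      ok: false,
      estimate,
      message: `This comparison needs about ${estimate} tokens, but your balance is ${balance}.`,
    };
  }
  return { ok: true, estimate };
}
```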

### **Real-Time Counting**
As responses stream in, the actual token counts are pulled from each provider's own reports—not estimates. You see the real numbers accumulating in the UI.
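Each provider reports usage in differently named fields, so the counts need normalizing into one shape. The field names below reflect our understanding of the usage objects in OpenAI's Chat Completions API (`prompt_tokens`, `completion_tokens`), Anthropic's Messages API (`input_tokens`, `output_tokens`), and Gemini's `usageMetadata` (`promptTokenCount`, `candidatesTokenCount`); verify them against current provider documentation. The `ProviderReport` union and `normalizeUsage` are hypothetical.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical wrapper around each provider's reported usage.
type ProviderReport =
  | { provider: "openai"; usage: { prompt_tokens: number; completion_tokens: number } }
  | { provider: "anthropic"; usage: { input_tokens: number; output_tokens: number } }
  | { provider: "gemini"; usageMetadata: { promptTokenCount: number; candidatesTokenCount: number } };

function normalizeUsage(report: ProviderReport): Usage {
  if (report.provider === "openai") {
    return { inputTokens: report.usage.prompt_tokens, outputTokens: report.usage.completion_tokens };
  }
  if (report.provider === "anthropic") {
    return { inputTokens: report.usage.input_tokens, outputTokens: report.usage.output_tokens };
  }
  // Only the Gemini shape remains.
  return {
    inputTokens: report.usageMetadata.promptTokenCount,
    outputTokens: report.usageMetadata.candidatesTokenCount,
  };
}

// Sum of input + output tokens across all lanes of a turn.
function totalTokens(reports: ProviderReport[]): number {
  return reports.map(normalizeUsage).reduce((sum, u) => sum + u.inputTokens + u.outputTokens, 0);
}
```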

### **Atomic Debit**
After your turn completes, the server debits the exact token count from your balance in a single atomic database operation that checks and subtracts in one step. If two requests try to spend the same tokens at the same moment (a race condition) and your balance can cover only one of them, only one succeeds, preventing double-spends.
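In SQL, the usual way to make a debit atomic is a conditional `UPDATE` that checks and subtracts in a single statement, followed by a check of the affected-row count; the statement in the comment below uses hypothetical table and column names. The TypeScript mirrors those semantics in memory: the check and the subtraction happen in one synchronous step, so no other request can run between them.

```typescript
// SQL equivalent (hypothetical schema):
//   UPDATE token_balances
//      SET balance = balance - ?
//    WHERE user_id = ? AND balance >= ?;
// Zero affected rows means the debit was refused.

class BalanceStore {
  private balances = new Map<string, number>();

  constructor(initial: Record<string, number>) {
    for (const [user, balance] of Object.entries(initial)) {
      this.balances.set(user, balance);
    }
  }

  // Check-and-subtract in one step; refuses rather than going negative.
  debit(user: string, amount: number): boolean {
    if (!Number.isInteger(amount) || amount <= 0) {
      throw new Error("amount must be a positive integer");
    }
    const current = this.balances.get(user) ?? 0;
    if (current < amount) {
      return false;
    }
    this.balances.set(user, current - amount);
    return true;
  }

  balanceOf(user: string): number {
    return this.balances.get(user) ?? 0;
  }
}
```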

### **Immutable Ledger**
Every transaction (grant, debit, refund, admin adjustment) is logged in an append-only ledger. Rows are never updated or deleted, which gives you an auditable history of your account activity.
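A minimal in-memory sketch of an append-only ledger, assuming the balance is derived by summing entries rather than stored separately. The entry kinds come from the text; `Ledger`, its fields, and the sign convention (debits are negative) are assumptions. The class offers `append` but no update or delete method, and each entry is frozen once written.

```typescript
type EntryKind = "grant" | "debit" | "refund" | "admin_adjustment";

interface LedgerEntry {
  readonly seq: number;
  readonly userId: string;
  readonly kind: EntryKind;
  readonly amount: number; // positive adds tokens, negative removes them
  readonly actorId: string; // who caused it: the user, the system, or an admin
  readonly at: string; // ISO-8601 timestamp
}

class Ledger {
  private readonly entries: LedgerEntry[] = [];

  // The only write operation: new rows are added, never changed.
  append(userId: string, kind: EntryKind, amount: number, actorId: string): LedgerEntry {
    const entry: LedgerEntry = Object.freeze({
      seq: this.entries.length + 1,
      userId,
      kind,
      amount,
      actorId,
      at: new Date().toISOString(),
    });
    this.entries.push(entry);
    return entry;
  }

  // The balance is derived from history, so it can always be re-audited.
  balance(userId: string): number {
    return this.entries
      .filter((e) => e.userId === userId)
      .reduce((sum, e) => sum + e.amount, 0);
  }

  history(userId: string): readonly LedgerEntry[] {
    return this.entries.filter((e) => e.userId === userId);
  }
}
```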

---

## Tier Isolation: Each Tier Lives in Its Own Box

| Tier | Access | Budget | Lanes | Premium Models | Renewal |
|---|---|---|---|---|---|
| **Red Cup** | Unregistered | 1M tokens/session | 2–3 | No | Never (ephemeral) |
| **Open Bar** | Verified email | 1M tokens/month | 2–3 | No | Auto-reset monthly |
| **Cash Bar** | One-time $5 | 1M tokens/purchase | 2–8 | Yes | Never (until spent) |
| **Run A Tab** | $3/month subscription | 1M tokens/month | 2–8 | Yes | Auto-reset monthly |

Each tier has its own token pool and rate limits. Spending your Red Cup budget doesn't touch your Open Bar budget. Canceling a subscription doesn't erase past Cash Bar purchases.
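A sketch of separate pools, assuming each pool is keyed by owner (a session ID for Red Cup, a user ID otherwise) and tier. The 1M budget and the monthly resets for Open Bar and Run A Tab come from the table above; `TierPools` and its methods are hypothetical, and stacking of multiple Cash Bar purchases is not modeled.

```typescript
type Tier = "red_cup" | "open_bar" | "cash_bar" | "run_a_tab";

const BUDGET = 1_000_000; // every tier in the table grants 1M tokens per pool
const RESETS_MONTHLY: Record<Tier, boolean> = {
  red_cup: false,
  open_bar: true,
  cash_bar: false,
  run_a_tab: true,
};

class TierPools {
  private pools = new Map<string, number>();

  // One pool per (owner, tier) pair, so tiers never share a balance.
  private key(owner: string, tier: Tier): string {
    return `${owner}:${tier}`;
  }

  grant(owner: string, tier: Tier): void {
    this.pools.set(this.key(owner, tier), BUDGET);
  }

  spend(owner: string, tier: Tier, tokens: number): boolean {
    const k = this.key(owner, tier);
    const left = this.pools.get(k) ?? 0;
    if (tokens > left) return false;
    this.pools.set(k, left - tokens);
    return true;
  }

  remaining(owner: string, tier: Tier): number {
    return this.pools.get(this.key(owner, tier)) ?? 0;
  }

  // Run by a monthly job; refills only tiers that renew.
  monthlyReset(owners: string[]): void {
    for (const owner of owners) {
      for (const tier of Object.keys(RESETS_MONTHLY) as Tier[]) {
        const k = this.key(owner, tier);
        if (RESETS_MONTHLY[tier] && this.pools.has(k)) {
          this.pools.set(k, BUDGET);
        }
      }
    }
  }
}
```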

---

## Security Architecture at a Glance

### **Nonce System**
Every AJAX request includes a time-limited security token (nonce) that validates the request came from your authenticated session. This prevents CSRF (cross-site request forgery).

### **Stream Tokens**
Server-Sent Events (SSE), which we use for streaming, can't carry traditional nonces. Instead, we generate one-time HMAC tokens that expire in 60 seconds. Each token is valid for a single stream, then deleted.
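A sketch of single-use, 60-second stream tokens with Node's built-in `crypto` module. The token format (`streamId.expiry.signature`), the in-memory `pending` set, and the function names are assumptions; a production version would load the secret from configuration, keep outstanding tokens in shared storage, and purge expired ones. Stream IDs are assumed not to contain dots.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const SECRET = randomBytes(32); // assumption: loaded from configuration in production
const TTL_MS = 60_000; // tokens expire 60 seconds after issue
const pending = new Set<string>(); // issued but not yet redeemed

function sign(payload: string): string {
  return createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Issue a token bound to one stream ID and an expiry timestamp.
function issueStreamToken(streamId: string, now: number = Date.now()): string {
  const payload = `${streamId}.${now + TTL_MS}`;
  const token = `${payload}.${sign(payload)}`;
  pending.add(token);
  return token;
}

// Accept a token once, for the matching stream, before it expires.
function redeemStreamToken(token: string, streamId: string, now: number = Date.now()): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [id, expiry, signature] = parts;
  const expected = Buffer.from(sign(`${id}.${expiry}`), "hex");
  const given = Buffer.from(signature, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
  if (id !== streamId || now > Number(expiry)) return false;
  // Set.delete returns true only if the token was still outstanding,
  // so deleting it here makes the token single-use.
  return pending.delete(token);
}
```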

### **Rate Limiting**
Tier-based rate limits catch abuse (e.g., scripted submissions). Red Cup allows 20 comparisons/hour; Open Bar allows 60; paid tiers allow 200. These are secondary safeguards; the token budget is the primary economic gate.
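A fixed-window counter is one simple way to enforce hourly caps like these. A sketch, assuming limits are keyed by user or session ID; `FixedWindowLimiter` and the fixed-window choice are assumptions (a sliding window would smooth out bursts at window boundaries), and the `exempt` flag stands in for the admin exemption described below.

```typescript
type Tier = "red_cup" | "open_bar" | "cash_bar" | "run_a_tab";

// Comparisons per hour, from the limits described above.
const HOURLY_LIMIT: Record<Tier, number> = {
  red_cup: 20,
  open_bar: 60,
  cash_bar: 200,
  run_a_tab: 200,
};

const WINDOW_MS = 60 * 60 * 1000;

class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  // Returns true if the request is allowed, and counts it.
  allow(key: string, tier: Tier, now: number = Date.now(), exempt = false): boolean {
    if (exempt) return true; // admin-granted exemption
    const start = now - (now % WINDOW_MS); // start of the current hour window
    const w = this.windows.get(key);
    if (!w || w.start !== start) {
      this.windows.set(key, { start, count: 1 });
      return true;
    }
    if (w.count >= HOURLY_LIMIT[tier]) return false;
    w.count += 1;
    return true;
  }
}
```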

### **Admin Overrides**
Site administrators can manually grant bonus tokens, adjust tiers, or exempt users from rate limits. All overrides are logged with the admin's user ID for audit purposes.

---

## Next Steps

Ready to dive deeper? Explore:
- **[Your Data: Encryption & Privacy](./2-data-privacy.md)** — Where your data lives and how we protect it
- **[How Tokens Work](./3-token-system.md)** — Token estimation, real-time counting, and atomic debits
- **[Tier & Payment System](./4-tier-system.md)** — How tiers, budgets, and payments interlock
- **[Technical Deep Dive](./5-architecture.md)** — For developers: the component model, database schema, and AJAX flows