Documentation
Get started with Multiverse in a few minutes.
# Install

```shell
npm install @alldaytech/multiverse-sdk zod @langchain/core @langchain/anthropic
```
# Configure
Configure the SDK with your API key.
```typescript
import { multiverse } from '@alldaytech/multiverse-sdk';

multiverse.configure({
  baseUrl: process.env.MULTIVERSE_URL,
  apiKey: process.env.MULTIVERSE_API_KEY,
});
```

Your API key and base URL are available from your account page.
# Quick start
A complete example: register a tool, define a test, and run it.
```typescript
import { multiverse, wrap } from '@alldaytech/multiverse-sdk';
import { z } from 'zod';

// 1. Wrap your LangChain tools
const bookFlight = wrap(yourBookFlightTool, {
  output: z.object({ id: z.string(), from: z.string(), to: z.string() }),
});

// 2. Define your test, generate scenarios, and run
const test = multiverse.describe({
  name: 'flight-booking-agent',
  task: 'Help the user book a flight',
  agent: async (ctx) => await yourAgent.invoke({ input: ctx.userMessage }),
  conversational: true,
});

const scenarios = await test.generateScenarios({ count: 10 });
const results = await test.run({
  scenarios,
  success: (world) => world.getCollection('bookings').size > 0,
});

console.log(results.passRate); // e.g. 87
console.log(results.url);      // dashboard link
```

# Agent modes
Multiverse supports two agent types. Choose based on whether your agent has a human in the loop.

An autonomous agent receives a single event payload and processes it without a human in the loop. This suits document processors, pipelines, background jobs, and event-driven agents. Use triggerSchema on autonomous agents to describe the shape of the event that triggers your agent; Multiverse uses it to generate realistic scenarios.
```typescript
const test = multiverse.describe({
  name: 'submission-intake-agent',
  task: 'Process insurance submission: extract docs, validate, produce summary',
  agent: async (ctx) => {
    await agent.invoke({ input: ctx.userMessage });
  },
  triggerSchema: z.object({
    submissionId: z.string(),
    priority: z.enum(['standard', 'urgent']),
  }),
});
```

| ctx | description |
|---|---|
| userMessage | The event payload (autonomous) or the latest message from the simulated user (conversational). |
| runId | Stable identifier for the current run. Use it to scope thread IDs or memory keys for multi-turn stateful agents. |
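A common use of runId is to scope per-run state so that parallel runs of a stateful agent don't share memory. A minimal, self-contained sketch (the Ctx shape mirrors the table above; threadKeyFor is a hypothetical helper, not part of the SDK):

```typescript
// Illustrative only: derive a per-run memory key from ctx.runId so that
// concurrent runs of a stateful agent keep separate conversation state.
type Ctx = { userMessage: string; runId: string };

function threadKeyFor(ctx: Ctx, agentName: string): string {
  return `${agentName}:${ctx.runId}`;
}

const ctx: Ctx = { userMessage: 'Book me a flight', runId: 'run_123' };
console.log(threadKeyFor(ctx, 'flight-booking-agent')); // "flight-booking-agent:run_123"
```

You would then pass this key wherever your agent keeps memory, e.g. a LangGraph-style thread_id.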
For conversational agents, each scenario is given a simulated user persona. Personas vary across four styles:
| style | behavior |
|---|---|
| cooperative | Provides information clearly, follows agent instructions. |
| impatient | Wants fast answers, skips details, gets frustrated by delays. |
| confused | Misunderstands instructions, asks clarifying questions, may give inconsistent answers. |
| adversarial | Tries to misuse the agent, provide bad inputs, or get it to do something it shouldn't. |
# Register your tools
Register your agent's tools so Multiverse can intercept their calls during testing. Intercepted calls are simulated, so no real APIs are hit.
```typescript
import { wrap } from '@alldaytech/multiverse-sdk';

const bookFlight = wrap(bookFlightTool, {
  output: BookingSchema,
  effects: (output, world) => [{
    operation: 'create',
    collection: 'bookings',
    id: output.id,
    data: output,
  }],
});
```

| option | description |
|---|---|
| input | Schema for tool inputs. Add .describe() on fields so simulation understands query semantics. |
| output | Schema for tool outputs. Shapes the simulated responses. |
| effects | Declares what each tool writes to world state. Your success() function reads from it to verify outcomes. |
# Effects
Each tool can optionally declare its mutations via effects. Multiverse accumulates these into world state during the run, and success() reads that state to verify outcomes. This checks what the agent actually did, not what it claimed. Tools without effects still work fine; you just won't be able to verify their outcomes via world state.
```typescript
const bookFlight = multiverse.tool({
  // ...
  effects: (output, world) => [{
    operation: 'create',
    collection: 'bookings',
    id: output.bookingId,
    data: output,
  }],
});

// Advanced: read existing state from world before writing
const addPassenger = multiverse.tool({
  // ...
  effects: (output, world) => {
    const booking = world.getCollection('bookings').get(output.bookingId);
    const currentCount = booking?.passengerCount ?? 0;
    return [{
      operation: 'update',
      collection: 'bookings',
      id: output.bookingId,
      data: { passengerCount: currentCount + output.addedCount },
    }];
  },
});

// success() reads from world state
success: (world) => {
  const bookings = world.getCollection('bookings');
  return bookings.size > 0;
}
```

Supported operations: create, update, and delete. World state resets between runs, so each scenario starts clean.
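Conceptually, the accumulated effects act like an event log reduced into keyed collections. A self-contained toy model of that reduction, for intuition only (not the SDK's actual implementation):

```typescript
// Toy model of how declared effects could accumulate into world state.
// `Effect` mirrors the shape shown above; the reducer itself is illustrative.
type Effect = {
  operation: 'create' | 'update' | 'delete';
  collection: string;
  id: string;
  data?: Record<string, unknown>;
};

function applyEffects(effects: Effect[]): Map<string, Map<string, Record<string, unknown>>> {
  const world = new Map<string, Map<string, Record<string, unknown>>>();
  for (const e of effects) {
    const coll = world.get(e.collection) ?? new Map<string, Record<string, unknown>>();
    world.set(e.collection, coll);
    if (e.operation === 'create') coll.set(e.id, { ...e.data });       // new entity
    if (e.operation === 'update') coll.set(e.id, { ...coll.get(e.id), ...e.data }); // merge
    if (e.operation === 'delete') coll.delete(e.id);                   // remove entity
  }
  return world;
}

const state = applyEffects([
  { operation: 'create', collection: 'bookings', id: 'b1', data: { passengerCount: 1 } },
  { operation: 'update', collection: 'bookings', id: 'b1', data: { passengerCount: 3 } },
]);
console.log(state.get('bookings')?.get('b1')); // { passengerCount: 3 }
```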
# Success functions
The success function checks whether the task was actually completed by examining world state, not by parsing agent output.
```typescript
success: (world, trace, scenario) => {
  // Check world state
  const bookings = world.getCollection('bookings');
  if (bookings.size === 0) return false;

  // Use scenario variables for precise assertions
  if (scenario.variables?.expectedBookings) {
    return bookings.size === scenario.variables.expectedBookings;
  }
  return true;
}
```

| param | description |
|---|---|
| world | All entities accumulated via effects during the run. |
| trace | Object with an entries array of tool calls, results, errors, and messages. |
| scenario | Current scenario, including typed variables. |
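Putting the parameters together, a success function can cross-check world state against the trace. A hedged, self-contained sketch with hand-rolled mock shapes (the entry fields, such as type === 'error', are assumed here for illustration; the real trace exposes an entries array of tool calls, results, errors, and messages):

```typescript
// Minimal stand-in shapes for the sketch; the real objects expose at
// least getCollection() on world and an entries array on trace.
type World = { getCollection: (name: string) => Map<string, unknown> };
type Trace = { entries: Array<{ type: string }> };

const success = (world: World, trace: Trace): boolean => {
  const bookings = world.getCollection('bookings');
  if (bookings.size === 0) return false;
  // Even if a booking exists, fail the run when any tool call errored.
  // (`type === 'error'` is an assumed entry shape.)
  return !trace.entries.some((e) => e.type === 'error');
};

// Quick check against mocks:
const mockWorld: World = { getCollection: () => new Map([['b1', {}]]) };
console.log(success(mockWorld, { entries: [{ type: 'tool_call' }] })); // true
```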
# Run tests
Use multiverse.describe() to define your test, then run() it. Use generateScenarios() to generate test scenarios and pass them to run(). trialsPerScenario controls how many times each scenario is run; higher values reduce variance in your pass rate.
```typescript
const test = multiverse.describe({
  name: 'submission-intake-agent',
  task: 'Process insurance submission',
  agent: async (ctx) => { await agent.invoke({ input: ctx.userMessage }); },
  triggerSchema: z.object({
    submissionId: z.string(),
    priority: z.enum(['standard', 'urgent']),
  }),
});

const scenarios = await test.generateScenarios({ count: 10 });
const results = await test.run({
  scenarios,
  success: (world) =>
    world.getCollection('intake_summaries').size > 0,
  trialsPerScenario: 2,
});

console.log(results.passRate); // e.g. 87
console.log(results.url);      // link to dashboard
```

| option | default | description |
|---|---|---|
| trialsPerScenario | 1 | Runs per scenario. Higher values reduce pass rate variance. |
| maxTurns | 20 | Max conversation turns per run. Conversational agents only. |
| concurrency | 8 | Number of runs to execute in parallel. |
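These options interact: the total number of runs is scenarios × trialsPerScenario, executed concurrency at a time. A quick sketch of the arithmetic:

```typescript
// Back-of-envelope sizing for a run() call with the options above.
const scenarioCount = 10;
const trialsPerScenario = 3;
const concurrency = 8;

const totalRuns = scenarioCount * trialsPerScenario; // 30 runs in total
const batches = Math.ceil(totalRuns / concurrency);  // ~4 waves of parallel runs
console.log({ totalRuns, batches });
```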
# Pass / fail verdict
Every run is evaluated on two dimensions. Both must pass for the run to count as passed.

- Success: did the task actually complete? Your programmatic check against world state. Returns true or false.
- Quality: how well did the agent behave? LLM-judged on behavioral quality, scored 0 to 100.

```typescript
passed = success() === true && qualityScore >= qualityThreshold
```
This catches agents that sound successful but did not actually complete the task. For example, an agent that says "Your flight is booked!" while the booking API silently failed would pass the quality check but fail success(). Set qualityThreshold in run() (default: 70).
The LLM judge scores against four criteria by default: communication, error_handling, efficiency, and accuracy. You can override these with the criteria option in run().
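The verdict formula above can be expressed as a pure function, using the documented default threshold of 70:

```typescript
// The pass/fail verdict as a pure function; qualityThreshold defaults
// to 70, matching the documented default.
function verdict(
  taskSucceeded: boolean,
  qualityScore: number,
  qualityThreshold = 70,
): boolean {
  return taskSucceeded === true && qualityScore >= qualityThreshold;
}

console.log(verdict(true, 85));  // true: task done, quality above threshold
console.log(verdict(true, 60));  // false: quality below threshold
console.log(verdict(false, 95)); // false: task not completed
```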
# Variables
Variables solve a specific problem with conversational agents: the simulated user is free to vary what they ask for, making precise assertions impossible.
Without variables, a booking agent test can only check bookings.size > 0. It cannot verify the agent booked the right number of seats, because the simulated user might ask for 2 tickets in one run and 3 in another.
With variables, concrete values are pinned upfront so the simulated user stays consistent across all turns of the conversation.
```typescript
const test = multiverse.describe({
  name: 'flight-booking-agent',
  task: 'Book round trip flights for groups of passengers',
  agent: runAgent,
  conversational: true,
  variables: z.object({
    passengerCount: z.number().describe('Total number of passengers to book for'),
  }),
});

const scenarios = await test.generateScenarios({ count: 5 });
const results = await test.run({
  scenarios,
  success: (world, trace, scenario) =>
    world.getCollection('bookings').size === scenario.variables.passengerCount,
});
```

Variables are primarily useful for conversational agents, where user behavior would otherwise be unconstrained. For autonomous agents, triggerSchema pins the input values each scenario receives.
# Scenario management
Generate scenarios upfront, save them for reuse, and load them across runs.
```typescript
// Generate scenarios upfront for inspection
const scenarios = await test.generateScenarios({ count: 5 });

// Save for reuse across runs
await test.saveScenarios(scenarios);

// Load saved scenarios (null if none saved yet)
const { scenarios: saved } = await test.getScenarios();

// Run with saved scenarios
await test.run({ scenarios: saved ?? [], success: ... });

// Clear saved scenarios
await test.clearScenarios();
```

# CI integration
In CI environments, Multiverse automatically posts an LLM-analyzed report as a GitHub PR comment via the multiverse bot. Just install the GitHub App and add your MULTIVERSE_API_KEY secret.
```yaml
# .github/workflows/eval.yml
- run: npx tsx evals/booking.test.ts
  env:
    MULTIVERSE_API_KEY: ${{ secrets.MULTIVERSE_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Control CI behavior via the ci option:
```typescript
await test.run({
  scenarios,
  success: (world) => world.getCollection('bookings').size > 0,
  ci: {
    postToPR: true,    // Post report as PR comment (requires GitHub App)
    printReport: true, // Print report to stdout
  },
});
```

To skip the LLM report entirely on a specific run:

```typescript
await test.run({ scenarios, success: ..., skipReport: true });
```

# MCP server
Connect your coding agent to Multiverse via MCP. Once connected, your agent can view test results, analyze failed traces, and manage scenarios. Works with Claude Code, Cursor, Windsurf, Cline, and any MCP-compatible client.
1. Add the MCP server
For Claude Code, run this from your terminal:

```shell
claude mcp add --transport http multiverse https://multiverse.allday.com/mcp \
  --header "Authorization: Bearer mv_live_..."
```
Your API key is on your account page. For other clients (Cursor, Windsurf, Cline), add the equivalent config to their MCP settings.
2. Verify
Run /mcp in Claude Code and verify multiverse shows a green status with 5 tools.
Available tools
| tool | description |
|---|---|
| list_tests | Show all test runs with pass rates |
| get_test_runs | Fetch failed runs with full conversation traces for root cause analysis |
| list_scenarios | List saved scenarios for an agent + task |
| delete_scenario | Delete a single saved scenario |
| clear_scenarios | Remove all saved scenarios for an agent + task |