Documentation
Get started with Multiverse in a few minutes.
Install
npm install @virtualkitchenco/multiverse-sdk
Configure
Point the SDK at your Multiverse server. LLM API keys are read from environment variables on the server.
import { multiverse } from '@virtualkitchenco/multiverse-sdk';
multiverse.configure({
  baseUrl: 'http://localhost:3000',
  llm: { provider: 'anthropic' }, // Uses ANTHROPIC_API_KEY from server env
});

Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY on your server.
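Because the keys live in the server's environment, a quick check at server startup can catch a missing key before any test runs. The following is a minimal sketch, not part of the Multiverse SDK: `availableProviders` is a hypothetical helper, and the `'openai'` and `'google'` provider identifiers are assumed by analogy with `'anthropic'`.

```typescript
// Hypothetical helper, not part of the Multiverse SDK.
type Provider = 'anthropic' | 'openai' | 'google';

// Environment variable names come from the docs above; the 'openai' and
// 'google' provider identifiers are assumptions made for this sketch.
const KEY_BY_PROVIDER: Record<Provider, string> = {
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  google: 'GOOGLE_API_KEY',
};

// Returns the providers whose API key is present and non-empty in `env`.
function availableProviders(
  env: Record<string, string | undefined>,
): Provider[] {
  const providers = Object.keys(KEY_BY_PROVIDER) as Provider[];
  return providers.filter((p) => {
    const value = env[KEY_BY_PROVIDER[p]];
    return value !== undefined && value.trim() !== '';
  });
}
```

On a Node server this could be called as `availableProviders(process.env)`. Passing the environment in as an argument keeps the helper easy to test.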
Wrap your tools
Wrap your agent's tools so Multiverse can intercept calls during testing and return simulated responses.
import { wrap } from '@virtualkitchenco/multiverse-sdk';
// wrap() auto-extracts name, description, and input schema from LangChain tools
const searchFlights = wrap(searchFlightsTool, {
  output: FlightSchema, // Zod schema for simulation output
});

const bookFlight = wrap(bookFlightTool, {
  output: BookingSchema,
  effects: (output) => [{
    operation: 'create',
    collection: 'bookings',
    id: output.id,
    data: output,
  }],
});

- input schema (auto-extracted) helps simulation understand query semantics.
- output schema helps simulation generate valid responses.
- effects declares how tool outputs change world state.
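To make the idea of effects concrete, here is a minimal sketch of how a list of effects could be folded into an in-memory world state. It does not show Multiverse's internals: `WorldState` and `applyEffects` are hypothetical names, and the `'update'` and `'delete'` operations are assumed, since only `'create'` appears in the example above.

```typescript
// Shape of one effect, inferred from the bookFlight example above.
// Only 'create' appears in the docs; 'update' and 'delete' are assumed.
type Effect = {
  operation: 'create' | 'update' | 'delete';
  collection: string;
  id: string;
  data?: Record<string, unknown>;
};

// Hypothetical in-memory world: collection name -> (entity id -> entity).
type WorldState = Map<string, Map<string, Record<string, unknown>>>;

function applyEffects(world: WorldState, effects: Effect[]): void {
  for (const e of effects) {
    // Create the collection lazily on first use.
    let col = world.get(e.collection);
    if (!col) {
      col = new Map();
      world.set(e.collection, col);
    }
    if (e.operation === 'delete') {
      col.delete(e.id);
    } else if (e.operation === 'create') {
      col.set(e.id, { ...(e.data ?? {}) });
    } else {
      // 'update': merge new fields into the existing entity, if any.
      col.set(e.id, { ...(col.get(e.id) ?? {}), ...(e.data ?? {}) });
    }
  }
}

// Example: a simulated bookFlight output recorded as a new booking.
const world: WorldState = new Map();
const booking = { id: 'bk_1', flight: 'NYC-LAX' };
applyEffects(world, [
  { operation: 'create', collection: 'bookings', id: booking.id, data: booking },
]);
```

After this call the `bookings` collection holds one entity, which is the kind of state a success check can inspect later.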
Run tests
Use multiverse.describe() to define your test, then run() it. Multiverse generates scenarios and runs your agent against each one.
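// Assumes createReactAgent from '@langchain/langgraph/prebuilt' and an `llm` chat model defined elsewhere
const agent = createReactAgent({ llm, tools: [searchFlights, bookFlight] });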
const test = multiverse.describe({
  name: 'flight-booking-agent',
  task: 'Book a flight from NYC to LA',
  agent: (ctx) => agent.invoke(ctx.userMessage),
});

const results = await test.run({
  success: (world) => world.getCollection('bookings').size > 0,
  simulateUser: true, // Enable multi-turn conversations
  scenarioCount: 10,
  trialsPerScenario: 3,
});

console.log(results.passRate); // e.g. 87
console.log(results.url); // link to dashboard
console.log(results.markdown); // LLM-analyzed report

Success functions
The success function checks whether the task was actually completed by examining world state, not by parsing agent output. It also receives the trace, so it can verify that specific tool calls were made.
success: (world, trace) => {
  // Check world state
  const bookings = world.getCollection('bookings');
  if (bookings.size === 0) return false;
  // Check trace for specific tool calls
  const booked = trace.some(t => t.tool === 'bookFlight');
  return booked;
}

- world contains all entities created via effects.
- trace is an array of tool calls with inputs and outputs.
Scenarios
Multiverse generates scenarios by combining user personas with failure modes.
Personas
- cooperative — follows instructions
- impatient — rushes, skips details
- confused — changes mind, unclear
- adversarial — probes edge cases
Failure modes
- none — happy path
- timeout — API times out
- error — API returns error
- bad_data — malformed response
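The pairing of personas and failure modes can be pictured as a cross product. The sketch below enumerates every pairing of the four personas and four failure modes listed above. Whether Multiverse enumerates all pairings or samples `scenarioCount` of them is not stated here, so treat `allCombinations` and `ScenarioSeed` as hypothetical names for illustration only.

```typescript
const personas = ['cooperative', 'impatient', 'confused', 'adversarial'] as const;
const failureModes = ['none', 'timeout', 'error', 'bad_data'] as const;

type Persona = (typeof personas)[number];
type FailureMode = (typeof failureModes)[number];

// Hypothetical shape for one persona/failure-mode pairing.
type ScenarioSeed = { persona: Persona; failureMode: FailureMode };

// Enumerate every pairing of persona and failure mode.
function allCombinations(): ScenarioSeed[] {
  const seeds: ScenarioSeed[] = [];
  for (const persona of personas) {
    for (const failureMode of failureModes) {
      seeds.push({ persona, failureMode });
    }
  }
  return seeds;
}

const seeds = allCombinations();
console.log(seeds.length); // 16 (4 personas x 4 failure modes)
```

The first seed is the cooperative persona on the happy path, and the last is the adversarial persona with `bad_data`.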
Test options
const test = multiverse.describe({
  name: 'my-agent',        // groups runs in dashboard
  task: 'Book a flight',   // task description
  agent,                   // (ctx) => Promise<string>
});

const results = await test.run({
  success,                 // (world, trace, scenario) => boolean
  scenarioCount: 5,        // number of scenarios to generate
  trialsPerScenario: 1,    // trials per scenario
  simulateUser: false,     // enable multi-turn simulation
  maxTurns: 10,            // max turns if simulateUser
  qualityThreshold: 70,    // LLM judge threshold (0-100)
  ci: {
    postToPR: true,        // auto-post report to GitHub PR
    printReport: true,     // print report to stdout
  },
  onProgress: (p) => {},   // progress callback
});
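To estimate what a run costs, note that `scenarioCount` and `trialsPerScenario` multiply. The sketch below works through that arithmetic and one plausible reading of `passRate` as the rounded percentage of passing trials. That reading is an assumption: the SDK may weight scenarios differently or fold in the LLM judge's `qualityThreshold`. `TrialResult`, `totalRuns`, and `passRate` here are hypothetical names.

```typescript
// Hypothetical record of one agent run.
type TrialResult = { scenario: number; trial: number; passed: boolean };

// Number of agent runs implied by the two options.
function totalRuns(scenarioCount: number, trialsPerScenario: number): number {
  return scenarioCount * trialsPerScenario;
}

// Assumed definition: share of passing trials as a rounded percentage (0-100).
function passRate(results: TrialResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.passed).length;
  return Math.round((passed / results.length) * 100);
}

// 10 scenarios x 3 trials, with the first 26 runs marked as passing.
const runs: TrialResult[] = [];
for (let s = 0; s < 10; s++) {
  for (let t = 0; t < 3; t++) {
    runs.push({ scenario: s, trial: t, passed: runs.length < 26 });
  }
}

console.log(totalRuns(10, 3)); // 30
console.log(passRate(runs));   // 87 (26 of 30 runs passed)
```

With multi-turn simulation enabled, each of those runs can also take up to `maxTurns` turns, so the number of LLM calls grows accordingly.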