Quick start
From install to captured-event-and-replay in a few minutes.
This walkthrough takes you from a fresh install to: making a call, inspecting the captured event, querying persisted events, and replaying one.
We use SQLite so there is no external service to set up. Swap database_url for a postgresql://... URL when you move to a real environment.
1. Install
pip install "leanllm-ai[sqlite]"
2. Configure and make a call
from leanllm import LeanLLM, LeanLLMConfig
client = LeanLLM(
    api_key="sk-...",  # your provider key
    config=LeanLLMConfig(
        database_url="sqlite:///leanllm_events.db",
        capture_content=True,  # store prompt + response text
        last_event_buffer=32,  # keep last 32 events in memory
    ),
)
response = client.chat(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Reply with a single word."}],
    labels={"team": "backend", "feature": "demo"},
)
print(response.choices[0].message.content)
chat() returns the LiteLLM response immediately. Capture, persistence, retries, and migrations all happen on a background worker thread.
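To make the "returns immediately, persists in the background" behaviour concrete, here is a minimal standard-library sketch of the general pattern: the request path only enqueues an event, and a worker thread drains the queue. The `BackgroundWriter` class and its methods are hypothetical names chosen for illustration, not LeanLLM's internals, which may differ.

```python
import queue
import threading


class BackgroundWriter:
    """Hypothetical stand-in for a capture pipeline: enqueue fast, persist later."""

    def __init__(self):
        self._queue = queue.Queue()
        self.persisted = []  # stands in for the database
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, event):
        # Called on the request path: only enqueues, never touches storage.
        self._queue.put(event)

    def _run(self):
        # Worker loop: handle events until the None sentinel arrives.
        while True:
            event = self._queue.get()
            if event is None:
                break
            self.persisted.append(event)  # a real worker would write to storage here

    def close(self):
        # Signal shutdown and wait until queued events have been handled.
        self._queue.put(None)
        self._thread.join()


writer = BackgroundWriter()
writer.submit({"model": "gpt-4o-mini", "total_tokens": 12})
writer.close()
print(writer.persisted)
```

The practical consequence of this design is that an event may not be in storage at the moment the call returns, so a short-lived script should give the worker a chance to flush before exiting.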
3. Inspect the most recent event in memory
event = client.last_event
print(event.event_id, event.model, event.total_tokens, event.cost)
print(event.prompt)  # only set when capture_content=True
print(event.response)
client.recent_events(n=5) returns the last N events from the same in-process ring buffer.
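A bounded ring buffer of this kind can be sketched with `collections.deque`. This is an illustration under the assumption that the buffer simply discards the oldest event once it is full; the event dictionaries and the standalone `recent_events` function below are hypothetical and are not the library's implementation.

```python
from collections import deque

# Assumption: a buffer sized like last_event_buffer=32 behaves like a bounded deque.
buffer = deque(maxlen=32)

for i in range(40):
    buffer.append({"event_id": f"evt-{i}"})  # hypothetical event shape


def recent_events(n):
    # Return the newest n events, oldest first; n <= 0 yields an empty list.
    return list(buffer)[-n:] if n > 0 else []


last_event = buffer[-1]
print(len(buffer))             # the 8 oldest events were dropped
print(last_event["event_id"])
print([e["event_id"] for e in recent_events(3)])
```

Because the buffer lives in process memory, anything older than the configured size is only available through the persisted store.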
4. Query persisted events
The query methods are async so they don't block your event loop. They run on the worker's asyncio loop via run_coroutine_threadsafe, so you can call them from any thread.
import asyncio
async def main() -> None:
    last = await client.list_events(limit=5)
    for e in last:
        print(e.event_id, e.model, e.latency_ms, "ms")
    fetched = await client.get_event(event_id=last[0].event_id)
    print(fetched.cost, fetched.parameters)

asyncio.run(main())
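For readers unfamiliar with `run_coroutine_threadsafe`, the sketch below shows the general mechanism using only the standard library: one event loop runs on a background thread, and other threads (or other event loops) submit coroutines to it. The `fetch_events` coroutine and both wrapper functions are hypothetical stand-ins, and how LeanLLM wires this up internally is not shown here.

```python
import asyncio
import threading

# A dedicated event loop on a background thread, standing in for the worker's loop.
worker_loop = asyncio.new_event_loop()
worker_thread = threading.Thread(target=worker_loop.run_forever, daemon=True)
worker_thread.start()


async def fetch_events(limit):
    # Hypothetical query coroutine; a real one would read from storage.
    await asyncio.sleep(0.01)
    return [f"evt-{i}" for i in range(limit)]


def query_from_any_thread(limit):
    # Schedule the coroutine on the worker loop and block this thread for the result.
    future = asyncio.run_coroutine_threadsafe(fetch_events(limit), worker_loop)
    return future.result(timeout=5)


async def query_from_async_code(limit):
    # From a different running loop, wrap the concurrent future so it can be awaited.
    future = asyncio.run_coroutine_threadsafe(fetch_events(limit), worker_loop)
    return await asyncio.wrap_future(future)


sync_result = query_from_any_thread(2)
async_result = asyncio.run(query_from_async_code(3))

# Stop the worker loop cleanly before exiting.
worker_loop.call_soon_threadsafe(worker_loop.stop)
worker_thread.join()
worker_loop.close()
print(sync_result, async_result)
```

The work itself always executes on the worker loop; the caller only waits on a future, which is why the calling thread or loop does not matter.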
5. Replay an event
import asyncio
from leanllm import ReplayEngine, ReplayOverrides
engine = ReplayEngine(client=client)
async def replay() -> None:
    result = await engine.replay_by_id(
        event_id="<event_id from step 4>",
        overrides=ReplayOverrides(parameters={"temperature": 0.0}),
    )
    print(result.summary())
    print("text identical:", result.text_identical)

asyncio.run(replay())
Replay re-issues the original call (same model, same messages, same captured parameters) with optional overrides, and reports whether the new response matches the stored one.
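As a rough mental model of what a replay does, the sketch below merges overrides on top of captured parameters and compares the two response texts. Everything here is an assumption made for illustration: `CapturedCall`, `build_replay_request`, and the standalone `text_identical` function are hypothetical, and the real engine may merge parameters or compare text differently.

```python
from dataclasses import dataclass, field


@dataclass
class CapturedCall:
    # Hypothetical record of what was stored for the original call.
    model: str
    messages: list
    parameters: dict = field(default_factory=dict)
    response_text: str = ""


def build_replay_request(captured, parameter_overrides):
    # Start from the captured parameters; overrides win on key conflicts.
    merged = {**captured.parameters, **parameter_overrides}
    return {"model": captured.model, "messages": captured.messages, **merged}


def text_identical(original, replayed):
    # Assumption: "identical" means exact string equality, with no normalization.
    return original == replayed


captured = CapturedCall(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Reply with a single word."}],
    parameters={"temperature": 0.7, "max_tokens": 16},
    response_text="Hello",
)
request = build_replay_request(captured, {"temperature": 0.0})
print(request["temperature"], request["max_tokens"])
print(text_identical(captured.response_text, "Hello"))
```

Setting `temperature` to 0.0, as in the walkthrough, reduces sampling variance, though provider responses are not guaranteed to be byte-identical even then.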
What to read next
- Configuration: every field on LeanLLMConfig and its env var.
- Request interception: sync vs streaming, error capture.
- Storage query API: full filter set for list_events.