After the talk
Appendix: install and FAQ
The main route stays lean on purpose. The practical setup details and deeper questions live here instead.
Install + first case
Get started in minutes
Install the Python package, add the dashboard CLI, and write a first case that reads like a human expectation instead of a brittle text-matching rule.
What the first setup gives you
Use natural-language expectations for the outcome you want.
Assert expected tool usage when the path matters as much as the answer.
Rerun the same case after prompt and tool changes instead of rewriting checks every sprint.
pip install llm-goose
npm install -g @llm-goose/dashboard-cli
goose init
goose test run tests
goose api
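goose-dashboard

from goose.testing import Goose
from my_agent import get_weather


# weather_goose is a fixture that supplies a Goose instance for the weather agent.
def test_weather_query(weather_goose: Goose) -> None:
    weather_goose.case(
        query="What's the weather like in San Francisco?",
        # Natural-language expectations describe the outcome, not the exact wording.
        expectations=[
            "Agent provides weather information for San Francisco",
            "Response mentions sunny weather and 75°F",
        ],
        # The tool the agent should call on the way to its answer.
        expected_tool_calls=[get_weather],
    )

Questions developers ask first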
FAQ
What exactly is Goose?
Goose is an open-source Python library, CLI, and web dashboard for building, testing, and debugging LLM agents. It focuses on validating behavior and tool usage, not just producing a polished demo answer.
Why not just use regex or keyword assertions?
Because LLMs vary wording even when they behave correctly. Goose lets you describe expectations in natural language while also checking the tools the agent should have used.
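A minimal sketch in plain Python makes the brittleness concrete. It does not use Goose; the two responses and the pattern are invented for illustration. Both responses give the same correct answer, yet only the first satisfies the hard-coded pattern.

```python
import re

# Two hypothetical agent responses that convey the same correct answer.
RESPONSES = [
    "It's sunny and 75°F in San Francisco right now.",
    "San Francisco is seeing clear, sunny skies with a temperature of 75 degrees Fahrenheit.",
]

# A brittle check: it only accepts the literal token "75°F" after "sunny".
BRITTLE_PATTERN = re.compile(r"sunny.*75°F")


def passes_brittle_check(response: str) -> bool:
    """Return True if the response matches the hard-coded pattern."""
    return BRITTLE_PATTERN.search(response) is not None


results = [passes_brittle_check(response) for response in RESPONSES]
print(results)  # [True, False]
```

The second response fails only because it spells out the unit. That is the kind of false alarm a natural-language expectation is meant to avoid.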
Does Goose only work through the browser dashboard?
No. The workflow spans Python tests, a CLI, and a dashboard. You can run suites in the terminal, inspect history in the UI, and debug tools directly.
Can I keep using pytest-style patterns?
Yes. Goose uses pytest-inspired fixtures so teams can reuse setup code and keep testing close to familiar Python workflows.
What kind of bugs is Goose best at catching?
Prompt regressions, wrong tool selection, skipped tool calls, silent workflow drift, and failures where the final answer sounds plausible but the path was wrong.
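To illustrate what checking the path means, here is a rough sketch in plain Python of comparing expected tool calls against the calls recorded during a run. This is not Goose's implementation: ToolCheck, check_tool_calls, and the search_web tool name are hypothetical, and tool calls are reduced to bare names for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCheck:
    """Result of comparing expected tool names with recorded ones (hypothetical)."""

    missing: list[str]     # expected tools that were never called
    unexpected: list[str]  # tools that were called but not expected

    @property
    def ok(self) -> bool:
        return not self.missing and not self.unexpected


def check_tool_calls(expected: list[str], recorded: list[str]) -> ToolCheck:
    """Report which expected tools were skipped and which calls were unexpected."""
    missing = [name for name in expected if name not in recorded]
    unexpected = [name for name in recorded if name not in expected]
    return ToolCheck(missing=missing, unexpected=unexpected)


# Skipped tool call: the answer may sound right, but get_weather never ran.
skipped = check_tool_calls(expected=["get_weather"], recorded=[])
# Wrong tool selection: the agent used a different tool than the case expected.
wrong = check_tool_calls(expected=["get_weather"], recorded=["search_web"])
# Expected path: the recorded calls match the case.
good = check_tool_calls(expected=["get_weather"], recorded=["get_weather"])

print(skipped.ok, wrong.ok, good.ok)  # False False True
```

A check like this fails the first two runs even when the final text would pass a reading of the answer alone.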
Is Goose production-ready?
Goose is open source and evolving fast. It is already useful for serious agent iteration, especially when you need repeatable validation and better visibility into failures.