
After the talk

Appendix: install and FAQ

The main route stays lean on purpose. The practical setup details and deeper questions live here instead.

Install + first case

Get started in minutes

Install the Python package, add the dashboard CLI, and write a first case that reads like a human expectation instead of a brittle text-matching rule.

What the first setup gives you

step 01

Use natural-language expectations for the outcome you want.

step 02

Assert expected tool usage when the path matters as much as the answer.

step 03

Rerun the same case after prompt and tool changes instead of rewriting checks every sprint.

Install Goose
pip install llm-goose
npm install -g @llm-goose/dashboard-cli
CLI and dashboard
goose init
goose test run tests
goose api
goose-dashboard
Minimal test
from goose.testing import Goose
from my_agent import get_weather


def test_weather_query(weather_goose: Goose) -> None:
    # weather_goose is a pytest-style fixture that provides a configured Goose client
    weather_goose.case(
        query="What's the weather like in San Francisco?",
        expectations=[
            "Agent provides weather information for San Francisco",
            "Response mentions sunny weather and 75°F",
        ],
        expected_tool_calls=[get_weather],
    )

Questions developers ask first

FAQ

What exactly is Goose?

Goose is an open-source Python library, CLI, and web dashboard for building, testing, and debugging LLM agents. It focuses on validating behavior and tool usage, not just producing a polished demo answer.

Why not just use regex or keyword assertions?

Because LLMs vary wording even when they behave correctly. Goose lets you describe expectations in natural language while also checking the tools the agent should have used.
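The brittleness is easy to demonstrate with plain Python, no Goose required: a keyword check keyed to one phrasing passes or fails on wording alone, even when both answers describe the same correct behavior. (The two sample answers below are illustrative, not library output.)

```python
import re

# Two answers that describe the same correct agent behavior.
answer_a = "It's sunny in San Francisco, around 75°F."
answer_b = "San Francisco is enjoying clear skies at about 75 degrees."

# A typical keyword assertion, keyed to one particular phrasing.
keyword_check = re.compile(r"sunny", re.IGNORECASE)

print(bool(keyword_check.search(answer_a)))  # True: the phrasing happens to match
print(bool(keyword_check.search(answer_b)))  # False: correct answer, failed check
```

A natural-language expectation like "Agent provides weather information for San Francisco" accepts both answers; the keyword check accepts only one.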

Is Goose just the browser dashboard?

No. The workflow spans Python tests, a CLI, and a dashboard. You can run suites in the terminal, inspect history in the UI, and debug tools directly.

Can I keep using pytest-style patterns?

Yes. Goose uses pytest-inspired fixtures so teams can reuse setup code and keep testing close to familiar Python workflows.

What kind of bugs is Goose best at catching?

Prompt regressions, wrong tool selection, skipped tool calls, silent workflow drift, and failures where the final answer sounds plausible but the path was wrong.
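The "plausible answer, wrong path" failure is worth a concrete sketch. This toy harness (hypothetical names, not the Goose API) records which tools an agent actually invokes, so a test can check the path alongside the answer:

```python
# Toy harness: record which tools the agent actually invoked.
# All names here are illustrative, not part of the Goose API.
calls: list[str] = []

def get_weather(city: str) -> dict:
    calls.append("get_weather")
    return {"city": city, "conditions": "sunny", "temp_f": 75}

def cached_guess(city: str) -> dict:
    calls.append("cached_guess")
    # Plausible but potentially stale data: same shape, wrong path.
    return {"city": city, "conditions": "sunny", "temp_f": 75}

# An agent that skips the live tool can still sound right...
answer = cached_guess("San Francisco")
print(f"{answer['conditions']}, {answer['temp_f']}°F")  # sunny, 75°F

# ...so only an assertion on the call path catches the drift.
print("get_weather" in calls)  # False: the expected tool was never called
```

This is what `expected_tool_calls` in the minimal test above guards against: the final text can look fine while the workflow silently drifted.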

Is Goose production-ready?

Goose is open source and evolving fast. It is already useful for serious agent iteration, especially when you need repeatable validation and better visibility into failures.