
Thinking Mode

Add a pre-reasoning step before each tool execution. A separate (often cheaper) model analyzes the situation and provides context, then the main model acts on those insights.
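The pattern can be sketched roughly as follows. This is a minimal illustration of the two-model flow, not the actual implementation: `call_model` and the config dictionary are hypothetical stand-ins.

```python
# Sketch of the thinking-then-acting pattern (illustrative only).
# A cheap "thinking" model produces an analysis, which is prepended
# to the main model's context before it acts.

def call_model(model, prompt):
    # Stand-in for a real LLM call; here we just echo for illustration.
    return f"[{model}] response to: {prompt[:40]}"

def run_step(user_input, thinking_cfg, main_model="main-model"):
    context = user_input
    # Only invoke the thinking model for sufficiently long inputs.
    if thinking_cfg["enabled"] and len(user_input) >= thinking_cfg["min_input_length"]:
        analysis = call_model(
            thinking_cfg["model"],
            thinking_cfg["prompt"] + "\n\n" + user_input,
        )
        context = f"Analysis:\n{analysis}\n\nTask:\n{user_input}"
    # The main model acts on the (possibly enriched) context.
    return call_model(main_model, context)

cfg = {
    "enabled": True,
    "model": "cheap-model",
    "min_input_length": 50,
    "prompt": "Analyze the situation and suggest the best approach.",
}
print(run_step("Summarize the quarterly report and file the results in the tracker.", cfg))
```

Note that inputs shorter than `min_input_length` bypass the analysis call entirely, so trivial turns pay no extra latency.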

Configuration

agentfile.yaml
execution:
  mode: planned
  thinking:
    enabled: true
    model: gemini-2.5-flash        # cheap model for reasoning
    provider: google
    max_tokens: 2048
    temperature: 0.1
    min_input_length: 50           # skip thinking for short inputs
    prompt: |
      Analyze the situation and suggest the best approach.
      Focus on what tools to use and in what order.

Fields

Field             Default   Description
enabled           false     Enable the thinking step
model             —         Model for the thinking LLM
provider          —         Provider for the thinking LLM
max_tokens        2048      Output token budget for the reasoning step
temperature       0.1       Sampling temperature (low = more focused)
min_input_length  50        Skip thinking for inputs shorter than this (characters)
prompt            Built-in  Custom instruction for the thinking step
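A partial config merged with the documented defaults might behave like this sketch. The merge and validation logic is hypothetical; only the field names and default values come from the table above.

```python
# Sketch of applying the documented defaults to a partial thinking
# config (illustrative, not the real config loader).

DEFAULTS = {
    "enabled": False,
    "max_tokens": 2048,
    "temperature": 0.1,
    "min_input_length": 50,
}

def with_defaults(user_cfg):
    cfg = {**DEFAULTS, **user_cfg}
    # model/provider have no default, so require them when thinking is on.
    if cfg["enabled"] and not (cfg.get("model") and cfg.get("provider")):
        raise ValueError("thinking.model and thinking.provider are required when enabled")
    return cfg

print(with_defaults({"enabled": True, "model": "gemini-2.5-flash", "provider": "google"}))
```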

Runtime overrides

Bash
# Override at runtime without rebuilding — set env vars on the host
AGENTFILE_THINKING_ENABLED=true \
AGENTFILE_THINKING_MODEL=claude-haiku-4-5-20251001 \
AGENTFILE_THINKING_PROVIDER=anthropic \
ninetrix run
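Conceptually, env vars take precedence over the file config. The following sketch shows one plausible override mechanism; the mapping function is hypothetical, though the variable names match the example above.

```python
# Sketch of env-var overrides taking precedence over file config
# (hypothetical mapping, not the real ninetrix loader).
import os

ENV_MAP = {
    "AGENTFILE_THINKING_ENABLED": ("enabled", lambda v: v.lower() == "true"),
    "AGENTFILE_THINKING_MODEL": ("model", str),
    "AGENTFILE_THINKING_PROVIDER": ("provider", str),
}

def apply_env_overrides(cfg, environ=os.environ):
    cfg = dict(cfg)  # don't mutate the caller's config
    for var, (key, cast) in ENV_MAP.items():
        if var in environ:
            cfg[key] = cast(environ[var])
    return cfg

base = {"enabled": False, "model": "gemini-2.5-flash"}
print(apply_env_overrides(base, {"AGENTFILE_THINKING_ENABLED": "true"}))
```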

Cost optimization

Use a cheap, fast model (Haiku, Gemini Flash) for thinking and a capable model (Sonnet, GPT-4o) for execution. The reasoning step adds latency but often reduces total token usage by making better tool choices.
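A back-of-envelope comparison makes the trade-off concrete. The per-million-token prices and token counts below are made-up placeholders, not real provider rates:

```python
# Illustrative cost arithmetic (placeholder prices, not real rates).
# A cheap analysis pass can pay for itself if it shortens the
# expensive model's run by steering it to the right tools.

def cost(tokens, price_per_mtok):
    return tokens / 1_000_000 * price_per_mtok

# Without thinking: the capable model burns extra tokens exploring tools.
without = cost(12_000, 15.0)                    # capable model only

# With thinking: a cheap analysis pass plus a shorter capable-model run.
with_thinking = cost(2_000, 0.5) + cost(7_000, 15.0)

print(f"without: ${without:.3f}, with thinking: ${with_thinking:.3f}")
```

Under these assumed numbers the thinking pass more than pays for itself; whether it does in practice depends on how much it actually shortens the main model's run.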