# Thinking Mode
Add a pre-reasoning step before each tool execution. A separate (often cheaper) model analyzes the situation and provides context, then the main model acts on those insights.
## Configuration
agentfile.yaml

```yaml
execution:
  mode: planned
  thinking:
    enabled: true
    model: gemini-2.5-flash   # cheap model for reasoning
    provider: google
    max_tokens: 2048
    temperature: 0.1
    min_input_length: 50      # skip thinking for short inputs
    prompt: |
      Analyze the situation and suggest the best approach.
      Focus on what tools to use and in what order.
```
## Fields
| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable the thinking step |
| `model` | — | Model for the thinking LLM |
| `provider` | — | Provider for the thinking LLM |
| `max_tokens` | `2048` | Output token budget for the reasoning step |
| `temperature` | `0.1` | Sampling temperature (low = more focused) |
| `min_input_length` | `50` | Skip thinking for inputs shorter than this (characters) |
| `prompt` | Built-in | Custom instruction for the thinking step |
## Runtime overrides
```bash
# Override at runtime without rebuilding: set env vars on the host
AGENTFILE_THINKING_ENABLED=true \
AGENTFILE_THINKING_MODEL=claude-haiku-4-5-20251001 \
AGENTFILE_THINKING_PROVIDER=anthropic \
ninetrix run
```
## Cost optimization
Use a cheap, fast model (Haiku, Gemini Flash) for thinking and a capable model (Sonnet, GPT-4o) for execution. The reasoning step adds latency but often reduces total token usage by making better tool choices.
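A back-of-envelope illustration of the trade-off. All token counts and prices here are purely hypothetical round numbers, chosen only to show the shape of the calculation:

```python
# Hypothetical prices in $ per 1M output tokens (Flash-class vs Sonnet-class).
CHEAP_PRICE, CAPABLE_PRICE = 0.40, 10.00

def cost(tokens: int, price_per_million: float) -> float:
    return tokens * price_per_million / 1e6

# Without thinking: the capable model wanders and retries a wrong tool.
without = cost(6_000, CAPABLE_PRICE)

# With thinking: 500 cheap reasoning tokens guide a shorter execution.
with_thinking = cost(500, CHEAP_PRICE) + cost(3_500, CAPABLE_PRICE)

print(f"without: ${without:.4f}  with: ${with_thinking:.4f}")
```

Under these made-up numbers the thinking run costs roughly half as much despite the extra call; whether that holds in practice depends entirely on how many execution tokens the insight actually saves.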