# Thinking Mode
Add a pre-reasoning step before each tool execution. A separate (often cheaper) model analyzes the situation and provides context, then the main model acts on those insights.
## Configuration
agentfile.yaml

```yaml
execution:
  mode: planned
  thinking:
    enabled: true
    model: gemini-2.5-flash   # cheap model for reasoning
    provider: google
    max_tokens: 2048
    temperature: 0.1
    min_input_length: 50      # skip thinking for short inputs
    prompt: |
      Analyze the situation and suggest the best approach.
      Focus on what tools to use and in what order.
```
## Fields
| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable the thinking step |
| `model` | — | Model for the thinking LLM |
| `provider` | — | Provider for the thinking LLM |
| `max_tokens` | `2048` | Output token budget for the reasoning step |
| `temperature` | `0.1` | Sampling temperature (low = more focused) |
| `min_input_length` | `50` | Skip thinking for inputs shorter than this (characters) |
| `prompt` | Built-in | Custom instruction for the thinking step |
## Runtime overrides
```bash
# Override at runtime without rebuilding: set env vars on the host
AGENTFILE_THINKING_ENABLED=true \
AGENTFILE_THINKING_MODEL=claude-haiku-4-5-20251001 \
AGENTFILE_THINKING_PROVIDER=anthropic \
ninetrix run
```
## Cost optimization
Use a cheap, fast model (Haiku, Gemini Flash) for thinking and a capable model (Sonnet, GPT-4o) for execution. The reasoning step adds latency but often reduces total token usage by making better tool choices.
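A back-of-envelope illustration of the trade-off. All token counts and prices here are purely hypothetical round numbers, chosen only to show the shape of the calculation:

```python
# Hypothetical prices in $ per 1M output tokens (Flash-class vs Sonnet-class).
CHEAP_PRICE, CAPABLE_PRICE = 0.40, 10.00

def cost(tokens: int, price_per_million: float) -> float:
    return tokens * price_per_million / 1e6

# Without thinking: the capable model wanders and retries a wrong tool.
without = cost(6_000, CAPABLE_PRICE)

# With thinking: 500 cheap reasoning tokens guide a shorter execution.
with_thinking = cost(500, CHEAP_PRICE) + cost(3_500, CAPABLE_PRICE)

print(f"without: ${without:.4f}  with: ${with_thinking:.4f}")
```

Under these made-up numbers the thinking run costs roughly half as much despite the extra call; whether that holds in practice depends entirely on how many execution tokens the insight actually saves.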