# Agents

How agents work in Ninetrix — system prompts, metadata, runtime config, and execution.
An agent is a Docker container running a generated Python runtime. It reads a system prompt built from metadata, calls an LLM via the configured provider, executes tool calls, and loops until the task is complete.
## System prompt
The system prompt is automatically generated from your metadata block:
```yaml
metadata:
  role: Senior software engineer
  goal: Write production-quality code and tests
  instructions: |
    You write clean, well-documented Python code.
    Always add type hints. Always write tests.
    Ask clarifying questions before starting complex tasks.
  constraints:
    - Never modify files outside the project directory
    - Always run tests before declaring a task complete
```
> **Instructions are powerful**
>
> The `instructions` field is the most impactful field for agent quality. Be explicit about what the agent should and should not do. Treat it like a detailed job description.

## Runtime configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | string | — | LLM provider: `anthropic`, `openai`, `google`, `deepseek`, `mistral`, `groq`, `together_ai`, `openrouter`, `cerebras`, `fireworks_ai`, `bedrock`, `azure`, `minimax` |
| `model` | string | — | Model ID, e.g. `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.0-flash` |
| `temperature` | float | `0.2` | Sampling temperature, 0.0–2.0 |
| `resources.cpu` | string | — | Docker `--cpus` limit, e.g. `"1.0"` |
| `resources.memory` | string | — | Docker memory limit, e.g. `"2Gi"`, `"512Mi"` |
| `resources.base_image` | string | `python:3.12-slim` | Override the Dockerfile `FROM` image |
| `resources.warm_pool` | bool | `false` | Keep the container alive after a run completes (used by `ninetrix up`) |
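Putting these fields together, here is a hedged sketch of a runtime block; the exact nesting inside your agent definition may differ, and the values are illustrative:

```yaml
provider: anthropic
model: claude-sonnet-4-6
temperature: 0.2
resources:
  cpu: "1.0"
  memory: "2Gi"
  base_image: python:3.12-slim
  warm_pool: true
```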
## Execution loop
The default execution loop is a standard agentic tool-use pattern:
- A user message arrives (stdin, webhook, or schedule trigger)
- LLM generates a response — either plain text or a tool call
- If tool call: execute the tool, append the result to the message history
- Loop back to the LLM with updated history
- Stop when the LLM returns plain text with no tool calls, or `max_steps` is reached
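The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the generated runtime; `call_llm` and `run_tool` are hypothetical stand-ins, not Ninetrix APIs:

```python
# Minimal sketch of the agentic tool-use loop described above.
# call_llm and run_tool are hypothetical placeholders, not Ninetrix APIs.
MAX_STEPS = 10  # mirrors the max_steps default

def call_llm(history):
    # Placeholder: a real runtime calls the configured provider here and
    # returns either ("text", content) or ("tool", name, args).
    ...

def run_tool(name, args):
    # Placeholder: execute the named tool and return its result.
    ...

def run_agent(user_message, llm=call_llm, tool=run_tool):
    history = [{"role": "user", "content": user_message}]
    for _ in range(MAX_STEPS):
        response = llm(history)
        if response[0] == "text":
            # Plain text with no tool calls: the task is complete.
            return response[1]
        # Tool call: execute it, append the result, and loop with updated history.
        _, name, args = response
        history.append({"role": "assistant", "tool": name, "args": args})
        history.append({"role": "tool", "content": tool(name, args)})
    return None  # max_steps reached without a final text answer
```

Injecting `llm` and `tool` as parameters keeps the sketch testable without a live provider.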
| Constant | Default | Description |
|---|---|---|
| `max_steps` | 10 | Maximum tool-call iterations before stopping |
| `TOOL_TIMEOUT` | 30s | Timeout per tool call |
| `MAX_TOKENS` | 8192 | Max output tokens per LLM call |
| `HISTORY_WINDOW_TOKENS` | 90,000 | Sliding-window token budget — the oldest messages are trimmed when it is exceeded; measured with `litellm.token_counter()` |
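The sliding-window trim can be sketched as follows. This is a minimal illustration, not the actual runtime code: it assumes the system prompt is pinned (never trimmed), and the token counter is injected so the sketch stays self-contained, whereas the real runtime measures with `litellm.token_counter()`:

```python
# Sketch of the sliding-window history trim described above.
# Assumption: the first message (system prompt) is always kept.
HISTORY_WINDOW_TOKENS = 90_000

def trim_history(messages, count_tokens, budget=HISTORY_WINDOW_TOKENS):
    """Drop the oldest non-system messages until the history fits the budget."""
    system, rest = messages[0], list(messages[1:])
    while rest and count_tokens([system] + rest) > budget:
        rest.pop(0)  # trim the oldest non-system message first
    return [system] + rest
```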
## Entry agent
In a multi-agent `agentfile.yaml`, the first key in the `agents:` map is the entry agent — the one that receives the initial message. Subsequent agents are started by handoffs.
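For illustration, a hypothetical two-agent `agentfile.yaml`; the agent names and their metadata are assumptions, not a prescribed layout:

```yaml
agents:
  planner:   # first key → entry agent, receives the initial message
    metadata:
      role: Project planner
  coder:     # started only when a handoff targets it
    metadata:
      role: Senior software engineer
```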
## Pre-run thinking
Enable a reasoning step before the agent starts executing. The thinking output is stored in the checkpoint history:
```yaml
execution:
  thinking:
    enabled: true
    provider: anthropic
    model: claude-opus-4-6
    max_tokens: 2048
    prompt: "Think carefully about the best strategy before acting."
```