# Agents

How agents work in Ninetrix — system prompts, metadata, runtime config, and execution.
An agent is a Docker container running a generated Python runtime. It reads a system prompt built from metadata, calls an LLM via the configured provider, executes tool calls, and loops until the task is complete.
## System prompt
The system prompt is automatically generated from your metadata block:
```yaml
metadata:
  role: Senior software engineer
  goal: Write production-quality code and tests
  instructions: |
    You write clean, well-documented Python code.
    Always add type hints. Always write tests.
    Ask clarifying questions before starting complex tasks.
  constraints:
    - Never modify files outside the project directory
    - Always run tests before declaring a task complete
```
> **Instructions are powerful**
>
> The `instructions` field is the most impactful field for agent quality. Be explicit about what the agent should and should not do. Treat it like a detailed job description.

## Runtime configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | string | — | LLM provider: `anthropic`, `openai`, `google`, `deepseek`, `mistral`, `groq`, `together_ai`, `openrouter`, `cerebras`, `fireworks_ai`, `bedrock`, `azure`, `minimax` |
| `model` | string | — | Model ID, e.g. `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.0-flash` |
| `temperature` | float | `0.2` | Sampling temperature, 0.0–2.0 |
| `resources.cpu` | string | — | Docker `--cpus` limit, e.g. `"1.0"` |
| `resources.memory` | string | — | Docker memory limit, e.g. `"2Gi"`, `"512Mi"` |
| `resources.base_image` | string | `python:3.12-slim` | Override the Dockerfile `FROM` image |
| `resources.warm_pool` | bool | `false` | Keep the container alive after a run completes (used by `ninetrix up`) |
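Putting these fields together, here is a hedged sketch of a runtime block; the exact nesting inside your agent definition may differ, and the values are illustrative:

```yaml
provider: anthropic
model: claude-sonnet-4-6
temperature: 0.2
resources:
  cpu: "1.0"
  memory: "2Gi"
  base_image: python:3.12-slim
  warm_pool: true
```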
## Execution loop
The default execution loop is a standard agentic tool-use pattern:
- A user message arrives (stdin, webhook, or schedule trigger)
- LLM generates a response — either plain text or a tool call
- If tool call: execute the tool, append the result to the message history
- Loop back to the LLM with updated history
- Stop when the LLM returns plain text with no tool calls, or `max_steps` is reached
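The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the generated runtime; `call_llm` and `run_tool` are hypothetical stand-ins, not Ninetrix APIs:

```python
# Minimal sketch of the agentic tool-use loop described above.
# call_llm and run_tool are hypothetical placeholders, not Ninetrix APIs.
MAX_STEPS = 10  # mirrors the max_steps default

def call_llm(history):
    # Placeholder: a real runtime calls the configured provider here and
    # returns either ("text", content) or ("tool", name, args).
    ...

def run_tool(name, args):
    # Placeholder: execute the named tool and return its result.
    ...

def run_agent(user_message, llm=call_llm, tool=run_tool):
    history = [{"role": "user", "content": user_message}]
    for _ in range(MAX_STEPS):
        response = llm(history)
        if response[0] == "text":
            # Plain text with no tool calls: the task is complete.
            return response[1]
        # Tool call: execute it, append the result, and loop with updated history.
        _, name, args = response
        history.append({"role": "assistant", "tool": name, "args": args})
        history.append({"role": "tool", "content": tool(name, args)})
    return None  # max_steps reached without a final text answer
```

Injecting `llm` and `tool` as parameters keeps the sketch testable without a live provider.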
| Constant | Default | Description |
|---|---|---|
| `max_steps` | 10 | Maximum tool-call iterations before stopping |
| `TOOL_TIMEOUT` | 30s | Timeout per tool call |
| `MAX_TOKENS` | 8192 | Max output tokens per LLM call |
| `HISTORY_WINDOW_TOKENS` | 90,000 | Sliding-window token budget — the oldest messages are trimmed when it is exceeded; measured with `litellm.token_counter()` |
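The sliding-window trim can be sketched as follows. This is a minimal illustration, not the actual runtime code: it assumes the system prompt is pinned (never trimmed), and the token counter is injected so the sketch stays self-contained, whereas the real runtime measures with `litellm.token_counter()`:

```python
# Sketch of the sliding-window history trim described above.
# Assumption: the first message (system prompt) is always kept.
HISTORY_WINDOW_TOKENS = 90_000

def trim_history(messages, count_tokens, budget=HISTORY_WINDOW_TOKENS):
    """Drop the oldest non-system messages until the history fits the budget."""
    system, rest = messages[0], list(messages[1:])
    while rest and count_tokens([system] + rest) > budget:
        rest.pop(0)  # trim the oldest non-system message first
    return [system] + rest
```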
## Entry agent
In a multi-agent `agentfile.yaml`, the first key in the `agents:` map is the entry agent — the one that receives the initial message. Subsequent agents are started by handoffs.
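For illustration, a hypothetical two-agent `agentfile.yaml`; the agent names and their metadata are assumptions, not a prescribed layout:

```yaml
agents:
  planner:   # first key → entry agent, receives the initial message
    metadata:
      role: Project planner
  coder:     # started only when a handoff targets it
    metadata:
      role: Senior software engineer
```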
## Pre-run thinking
Enable a reasoning step before the agent starts executing. The thinking output is stored in the checkpoint history:
```yaml
execution:
  thinking:
    enabled: true
    provider: anthropic
    model: claude-opus-4-6
    max_tokens: 2048
    prompt: "Think carefully about the best strategy before acting."
```