statewright
Agents are suggestions, states are laws.
State machine guardrails that control which tools your AI agent can use in each phase. Define a workflow once, enforce it across Claude Code, Codex, Cursor, opencode, and Pi. Full docs →
The problem
AI agents are brittle. Give a model 40+ tools and an open-ended problem and it re-reads the same file five times, calls Edit during review, and deploys before tests pass. The common fix is bigger models and longer prompts, which helps only some of the time. Observability tells you what went wrong after the fact; it doesn't prevent it.
The approach
Instead of making the model bigger, make the problem smaller.
State machines constrain the tool and solution spaces so the model reasons in a focused context at each step. A planning state gets read-only tools. When the agent transitions to implementation, edit tools unlock with limited shell access. Write-via-redirect and destructive ops are still blocked even when Bash is allowed. Testing only permits designated test commands.
Call a tool that's not in the current phase and you get rejected with a message telling you what IS available and how to transition. State machines loop and retry (unlike DAGs), which is what agentic work actually needs.
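The rejection behavior described above can be sketched in a few lines of Python. This is an illustration only, not the Rust engine: the `State` and `Workflow` classes, the `check_tool` method, and the wording of the rejection message are all hypothetical. The state names and tool lists are borrowed from the bugfix workflow shown later in this README.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    allowed_tools: set
    on: dict = field(default_factory=dict)  # event name -> target state


class Workflow:
    """Toy per-state tool gate (hypothetical; not the statewright engine)."""

    def __init__(self, initial, states):
        self.current = initial
        self.states = states

    def check_tool(self, tool):
        """Return (allowed, message) for a tool call in the current state."""
        state = self.states[self.current]
        if tool in state.allowed_tools:
            return True, "ok"
        # A rejection names what IS available and how to move on.
        available = ", ".join(sorted(state.allowed_tools))
        events = ", ".join(f"{e} -> {t}" for e, t in state.on.items())
        return False, (
            f"'{tool}' is not allowed in '{self.current}'. "
            f"Available: {available}. Transitions: {events}."
        )

    def transition(self, event):
        """Follow an event edge; unknown events raise instead of moving."""
        target = self.states[self.current].on.get(event)
        if target is None:
            raise ValueError(f"no transition for {event!r} from {self.current!r}")
        self.current = target
        return target


wf = Workflow(
    initial="planning",
    states={
        "planning": State({"Read", "Grep", "Glob"}, {"READY": "implementing"}),
        "implementing": State({"Read", "Edit", "Write"}, {"DONE": "testing"}),
        # FAIL_TEST loops back, which a DAG could not express.
        "testing": State({"Read", "Bash"}, {"FAIL_TEST": "implementing"}),
    },
)

allowed, message = wf.check_tool("Edit")   # rejected while planning
wf.transition("READY")
allowed_after, _ = wf.check_tool("Edit")   # accepted once implementing
```

The point of returning a message rather than a bare refusal is that the model can recover on its own: it learns which tools exist in this phase and which event moves it forward.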
Works on frontier and local models alike. Below 13GB, models can produce tool calls but can't retain enough file content to make accurate edits. Above that threshold, the guardrails start turning failures into completions.
Quickstart
Install into Claude Code:
/plugin marketplace add statewright/statewright
/plugin install statewright
Your browser opens → sign up at statewright.ai → generate a key → paste it → done.
Then start a workflow:
❯ start the bugfix workflow — fix the failing tests in calc.py
◆ statewright — statewright_start (workflow: bugfix)
◆ [statewright] Workflow activated: bugfix
◆ statewright — statewright_get_state (MCP)
◆ Current phase: planning. Let me read the code first.
  Read 2 files
  [statewright] planning => implementing
◆ statewright — statewright_transition (READY)
  Edit calc.py: 1 line changed
  [statewright] implementing => testing
◆ statewright — statewright_transition (DONE)
  Bash: pytest -x — 7 passed
  [statewright] testing => completed
◆ [statewright] Workflow complete. 46 seconds.
You can also use the slash command directly: /statewright start bugfix.
Research results
In our 5-task SWE-bench subset (not the full 2294-instance benchmark), two local models went from 2 of 10 attempts passing to 10 of 10 with statewright constraints. Same tasks, same hardware.
| Model | Size | Bug Fix (26 lines) | SWE-bench (5 tasks) |
|---|---|---|---|
| gemma3 | 3.3GB | FAIL | FAIL |
| gemma4:e2b | 7.2GB | PASS* | FAIL |
| gpt-oss:20b | 13.8GB | PASS | PASS (5/5) |
| gemma4:31b | 19.9GB | PASS | PASS (5/5) |
| llama3.3 | 42.5GB | PASS | PASS (2/2)† |

*with specialized edit_line tool adaptation
†tested on 2 of the 5 tasks (added after initial experiment run)
The floor is around 13GB. Below that, models identify bugs correctly but can't serialize surgical edits (they rewrite entire files). That's a model limitation, not ours.
The structural win on larger models is breaking read-loop death spirals and keeping the tool space small enough that the model reasons instead of flailing. Research brief →
How it works
Architecture
Three layers, each independently useful:
Engine (crates/engine) — Pure Rust state machine evaluator. States, transitions, guards, tool restrictions. Deterministic. No LLM in the loop. No runtime dependencies.
Agent binary (crates/cli, binary: sw-agent) — Direct-to-Ollama agent executor. Loads a workflow, runs the LLM in a constrained loop, enforces tool access, and streams structured JSONL events. Supports per-state model routing via --config, and single-state execution via --state (the TUI or MCP gateway orchestrates, sw-agent executes one state at a time and exits).
Plugin layer (crates/mcp-gateway + plugins/) — MCP gateway that integrates with coding agents (Claude Code, Codex, Pi, etc.). When you activate a workflow, hooks enforce tool restrictions per state. The model sees 5 tools instead of 30. It gets clear instructions for the current phase and transitions when conditions are met. The statewright_run_agent MCP tool spawns the Rust binary for states that benefit from direct Ollama execution.
The TUI (crates/tui, binary: statewright) is a ratatui terminal interface that spawns sw-agent as a subprocess and renders its JSONL event stream in real time. It handles keyboard input, demo mode, and fixture selection.
Per-state model routing
States can specify which model to use via the model field. A default_model in meta applies to states without an explicit override. Clients that support programmatic model switching (Pi, the Rust harness) enforce this; others treat it as advisory.
{
  "meta": { "default_model": "claude-sonnet-4-20250514" },
  "states": {
    "diagnose": {
      "model": "claude-haiku-4-5-20251001",
      "allowed_tools": ["Read", "Bash"]
    },
    "propose_fix": {
      "model": "anthropic/claude-opus-4-6",
      "allowed_tools": ["Read"]
    },
    "execute": {
      "allowed_tools": ["Read", "Edit", "Bash"]
    }
  }
}
In this example, diagnose uses Haiku (fast, cheap reconnaissance), propose_fix escalates to Opus (high-stakes reasoning), and execute inherits the default_model (Sonnet). The sw-agent binary also accepts a --config file with a model_routing block for per-state Ollama URL, temperature, and context window overrides.
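The fallback rule (a state's own `model` wins, otherwise `meta.default_model` applies) can be written out as a small lookup. This is a sketch of the rule as described here, not code from the project; `resolve_model` is a hypothetical helper name.

```python
from typing import Optional


def resolve_model(workflow: dict, state_name: str) -> Optional[str]:
    """Return the state's explicit model, else meta.default_model, else None."""
    state = workflow["states"][state_name]
    return state.get("model") or workflow.get("meta", {}).get("default_model")


# Same definition as the example above.
workflow = {
    "meta": {"default_model": "claude-sonnet-4-20250514"},
    "states": {
        "diagnose": {
            "model": "claude-haiku-4-5-20251001",
            "allowed_tools": ["Read", "Bash"],
        },
        "propose_fix": {
            "model": "anthropic/claude-opus-4-6",
            "allowed_tools": ["Read"],
        },
        "execute": {"allowed_tools": ["Read", "Edit", "Bash"]},
    },
}

routing = {name: resolve_model(workflow, name) for name in workflow["states"]}
```

Here `routing["execute"]` resolves to the default model because that state has no `model` field of its own.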
Guardrails
| Guardrail | What it does |
|---|---|
| Per-state tool enforcement | Agent can't see or call tools outside allowed_tools for the current state |
| Bash discernment | Blocks echo > file, rm -rf, sed -i, and scripting interpreters (python, node) when Write/Edit aren't allowed, even if Bash itself is permitted |
| Edit guards | Rejects diffs exceeding max_edit_lines, caps files edited per state |
| Command allow-lists | Only prefix-matched commands run (e.g. pytest, cargo test) |
| Conditional transitions | Programmatic guards on context data: test_result eq pass, coverage gt 80 |
| Approval gates | requires_approval pauses for human review |
| Interrupts | Edit a file matching a glob pattern? Auto-transition to a validation state, then return where you were |
| Fork/join | Run branches sequentially or in parallel, join when all (or any) complete |
| Environment scoping | Hide PROD_DB_URL via blocked_env, substitute with env_overrides |
| Session isolation | Per-session state via CLAUDE_SESSION_ID |
| Per-state model routing | Route cheap states to small models, expensive states to frontier models. model per state, default_model in meta |
| Thinking level control | Per-state thinking_level field (high, medium, low, off) for clients that support reasoning effort tuning |
| Tool escalation detection | Validator warns when a state jumps 2+ privilege levels without an approval gate |
Full guardrail reference in the docs.
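To make the Bash discernment and command allow-list rows concrete, here is a deliberately naive Python sketch. The regular expressions, the `check_bash` function, and the reason strings are assumptions made for illustration; the real rule set is not spelled out in this README and is likely more thorough (this version would, for example, miss an interpreter invoked after `cd x &&`). Matching a prefix only at a word boundary is also an assumption.

```python
import re

# Hypothetical patterns, checked only when Write/Edit are not allowed.
BLOCKED = [
    (re.compile(r">"), "output redirection"),
    (re.compile(r"\brm\s+-\w*r"), "recursive rm"),
    (re.compile(r"\bsed\b.*\s-i\b"), "in-place sed"),
    (re.compile(r"^\s*(?:python3?|node)\b"), "scripting interpreter"),
]


def check_bash(command, write_allowed, allowed_commands=None):
    """Return (ok, reason) for a shell command in the current state."""
    if allowed_commands is not None:
        # Prefix match on whole words, so "pytest-evil" does not pass as "pytest".
        if not any(
            command == prefix or command.startswith(prefix + " ")
            for prefix in allowed_commands
        ):
            return False, "not in allowed_commands"
    if not write_allowed:
        for pattern, reason in BLOCKED:
            if pattern.search(command):
                return False, reason
    return True, "ok"


test_commands = ["pytest", "cargo test", "npm test"]

r_redirect = check_bash("echo hi > notes.txt", write_allowed=False)
r_rm = check_bash("rm -rf build", write_allowed=False)
r_sed = check_bash("sed -i 's/a/b/' calc.py", write_allowed=False)
r_python = check_bash("python fix.py", write_allowed=False)
r_pytest = check_bash("pytest -x", False, test_commands)
r_build = check_bash("cargo build", False, test_commands)
```

Treating any `>` as a write is conservative on purpose: in a state without edit rights, a false rejection costs one retry, while a missed redirect defeats the guardrail.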
Define your own workflows
{
  "id": "bugfix",
  "initial": "planning",
  "meta": { "default_model": "claude-sonnet-4-20250514" },
  "states": {
    "planning": {
      "allowed_tools": ["Read", "Grep", "Glob"],
      "model": "claude-haiku-4-5-20251001",
      "thinking_level": "low",
      "max_iterations": 8,
      "on": { "READY": "implementing" }
    },
    "implementing": {
      "allowed_tools": ["Read", "Edit", "Write"],
      "max_edit_lines": 20,
      "max_files_per_state": 3,
      "on": { "DONE": "testing" }
    },
    "testing": {
      "allowed_tools": ["Read", "Bash"],
      "allowed_commands": ["pytest", "cargo test", "npm test"],
      "on": {
        "PASS": { "target": "completed", "guard": "tests_passed" },
        "FAIL_TEST": "implementing"
      }
    },
    "completed": { "type": "final" }
  },
  "guards": {
    "tests_passed": { "field": "test_result", "op": "eq", "value": "pass" }
  }
}
Point your agent at the JSON schema and it generates a workflow via statewright_create_workflow. Tweak tools, commands, and environment blocks in the visual editor.
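For readers who want to see how a guarded transition such as `PASS` might be evaluated, here is a minimal Python sketch against the `testing` state of the definition above. It is an assumption about the semantics, not the engine's code: `guard_passes` and `next_state` are hypothetical names, only the `eq` and `gt` operators mentioned in this README are handled, and a missing context field is assumed to fail the guard.

```python
import operator

# Only the two operators this README names; others are out of scope here.
OPS = {"eq": operator.eq, "gt": operator.gt}


def guard_passes(guard, context):
    """Evaluate one guard ({field, op, value}) against context data."""
    if guard["field"] not in context:
        return False  # assumption: missing data fails closed
    return OPS[guard["op"]](context[guard["field"]], guard["value"])


def next_state(workflow, current, event, context):
    """Return the target state, or None if the event is unknown or guarded off."""
    edge = workflow["states"][current].get("on", {}).get(event)
    if edge is None:
        return None
    if isinstance(edge, str):  # shorthand form: "EVENT": "target"
        return edge
    guard_name = edge.get("guard")
    if guard_name and not guard_passes(workflow["guards"][guard_name], context):
        return None
    return edge["target"]


# Trimmed copy of the bugfix workflow: just the testing state and its guard.
workflow = {
    "states": {
        "testing": {
            "on": {
                "PASS": {"target": "completed", "guard": "tests_passed"},
                "FAIL_TEST": "implementing",
            }
        }
    },
    "guards": {
        "tests_passed": {"field": "test_result", "op": "eq", "value": "pass"}
    },
}

blocked = next_state(workflow, "testing", "PASS", {"test_result": "fail"})
done = next_state(workflow, "testing", "PASS", {"test_result": "pass"})
retry = next_state(workflow, "testing", "FAIL_TEST", {})
```

With this shape, an agent that emits `PASS` while the recorded test result is still `fail` stays in `testing`, and `FAIL_TEST` sends it back to `implementing` for another attempt.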
Supported agents
Hard enforcement means tool calls are intercepted at the hook layer before execution. Advisory means rules are injected into context but the model isn't prevented from ignoring them.
| Agent | Integration | Enforcement |
|---|---|---|
| Claude Code | Hooks + MCP | Hard |
| Codex | Hooks + MCP | Hard |
| Oh My Codex | Hooks + MCP | Hard |
| Pi | TypeScript extension | Hard* |
| opencode | TypeScript plugin | Hard (alpha) |
| Cursor | MCP + rules | Advisory |
*Pi includes tool name normalization and tool-call recovery for local models (Ollama, LM Studio).
MCP tools
The gateway exposes these tools to the connected agent:
| Tool | Purpose |
|---|---|
| statewright_load_workflow | Activate a named workflow, optionally resuming a paused run |
| statewright_get_state | Current state, allowed tools, transitions, iteration count, model, thinking level |
| statewright_transition | Emit an event to advance the state machine |
| statewright_list_workflows | List available workflows and which is active |
| statewright_create_workflow | Create a new workflow from a JSON definition |
| statewright_pause | Pause the current run; resume later with load_workflow(resume=true) |
| statewright_deactivate | Turn off enforcement; all tools pass through |
| statewright_get_status | Gateway health: active workflow, state, available workflows |
| statewright_run_agent | Spawn the Rust agent executor (sw-agent) for direct-to-Ollama bug fixing |
| statewright_force_state | Jump to any state bypassing guards (debug mode only, gated on meta.debug) |
Pricing
The managed cloud at statewright.ai handles workflow storage, run history, and the MCP gateway. Prices won't go up.
| Plan | Workflows | Transitions/mo | Run History | Price |
|---|---|---|---|---|
| Free | 3 | 200 | 72 hours | $0 |
| Pro | 10 | 2500 | 7 days | $29/mo |
| Team | 30 | 10000 | 90 days | $99/mo |
| Enterprise | Unlimited | Unlimited | to Specification | Contact us |
Self-hosting
Run the full stack locally with Docker Compose — PocketBase, MCP gateway, and workflow editor. BYO Ollama. Self-hosted guide →
cd self-hosted && docker compose up --build
The engine (crates/engine) and agent layer (crates/agent) are Apache 2.0, embeddable with no runtime dependencies. The MCP gateway is FSL-1.1-ALv2 (converts to Apache 2.0 in 2029). Single-developer and single-team self-hosting is permitted under the FSL license.
Tradeoffs
Requires MCP support in the agent (or hooks for non-MCP agents like Codex)
Workflow definitions are authored by hand, though agents can generate them via statewright_create_workflow
Cursor enforcement is advisory, not hard. MCP alone can't gate tool calls in Cursor's architecture
Research results are from a 5-task SWE-bench subset, not the full 2294-instance benchmark
If a workflow is too restrictive, the agent gets stuck. statewright_deactivate is the escape hatch
Docs
docs.statewright.ai — install guide, workflow authoring, schema reference, MCP tool reference, and agent-generated workflows.
Contributing
Workflow definitions, templates, and bug reports welcome. See Create Your Own for how to write workflows.
Report an issue
Discussions & feedback
License
Apache 2.0 — portions FSL-1.1-ALv2 (converts to Apache 2.0 on May 3, 2029). Managed cloud at statewright.ai.
This project includes a patent pledge covering independent implementations of the techniques described in the patent. Solo developers, researchers, open source projects, and single-team self-hosted deployments are covered regardless of whether they use Statewright software.
One hook to rule them all.