1 Why Stone
Stone is a Racket library for building agent harnesses (the runtime infrastructure around an LLM that turns it into an agent) as composable, validatable pipelines. In Stone, every step is the same kind of object, an ashlar: the deterministic glue, the LLM call, the multi-turn tool-using agent, the human-in-the-loop approval. Ashlars share a typed, append-only DAG; they coordinate by writing nodes to it and reading nodes from it; and every run leaves behind a complete, content-addressed record of what happened.
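The phrase "typed, append-only, content-addressed DAG" carries much of the design, so a minimal sketch in plain Racket may help fix the idea. It is not Stone's API: `make-dag`, `dag-add!`, `dag-ref`, `dag-query`, and the `node` struct are hypothetical names chosen for illustration, node "types" are bare symbols, and ids are a SHA-1 (from Racket's standard `file/sha1` library) over each node's printed type, payload, and parent ids. Stone's actual hashing scheme is not described here; SHA-1 is only a convenient stand-in, and the sketch assumes payloads print deterministically (strings, numbers, lists).

```racket
#lang racket/base
;; Hypothetical sketch of a typed, append-only, content-addressed DAG.
;; These names are illustrative only; they are not Stone's API.
(require file/sha1)

;; A node records its type tag, payload, and the ids of the nodes it came from.
(struct node (id type data parents) #:transparent)

;; The DAG is a box holding nodes, newest first. Nodes are never removed.
(define (make-dag) (box '()))

;; Content address: SHA-1 over the printed type, payload, and parent ids.
;; Assumes payloads print deterministically (strings, numbers, lists).
(define (content-id type data parents)
  (sha1 (open-input-string (format "~s" (list type data parents)))))

;; Look up a node by id, or return #f.
(define (dag-ref dag id)
  (for/first ([n (in-list (unbox dag))] #:when (equal? (node-id n) id))
    n))

;; Append a node and return it. Identical content yields the identical id,
;; so re-adding a node returns the existing one instead of duplicating it.
(define (dag-add! dag type data [parents '()])
  (define id (content-id type data parents))
  (or (dag-ref dag id)
      (let ([n (node id type data parents)])
        (set-box! dag (cons n (unbox dag)))
        n)))

;; All nodes of a given type, oldest first.
(define (dag-query dag type)
  (for/list ([n (in-list (reverse (unbox dag)))]
             #:when (eq? (node-type n) type))
    n))

(module+ main
  (define d (make-dag))
  (define task (dag-add! d 'task "summarize README"))
  (define summary
    (dag-add! d 'summary "Stone is a harness kit" (list (node-id task))))
  (printf "~a nodes; summary derived from ~s\n"
          (length (unbox d))
          (node-data (dag-ref d (car (node-parents summary))))))
```

Because ids are derived from content, appending the same node twice returns the existing node rather than a duplicate, and two runs can be compared node by node through their ids alone.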
If you’ve arrived via Anthropic’s, LangChain’s, or smolagents’ writing about "agent harnesses," the Harness vs. pipeline explanation maps the industry vocabulary onto Stone’s. Short version: Stone is a harness construction kit, not a deployed harness in the shape of Claude Code or Codex. You compose your own harness from ashlars.
This document sets out Stone’s positioning: what Stone is for, where it sits in the landscape, and when you should reach for something else.
1.1 What Stone is for
Stone is the right tool when you want LLM work to ship as a pipeline you can retry, inspect, and recombine — not as a single opaque prompt.
The canonical shape of a Stone pipeline is this: a deterministic seed step puts something into a typed DAG. A language-model step reads it, produces structured output, and writes the output back. A validation step checks the result and either passes it through or emits a failure. A loop keeps trying until validation passes, asking a human for clarification when it can’t. A reducer collects everything and ships it.
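The five-step shape above can be sketched end to end in plain Racket. This is a hypothetical illustration, not Stone code: `emit!`, `latest`, `seed!`, `propose!`, `validate!`, `run!`, and `reduce` are made-up names, the shared DAG is reduced to a list of typed nodes in a box, and the language-model step is a stub that returns an invalid proposal until it sees a failure node. A real model call would sit where the stub is.

```racket
#lang racket/base
;; Hypothetical sketch of the canonical pipeline shape: seed, model step,
;; validation, retry loop with a human fallback, reducer. Not Stone's API.
(require racket/string)

(struct node (type data) #:transparent)
(define dag (box '()))                     ; shared, append-only medium

(define (emit! type data)                  ; every step writes here...
  (set-box! dag (cons (node type data) (unbox dag))))
(define (latest type)                      ; ...and reads from here
  (for/first ([n (in-list (unbox dag))] #:when (eq? (node-type n) type))
    (node-data n)))

;; 1. Deterministic seed step.
(define (seed! input) (emit! 'input input))

;; 2. Stub "LLM" step producing structured output (a hash). It only gets the
;;    answer right once a failure node exists, to exercise the retry path.
(define (propose!)
  (emit! 'proposal
         (if (latest 'failure)
             (hash 'title (string-upcase (latest 'input)))
             (hash 'title ""))))

;; 3. Validation step: pass the proposal through, or emit a failure node.
(define (validate!)
  (define p (latest 'proposal))
  (cond
    [(non-empty-string? (hash-ref p 'title "")) (emit! 'validated p) #t]
    [else (emit! 'failure "title must be non-empty") #f]))

;; 4. Loop until validation passes; after max-tries, ask a human.
(define (run! input
              #:max-tries [max-tries 3]
              #:ask-human [ask-human (lambda (why) "Manual Title")])
  (seed! input)
  (let loop ([tries 1])
    (propose!)
    (cond
      [(validate!) 'ok]
      [(< tries max-tries) (loop (add1 tries))]
      [else
       (emit! 'clarification (ask-human (latest 'failure)))
       (emit! 'proposal (hash 'title (latest 'clarification)))
       (if (validate!) 'human 'gave-up)])))

;; 5. Reducer: collect every validated result, oldest first.
(define (reduce)
  (for/list ([n (in-list (reverse (unbox dag)))]
             #:when (eq? (node-type n) 'validated))
    (node-data n)))
```

Here `(run! "stone")` returns `'ok` after one failed attempt, and `(reduce)` then returns a one-element list holding the validated proposal. The failed attempt is not lost: it stays in the box as a `'failure` node next to everything else the run produced.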
Every one of those steps is an ashlar. Every interaction between them goes through the DAG. Every run produces the same kind of artifact: a typed graph of nodes you can replay, diff, or reason about without the code that produced it.
1.2 Who Stone is for
Stone is for three overlapping audiences.
Engineers building LLM pipelines who need more control than an agentic assistant gives them. Claude Code and Aider are extraordinary when the task is "do my job for me, end to end." Stone is what you reach for when the task is "I want this pipeline to run a thousand times across a thousand inputs and produce an auditable record of each run." The shape of the work is different: you are not prompting an autonomous agent; you are building the agent’s scaffolding and handing it a pipeline to execute.
Engineers new to pipeline-style or "dark factory" workflows. If you haven’t worked with pipelines before, the intuition Stone asks for is this: every interesting piece of work has a before and an after, and the transitions between them are what you care about. Stone makes those transitions first-class. The DAG is the place where every ashlar’s output becomes the next ashlar’s input — not through function calls, but through structural coordination on a shared medium.
Engineers new to Racket. Racket is the host language; you will write short, ordinary code in it (lambdas, hashes, parameters, a bit of pattern matching). The Reading Racket guide covers everything you need to read and write Stone pipelines. If you have written any Lisp or any ML-family language, you will feel at home.
1.3 Against agentic assistants
Claude Code, Aider, and similar agentic tools are users of the underlying API. They drive a conversation with a language model to accomplish a goal — opening files, running commands, reading output, adapting. That is the right shape when the work is exploratory and the termination condition is "the human says we’re done."
Stone is a compiler target, not a user. You don’t run Stone the way you run Claude Code; you build a pipeline with it, and the pipeline runs. The conversation shape is fixed by the topology you composed. The tool set for each step is declared at construction time. The termination conditions are predicates you wrote. Every step is visible, every output is typed, every failure is a node in the graph.
When to pick each:
Agentic assistant: "refactor this code" / "diagnose this bug" / "help me explore this codebase."
Stone: "take N inputs; for each, produce a structured proposal, verify it, retry on failure, ship a typed result to the next stage."
They are not competitors. A Stone pipeline can include a step that calls out to an agentic tool as an external process. An agentic tool can be asked to build a Stone pipeline. They solve different problems.
1.4 Against pipeline frameworks
LangChain, LangGraph, LlamaIndex, Haystack, and the surrounding Python ecosystem target the same shape of problem as Stone. The difference is where the coordination lives.
In LangChain-style frameworks, coordination is usually implicit: method chains, memory objects, retrieved context assembled under the hood, runnable interfaces with hidden state. When a chain produces an unexpected result, debugging means reasoning about what the framework did on your behalf.
In Stone, coordination is explicit: every ashlar declares what node types it produces and queries, every step runs against a DAG you can print and inspect, every intermediate result has a stable identity and a clear place in the graph. When a pipeline produces an unexpected result, you open the DAG and look.
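Declared produces/queries sets are what make a pipeline checkable before it runs. Below is a hypothetical sketch of such a check in plain Racket; `step` and `check-pipeline` are illustrative names, not Stone's API. Each step lists the node types it queries and produces, and the check walks the steps in order, reporting any queried type that no earlier step produces. Treating the topology as a linear sequence is a simplification of a real DAG.

```racket
#lang racket/base
;; Hypothetical sketch: a static check over declared node types.
;; `step` and `check-pipeline` are illustrative names, not Stone's API.
(struct step (name queries produces) #:transparent)

;; Walk steps in order. Return a list of (step-name . missing-type) pairs,
;; one per queried type that no earlier step produces. Empty means OK.
(define (check-pipeline steps)
  (define-values (problems _available)
    (for/fold ([problems '()] [available '()])
              ([s (in-list steps)])
      (values (append problems
                      (for/list ([t (in-list (step-queries s))]
                                 #:unless (memq t available))
                        (cons (step-name s) t)))
              (append (step-produces s) available))))
  problems)

(define good
  (list (step 'seed     '()          '(input))
        (step 'propose  '(input)     '(proposal))
        (step 'validate '(proposal)  '(validated failure))
        (step 'reduce   '(validated) '(report))))

(define broken                      ; the reducer queries a type nobody produces
  (list (step 'seed   '()          '(input))
        (step 'reduce '(validated) '(report))))
```

`(check-pipeline good)` returns `'()`, while `(check-pipeline broken)` returns `'((reduce . validated))`, so the wiring mistake surfaces before any model is called.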
That explicitness is the entire bet. It costs you some concision (a Stone pipeline takes more words than a LangChain chain for the same work) and buys you visibility, validatability, and recomposition. If your pipelines are one-off prototypes, the LangChain form is probably faster to write. If your pipelines are load-bearing (they run a thousand times and each run matters), the Stone form is what you want.
Stone is also the right tool when you want a harness (the runtime infrastructure around an LLM) but the deployed-harness shape of Claude Code or Codex doesn’t fit your work. A deployed harness is opinionated and batteries-included: one loop, one toolset, one UI. Stone is unopinionated about all three: you pick the topology, you pick the middleware, you ship the result however you ship things. The translation table is in the Harness vs. pipeline explanation.
For migrating mental models, see the Coming from LangChain / LlamaIndex how-to.
1.5 Against bare SDK calls
The simplest thing is to call the Anthropic or OpenAI SDK directly from a script. That is the right answer when the work is "ask the model a question once" or "run this one prompt in a loop." You give up no flexibility, and you pay no framework overhead.
You start paying for that simplicity the moment you need any of:
A loop with a termination condition that depends on structured output.
A human-in-the-loop approval step that pauses execution.
Fan-out over a list of items followed by a reducer.
Multi-turn tool use with deterministic validation after each turn.
Replay of a failed run from the point of failure.
A typed trace of what the model saw and decided at each step.
Stone’s scaffolding exists to make those capabilities compose under a single vocabulary. If you find yourself hand-rolling loops and state machines around SDK calls, you are paying the cost of the missing scaffolding without getting its benefits. With Stone, the tax is the framework’s learning curve; without it, the tax is the state machines you maintain.
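To make the second tax concrete, here is roughly the smallest hand-rolled version of one item from that list, replay from the point of failure, as a hypothetical plain-Racket sketch. `run-step!`, `record`, `executed`, and the two toy steps are illustrative names, not part of Stone or any SDK. Completed step outputs are memoized in a table, so rerunning after a crash skips the steps that already succeeded.

```racket
#lang racket/base
;; Hypothetical hand-rolled replay: memoize each step's output so a rerun
;; resumes at the step that failed. Not Stone's API.
(define record (make-hash))          ; step name -> saved output
(define executed (box '()))          ; which steps actually ran, newest first

(define (run-step! name thunk)
  (hash-ref record name
            (lambda ()               ; only runs when there is no saved output
              (set-box! executed (cons name (unbox executed)))
              (define v (thunk))     ; may raise; nothing is saved if it does
              (hash-set! record name v)
              v)))

(define flaky-should-fail? (box #t)) ; simulate one transient failure

(define (pipeline input)
  (define fetched (run-step! 'fetch (lambda () (string-append input "!"))))
  (run-step! 'transform
             (lambda ()
               (when (unbox flaky-should-fail?)
                 (set-box! flaky-should-fail? #f)
                 (error 'transform "transient failure"))
               (string-upcase fetched))))
```

Even this toy version has a trap: outputs are keyed by step name only, so rerunning with a different `input` would silently reuse the stale `'fetch` result. Fixing that means folding each step's inputs into the key, which is exactly the kind of bookkeeping a content-addressed run record is meant to take off your hands.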
1.6 What Stone does not do
Stone is deliberately small in several directions.
Stone is not a model-serving layer. Run your own vLLM or Ollama server, or use a hosted endpoint. Stone talks to OpenAI-compatible servers and to Anthropic’s API; it does not run models.
Stone is not a RAG framework. Stone does not ship vector stores, retrievers, or reranker primitives. If you want RAG inside a Stone pipeline, bring your own retrieval ashlar — it reads from the DAG and writes back to it, like any other ashlar.
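As an illustration of "bring your own retrieval ashlar," here is a hypothetical plain-Racket sketch of a step that reads the latest `'query` node from a toy DAG and appends one `'passage` node per matching document. Naive keyword overlap stands in for a real retriever, and `emit!`, `nodes-of`, `retrieve!`, and the three-line corpus are made up for the example; none of it is Stone's API.

```racket
#lang racket/base
;; Hypothetical retrieval step over a toy DAG. Keyword overlap stands in for
;; a real retriever (vector store, BM25, and so on). Not Stone's API.
(require racket/list racket/string)

(struct node (type data) #:transparent)
(define dag (box '()))
(define (emit! type data) (set-box! dag (cons (node type data) (unbox dag))))
(define (nodes-of type)                    ; payloads of one type, oldest first
  (for/list ([n (in-list (reverse (unbox dag)))] #:when (eq? (node-type n) type))
    (node-data n)))

(define corpus
  '("Stone pipelines are built from ashlars"
    "Racket is the host language"
    "vLLM serves models over an OpenAI-compatible API"))

(define (words s) (string-split (string-downcase s)))

;; Read the latest query, score each document by shared words, and write
;; the top k non-zero matches back to the DAG as 'passage nodes.
(define (retrieve! #:k [k 2])
  (define q (words (last (nodes-of 'query))))
  (define scored
    (for*/list ([doc (in-list corpus)]
                [score (in-value (for/sum ([w (in-list (words doc))])
                                   (if (member w q) 1 0)))]
                #:when (> score 0))
      (cons score doc)))
  (define ranked (sort scored > #:key car))
  (for ([p (in-list (take ranked (min k (length ranked))))])
    (emit! 'passage (cdr p))))
```

After `(emit! 'query "Which host language does Stone use")` and `(retrieve!)`, `(nodes-of 'passage)` holds the Racket document first and the ashlar document second, ranked by overlap. A downstream model step would read those nodes like any others.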
Stone’s streaming observability is partial. The OpenAI-compatible caller emits per-token events through an outbox; the Anthropic caller does not stream yet. See Provider constraints in the explanation section for the full picture.
Stone does not hide provider quirks. When a specific model or server has a known trap (Qwen3.5’s thinking-by-default; the schema + tools combination on vLLM), Stone exposes the knob and the constraint rather than papering over it. See The ashlar-pair pattern and Provider constraints in the explanation section.
1.7 Where to go next
Getting Started — build your first two-ashlar pipeline.
Your First Orchestration — add a loop and a human-in-the-loop approval.
Ashlars — the atomic unit explained.
The DAG as State — the shared medium every ashlar reads and writes.