How pagantic enforces dependency direction, structural typing, and deterministic control around probabilistic inference.
pagantic organizes code into numbered directories under layers/.
Each directory name starts with a two-digit prefix: 00_core,
01_inference, 02_orchestrate, and so on up to
10_observe.
The numbering is not decorative. It encodes a hard rule: higher-numbered layers may depend on lower-numbered layers, never the reverse. Layer 05 can import layer 00 and 01, but layer 01 cannot import layer 05. This prevents circular dependencies and keeps the dependency graph a DAG.
This rule is enforced by go-arch-lint at CI time. A pull request that adds an import from
a lower-numbered layer to a higher-numbered layer fails the lint check before review.
| Layer | Package | Purpose |
|---|---|---|
| 0 | core | Shared domain types - Message, ToolCall, Schema, TokenUsage |
| 1 | inference | Engine interface for model inference |
| 2 | orchestrate | Control loops - AgentLoop, SpecializedLoop, PlanExecutor, RedundantLoop |
| 3 | context | Knowledge retrieval - Retriever, ContextBuilder |
| 4 | tool | Tool registry and execution |
| 5 | constraint | Output enforcement - JSON validation, repair, schema, GBNF grammar |
| 6 | rerank | Candidate scoring and reranking |
| 7 | validate | Guardrails, rule validation, retry policy |
| 8 | prompt | Prompt construction - SystemPrompt, InstructionSet, Template |
| 9 | memory | State management - ConversationBuffer, SessionState, WorkingMemory |
| 10 | observe | Tracing, metrics, event logging, cost tracking |
Each layer is a standalone Go package with its own doc.go explaining
domain concepts and design decisions. Reading those files is the recommended
starting point before modifying any layer.
Layers 3 through 10 (context, tool, constraint, rerank, validate, prompt,
memory, observe) form a flat band. They do not depend on each
other. Each middle layer depends only on core (layer 0) and
optionally on inference (layer 1).
The orchestrate layer (layer 2) is the only layer that reaches
across the band, which makes it an exception to the numeric ordering above: it imports from tool, constraint, memory, and observe,
all higher-numbered, to coordinate their work. This makes orchestrate the "control hub" -- it knows
about everything, but none of the layers it coordinates know about it.
Go interfaces are satisfied implicitly. A type does not need to declare "I implement X" -- it just needs to have the right methods. pagantic exploits this to keep layers decoupled.
The orchestrate layer defines what it needs from a context provider:
// In layers/02_orchestrate/loop.go
// orchestrate defines the interface it consumes:
type ContextProvider interface {
Build(ctx context.Context, query string) ([]core.Message, error)
}
The context layer implements a ContextBuilder that satisfies this
interface -- without importing orchestrate:
// In layers/03_context/builder.go
// context.ContextBuilder satisfies orchestrate.ContextProvider
// via Go structural typing -- no import of orchestrate needed.
type ContextBuilder struct {
Retriever Retriever
MaxChunks int
}
func (cb *ContextBuilder) Build(ctx context.Context, query string) ([]core.Message, error) {
// retrieves chunks from Retriever, assembles into system messages
chunks, err := cb.Retriever.Retrieve(ctx, query, cb.MaxChunks)
if err != nil {
return nil, err
}
var builder strings.Builder
builder.WriteString("Relevant context:\n\n")
for i, chunk := range chunks {
fmt.Fprintf(&builder, "[%d] (%s): %s", i+1, chunk.Source, chunk.Content)
if i < len(chunks)-1 {
builder.WriteString("\n")
}
}
return []core.Message{core.NewSystemMessage(builder.String())}, nil
}
The rerank integration follows the same pattern. Orchestrate defines CandidateReranker:
// In layers/02_orchestrate/plan.go
type CandidateReranker interface {
Rerank(ctx context.Context, input RerankInput) ([]RerankCandidate, error)
}
The rerank layer has its own Reranker type with a matching
Rerank method. An adapter function at the call site converts
between orchestrate's RerankCandidate and rerank's
Candidate types.
Because orchestrate depends only on the interface, you can replace ContextBuilder or change its retrieval strategy without modifying orchestrate. In tests, any type with a Build method of the right signature can stand in as a mock.
There is no shared RerankCandidate type -- the fields are explicitly
mirrored in each layer. Orchestrate defines its own RerankCandidate
with the same fields as rerank's Candidate, and adapter code
converts between them at the call site.
// orchestrate defines its own struct (layers/02_orchestrate/plan.go):
type RerankCandidate struct {
Content string
Score float64
Source string
Metadata map[string]any
}
// rerank defines its own struct (layers/06_rerank/scorer.go):
type Candidate struct {
Content string
Score float64
Source string
Metadata map[string]any
}
// Adapter code at the call site converts between the two.
// Fields are identical, making conversion straightforward.
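Because the two structs have the same field names, types, and order, Go permits a direct type conversion between them, so no field-by-field copy is needed. The sketch below declares both types in one file to stay self-contained. In pagantic they live in separate packages, and the conversion rule still applies as long as every field is exported. The helper name toRerankCandidates is hypothetical. The conversion does not clone Metadata, so both values share the same underlying map.

```go
package main

import "fmt"

// Candidate stands in for rerank.Candidate.
type Candidate struct {
	Content  string
	Score    float64
	Source   string
	Metadata map[string]any
}

// RerankCandidate stands in for orchestrate.RerankCandidate.
type RerankCandidate struct {
	Content  string
	Score    float64
	Source   string
	Metadata map[string]any
}

// toRerankCandidates is a hypothetical call-site adapter. The conversion
// RerankCandidate(c) compiles only while both structs keep identical fields,
// so drift between the two layers becomes a build error.
func toRerankCandidates(in []Candidate) []RerankCandidate {
	out := make([]RerankCandidate, len(in))
	for i, c := range in {
		out[i] = RerankCandidate(c)
	}
	return out
}

func main() {
	ranked := toRerankCandidates([]Candidate{
		{Content: "chunk A", Score: 0.92, Source: "docs/a.md"},
		{Content: "chunk B", Score: 0.41, Source: "docs/b.md"},
	})
	for _, c := range ranked {
		fmt.Printf("%.2f %s\n", c.Score, c.Source)
	}
}
```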
Adapters live in adapters/ (cli, tui, api). They are thin shells
that translate external input into internal types, call the orchestrate layer,
and format output. An adapter must not contain prompt construction, tool execution, or JSON validation; that logic lives in the layers below.
// adapters/cli/runner.go - thin shell around orchestrate
type RunConfig struct {
Engine inference.Engine
SystemPrompt string
Registry *tool.Registry
ContextProvider orchestrate.ContextProvider
Stream *inference.StreamHandler
Timeout time.Duration
Out io.Writer
}
type Runner struct {
cfg RunConfig
}
func NewRunner(cfg RunConfig) *Runner {
if cfg.Engine == nil {
panic("cli: RunConfig.Engine must not be nil")
}
if cfg.Out == nil {
cfg.Out = os.Stdout
}
return &Runner{cfg: cfg}
}
func (r *Runner) Run(ctx context.Context, prompt string) error {
// 1. Apply timeout
timeout := r.cfg.Timeout
if timeout <= 0 {
timeout = DefaultTimeout // 120s
}
ctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
// 2. Delegate to orchestrate layer
agent := orchestrate.NewAgentLoop(orchestrate.LoopConfig{
SystemPrompt: r.cfg.SystemPrompt,
Engine: r.cfg.Engine,
Tools: r.cfg.Registry,
Stream: r.cfg.Stream,
ContextProvider: r.cfg.ContextProvider,
})
result, err := agent.Chat(ctx, prompt)
if err != nil {
return fmt.Errorf("cli: %w", err)
}
// 3. Write output (only if not streaming)
if r.cfg.Stream == nil {
fmt.Fprintln(r.cfg.Out, result.Content)
}
return nil
}
Notice what Runner.Run does: set timeout, build AgentLoop, call
Chat, write output. No prompt construction, no tool execution, no
JSON validation. All of that lives in the layers below.
Layer 0 (core) defines the shared vocabulary. Every other layer
speaks these types. There are no external dependencies -- just plain Go structs.
// core/message.go - the conversation unit
type Role string
const (
RoleSystem Role = "system"
RoleUser Role = "user"
RoleAssistant Role = "assistant"
RoleTool Role = "tool"
)
type Message struct {
Role Role
Content string
ToolCalls []ToolCall // assistant requesting tool execution
ToolCallID string // tool result referencing original call
Name string // tool name for tool result messages
}
// Convenience constructors enforce correct field combinations:
func NewSystemMessage(content string) Message { ... }
func NewUserMessage(content string) Message { ... }
func NewAssistantMessage(content string) Message { ... }
func NewToolResultMessage(callID, name, content string) Message { ... }
Fields are populated differently depending on the Role. System and user messages carry Content only. Assistant messages carry Content and/or ToolCalls. Tool messages carry Content, ToolCallID, and Name.
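The constructor bodies are elided above. The sketch below shows one plausible shape that follows the field rules just described. Message and ToolCall are trimmed copies of the core types, and the Validate method is a hypothetical helper, not a confirmed core function.

```go
package main

import (
	"errors"
	"fmt"
)

type Role string

const (
	RoleSystem    Role = "system"
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	RoleTool      Role = "tool"
)

// ToolCall and Message are trimmed copies of the core types.
type ToolCall struct {
	ID        string
	Name      string
	Arguments map[string]any
}

type Message struct {
	Role       Role
	Content    string
	ToolCalls  []ToolCall
	ToolCallID string
	Name       string
}

func NewSystemMessage(content string) Message {
	return Message{Role: RoleSystem, Content: content}
}

func NewUserMessage(content string) Message {
	return Message{Role: RoleUser, Content: content}
}

func NewAssistantMessage(content string) Message {
	return Message{Role: RoleAssistant, Content: content}
}

// NewToolResultMessage links a tool's output back to the call that requested it.
func NewToolResultMessage(callID, name, content string) Message {
	return Message{Role: RoleTool, Content: content, ToolCallID: callID, Name: name}
}

// Validate (hypothetical) checks the per-role field rules.
func (m Message) Validate() error {
	switch m.Role {
	case RoleSystem, RoleUser:
		if len(m.ToolCalls) > 0 || m.ToolCallID != "" || m.Name != "" {
			return fmt.Errorf("%s message must carry Content only", m.Role)
		}
	case RoleAssistant:
		if m.Content == "" && len(m.ToolCalls) == 0 {
			return errors.New("assistant message needs Content and/or ToolCalls")
		}
	case RoleTool:
		if m.ToolCallID == "" || m.Name == "" {
			return errors.New("tool message needs ToolCallID and Name")
		}
	default:
		return fmt.Errorf("unknown role %q", m.Role)
	}
	return nil
}

func main() {
	msgs := []Message{
		NewSystemMessage("You are helpful."),
		NewUserMessage("What's the weather in Oslo?"),
		{Role: RoleAssistant, ToolCalls: []ToolCall{{ID: "call_1", Name: "get_weather"}}},
		NewToolResultMessage("call_1", "get_weather", `{"temp_c": 7}`),
		NewAssistantMessage("It is 7 degrees in Oslo."),
	}
	for _, m := range msgs {
		fmt.Println(m.Role, m.Validate())
	}
}
```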
// core/toolcall.go
type ToolCall struct {
ID string
Name string
Arguments map[string]any
}
type ToolDefinition struct {
Name string
Description string
Parameters Schema // JSON Schema for the function's input
}
type ToolResult struct {
CallID string
Name string
Content string
IsError bool
}
// core/schema.go - JSON Schema subset for structured output
type Schema struct {
Type string `json:"type,omitempty"`
Description string `json:"description,omitempty"`
Properties map[string]Schema `json:"properties,omitempty"`
Required []string `json:"required,omitempty"`
Enum []string `json:"enum,omitempty"`
Items *Schema `json:"items,omitempty"`
Default any `json:"default,omitempty"`
}
Schema covers the common cases: objects with typed properties, arrays, enums, and primitive types. It is used both for structured LLM output and for tool parameter definitions.
// core/usage.go - inference metrics
type TokenUsage struct {
PromptTokens int
ReasoningTokens int
OutputTokens int
ContextTokens int
ContextWindow int
TokensPerSecond float64
}
The inference.Engine interface is the key abstraction separating
pagantic from any specific model runtime:
// layers/01_inference/engine.go
type Engine interface {
Infer(ctx context.Context, req Request) (*Result, error)
ModelInfo() ModelInfo
}
type Request struct {
Messages []core.Message
Tools []core.ToolDefinition
Schema *core.Schema
Grammar string // GBNF grammar for decoder-level constraint
MaxTokens int
Temperature *float64 // nil means use model default
Options map[string]any
}
type Result struct {
Content string
ToolCalls []core.ToolCall
Messages []core.Message
Usage core.TokenUsage
}
type ModelInfo struct {
Name string
ContextWindow int
}
Everything above inference (orchestrate, adapters) works with Engine.
They never see vendor-specific types. This means you can swap model runtimes
without touching any orchestration code.
The kronk package bridges pagantic types to kronk's
model.D format. It lives outside the layer hierarchy because it
depends on an external SDK.
// kronk/adapter.go
type Adapter struct {
chat llmChat // interface that kronk engine satisfies
handler *inference.StreamHandler
}
// llmChat is the interface kronk's engine satisfies:
type llmChat interface {
ChatStreaming(ctx context.Context, d model.D) (<-chan model.ChatResponse, error)
ModelConfig() model.Config
}
func NewAdapter(chat llmChat, handler *inference.StreamHandler) *Adapter {
return &Adapter{chat: chat, handler: handler}
}
// Infer converts pagantic Request to model.D, calls ChatStreaming,
// streams chunks, and assembles the Result.
func (a *Adapter) Infer(ctx context.Context, req inference.Request) (*inference.Result, error) {
requestD := model.D{}
requestD["messages"] = messagesToD(req.Messages)
if req.MaxTokens > 0 {
requestD["max_tokens"] = req.MaxTokens
}
if len(req.Tools) > 0 {
requestD["tools"] = toolDefsToD(req.Tools)
}
if req.Grammar != "" {
requestD["grammar"] = req.Grammar
}
if req.Schema != nil {
requestD["json_schema"] = schemaToD(*req.Schema)
}
ch, err := a.chat.ChatStreaming(ctx, requestD)
if err != nil {
return nil, fmt.Errorf("kronk: chat streaming: %w", err)
}
// ... stream processing, tool call extraction, usage tracking
return &inference.Result{
Content: content.String(),
ToolCalls: toolCalls,
Messages: messages,
Usage: extractUsage(a.chat, lastResp),
}, nil
}
func (a *Adapter) ModelInfo() inference.ModelInfo {
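cfg := a.chat.ModelConfig()
var name string
if len(cfg.ModelFiles) > 0 { // guard: avoid an index panic when no model file is configured
name = path.Base(cfg.ModelFiles[0])
}
return inference.ModelInfo{
Name: name,
ContextWindow: cfg.ContextWindow(),
}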
}
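Loading a model and wrapping it as an Engine:
krn, cleanup, err := kronk.Load(ctx, kronk.Config{
ModelSource: "unsloth/gemma-4-E4B-it",
})
if err != nil {
log.Fatalf("load model: %v", err) // cleanup is not usable when Load fails
}
defer cleanup()
engine := kronk.NewAdapter(krn, nil) // satisfies inference.Engine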
pagantic defines stable boundary contracts at the system edge. Every adapter maps external input into a SystemRequest and maps a SystemResponse back to the user. No adapter accesses internal layers directly.
| Contract | Direction | Purpose |
|---|---|---|
| SystemRequest | Adapter -> System | Carries messages, mode, execution hints, output contract, correlation IDs |
| SystemResponse | System -> Adapter | Returns content, structured output, confidence, validation result, token usage, error |
| SystemError | System -> Adapter | Canonical error with a category from the failure taxonomy, a retryable flag, and details |
SystemRequest carries a Mode field that selects the orchestration pattern:
| Mode | Pattern | Description |
|---|---|---|
| chat | AgentLoop | Multi-turn conversation with tool loop |
| structured | SpecializedLoop | Single-shot schema-constrained output |
| plan | PlanExecutor | Multi-step typed pipeline |
| redundant | RedundantLoop | N-version inference with voting |
Every request follows a lifecycle state machine with defined states and transitions.
Each state emits observability events. Retry is itself a lifecycle transition governed by RetryPolicy: EXECUTE -> VALIDATE, then back to EXECUTE on repair/retry, or to ERROR on terminal failure. Full lifecycle details in Contracts: Execution Lifecycle.
All errors map to one of seven categories. Each category has a stable error code pattern, retryable flag, and recovery path:
| Category | Emitting Layer | Retryable |
|---|---|---|
| InferenceFailure | inference (L01) | Sometimes |
| ToolFailure | tool (L04) | If idempotent |
| ConstraintFailure | constraint (L05) | Yes |
| ValidationFailure | validate (L07) | Yes |
| OrchestrationFailure | orchestrate (L02) | No |
| ConfigurationFailure | Any layer at init | No |
| Cancellation | Any layer | No |
Full failure taxonomy with error codes and recovery paths in Contracts: Failure Taxonomy.
When data crosses step boundaries inside PlanExecutor, it uses a canonical intermediate representation (IR). The IR lives in the orchestrate layer and provides common field sets that avoid per-layer type conversion noise.
| Type | Purpose | Key Fields |
|---|---|---|
| StepInput | Data entering a Step | Tagged wrapper: may hold messages, candidates, context, or tool results |
| StepOutput | Data leaving a Step | Tagged wrapper: carries results forward to the next Step |
| CandidateIR | Unified candidate across steps | content, source, score, metadata |
| ContextIR | Retrieved context with provenance | List of chunks/messages with source tracking |
| ToolCallIR / ToolResultIR | Tool interaction records | Already defined in core as ToolCall; documented here as part of the IR |
CandidateIR carries a score field whose meaning depends on which step produced it, so scores from different producers are not directly comparable.
The orchestrate layer owns the IR types. Other layers produce and consume their own
native types (e.g., context.Chunk, rerank.Candidate).
Conversion happens at integration sites - user code that wires layers together:
- context.Chunk -> CandidateIR: user code maps Chunk fields to CandidateIR at the boundary between context retrieval and reranking steps
- rerank.Candidate -> CandidateIR: user code maps Candidate to CandidateIR when passing rerank results into subsequent pipeline steps
- core.ToolCall -> ToolCallIR: direct mapping, same underlying type
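A conversion at an integration site can be as small as a field mapping. In the sketch below, Chunk carries the Content and Source fields that ContextBuilder reads. Its Score field and the exported CandidateIR field names are assumptions, since neither type's full definition appears here. Recording the producer in Metadata is one hypothetical way to keep the score's meaning attached to the value.

```go
package main

import "fmt"

// Chunk stands in for context.Chunk; Score is an assumed field.
type Chunk struct {
	Content string
	Source  string
	Score   float64
}

// CandidateIR mirrors the key fields listed for the orchestrate IR.
type CandidateIR struct {
	Content  string
	Source   string
	Score    float64
	Metadata map[string]any
}

// chunksToCandidates is hypothetical user wiring code between a retrieval
// step and a rerank step. It tags each candidate with its producer so later
// steps know the score is a retrieval score, not a rerank score.
func chunksToCandidates(chunks []Chunk) []CandidateIR {
	out := make([]CandidateIR, 0, len(chunks))
	for i, c := range chunks {
		out = append(out, CandidateIR{
			Content: c.Content,
			Source:  c.Source,
			Score:   c.Score,
			Metadata: map[string]any{
				"producer": "context",
				"rank":     i + 1,
			},
		})
	}
	return out
}

func main() {
	for _, c := range chunksToCandidates([]Chunk{
		{Content: "Layers form a DAG.", Source: "architecture.md", Score: 0.87},
	}) {
		fmt.Printf("%s %.2f %v\n", c.Source, c.Score, c.Metadata["producer"])
	}
}
```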
Any layer can record events through the observe package. The
pattern is consistent: capture start time, do work, record event with duration
and error.
// Recording an event from any layer:
observer.Record(observe.Event{
Timestamp: started,
Layer: "orchestrate",
Action: "infer",
Data: map[string]any{"messages": len(req.Messages)},
Duration: time.Since(started),
Error: err,
})
// layers/10_observe/event.go
type Event struct {
Timestamp time.Time
Layer string
Action string
Data map[string]any
Duration time.Duration
Error error
}
type EventLog interface {
Record(event Event)
Events() []Event
}
// layers/10_observe/trace.go
type TraceRecorder interface {
StartSpan(ctx context.Context, name string) (context.Context, Span)
}
type Span interface {
End()
SetAttribute(key string, value any)
RecordError(err error)
}
// layers/10_observe/metrics.go
type MetricsCollector interface {
RecordLatency(layer string, duration time.Duration)
RecordTokens(usage core.TokenUsage)
IncrementCounter(name string, delta int)
}
// layers/10_observe/cost.go
type CostTracker interface {
RecordUsage(model string, usage core.TokenUsage)
TotalCost() float64
}
Each interface has two implementations: an in-memory version for testing and development, and a no-op version for production paths that do not need observability:
| Interface | In-Memory | No-Op |
|---|---|---|
| EventLog | InMemoryEventLog | NoOpEventLog |
| TraceRecorder | InMemoryTracer | NoOpTracer |
| MetricsCollector | InMemoryMetrics | NoOpMetrics |
| CostTracker | InMemoryCostTracker | NoOpCostTracker |
The in-memory implementations use sync.Mutex for safe concurrent access. Events and spans are
deep-copied on read and write to prevent aliasing bugs.
All observability events carry correlation identifiers that link events to requests, sessions, and execution spans:
// Correlation identifiers on every event
type CorrelationContext struct {
RequestID string // Stable per SystemRequest
SessionID string // Stable per conversation (AgentLoop)
TraceID string // Distributed tracing root
SpanID string // Current execution unit
ParentSpanID string // Parent in span hierarchy
CausedBy string // Causal link: tool_call_id, step_name
}
Every event must include:
- request_id - always present
- session_id - when inside AgentLoop
- step_name - when inside PlanExecutor
- tool_call_id - when tool-related

Inference events additionally include: message_count, tool_defs_count, schema_present, grammar_present, temperature. Tool events include the tool_call_id from the inference response that triggered the tool call, establishing causality.
Full event schemas in Contracts: Observability Correlation.
The prompt layer (L08) shapes model behavior through structured prompt construction. Orchestration patterns consume prompts through a structural typing interface rather than raw strings.
// Structural typing interface consumed by orchestrate:
type PromptProvider interface {
BuildSystemPrompt() (core.Message, error)
}
Prompt layer implementations that expose BuildSystemPrompt() (core.Message, error)
satisfy this interface without importing orchestrate - same structural typing pattern as
ContextProvider and CandidateReranker. The existing
SystemPrompt.Build() and Template.Render() use different
signatures; production code bridges between them or implements the interface directly.
When retrieved context is injected into prompts, its placement follows
ContextPolicy rules.
LoopConfig also accepts a plain SystemPrompt string as a convenience default. PromptProvider is the
primary mechanism for production use, where prompt assembly needs to be structured,
testable, and policy-driven.
LLM output is inherently non-deterministic. pagantic wraps it with
deterministic control structures: tool loops with iteration limits, schema
validation with repair, GBNF grammar constraints at the decoder level, and
redundant inference with majority voting. The probabilistic part is
contained within Engine.Infer() -- everything outside that call
is predictable and testable.
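Majority voting is the simplest of these structures to show in isolation. The sketch below is not RedundantLoop's implementation. It assumes the N outputs have already been collected, compares them as exact strings after whitespace trimming, and breaks ties by first appearance, so the same inputs always yield the same winner. majorityVote is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// majorityVote returns the most frequent normalized output, its vote count,
// and whether it holds a strict majority of all votes.
func majorityVote(outputs []string) (winner string, votes int, strict bool) {
	counts := make(map[string]int)
	var order []string // first-seen order for deterministic tie-breaking
	for _, o := range outputs {
		key := strings.TrimSpace(o)
		if counts[key] == 0 {
			order = append(order, key)
		}
		counts[key]++
	}
	for _, k := range order {
		if counts[k] > votes {
			winner, votes = k, counts[k]
		}
	}
	return winner, votes, votes*2 > len(outputs)
}

func main() {
	w, n, ok := majorityVote([]string{`{"label":"spam"}`, `{"label":"ham"}`, ` {"label":"spam"} `})
	fmt.Println(w, n, ok)
}
```

Iterating over the first-seen slice, not the map, matters here: Go map iteration order is randomized, so looping over counts directly would make tie-breaking nondeterministic.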
No hidden closures, no package-level globals, no init() functions with side
effects. Conversation state lives in ConversationBuffer. Observer
state lives in EventLog. Configuration is passed as structs.
This means you can inspect any value at any point in the call stack.
// State is always visible and inspectable:
observer := &observe.InMemoryEventLog{}
agent := orchestrate.NewAgentLoop(orchestrate.LoopConfig{
Engine: engine,
Tools: registry,
SystemPrompt: "You are helpful.",
MaxTokens: 2048,
Observer: observer,
})
if _, err := agent.Chat(ctx, "Hello"); err != nil {
log.Fatal(err)
}
// After a call, you can inspect everything:
messages := agent.Messages() // full conversation history
events := observer.Events() // every action that happened
Every inference call is logged. Every tool execution is recorded. Every structured output is validated against its schema. Every JSON repair attempt is traceable. When something goes wrong, the event log tells you exactly what happened, in what order, and how long each step took.
Output constraints can be layered: GBNF grammar at the decoder level, JSON repair, enum normalization, and schema validation.
These are not mutually exclusive. A single ChatStructured call
can apply grammar constraint at inference time, then repair, normalize, and
validate the result:
// From AgentLoop.ChatStructured -- constraints compose naturally:
result, err := al.infer(ctx, inference.Request{
Messages: messages,
Schema: &schema,
Grammar: al.cfg.Grammar, // decoder-level GBNF
})
if err != nil {
return nil, err
}
// Post-inference constraint pipeline: repair only when the output is not
// already valid JSON, then normalize enum values and validate against the schema.
content := result.Content
if !json.Valid([]byte(content)) {
content = constraint.RepairJSON(content)
}
result.Content = constraint.NormalizeEnumValues(content, schema)
return al.validateSchema(result, schema)
When Grammar is empty, pagantic relies on post-generation
constraints (repair + validation) to enforce structure. When a grammar is
available, it provides a stronger guarantee at the decoder level.
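The sketch below shows one common, deliberately narrow repair technique: trimming prose around a JSON object and checking the result with json.Valid. It is not pagantic's constraint.RepairJSON, whose behavior is not described here, and extractJSONObject is a hypothetical name. It does not fix truncated output or trailing commas.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// extractJSONObject returns s unchanged if it is already valid JSON;
// otherwise it returns the span from the first '{' to the last '}'
// when that span is valid JSON.
func extractJSONObject(s string) (string, error) {
	if json.Valid([]byte(s)) {
		return s, nil // already valid, nothing to do
	}
	start := strings.IndexByte(s, '{')
	end := strings.LastIndexByte(s, '}')
	if start < 0 || end < start {
		return "", errors.New("no JSON object found")
	}
	candidate := s[start : end+1]
	if !json.Valid([]byte(candidate)) {
		return "", errors.New("extracted span is not valid JSON")
	}
	return candidate, nil
}

func main() {
	raw := `Here is the JSON you asked for: {"status": "ok"} Hope this helps.`
	fixed, err := extractJSONObject(raw)
	fmt.Println(fixed, err)
}
```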