1. The 10-Layer System

pagantic organizes code into numbered directories under layers/. Each directory name starts with a two-digit prefix: 00_core, 01_inference, 02_orchestrate, and so on up to 10_observe.

The numbering is not decorative. It encodes a hard rule: higher-numbered layers may depend on lower-numbered layers, never the reverse. Layer 05 can import layer 00 and 01, but layer 01 cannot import layer 05. This prevents circular dependencies and keeps the dependency graph a DAG.

CI enforcement. go-arch-lint checks these rules automatically at CI time. A pull request in which a lower-numbered layer imports a higher-numbered layer fails the lint check before review.

Full Layer Table

Layer  Package      Purpose
0      core         Shared domain types - Message, ToolCall, Schema, TokenUsage
1      inference    Engine interface for model inference
2      orchestrate  Control loops - AgentLoop, SpecializedLoop, PlanExecutor, RedundantLoop
3      context      Knowledge retrieval - Retriever, ContextBuilder
4      tool         Tool registry and execution
5      constraint   Output enforcement - JSON validation, repair, schema, GBNF grammar
6      rerank       Candidate scoring and reranking
7      validate     Guardrails, rule validation, retry policy
8      prompt       Prompt construction - SystemPrompt, InstructionSet, Template
9      memory       State management - ConversationBuffer, SessionState, WorkingMemory
10     observe      Tracing, metrics, event logging, cost tracking

Each layer is a standalone Go package with its own doc.go explaining domain concepts and design decisions. Reading those files is the recommended starting point before modifying any layer.

2. Dependency Flow

                Adapters (cli, tui, api)
                            |
                            v
                       orchestrate  <-- Control hub: coordinates all middle layers
                            |
+-------+--------+----------+---------+----------+--------+
|       |        |          |         |          |        |
v       v        v          v         v          v        v
context tool     constraint rerank    validate   prompt   memory
|       |        |          |         |          |        |
+-------+--------+----------+---------+----------+--------+
                            |
                            v
                        inference  <-- Engine abstraction (Infer + ModelInfo)
                            |
                            v
                          kronk  <-- SDK adapter (converts pagantic types to model.D)
                            |
                            v
                          core  <-- Shared types (Message, ToolCall, Schema, TokenUsage)

Key Insight: Flat Middle Layers

Layers 3 through 10 (context, tool, constraint, rerank, validate, prompt, memory, observe) form a flat band. They do not depend on each other. Each middle layer depends only on core (layer 0) and optionally on inference (layer 1).

The orchestrate layer (layer 2) is the only layer that reaches across the band. It imports from tool, constraint, memory, and observe to coordinate their work. This makes orchestrate the "control hub": it knows about the other layers, but none of them knows about it.

Why this matters. You can change the reranking algorithm without touching context retrieval. You can swap the prompt template without affecting tool execution. Each middle layer evolves independently as long as it speaks the shared vocabulary defined in core.

3. Structural Typing Pattern

Go interfaces are satisfied implicitly. A type does not need to declare "I implement X" -- it just needs to have the right methods. pagantic exploits this to keep layers decoupled.

Example: ContextProvider

The orchestrate layer defines what it needs from a context provider:

// In layers/02_orchestrate/loop.go
// orchestrate defines the interface it consumes:
type ContextProvider interface {
    Build(ctx context.Context, query string) ([]core.Message, error)
}

The context layer implements a ContextBuilder that satisfies this interface -- without importing orchestrate:

// In layers/03_context/builder.go
// context.ContextBuilder satisfies orchestrate.ContextProvider
// via Go structural typing -- no import of orchestrate needed.
type ContextBuilder struct {
    Retriever Retriever
    MaxChunks int
}

func (cb *ContextBuilder) Build(ctx context.Context, query string) ([]core.Message, error) {
    // retrieves chunks from Retriever, assembles into system messages
    chunks, err := cb.Retriever.Retrieve(ctx, query, cb.MaxChunks)
    if err != nil {
        return nil, err
    }

    var builder strings.Builder
    builder.WriteString("Relevant context:\n\n")
    for i, chunk := range chunks {
        fmt.Fprintf(&builder, "[%d] (%s): %s", i+1, chunk.Source, chunk.Content)
        if i < len(chunks)-1 {
            builder.WriteString("\n")
        }
    }

    return []core.Message{core.NewSystemMessage(builder.String())}, nil
}

Example: CandidateReranker

Same pattern. Orchestrate defines CandidateReranker:

// In layers/02_orchestrate/plan.go
type CandidateReranker interface {
    Rerank(ctx context.Context, input RerankInput) ([]RerankCandidate, error)
}

The rerank layer has its own Reranker type whose Rerank method has the same shape but operates on rerank's own Candidate type. An adapter function at the call site converts between orchestrate's RerankCandidate and rerank's Candidate types.

Why This Pattern?

Structs cannot use structural typing. Go's implicit satisfaction works for interfaces only, not structs. For struct types that cross layer boundaries -- like RerankCandidate -- the fields are explicitly mirrored in each layer. Orchestrate defines its own RerankCandidate with the same fields as rerank's Candidate, and adapter code converts between them at the call site.

// orchestrate defines its own struct (layers/02_orchestrate/plan.go):
type RerankCandidate struct {
    Content  string
    Score    float64
    Source   string
    Metadata map[string]any
}

// rerank defines its own struct (layers/06_rerank/scorer.go):
type Candidate struct {
    Content  string
    Score    float64
    Source   string
    Metadata map[string]any
}

// Adapter code at the call site converts between the two.
// Fields are identical, making conversion straightforward.
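
Because the two structs share identical field names, types, and order, Go permits a direct type conversion between them, so the adapter can stay very small. The following is a minimal sketch with both structs declared locally as stand-ins for the orchestrate and rerank definitions; the function names toRerank and fromRerank are hypothetical and not taken from pagantic.

```go
package main

import "fmt"

// RerankCandidate stands in for orchestrate.RerankCandidate.
type RerankCandidate struct {
    Content  string
    Score    float64
    Source   string
    Metadata map[string]any
}

// Candidate stands in for rerank.Candidate.
type Candidate struct {
    Content  string
    Score    float64
    Source   string
    Metadata map[string]any
}

// toRerank converts orchestrate-side candidates to rerank-side candidates.
// The conversion compiles only while both structs keep identical fields.
func toRerank(in []RerankCandidate) []Candidate {
    out := make([]Candidate, len(in))
    for i, c := range in {
        out[i] = Candidate(c)
    }
    return out
}

// fromRerank converts reranked results back for the orchestrate layer.
func fromRerank(in []Candidate) []RerankCandidate {
    out := make([]RerankCandidate, len(in))
    for i, c := range in {
        out[i] = RerankCandidate(c)
    }
    return out
}

func main() {
    in := []RerankCandidate{{Content: "doc A", Score: 0.4, Source: "wiki"}}
    ranked := toRerank(in)
    ranked[0].Score = 0.9 // pretend the reranker rescored the candidate
    fmt.Println(fromRerank(ranked)[0].Score)
}
```

The conversion copies each struct by value, but the Metadata map is shared between the copies. If the two struct definitions ever drift apart, the conversion stops compiling, which surfaces the mismatch at build time.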

4. Adapter Pattern

Adapters live in adapters/ (cli, tui, api). They are thin shells that translate external input into internal types, call the orchestrate layer, and format output. An adapter must not contain logic that belongs to a layer, such as prompt construction, tool execution, or JSON validation.

CLI Adapter Example

// adapters/cli/runner.go - thin shell around orchestrate
type RunConfig struct {
    Engine          inference.Engine
    SystemPrompt    string
    Registry        *tool.Registry
    ContextProvider orchestrate.ContextProvider
    Stream          *inference.StreamHandler
    Timeout         time.Duration
    Out             io.Writer
}

type Runner struct {
    cfg RunConfig
}

func NewRunner(cfg RunConfig) *Runner {
    if cfg.Engine == nil {
        panic("cli: RunConfig.Engine must not be nil")
    }
    if cfg.Out == nil {
        cfg.Out = os.Stdout
    }
    return &Runner{cfg: cfg}
}

func (r *Runner) Run(ctx context.Context, prompt string) error {
    // 1. Apply timeout
    timeout := r.cfg.Timeout
    if timeout <= 0 {
        timeout = DefaultTimeout // 120s
    }
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    // 2. Delegate to orchestrate layer
    agent := orchestrate.NewAgentLoop(orchestrate.LoopConfig{
        SystemPrompt:    r.cfg.SystemPrompt,
        Engine:          r.cfg.Engine,
        Tools:           r.cfg.Registry,
        Stream:          r.cfg.Stream,
        ContextProvider: r.cfg.ContextProvider,
    })

    result, err := agent.Chat(ctx, prompt)
    if err != nil {
        return fmt.Errorf("cli: %w", err)
    }

    // 3. Write output (only if not streaming)
    if r.cfg.Stream == nil {
        fmt.Fprintln(r.cfg.Out, result.Content)
    }
    return nil
}

Notice what Runner.Run does: set timeout, build AgentLoop, call Chat, write output. No prompt construction, no tool execution, no JSON validation. All of that lives in the layers below.

Testing adapters. Because adapters contain no logic, testing them means testing wiring: "does the right config reach the right layer?" A mock Engine that returns a canned result is all you need.
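
A wiring check along these lines could look like the sketch below. The types are simplified local stand-ins for inference.Engine, inference.Request, and inference.Result, and run is a hypothetical reduction of Runner.Run; none of this is pagantic's actual test code.

```go
package main

import (
    "bytes"
    "context"
    "fmt"
    "io"
)

// Simplified stand-ins for the inference types (assumptions for illustration).
type Request struct{ Prompt string }
type Result struct{ Content string }

type Engine interface {
    Infer(ctx context.Context, req Request) (*Result, error)
}

// mockEngine records the last request and returns a canned result.
type mockEngine struct {
    last  Request
    reply string
}

func (m *mockEngine) Infer(_ context.Context, req Request) (*Result, error) {
    m.last = req
    return &Result{Content: m.reply}, nil
}

// run is a hypothetical thin adapter: forward the prompt, write the output.
func run(ctx context.Context, e Engine, out io.Writer, prompt string) error {
    res, err := e.Infer(ctx, Request{Prompt: prompt})
    if err != nil {
        return fmt.Errorf("cli: %w", err)
    }
    _, err = fmt.Fprintln(out, res.Content)
    return err
}

// wiringCheck runs the adapter against the mock and reports what each side saw.
func wiringCheck() (gotPrompt, gotOutput string, err error) {
    eng := &mockEngine{reply: "canned answer"}
    var buf bytes.Buffer
    err = run(context.Background(), eng, &buf, "hello")
    return eng.last.Prompt, buf.String(), err
}

func main() {
    prompt, output, err := wiringCheck()
    fmt.Printf("%q %q %v\n", prompt, output, err)
}
```

The check passes when the prompt reaches the engine unchanged and the canned reply reaches the writer. No model is loaded at any point.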

5. Core Domain Types

Layer 0 (core) defines the shared vocabulary. Every other layer speaks these types. There are no external dependencies -- just plain Go structs.

Message

// core/message.go - the conversation unit
type Role string

const (
    RoleSystem    Role = "system"
    RoleUser      Role = "user"
    RoleAssistant Role = "assistant"
    RoleTool      Role = "tool"
)

type Message struct {
    Role       Role
    Content    string
    ToolCalls  []ToolCall // assistant requesting tool execution
    ToolCallID string     // tool result referencing original call
    Name       string     // tool name for tool result messages
}

// Convenience constructors enforce correct field combinations:
func NewSystemMessage(content string) Message   { ... }
func NewUserMessage(content string) Message     { ... }
func NewAssistantMessage(content string) Message { ... }
func NewToolResultMessage(callID, name, content string) Message { ... }

Fields are populated differently depending on the Role. System and user messages carry Content only. Assistant messages carry Content and/or ToolCalls. Tool messages carry Content, ToolCallID, and Name.
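
One way the constructors could enforce these combinations is sketched below. The function bodies are assumptions, since the listing above elides them; only the signatures and the per-role field rules come from the text. ToolCall is reduced to a placeholder.

```go
package main

import "fmt"

type Role string

const (
    RoleSystem    Role = "system"
    RoleUser      Role = "user"
    RoleAssistant Role = "assistant"
    RoleTool      Role = "tool"
)

// ToolCall is a placeholder for core.ToolCall.
type ToolCall struct {
    ID, Name string
}

type Message struct {
    Role       Role
    Content    string
    ToolCalls  []ToolCall
    ToolCallID string
    Name       string
}

// NewSystemMessage sets only Role and Content (assumed body).
func NewSystemMessage(content string) Message {
    return Message{Role: RoleSystem, Content: content}
}

// NewToolResultMessage sets the three fields a tool message carries (assumed body).
func NewToolResultMessage(callID, name, content string) Message {
    return Message{Role: RoleTool, Content: content, ToolCallID: callID, Name: name}
}

func main() {
    m := NewToolResultMessage("call_1", "search", "3 results")
    fmt.Println(m.Role, m.ToolCallID, m.Name)
}
```

Routing construction through functions like these means a caller cannot accidentally build, say, a system message with a ToolCallID set.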

ToolCall and ToolDefinition

// core/toolcall.go
type ToolCall struct {
    ID        string
    Name      string
    Arguments map[string]any
}

type ToolDefinition struct {
    Name        string
    Description string
    Parameters  Schema // JSON Schema for the function's input
}

type ToolResult struct {
    CallID  string
    Name    string
    Content string
    IsError bool
}

Schema

// core/schema.go - JSON Schema subset for structured output
type Schema struct {
    Type        string            `json:"type,omitempty"`
    Description string            `json:"description,omitempty"`
    Properties  map[string]Schema `json:"properties,omitempty"`
    Required    []string          `json:"required,omitempty"`
    Enum        []string          `json:"enum,omitempty"`
    Items       *Schema           `json:"items,omitempty"`
    Default     any               `json:"default,omitempty"`
}

Schema covers the common cases: objects with typed properties, arrays, enums, and primitive types. It is used both for structured LLM output and for tool parameter definitions.
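
To make the shape concrete, the sketch below builds a small object schema and prints its JSON form. The Schema struct is copied from the listing above; ticketSchema and encode are hypothetical helpers introduced only for this example.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Schema mirrors the core.Schema definition shown above.
type Schema struct {
    Type        string            `json:"type,omitempty"`
    Description string            `json:"description,omitempty"`
    Properties  map[string]Schema `json:"properties,omitempty"`
    Required    []string          `json:"required,omitempty"`
    Enum        []string          `json:"enum,omitempty"`
    Items       *Schema           `json:"items,omitempty"`
    Default     any               `json:"default,omitempty"`
}

// ticketSchema is a hypothetical schema: an object with an enum
// property and an array-of-strings property.
func ticketSchema() Schema {
    return Schema{
        Type: "object",
        Properties: map[string]Schema{
            "priority": {Type: "string", Enum: []string{"low", "high"}},
            "tags":     {Type: "array", Items: &Schema{Type: "string"}},
        },
        Required: []string{"priority"},
    }
}

// encode returns the JSON form of a schema.
func encode(s Schema) (string, error) {
    b, err := json.Marshal(s)
    return string(b), err
}

func main() {
    out, err := encode(ticketSchema())
    if err != nil {
        panic(err)
    }
    fmt.Println(out)
}
```

Because every field is tagged omitempty, unset fields disappear from the output, so the encoded schema contains only what was declared.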

TokenUsage

// core/usage.go - inference metrics
type TokenUsage struct {
    PromptTokens    int
    ReasoningTokens int
    OutputTokens    int
    ContextTokens   int
    ContextWindow   int
    TokensPerSecond float64
}

Why these types live in core. Every layer needs to pass messages around, describe tools, or report token usage. Putting these in layer 0 means no circular imports -- everyone can reach them, but core reaches nobody.

6. Engine Abstraction (kronk)

The inference.Engine interface is the key abstraction separating pagantic from any specific model runtime:

// layers/01_inference/engine.go
type Engine interface {
    Infer(ctx context.Context, req Request) (*Result, error)
    ModelInfo() ModelInfo
}

type Request struct {
    Messages    []core.Message
    Tools       []core.ToolDefinition
    Schema      *core.Schema
    Grammar     string     // GBNF grammar for decoder-level constraint
    MaxTokens   int
    Temperature *float64   // nil means use model default
    Options     map[string]any
}

type Result struct {
    Content   string
    ToolCalls []core.ToolCall
    Messages  []core.Message
    Usage     core.TokenUsage
}

type ModelInfo struct {
    Name          string
    ContextWindow int
}

Everything above inference (orchestrate, adapters) works with Engine. They never see vendor-specific types. This means you can swap model runtimes without touching any orchestration code.

kronk Adapter

The kronk package bridges pagantic types to kronk's model.D format. It lives outside the layer hierarchy because it depends on an external SDK.

// kronk/adapter.go
type Adapter struct {
    chat    llmChat // interface that kronk engine satisfies
    handler *inference.StreamHandler
}

// llmChat is the interface kronk's engine satisfies:
type llmChat interface {
    ChatStreaming(ctx context.Context, d model.D) (<-chan model.ChatResponse, error)
    ModelConfig() model.Config
}

func NewAdapter(chat llmChat, handler *inference.StreamHandler) *Adapter {
    return &Adapter{chat: chat, handler: handler}
}

// Infer converts pagantic Request to model.D, calls ChatStreaming,
// streams chunks, and assembles the Result.
func (a *Adapter) Infer(ctx context.Context, req inference.Request) (*inference.Result, error) {
    requestD := model.D{}
    requestD["messages"] = messagesToD(req.Messages)
    if req.MaxTokens > 0 {
        requestD["max_tokens"] = req.MaxTokens
    }
    if len(req.Tools) > 0 {
        requestD["tools"] = toolDefsToD(req.Tools)
    }
    if req.Grammar != "" {
        requestD["grammar"] = req.Grammar
    }
    if req.Schema != nil {
        requestD["json_schema"] = schemaToD(*req.Schema)
    }

    ch, err := a.chat.ChatStreaming(ctx, requestD)
    if err != nil {
        return nil, err
    }
    // ... stream processing over ch, tool call extraction, usage tracking
    return &inference.Result{
        Content:   content.String(),
        ToolCalls: toolCalls,
        Messages:  messages,
        Usage:     extractUsage(a.chat, lastResp),
    }, nil
}

func (a *Adapter) ModelInfo() inference.ModelInfo {
    cfg := a.chat.ModelConfig()
    return inference.ModelInfo{
        Name:          path.Base(cfg.ModelFiles[0]),
        ContextWindow: cfg.ContextWindow(),
    }
}

Typical usage at the application level:

krn, cleanup, err := kronk.Load(ctx, kronk.Config{
    ModelSource: "unsloth/gemma-4-E4B-it",
})
if err != nil {
    return err // handle the load failure before using krn or cleanup
}
defer cleanup()
engine := kronk.NewAdapter(krn, nil) // satisfies inference.Engine

7. System Contracts

pagantic defines stable boundary contracts at the system edge. Every adapter maps external input into a SystemRequest and maps a SystemResponse back to the user. No adapter accesses internal layers directly.

Boundary Types

Contract        Direction          Purpose
SystemRequest   Adapter -> System  Carries messages, mode, execution hints, output contract, correlation ids
SystemResponse  System -> Adapter  Returns content, structured output, confidence, validation result, token usage, error
SystemError     System -> Adapter  Canonical error with category from failure taxonomy, retryable flag, details

Execution Modes

SystemRequest carries a Mode field that selects the orchestration pattern:

Mode        Pattern          Description
chat        AgentLoop        Multi-turn conversation with tool loop
structured  SpecializedLoop  Single-shot schema-constrained output
plan        PlanExecutor     Multi-step typed pipeline
redundant   RedundantLoop    N-version inference with voting
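
The mode-to-pattern mapping in the table can be read as a simple dispatch. The sketch below assumes a string-backed Mode type; the constant names and the patternFor function are hypothetical, and a real dispatcher would construct the pattern rather than return its name.

```go
package main

import "fmt"

// Mode mirrors the four execution modes listed in the table above.
type Mode string

const (
    ModeChat       Mode = "chat"
    ModeStructured Mode = "structured"
    ModePlan       Mode = "plan"
    ModeRedundant  Mode = "redundant"
)

// patternFor returns the name of the orchestration pattern for a mode.
// An unknown mode is reported as an error rather than silently defaulted.
func patternFor(m Mode) (string, error) {
    switch m {
    case ModeChat:
        return "AgentLoop", nil
    case ModeStructured:
        return "SpecializedLoop", nil
    case ModePlan:
        return "PlanExecutor", nil
    case ModeRedundant:
        return "RedundantLoop", nil
    default:
        return "", fmt.Errorf("unknown mode %q", m)
    }
}

func main() {
    for _, m := range []Mode{ModeChat, ModePlan, "batch"} {
        name, err := patternFor(m)
        fmt.Println(m, name, err)
    }
}
```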

Execution Lifecycle

Every request follows a lifecycle state machine with defined states and transitions:

INIT -> PLAN -> PREPARE -> EXECUTE -> VALIDATE -> COMPLETE
|       |        |          |          |
+-------+--------+----------+----------+---> ERROR
+-------+--------+----------+----------+---> CANCELLED

Each state emits observability events. RetryPolicy is a lifecycle transition: EXECUTE -> VALIDATE -> (back to EXECUTE on repair/retry, or ERROR on terminal failure). Full lifecycle details in Contracts: Execution Lifecycle.
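
The lifecycle described above can be expressed as a transition table. This is a sketch under the assumption that the only edges are the ones shown in the diagram plus the VALIDATE -> EXECUTE retry edge; the State type, the constant names, and canTransition are hypothetical.

```go
package main

import "fmt"

type State string

const (
    Init      State = "INIT"
    Plan      State = "PLAN"
    Prepare   State = "PREPARE"
    Execute   State = "EXECUTE"
    Validate  State = "VALIDATE"
    Complete  State = "COMPLETE"
    Error     State = "ERROR"
    Cancelled State = "CANCELLED"
)

// next lists the allowed successors of each non-terminal state.
// COMPLETE, ERROR, and CANCELLED are terminal and have no entry.
var next = map[State][]State{
    Init:     {Plan, Error, Cancelled},
    Plan:     {Prepare, Error, Cancelled},
    Prepare:  {Execute, Error, Cancelled},
    Execute:  {Validate, Error, Cancelled},
    Validate: {Complete, Execute, Error, Cancelled}, // Execute is the retry edge
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to State) bool {
    for _, s := range next[from] {
        if s == to {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(canTransition(Validate, Execute)) // retry after a failed validation
    fmt.Println(canTransition(Init, Execute))     // skipping states is not allowed
    fmt.Println(canTransition(Complete, Init))    // terminal states have no successors
}
```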

Failure Taxonomy

All errors map to one of seven categories. Each category has a stable error code pattern, retryable flag, and recovery path:

Category              Emitting Layer     Retryable
InferenceFailure      inference (L01)    Sometimes
ToolFailure           tool (L04)         If idempotent
ConstraintFailure     constraint (L05)   Yes
ValidationFailure     validate (L07)     Yes
OrchestrationFailure  orchestrate (L02)  No
ConfigurationFailure  Any layer at init  No
Cancellation          Any layer          No

Full failure taxonomy with error codes and recovery paths in Contracts: Failure Taxonomy.
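
A categorized error of this kind might be modeled as below. The category names come from the table; the SystemError field layout and the shouldRetry helper are assumptions for illustration, and the canonical definition lives in Contracts.

```go
package main

import (
    "errors"
    "fmt"
)

type Category string

const (
    InferenceFailure     Category = "InferenceFailure"
    ToolFailure          Category = "ToolFailure"
    ConstraintFailure    Category = "ConstraintFailure"
    ValidationFailure    Category = "ValidationFailure"
    OrchestrationFailure Category = "OrchestrationFailure"
    ConfigurationFailure Category = "ConfigurationFailure"
    Cancellation         Category = "Cancellation"
)

// SystemError is a hypothetical shape for the canonical error.
// Retryable is set by the emitting layer, because categories such as
// InferenceFailure and ToolFailure are only conditionally retryable.
type SystemError struct {
    Category  Category
    Retryable bool
    Err       error
}

func (e *SystemError) Error() string { return fmt.Sprintf("%s: %v", e.Category, e.Err) }
func (e *SystemError) Unwrap() error { return e.Err }

// shouldRetry reports whether err wraps a SystemError marked retryable.
func shouldRetry(err error) bool {
    var se *SystemError
    return errors.As(err, &se) && se.Retryable
}

func main() {
    base := &SystemError{Category: ConstraintFailure, Retryable: true, Err: errors.New("invalid JSON")}
    wrapped := fmt.Errorf("structured call: %w", base)
    fmt.Println(shouldRetry(wrapped), wrapped)
}
```

Using errors.As means callers can still find the category after intermediate layers wrap the error with extra context.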

Canonical reference. The full contract specifications, lifecycle state definitions, and failure taxonomy live in Contracts. This section provides an architectural summary.

8. Execution IR

When data crosses step boundaries inside PlanExecutor, it uses a canonical intermediate representation (IR). The IR lives in the orchestrate layer and provides common field sets that avoid per-layer type conversion noise.

IR Types

Type                       Purpose                            Key Fields
StepInput                  Data entering a Step               Tagged wrapper: may hold messages, candidates, context, or tool results
StepOutput                 Data leaving a Step                Tagged wrapper: carries results forward to next Step
CandidateIR                Unified candidate across steps     content, source, score, metadata
ContextIR                  Retrieved context with provenance  List of chunks/messages with source tracking
ToolCallIR / ToolResultIR  Tool interaction records           Already defined in core as ToolCall; documented here as part of IR

Score Semantics

CandidateIR carries a score field. The meaning of the score depends on the component that produced it.

Score contract. Every Retriever must document what its score means and whether higher is better. A Reranker must document whether it overwrites the original score or writes a separate field. See Retrieval: Score Semantics.

Conversion Responsibilities

The orchestrate layer owns the IR types. Other layers produce and consume their own native types (e.g., context.Chunk, rerank.Candidate). Conversion happens at integration sites, that is, in user code that wires layers together.

Layer rules preserved. Orchestrate owns the IR. Context and rerank layers do not import orchestrate. Conversion is done by user code at integration sites via structural typing. No layer dependency violations.

9. Observability Integration

Any layer can record events through the observe package. The pattern is consistent: capture start time, do work, record event with duration and error.

// Recording an event from any layer:
observer.Record(observe.Event{
    Timestamp: started,
    Layer:     "orchestrate",
    Action:    "infer",
    Data:      map[string]any{"messages": len(req.Messages)},
    Duration:  time.Since(started),
    Error:     err,
})
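
The capture-start, do-work, record pattern can be wrapped in a small helper. In the sketch below, Event copies the fields used above, while Recorder, timed, and sliceLog are hypothetical names introduced for the example and are not part of the observe package.

```go
package main

import (
    "errors"
    "fmt"
    "time"
)

// Event mirrors the observe.Event fields used above.
type Event struct {
    Timestamp time.Time
    Layer     string
    Action    string
    Data      map[string]any
    Duration  time.Duration
    Error     error
}

// Recorder is the single method this helper needs from an event log.
type Recorder interface {
    Record(event Event)
}

// timed captures the start time, runs fn, then records one event
// carrying the elapsed duration and fn's error.
func timed(rec Recorder, layer, action string, fn func() error) error {
    started := time.Now()
    err := fn()
    rec.Record(Event{
        Timestamp: started,
        Layer:     layer,
        Action:    action,
        Duration:  time.Since(started),
        Error:     err,
    })
    return err
}

// sliceLog is a minimal Recorder for the demo (not safe for concurrent use).
type sliceLog struct{ events []Event }

func (l *sliceLog) Record(e Event) { l.events = append(l.events, e) }

func main() {
    events := &sliceLog{}
    _ = timed(events, "orchestrate", "infer", func() error { return errors.New("boom") })
    e := events.events[0]
    fmt.Println(e.Layer, e.Action, e.Error)
}
```

The event is recorded whether or not fn fails, so failed calls still appear in the timeline with their duration.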

Core Types

// layers/10_observe/event.go
type Event struct {
    Timestamp time.Time
    Layer     string
    Action    string
    Data      map[string]any
    Duration  time.Duration
    Error     error
}

type EventLog interface {
    Record(event Event)
    Events() []Event
}

// layers/10_observe/trace.go
type TraceRecorder interface {
    StartSpan(ctx context.Context, name string) (context.Context, Span)
}

type Span interface {
    End()
    SetAttribute(key string, value any)
    RecordError(err error)
}

// layers/10_observe/metrics.go
type MetricsCollector interface {
    RecordLatency(layer string, duration time.Duration)
    RecordTokens(usage core.TokenUsage)
    IncrementCounter(name string, delta int)
}

// layers/10_observe/cost.go
type CostTracker interface {
    RecordUsage(model string, usage core.TokenUsage)
    TotalCost() float64
}

Implementations

Each interface has two implementations: an in-memory version for testing and development, and a no-op version for production paths that do not need observability:

Interface         In-Memory            No-Op
EventLog          InMemoryEventLog     NoOpEventLog
TraceRecorder     InMemoryTracer       NoOpTracer
MetricsCollector  InMemoryMetrics      NoOpMetrics
CostTracker       InMemoryCostTracker  NoOpCostTracker

Thread safety. All in-memory implementations use sync.Mutex for safe concurrent access. Events and spans are deep-copied on read and write to prevent aliasing bugs.
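
The mutex-plus-copy approach can be sketched as follows. This is not the InMemoryEventLog implementation: memLog and copyEvent are hypothetical, Event is reduced to two fields, and the copy clones only the top level of Data, so nested maps or slices would need a recursive copy.

```go
package main

import (
    "fmt"
    "sync"
)

// Event is reduced to two fields for brevity.
type Event struct {
    Action string
    Data   map[string]any
}

// memLog is a hypothetical mutex-guarded in-memory log.
type memLog struct {
    mu     sync.Mutex
    events []Event
}

// copyEvent clones the Data map so callers cannot mutate stored state.
func copyEvent(e Event) Event {
    if e.Data != nil {
        data := make(map[string]any, len(e.Data))
        for k, v := range e.Data {
            data[k] = v
        }
        e.Data = data
    }
    return e
}

// Record stores a copy of the event under the lock.
func (l *memLog) Record(e Event) {
    l.mu.Lock()
    defer l.mu.Unlock()
    l.events = append(l.events, copyEvent(e))
}

// Events returns copies of all stored events under the lock.
func (l *memLog) Events() []Event {
    l.mu.Lock()
    defer l.mu.Unlock()
    out := make([]Event, len(l.events))
    for i, e := range l.events {
        out[i] = copyEvent(e)
    }
    return out
}

func main() {
    l := &memLog{}
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            l.Record(Event{Action: "infer", Data: map[string]any{"n": n}})
        }(i)
    }
    wg.Wait()

    snapshot := l.Events()
    snapshot[0].Data["n"] = -1 // mutating the snapshot leaves the log untouched
    fmt.Println(len(l.Events()), l.Events()[0].Data["n"] != -1)
}
```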

Correlation Model

All observability events carry correlation identifiers that link events to requests, sessions, and execution spans:

// Correlation identifiers on every event
type CorrelationContext struct {
    RequestID       string  // Stable per SystemRequest
    SessionID       string  // Stable per conversation (AgentLoop)
    TraceID         string  // Distributed tracing root
    SpanID          string  // Current execution unit
    ParentSpanID    string  // Parent in span hierarchy
    CausedBy        string  // Causal link: tool_call_id, step_name
}

Required Event Fields

Every event must include:

Inference events additionally include: message_count, tool_defs_count, schema_present, grammar_present, temperature. Tool events include the tool_call_id from the inference response that triggered the tool call, establishing causality.

Full timeline reconstruction. With correlation ids on all events, you can reconstruct the complete request execution timeline. Tool start/end events link back to the inference response that emitted the tool call via tool_call_id. Full event schemas in Contracts: Observability Correlation.

10. Prompt Layer Integration

The prompt layer (L08) shapes model behavior through structured prompt construction. Orchestration patterns consume prompts through a structural typing interface rather than raw strings.

PromptProvider Pattern

// Structural typing interface consumed by orchestrate:
type PromptProvider interface {
    BuildSystemPrompt() (core.Message, error)
}

Prompt layer implementations that expose BuildSystemPrompt() (core.Message, error) satisfy this interface without importing orchestrate - same structural typing pattern as ContextProvider and CandidateReranker. The existing SystemPrompt.Build() and Template.Render() use different signatures; production code bridges between them or implements the interface directly.
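
A bridge of the kind mentioned above might look like this sketch. It assumes a prompt type whose Build method returns a plain string; the source does not give the actual SystemPrompt.Build signature, so stringBuilder, bridge, and staticPrompt are all hypothetical, and Message is a reduced stand-in for core.Message.

```go
package main

import (
    "errors"
    "fmt"
)

// Message is a reduced stand-in for core.Message.
type Message struct {
    Role    string
    Content string
}

// PromptProvider matches the interface consumed by orchestrate.
type PromptProvider interface {
    BuildSystemPrompt() (Message, error)
}

// stringBuilder is an assumed shape for a prompt type whose Build method
// returns a plain string; the real signature may differ.
type stringBuilder interface {
    Build() string
}

// bridge adapts a stringBuilder to PromptProvider.
type bridge struct{ src stringBuilder }

func (b bridge) BuildSystemPrompt() (Message, error) {
    text := b.src.Build()
    if text == "" {
        return Message{}, errors.New("prompt: empty system prompt")
    }
    return Message{Role: "system", Content: text}, nil
}

// staticPrompt is a trivial stringBuilder for the demo.
type staticPrompt string

func (s staticPrompt) Build() string { return string(s) }

func main() {
    var p PromptProvider = bridge{src: staticPrompt("You are helpful.")}
    msg, err := p.BuildSystemPrompt()
    fmt.Println(msg.Role, msg.Content, err)
}
```

The bridge lives in user code at the integration site, so the prompt layer still never imports orchestrate.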

Context Placement Policy

When retrieved context is injected into prompts, placement follows ContextPolicy rules.

String convenience. Orchestration patterns accept a raw SystemPrompt string as a convenience default. PromptProvider is the primary mechanism for production use where prompt assembly needs to be structured, testable, and policy-driven.

11. Design Principles

Deterministic Control Around Probabilistic Inference

LLM output is inherently non-deterministic. pagantic wraps it with deterministic control structures: tool loops with iteration limits, schema validation with repair, GBNF grammar constraints at the decoder level, and redundant inference with majority voting. The probabilistic part is contained within Engine.Infer() -- everything outside that call is predictable and testable.

Explicit State

No hidden closures, no package-level globals, no init() functions with side effects. Conversation state lives in ConversationBuffer. Observer state lives in EventLog. Configuration is passed as structs. This means you can inspect any value at any point in the call stack.

// State is always visible and inspectable:
observer := &observe.InMemoryEventLog{}
agent := orchestrate.NewAgentLoop(orchestrate.LoopConfig{
    Engine:       engine,
    Tools:        registry,
    SystemPrompt: "You are helpful.",
    MaxTokens:    2048,
    Observer:     observer, // keep a handle so events can be read back later
})

// After a call, you can inspect everything:
messages := agent.Messages()     // full conversation history
events := observer.Events()      // every action that happened

No Magic

Every inference call is logged. Every tool execution is recorded. Every structured output is validated against its schema. Every JSON repair attempt is traceable. When something goes wrong, the event log tells you exactly what happened, in what order, and how long each step took.

Composable Constraints

Output constraints can be layered:

  1. GBNF grammar -- decoder-level constraint that forces the model to produce tokens matching a grammar.
  2. JSON schema -- structural validation after generation.
  3. Enum normalization -- coerces string values to match allowed enum entries.
  4. JSON repair -- fixes common LLM JSON errors (trailing commas, unquoted keys).
  5. Schema validation -- final check that output matches required fields and types.

These are not mutually exclusive. A single ChatStructured call can apply grammar constraint at inference time, then repair, normalize, and validate the result:

// From AgentLoop.ChatStructured -- constraints compose naturally:
result, err := al.infer(ctx, inference.Request{
    Messages: messages,
    Schema:   &schema,
    Grammar:  al.cfg.Grammar, // decoder-level GBNF
})

// Post-inference constraint pipeline:
if json.Valid([]byte(result.Content)) {
    result.Content = constraint.NormalizeEnumValues(result.Content, schema)
    return al.validateSchema(result, schema)
}

repaired := constraint.RepairJSON(result.Content)
result.Content = constraint.NormalizeEnumValues(repaired, schema)
return al.validateSchema(result, schema)

Grammar is optional. Not all models support GBNF grammar. When Grammar is empty, pagantic relies on post-generation constraints (repair + validation) to enforce structure. When grammar is available, it provides a stronger guarantee at the decoder level.
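
As an illustration of one post-generation step, enum normalization for a single value could work as sketched below. This is not constraint.NormalizeEnumValues, which operates on a whole JSON document and whose matching rules are not described here; normalizeEnum and its case-insensitive, whitespace-trimming rule are assumptions.

```go
package main

import (
    "fmt"
    "strings"
)

// normalizeEnum maps value onto one of the allowed entries, ignoring case
// and surrounding whitespace. It returns the original value and false
// when nothing matches, leaving the failure for schema validation to report.
func normalizeEnum(value string, allowed []string) (string, bool) {
    v := strings.TrimSpace(value)
    for _, a := range allowed {
        if strings.EqualFold(v, a) {
            return a, true
        }
    }
    return value, false
}

func main() {
    allowed := []string{"low", "high"}
    for _, raw := range []string{" HIGH ", "Low", "urgent"} {
        got, ok := normalizeEnum(raw, allowed)
        fmt.Printf("%q -> %q (%v)\n", raw, got, ok)
    }
}
```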