Retrieval & Reranking

1. The RAG Pattern in pagantic
2. Context Layer (Layer 3)
3. Structural Typing: ContextProvider
4. Ephemeral Context Injection
5. Rerank Layer (Layer 6)
6. CandidateReranker - Cross-Layer Integration
7. Full RAG Pipeline with PlanExecutor
8. Custom Retriever Implementation
Score Semantics Contract
Consistency Model

1. The RAG Pattern in pagantic

Retrieval-Augmented Generation (RAG) gives the model relevant knowledge before generation. Instead of relying on whatever the model memorized during training, pagantic retrieves bounded, deterministic context for every query and injects it into the prompt.

Three invariants govern how pagantic handles knowledge:

Model never acts on unconstrained knowledge. Every piece of context the model sees was explicitly retrieved and scored.
Context is bounded and deterministic. Chunk counts are capped, scores are thresholded, and sources are tracked.
Retrieved per query, not accumulated. Each turn gets fresh context. Previous context does not leak into future turns.

Why not just stuff everything into the prompt? Token limits, noise, and cost. Retrieval narrows the knowledge surface to what matters for this specific query, keeping the prompt focused and the model grounded.

2. Context Layer (Layer 3)

Layer 3 gives the model relevant, bounded knowledge. Its input is the user query; its output is a structured context block injected into the message list.

Retriever Interface

Retriever is the core abstraction. Implement it to connect any knowledge source - vector database, search engine, file system, or API.

// Retriever finds relevant content for a query.
type Retriever interface {
    Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error)
}

// Chunk is a piece of retrieved content with relevance metadata.
type Chunk struct {
    Content  string
    Source   string
    Score    float64
    Metadata map[string]any
}

Each Chunk carries its content, a source identifier for traceability, a relevance score, and arbitrary metadata. The score field lets downstream components (like the rerank layer) compare and filter chunks.

InMemoryRetriever - Built-in Keyword Matcher

pagantic ships with InMemoryRetriever, a simple keyword-matching retriever that operates on in-memory documents. Good for bounded domains with known document sets, tests, and prototyping.

// Document is a stored document for the in-memory retriever.
type Document struct {
    Content  string
    Source   string
    Metadata map[string]any
}

// InMemoryRetriever does keyword matching against in-memory documents.
// Good for bounded domains with known document sets.
retriever := pctx.NewInMemoryRetriever(
    pctx.Document{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
    pctx.Document{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
    pctx.Document{Content: "Channels are typed conduits for communication.", Source: "go-channels"},
)

InMemoryRetriever is intentionally simple. It matches query keywords against document content using basic string overlap. For production workloads, implement the Retriever interface with a vector database, embedding search, or hybrid retrieval strategy. See Custom Retriever Implementation below.
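The keyword-overlap idea can be sketched as follows. This is not pagantic's InMemoryRetriever source: keywordRetriever, tokenize, and matchingSources are hypothetical names, Chunk and Document are local copies of the types shown above, and treating a non-positive limit as "no cap" is an assumption made for the sketch.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Chunk and Document are local copies of the types shown above.
type Chunk struct {
	Content  string
	Source   string
	Score    float64
	Metadata map[string]any
}

type Document struct {
	Content  string
	Source   string
	Metadata map[string]any
}

// keywordRetriever is a hypothetical stand-in for InMemoryRetriever.
type keywordRetriever struct {
	docs []Document
}

// tokenize lowercases s and splits it on any rune that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Retrieve scores each document by the number of distinct query words it
// contains, drops documents with no overlap, and returns the best first.
// Assumption: limit <= 0 means "no cap".
func (k *keywordRetriever) Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	queryWords := make(map[string]bool)
	for _, w := range tokenize(query) {
		queryWords[w] = true
	}
	var chunks []Chunk
	for _, d := range k.docs {
		matched := make(map[string]bool)
		for _, w := range tokenize(d.Content) {
			if queryWords[w] {
				matched[w] = true
			}
		}
		if len(matched) > 0 {
			chunks = append(chunks, Chunk{
				Content: d.Content, Source: d.Source,
				Score: float64(len(matched)), Metadata: d.Metadata,
			})
		}
	}
	sort.SliceStable(chunks, func(i, j int) bool { return chunks[i].Score > chunks[j].Score })
	if limit > 0 && len(chunks) > limit {
		chunks = chunks[:limit]
	}
	return chunks, nil
}

// matchingSources runs a query against three sample documents and
// returns the sources of the matching chunks in ranked order.
func matchingSources(query string, limit int) []string {
	r := &keywordRetriever{docs: []Document{
		{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
		{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
		{Content: "Channels are typed conduits for communication.", Source: "go-channels"},
	}}
	chunks, err := r.Retrieve(context.Background(), query, limit)
	if err != nil {
		return nil
	}
	sources := make([]string, 0, len(chunks))
	for _, c := range chunks {
		sources = append(sources, c.Source)
	}
	return sources
}

func main() {
	fmt.Println(matchingSources("How do interfaces work?", 2))
}
```

Exact-token overlap has no notion of stemming or synonyms, which is one reason a retriever like this suits tests and small, known document sets rather than production search.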

ContextBuilder - Assembling Chunks into Messages

ContextBuilder bridges retrieval and inference. It calls the Retriever, collects chunks, and formats them into []core.Message ready for the model.

// ContextBuilder assembles retrieved chunks into messages for the model.
builder := &pctx.ContextBuilder{
    Retriever: retriever,
    MaxChunks: 3,  // max chunks to include; 0 means use all retrieved
}

// Build returns []core.Message with the retrieved context as system messages.
msgs, err := builder.Build(ctx, "How do interfaces work?")
// msgs[0].Content:
// "Relevant context:
//
// [1] (go-spec): Go interfaces are satisfied implicitly."

MaxChunks controls how many chunks make it into the context. Set it to bound token usage and keep the prompt focused. Zero or negative values mean "use everything the retriever returns."
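A minimal sketch of the capping and formatting step, assuming the single-chunk layout shown in the example output above. formatContext is a hypothetical helper, not part of pagantic, and the one-line-per-chunk layout for multiple chunks is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a trimmed local copy of the Chunk type shown earlier.
type Chunk struct {
	Content string
	Source  string
	Score   float64
}

// formatContext keeps at most maxChunks chunks (maxChunks <= 0 keeps all)
// and renders them as one numbered context block. It returns an empty
// string when there is nothing to inject.
func formatContext(chunks []Chunk, maxChunks int) string {
	if maxChunks > 0 && len(chunks) > maxChunks {
		chunks = chunks[:maxChunks]
	}
	if len(chunks) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString("Relevant context:\n")
	for i, c := range chunks {
		// Assumed layout: one "[n] (source): content" line per chunk.
		fmt.Fprintf(&b, "\n[%d] (%s): %s", i+1, c.Source, c.Content)
	}
	return b.String()
}

func main() {
	chunks := []Chunk{
		{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
		{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
	}
	fmt.Println(formatContext(chunks, 1))
}
```

Capping happens before formatting, so the retriever's ordering decides which chunks survive when MaxChunks is smaller than the number retrieved.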

3. Structural Typing: ContextProvider

The orchestrate layer (layer 2) defines ContextProvider as the interface for retrieving context messages:

// In orchestrate package

// ContextProvider retrieves context messages for a query.
type ContextProvider interface {
    Build(ctx context.Context, query string) ([]core.Message, error)
}

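Notice that ContextBuilder in pagantic's context package (imported as pctx in the examples here, not the standard library context package) has exactly the same Build method signature. In Go, interfaces are satisfied implicitly: there is no implements keyword, and the context package does not need to import orchestrate. This is structural typing.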

Go structural typing in action. ContextBuilder satisfies ContextProvider without knowing it exists. The context package has zero dependency on orchestrate. This keeps layer boundaries clean - lower layers never import higher layers.

// Wire it up - ContextBuilder satisfies ContextProvider implicitly
agent := orchestrate.NewAgentLoop(orchestrate.LoopConfig{
    Engine:          engine,
    ContextProvider: contextBuilder,  // ContextBuilder satisfies ContextProvider
})
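The same relationship can be reproduced in a single file. In this sketch Message stands in for core.Message, both declarations live in one package for brevity, and firstContext is a hypothetical helper. The blank-identifier assignment is a common Go idiom that makes the compiler verify the interface is satisfied.

```go
package main

import (
	"context"
	"fmt"
)

// Message is a local stand-in for core.Message.
type Message struct {
	Role    string
	Content string
}

// ContextProvider is declared the way the orchestrate layer declares it.
type ContextProvider interface {
	Build(ctx context.Context, query string) ([]Message, error)
}

// ContextBuilder plays the role of the lower-layer type. Nothing in its
// declaration mentions ContextProvider.
type ContextBuilder struct {
	Prefix string
}

func (b *ContextBuilder) Build(ctx context.Context, query string) ([]Message, error) {
	return []Message{{Role: "system", Content: b.Prefix + query}}, nil
}

// Compile-time check: the build fails if *ContextBuilder ever stops
// satisfying ContextProvider.
var _ ContextProvider = (*ContextBuilder)(nil)

// firstContext accepts the interface, so any type with a matching
// Build method can be passed in.
func firstContext(p ContextProvider, query string) string {
	msgs, err := p.Build(context.Background(), query)
	if err != nil || len(msgs) == 0 {
		return ""
	}
	return msgs[0].Content
}

func main() {
	fmt.Println(firstContext(&ContextBuilder{Prefix: "context for: "}, "interfaces"))
}
```

In a layered codebase the assertion line would live in user code or a test that imports both packages, so neither layer has to import the other.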

4. Ephemeral Context Injection

Key design decision: context is retrieved fresh per Chat() call and injected into the request, but NOT stored in conversation history.

Why? It prevents context from accumulating across turns. Each turn gets fresh, relevant context. Old context documents do not pollute future prompts or consume token budget.

Turn 1: "What is Go?"
  -> Retrieve: [go-overview doc]
  -> Request: [system, context-doc, user-msg]
  -> Memory stores: [system, user-msg, assistant-response]
  -> Context NOT in memory

Turn 2: "Tell me about channels"
  -> Retrieve: [go-channels doc] (FRESH retrieval)
  -> Request: [system, user-msg-1, assistant-1, context-doc, user-msg-2]
  -> Memory stores: [..., user-msg-2, assistant-response-2]

In AgentLoop.Chat, the flow is:

User message arrives.
If ContextProvider is set, call Build(ctx, query) to retrieve context messages.
Assemble request: system prompt + conversation history + context messages + user message.
Send to inference engine.
Store user message and assistant response in memory. Context messages are not stored.

In SpecializedLoop, context is retrieved once per Call using the original prompt and injected into a fresh inner loop before the tool/structured phases.

This ephemeral injection pattern means:

Token budget stays stable across long conversations.
Each turn's context is maximally relevant to that turn's query.
No stale knowledge lingers from earlier turns.
Memory only contains what the user said and what the model replied.
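The flow can be sketched with stub components. This is an illustration of the pattern, not AgentLoop's implementation: loop, Engine, staticProvider, and countingEngine are hypothetical, Message stands in for core.Message, and the system prompt is held outside the history slice for simplicity.

```go
package main

import (
	"context"
	"fmt"
)

// Message is a local stand-in for core.Message.
type Message struct {
	Role    string
	Content string
}

type ContextProvider interface {
	Build(ctx context.Context, query string) ([]Message, error)
}

// Engine is a hypothetical inference interface used only in this sketch.
type Engine interface {
	Generate(ctx context.Context, msgs []Message) (string, error)
}

type loop struct {
	system   Message
	history  []Message // user and assistant messages only
	provider ContextProvider
	engine   Engine
}

// Chat builds the request as system + history + fresh context + user
// message, then stores only the user message and the reply.
func (l *loop) Chat(ctx context.Context, user string) (string, error) {
	req := append([]Message{l.system}, l.history...)
	if l.provider != nil {
		ctxMsgs, err := l.provider.Build(ctx, user)
		if err != nil {
			return "", err
		}
		req = append(req, ctxMsgs...)
	}
	userMsg := Message{Role: "user", Content: user}
	req = append(req, userMsg)

	reply, err := l.engine.Generate(ctx, req)
	if err != nil {
		return "", err
	}
	// Context messages are deliberately not appended to history.
	l.history = append(l.history, userMsg, Message{Role: "assistant", Content: reply})
	return reply, nil
}

// staticProvider returns exactly one context message per query.
type staticProvider struct{}

func (staticProvider) Build(ctx context.Context, query string) ([]Message, error) {
	return []Message{{Role: "system", Content: "Relevant context for: " + query}}, nil
}

// countingEngine reports how many messages the request contained.
type countingEngine struct{}

func (countingEngine) Generate(ctx context.Context, msgs []Message) (string, error) {
	return fmt.Sprintf("saw %d messages", len(msgs)), nil
}

// runTwoTurns plays two turns and reports both replies plus the
// number of messages left in history.
func runTwoTurns() (first, second string, stored int) {
	l := &loop{
		system:   Message{Role: "system", Content: "You are helpful."},
		provider: staticProvider{},
		engine:   countingEngine{},
	}
	ctx := context.Background()
	first, _ = l.Chat(ctx, "What is Go?")
	second, _ = l.Chat(ctx, "Tell me about channels")
	return first, second, len(l.history)
}

func main() {
	fmt.Println(runTwoTurns())
}
```

The second request holds five messages (system, two history messages, one context message, one user message). If context were stored in history, it would hold six, and the count would keep growing by one stale context message per turn.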

5. Rerank Layer (Layer 6)

Two-stage retrieval: broad recall first, then precise selection. Layer 6 scores retrieved or generated candidates, reorders them by semantic relevance, and picks the best subset.

Core Types

// Candidate represents a scored item for ranking.
type Candidate struct {
    Content  string
    Score    float64
    Source   string
    Metadata map[string]any
}

// CandidateSet groups candidates with original query.
type CandidateSet struct {
    Query      string
    Candidates []Candidate
}

// RelevanceScorer assigns relevance scores to candidates.
type RelevanceScorer interface {
    Score(ctx context.Context, query string, candidates []Candidate) ([]Candidate, error)
}

// SelectionPolicy controls filtering after scoring.
type SelectionPolicy struct {
    TopK     int     // max candidates to return, 0 means all
    MinScore float64 // minimum score threshold, 0 means no filter
}

The separation of RelevanceScorer (scoring) and SelectionPolicy (filtering) keeps concerns clean. Swap scorers without changing selection logic. Tune thresholds without touching the scorer.

SimpleScorer - Keyword Overlap

// SimpleScorer scores candidates by keyword overlap with query.
// For testing and development only.
scorer := &rerank.SimpleScorer{}

// Counts how many query keywords appear in candidate content.
// Returns candidates with Score field populated.

SimpleScorer is for development and testing. For production, implement RelevanceScorer with a cross-encoder model, LLM-based scoring, or embedding similarity.

Reranker - Combines Scoring and Selection

Reranker is the primary entry point. It takes a scorer and a selection policy, scores all candidates, sorts by score, then applies the policy to filter.

// Reranker combines a scorer with a selection policy to produce
// a refined, filtered candidate list.
reranker := &rerank.Reranker{
    Scorer: &rerank.SimpleScorer{},
    Policy: rerank.SelectionPolicy{
        TopK:     3,    // keep top 3
        MinScore: 0.1,  // minimum relevance
    },
}

results, err := reranker.Rerank(ctx, rerank.CandidateSet{
    Query: "How do interfaces work?",
    Candidates: []rerank.Candidate{
        {Content: "Go interfaces are implicit.", Source: "go-spec", Score: 0},
        {Content: "Python uses duck typing.", Source: "python-docs", Score: 0},
        {Content: "Interface satisfaction requires no declaration.", Source: "go-faq", Score: 0},
    },
})
// results: up to 3 candidates sorted by relevance score (descending),
// filtered to those with score >= 0.1

The pipeline inside Rerank:

Pass all candidates to Scorer.Score() - populates Score fields.
Sort candidates by score descending.
Apply MinScore threshold - drop candidates below it.
Apply TopK cap - keep only the top K.
Return filtered, sorted candidates.
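The five steps above can be sketched in a few lines. This is an illustration under assumptions, not the Reranker source: rerankOnce, overlapScore, and tokenize are hypothetical names, the scorer is a plain keyword-overlap count in the spirit of SimpleScorer, and ties keep their input order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Candidate and SelectionPolicy are trimmed local copies of the types above.
type Candidate struct {
	Content string
	Score   float64
	Source  string
}

type SelectionPolicy struct {
	TopK     int     // max candidates to return, 0 means all
	MinScore float64 // minimum score threshold, 0 means no filter
}

// tokenize lowercases s and splits it on non-alphanumeric runes.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// overlapScore counts the distinct query words present in content.
func overlapScore(query, content string) float64 {
	words := make(map[string]bool)
	for _, w := range tokenize(content) {
		words[w] = true
	}
	seen := make(map[string]bool)
	for _, w := range tokenize(query) {
		if words[w] {
			seen[w] = true
		}
	}
	return float64(len(seen))
}

// rerankOnce scores, sorts descending, applies MinScore, then applies TopK.
func rerankOnce(query string, cands []Candidate, p SelectionPolicy) []Candidate {
	scored := make([]Candidate, len(cands))
	copy(scored, cands) // leave the caller's slice untouched
	for i := range scored {
		scored[i].Score = overlapScore(query, scored[i].Content)
	}
	sort.SliceStable(scored, func(i, j int) bool { return scored[i].Score > scored[j].Score })

	kept := scored[:0]
	for _, c := range scored {
		if p.MinScore > 0 && c.Score < p.MinScore {
			continue
		}
		kept = append(kept, c)
	}
	if p.TopK > 0 && len(kept) > p.TopK {
		kept = kept[:p.TopK]
	}
	return kept
}

func main() {
	out := rerankOnce("go interfaces", []Candidate{
		{Content: "Go interfaces are implicit.", Source: "go-spec"},
		{Content: "Python uses duck typing.", Source: "python-docs"},
		{Content: "Interfaces in Java are explicit.", Source: "java-docs"},
		{Content: "Go has goroutines.", Source: "go-tour"},
	}, SelectionPolicy{TopK: 2, MinScore: 1})
	for _, c := range out {
		fmt.Println(c.Source, c.Score)
	}
}
```

Applying the threshold before the cap matters: TopK then only ever trims candidates that already passed MinScore, so the result can be shorter than TopK but never contains a below-threshold item.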

6. CandidateReranker - Cross-Layer Integration

The orchestrate layer defines its own reranking types to avoid importing the rerank package. Same structural typing pattern as ContextProvider.

// In orchestrate package

// RerankCandidate represents a scored item for plan-level reranking.
// Mirrors rerank.Candidate fields to make conversion straightforward
// across the layer boundary.
type RerankCandidate struct {
    Content  string
    Score    float64
    Source   string
    Metadata map[string]any
}

// RerankInput groups candidates with original query for reranking.
type RerankInput struct {
    Query      string
    Candidates []RerankCandidate
}

// CandidateReranker scores and filters candidates.
type CandidateReranker interface {
    Rerank(ctx context.Context, input RerankInput) ([]RerankCandidate, error)
}

The fields in RerankCandidate mirror rerank.Candidate exactly - same names, same types. But they are separate Go types. An adapter function converts between them at the integration boundary:

// Adapter wraps rerank.Reranker to satisfy orchestrate.CandidateReranker.
// Converts orchestrate types to rerank types, calls Rerank, converts back.
func adaptReranker(r *rerank.Reranker) orchestrate.CandidateReranker {
    return &rerankAdapter{inner: r}
}

// rerankAdapter bridges the type boundary.
type rerankAdapter struct {
    inner *rerank.Reranker
}

func (a *rerankAdapter) Rerank(ctx context.Context, input orchestrate.RerankInput) ([]orchestrate.RerankCandidate, error) {
    // Convert orchestrate.RerankCandidate -> rerank.Candidate
    candidates := make([]rerank.Candidate, len(input.Candidates))
    for i, c := range input.Candidates {
        candidates[i] = rerank.Candidate{
            Content: c.Content, Score: c.Score,
            Source: c.Source, Metadata: c.Metadata,
        }
    }

    // Call rerank layer
    results, err := a.inner.Rerank(ctx, rerank.CandidateSet{
        Query: input.Query, Candidates: candidates,
    })
    if err != nil {
        return nil, err
    }

    // Convert rerank.Candidate -> orchestrate.RerankCandidate
    out := make([]orchestrate.RerankCandidate, len(results))
    for i, r := range results {
        out[i] = orchestrate.RerankCandidate{
            Content: r.Content, Score: r.Score,
            Source: r.Source, Metadata: r.Metadata,
        }
    }
    return out, nil
}

Why separate types? The orchestrate package sits at layer 2. The rerank package sits at layer 6. If orchestrate imported rerank directly, it would create a dependency from a lower layer to a higher one. Mirrored types plus an adapter function keep the dependency direction correct: user code (above both layers) imports both and wires them together.

7. Full RAG Pipeline with PlanExecutor

End-to-end example combining retrieval, reranking, and inference using PlanExecutor. This shows how the layers compose into a complete RAG pipeline.

// Step 1: Build retriever and context provider
retriever := pctx.NewInMemoryRetriever(
    pctx.Document{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
    pctx.Document{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
    pctx.Document{Content: "Channels are typed conduits.", Source: "go-channels"},
)
contextProvider := &pctx.ContextBuilder{Retriever: retriever, MaxChunks: 10}

// Step 2: Build reranker
reranker := &rerank.Reranker{
    Scorer: &rerank.SimpleScorer{},
    Policy: rerank.SelectionPolicy{TopK: 3, MinScore: 0.1},
}

// Step 3: Register step handlers
handlers := map[orchestrate.StepType]orchestrate.StepHandler{
    orchestrate.StepRetrieve: orchestrate.RetrieveHandler(contextProvider),
    orchestrate.StepRerank:   orchestrate.RerankHandler(adaptReranker(reranker)),
    orchestrate.StepInfer:    orchestrate.InferHandler(engine),
}

// Step 4: Define execution plan
plan := orchestrate.ExecutionPlan{
    Steps: []orchestrate.Step{
        {Name: "retrieve", Type: orchestrate.StepRetrieve, Input: "How do interfaces work?"},
        {Name: "rerank",   Type: orchestrate.StepRerank},
        {Name: "infer",    Type: orchestrate.StepInfer},
    },
}

// Step 5: Execute
executor := orchestrate.NewPlanExecutor(handlers)
results, err := executor.Execute(ctx, plan)
// results contains completed steps with Output populated
// Output of step N becomes available as Input of step N+1

The execution flow:

Step	Type	Input	Output
`retrieve`	`StepRetrieve`	Query string	`[]core.Message` with context
`rerank`	`StepRerank`	`RerankInput` from previous step	`[]RerankCandidate` filtered and sorted
`infer`	`StepInfer`	`inference.Request`	`*inference.Result` with model response

PlanExecutor chains step outputs. Output of step N is available as Input of step N+1. This means retrieved context flows into reranking, and reranked results flow into inference - all without manual wiring between steps.
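A rough sketch of output-to-input chaining, to make the idea concrete. It does not reproduce PlanExecutor: execute, the string-typed Step.Type, the StepHandler signature, and the rule that an explicitly set Input takes precedence over the previous output are all assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Step is a simplified stand-in for a plan step. Input and Output are
// untyped, so each handler must check what it receives.
type Step struct {
	Name   string
	Type   string
	Input  any
	Output any
}

// StepHandler is a hypothetical handler signature for this sketch.
type StepHandler func(ctx context.Context, input any) (any, error)

// execute runs steps in order. A step with no explicit Input receives
// the previous step's Output (assumed chaining rule).
func execute(ctx context.Context, handlers map[string]StepHandler, steps []Step) ([]Step, error) {
	done := make([]Step, 0, len(steps))
	var prev any
	for i, s := range steps {
		h, ok := handlers[s.Type]
		if !ok {
			return done, fmt.Errorf("step %q: no handler for type %q", s.Name, s.Type)
		}
		if i > 0 && s.Input == nil {
			s.Input = prev
		}
		out, err := h(ctx, s.Input)
		if err != nil {
			return done, fmt.Errorf("step %q: %w", s.Name, err)
		}
		s.Output = out
		prev = out
		done = append(done, s)
	}
	return done, nil
}

// asString is the kind of type check every handler needs when steps
// are chained through any.
func asString(input any) (string, error) {
	s, ok := input.(string)
	if !ok {
		return "", fmt.Errorf("expected string input, got %T", input)
	}
	return s, nil
}

// runPlan chains two toy steps: uppercase, then append "!".
func runPlan(query string) (string, error) {
	handlers := map[string]StepHandler{
		"upper": func(ctx context.Context, in any) (any, error) {
			s, err := asString(in)
			if err != nil {
				return nil, err
			}
			return strings.ToUpper(s), nil
		},
		"exclaim": func(ctx context.Context, in any) (any, error) {
			s, err := asString(in)
			if err != nil {
				return nil, err
			}
			return s + "!", nil
		},
	}
	done, err := execute(context.Background(), handlers, []Step{
		{Name: "first", Type: "upper", Input: query},
		{Name: "second", Type: "exclaim"},
	})
	if err != nil {
		return "", err
	}
	return asString(done[len(done)-1].Output)
}

func main() {
	fmt.Println(runPlan("hello"))
}
```

Because values cross step boundaries untyped, a mismatch between what one handler returns and what the next expects only surfaces at run time, so each handler should validate its input as asString does here.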

8. Custom Retriever Implementation

InMemoryRetriever works for prototyping. For production, implement Retriever against a real knowledge store. Here is a vector database example:

type VectorRetriever struct {
    db       VectorDB
    embedder Embedder
}

func (vr *VectorRetriever) Retrieve(ctx context.Context, query string, limit int) ([]pctx.Chunk, error) {
    // Embed the query into a vector
    embedding, err := vr.embedder.Embed(ctx, query)
    if err != nil {
        return nil, err
    }

    // Search vector DB for nearest neighbors
    results, err := vr.db.Search(ctx, embedding, limit)
    if err != nil {
        return nil, err
    }

    // Convert DB results to Chunks
    chunks := make([]pctx.Chunk, len(results))
    for i, r := range results {
        chunks[i] = pctx.Chunk{
            Content:  r.Text,
            Source:   r.DocID,
            Score:    r.Similarity,
        }
    }
    return chunks, nil
}

The pattern is always the same:

Accept query and limit.
Transform the query into whatever your backend needs (embedding, keywords, SQL).
Query the backend.
Map results to []pctx.Chunk.
Return.

Retriever implementations to consider:

Vector DB (Pinecone, Weaviate, pgvector) - embedding similarity search.
Full-text search (Elasticsearch, Meilisearch) - keyword and BM25 scoring.
Hybrid - combine vector and keyword retrieval, then rerank.
API-backed - call an external knowledge service or search engine.
File system - read and score local documents or code files.

Once implemented, plug it into ContextBuilder and the rest of the pipeline works unchanged:

// Swap InMemoryRetriever for VectorRetriever - nothing else changes
contextProvider := &pctx.ContextBuilder{
    Retriever: &VectorRetriever{db: pineconeClient, embedder: openaiEmbedder},
    MaxChunks: 5,
}

Custom RelevanceScorer works the same way. Implement RelevanceScorer with a cross-encoder model or LLM-based scoring, then plug it into Reranker. The SimpleScorer is a drop-in placeholder for development.

Score Semantics Contract

Scores appear on both context.Chunk and rerank.Candidate, but they mean different things depending on who produced them.

RetrievalScore

Assigned by a Retriever during knowledge lookup. The meaning depends entirely on the retriever implementation:

Retriever Type	Score Meaning	Range	Higher Is Better
BM25 / keyword	Term frequency relevance	[0, unbounded)	Yes
Cosine similarity	Cosine of the angle between query and document vectors	[-1, 1]	Yes
InMemoryRetriever	Keyword overlap count	[0, N]	Yes

Contract. Every Retriever implementation must document: (1) what its score means, (2) what range it uses, and (3) whether higher is better. Scores are comparable only within a single retriever - do not compare scores across different retriever types.

RerankScore

Assigned by a RelevanceScorer during reranking. Rerank scores are expected to be normalized relevance scores:

Scores should be normalized to [0, 1] when possible.
If normalization is not possible, the scorer must document that its scores are comparable only within that scorer.
Higher is always better for rerank scores.
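One simple way to bring non-negative raw scores such as keyword-overlap counts into [0, 1] is to divide by the largest score in the batch. normalizeByMax is a hypothetical helper, and the choice of max-scaling (rather than, say, a sigmoid over model logits) is an assumption of this sketch.

```go
package main

import "fmt"

// normalizeByMax scales non-negative raw scores into [0, 1] by dividing
// by the largest score in the batch. Negative inputs are clamped to 0.
// If every score is 0, the result is all zeros.
func normalizeByMax(raw []float64) []float64 {
	out := make([]float64, len(raw))
	highest := 0.0
	for _, s := range raw {
		if s > highest {
			highest = s
		}
	}
	if highest == 0 {
		return out
	}
	for i, s := range raw {
		if s < 0 {
			s = 0
		}
		out[i] = s / highest
	}
	return out
}

func main() {
	fmt.Println(normalizeByMax([]float64{3, 1.5, 0}))
}
```

Max-scaling is relative to the batch: the best candidate always gets 1 even when it is a weak match. A fixed MinScore therefore behaves differently from one query to the next, which is worth stating in the scorer's documentation.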

Score Field Behavior

When a Reranker processes candidates:

The Reranker overwrites Candidate.Score with the rerank score
The original retrieval score is not preserved unless the Reranker stores it in Candidate.Metadata
Custom Reranker implementations that want to preserve original scores should copy them to metadata before overwriting
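A small sketch of the copy-before-overwrite step. The helper withRerankScore and the metadata key "retrieval_score" are hypothetical; pagantic does not define a standard key in the text above.

```go
package main

import "fmt"

// Candidate is a local copy of the rerank candidate type shown earlier.
type Candidate struct {
	Content  string
	Score    float64
	Source   string
	Metadata map[string]any
}

// retrievalScoreKey is a hypothetical metadata key chosen for this sketch.
const retrievalScoreKey = "retrieval_score"

// withRerankScore returns a copy of c whose Score is the rerank score and
// whose Metadata records the score c carried before. The metadata map is
// copied so the caller's map is not mutated.
func withRerankScore(c Candidate, rerankScore float64) Candidate {
	meta := make(map[string]any, len(c.Metadata)+1)
	for k, v := range c.Metadata {
		meta[k] = v
	}
	// Keep the earliest recorded retrieval score if the candidate has
	// already been through a reranker once.
	if _, ok := meta[retrievalScoreKey]; !ok {
		meta[retrievalScoreKey] = c.Score
	}
	c.Metadata = meta
	c.Score = rerankScore
	return c
}

func main() {
	c := withRerankScore(Candidate{Content: "Go interfaces are implicit.", Source: "go-spec", Score: 7}, 0.9)
	fmt.Println(c.Score, c.Metadata[retrievalScoreKey])
}
```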

Execution IR (Planned). CandidateIR is a planned typed wrapper for passing candidates across PlanExecutor step boundaries. The current PlanExecutor chains steps through raw any values (the Step.Input and Step.Output fields). CandidateIR is intended as an optional wrapper for callers who want typed step boundaries; conversion between context.Chunk, rerank.Candidate, and CandidateIR happens in caller code.

Consistency Model

Retrieval in pagantic follows a fresh-per-turn consistency model by default.

Default Behavior

Context is retrieved fresh on every turn (AgentLoop) or every call (SpecializedLoop)
Retrieved context is working memory - ephemeral, not persisted
No caching between turns - the same query may return different results if the knowledge base changed
This ensures the model always sees current knowledge at the cost of repeated retrieval

Caching Policy (Extension Point)

An optional caching layer can be placed in front of any Retriever:

Cache key: query string (exact match or normalized)
TTL: configurable per retriever
Invalidation: on knowledge base update or TTL expiry

Not implemented. Caching is a documented extension point, not a built-in feature. Implement caching as a Retriever wrapper that checks cache before delegating to the underlying retriever.
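A caching wrapper along these lines might look like the sketch below. Everything here is illustrative: CachingRetriever, NewCachingRetriever, and the injected clock are hypothetical, Chunk and Retriever are local copies of the types shown earlier, and including limit in the cache key is an added assumption so that a small cached result is never served for a larger request.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
	"time"
)

// Chunk and Retriever are local copies of the types shown earlier.
type Chunk struct {
	Content string
	Source  string
	Score   float64
}

type Retriever interface {
	Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error)
}

type cacheEntry struct {
	chunks  []Chunk
	expires time.Time
}

// CachingRetriever is a hypothetical wrapper that serves repeated
// queries from memory until their TTL expires.
type CachingRetriever struct {
	inner Retriever
	ttl   time.Duration
	now   func() time.Time // injectable clock, handy for tests

	mu      sync.Mutex
	entries map[string]cacheEntry
}

func NewCachingRetriever(inner Retriever, ttl time.Duration, now func() time.Time) *CachingRetriever {
	return &CachingRetriever{inner: inner, ttl: ttl, now: now, entries: make(map[string]cacheEntry)}
}

func (c *CachingRetriever) Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error) {
	// Normalized key: trimmed, lowercased query plus the limit.
	key := fmt.Sprintf("%d|%s", limit, strings.ToLower(strings.TrimSpace(query)))

	c.mu.Lock()
	entry, ok := c.entries[key]
	c.mu.Unlock()
	if ok && c.now().Before(entry.expires) {
		// Return a copy so callers cannot modify the cached slice.
		return append([]Chunk(nil), entry.chunks...), nil
	}

	chunks, err := c.inner.Retrieve(ctx, query, limit)
	if err != nil {
		return nil, err // errors are not cached
	}
	c.mu.Lock()
	c.entries[key] = cacheEntry{chunks: append([]Chunk(nil), chunks...), expires: c.now().Add(c.ttl)}
	c.mu.Unlock()
	return chunks, nil
}

// countingRetriever is a stub backend that records how often it is called.
type countingRetriever struct{ calls int }

func (r *countingRetriever) Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error) {
	r.calls++
	return []Chunk{{Content: "doc for " + query, Source: "stub", Score: 1}}, nil
}

// backendCalls issues three lookups and reports how many reached the backend.
func backendCalls() int {
	clock := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	backend := &countingRetriever{}
	cached := NewCachingRetriever(backend, time.Minute, func() time.Time { return clock })
	ctx := context.Background()

	cached.Retrieve(ctx, "Go", 3)      // miss: reaches the backend
	cached.Retrieve(ctx, " go ", 3)    // hit: same normalized key
	clock = clock.Add(2 * time.Minute) // move past the TTL
	cached.Retrieve(ctx, "go", 3)      // miss: entry expired
	return backend.calls
}

func main() {
	fmt.Println(backendCalls())
}
```

Because the wrapper itself satisfies Retriever, it can be handed to ContextBuilder in place of the underlying retriever. Explicit invalidation on knowledge base updates is left out; it would need a hook from whatever process updates the store.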

Deduplication

When multiple retrievers or multiple queries return overlapping results, deduplication prevents redundant context:

Dedup key: Source + content hash, or Source + chunk id
Applied by ContextBuilder before assembling context messages
Keeps the highest-scoring instance when duplicates are found
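The rule above can be sketched with the Source + content hash key. dedupe is a hypothetical helper, Chunk is a trimmed local copy of the type shown earlier, and SHA-256 is one reasonable hash choice rather than a documented one. First-seen order is preserved.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk is a trimmed local copy of the Chunk type shown earlier.
type Chunk struct {
	Content string
	Source  string
	Score   float64
}

// dedupe removes chunks that share a source and identical content,
// keeping the highest-scoring instance in the position where the
// duplicate group was first seen.
func dedupe(chunks []Chunk) []Chunk {
	index := make(map[string]int) // dedup key -> position in out
	var out []Chunk
	for _, c := range chunks {
		sum := sha256.Sum256([]byte(c.Content))
		// The NUL separator keeps "ab"+"c" distinct from "a"+"bc".
		key := c.Source + "\x00" + hex.EncodeToString(sum[:])
		if i, ok := index[key]; ok {
			if c.Score > out[i].Score {
				out[i] = c
			}
			continue
		}
		index[key] = len(out)
		out = append(out, c)
	}
	return out
}

func main() {
	out := dedupe([]Chunk{
		{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec", Score: 0.2},
		{Content: "Channels are typed conduits.", Source: "go-channels", Score: 0.5},
		{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec", Score: 0.9},
	})
	for _, c := range out {
		fmt.Println(c.Source, c.Score)
	}
}
```

Scores from different retrievers are not comparable (see the Score Semantics Contract), so "highest-scoring" is only meaningful when the duplicates came from the same retriever or were normalized first.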
