Context retrieval, reranking, and the full RAG pipeline in pagantic.
Retrieval-Augmented Generation (RAG) gives the model relevant knowledge before generation. Instead of relying on whatever the model memorized during training, pagantic retrieves bounded, deterministic context for every query and injects it into the prompt.
Three invariants govern how pagantic handles knowledge:
Layer 3 gives the model relevant, bounded knowledge. The input is the user query. The output is a structured context block injected into the message list.
Retriever is the core abstraction. Implement it to connect
any knowledge source: a vector database, search engine, file system,
or API.
// Retriever finds relevant content for a query.
type Retriever interface {
Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error)
}
// Chunk is a piece of retrieved content with relevance metadata.
type Chunk struct {
Content string
Source string
Score float64
Metadata map[string]any
}
Each Chunk carries its content, a source identifier for
traceability, a relevance score, and arbitrary metadata. The score
field lets downstream components (like the rerank layer) compare and
filter chunks.
pagantic ships with InMemoryRetriever, a simple keyword-matching
retriever that operates on in-memory documents. It is well suited to bounded
domains with known document sets, to tests, and to prototyping.
// Document is a stored document for the in-memory retriever.
type Document struct {
Content string
Source string
Metadata map[string]any
}
// InMemoryRetriever does keyword matching against in-memory documents.
// Good for bounded domains with known document sets.
retriever := pctx.NewInMemoryRetriever(
pctx.Document{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
pctx.Document{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
pctx.Document{Content: "Channels are typed conduits for communication.", Source: "go-channels"},
)
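InMemoryRetriever's internals are not shown on this page. As a rough illustration of keyword-overlap retrieval, the sketch below scores each document by the number of distinct query words it contains, which matches the "keyword overlap count" meaning given in the score table later on. keywordRetrieve and tokenize are hypothetical names, and the lowercase-and-split tokenization and "limit <= 0 means no cap" rule are assumptions, not pagantic's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Document and Chunk are local stand-ins for pctx.Document and pctx.Chunk
// (Metadata omitted).
type Document struct {
	Content string
	Source  string
}

type Chunk struct {
	Content string
	Source  string
	Score   float64
}

// tokenize lowercases s, splits on whitespace, and trims common punctuation.
func tokenize(s string) []string {
	var out []string
	for _, f := range strings.Fields(strings.ToLower(s)) {
		if w := strings.Trim(f, ".,;:!?\"'()"); w != "" {
			out = append(out, w)
		}
	}
	return out
}

// keywordRetrieve scores each document by how many distinct query words it
// contains, drops documents with no overlap, and returns at most limit
// chunks (limit <= 0 means no cap), highest score first.
func keywordRetrieve(docs []Document, query string, limit int) []Chunk {
	want := map[string]bool{}
	for _, w := range tokenize(query) {
		want[w] = true
	}
	var chunks []Chunk
	for _, d := range docs {
		hits := map[string]bool{}
		for _, w := range tokenize(d.Content) {
			if want[w] {
				hits[w] = true
			}
		}
		if len(hits) > 0 {
			chunks = append(chunks, Chunk{Content: d.Content, Source: d.Source, Score: float64(len(hits))})
		}
	}
	// A stable sort keeps the original document order among equal scores.
	sort.SliceStable(chunks, func(i, j int) bool { return chunks[i].Score > chunks[j].Score })
	if limit > 0 && len(chunks) > limit {
		chunks = chunks[:limit]
	}
	return chunks
}

func main() {
	docs := []Document{
		{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
		{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
		{Content: "Channels are typed conduits for communication.", Source: "go-channels"},
	}
	for _, c := range keywordRetrieve(docs, "How do interfaces work?", 3) {
		fmt.Printf("%.0f %s: %s\n", c.Score, c.Source, c.Content) // 1 go-spec: Go interfaces are satisfied implicitly.
	}
}
```

Only "interfaces" overlaps with the query, so a single chunk with score 1 comes back.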
For production, implement the Retriever
interface with a vector database, embedding search, or hybrid retrieval
strategy. See Custom Retriever Implementation
below.
ContextBuilder bridges retrieval and inference. It calls
the Retriever, collects chunks, and formats them into
[]core.Message ready for the model.
// ContextBuilder assembles retrieved chunks into messages for model.
builder := &pctx.ContextBuilder{
Retriever: retriever,
MaxChunks: 3, // max chunks to include; 0 means use all retrieved
}
// Build returns []core.Message with retrieved context as system messages
msgs, err := builder.Build(ctx, "How do interfaces work?")
// msgs[0].Content:
// "Relevant context:
//
// [1] (go-spec): Go interfaces are satisfied implicitly."
MaxChunks controls how many chunks make it into the context.
Set it to bound token usage and keep the prompt focused. Zero or negative
values mean "use everything the retriever returns."
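To make the MaxChunks rule and the output layout concrete, here is a hypothetical formatter that reproduces the "Relevant context:" text from the example above. formatContext is an illustrative name, not part of pagantic. How ContextBuilder separates multiple entries is not documented here, so the single newline between entries, and returning "" for an empty result, are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk holds the two pctx.Chunk fields the formatter reads.
type Chunk struct {
	Content string
	Source  string
}

// formatContext renders chunks in the "Relevant context:" layout.
// maxChunks <= 0 keeps every chunk, mirroring the documented MaxChunks rule.
// Assumptions: entries are separated by one newline, and an empty result
// yields "" so a caller can skip adding a context message.
func formatContext(chunks []Chunk, maxChunks int) string {
	if maxChunks > 0 && len(chunks) > maxChunks {
		chunks = chunks[:maxChunks]
	}
	if len(chunks) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString("Relevant context:\n")
	for i, c := range chunks {
		fmt.Fprintf(&b, "\n[%d] (%s): %s", i+1, c.Source, c.Content)
	}
	return b.String()
}

func main() {
	chunks := []Chunk{
		{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
		{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
	}
	fmt.Println(formatContext(chunks, 1)) // a cap of 1 keeps only the go-spec chunk
}
```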
The orchestrate layer (layer 2) defines ContextProvider
as the interface for retrieving context messages:
// In orchestrate package
// ContextProvider retrieves context messages for a query.
type ContextProvider interface {
Build(ctx context.Context, query string) ([]core.Message, error)
}
Notice that context.ContextBuilder has exactly the same
Build method signature. In Go, interfaces are satisfied
implicitly - no implements keyword, no import of the
orchestrate package needed. This is structural typing.
ContextBuilder satisfies ContextProvider
without knowing it exists. The context package has zero dependency on
orchestrate. This keeps layer boundaries clean - lower layers never
import higher layers.
// Wire it up - ContextBuilder satisfies ContextProvider implicitly
agent := orchestrate.NewAgentLoop(orchestrate.LoopConfig{
Engine: engine,
ContextProvider: builder, // ContextBuilder satisfies ContextProvider
})
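Because nothing in the context package mentions ContextProvider, a drift in ContextBuilder.Build's signature would only surface where the two are wired together. A common Go idiom is a compile-time assertion in that wiring code. The self-contained sketch below uses local stand-ins (Message for core.Message, a trivial ContextBuilder with a hypothetical Prefix field); only the interface shape comes from this page.

```go
package main

import (
	"context"
	"fmt"
)

// Message stands in for core.Message; Role and Content are assumed fields.
type Message struct {
	Role    string
	Content string
}

// ContextProvider is declared by the consumer (orchestrate, layer 2).
type ContextProvider interface {
	Build(ctx context.Context, query string) ([]Message, error)
}

// ContextBuilder is declared by the provider (context, layer 3) and never
// refers to ContextProvider.
type ContextBuilder struct {
	Prefix string // hypothetical field for the demo
}

func (b *ContextBuilder) Build(ctx context.Context, query string) ([]Message, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return []Message{{Role: "system", Content: b.Prefix + query}}, nil
}

// Compile-time check: the build fails here if Build's signature drifts.
var _ ContextProvider = (*ContextBuilder)(nil)

func main() {
	var p ContextProvider = &ContextBuilder{Prefix: "context for: "}
	msgs, err := p.Build(context.Background(), "How do interfaces work?")
	fmt.Println(msgs[0].Content, err)
}
```

In real wiring code the assertion would read var _ orchestrate.ContextProvider = (*pctx.ContextBuilder)(nil), assuming the package names used above. It costs nothing at runtime.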
Key design decision: context is retrieved fresh per Chat()
call and injected into the request, but NOT stored in
conversation history.
Why? This prevents context from accumulating across turns. Each turn gets fresh, relevant context, and old context documents neither pollute future prompts nor consume token budget.
In AgentLoop.Chat, the flow is:
1. If ContextProvider is set, call Build(ctx, query) to retrieve context messages.
2. The context messages are built once per Chat() call using the original prompt and injected into a fresh inner loop before the tool/structured phases.
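The ephemeral injection rule can be illustrated with a stripped-down loop. This is not AgentLoop's actual code: retrieve and generate are hypothetical stand-ins for ContextProvider.Build and the inference engine, the tool and structured phases are omitted, and placing context messages ahead of history is an assumption. The property it shows is the one described above: context goes into each request but never into History.

```go
package main

import "fmt"

// Message stands in for core.Message; Role and Content are assumed fields.
type Message struct {
	Role    string
	Content string
}

// Agent keeps persistent conversation history. retrieve and generate are
// hypothetical stand-ins for ContextProvider.Build and the engine call.
type Agent struct {
	History  []Message
	retrieve func(query string) []Message
	generate func(request []Message) string
}

// Chat sends context + history + the new prompt, but records only the
// prompt and the reply in History.
func (a *Agent) Chat(prompt string) string {
	user := Message{Role: "user", Content: prompt}
	var request []Message
	if a.retrieve != nil {
		request = append(request, a.retrieve(prompt)...) // fresh context for this turn only
	}
	request = append(request, a.History...)
	request = append(request, user)

	reply := a.generate(request)
	// Context messages are deliberately not stored, so they never accumulate.
	a.History = append(a.History, user, Message{Role: "assistant", Content: reply})
	return reply
}

func main() {
	a := &Agent{
		retrieve: func(q string) []Message {
			return []Message{{Role: "system", Content: "Relevant context for: " + q}}
		},
		generate: func(req []Message) string {
			return fmt.Sprintf("answered with %d request messages", len(req))
		},
	}
	fmt.Println(a.Chat("How do interfaces work?")) // answered with 2 request messages
	fmt.Println(a.Chat("And goroutines?"))         // answered with 4 request messages
	fmt.Println(len(a.History))                    // 4
}
```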
This ephemeral injection pattern means:
Two-stage retrieval: broad recall first, then precise selection. Layer 6 scores retrieved or generated candidates, reorders them by semantic relevance, and picks the best subset.
// Candidate represents a scored item for ranking.
type Candidate struct {
Content string
Score float64
Source string
Metadata map[string]any
}
// CandidateSet groups candidates with the original query.
type CandidateSet struct {
Query string
Candidates []Candidate
}
// RelevanceScorer assigns relevance scores to candidates.
type RelevanceScorer interface {
Score(ctx context.Context, query string, candidates []Candidate) ([]Candidate, error)
}
// SelectionPolicy controls filtering after scoring.
type SelectionPolicy struct {
TopK int // max candidates to return, 0 means all
MinScore float64 // minimum score threshold, 0 means no filter
}
The separation of RelevanceScorer (scoring) and
SelectionPolicy (filtering) keeps concerns clean.
Swap scorers without changing selection logic. Tune thresholds
without touching the scorer.
// SimpleScorer scores candidates by keyword overlap with query.
// For testing and development only.
scorer := &rerank.SimpleScorer{}
// Counts how many query keywords appear in candidate content.
// Returns candidates with Score field populated.
For production, implement RelevanceScorer with a
cross-encoder model, LLM-based scoring, or embedding similarity.
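For a feel of what keyword counting looks like behind the RelevanceScorer interface, here is a hypothetical OverlapScorer in the spirit of SimpleScorer. Dividing the hit count by the number of distinct query words, so scores fall in [0, 1] and thresholds like MinScore: 0.1 are meaningful, is an assumption; Candidate and RelevanceScorer are copied locally so the block compiles on its own.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

type Candidate struct {
	Content  string
	Score    float64
	Source   string
	Metadata map[string]any
}

type RelevanceScorer interface {
	Score(ctx context.Context, query string, candidates []Candidate) ([]Candidate, error)
}

// words returns the distinct lowercase words of s, punctuation trimmed.
func words(s string) map[string]bool {
	set := map[string]bool{}
	for _, f := range strings.Fields(strings.ToLower(s)) {
		if w := strings.Trim(f, ".,;:!?\"'()"); w != "" {
			set[w] = true
		}
	}
	return set
}

// overlapScore is the fraction of distinct query words found in content.
func overlapScore(query, content string) float64 {
	q := words(query)
	if len(q) == 0 {
		return 0
	}
	c := words(content)
	hits := 0
	for w := range q {
		if c[w] {
			hits++
		}
	}
	return float64(hits) / float64(len(q))
}

// OverlapScorer is a hypothetical RelevanceScorer, not pagantic's SimpleScorer.
type OverlapScorer struct{}

// Score returns a copy of candidates with Score overwritten; the input slice is untouched.
func (OverlapScorer) Score(ctx context.Context, query string, candidates []Candidate) ([]Candidate, error) {
	out := make([]Candidate, len(candidates))
	for i, c := range candidates {
		if err := ctx.Err(); err != nil { // stop early if the request was canceled
			return nil, err
		}
		c.Score = overlapScore(query, c.Content)
		out[i] = c
	}
	return out, nil
}

var _ RelevanceScorer = OverlapScorer{}

func main() {
	scored, err := OverlapScorer{}.Score(context.Background(), "How do interfaces work?", []Candidate{
		{Content: "Go interfaces are implicit.", Source: "go-spec"},
		{Content: "Python uses duck typing.", Source: "python-docs"},
	})
	if err != nil {
		panic(err)
	}
	for _, c := range scored {
		fmt.Printf("%.2f %s\n", c.Score, c.Source) // 0.25 go-spec, then 0.00 python-docs
	}
}
```

Exact word matching is also why keyword scorers are placeholders: "Interface satisfaction requires no declaration." scores 0 against "How do interfaces work?" because "interface" and "interfaces" are different tokens.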
Reranker is the primary entry point. It takes a scorer
and a selection policy, scores all candidates, sorts by score, then
applies the policy to filter.
// Reranker combines a scorer with a selection policy to produce
// a refined, filtered candidate list.
reranker := &rerank.Reranker{
Scorer: &rerank.SimpleScorer{},
Policy: rerank.SelectionPolicy{
TopK: 3, // keep top 3
MinScore: 0.1, // minimum relevance
},
}
results, err := reranker.Rerank(ctx, rerank.CandidateSet{
Query: "How do interfaces work?",
Candidates: []rerank.Candidate{
{Content: "Go interfaces are implicit.", Source: "go-spec", Score: 0},
{Content: "Python uses duck typing.", Source: "python-docs", Score: 0},
{Content: "Interface satisfaction requires no declaration.", Source: "go-faq", Score: 0},
},
})
// results: top 3 candidates sorted by relevance score,
// filtered to those with score >= 0.1
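Assuming Reranker follows the score, sort, then filter sequence described above, the selection half can be sketched as a pure function over already-scored candidates. applyPolicy is a hypothetical name, the local Candidate drops Metadata for brevity, and the scores in main are made up.

```go
package main

import (
	"fmt"
	"sort"
)

type Candidate struct {
	Content string
	Score   float64
	Source  string
}

type SelectionPolicy struct {
	TopK     int     // max candidates to return, 0 means all
	MinScore float64 // minimum score threshold, 0 means no filter
}

// applyPolicy sorts by descending score, drops candidates below MinScore,
// then keeps at most TopK. It works on a copy, so the input is unchanged.
func applyPolicy(scored []Candidate, p SelectionPolicy) []Candidate {
	out := append([]Candidate(nil), scored...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	kept := out[:0] // filter in place; the write index never passes the read index
	for _, c := range out {
		if p.MinScore != 0 && c.Score < p.MinScore {
			continue
		}
		kept = append(kept, c)
	}
	if p.TopK > 0 && len(kept) > p.TopK {
		kept = kept[:p.TopK]
	}
	return kept
}

func main() {
	scored := []Candidate{
		{Content: "first", Source: "doc-a", Score: 0.25},
		{Content: "second", Source: "doc-b", Score: 0},
		{Content: "third", Source: "doc-c", Score: 0.5},
	}
	// Prints doc-c then doc-a; doc-b falls below MinScore.
	for _, c := range applyPolicy(scored, SelectionPolicy{TopK: 3, MinScore: 0.1}) {
		fmt.Printf("%.2f %s\n", c.Score, c.Source)
	}
}
```

Sorting before truncating matters: applying TopK to unsorted input would keep whichever candidates happened to arrive first.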
The pipeline inside Rerank:
1. Call Scorer.Score() to populate the Score fields.
2. Sort candidates by score, highest first.
3. Apply the MinScore threshold: drop candidates below it.
4. Apply the TopK cap: keep only the top K.
The orchestrate layer defines its own reranking types to avoid importing the rerank package. This follows the same structural typing pattern as ContextProvider.
// In orchestrate package
// RerankCandidate represents a scored item for plan-level reranking.
// Mirrors rerank.Candidate fields to make conversion straightforward
// across the layer boundary.
type RerankCandidate struct {
Content string
Score float64
Source string
Metadata map[string]any
}
// RerankInput groups candidates with the original query for reranking.
type RerankInput struct {
Query string
Candidates []RerankCandidate
}
// CandidateReranker scores and filters candidates.
type CandidateReranker interface {
Rerank(ctx context.Context, input RerankInput) ([]RerankCandidate, error)
}
The fields in RerankCandidate mirror rerank.Candidate
exactly (same names, same types, same order), but they are separate Go types.
An adapter function converts between them at the integration boundary:
// Adapter wraps rerank.Reranker to satisfy orchestrate.CandidateReranker.
// Converts orchestrate types to rerank types, calls Rerank, converts back.
func adaptReranker(r *rerank.Reranker) orchestrate.CandidateReranker {
return &rerankAdapter{inner: r}
}
// rerankAdapter bridges the type boundary.
type rerankAdapter struct {
inner *rerank.Reranker
}
func (a *rerankAdapter) Rerank(ctx context.Context, input orchestrate.RerankInput) ([]orchestrate.RerankCandidate, error) {
// Convert orchestrate.RerankCandidate -> rerank.Candidate
candidates := make([]rerank.Candidate, len(input.Candidates))
for i, c := range input.Candidates {
candidates[i] = rerank.Candidate{
Content: c.Content, Score: c.Score,
Source: c.Source, Metadata: c.Metadata,
}
}
// Call rerank layer
results, err := a.inner.Rerank(ctx, rerank.CandidateSet{
Query: input.Query, Candidates: candidates,
})
if err != nil {
return nil, err
}
// Convert rerank.Candidate -> orchestrate.RerankCandidate
out := make([]orchestrate.RerankCandidate, len(results))
for i, r := range results {
out[i] = orchestrate.RerankCandidate{
Content: r.Content, Score: r.Score,
Source: r.Source, Metadata: r.Metadata,
}
}
return out, nil
}
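Because RerankCandidate and rerank.Candidate have identical field names, types, and order, Go also permits a direct struct conversion, which shortens the adapter's loops. The sketch below uses local copies of both types so it compiles on its own; toRerank and fromRerank are hypothetical helper names. Slices still need an element-by-element loop, because []A and []B are not convertible even when A and B are.

```go
package main

import "fmt"

// Candidate mirrors rerank.Candidate (lower layer).
type Candidate struct {
	Content  string
	Score    float64
	Source   string
	Metadata map[string]any
}

// RerankCandidate mirrors orchestrate.RerankCandidate (higher layer).
type RerankCandidate struct {
	Content  string
	Score    float64
	Source   string
	Metadata map[string]any
}

// toRerank converts element by element; Candidate(c) is legal because both
// structs have identical underlying types.
func toRerank(in []RerankCandidate) []Candidate {
	out := make([]Candidate, len(in))
	for i, c := range in {
		out[i] = Candidate(c)
	}
	return out
}

// fromRerank is the reverse conversion.
func fromRerank(in []Candidate) []RerankCandidate {
	out := make([]RerankCandidate, len(in))
	for i, c := range in {
		out[i] = RerankCandidate(c)
	}
	return out
}

func main() {
	in := []RerankCandidate{{Content: "Go interfaces are implicit.", Score: 0.5, Source: "go-spec"}}
	back := fromRerank(toRerank(in))
	fmt.Println(back[0].Source, back[0].Score) // go-spec 0.5
}
```

A side benefit: if either struct gains, loses, or reorders a field, the conversion stops compiling, so the two layers cannot drift apart silently.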
End-to-end example combining retrieval, reranking, and inference
using PlanExecutor. This shows how the layers compose
into a complete RAG pipeline.
// Step 1: Build retriever and context provider
retriever := pctx.NewInMemoryRetriever(
pctx.Document{Content: "Go interfaces are satisfied implicitly.", Source: "go-spec"},
pctx.Document{Content: "Goroutines are lightweight threads.", Source: "go-concurrency"},
pctx.Document{Content: "Channels are typed conduits.", Source: "go-channels"},
)
contextProvider := &pctx.ContextBuilder{Retriever: retriever, MaxChunks: 10}
// Step 2: Build reranker
reranker := &rerank.Reranker{
Scorer: &rerank.SimpleScorer{},
Policy: rerank.SelectionPolicy{TopK: 3, MinScore: 0.1},
}
// Step 3: Register step handlers
handlers := map[orchestrate.StepType]orchestrate.StepHandler{
orchestrate.StepRetrieve: orchestrate.RetrieveHandler(contextProvider),
orchestrate.StepRerank: orchestrate.RerankHandler(adaptReranker(reranker)),
orchestrate.StepInfer: orchestrate.InferHandler(engine),
}
// Step 4: Define execution plan
plan := orchestrate.ExecutionPlan{
Steps: []orchestrate.Step{
{Name: "retrieve", Type: orchestrate.StepRetrieve, Input: "How do interfaces work?"},
{Name: "rerank", Type: orchestrate.StepRerank},
{Name: "infer", Type: orchestrate.StepInfer},
},
}
// Step 5: Execute
executor := orchestrate.NewPlanExecutor(handlers)
results, err := executor.Execute(ctx, plan)
// results contains completed steps with Output populated
// Output of step N becomes available as Input of step N+1
The execution flow:
| Step | Type | Input | Output |
|---|---|---|---|
| retrieve | StepRetrieve | Query string | []core.Message with context |
| rerank | StepRerank | RerankInput from previous step | []RerankCandidate, filtered and sorted |
| infer | StepInfer | inference.Request | *inference.Result with model response |
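The "output of step N becomes input of step N+1" rule can be sketched as a small sequential executor. This is not PlanExecutor's implementation: Step, StepType, and StepHandler here are simplified local versions, context plumbing is omitted, and the rule that only steps with a nil Input inherit the previous Output is an assumption. Since Input and Output are carried as any in this sketch, each handler asserts the dynamic type it expects.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the orchestrate types (no context.Context).
type StepType string

type Step struct {
	Name   string
	Type   StepType
	Input  any
	Output any
}

type StepHandler func(input any) (any, error)

// execute runs steps in order and returns the completed steps. A step whose
// Input is nil receives the previous step's Output (assumed chaining rule).
func execute(handlers map[StepType]StepHandler, steps []Step) ([]Step, error) {
	done := make([]Step, 0, len(steps))
	var prev any
	for _, s := range steps {
		h, ok := handlers[s.Type]
		if !ok {
			return done, fmt.Errorf("step %q: no handler for type %q", s.Name, s.Type)
		}
		if s.Input == nil {
			s.Input = prev
		}
		out, err := h(s.Input)
		if err != nil {
			return done, fmt.Errorf("step %q: %w", s.Name, err)
		}
		s.Output = out
		prev = out
		done = append(done, s)
	}
	return done, nil
}

func main() {
	handlers := map[StepType]StepHandler{
		"upper": func(in any) (any, error) {
			s, ok := in.(string)
			if !ok {
				return nil, fmt.Errorf("want string, got %T", in)
			}
			return strings.ToUpper(s), nil
		},
		"exclaim": func(in any) (any, error) { return fmt.Sprintf("%v!", in), nil },
	}
	steps, err := execute(handlers, []Step{
		{Name: "first", Type: "upper", Input: "how do interfaces work?"},
		{Name: "second", Type: "exclaim"}, // nil Input: inherits first's Output
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(steps[1].Output) // HOW DO INTERFACES WORK?!
}
```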
InMemoryRetriever works for prototyping. For production,
implement Retriever against a real knowledge store.
Here is a vector database example:
type VectorRetriever struct {
db VectorDB
embedder Embedder
}
func (vr *VectorRetriever) Retrieve(ctx context.Context, query string, limit int) ([]pctx.Chunk, error) {
// Embed the query into a vector
embedding, err := vr.embedder.Embed(ctx, query)
if err != nil {
return nil, err
}
// Search vector DB for nearest neighbors
results, err := vr.db.Search(ctx, embedding, limit)
if err != nil {
return nil, err
}
// Convert DB results to Chunks
chunks := make([]pctx.Chunk, len(results))
for i, r := range results {
chunks[i] = pctx.Chunk{
Content: r.Text,
Source: r.DocID,
Score: r.Similarity,
}
}
return chunks, nil
}
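The snippet above leaves VectorDB and Embedder undefined. One plausible shape, inferred only from how Retrieve calls them, is shown below with toy in-memory fakes that let VectorRetriever run in tests without a real database. All names here (Embedder, VectorDB, SearchResult, vocabEmbedder, memoryDB) are hypothetical, and the bag-of-words embedder merely stands in for a real embedding model.

```go
package main

import (
	"context"
	"fmt"
	"math"
	"sort"
	"strings"
)

// SearchResult carries the fields the VectorRetriever snippet reads.
type SearchResult struct {
	Text       string
	DocID      string
	Similarity float64
}

// Hypothetical interfaces inferred from vr.embedder.Embed and vr.db.Search.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float64, error)
}

type VectorDB interface {
	Search(ctx context.Context, query []float64, limit int) ([]SearchResult, error)
}

// vocabEmbedder is a toy embedder: word counts over a fixed vocabulary.
type vocabEmbedder struct{ vocab []string }

func (e vocabEmbedder) vector(text string) []float64 {
	vec := make([]float64, len(e.vocab))
	for _, w := range strings.Fields(strings.ToLower(text)) {
		w = strings.Trim(w, ".,;:!?")
		for i, v := range e.vocab {
			if w == v {
				vec[i]++
			}
		}
	}
	return vec
}

// Embed ignores ctx because the toy embedder does no I/O.
func (e vocabEmbedder) Embed(_ context.Context, text string) ([]float64, error) {
	return e.vector(text), nil
}

type storedDoc struct {
	id, text string
	vec      []float64
}

// memoryDB is a toy VectorDB doing brute-force cosine-similarity search.
type memoryDB struct{ docs []storedDoc }

func (db *memoryDB) add(e vocabEmbedder, id, text string) {
	db.docs = append(db.docs, storedDoc{id: id, text: text, vec: e.vector(text)})
}

// cosine returns the cosine similarity of equal-length vectors a and b,
// or 0 if either is all zeros.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func (db *memoryDB) Search(_ context.Context, q []float64, limit int) ([]SearchResult, error) {
	res := make([]SearchResult, 0, len(db.docs))
	for _, d := range db.docs {
		res = append(res, SearchResult{Text: d.text, DocID: d.id, Similarity: cosine(q, d.vec)})
	}
	sort.SliceStable(res, func(i, j int) bool { return res[i].Similarity > res[j].Similarity })
	if limit > 0 && len(res) > limit {
		res = res[:limit]
	}
	return res, nil
}

var (
	_ Embedder = vocabEmbedder{}
	_ VectorDB = (*memoryDB)(nil)
)

func main() {
	ctx := context.Background()
	emb := vocabEmbedder{vocab: []string{"interfaces", "implicitly", "goroutines", "threads"}}
	db := &memoryDB{}
	db.add(emb, "go-spec", "Go interfaces are satisfied implicitly.")
	db.add(emb, "go-concurrency", "Goroutines are lightweight threads.")

	q, _ := emb.Embed(ctx, "How do interfaces work?") // toy fakes never return errors
	results, _ := db.Search(ctx, q, 1)
	fmt.Println(results[0].DocID, results[0].Similarity) // go-spec ranks first
}
```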
The pattern is always the same:
1. Accept the query and limit.
2. Look up matching content in your knowledge store.
3. Convert the results to []pctx.Chunk.
Once implemented, plug it into ContextBuilder and the
rest of the pipeline works unchanged:
// Swap InMemoryRetriever for VectorRetriever - nothing else changes
contextProvider := &pctx.ContextBuilder{
Retriever: &VectorRetriever{db: pineconeClient, embedder: openaiEmbedder},
MaxChunks: 5,
}
Likewise, implement RelevanceScorer with a cross-encoder model or
LLM-based scoring, then plug it into Reranker. SimpleScorer is a
drop-in placeholder for development.
Scores appear on both context.Chunk and rerank.Candidate,
but they mean different things depending on who produced them.
context.Chunk.Score is assigned by a Retriever during knowledge lookup. Its meaning depends entirely on the retriever implementation:
| Retriever Type | Score Meaning | Range | Higher Is Better |
|---|---|---|---|
| BM25 / keyword | Term frequency relevance | [0, unbounded) | Yes |
| Cosine similarity | Cosine of the angle between query and document vectors | [-1, 1] | Yes |
| InMemoryRetriever | Keyword overlap count | [0, N] | Yes |
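Because these ranges differ, raw retriever scores are not directly comparable across retrievers. One common workaround, not something this page says pagantic does, is min-max normalization within each result set. minMaxNormalize below is a hypothetical helper.

```go
package main

import "fmt"

type Chunk struct {
	Source string
	Score  float64
}

// minMaxNormalize returns a copy of chunks with scores rescaled to [0, 1]
// within this result set: the lowest score maps to 0, the highest to 1.
// If every score is equal, all chunks get 1.
func minMaxNormalize(chunks []Chunk) []Chunk {
	out := append([]Chunk(nil), chunks...)
	if len(out) == 0 {
		return out
	}
	lo, hi := out[0].Score, out[0].Score
	for _, c := range out[1:] {
		if c.Score < lo {
			lo = c.Score
		}
		if c.Score > hi {
			hi = c.Score
		}
	}
	for i := range out {
		if hi == lo {
			out[i].Score = 1
		} else {
			out[i].Score = (out[i].Score - lo) / (hi - lo)
		}
	}
	return out
}

func main() {
	bm25 := []Chunk{{"doc-a", 2}, {"doc-b", 6}, {"doc-c", 4}}
	cosine := []Chunk{{"doc-d", -0.5}, {"doc-e", 0.5}}
	fmt.Println(minMaxNormalize(bm25))   // [{doc-a 0} {doc-b 1} {doc-c 0.5}]
	fmt.Println(minMaxNormalize(cosine)) // [{doc-d 0} {doc-e 1}]
}
```

The trade-off: min-max is relative, so the best hit in a set always becomes 1 even when it is a weak match. It does not replace a RelevanceScorer that judges each candidate against the query.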
rerank.Candidate.Score is assigned by a RelevanceScorer during reranking. Reranking produces a normalized relevance score:
When a Reranker processes candidates:
- It populates Candidate.Score with the rerank score.
- Candidate.Metadata
Plan steps pass data through the Step.Input and Step.Output fields, which are typed as any.
CandidateIR is available for optional use when callers want typed step boundaries;
conversion between context.Chunk, rerank.Candidate, and
CandidateIR happens in caller code.
Retrieval in pagantic follows a fresh-per-turn consistency model by default.
An optional caching layer can be placed in front of any Retriever:
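The caching layer itself is not shown here. A minimal sketch, assuming a decorator that implements Retriever by wrapping another Retriever, might look like the following. CachingRetriever and countingRetriever are hypothetical names; the cache is keyed by (query, limit), never stores errors, and has no TTL or size bound, so cached results deliberately give up the fresh-per-turn guarantee.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Local stand-ins for pctx.Chunk and pctx.Retriever.
type Chunk struct {
	Content string
	Source  string
	Score   float64
}

type Retriever interface {
	Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error)
}

type cacheKey struct {
	query string
	limit int
}

// CachingRetriever memoizes Inner's results per (query, limit). Concurrent
// misses on the same key may both reach Inner; that is tolerated here.
type CachingRetriever struct {
	Inner Retriever
	mu    sync.Mutex
	cache map[cacheKey][]Chunk
}

func (c *CachingRetriever) Retrieve(ctx context.Context, query string, limit int) ([]Chunk, error) {
	key := cacheKey{query, limit}
	c.mu.Lock()
	if hit, ok := c.cache[key]; ok {
		c.mu.Unlock()
		return append([]Chunk(nil), hit...), nil // copy so callers cannot mutate the cache
	}
	c.mu.Unlock()

	chunks, err := c.Inner.Retrieve(ctx, query, limit)
	if err != nil {
		return nil, err // errors are not cached
	}
	c.mu.Lock()
	if c.cache == nil {
		c.cache = make(map[cacheKey][]Chunk)
	}
	c.cache[key] = append([]Chunk(nil), chunks...)
	c.mu.Unlock()
	return chunks, nil
}

// countingRetriever is a stub backend that records how often it is called.
type countingRetriever struct{ calls int }

func (r *countingRetriever) Retrieve(_ context.Context, query string, limit int) ([]Chunk, error) {
	r.calls++
	return []Chunk{{Content: "result for " + query, Source: "stub", Score: 1}}, nil
}

func main() {
	ctx := context.Background()
	backend := &countingRetriever{}
	cached := &CachingRetriever{Inner: backend}
	for i := 0; i < 3; i++ {
		if _, err := cached.Retrieve(ctx, "How do interfaces work?", 3); err != nil {
			panic(err)
		}
	}
	fmt.Println("backend calls:", backend.calls) // backend calls: 1
}
```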
When multiple retrievers or multiple queries return overlapping results, deduplication prevents redundant context:
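The deduplication step is likewise not shown. One simple approach, sketched below with a hypothetical dedupe helper, keys chunks by case- and whitespace-normalized content, keeps the higher-scoring copy, and preserves the order of first appearance.

```go
package main

import (
	"fmt"
	"strings"
)

type Chunk struct {
	Content string
	Source  string
	Score   float64
}

// dedupe merges result lists into one, keeping a single chunk per
// normalized content and preferring the higher-scoring copy.
func dedupe(lists ...[]Chunk) []Chunk {
	index := map[string]int{} // normalized content -> position in out
	var out []Chunk
	for _, list := range lists {
		for _, c := range list {
			key := strings.Join(strings.Fields(strings.ToLower(c.Content)), " ")
			if i, ok := index[key]; ok {
				if c.Score > out[i].Score {
					out[i] = c // replace in place, keeping the first position
				}
				continue
			}
			index[key] = len(out)
			out = append(out, c)
		}
	}
	return out
}

func main() {
	keyword := []Chunk{
		{Content: "Goroutines are lightweight threads.", Source: "keyword", Score: 0.4},
		{Content: "Channels are typed conduits.", Source: "keyword", Score: 0.6},
	}
	vector := []Chunk{
		{Content: "goroutines are  lightweight threads.", Source: "vector", Score: 0.8},
	}
	for _, c := range dedupe(keyword, vector) {
		fmt.Printf("%.1f %s: %s\n", c.Score, c.Source, c.Content)
	}
}
```

Comparing scores from different retrievers assumes they share a scale; per the score table above, that holds only after normalization or reranking.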