18 May 2026

Domain grounding — why the Nortinia AI Assistant does not hallucinate

Retrieval-then-generate, a hallucination guard, a 412-question eval. The Nortinia AI Assistant only answers when the KB actually contains the answer.

Domain grounding

The Nortinia AI Assistant uses a strictly tenant-owned knowledge base (KB) for any question that is not common knowledge and not an in-app action. This article walks through the retrieval-then-generate pipeline, the hallucination guard, and the May 2026 evaluation results.

How the knowledge base is built

Every tenant can upload its own documents (PDF, DOCX, Markdown, HTML) in the assistant admin UI. Each uploaded document then goes through four steps:

Chunked — documents are split into ~800-token pieces with 100-token overlap at boundaries.
Embedded — vectorised with OpenAI text-embedding-3-large.
Indexed — stored in pgvector, scoped by tenantId.
Metadata-tagged — document title, page range, upload date, uploader.
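
The chunking step can be illustrated with a minimal sketch. It assumes the document has already been tokenised into a list (the tokeniser is not specified in this article), and the function name `chunk_tokens` is hypothetical; only the 800-token size and 100-token overlap come from the text above.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 800, overlap: int = 100) -> List[List[str]]:
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new chunk starts this many tokens after the previous one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Assumption: stand-in tokens; a real pipeline would use the embedding model's tokeniser.
tokens = [str(i) for i in range(2000)]
chunks = chunk_tokens(tokens)
```

With these defaults a 2,000-token document yields three chunks, and the last 100 tokens of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole by at least one chunk.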

A typical tenant uploads 20-200 documents: T&C, GDPR policy, internal procedures, product catalogue, FAQ.

Retrieval-then-generate

For every knowledge question ("what is our refund policy?", "how many days to ship to Slovakia?") the flow is:

Retrieval — embed the user question, fetch the top-5 most relevant chunks from the tenant KB.
Relevance check — if the top-1 chunk's cosine similarity is < 0.72, the system does NOT call the LLM for generation; it returns: "I did not find information on this in your knowledge base."
Generate — if there is a relevant chunk, the LLM is prompted strictly: "answer ONLY from the supplied context, cite by reference, if not present say it is not present."
Citation embedding — every factual statement in the reply carries an automatic [1], [2] pointer to the source chunk (document + page).
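
The retrieval and relevance-check steps above can be sketched as follows. This is an in-memory illustration, not the production code: the function names (`retrieve`, `build_prompt`, `answer`) and the `generate` callable standing in for the LLM are assumptions, and real retrieval would run inside pgvector rather than in Python. The 0.72 threshold, the top-5 cut-off, the fallback message and the prompt wording are taken from the steps above.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

MIN_SIMILARITY = 0.72  # relevance threshold from the article
TOP_K = 5              # number of chunks passed to the LLM
NOT_FOUND = "I did not find information on this in your knowledge base."


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question_vec: Sequence[float],
             chunks: List[Tuple[str, Sequence[float]]]) -> Optional[List[Tuple[float, str]]]:
    """Return up to TOP_K (score, text) pairs, or None if the best score is below the threshold."""
    scored = sorted(((cosine(question_vec, vec), text) for text, vec in chunks), reverse=True)
    top = scored[:TOP_K]
    if not top or top[0][0] < MIN_SIMILARITY:
        return None
    return top


def build_prompt(question: str, top: List[Tuple[float, str]]) -> str:
    # Number the chunks so the model can cite them as [1], [2], ...
    context = "\n".join(f"[{i}] {text}" for i, (_score, text) in enumerate(top, start=1))
    return (
        "Answer ONLY from the supplied context, cite by reference, "
        "if not present say it is not present.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, question_vec: Sequence[float],
           chunks: List[Tuple[str, Sequence[float]]],
           generate: Callable[[str], str]) -> str:
    top = retrieve(question_vec, chunks)
    if top is None:
        return NOT_FOUND  # the LLM is never called on this path
    return generate(build_prompt(question, top))
```

The point of the gate is that a weak match never reaches the generator at all, so the "not found" reply on that path cannot contain invented content.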

The hallucination guard

The LLM's reply is checked by a second model (post-process step): each factual statement is matched against the cited chunk. If the chunk does not support the statement, it is flagged. With 2 or more flags, the assistant either rewrites the reply more narrowly or returns "not found".

This adds about 400 ms of latency, but it brought the share of grounded-only answers to 96.4% (see the evaluation results below).
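
The flag-counting logic of the guard might look like the sketch below. The second model is represented by a caller-supplied `supports(chunk, statement)` function, which is an assumption, as are the names `guard` and `Claim`. The article does not say what happens with exactly one flag; this sketch lets the reply through and returns the flagged statement to the caller.

```python
from typing import Callable, List, Tuple

Claim = Tuple[str, str]  # (factual statement, text of the chunk it cites)


def guard(claims: List[Claim],
          supports: Callable[[str, str], bool],
          max_flags: int = 1) -> Tuple[str, List[str]]:
    """Flag statements that their cited chunk does not support, then pick a verdict."""
    flagged = [stmt for stmt, chunk in claims if not supports(chunk, stmt)]
    # Two or more flags: the reply must be rewritten more narrowly or replaced by "not found".
    verdict = "pass" if len(flagged) <= max_flags else "rewrite_or_not_found"
    return verdict, flagged


# Assumption: a naive substring check stands in for the second model.
def naive_supports(chunk: str, statement: str) -> bool:
    return statement.lower() in chunk.lower()


chunk = "Refunds are accepted within 14 days of delivery."
verdict, flagged = guard(
    [("within 14 days", chunk), ("free shipping", chunk), ("gift cards", chunk)],
    naive_supports,
)
```

Here two of the three statements have no support in the cited chunk, so the verdict is `rewrite_or_not_found`.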

The 412-question eval

In May 2026 we assembled a 412-question reference set across 5 customer KBs. The questions were split as follows:

60% — clearly "in the KB" type (cited answer expected)
30% — "not in the KB" type ("not found" expected)
10% — misleading type (KB has something similar but not the exact answer)

The results:

Grounded-only answers: 96.4% (397/412)
Wrong citation: 1.7% (7/412) — answer correct, wrong chunk pointer
Hallucinated answer: 0.5% (2/412) — stated a fact not in the KB
Too conservative ("not found" although the KB had the answer): 1.4% (6/412)

The 0.5% hallucination rate is below our GA threshold. For comparison, the same question set run against raw gpt-5 with no KB produced a 38% hallucination rate.
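
For readers who want to reproduce this kind of breakdown on their own question set, per-category rates can be computed from a list of graded outcomes. The sketch below is only an illustration: the outcome labels and the function name `outcome_rates` are hypothetical, and the sample data is a toy list, not the evaluation data.

```python
from collections import Counter
from typing import Dict, Iterable


def outcome_rates(outcomes: Iterable[str]) -> Dict[str, float]:
    """Percentage of each outcome label, rounded to one decimal place."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Toy data (not real results): 20 graded answers.
graded = ["grounded"] * 18 + ["hallucinated"] + ["too_conservative"]
rates = outcome_rates(graded)
```

Because each rate is rounded on its own, the rounded percentages of a real run will not always add up to exactly 100.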

What we do not ground

Two things we deliberately let the LLM use "its own" knowledge for:

Common knowledge — "how many months in a year", "what is HU VAT in general". Grounding overhead is not worth it.
In-app actions — "navigate to the invoice", "create a refund". These are not knowledge questions; they are MCP tool calls. Different path.

Edge cases ("what is the HU intra-EU VAT rate as of July 2026") count as knowledge questions, and if the tenant KB does not cover them, the system says: "I did not find this, I suggest checking the NAV website".

Per-tenant isolation

One of the most important rules: tenant A's KB NEVER leaks to tenant B. The pgvector query carries a mandatory tenant_id = $current filter; two integration tests on every release, plus a monthly chaos test, check that the filter cannot be bypassed, even by prompt injection.
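
A tenant-scoped pgvector query could be built along these lines. The table and column names (`kb_chunks`, `doc_title`, `page_range`, `body`, `embedding`) and the function name are assumptions, and the `%s` placeholders follow the psycopg parameter style. The `<=>` operator is pgvector's cosine distance, so similarity is one minus that value.

```python
from typing import List, Tuple


def tenant_scoped_query(tenant_id: str, query_vec: List[float], k: int = 5) -> Tuple[str, tuple]:
    """Build a parameterised top-k query that cannot run without a tenant filter."""
    if not tenant_id:
        raise ValueError("tenant_id is mandatory")
    sql = (
        "SELECT doc_title, page_range, body, "
        "1 - (embedding <=> %s::vector) AS similarity "
        "FROM kb_chunks "            # hypothetical table name
        "WHERE tenant_id = %s "      # the mandatory isolation filter
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    vec = "[" + ",".join(str(x) for x in query_vec) + "]"  # pgvector text literal
    return sql, (vec, tenant_id, vec, k)


sql, params = tenant_scoped_query("tenant-a", [0.1, 0.2, 0.3])
```

In this sketch the tenant id is a bound parameter that would come from the authenticated session, never from the question text, which is why nothing a user types can change which rows are searched.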

Roadmap

Two big improvements are queued. (1) Cross-document reasoning: when the answer requires combining chunks from two different documents, today's top-5 retrieval does not always surface both, so multi-hop retrieval is needed. (2) KB auto-refresh: an opt-in daily scan that checks whether a new document has landed in the tenant's Drive and imports it automatically.