Clinical Knowledge Base — RAG Retrieval Demo
A live retrieval pipeline over clinically structured biomarker briefs. Your question is embedded by Pinecone's hosted model, and a hard metadata filter restricts which index entries it can match. Answers are returned verbatim from approved entries; no language model generates them.
How consistency is enforced
The discipline behind the demo — the part that usually breaks in a RAG knowledge base.
One schema, enforced in code
Every brief is validated against a Pydantic schema with controlled vocabularies before it is ever embedded. Invalid entries never reach the index.
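As a rough illustration of the validate-before-embed gate, here is a dependency-free sketch. It swaps in standard-library enums and a dataclass in place of the Pydantic schema, and every field name and vocabulary value (`source_id`, `Status`, `Language`, `CATEGORIES`) is a hypothetical stand-in rather than the demo's real schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):  # hypothetical controlled vocabulary
    DRAFT = "draft"
    APPROVED = "approved"


class Language(str, Enum):  # hypothetical controlled vocabulary
    EN = "en"
    ES = "es"


CATEGORIES = ("lipids", "glucose", "thyroid")  # hypothetical categories
REQUIRED = ("source_id", "status", "language", "category", "text")


@dataclass(frozen=True)
class Brief:
    source_id: str
    status: Status
    language: Language
    category: str
    text: str


def validate_brief(raw: dict) -> Brief:
    """Return a Brief, or raise ValueError if the entry breaks the schema."""
    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if raw["category"] not in CATEGORIES:
        raise ValueError(f"unknown category: {raw['category']!r}")
    if not str(raw["text"]).strip():
        raise ValueError("text must not be empty")
    return Brief(
        source_id=str(raw["source_id"]),
        status=Status(raw["status"]),        # ValueError on unknown value
        language=Language(raw["language"]),  # ValueError on unknown value
        category=raw["category"],
        text=raw["text"],
    )


def partition_briefs(raws: list) -> tuple:
    """Split raw entries into (valid briefs, rejected entries with reasons)."""
    valid, rejected = [], []
    for raw in raws:
        try:
            valid.append(validate_brief(raw))
        except ValueError as exc:
            rejected.append((raw, str(exc)))
    return valid, rejected
```

Only the `valid` list would go on to embedding; the `rejected` list keeps the reason for each failure so it can be reported back to the author.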
Deterministic IDs, no duplicates
Each entry's vector ID is a hash of its source brief, so re-ingesting upserts in place rather than creating duplicates. A separate content hash lets ingestion skip unchanged entries, avoiding needless re-embedding.
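One way the two hashes could fit together is sketched below. It assumes the vector ID is derived from a stable key for the source brief (here a hypothetical `source_key` such as a file path) and that the content hash from the previous run is stored alongside each vector. The function names are illustrative, not the demo's actual code.

```python
import hashlib
import json


def vector_id(source_key: str) -> str:
    """Stable ID: the same source brief always maps to the same vector ID."""
    return hashlib.sha256(source_key.encode("utf-8")).hexdigest()[:32]


def content_hash(brief: dict) -> str:
    """Hash of the brief's content, independent of key order."""
    canonical = json.dumps(
        brief, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_ingest(briefs: dict, stored_hashes: dict) -> list:
    """Return (vector_id, brief, content_hash) for new or changed entries.

    `briefs` maps source_key -> brief dict; `stored_hashes` maps
    vector_id -> content hash recorded at the previous ingest (assumed).
    """
    todo = []
    for source_key, brief in briefs.items():
        vid = vector_id(source_key)
        digest = content_hash(brief)
        if stored_hashes.get(vid) == digest:
            continue  # unchanged since last ingest: skip re-embedding
        todo.append((vid, brief, digest))
    return todo
```

Because the ID depends only on the source key, an edited brief keeps its ID and overwrites its old vector on upsert, while the content hash decides whether the upsert is needed at all.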
Hard metadata filter
Queries filter on approved status, language, and an optional category before vector similarity is scored, so unapproved or off-category content can't surface.
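A filter of this kind might be assembled as follows. The `$eq` operator is part of Pinecone's metadata filter syntax, but the metadata field names (`status`, `language`, `category`) and the `"approved"` value are assumptions about how the index is tagged.

```python
from typing import Optional


def build_filter(language: str, category: Optional[str] = None) -> dict:
    """Build a metadata filter that always requires approved status."""
    flt = {
        "status": {"$eq": "approved"},   # never optional
        "language": {"$eq": language},
    }
    if category is not None:
        flt["category"] = {"$eq": category}  # narrows only when supplied
    return flt
```

The resulting dict would be passed as the metadata filter of the query. The approved-status clause is hard-coded rather than taken from the request, so a caller cannot switch it off.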
Anti-hallucination threshold
Matches scoring below a minimum cosine similarity are dropped, so the assistant declines rather than inventing an answer. No LLM generates the response text.
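The decline-or-answer step can be sketched like this. The threshold value, the `MIN_SCORE` name, and the shape of each match (a dict with `score` and `text`) are all assumptions made for illustration; the real cutoff would be tuned against actual queries.

```python
from typing import Optional

MIN_SCORE = 0.35  # hypothetical cutoff, not the demo's actual value


def select_answer(matches: list, min_score: float = MIN_SCORE) -> Optional[str]:
    """Return the best match's stored text verbatim, or None to decline."""
    kept = [m for m in matches if m["score"] >= min_score]
    if not kept:
        return None  # nothing is close enough: decline instead of guessing
    best = max(kept, key=lambda m: m["score"])
    return best["text"]  # approved text is returned as-is, no generation
```

A `None` result is what lets the interface show a "no approved answer found" message instead of a weakly related entry.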
Stack: Next.js · Pinecone serverless with integrated inference (llama-text-embed-v2, cosine) — embeddings hosted, no third-party LLM key on the public endpoint.