Question 1

What makes a RAG system actually accurate in production?

Accepted Answer

Four layers all need to be right: chunking (are the right passages being indexed?), retrieval (are they being found when asked?), ranking (are the most relevant ones at the top?), and generation (is the LLM answering from what was retrieved rather than from its weights?). Most underperforming systems have a problem in one of the first three layers, not in the LLM. We measure each layer separately so we know which one to fix.
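As a rough illustration of measuring layers separately, the sketch below scores retrieval and ranking on their own for a single question. The function names, chunk ids, and relevance labels are hypothetical, and recall@k and reciprocal rank are just two common choices, not a fixed part of any particular toolkit.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Retrieval layer: fraction of relevant chunk ids found in the top k."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Ranking layer: 1 / position of the first relevant chunk, 0.0 if none."""
    for position, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical labels for one question: which chunks a human marked relevant.
retrieved = ["c7", "c2", "c9", "c4"]
relevant = {"c2", "c4"}

retrieval_score = recall_at_k(retrieved, relevant, k=3)
ranking_score = reciprocal_rank(retrieved, relevant)
```

A low recall@k with a healthy reciprocal rank points at chunking or retrieval; the reverse points at ranking.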

Question 2

Which vector database do you recommend?

Accepted Answer

It depends on your infrastructure and scale. pgvector if you're already on PostgreSQL and want to avoid managing another database — it handles millions of vectors with reasonable latency. Pinecone for managed, high-scale deployments where you don't want to run infrastructure. Weaviate or Qdrant for self-hosted with more control over filtering and indexing. Chroma for local development and prototyping. We'll recommend based on your specific query patterns, update frequency, and scale requirements.

Question 3

What is hybrid search and does it actually help?

Accepted Answer

Hybrid search combines dense vector search (semantic similarity) with sparse keyword search (BM25 or similar). Pure vector search can miss exact matches: if a user asks about 'ISO 27001' and the document uses that exact phrase, keyword search finds it reliably, while an embedding-only search may not. Pure keyword search misses paraphrase: if a user asks 'what are the information security requirements', it may not match 'ISO 27001 compliance obligations'. Hybrid search, with tuned reciprocal rank fusion weighting, typically improves retrieval recall by 10–20% on real-world document corpora.
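To make the fusion step concrete, here is a minimal sketch of weighted reciprocal rank fusion in plain Python. The function name, the per-retriever weights, and the document ids are illustrative assumptions; k = 60 is a commonly used default constant, and the weights are what would be tuned per corpus.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    k: int = 60,
) -> List[str]:
    """Fuse several ranked lists: each list adds weight / (k + rank) per doc."""
    scores: Dict[str, float] = defaultdict(float)
    for retriever, ranked_ids in rankings.items():
        weight = weights.get(retriever, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (vector) and a sparse (BM25) retriever.
rankings = {
    "dense": ["d3", "d1", "d2"],
    "sparse": ["d1", "d4", "d3"],
}
fused = weighted_rrf(rankings, weights={"dense": 1.0, "sparse": 1.0})
```

Documents that appear in both lists (here d1 and d3) rise above those found by only one retriever, which is the behaviour hybrid search relies on.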

Question 4

How do you handle documents that update frequently?

Accepted Answer

We build incremental ingestion pipelines — detecting changed or new documents, re-chunking and re-embedding only the affected content, and updating the vector index without a full rebuild. We also maintain document-level metadata so you can filter by source, date, or version at query time. Full re-indexing is available as a fallback but shouldn't be the normal path.
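One simple way to detect changed documents is to compare content hashes against what was stored at the last sync. The sketch below assumes documents arrive as an id-to-text mapping and that the index keeps a hash per document; both structures and the function names are hypothetical, and a real pipeline would usually apply this at chunk level as well.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(
    current_docs: Dict[str, str],
    indexed_hashes: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (ids to re-chunk and re-embed, ids to delete from the index)."""
    to_upsert = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(
        doc_id for doc_id in indexed_hashes if doc_id not in current_docs
    )
    return to_upsert, to_delete


# Hypothetical state: "a" unchanged, "b" edited, "c" removed, "d" new.
indexed = {
    "a": content_hash("v1"),
    "b": content_hash("old"),
    "c": content_hash("gone"),
}
current = {"a": "v1", "b": "new", "d": "fresh"}
to_upsert, to_delete = plan_sync(current, indexed)
```

Only the ids in to_upsert need re-chunking and re-embedding; unchanged documents are left alone.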

Question 5

Can you build RAG over our internal company data (Notion, Confluence, Slack)?

Accepted Answer

Yes. We build connectors for common internal data sources — Notion, Confluence, Google Drive, SharePoint, Slack, and databases — with OAuth auth, incremental sync, and access control that respects the source system's permissions. Users should only be able to retrieve documents they could have accessed directly.
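The permission rule can be sketched as a filter applied to retrieved chunks before anything reaches the LLM. This assumes each chunk carries the groups copied from the source system at sync time; the Chunk type, field names, and group names are hypothetical, and in practice the same filter is often pushed down into the vector store's metadata query.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: FrozenSet[str]  # groups with read access in the source


def filter_by_access(
    chunks: Sequence[Chunk], user_groups: FrozenSet[str]
) -> List[Chunk]:
    """Keep only chunks the user could have opened in the source system."""
    return [chunk for chunk in chunks if chunk.allowed_groups & user_groups]


# Hypothetical retrieval results for a user in the "engineering" group.
results = [
    Chunk("n1", "Deploy runbook", frozenset({"engineering"})),
    Chunk("n2", "Salary bands", frozenset({"hr"})),
    Chunk("n3", "Company handbook", frozenset({"engineering", "hr"})),
]
visible = filter_by_access(results, frozenset({"engineering"}))
visible_ids = [chunk.chunk_id for chunk in visible]
```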

Question 6

How do you measure whether a RAG system is working?

Accepted Answer

RAGAS provides four core metrics: faithfulness (does the answer come from the retrieved context?), answer relevance (does it answer the question?), context recall (did retrieval find the relevant passages?), and context precision (are the relevant passages ranked above the irrelevant ones?). We set up these metrics as a regression suite, run against a ground-truth Q&A set after every change, so you can see whether improvements in one metric cause regressions in another.
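The regression check itself can be as small as comparing two score dictionaries. The sketch below does not call RAGAS; it assumes the four scores have already been computed for a baseline and a candidate run. The function name, the tolerance, and every number are made up for illustration.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Names of metrics that dropped by more than `tolerance` vs. baseline."""
    return sorted(
        name
        for name, base_score in baseline.items()
        if candidate.get(name, 0.0) < base_score - tolerance
    )


# Illustrative scores only: a change that helps recall but hurts precision.
baseline = {
    "faithfulness": 0.90,
    "answer_relevance": 0.85,
    "context_recall": 0.80,
    "context_precision": 0.75,
}
candidate = {
    "faithfulness": 0.91,
    "answer_relevance": 0.85,
    "context_recall": 0.88,
    "context_precision": 0.68,
}
regressions = find_regressions(baseline, candidate)
```

Failing the build when the returned list is non-empty is one way to catch a change that trades one metric for another.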

Question 7

Do your RAG developers sign NDAs before project details are shared?

Accepted Answer

Yes. NDA and IP assignment agreements are signed before any proprietary documents, data, or system details are discussed. Standard for every engagement.

Hire RAG Developers

What our RAG developers build for you

Chunking Strategy & Document Processing

Vector Database Implementation

Hybrid Search (Semantic + Keyword)

Reranking & Context Compression

RAG Evaluation (RAGAS & Custom Evals)

RAG Pipeline Production Deployment