βΆWhat's Chroma and how is it different from a regular database?
Chroma = vector database specialized for embeddings (dense float arrays). Regular DB = structured data (tables, rows). Chroma = semantic search (find similar concepts, not exact matches). Example: search 'dog' β returns 'puppy', 'canine', 'animal' even if word 'dog' doesn't appear. Built for: RAG, semantic search, recommendation.
βΆCan I use Chroma for production with 100M embeddings?
Chroma is in-process or local-server for small-medium (<10M embeddings). For 100M+, use Pinecone (managed) or Weaviate (self-hosted at scale). Chroma = ideal for prototyping, small-medium production. Beyond that, consider bigger solutions.
βΆHow does RAG work with Chroma?
RAG = Retrieval-Augmented Generation. (1) Store document chunks in Chroma as embeddings. (2) User asks question β embed question, search Chroma for similar chunks. (3) Feed chunks + question to LLM (Claude, GPT-4). (4) LLM generates answer grounded in your docs. Chroma = the retrieval part.
βΆWhat embedding model should I use?
OpenAI text-embedding-3-large = gold standard (best quality, easy API). Hugging Face all-MiniLM-L6-v2 = free, fast, good for prototypes. Trade-off: quality vs cost vs speed. For production: evaluate on your use case (search quality), not model popularity.
βΆHow do I measure if my Chroma RAG system is working?
Metrics: (1) retrieval accuracy (does top-5 contain answer?), (2) relevance (user satisfaction), (3) latency (<1s ideal), (4) hallucination rate (wrong answers). Tool: evaluate on ~100 test questions, compute metrics. Common mistake: skip evaluation, ship broken system.
βΆCan Chroma do hybrid search (keyword + semantic)?
Chroma v0.3+ has hybrid (keyword + embedding). Example: search for 'machine learning' β fuzzy keyword match ('machne learing' β 'machine learning') + embedding similarity. Better recall (catch exact matches + semantic matches) than pure embedding. Recommended for production.
βΆWhat salary jump for Chroma + RAG expertise?
ML engineer ($100-140k) + RAG specialist = $140-180k. Data engineer adding Chroma to pipeline = $120-150k. Scarcest skill: engineers who've shipped RAG products (not just tutorials). RAG boom 2025-2026 = fast career growth if you specialize now.