Local LLM · pgvector · building a RAG chatbot
Build a chatbot that answers from your own documents with LM Studio + pgvector + Gemini. Six steps — embeddings to prompts.
- Difficulty: advanced
- Lessons: 7
- Total time: 185 min
Sometimes a single ChatGPT call is not enough: internal docs, personal notes, data you cannot send outside your network. RAG (Retrieval-Augmented Generation) lets an LLM answer only from materials you hand-pick.
Who it's for
- Engineers running LLMs on local GPUs or on-prem without sending data out
- Anyone who wants a chatbot that answers with citations from their own documents
- People wanting a single track covering embeddings, vector search, and prompt design
What you can do afterwards
- Run Gemma / Llama family models locally with LM Studio
- Store embeddings in PostgreSQL + pgvector with HNSW indexes
- Build a minimal FastAPI + LangChain pipeline (retrieve → prompt → generate; see the sketch after this list)
- Swap Gemini and local LLMs freely
- Control system prompts, few-shot, and output schemas
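A minimal sketch of that retrieve → prompt → generate loop, with the FastAPI and LangChain layers stripped away. The `chunks` table, connection string, and both model ids are assumptions, as is LM Studio's default port 1234; adjust to whatever you have loaded.

```python
# Minimal retrieve -> prompt -> generate loop (no FastAPI/LangChain layer).
# Assumptions: LM Studio serving on its default port 1234, a `chunks` table
# with a 768-dim `embedding vector(768)` column, and the named models loaded.
import psycopg
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def embed(text: str) -> str:
    # LM Studio exposes an OpenAI-compatible /v1/embeddings endpoint;
    # the embedding model id here is an assumption.
    resp = llm.embeddings.create(
        model="text-embedding-nomic-embed-text-v1.5", input=text
    )
    return "[" + ",".join(map(str, resp.data[0].embedding)) + "]"  # pgvector literal

def retrieve(question: str, k: int = 5) -> list[str]:
    # <=> is pgvector's cosine-distance operator: smaller means closer.
    with psycopg.connect("dbname=rag") as conn:
        rows = conn.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (embed(question), k),
        ).fetchall()
    return [row[0] for row in rows]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    chat = llm.chat.completions.create(
        model="gemma-2-9b-it",  # whichever chat model is loaded in LM Studio
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context and cite the passage you used."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```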
Steps
- Why local LLMs · getting started with LM Studio — OpenAI-compatible endpoint · swapping models · VRAM
- Embeddings — text to vectors — the math behind semantic search · 768 dims (toy sketch below)
- pgvector + HNSW setup — install · index choice · cosine vs dot product (setup sketch below)
- RAG pipeline — chunking · retrieve · top-k · rerank · injecting retrieved context into the prompt
- Gemini · OpenAI-compatible APIs — switching local ↔ cloud · cost · latency (swap sketch below)
- Prompt design — system prompts · few-shot · output schemas · hallucination (skeleton below)
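The embeddings step reduces to one operation: cosine similarity between fixed-length vectors, 768 dimensions for the models used in this track. A toy numpy illustration with 3-dimensional stand-ins:

```python
# Cosine similarity, the core of semantic search. Real embeddings are 768-dim;
# these 3-dim vectors are toy stand-ins.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a · b) / (|a| |b|): 1.0 = same direction, ~0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.2])
doc_cats = np.array([0.8, 0.2, 0.1])   # close in meaning to the query
doc_tax = np.array([0.1, 0.9, 0.4])    # unrelated topic

print(cosine_similarity(query, doc_cats))  # high score -> retrieved
print(cosine_similarity(query, doc_tax))   # low score  -> skipped
```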
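For the pgvector step, a sketch of the one-time setup. HNSW indexes require pgvector 0.5 or later; the table and index names are assumptions. `vector_cosine_ops` pairs with the `<=>` cosine-distance operator used in the retrieval query above.

```python
# One-time pgvector + HNSW setup via psycopg; schema names are assumptions.
import psycopg

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(768)   -- must match your embedding model's dimensionality
);

-- HNSW with cosine distance; pick vector_ip_ops (dot product) instead if
-- your embeddings are already normalized.
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

with psycopg.connect("dbname=rag") as conn:
    conn.execute(DDL)   # the context manager commits on clean exit
```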
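Because LM Studio and Gemini both speak the OpenAI protocol, switching local ↔ cloud is essentially one `base_url` change. The Gemini endpoint and model id below follow Google's published OpenAI-compatibility docs at the time of writing; verify both before relying on them.

```python
# Same client code, two backends: flip `backend` and nothing else changes.
import os
from openai import OpenAI

LOCAL = {"base_url": "http://localhost:1234/v1",
         "api_key": "lm-studio",            # LM Studio ignores the key
         "model": "gemma-2-9b-it"}          # whatever is loaded locally
CLOUD = {"base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
         "api_key": os.environ["GEMINI_API_KEY"],
         "model": "gemini-2.0-flash"}       # check current model ids

backend = LOCAL  # switch to CLOUD for Gemini
client = OpenAI(base_url=backend["base_url"], api_key=backend["api_key"])

reply = client.chat.completions.create(
    model=backend["model"],
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```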
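Finally, a skeleton for the prompt-design step: the system prompt pins behavior, a few-shot pair pins the output schema, and an explicit context-only rule with a null escape hatch is the basic anti-hallucination lever. The `response_format` flag is an assumption; structured-output support varies by backend and model.

```python
# Prompt-design skeleton: system prompt pins behavior, few-shot pins the
# output shape, "context only" plus a null escape hatch curbs hallucination.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

messages = [
    {"role": "system", "content": (
        "You answer ONLY from the supplied context. "
        'If the answer is not there, reply {"answer": null, "source": null}. '
        'Always respond as JSON: {"answer": str | null, "source": str | null}.'
    )},
    # One few-shot pair showing the exact shape we expect back.
    {"role": "user", "content":
        "Context: The nightly deploy runs at 03:00 UTC.\n\nQuestion: When does the deploy run?"},
    {"role": "assistant", "content":
        '{"answer": "03:00 UTC", "source": "deploy schedule note"}'},
    # The real retrieved context and question go last.
    {"role": "user", "content": "Context: <retrieved chunks>\n\nQuestion: <user question>"},
]

resp = client.chat.completions.create(
    model="gemma-2-9b-it",
    messages=messages,
    # Structured-output support varies by backend/model; treat as an assumption.
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```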
Prerequisites — the python-data-pipeline course + Python 3.11 + uv + PostgreSQL 15+ + LM Studio.