I built an AI receptionist for a mechanic shop
TL;DR Highlight
A dev built an AI receptionist for their brother's auto shop — combining a RAG pipeline with Vapi's voice platform to actually answer phone calls — because missed calls were costing thousands per month.
Who Should Read
Developers looking to build AI voice agents or phone automation for small businesses, or backend devs learning how to connect RAG pipelines to production-grade apps for the first time.
Core Mechanics
- The problem was simple: the brother spends all day under cars and can't answer the phone. Customers hang up and call somewhere else. Brake jobs at $450, engine repairs at $2,000 — all just walking out the door.
- Using raw LLMs is dangerous. When a customer asks 'How much for brakes?' the actual price is $450, but the model might guess $200, destroying customer trust. RAG (Retrieval-Augmented Generation: grounding answers in retrieved knowledge documents) was introduced to prevent this.
- Knowledge base construction started by scraping the brother's website and converting it to markdown files. Service types, pricing, duration, business hours, payment methods, cancellation policy, warranty info, loaner availability, vehicle types supported — 21+ documents total.
- Each document was converted to 1024-dimensional vectors using Voyage AI's voyage-3-large model and stored in MongoDB Atlas. With Atlas Vector Search indexing, customer questions get vectorized with the same model and the top 3 semantically similar documents are retrieved. Even queries like 'How much for brake work?' find relevant docs without exact keyword matches.
- The top 3 retrieved documents are passed as context to Anthropic Claude (claude-sonnet-4-6), with a system prompt constraining it to 'only answer from the knowledge base, say you don't know if unsure, and collect callback info.' By the end of Part 1, queries run from the terminal returned accurate answers. Example: 'How much for an oil change?' → 'Conventional oil $45, synthetic $75. Includes oil filter replacement, fluid top-off, tire pressure check, takes about 30 minutes.'
- Vapi was chosen for voice infrastructure. It handles phone number purchase, speech recognition (Deepgram), text-to-speech (ElevenLabs), and real-time function calling all in one. Developers just need to build the webhook server that Vapi calls.
- The server was built with FastAPI. When a customer asks a question, Vapi sends a tool-calls request to the /webhook endpoint, the server extracts an answer via the RAG pipeline, and Vapi reads it aloud. During development, ngrok exposed local port 8000 to connect with Vapi.
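The embed-retrieve-answer path in the bullets above can be sketched as follows. This is a minimal illustration, not the author's code: the collection name `kb`, index name `vector_index`, field names, and connection details are all assumptions; the model ids are the ones named in the post.

```python
# Sketch of the RAG answer path: embed the question with voyage-3-large,
# pull the top 3 documents via Atlas Vector Search, and have Claude answer
# only from that context. Helper functions are pure so they can be tested
# without the SDKs or a database.

def build_vector_search_pipeline(query_vector, limit=3, num_candidates=50):
    """Atlas aggregation pipeline: top-`limit` nearest docs by embedding."""
    return [
        {"$vectorSearch": {
            "index": "vector_index",      # assumed index name
            "path": "embedding",          # assumed vector field
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
        }},
        {"$project": {"_id": 0, "text": 1}},
    ]

def build_system_prompt(context_docs):
    """Constrain the model to the retrieved knowledge base."""
    context = "\n\n".join(context_docs)
    return (
        "Answer only from the knowledge base below. If the answer is not "
        "there, say you don't know and offer to take the caller's name "
        "and number.\n\n" + context
    )

def answer(question):
    # Third-party clients imported lazily so the pure helpers above stay
    # importable without the SDKs installed.
    import voyageai, anthropic
    from pymongo import MongoClient

    qvec = voyageai.Client().embed(
        [question], model="voyage-3-large", input_type="query"
    ).embeddings[0]
    kb = MongoClient()["shop"]["kb"]   # connection details assumed
    docs = [d["text"] for d in kb.aggregate(build_vector_search_pipeline(qvec))]
    reply = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-6",     # model id as given in the post
        max_tokens=300,
        system=build_system_prompt(docs),
        messages=[{"role": "user", "content": question}],
    )
    return reply.content[0].text
```

Keeping the pipeline builder and prompt builder as plain functions makes the grounding behavior testable independently of network calls.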
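The webhook side can be sketched like this. The payload and response shapes follow Vapi's tool-calls convention as summarized above; the exact field names (`toolCallList`, `toolCallId`) and the tool name `lookupAnswer` are assumptions to verify against Vapi's docs, and `rag_answer` stands in for the retrieval pipeline.

```python
# Minimal sketch of the FastAPI webhook Vapi calls. The dispatcher is a
# pure function; FastAPI is wired up lazily so the logic is testable on
# its own.

def handle_tool_calls(payload, rag_answer, save_callback):
    """Dispatch each tool call and return Vapi-shaped results."""
    results = []
    for call in payload["message"]["toolCallList"]:
        name = call["function"]["name"]
        args = call["function"]["arguments"]
        if name == "lookupAnswer":             # hypothetical tool name
            result = rag_answer(args["question"])
        elif name == "saveCallback":           # fallback: collect contact info
            save_callback(args["name"], args["phone"])
            result = "Got it, we'll call you back."
        else:
            result = "Unknown tool."
        results.append({"toolCallId": call["id"], "result": result})
    return {"results": results}

def create_app(rag_answer, save_callback):
    # FastAPI imported here so the dispatcher stays importable without it.
    from fastapi import FastAPI, Request

    app = FastAPI()

    @app.post("/webhook")
    async def webhook(request: Request):
        payload = await request.json()
        if payload.get("message", {}).get("type") == "tool-calls":
            return handle_tool_calls(payload, rag_answer, save_callback)
        return {"results": []}

    return app
```

Run locally with `uvicorn` on port 8000, then point the Vapi dashboard at the ngrok URL for real phone testing.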
Evidence
- A former service advisor raised serious practicality concerns. Parts prices change in real-time and inventory varies daily, making it nearly impossible to give accurate estimates upfront. In some states, inaccurate estimates can lead to legal issues. The system's real utility might be limited to one-way notifications like 'Your car is ready for pickup.'
- Whether RAG is even necessary was questioned. Price lists and business hours are small enough to fit entirely in modern LLM context windows — do you really need vector search for that? Unless you're ingesting full service manuals, RAG overhead might be unnecessary.
- Some suggested outsourced reception services might be more practical. A $500/month phone answering service could work just as well, and the ROI of building and maintaining a custom AI system should be weighed against that baseline.
- User reactions to AI phone answering were mixed. One person had a great experience with Mint Mobile's AI agent resolving their issue in under a minute with no wait, while another felt 'uncanny valley' vibes from a local HVAC company's AI, leading to distrust and hunting for a human. Several comments said they'd just hang up if they realized it was a robot.
- Amid negative comments, a defense emerged: the key value isn't 'is this useful for this specific case' but 'how can I use these techniques in my own projects.' Practical tips like TTS text formatting were noted. Meanwhile, someone actually called the number and found the chatbot wasn't even deployed yet, calling it 'the worst tech demo on HN.'
How to Apply
- For building AI phone agents for small businesses (restaurants, auto shops, salons, etc.), this stack (Vapi + FastAPI + MongoDB Atlas Vector Search + Voyage AI embeddings + Claude) serves as a solid reference for rapid prototyping. However, as commenters noted, services with dynamic pricing need separate real-time price lookup API integration.
- When connecting a RAG pipeline to a voice interface, Vapi's tool-calls webhook approach lets you reuse existing HTTP servers with a low barrier to entry. During local development, ngrok provides instant external exposure, so just pasting the ngrok URL into the Vapi dashboard enables real phone testing.
- If the knowledge base is small (under a few dozen documents) and doesn't change frequently, consider putting everything directly in the system prompt instead of building RAG. You'll save on vector DB setup and embedding costs, and latency might actually decrease.
- Fallback design for unknown questions is essential. This project used a saveCallback tool to collect name and contact info — much more practical than just saying 'sorry, I don't know.' Similar projects can copy this pattern directly.
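The no-RAG alternative from the list above is a few lines of code: with ~21 small markdown files, skip the vector store and place the whole knowledge base in the system prompt. Directory layout, file names, and prompt wording here are illustrative.

```python
# Sketch of the prompt-stuffing alternative: concatenate every knowledge
# base document into one string and send it as the system prompt on each
# call, instead of embedding and retrieving.
from pathlib import Path

def load_knowledge_base(kb_dir="kb"):
    """Concatenate every markdown doc into one prompt-sized string."""
    parts = []
    for path in sorted(Path(kb_dir).glob("*.md")):
        parts.append(f"## {path.stem}\n{path.read_text()}")
    return "\n\n".join(parts)

def build_full_context_prompt(kb_text):
    """System prompt with the entire knowledge base inlined."""
    return (
        "You are the shop's phone receptionist. Answer only from the "
        "knowledge base below; if unsure, say so and offer to take the "
        "caller's name and number for a callback.\n\n" + kb_text
    )
```

This trades a slightly larger per-call token bill for zero vector-DB setup, no embedding costs, and one fewer network hop before the model answers.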
Terminology
Related Papers
Show HN: Airbyte Agents – context for agents across multiple data sources
Airbyte released a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents no longer have to comb through each API individually. It reportedly cuts token usage by up to 90% compared with the existing MCP approach.
A polynomial autoencoder beats PCA on transformer embeddings
A technique that attaches a second-degree polynomial decoder to a PCA encoder, substantially improving embedding compression quality in closed form; it can be implemented with numpy alone, no SGD required.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Storing memory as schema-defined structured records instead of RAG-style text retrieval yields far higher accuracy on exact fact lookup, state tracking, and aggregation queries.
Show HN: Atomic – Local-first, AI-augmented personal knowledge base
Atomic builds a self-hosted, open-source personal knowledge graph app that automatically embeds, tags, and links notes, web clips, and RSS feeds—supporting semantic search, LLM-powered wiki synthesis, and MCP integration.
We replaced RAG with a virtual filesystem for our AI documentation assistant
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of Chroma DB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
Chroma Context-1: Training a Self-Editing Search Agent