Monitor your security cameras with locally processed AI
TL;DR Highlight
How to analyze CCTV in real-time with AI on edge devices — no cloud required.
Who Should Read
Developers building self-hosted surveillance systems on home servers or Raspberry Pis, or self-hosting enthusiasts who avoid cloud-based security camera services for privacy and cost reasons.
Core Mechanics
- Camera footage processed with AI inference locally on the machine — no cloud upload needed
- Local LLM runtime (Ollama) paired with vision models (LLaVA, moondream) for video analysis
- Motion detection events trigger frame capture → AI query in natural language ('what do you see?')
- Alert conditions customizable via prompts — e.g., 'alert if person detected', 'alert if car in driveway'
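One way to make alert conditions prompt-driven is a small rule table, sketched below. The names `AlertRule`, `RULES`, `build_prompt`, `is_yes`, and `triggered_alerts` are hypothetical and not taken from any library; the model's answers are passed in as plain strings, so the sketch runs without a camera or a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str      # hypothetical rule identifier
    question: str  # yes/no question posed to the vision model
    message: str   # notification text used when the answer is yes


# Illustrative rules mirroring the examples in the list above
RULES = [
    AlertRule("person", "Is there a person in this image?", "Person detected"),
    AlertRule("driveway_car", "Is there a car in the driveway?", "Car in driveway"),
]


def build_prompt(rule: AlertRule) -> str:
    """Turn a rule into a prompt that forces a short, parseable answer."""
    return f"{rule.question}\nAnswer with only YES or NO."


def is_yes(answer: str) -> bool:
    """True only when the first word of the reply is YES (case-insensitive)."""
    words = answer.strip().upper().split()
    return bool(words) and words[0].strip(".,!:;") == "YES"


def triggered_alerts(answers: dict[str, str]) -> list[str]:
    """Map per-rule model answers to the alert messages that should fire."""
    return [r.message for r in RULES if is_yes(answers.get(r.name, ""))]
```

Checking only the first word keeps a reply such as "No, but I can see eyes" from being read as a yes, which a plain substring test would do.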
Evidence
- Original paper text not available for quantitative data — content below is based on title inference
- Local inference adds no network latency; roughly 1-3 sec/frame on CPU-only hardware (no GPU) is an estimate, not a measured figure, and will vary with model size and hardware
- $0/month cloud costs vs. cloud API services (excluding hardware costs)
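If inference really takes seconds per frame, motion events can arrive faster than the model can answer them. The sketch below works through that arithmetic and shows a simple cooldown gate; `Cooldown` and `max_frames_per_minute` are hypothetical helpers, and the 3-second figure is only the upper end of the estimate above.

```python
import time


def max_frames_per_minute(seconds_per_frame: float) -> float:
    """Upper bound on frames one worker can analyze per minute."""
    return 60.0 / seconds_per_frame


class Cooldown:
    """Let an event through only if min_interval_s has passed since the last one."""

    def __init__(self, min_interval_s: float, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last = None    # time of the last accepted event

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self.min_interval_s:
            self._last = now
            return True
        return False


# At 3 s/frame a single worker handles at most 20 frames per minute,
# so a cooldown of about 3 s keeps the queue from growing.
gate = Cooldown(min_interval_s=3.0)
```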
How to Apply
- Set up Frigate NVR + Ollama (moondream model): motion detection → MQTT event → frame capture → Ollama API call → send results as Home Assistant notifications.
- Branch prompts by scenario: 'If there is a person in this image, say YES. Otherwise, say NO.' — forcing short answers simplifies parsing.
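- The cheapest setup is a Raspberry Pi 5 or a low-power x86 mini PC with a USB camera. A Coral TPU speeds up Frigate's object detection; it does not accelerate the Ollama vision model, which still runs on the CPU or GPU.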
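The pipeline in the first step above could be glued together roughly as follows, using only the standard library. Several details are assumptions to verify against your own setup: that Frigate publishes JSON events with `type` and `after.id` fields, that snapshots are served at `/api/events/<id>/snapshot.jpg`, and that Frigate listens on `localhost:5000`. The MQTT subscription and the Home Assistant call are left out; `fetch`, `ask`, and `notify` are hypothetical callables you would supply.

```python
import base64
import json
import urllib.request
from typing import Callable, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
FRIGATE_URL = "http://localhost:5000"               # assumption: adjust to your Frigate host


def snapshot_url(event_payload: bytes) -> Optional[str]:
    """Return the snapshot URL for a newly started event, else None."""
    event = json.loads(event_payload)
    if event.get("type") != "new":  # skip "update" and "end" messages
        return None
    return f"{FRIGATE_URL}/api/events/{event['after']['id']}/snapshot.jpg"


def build_ollama_request(jpeg_bytes: bytes, question: str, model: str = "moondream") -> dict:
    """Build the JSON body for Ollama's generate endpoint with one image attached."""
    return {
        "model": model,
        "prompt": f"{question}\nAnswer with only YES or NO.",
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }


def fetch_bytes(url: str, timeout_s: float = 10.0) -> bytes:
    """Download the snapshot image."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def ask_ollama(payload: dict, timeout_s: float = 60.0) -> str:
    """POST the request to the local Ollama server and return the model's text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"].strip()


def handle_event(
    event_payload: bytes,
    question: str,
    fetch: Callable[[str], bytes],
    ask: Callable[[dict], str],
    notify: Callable[[str], None],
) -> bool:
    """Run one event through capture -> model -> notification; True if alerted."""
    url = snapshot_url(event_payload)
    if url is None:
        return False
    answer = ask(build_ollama_request(fetch(url), question))
    if answer.upper().startswith("YES"):
        notify(f"Alert: {question} -> {answer}")
        return True
    return False
```

In a real deployment, `handle_event(payload, question, fetch_bytes, ask_ollama, notify)` would be called from an MQTT message callback. Passing the network functions in as arguments keeps the parsing and decision logic testable offline.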
Code Example
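# Python example: analyze camera frames with a local Ollama vision model
import base64
from pathlib import Path

import ollama  # pip install ollama; needs a running Ollama server


def send_notification(message: str) -> None:
    # Placeholder: swap in your own notifier (e.g. a Home Assistant call)
    print(message)


def analyze_frame(image_path: str, alert_condition: str) -> str:
    # Base64-encode the captured frame so it can be attached to the chat message
    image_data = base64.b64encode(Path(image_path).read_bytes()).decode()
    response = ollama.chat(
        model='moondream',  # or llava:7b
        messages=[
            {
                'role': 'user',
                'content': f'{alert_condition}\nLook at the image and answer with only YES or NO.',
                'images': [image_data],
            }
        ],
    )
    return response['message']['content'].strip()


# Usage example
result = analyze_frame(
    '/tmp/motion_capture.jpg',
    'Is there a person in this image?',
)
# Check the start of the reply: a substring test would also match words like "eyes"
if result.upper().startswith('YES'):
    send_notification('Person detected!')
Terminology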
NVR: Network Video Recorder. Software or device that records and manages footage from multiple IP cameras in one place. The network version of a DVR.
Ollama: A tool for easily running LLMs locally. Like Docker but for AI models: one command (`ollama run llava`) and you're running AI on your machine.
edge inference: Running AI processing on the local device near the data source rather than sending data to the cloud. Faster response, better privacy, no internet dependency.