Show HN: GoModel – an open-source AI gateway in Go
TL;DR Highlight
GoModel unifies access to OpenAI, Anthropic, Gemini, and other AI providers through a single, OpenAI-compatible API, offering a compiled-language alternative to LiteLLM.
Who Should Read
Backend developers using multiple LLM providers at the same time, or anyone weighing the performance, supply chain security, and Go ecosystem integration benefits it claims over LiteLLM.
Core Mechanics
- GoModel is an AI gateway written in Go that integrates various providers (OpenAI, Anthropic, Gemini, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, and Ollama) behind a single OpenAI-compatible API.
- It can be launched with a single Docker command (see the Code Example below); the only configuration required is the API keys for the desired providers, passed as environment variables, and at least one provider key must be set.
- Positioned as an alternative to LiteLLM, it natively supports observability, guardrails (safety filters), and streaming responses.
- Being written in compiled Go is highlighted as a strength: dependencies are fixed at compile time, giving a smaller runtime supply chain attack surface than the Python-based LiteLLM.
- It supports Prometheus metrics and ships a prometheus.yml and a docker-compose.yaml for easy monitoring setup; a sketch of what such a scrape config typically looks like follows this list.
- A semantic caching layer appears to be present, with the gateway embedding requests and using vector similarity search to determine cache hits; a Go sketch of the general technique also follows this list.
- A Helm chart is included, enabling deployment in Kubernetes environments.
- At the time of writing it has 319 stars and 20 forks on GitHub and receives active commits, indicating an early-stage but live project.
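The post does not reproduce the repository's prometheus.yml, but a minimal sketch of a scrape config for a gateway like this might look as follows. The /metrics path, the gomodel:8080 target, and the job name are assumptions for illustration, not values confirmed from the repo.

# prometheus.yml - minimal sketch; path, target, and job name are assumptions
scrape_configs:
  - job_name: "gomodel"              # hypothetical job name
    metrics_path: /metrics           # assumed default Prometheus metrics path
    scrape_interval: 15s
    static_configs:
      - targets: ["gomodel:8080"]    # assumed service name from docker-compose.yaml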
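Because the semantic cache is only inferred from the code, here is a minimal Go sketch of the general technique, assuming the usual recipe: embed the incoming prompt, compare it against cached embeddings by cosine similarity, and serve a stored response above a threshold. Every name and the threshold value are illustrative; none of this is GoModel's actual implementation.

package main

import (
	"fmt"
	"math"
)

// CacheEntry pairs a prompt embedding with the response it produced.
type CacheEntry struct {
	Embedding []float64
	Response  string
}

// Cache is a naive in-memory semantic cache; a real gateway would use a vector index.
type Cache struct {
	Entries   []CacheEntry
	Threshold float64 // cosine similarity cutoff for a hit, e.g. 0.95
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Lookup scans the cache for an embedding close enough to the query.
func (c *Cache) Lookup(query []float64) (string, bool) {
	for _, e := range c.Entries {
		if cosine(query, e.Embedding) >= c.Threshold {
			return e.Response, true
		}
	}
	return "", false
}

func main() {
	c := &Cache{Threshold: 0.95}
	// In a real gateway the embedding would come from an embedding model.
	c.Entries = append(c.Entries, CacheEntry{Embedding: []float64{1, 0}, Response: "cached answer"})
	if resp, ok := c.Lookup([]float64{0.99, 0.01}); ok {
		fmt.Println(resp) // similar enough: cache hit
	}
}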
Evidence
- "In response to a question about the importance of being written in Go, a comment pointed out that Go compiled binaries have a significantly smaller runtime supply chain attack surface than Python-based tools, a point also made by the developer of a similar Go gateway (sbproxy.dev). An experienced AI proxy maintainer noted that the most challenging aspect is adapting to changing input/output structures with each model/provider release, emphasizing that integration within 24 hours of a new model launch is crucial for a well-managed project. Concerns were raised about the maintenance burden of keeping up with provider updates due to the lack of a robust Go SDK compared to JavaScript and Python, a challenge the author acknowledges. A vllm user inquired about Ollama integration, and requests were made for cost tracking per model/route, particularly for mixed free/paid model usage. Questions were also raised about potential open-source rug pulls, and the need for the unified API to abstract provider-specific parameters like temperature, reasoning effort, and tool choice mode."
How to Apply
- "If you're using multiple LLM providers and want to avoid modifying client code with each model switch, deploy GoModel as an intermediary gateway and route all requests to its OpenAI-compatible endpoint at `http://localhost:8080`. Provider switching is then managed through environment variables. If you're running LiteLLM and concerned about Python runtime supply chain security or memory/performance overhead, consider switching to GoModel. Its compiled binary has no runtime dependencies and the Docker image is lightweight. For centralized management of AI traffic in Kubernetes, leverage the included Helm chart to deploy GoModel to your cluster and integrate it with Prometheus to monitor model response times and error rates. If your team manages AI provider keys individually, use GoModel as an internal gateway, directing team members to its endpoint to centralize key management."
Code Example
# Minimal execution (using only OpenAI)
docker run --rm -p 8080:8080 \
-e OPENAI_API_KEY="your-openai-key" \
enterpilot/gomodel
# Using multiple providers simultaneously
docker run --rm -p 8080:8080 \
-e OPENAI_API_KEY="your-openai-key" \
-e ANTHROPIC_API_KEY="your-anthropic-key" \
-e GEMINI_API_KEY="your-gemini-key" \
-e GROQ_API_KEY="your-groq-key" \
-e OPENROUTER_API_KEY="your-openrouter-key" \
-e XAI_API_KEY="your-xai-key" \
-e AZURE_API_KEY="your-azure-key" \
-e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
-e AZURE_API_VERSION="2024-10-21" \
enterpilot/gomodel
# Then, in the client, only change the base_url
# openai.OpenAI(base_url="http://localhost:8080", api_key="any-value")
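Since streaming is listed as a native feature, here is a hedged Go sketch of consuming an OpenAI-style server-sent event stream through the gateway. It assumes the standard "stream": true flag and data:-prefixed line framing of the OpenAI format; GoModel is described as OpenAI-compatible, but its exact streaming behavior is not shown in the post.

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Same assumed endpoint as above; "stream": true requests server-sent events.
	body := []byte(`{"model":"gpt-4o-mini","stream":true,` +
		`"messages":[{"role":"user","content":"Stream a short reply"}]}`)

	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// OpenAI-style streams send "data: {json chunk}" lines, ending with "data: [DONE]".
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip keep-alives and blank separator lines
		}
		chunk := strings.TrimPrefix(line, "data: ")
		if chunk == "[DONE]" {
			break
		}
		fmt.Println(chunk) // each chunk is a partial completion delta as JSON
	}
}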
Related Papers
Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s
A detailed walkthrough of implementing matrix multiplication kernels from scratch in Swift on Apple Silicon, optimizing step by step across CPU, SIMD, AMX, and GPU (Metal) to push performance from Gflop/s to Tflop/s. A rare resource for developers who want to build the core operations of LLM training from the ground up without frameworks and get a feel for Apple Silicon's performance limits.
Removing fsync from our local storage engine
FractalBits shares the design of an SSD-only KV storage engine built without fsync, achieving roughly 65% higher write performance under identical conditions. The core of the design is a combination of preallocation, O_DIRECT, and a journal aligned to the SSD's atomic write unit to avoid fsync's metadata overhead.
Google Chrome silently installs a 4 GB AI model on your device without consent
Google Chrome was found to silently download the 4 GB Gemini Nano model file without user consent, re-downloading it even after deletion. The discovery raises potential GDPR-violation concerns and questions about the environmental cost of rolling this out to billions of devices.
How OpenAI delivers low-latency voice AI at scale
OpenAI redesigned its WebRTC stack to serve real-time voice AI to over 900 million users, detailing the design decisions and trade-offs of a relay + transceiver split architecture.
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
Deterministic Leaf Enumeration (DLE) cuts self-consistency’s redundant sampling by deterministically exploring a tree of possible sequences, simultaneously improving math/code reasoning performance and speed.