Show HN: Filling PDF forms with AI using client-side tool calling

TL;DR Highlight

SimplePDF Copilot fills PDF forms through a chat interface, using client-side tool calling so that the document itself can stay on the device.

Who Should Read

Developers building PDF form automation or document processing pipelines, and SaaS developers who want to embed white-labeled, AI-powered form completion in their own products or internal systems.

Core Mechanics

SimplePDF Copilot is a demo product that automatically populates PDF form fields (for example, on an IRS W-9) from natural language input in a chat interface, enabling PDF editing, creation, and understanding entirely within the browser.
The standout technical feature is 'client-side tool calling,' where the LLM directly invokes functions to manipulate PDF form fields within the browser, theoretically preventing document data from being sent to a server.
Pairing client-side tool calling with a local model would keep both the document and the chat on the device.
Key use cases include foreign-language form completion, contract review ('Can I trust these clauses?'), and automated pre-population of repetitive forms from existing data sources (MCP/RAG integration).
The product is designed for B2B white-label embedding, allowing customers to offer SimplePDF Copilot under their own brand within their products.
The public demo explicitly states that chat messages are sent to a remote AI provider. In the demo environment, therefore, any PII typed into the chat does not remain local.
It supports language selection for form assistance in languages other than English and includes a download function.
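
The tool-calling flow described above can be sketched in a few lines. This is a minimal, language-agnostic illustration written in Python, not SimplePDF's actual implementation; in a browser the same logic would be written in JavaScript. The field names, the `set_field` tool, and the `handle_tool_call` dispatcher are all hypothetical. The point it shows is that the model only emits a small JSON instruction, while the code that touches the form runs on the client.

```python
import json

# Hypothetical in-memory stand-in for the form fields of a loaded PDF.
form_fields = {"name": "", "business_name": "", "tin": ""}

# Hypothetical tool description in the JSON-schema style that
# tool-calling LLM APIs commonly accept.
SET_FIELD_TOOL = {
    "name": "set_field",
    "description": "Set the value of one form field.",
    "parameters": {
        "type": "object",
        "properties": {
            "field": {"type": "string", "enum": sorted(form_fields)},
            "value": {"type": "string"},
        },
        "required": ["field", "value"],
    },
}


def set_field(field: str, value: str) -> dict:
    """Apply one edit locally and return a result that omits the value."""
    if field not in form_fields:
        return {"ok": False, "error": f"unknown field: {field}"}
    form_fields[field] = value
    return {"ok": True, "field": field}


# Registry of tools the client is willing to execute.
TOOLS = {"set_field": set_field}


def handle_tool_call(raw_call: str) -> dict:
    """Dispatch a tool call emitted by the model as a JSON string."""
    call = json.loads(raw_call)
    handler = TOOLS.get(call.get("name"))
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {call.get('name')}"}
    try:
        return handler(**call.get("arguments", {}))
    except TypeError as exc:
        return {"ok": False, "error": f"bad arguments: {exc}"}


# Simulated model output; a real model would produce this from the chat.
result = handle_tool_call(
    '{"name": "set_field", "arguments": {"field": "name", "value": "Jane Doe"}}'
)
print(result)  # {'ok': True, 'field': 'name'}
```

Note that the value being written still originates from the model's output, so whatever the user typed into the chat has already reached the model. The document and the execution stay on the client; the chat only stays local when the model is local too.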

Evidence

One commenter reported that their SSN was incorrectly populated into the 'Exemptions' field (field 4). This prompted UX questions about whether chatting is easier than clicking into a field and typing, and about what the tool offers over uploading a PDF to ChatGPT.
Privacy concerns were prominent: users asked for a clearer indication that data is sent to remote servers. The creator responded that client-side tool calling combined with a local model can keep data on-device.
A developer shared experience with an OCR+LLM pipeline for 100+ PDF forms that achieved 90% accuracy but ran into missing or mislabeled fields, and asked about error rates for programmatic form filling.
Another developer described a local setup using Claude and Python libraries, having Claude analyze the PDFs and populate the fields via a script, and emphasized that the data remained local.
Demo bugs were also reported, such as being unable to skip or clear the second field (Line 2: Business name) on the W-9 form, along with requests for Chrome AI API integration and XFA form support.

How to Apply

If your organization wants to automate frequently completed contracts, tax forms, or HR forms, consider SimplePDF Copilot’s white-label embedding. Connecting existing data sources such as CRMs or EHRs via MCP/RAG can create pre-population pipelines.
For services handling PII or confidential documents, combine client-side tool calling with a local LLM (e.g., a Llama model run with Ollama) so that data does not leave the device. Integration with Chrome’s built-in AI API is also worth exploring.
When automating data extraction from 100+ PDF forms, account for the roughly 10% error rate one commenter reported for an OCR+LLM pipeline by adding a validation layer for missing or mislabeled fields, or consider the approach another commenter described: Claude plus a local Python script (pypdf/pdfminer).
The advantage over uploading PDFs to ChatGPT lies in embeddability and privacy control, which suits applications that need direct PDF editing or must comply with data transfer restrictions.
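
A validation layer of the kind suggested above might look like the following. This is a minimal sketch under stated assumptions: the field names, the required flags, and the format patterns in `EXPECTED_FIELDS` are hypothetical and would need to match the real form. It checks extracted or model-filled values for missing required fields, values whose format does not fit the field, and labels the schema does not recognize.

```python
import re

# Hypothetical expected schema: field name -> (required, pattern or None).
EXPECTED_FIELDS = {
    "name": (True, None),
    "business_name": (False, None),
    "ssn": (True, re.compile(r"\d{3}-\d{2}-\d{4}")),
    "exemptions": (False, re.compile(r"\d{1,2}")),
}


def validate(extracted: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for field, (required, pattern) in EXPECTED_FIELDS.items():
        value = extracted.get(field, "").strip()
        if not value:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        # The offending value is deliberately not echoed, since it may be PII.
        if pattern is not None and not pattern.fullmatch(value):
            problems.append(f"unexpected format in {field}")
    for field in extracted:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unrecognized field label: {field}")
    return problems


# Example: an SSN landed in the exemptions field and the ssn field is empty.
issues = validate({"name": "Jane Doe", "exemptions": "123-45-6789"})
for issue in issues:
    print(issue)
```

Run on the example input, the check flags both the empty `ssn` field and the badly formatted `exemptions` value, which is the kind of misplacement one commenter reported. Flagged forms can then be routed to manual review instead of being submitted automatically.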

Terminology

Tool Calling: The ability of an LLM to not only generate text but also directly invoke predefined functions (tools) to perform real-world actions, such as executing a function to populate a specific field.

Client-side Tool Calling: A method of Tool Calling where execution occurs within the user’s browser (client-side) rather than on a server, enhancing privacy by keeping data local.

MCP: Short for Model Context Protocol, a standardized protocol enabling AI models to access external data sources (CRMs, databases, etc.) and tools.

RAG: Retrieval-Augmented Generation, a technique where an AI model retrieves relevant information from an external database before generating a response, used here to pull form data from existing sources.

XFA: XML Forms Architecture, Adobe’s dynamic PDF form format, offering complex logic but often presenting compatibility challenges.

PII: Personally Identifiable Information, any data that can be used to identify an individual, such as name, SSN, or address.