When you drag an invoice PDF onto jaklens.ai, a three-step pipeline runs entirely on your machine. No API key, no cloud credit, no internet connection. Here's exactly what happens — and why it works.
The three-step pipeline
Step 1 — PDF text extraction (pdfjs-dist)
pdfjs-dist reads the raw text layer from your PDF. For digital PDFs (generated by invoicing software), this produces clean, structured text. Scanned PDFs have no text layer, so each page is first rendered to an image and passed through OCR.
Step 2 — LLM field extraction (Qwen2.5 + llama.cpp)
The extracted text is passed to Qwen2.5 1.5B running via node-llama-cpp. The model receives a structured prompt asking it to return JSON with specific invoice fields. It runs on your CPU — or GPU if CUDA/Vulkan is available.
Step 3 — Structured save (SQLite / better-sqlite3)
The parsed JSON is validated and written to a SQLite database via better-sqlite3. All invoice fields are indexed for fast search and filter queries. Your original PDF is stored as a blob reference.
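As a rough illustration of the save step, the sketch below maps an extracted invoice onto a flat row and defines matching SQL. The table name, column names, the choice to store amounts as integer cents, and the `pdf_path` column are all assumptions made for this example, not the actual jaklens.ai schema.

```typescript
// Hypothetical schema: names and column choices are illustrative only.
const CREATE_SQL = `
CREATE TABLE IF NOT EXISTS invoices (
  id INTEGER PRIMARY KEY,
  vendor TEXT NOT NULL,
  invoice_number TEXT NOT NULL,
  date TEXT NOT NULL,            -- ISO 8601 text sorts chronologically
  currency TEXT NOT NULL,
  total_cents INTEGER NOT NULL,  -- integer cents avoid float rounding
  line_items_json TEXT NOT NULL,
  pdf_path TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_invoices_vendor ON invoices (vendor);
CREATE INDEX IF NOT EXISTS idx_invoices_date ON invoices (date);
`;

// Named parameters (@name) line up with the keys of InvoiceRow.
const INSERT_SQL =
  "INSERT INTO invoices (vendor, invoice_number, date, currency, total_cents, line_items_json, pdf_path) " +
  "VALUES (@vendor, @invoice_number, @date, @currency, @total_cents, @line_items_json, @pdf_path)";

interface InvoiceRecord {
  vendor: string;
  invoice_number: string;
  date: string;
  currency: string;
  total: number;
  line_items: unknown[];
}

interface InvoiceRow {
  vendor: string;
  invoice_number: string;
  date: string;
  currency: string;
  total_cents: number;
  line_items_json: string;
  pdf_path: string;
}

// Flatten the parsed invoice into one row of bindable values.
function toRow(inv: InvoiceRecord, pdfPath: string): InvoiceRow {
  return {
    vendor: inv.vendor,
    invoice_number: inv.invoice_number,
    date: inv.date,
    currency: inv.currency,
    total_cents: Math.round(inv.total * 100),
    line_items_json: JSON.stringify(inv.line_items),
    pdf_path: pdfPath,
  };
}
```

With better-sqlite3, strings like these would typically be run as `db.exec(CREATE_SQL)` followed by `db.prepare(INSERT_SQL).run(row)`.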
Step 1 in depth: pdfjs-dist
pdfjs-dist is Mozilla's PDF rendering library — the same engine that powers Firefox's built-in PDF viewer. In jaklens.ai, it runs in the Node.js process (via Electron's main process) to extract text content from each page of the invoice.
For a typical digital invoice PDF (generated by Stripe, PayPal, a CRM, or invoicing software), pdfjs produces clean Unicode text that preserves line structure. The output looks something like:
INVOICE
Invoice #: INV-2024-0891
Date: 15 March 2025
Due Date: 15 April 2025
Bill To: Acme Corp Ltd
123 Business Street
Item Qty Unit Price Amount
Design work 10 $150.00 $1,500.00
Hosting fee 1 $50.00 $50.00
Subtotal $1,550.00
Tax (15%) $232.50
TOTAL $1,782.50
For scanned PDFs (photographed or printed-and-scanned invoices), pdfjs renders the page to a bitmap, which is then processed by an OCR layer before the text reaches the LLM. This two-pass approach handles the majority of real-world invoice formats.
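The choice between the two paths can be sketched as follows. `TextItemLike` mirrors only the `str` and `hasEOL` fields of the text items pdfjs-dist returns from `getTextContent()`. The `needsOcr` heuristic and its 20-character threshold are hypothetical, chosen only to illustrate the idea; how jaklens.ai actually detects scans is not stated here.

```typescript
// Minimal shape of a pdfjs-dist text item; only the fields used below.
interface TextItemLike {
  str: string;     // the text run
  hasEOL: boolean; // true when the run ends a line
}

// Join text runs into lines, breaking where an end of line is marked.
function itemsToText(items: TextItemLike[]): string {
  let out = "";
  for (const item of items) {
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out.trim();
}

// Hypothetical heuristic: a page with almost no extractable characters
// is treated as a scan and routed to the render-and-OCR path.
function needsOcr(pageText: string, minChars: number = 20): boolean {
  const visible = pageText.replace(/\s+/g, "");
  return visible.length < minChars;
}

const digitalPage: TextItemLike[] = [
  { str: "INVOICE", hasEOL: true },
  { str: "Invoice #: ", hasEOL: false },
  { str: "INV-2024-0891", hasEOL: true },
  { str: "TOTAL $1,782.50", hasEOL: true },
];

const digitalText = itemsToText(digitalPage);
const scannedText = itemsToText([]); // a scan yields no text items

console.log(needsOcr(digitalText), needsOcr(scannedText)); // false true
```

A character-count threshold is crude. Some scans carry a stray text layer, so a production check would likely need more signals than this.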
Step 2 in depth: Qwen2.5 1.5B via llama.cpp
Qwen2.5 is a language model family from the Qwen team at Alibaba Cloud. The 1.5B parameter variant, when quantized to 4-bit GGUF format, fits in approximately 1.2 GB of RAM and responds quickly even on consumer CPUs.
jaklens.ai uses node-llama-cpp, a Node.js binding for llama.cpp. llama.cpp is a widely used C/C++ inference engine for running GGUF models locally. It supports AVX2/AVX-512 CPU acceleration, NVIDIA CUDA, AMD ROCm, and Vulkan.
The prompt sent to the model is carefully structured to maximize extraction accuracy:
- System prompt: instructs the model to act as an invoice data extractor and return only valid JSON
- User message: the raw text from pdfjs, with a schema for the expected output fields
- Temperature: set low (0.1–0.2) so sampling is close to deterministic, which keeps output consistent between runs and makes invented values less likely
- Max tokens: constrained to avoid excessive output
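The four settings above can be sketched as a small request builder. The exact prompt wording, the `ExtractionRequest` shape, and the 512-token cap are assumptions for illustration; the field names come from the example output in this article.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical request shape; not the node-llama-cpp API.
interface ExtractionRequest {
  messages: ChatMessage[];
  temperature: number;
  maxTokens: number;
}

// Field names taken from the article's example output.
const INVOICE_FIELDS: string[] = [
  "vendor", "invoice_number", "date", "due_date",
  "currency", "subtotal", "tax", "total", "line_items",
];

function buildExtractionRequest(invoiceText: string): ExtractionRequest {
  const system =
    "You are an invoice data extractor. Return only valid JSON, with no commentary.";
  const user = [
    "Extract these fields as a JSON object: " + INVOICE_FIELDS.join(", ") + ".",
    "Use null for any field that is not present in the text.",
    "Invoice text:",
    invoiceText,
  ].join("\n");
  return {
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    temperature: 0.1, // low end of the 0.1 to 0.2 range
    maxTokens: 512,   // illustrative cap, an assumption
  };
}

const request = buildExtractionRequest("INVOICE\nInvoice #: INV-2024-0891");
console.log(request.messages.length); // 2
```

Telling the model to emit null for absent fields gives it an explicit alternative to guessing.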
The model returns structured JSON similar to:
{
  "vendor": "Design Studio Ltd",
  "invoice_number": "INV-2024-0891",
  "date": "2025-03-15",
  "due_date": "2025-04-15",
  "currency": "USD",
  "subtotal": 1550.00,
  "tax": 232.50,
  "total": 1782.50,
  "line_items": [
    { "description": "Design work", "qty": 10, "unit": 150.00, "amount": 1500.00 },
    { "description": "Hosting fee", "qty": 1, "unit": 50.00, "amount": 50.00 }
  ]
}
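Model output like this should be parsed defensively before it is trusted. The sketch below pulls the JSON object out of a raw response, checks field types, and verifies that the amounts add up. The function names are hypothetical, and the checks are one plausible form of validation rather than the app's actual rules.

```typescript
interface LineItem {
  description: string;
  qty: number;
  unit: number;
  amount: number;
}

interface Invoice {
  vendor: string;
  invoice_number: string;
  date: string;
  due_date: string;
  currency: string;
  subtotal: number;
  tax: number;
  total: number;
  line_items: LineItem[];
}

function parseInvoice(raw: string): Invoice {
  // Models sometimes wrap JSON in prose; keep only the outermost object.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object in model output");
  const data = JSON.parse(raw.slice(start, end + 1));
  for (const key of ["vendor", "invoice_number", "date", "due_date", "currency"]) {
    if (typeof data[key] !== "string") throw new Error("field " + key + " must be a string");
  }
  for (const key of ["subtotal", "tax", "total"]) {
    if (typeof data[key] !== "number" || !isFinite(data[key])) {
      throw new Error("field " + key + " must be a finite number");
    }
  }
  if (!Array.isArray(data.line_items)) throw new Error("line_items must be an array");
  return data as Invoice;
}

// Compare in integer cents to avoid floating-point noise.
function cents(n: number): number {
  return Math.round(n * 100);
}

// True when line items sum to the subtotal and subtotal + tax equals the total.
function totalsAreConsistent(inv: Invoice): boolean {
  const itemsSum = inv.line_items.reduce((sum, li) => sum + cents(li.amount), 0);
  return (
    itemsSum === cents(inv.subtotal) &&
    cents(inv.subtotal) + cents(inv.tax) === cents(inv.total)
  );
}
```

For the example above, the line items sum to 1,550.00 and 1,550.00 + 232.50 = 1,782.50, so the check passes. A mismatch does not say which figure is wrong, only that the extraction needs a human look.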
All of this inference happens on your hardware. Typical response times range from 3–8 seconds on a modern 8-core CPU, or under 2 seconds with GPU acceleration.
Why Qwen2.5 for invoices?
Several factors make Qwen2.5 1.5B well-suited for invoice parsing:
- Multilingual. Handles English and Arabic invoice text natively — important for Middle Eastern markets
- Small but capable. 1.5B parameters in 4-bit GGUF is ~1.2 GB — fits on budget hardware
- JSON instruction following. The Qwen2.5 release highlights improved structured output generation, JSON in particular
- Free. Open-weight model, no API costs, no rate limits, no usage tracking
Accuracy and limitations
No extraction pipeline is perfect. Known limitations of the current pipeline:
- Low-quality scans: Heavily skewed, blurry, or low-DPI scans produce degraded text extraction, which reduces parsing accuracy
- Unusual layouts: Invoices with non-standard structures (tables in images, rotated text, watermarks) may miss fields
- Currency ambiguity: Multi-currency invoices may need manual correction
- Hallucination risk: Like all LLMs, Qwen2.5 can occasionally invent values that are not present in the source. Always verify critical totals before confirming
jaklens.ai addresses these limitations by showing all extracted fields in an editable review screen before saving. You confirm, edit, or reject the AI's extraction, which keeps a human in control of the data.
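One cheap aid for such a review is a grounding check that flags extracted values which do not appear in the source text. The sketch below is a hypothetical helper, not a description of what jaklens.ai ships. It handles plain strings and amounts only; dates are left out because the model reformats them (for example "15 March 2025" becomes "2025-03-15"), so they would need a format-aware comparison.

```typescript
// Normalise for comparison: lower-case, drop thousands separators,
// collapse runs of whitespace.
function normalise(s: string): string {
  return s.toLowerCase().replace(/,/g, "").replace(/\s+/g, " ").trim();
}

// Return the names of fields whose values cannot be found in the source text.
function ungroundedFields(
  fields: { [name: string]: string | number },
  sourceText: string
): string[] {
  const haystack = normalise(sourceText);
  const missing: string[] = [];
  for (const name of Object.keys(fields)) {
    const value = fields[name];
    // Amounts are compared as two-decimal strings, e.g. 1782.5 -> "1782.50".
    const needle = typeof value === "number" ? value.toFixed(2) : normalise(value);
    if (haystack.indexOf(needle) === -1) missing.push(name);
  }
  return missing;
}

const source =
  "INVOICE\nInvoice #: INV-2024-0891\nBill To: Acme Corp Ltd\nTOTAL $1,782.50";

const flagged = ungroundedFields(
  { vendor: "Design Studio Ltd", invoice_number: "INV-2024-0891", total: 1782.5 },
  source
);
console.log(flagged); // [ 'vendor' ]
```

Here the vendor name is flagged because it does not occur anywhere in the source string, which is exactly the kind of field a reviewer should look at first. Substring matching can miss legitimate values that the model has reworded, so a flag is a prompt to check, not proof of an error.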
The privacy advantage of local inference
Your invoice text never leaves your machine. It goes from your PDF to your CPU to your SQLite database — entirely within your Windows user session.
Cloud invoice OCR services (including Google Document AI, AWS Textract, and accounting software AI features) send your document to a remote API. That means your vendors, amounts, dates, and financial relationships are processed on someone else's infrastructure. With local llama.cpp inference, that pathway doesn't exist.
Written by Jaks
Jaks is the lead developer of jaklens.ai. He is passionate about local-first software architecture, artificial intelligence privacy, and giving developers and freelancers absolute ownership of their financial data.