Mistral OCR 4: cheap, self-hosted document AI
Mistral has released OCR 4, a document AI model that converts files into structured data with bounding boxes, block-type classifications, and per-word confidence scores. Where older OCR tools return flat text, OCR 4 maps the full layout of a document, making it suitable for AI agents that need to act on documents rather than just read them. It supports PDF, Word, PowerPoint, and OpenDocument files in 170 languages.

Pricing starts at $2 per 1,000 pages in batch mode, with a Document AI tier at $5 per 1,000 pages. The model is small enough to run in a single container, which enables on-premises deployment for organizations with data-sovereignty requirements, such as banks, hospitals, and government agencies. It is available through Mistral's studio, Amazon SageMaker, and Microsoft Foundry.

On benchmarks, OCR 4 scores 85.20 on olmOCR-Bench and achieves a 72% win rate against rival models in human evaluations. Mistral cautions, however, that the model is not suitable for medical, legal, or high-stakes financial decisions.
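Per-word confidence scores are what let an agent decide when to act on extracted text and when to hand a document to a person. The response schema isn't described here, so the sketch below invents a minimal one. The `Word` and `Block` classes, their field names, the 0.0 to 1.0 confidence scale, the block-type labels, and the 0.9 threshold are all hypothetical and do not reflect Mistral's actual API. The code partitions layout blocks into ones that are safe to act on and ones that contain at least one low-confidence word.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only: field names, block-type labels,
# and the 0.0-1.0 confidence scale are assumptions, not Mistral's API.
BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass
class Word:
    text: str
    confidence: float  # assumed range 0.0 to 1.0
    bbox: BBox


@dataclass
class Block:
    block_type: str  # e.g. "heading", "paragraph", "table" (assumed labels)
    bbox: BBox
    words: list[Word] = field(default_factory=list)

    def text(self) -> str:
        """Rebuild the block's text by joining its words with spaces."""
        return " ".join(w.text for w in self.words)


def split_by_confidence(blocks: list[Block], threshold: float = 0.9):
    """Partition blocks into ones an agent may act on and ones needing review.

    A block is sent to review if ANY of its words scores below the threshold,
    so one doubtful character in an amount holds back the whole block.
    The 0.9 default is an arbitrary illustrative value.
    """
    accepted: list[Block] = []
    review: list[tuple[Block, list[Word]]] = []
    for block in blocks:
        weak = [w for w in block.words if w.confidence < threshold]
        if weak:
            review.append((block, weak))
        else:
            accepted.append(block)
    return accepted, review


if __name__ == "__main__":
    # Made-up sample page: an invoice heading and a total with a misread "O".
    page = [
        Block("heading", (50, 40, 300, 60),
              [Word("Invoice", 0.99, (50, 40, 120, 60)),
               Word("#4471", 0.97, (125, 40, 180, 60))]),
        Block("paragraph", (50, 80, 400, 100),
              [Word("Total:", 0.98, (50, 80, 95, 100)),
               Word("$1,2O0.00", 0.62, (100, 80, 170, 100))]),
    ]
    ok, flagged = split_by_confidence(page)
    for block in ok:
        print("accept:", block.block_type, block.text())
    for block, weak in flagged:
        print("review:", block.block_type, [w.text for w in weak])
```

Flagging whole blocks rather than individual words is a deliberately conservative choice: an agent acting on a total or a date needs the entire field to be trustworthy, which fits Mistral's own caution about high-stakes use.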