Zerox (getomni-ai/zerox): OCR & Document Extraction using vision models.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all, with weird layouts, tables, charts, etc. Vision models just make sense! Zerox is available as both a Node and a Python package.

Node.js SDK: supports vision models from providers such as OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, etc.

The maintainFormat option tries to return the markdown in a consistent format by passing the output of the prior page in as additional context for the next page. This requires the requests to run sequentially rather than concurrently, so it is a lot slower. It is valuable if your documents have a lot of tabular data or frequently have tables that span pages.

Zerox supports structured data extraction from documents using a schema. This lets you pull specific information from a document in a structured format instead of getting the full markdown conversion. Use extractPerPage to extract data per page instead of from the whole document at once.

Zerox supports a wide range of models across different providers.

Python SDK: supports vision models from providers such as OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, etc.

The pyzerox.zerox function is an asynchronous API that performs OCR (Optical Character Recognition) to markdown using vision models. It processes PDF files and converts them into markdown format. Make sure to set up the environment variables for the model and the model provider before using this API. Refer to the LiteLLM Documentation for setting up the environment and passing the correct model name. Note that the example output is manually wrapped in this documentation for better readability.

This project is licensed under the MIT License.
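To make the maintainFormat trade-off described above concrete, here is a minimal sketch of the two scheduling strategies. It does not use the Zerox API: `fake_vision_model` and `convert_pages` are hypothetical names invented for illustration, and the "model" is a stub that only records whether it received prior-page context.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical signature for a per-page model call: (page image, prior page markdown) -> markdown.
PageFn = Callable[[str, Optional[str]], Awaitable[str]]


async def fake_vision_model(page_image: str, prior_markdown: Optional[str]) -> str:
    """Stub standing in for a real vision-model request (assumption, not Zerox code)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    context = "with context" if prior_markdown is not None else "no context"
    return f"## {page_image} ({context})"


async def convert_pages(pages: List[str], call_model: PageFn, maintain_format: bool) -> List[str]:
    if maintain_format:
        # Each page needs the previous page's output, so requests run one after another.
        results: List[str] = []
        prior: Optional[str] = None
        for page in pages:
            prior = await call_model(page, prior)
            results.append(prior)
        return results
    # Without prior-page context, every page is independent and can be requested concurrently.
    return list(await asyncio.gather(*(call_model(page, None) for page in pages)))


pages = ["page-1.png", "page-2.png", "page-3.png"]
sequential = asyncio.run(convert_pages(pages, fake_vision_model, maintain_format=True))
concurrent = asyncio.run(convert_pages(pages, fake_vision_model, maintain_format=False))
```

In the sequential branch, every page after the first sees the previous page's markdown, which is what allows a table that spans pages to keep consistent formatting. The cost is that total latency grows with the number of pages instead of being bounded by the slowest single request.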
Zerox works in four steps: pass in a file (PDF, DOCX, image, etc.); convert that file into a series of images; pass each image to GPT and ask nicely for Markdown; aggregate the responses and return Markdown. Supported OpenAI models include GPT-4 Vision (gpt-4o), GPT-4 Vision Mini (gpt-4o-mini), GPT-4.1 (gpt-4.1), and GPT-4.1 Mini (gpt-4.1-mini).
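The steps above (file in, page images, one model call per image, aggregated Markdown out) can be sketched as a small pipeline. This is an illustration under stated assumptions, not the Zerox implementation: `split_into_page_images`, `fake_page_to_markdown`, and `document_to_markdown` are hypothetical names, the image step only fabricates file names, and the model call is a stub.

```python
from typing import Callable, List


def split_into_page_images(file_path: str, page_count: int) -> List[str]:
    """Hypothetical placeholder for the file-to-images step; returns made-up image names."""
    return [f"{file_path}#page-{n}.png" for n in range(1, page_count + 1)]


def fake_page_to_markdown(image: str) -> str:
    """Stub standing in for asking a vision model to transcribe one page image."""
    return f"# Content of {image}"


def document_to_markdown(
    file_path: str,
    page_count: int,
    page_to_markdown: Callable[[str], str],
) -> str:
    # Step 2: convert the file into a series of page images.
    images = split_into_page_images(file_path, page_count)
    # Step 3: ask the model for Markdown, one page image at a time.
    pages = [page_to_markdown(image) for image in images]
    # Step 4: aggregate the per-page responses into a single Markdown document.
    return "\n\n".join(pages)


markdown = document_to_markdown("report.pdf", 2, fake_page_to_markdown)
```

Passing the per-page function in as a parameter keeps the pipeline independent of any particular provider, which mirrors the way the project lets the model be selected by name.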