Apache Airflow
Platform created by the community to programmatically author, schedule and monitor workflows.
Apache Airflow® has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers, so it is ready to scale to infinity.

Apache Airflow® pipelines are defined in Python, allowing for dynamic pipeline generation: you can write code that instantiates pipelines dynamically. Easily define your own operators and extend libraries to fit the level of abstraction that suits your environment.

Apache Airflow® pipelines are lean and explicit. Parametrization is built into its core using the powerful Jinja templating engine. No more command-line or XML black magic! Use standard Python features to create your workflows, including datetime formats for scheduling and loops to dynamically generate tasks. This lets you maintain full flexibility when building your workflows.

Monitor, schedule and manage your workflows via a robust, modern web application. No need to learn old, cron-like interfaces. You always have full insight into the status and logs of completed and ongoing tasks.

Apache Airflow® provides many plug-and-play operators that are ready to execute your tasks on Google Cloud Platform, Amazon Web Services, Microsoft Azure and many other third-party services. This makes Airflow easy to apply to current infrastructure and extend to next-gen technologies.

Anyone with Python knowledge can deploy a workflow. Apache Airflow® does not limit the scope of your pipelines; you can use it to build ML models, transfer data, manage your infrastructure, and more.

Wherever you want to share an improvement, you can do so by opening a PR. It's as simple as that: no barriers, no prolonged procedures. Airflow has many active users who willingly share their experiences. Have any questions? Check out our buzzing Slack.

Today we're launching the Apache Airflow Registry, a searchable catalog of every official Airflow provider and its modules, live at … The interactive report is hosted by Astronomer.
The Apache Airflow community thanks Astronomer for running this survey, for sponsoring it …

We are thrilled to announce the first major release of airflowctl 0.1.0, the new secure, API-driven command-line interface (CLI) for Apache …

Apache Airflow Core includes the webserver, scheduler, CLI and other components needed for a minimal Airflow installation.

Apache Airflow CTL (airflowctl) is a command-line interface (CLI) for Apache Airflow that interacts exclusively with the Airflow REST API. It provides a secure, auditable, and consistent way to manage Airflow deployments without direct access to the metadata database.

The Task SDK provides Python-native interfaces for defining DAGs, executing tasks in isolated subprocesses and interacting with Airflow resources (e.g., Connections, Variables, XComs, Metrics, Logs, and OpenLineage events) at runtime. The goal of task-sdk is to decouple DAG authoring from Airflow internals (Scheduler, API Server, etc.), provid…
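Since airflowctl talks only to the Airflow REST API, the same endpoints can also be reached directly over HTTP. Below is a minimal stdlib sketch assuming an Airflow 2-style stable REST API with basic auth; the base URL, username, and password are placeholders:

```python
import base64
import urllib.request


def build_list_dags_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build an authenticated GET request for the stable REST API's list-DAGs endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/api/v1/dags",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )


# Placeholder credentials; send with urllib.request.urlopen(req) against a running webserver.
req = build_list_dags_request("http://localhost:8080", "admin", "admin")
```

Building the request as data makes the auditability point concrete: every management action is an explicit HTTP call that can be logged, rather than a direct write to the metadata database.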
Reducto
Reducto's parser reads documents like a human would, capturing layout, structure, and meaning with high accuracy. Its Agentic OCR reviews and corrects outputs in real time for near-perfect results, even on edge cases.

Automatically separate multi-document files or long forms into individually useful units. Intelligent heuristics and layout-aware splitting keep your pipelines clean and efficient, with no manual pre-processing needed.

Extract structured data directly from documents with schema-level precision. Whether it's invoice fields, onboarding forms, or financial disclosures, Reducto ensures the right data lands exactly where you need it.

Fill in detected blanks, tables, and checkboxes with supplied data. No bounding boxes or pre-defined templates are required; Edit dynamically identifies fillable elements regardless of document layout or format, supporting scanned PDFs, digital forms, and complex multi-page documents.

"Reducto helped us parse documents we previously could not because of table complexity. It's probably the only AI product that has actually worked for us."

The pipeline works in stages. Reducto first uses layout-aware models to break down the document visually, capturing regions, tables, figures, and text. Vision-language models then review Reducto's outputs, interpreting each region in context: linking labels to values, understanding tables, and classifying segments. Finally, like a human editor, the Agentic model detects minor mistakes and corrects them, ensuring accuracy even in the most detailed cases.

Everything else you need to make your data LLM-ready: battle-tested infrastructure you can trust in production and at scale, plus hands-on forward-deployed support and tailored SLAs to meet your enterprise needs.
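The schema-driven extraction described above can be sketched as an HTTP request that pairs a document with the schema its fields should land in. Note that the endpoint path, payload field names, and invoice schema below are hypothetical placeholders for illustration; consult Reducto's API documentation for the real interface:

```python
import json
import urllib.request


def build_extract_request(api_key: str, document_url: str, schema: dict) -> urllib.request.Request:
    """Build a POST request sending a document URL plus a target extraction schema.

    The endpoint and payload field names are assumptions, not Reducto's documented API.
    """
    payload = {"document_url": document_url, "schema": schema}
    return urllib.request.Request(
        "https://platform.reducto.ai/extract",  # hypothetical endpoint
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example schema: invoice fields with schema-level precision (illustrative only).
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "due_date": {"type": "string"},
    },
}
req = build_extract_request("YOUR_API_KEY", "https://example.com/invoice.pdf", invoice_schema)
```

Declaring the schema up front is what makes the output land "exactly where you need it": the response can be validated field by field instead of post-processed from free text.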
Run Reducto entirely within your own infrastructure, ideal for strict security, compliance, and data residency requirements. Widely trusted by enterprises worldwide.
Reducto pricing: $0.015 per credit.