The unified interface for LLMs. Find the best models & prices for your prompts
OpenRouter is highly praised for its robust open models and detailed statistical insights, particularly excelling in handling large volumes of programming tokens. Users appreciate its flexibility and wide integration capabilities, especially in AI agent applications. Complaints highlight issues with token costs and efficiency, with some users developing complementary tools to mitigate these concerns. Overall, pricing sentiment is generally positive due to its open-source nature, and OpenRouter maintains a strong reputation in the developer and AI community for its functionality and adaptability.
Mentions (30d)
26
Avg Rating
5.0
1 reviews
Platforms
5
Sentiment
18%
15 positive
OpenRouter is highly praised for its robust open models and detailed statistical insights, particularly excelling in handling large volumes of programming tokens. Users appreciate its flexibility and wide integration capabilities, especially in AI agent applications. Complaints highlight issues with token costs and efficiency, with some users developing complementary tools to mitigate these concerns. Overall, pricing sentiment is generally positive due to its open-source nature, and OpenRouter maintains a strong reputation in the developer and AI community for its functionality and adaptability.
Features
Use Cases
Industry
information technology & services
Employees
51
Funding Stage
Venture (Round not Specified)
Total Funding
$160.0M
How it feels to do biotech in 2026
How it feels to do biotech in 2026
View originalPricing found: $10
g2
What do you like best about OpenRouter?Unified API Access: The ability to call a multitude of LLMs from different providers (like OpenAI, Anthropic, Google, and various open-source models) through a single, consistent API endpoint is a game-changer. This drastically reduces the integration overhead and code maintenance associated with managing individual provider APIs and SDKs. Simplified Cost Management & Tracking: OpenRouter provides a clear, consolidated view of our LLM usage costs across all models. The pay-as-you-go pricing, with standardized per-token rates for many models, makes budget forecasting and expense tracking much more straightforward than juggling multiple billing dashboards. Rapid Prototyping and Model Benchmarking: The platform is excellent for quickly testing and comparing the performance of different models for specific tasks. Switching between, for instance, a Llama model and a GPT variant for a text generation task requires minimal code changes Developer-Focused Features: Tools like the model explorer, the ability to see real-time model rankings based on community usage or specific metrics, and features like request fallbacks or automatic retries demonstrate a clear understanding of developer workflows and pain points in LLM Operations (LLMOps). Review collected by and hosted on G2.com.What do you dislike about OpenRouter?While the benefits are substantial, one aspect that I've noted is the potential for slightly increased latency compared to direct API calls to the model providers. This is somewhat expected given the nature of an aggregation service acting as an intermediary. For extremely latency-sensitive applications, this might require careful benchmarking, though for most of our use cases, the difference has been marginal and outweighed by the convenience and flexibility offered. Review collected by and hosted on G2.com.
Made an awesome-list for everything LLM cost, would love contributions
So a few months back I got surprised by my Anthropic bill which somehow racked up like $400 ish on a staging key in a few weeks just running evals, no budget cap pretty dumb in hindsight I mean it’s not a big cost but I should have been careful nonetheless After that I started keeping a notes file of tools that actually helped reduce cost stuff like token counters, pricing pages that update properly, caching layers, prompt compression libs, observability tools (helicone, langfuse, langsmith, etc) it slowly grew to 80–90 entries so I cleaned it up and put it on github: [https://github.com/ankitvirdi4/awesome-llm-cost](https://github.com/ankitvirdi4/awesome-llm-cost) what’s in there right now: pricing calculators + token counters observability / tracing (helicone, langfuse, langsmith, openllmetry, phoenix) caching (gptcache, semantic caching approaches) model routers (openrouter, notdiamond, portkey) prompt compression + context window stuff eval cost tracking self hosting / GPU cost calculators everything is linted (awesome-lint), short descriptions for each entry, and I checked links recently so nothing should be dead if there’s anything you’ve used that saved you money on inference, drop it here or send a PR especially looking for more prompt compression stuff, that section feels kinda weak rn not affiliated with anything listed btw just got tired of having 80 bookmarks
View originalI stress-tested Kimi K2.6 against Claude Opus 4.7 on a quick coding-agent task
I tested Claude Opus 4.7 and Kimi K2.6 on the same coding agent task i.e. build an AI Fix Runner that takes a broken repo, runs its tests, identifies the failure, applies a patch, reruns the test, and exposes the final diff/logs through an API and UI. The goal was not to benchmark syntax completion or simple repo edits. I wanted to test model behavior on a less familiar integration path: shifting execution from local processes into remote sandboxes. I used Tensorlake specifically because the sandbox API is newer and integration-heavy. This made the test more about whether the model could reason through unfamiliar infra and produce a working implementation. Setup: * Claude Opus 4.7 through Claude Code * Kimi K2.6 through OpenCode via OpenRouter Pricing context: * Claude Opus 4.7: $5/M input, $25/M output * Kimi K2.6: $0.95/M input ($0.16 cached input), $4/M output So, what made it interesting is if Kimi's lower cost can handle a crazy workflow. To be clear, comparing Kimi K2.6 directly with Opus 4.7 is not completely fair. The model classes, pricing, and expected capability levels are very different. I mainly wanted to see how far an open model could get on the same task at a fraction of the price, and whether the performance/price tradeoff made sense for coding-agent work # Test 1: Local AI Fix Runner First, both models had to build the local version. The app needed to: * create fixture repos with intentional bugs * run install/test/build locally * capture stdout/stderr * apply patches * rerun tests after patching * expose run state through backend APIs * show logs and patched source in the UI * reject obviously unsafe commands Claude Opus 4.7 produced a working implementation. It built the fixture repos, repair flow, API endpoints, UI, logs, and patched-file inspection. The main pipeline worked: install -> test fails -> patch -> test passes -> build passes It had one real bug: workspace persistence. `KEEP_WORKSPACES=true` was supposed to preserve the final workspace, but the backend loaded .env from the wrong location. One follow-up fixed it. Kimi K2.6 got some backend pieces working and could trigger repair runs, but the implementation was incomplete. The biggest miss was patched-source inspection, which is core for this app because you need to verify exactly what the agent changed. Rough numbers: * Opus: $13.84, around 39 min wall time * Kimi: around $3.40, around 1h 39 min wall time * Result: Opus did it good, Kimi could not The difference in the price, and the time taken is just insane. # Test 2: Sandbox Integration Second, I asked both models to move execution from local processes into Tensorlake Sandboxes. This was the main stress test. The model had to: * create a sandbox * copy the repo into the sandbox * execute install/test/build remotely * capture logs from sandbox commands * apply patches inside the sandbox * rerun validation * clean up sandbox state * keep the original local runner working This is where I wanted to test performance on something newer and less likely to be in the model’s training data. Claude Opus 4.7 handled this cleanly. It added a Tensorlake runner, kept the local runner abstraction intact, wired env/config handling, and created a live test path using `TENSORLAKE_API_KEY`. More importantly, the local regression path still passed after the sandbox backend was added. Kimi K2.6 was given the working Opus local implementation as the base, so it only had to add Tensorlake execution. Even with that advantage, it failed to produce a clean sandbox flow after 150k+ tokens. It got stuck around the integration layer and never reached a reliable test/build/patch loop inside Tensorlake. Rough numbers: * Opus Tensorlake run: around $24.39, around 23 min * Kimi Tensorlake run: failed after a long run, 150k+ tokens * Result: Opus passed, Kimi failed # Takeaway Kimi K2.6 is much cheaper and can handle some bounded coding work, but it struggled once the task involved external execution infra, sandbox lifecycle, env/config handling, and regression safety. Claude Opus 4.7 was expensive, but much stronger at: * preserving architecture * adding a new execution backend * handling config bugs * maintaining testability * reasoning through unfamiliar infra For me, this was less about “which model writes code” and more about “which model can integrate a newer system without breaking the app.” On that specific test, Opus was clearly miles ahead. Full breakdown with prompts, code, screenshots, demos, and cost details: [https://www.tensorlake.ai/blog/claude-opus-4-7-vs-kimi-k2-6-real-world-coding-test](https://www.tensorlake.ai/blog/claude-opus-4-7-vs-kimi-k2-6-real-world-coding-test) Curious if anyone has gotten Kimi K2.6 working reliably on coding-agent workflows.
View originalGPT-5.5 tops the benchmarks but sits at #22 for actual usage - I built a live index that tracks both (open source)
I built AgentTape to rank models on more than just benchmarks - it blends benchmark performance with who's actually using and talking about a model, plus cost and speed. It scores every public model from public signals (GitHub, Hugging Face, OpenRouter, MCP registries, npm, PyPI, arXiv, Hacker News) refreshed hourly, plus the main benchmark leaderboards daily. Right now OpenAI sits at the top: GPT-5 is #1, with 5.2, 5.1 and 5.4 Mini rounding out the top 5, and 5.2-Codex and 5.4 just behind - 6 of the top 7. The only thing breaking the run is xAI's Grok 4.20, level on score at #2. GPT-5.5 is the clearest example - it sits at #22 overall, and the breakdown shows why: * Quality: 96.4 - 2nd highest on the whole board, only pipped by Gemini 3.1 Pro Preview (97.2). On benchmarks alone it'd be near the top. * Adoption: 15 and Efficiency: 36 - both low. New release, steep price, so hardly anyone's using it day-to-day yet. * Biggest 24h climber on the board (+6) - so that's starting to shift. A benchmark-only board would put GPT-5.5 near #1 (second only to Gemini 3.1 Pro). That gap between topping the benchmarks and actually getting used is the whole reason I built this. Early days and I'm still tuning the methodology, so I'd love your thoughts - does weighting adoption alongside benchmarks match how you'd rank the GPT line-up, or would you trust the raw benchmark order?
View originalthe-knowledge-guy: turn your bookshelf into a tutor you can ask, walk through, and skim - using Claude Code skills
I built a Claude Code skill called \`the-knowledge-guy\`. The idea: every book I've read sits on a shelf doing nothing. I wanted a thing where I could ask any question and get an answer cited across all of them, get taught a topic step by step with quizzes, or pull a cheatsheet out of any book in seconds. Eleven modes: * ask - cross-domain synthesis essay with inline citations. * walk - interactive curriculum + quizzes, resumable. * nutshell - whole-book per-chapter skim, \~100 words/chapter. * library - bookshelf overview. * comparison - one concept across multiple books, agree/extend/tension. * cheatsheet - operational one-page reference per book. * glossary - A–Z terms, per book or cross-library. * concept-map - Tier-1 framework graph for a book. * toolkit - Tier-2 deep dive on one chapter. * ingest - hand a new PDF/EPUB to /book-to-skill. * resume - pick up an interrupted walk. The router auto-discovers every installed skill - drop one in, and it picks it up on the next invocation. Every output also writes a self-contained HTML artifact using a polished design system I built alongside it. The ingest side (a separate skill, /book-to-skill) is a 5-stage map-reduce pipeline. \~10 min per 600-page book. All processing local-then-LLM - your books stay on your disk. Works natively on Claude Code, Claude Desktop, [claude.ai](http://claude.ai), the Anthropic API, OpenAI Codex CLI, and GitHub Copilot. MIT licensed. Repo: [https://github.com/vitalysim/the-knowledge-guy](https://github.com/vitalysim/the-knowledge-guy) Happy to answer questions about the architecture (the book\_number canonical-labeling thing was the bug that took the longest) or about adding new modes.
View originalBuild agentic orchestrators in minutes NOT months.
Some of you might remember BoneScript, my LLM friendly declarative backend compiler. MarrowScript is the next version and the big addition is a full LLM harness built into the language itself. The problem I kept running into: every project that calls an LLM ends up with the same pile of glue code. Retry logic, response validation, caching, cost tracking, provider switching, confidence routing. You write it once, copy it to the next project, tweak it, and it slowly rots. None of it is your actual product logic but it takes up half your backend. So I made it declarative. In MarrowScript you declare your models, prompts, and routers as first-class concepts in the spec file. The compiler generates all the infrastructure around them. What that looks like in practice: You declare a model. Provider, endpoint, context window, cost class. Works with any OpenAI-compatible endpoint. LM Studio, Ollama, vLLM, OpenRouter, whatever you're running locally. You declare a prompt. Input types, output type, which model to use, validation mode, what to do when validation fails, retry policy, cache TTL. The compiler generates a typed function you call from your routes. Under the hood it handles retries, caches responses in Postgres, validates the output against your schema, and if validation fails it can automatically fire a repair prompt to fix the response. You declare a router. It picks which model to use based on input characteristics. Short simple inputs go to your tiny local model. Complex inputs escalate to something bigger. Confidence thresholds control when to retry or escalate. ***All deterministic at compile time.*** Some examples of what it generates: * Provider adapters for openai\_compat, ollama, llamacpp, koboldcpp, and raw http * SSRF protection on all outbound LLM calls (allowlist-based, blocks private ranges by default) * Prompt cache backed by Postgres with configurable TTL * Per-trace and per-tenant token/cost budgets with hard cutoffs * Cognition traces stored in Postgres (or in-memory for dev) with OTLP export * Response validation (schema check or full AST compilation check for code generation) * Repair prompts that fire automatically when validation fails * Confidence scoring from logprobs (on providers that support it) * A CLI command to convert recorded traces into regression tests The part I'm most interested in feedback on is the router concept. Right now it's a static decision tree. You set thresholds at compile time based on an input metric. There's a `marrowc tune-router` command that reads recorded traces and tells you if your thresholds are wrong, but it doesn't auto-rewrite them yet. The whole thing is designed around local-first inference. The default setup in the examples uses LM Studio on the LAN as the primary model and OpenRouter as the escalation tier. Most requests stay local and free. Only the ones that fail confidence checks hit the paid API. It's on GitHub and npm. The compiler is TypeScript, runs on Node 18+. There's a VS Code extension you can compile and edit to your needs. What I want to know: for those of you running local models in production or semi-production, what's the infrastructure pain that eats the most time? Is it the retry/validation loop? Cost tracking? Provider switching? Something else entirely?
View originalI built a live ranking of every AI agent and foundation model (open source)
I built [AgentTape](https://agenttape.com/) because none of the existing model leaderboards quite cover all the things that I was interested in: benchmark performance is one part, but so is who's actually using a model, who's talking about it, and how it compared on cost and speed. It pulls hourly data from GitHub, Hugging Face, OpenRouter, MCP registries, npm, PyPI, arXiv, Hacker News, and more - to score and compare each public AI agent and foundation model. I'm still tweaking the scoring methodology (it's early days), so I'd love to hear your thoughts, if it's helpful, or anything you think I've got wrong!
View original$18 to $4 on the same agent run after i stopped asking opus to rename css variables
I've been running an agent loop that refactors my static site. CSS variable renames, YAML config updates, running a linter through MCP. Really glamorous stuff for a blog that gets 40 visitors a month, most of whom are me refreshing to check if Vercel actually deployed. Every single step was going to Opus 4.7 because setting up routing felt like work and I am, apparently, the kind of person who'd rather burn $18 than spend 20 minutes writing an if statement. So I finally wrote the if statement. Hard subtasks still go to Opus: component architecture, debugging code I wrote at 2am and have zero memory of writing, anything where the model needs to hold a complex plan across a long conversation. Opus is genuinely unmatched at that kind of sustained reasoning. I tried routing a tricky auth middleware bug to a cheaper model once and got back something that looked perfectly plausible but silently broke session handling in a way that cost me an hour to trace. Lesson learned permanently. The routine stuff (lint, rename, config edits, tool orchestration) goes to cheap models. I landed on DeepSeek V4 Pro for general coding chores and Tencent Hunyuan Hy3 preview for anything with heavy tool calling. As of late April it was ranked number one on OpenRouter by tool call volume, and in my MCP loops it almost never botches a function call when the schema is clean. The listed rate on Tencent Cloud is around $0.18 per million input tokens and $0.59 per million output, so roughly 28x cheaper than Opus 4.7 on input. Same 212 step refactor, now with routing: 178 steps to the cheap tier, 34 to Opus. $18 became roughly $4. I couldn't spot a difference on the routine changes. My 40 monthly visitors certainly can't. I've since started doing stuff I used to skip entirely, like having the agent write and run tests for every CSS change or regenerating all my Open Graph images, because at a fraction of a cent per tool call there's just no reason not to. They do mess up in specific and annoying ways though. The tool calling model hallucinates parameters when my schemas get sloppy (honestly fair, the schemas were bad). DeepSeek V4 Pro occasionally writes code that's syntactically perfect but does the precise opposite of what you asked, in a way that survives a quick skim. And neither can touch Opus when you need it to reason through three layers of why your auth flow is silently eating a cookie. My routing logic boils down to one question: how expensive is a wrong answer to catch? Bad lint fix costs a 2 second git revert. Bad architecture call costs the whole afternoon.
View originalcdesktop — open-source Claude Code Desktop alternative, runs locally via npx, supports any provider
I built cdesktop with Claude Code — it's an open-source alternative to Anthropic's Claude Code Desktop, running locally on your machine via `npx cdesktop`. Free, Apache 2.0. It mirrors the Code tab of Anthropic's desktop app — see the video — and supports 5 agents in one UI. Claude Code Desktop does not support third party models, cdesktop does. Features: * 5 coding agents in one UI: Claude Code, Codex, Gemini CLI, OpenCode, Hermes. Switch per session. * Full third-party support — OpenRouter, DeepSeek, Kimi, GLM, custom ANTHROPIC\_BASE\_URL — any provider, any model. 20+ presets baked in. * Agent teams — spawn teammates that share your workspace; mix agents and models per teammate; lead delegates via `npx cdesktop team spawn`. * Routines — scheduled recurring agent runs (hourly/daily/weekdays/weekly). * Side-by-side sessions — split workspace into up to 4 cells, drag any session between them. * Optional Git worktrees per session, or work in-place. Non-Git directories work too. * Diff review with inline comments routed back to the agent. * 7 UI languages: English, Simplified Chinese, Traditional Chinese, Spanish, French, Japanese, Korean. * Responsive UI — usable from a phone. Repo: [https://github.com/cdesktop-ai/cdesktop](https://github.com/cdesktop-ai/cdesktop) How Claude Code helped build it: started from a fork of vibe-kanban; Claude Code (opus) rewrote the UI around a Claude-Code-Desktop-style session model and drafted most of the new Rust + React code. It's beta — expect rough edges. Feedback welcome, especially on Claude Code workflows where it falls short of the official app.
View originalTools: Is This a Technical Victory, or a Price War Victory?
If you only follow discussions on social media, you might think AI coding is still dominated by Claude, GPT, and Gemini. But Kilo Code’s usage data on OpenRouter paints a somewhat counterintuitive picture: over the past 30 days, the top three most-used models on Kilo Code were Step 3.5 Flash, MiniMax M2.5, and Ling-2.6-1T. Together, they accounted for roughly 3.15T tokens, or about 58% of Kilo Code’s total token usage over the same period. In other words, in this real-world AI coding agent usage scenario, Chinese models are no longer just backup options. They have become a major source of token consumption. Kilo Code’s OpenRouter data does not necessarily prove that Chinese models have fully surpassed Claude or GPT. But it does show at least one thing: in high-frequency, high-token, highly automated AI coding agent workflows, Chinese models have already entered the core of real production usage. Why is this happening? Is it because Chinese models are cheaper, offer longer context windows, and are better suited for workloads that consume large amounts of tokens?
View originalLLM-Rosetta — format conversion library across LLM API standards, doubles as a proxy
This started because we had a proprietary internal LLM API that spoke none of the standard formats. Built an internal conversion layer to bridge it, maintained that for over a year. As colleagues started adopting more and more coding tools — Claude Code, opencode, Codex, VS Code plugins, Goose, and whatever came out that week — each with its own API format expectations, maintaining separate adapters for each became the actual problem. That's what pushed the internal conversion layer into a proper generalized design, and llm-rosetta is the result. It's a Python library that converts between LLM API formats — OpenAI Chat, Responses/Open Responses, Anthropic, and Google GenAI. The idea is you convert through a shared IR so you don't end up writing N² adapters. The key difference from LiteLLM: LiteLLM is a unified calling layer that takes OpenAI-style input and transforms it into provider-native requests — one direction. llm-rosetta uses a hub-and-spoke IR, so each provider only needs one converter, and you get any-to-any conversion for free. Anthropic → Google, OpenAI Chat → Anthropic, whatever direction you need. Use it as a library — `pip install` and call `convert()` directly, no server needed. Or run the gateway if you want a proxy that handles the format translation for you. Zero required runtime dependencies either way. The HTTP server, client, and persistence layer are vendored from zerodep ([https://github.com/Oaklight/zerodep](https://github.com/Oaklight/zerodep)), another project of mine — stdlib-only single-file modules, not someone else's library repackaged. The gateway ships with a Docker image if you'd rather not deal with Python env setup. You can also deploy it on HuggingFace Spaces or anything similar — admin panel, dashboard, request log, config management all included. Screenshots: [https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/](https://llm-rosetta.readthedocs.io/en/latest/gateway/admin-panel/) We've been running it in production for about 5 months as the conversion layer for an internal multi-model access platform — needed to support various API standards and coding tool integrations before the upstream APIs were fully standardized. The Responses converter passes all 6 official Open Responses compliance tests (schema + semantic) from the spec repo. So if you're running Ollama, vLLM, or LM Studio with Responses endpoints, it should just work as one side of the conversion. There's a shim layer for provider-specific quirks — built-in shims for OpenRouter, DeepSeek, Qwen, xAI, Volcengine, etc. Converters stay generic per API standard, shims handle the edge cases declaratively. 24 cross-provider examples in the repo covering all provider pairs, SDK + REST, streaming, tool calls, image inputs, multi-turn with provider switching mid-conversation. * GitHub: [https://github.com/Oaklight/llm-rosetta](https://github.com/Oaklight/llm-rosetta) * Docs: [https://llm-rosetta.readthedocs.io](https://llm-rosetta.readthedocs.io) * arXiv: [https://arxiv.org/abs/2604.09360](https://arxiv.org/abs/2604.09360) * Gateway screenshot: https://preview.redd.it/qzzjr2dcdw1h1.png?width=949&format=png&auto=webp&s=bce4293aae81059f794909fc37f85071cee34378
View originalI built a Claude Code plugin so Claude remembers what I shipped
https://preview.redd.it/jnwg9n3i1t1h1.png?width=1440&format=png&auto=webp&s=827236ef5ca2e1070c4abd8e06455d41672749bf Every time I started a new Claude chat, I had to re-explain what I'd been working on. The previous chat was gone with every refinement I'd made to my own context. So I built **LockedIn**. A Claude Code plugin that captures your experience and work as you do it, so Claude remembers it next session. 1 router skill + 6 sub skills, designed around harness engineering principles. You can say things in the Claude Code session like >save this commit as a project highlight >meeting just wrapped, log it >absorb this writeup It stores everything as structured markdown under `~/Documents/LockedIn/.` (editable!) The point is accumulation. Different sources, one place. Over time LockedIn notices overlaps and asks you one question at a time how to reconcile. The vault gets richer. The outputs get more specific. Claude already has 'Projects'. But a few things that are different. * Markdown on your filesystem instead of Anthropic's database. It's more like Obsidian. Edit it, version with git, carry it to any tool. * Typed ontology with 15 entity types like person, project, achievement, decision, instead of unstructured uploads. The skill grounds each claim in a specific entity. * Reconciliation. When new input overlaps existing knowledge, LockedIn asks you to merge or keep separate. Projects just accumulates context. Free and open source on GitHub. [github.com/daypunk/LockedIn](http://github.com/daypunk/LockedIn) Or install directly in Claude Code. /plugin marketplace add daypunk/LockedIn /plugin install lockedin@lockedin /lockedin:setup Enjoy! Feedback welcome 😉
View originalBootstrapped founders: how are you managing Claude Code costs?
I’m currently building an AI startup solo and Claude Code has genuinely improved my development speed compared to most other tools I’ve tried. The challenge is that subscription/API costs add up quickly while bootstrapping. I wanted to ask other founders and developers here: * Are you mainly using Claude subscriptions or OpenRouter/API? * Which models/workflows give the best cost vs productivity ratio? * Are there any startup programs, credits, or affordable setups you’d recommend? Right now I’m experimenting with mixing Claude, DeepSeek, and cheaper routing providers to keep costs manageable. Would love to hear how others are handling this.
View originalI built a desktop app that routes Claude Code to any LLM: DeepSeek, Ollama, Copilot, OpenRouter, and 7 more
Claude Code is the best AI coding tool I've used. But being locked to one provider, one pricing model, and one model catalog always bothered me. So I built CCPG, a desktop app (Mac/Windows/Linux) that proxies Claude Code to whatever provider you want. Install it, configure in the UI, launch with `ccpg --DeepSeek`. No YAML. No pip install. No config files. It also shows you every prompt Claude Code sends in the background, including the silent housekeeping calls you never see, with token count and latency per request. MIT, local-only, forever free. [https://github.com/danielalves96/claude-code-provider-gateway](https://github.com/danielalves96/claude-code-provider-gateway)
View originalMax20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter?
I'm on the Max20 plan, thinking about a setup before I sink time into it. Want to hear from anyone actually running it, not theorycraft. **The idea:** Opus 4.7 in Claude Code as the orchestrator. It plans, breaks down tasks, reviews code quality, catches mistakes. The actual implementation, the bulk token spend, gets delegated to DeepSeek V4 Pro through OpenRouter. DeepSeek lands credibly close to Opus 4.7 on agentic coding benchmarks at a fraction of the output-token cost, so the bet is: keep Opus for the judgment-heavy parts, don't burn it on routine implementation. **I'm not expecting huge savings.** Realistically maybe an extra 30% (guessing here) effective Opus headroom if delegation works cleanly, and even less margin now that the limits situation has loosened a bit. So part of the question is genuinely whether 30% is worth the integration friction at all, or whether it's a fun idea that doesn't pay for itself. **Pre-empting the obvious responses, because I've already thought about these:** * *"Just use Sonnet for the cheap parts."* The easy answer. But I'm specifically curious whether an external model's cost delta beats the friction, and whether anyone's actually measured it. * *"Max20 already gives generous Opus limits, why bother."* Fair. But I'd rather use Opus where it earns its keep and not think about rationing for the rest. It's about allocation, not desperation. * *"The quality gap means Opus spends all its effort fixing DeepSeek's output."* This is the actual question. DeepSeek reportedly drifts more than Opus on long agentic loops with many sequential tool calls. So does a tight review loop close that gap, or does it eat the 30%? That's what I want real data on. * *"This fights how Claude Code is built."* Probably. Claude Code's subagents run on Claude models, so I assume this needs a different tool (Aider, Cline, Kilo) or a custom routing layer. If the real answer is "don't do this in Claude Code at all," tell me what you'd use instead. I know the single-model answer. I'm after whether the split specifically works in practice.
View original5 secret Claude skills nobody is talking about
The File Reading Skill Claude can't always read your uploads intelligently by default. This skill acts as a smart router — PDF, DOCX, XLSX, CSV, JSON, images, archives — and tells Claude exactly how much to read and how to handle each format. Upload a 40-page contract. Get a precise, structured summary. Every time. No more Claude skimming past the important parts or misreading table data. The difference? Instead of guessing how to process your file, Claude follows a tested protocol built for that exact file type. The Frontend Design Skill Stop getting generic, boring UI from Claude. This skill loads it with design tokens, component patterns, layout rules, and production-grade aesthetics before it writes a single line of code. The output actually looks like something a senior designer shipped — not a ChatGPT tutorial from 2023. Use it for landing pages, dashboards, React components, or full web apps. The visual quality gap between Claude with and without this skill is not subtle. The Skill Creator Skill Yes. A skill that builds skills. You describe a workflow you keep repeating. Claude writes the full SKILL.md file with instructions, triggers, and edge case handling. You install it. Claude gets smarter. This is the compounding play. Every skill you build saves you prompting time forever. People running this in their workflow are essentially programming Claude to think like them — without writing a single line of actual code. The PPTX Skill Claude builds full PowerPoint decks — slides, layouts, speaker notes, branded structure — and exports actual .pptx files. Not HTML. Not markdown. Files you open directly in PowerPoint or present to a client. I used this to build a full client proposal deck in under 10 minutes. The skill handles things like slide hierarchy, content density, and formatting consistency that Claude normally fumbles without guidance. The Instagram Reader Skill Paste an Instagram link. Claude extracts the caption, carousel copy, slide text, and thread content. Repurpose competitor content, study what's working in your niche, or bulk-extract your own posts for a content audit — without screenshot gymnastics or manual transcription. For anyone running a content operation at scale, this one alone saves hours per week. submitted by /u/IAmAzharAhmed [link] [comments]
View originalYes, OpenRouter offers a free tier. Pricing found: $10
OpenRouter has an average rating of 5.0 out of 5 stars based on 1 reviews from G2, Capterra, and TrustRadius.
Key features include: Product, Company, Developer, Connect.
OpenRouter is commonly used for: AI model comparison, Cost management for AI services, Token consumption tracking, Model discovery for developers, Routing AI requests with fallbacks, Integration of AI agents.
OpenRouter integrates with: OpenAI, AWS Lambda, Google Cloud, Microsoft Azure, Slack, GitHub, Zapier, Twilio, Jira, Trello.
Based on user reviews and social mentions, the most common pain points are: token usage, token cost, cost tracking, API costs.
Guillermo Rauch
CEO at Vercel
2 mentions

The OpenRouter Show
Jan 28, 2026
Based on 83 social mentions analyzed, 18% of sentiment is positive, 82% neutral, and 0% negative.