Aider is recognized for its open-source nature, making it appealing to developers seeking solutions for coding projects. Its main strength is the ability to reduce complexity and streamline processes in code management. Users, however, frequently express dissatisfaction with memory and context retention issues across sessions, similar to those encountered with other tools like Claude Code. While users appreciate its role in solving developer headaches, there is a notable concern about the rising cost and resource usage of running such tools. Overall, Aider holds a positive reputation for innovation in AI coding solutions, though it faces challenges with memory efficiency and economic sustainability.
Mentions (30d): 12 (2 this week)
Reviews: 0
Platforms: 4
GitHub stars: 42,600 (4,101 forks)
Industry: accounting
Employees: 1
GitHub followers: 641
GitHub repos: 7
npm packages: 20
HuggingFace models: 14
Max20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter?
I'm on the Max20 plan, thinking about a setup before I sink time into it. Want to hear from anyone actually running it, not theorycraft.

**The idea:** Opus 4.7 in Claude Code as the orchestrator. It plans, breaks down tasks, reviews code quality, catches mistakes. The actual implementation, the bulk token spend, gets delegated to DeepSeek V4 Pro through OpenRouter. DeepSeek lands credibly close to Opus 4.7 on agentic coding benchmarks at a fraction of the output-token cost, so the bet is: keep Opus for the judgment-heavy parts, don't burn it on routine implementation.

**I'm not expecting huge savings.** Realistically maybe an extra 30% (guessing here) effective Opus headroom if delegation works cleanly, and even less margin now that the limits situation has loosened a bit. So part of the question is genuinely whether 30% is worth the integration friction at all, or whether it's a fun idea that doesn't pay for itself.

**Pre-empting the obvious responses, because I've already thought about these:**

* *"Just use Sonnet for the cheap parts."* The easy answer. But I'm specifically curious whether an external model's cost delta beats the friction, and whether anyone's actually measured it.
* *"Max20 already gives generous Opus limits, why bother."* Fair. But I'd rather use Opus where it earns its keep and not think about rationing for the rest. It's about allocation, not desperation.
* *"The quality gap means Opus spends all its effort fixing DeepSeek's output."* This is the actual question. DeepSeek reportedly drifts more than Opus on long agentic loops with many sequential tool calls. So does a tight review loop close that gap, or does it eat the 30%? That's what I want real data on.
* *"This fights how Claude Code is built."* Probably. Claude Code's subagents run on Claude models, so I assume this needs a different tool (Aider, Cline, Kilo) or a custom routing layer. If the real answer is "don't do this in Claude Code at all," tell me what you'd use instead.

I know the single-model answer. I'm after whether the split specifically works in practice.
I measured my Claude Code MCP stack on two axes: byte savings AND cache-friendliness. My "best" byte-saver was defeating Anthropic's prompt cache (counter-example + open benchmark)

**TL;DR:** Single-axis benchmarks for MCPs, compressors, and retrieval layers can recommend a system that's *strictly worse* in production. The missing axis: **cache-friendliness**, i.e. whether the same input produces byte-identical bytes across runs, so Anthropic's prompt cache hits.

In my coding-agent stack, my biggest byte-saver (retrieval MCP, 60–70% reduction) was defeating the 5-min TTL prompt cache on every call. Two runs of the same query produced different bytes because of `rg --files-with-matches` output order leaking through a `Map` insertion sequence into the final context. The fix was 2 lines: sort the rg hits before slicing, sort the `Map` entries by path. Byte savings unchanged, `cache_friendly_score` went from ~0% to 100%.

https://preview.redd.it/x5foipotq93h1.png?width=1600&format=png&auto=webp&s=c0930422e882e23d1fc34ded25934c74db692a21

**Article + open benchmark harness:**

* Article: [https://gregshevchenko.com/research/mcp-stack-token-economy/](https://gregshevchenko.com/research/mcp-stack-token-economy/)
* Harness (stdlib-only Python, offline): [https://github.com/g-shevchenko/mcp-token-savers](https://github.com/g-shevchenko/mcp-token-savers); see `methods/` for formal definitions, cluster-bootstrap CIs, Wilson CIs, preregistration, real-data Cohen's κ.

**What the harness measures:**

* `mean_ratio` + CV across N≥5 runs per fixture → byte-saving axis
* `unique_md5_count == 1` check → cache-friendliness axis (0–100%)
* 12-anti-pattern audit on tool definitions (DSA reference)

**What named alternatives publicly disclose:** I surveyed the public docs for Cursor codebase index, Sourcegraph Cody, Aider repo-map, Microsoft LLMLingua / LLMLingua-2, Firecrawl / Jina Reader, RouteLLM / Martian (May 2026).

https://preview.redd.it/ailemo1wq93h1.png?width=1600&format=png&auto=webp&s=4732f5d03f53ba95d2b5aaac0c7f21f1858a36a4

**Limitations:**

* I hypothesized that the prep layer triggers more downstream cache hits on subsequent turns. It didn't reach significance: Welch p=0.32, Cohen's d ≈ 0.18, N=137.
* Two-judge Cohen's κ on the corpus (cerebras-llama × groq-llama, N=25): κ = 0.5955 (moderate, below the 0.7 substantial threshold). 4 of 5 inter-judge disagreements concentrate on one task with an ambiguous acceptance criterion. Sharpening the spec would push κ to ~0.83.

**Disclosure:** I'm the author. No commercial affiliation with the listed tools. The harness is MIT-licensed and takes any compressor as `(str) -> str`.

Curious what `cache_friendly_score` looks like on others' Claude Code stacks.
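
The determinism fix and the `unique_md5_count == 1` check described in this post can be sketched in a few lines of Python. This is a minimal illustration, not the harness's actual code: `build_context`, the `limit` parameter, and this particular definition of `cache_friendly_score` (the share of fixtures whose repeated runs hash identically) are assumptions made for the example; the formal definitions are in the linked repo's `methods/` directory.

```python
import hashlib
from typing import Dict, List


def build_context(hits: List[str], snippets: Dict[str, str], limit: int = 3) -> str:
    """Assemble a context string whose bytes do not depend on input order."""
    # Fix 1: sort the search hits before slicing, so the same files
    # survive the cut on every run.
    chosen = set(sorted(hits)[:limit])
    # Fix 2: emit entries in path order, not in map insertion order.
    return "\n".join(
        f"{path}\n{text}"
        for path, text in sorted(snippets.items())
        if path in chosen
    )


def cache_friendly_score(runs_per_fixture: Dict[str, List[str]]) -> float:
    """Percentage of fixtures whose repeated runs are byte-identical."""
    if not runs_per_fixture:
        raise ValueError("need at least one fixture")
    stable = 0
    for outputs in runs_per_fixture.values():
        digests = {hashlib.md5(o.encode("utf-8")).hexdigest() for o in outputs}
        if len(digests) == 1:  # the unique_md5_count == 1 check
            stable += 1
    return 100.0 * stable / len(runs_per_fixture)
```

Feeding the same hits in a different order, or a map built in a different insertion order, should now yield the same string, which is the property a prefix-based prompt cache needs in order to hit.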
I built a local context compiler so AI coding agents stop re-reading the same repo

I’ve been working on an open-source tool called **Madar**.

The problem I kept running into with AI coding agents is that they often rediscover the same codebase again and again. They grep, read files, summarize, lose context, then repeat the same exploration in the next task. On larger TypeScript/Node.js repos, this becomes slow, noisy, and expensive in tokens.

Madar tries to solve this by acting as a local context compiler. It builds a structural graph of your codebase, then compiles compact context packs for a specific task before the agent starts broad repo exploration.

The idea is not to replace file search. It is to give the agent a better starting point:

* relevant files/symbols
* route/service/call relationships
* runtime execution slices
* source locations
* coverage/missing-context diagnostics
* compact prompts for agents

It works locally and does not require an API key to build the graph. Current support is strongest for TypeScript/Node.js projects, with framework-aware extraction for things like NestJS, Next.js, Express, Fastify, Hono, tRPC, Prisma, and routing-controllers.

It can be used through MCP with tools like Claude Code, Cursor, Copilot, and Gemini, or through CLI-generated prompts for tools like Codex, Aider, and OpenCode.

The package was previously called graphify-ts, but I renamed it to: @lubab/madar

Install:

    npm install -g @lubab/madar

Basic usage:

    madar generate . --spi
    madar summary
    madar pack "how does auth work?" --task explain
    madar claude install

I’ve also been testing it with native-agent benchmarks. In some real backend prompts, it reduced provider-reported input tokens significantly. I’m being careful with that claim because results depend heavily on the repo and task, but the direction is promising.

What I’m trying to validate now:

1. Is “context compilation” a useful layer for AI coding agents?
2. Do execution slices make codebase explanations more reliable?
3. Can we reduce token waste without hurting answer quality?
4. What benchmark format would developers actually trust?

GitHub: [https://github.com/mohanagy/madar](https://github.com/mohanagy/madar)
npm: [https://www.npmjs.com/package/@lubab/madar](https://www.npmjs.com/package/@lubab/madar)

I’d genuinely appreciate technical feedback, especially from people using Claude Code, Cursor, Copilot, Codex, Aider, or other coding agents on larger repos.
10-gate security audit SKILL for web apps

There are already a few security-focused SKILLs. We are working on a new one for web apps. The main design goal is "a disciplined 10-gate review process to audit web applications deployed on cloud infrastructure."

Before finalizing, I’m trying to check how developers and AI agents interact with the outputs:

* **How would you use the output file?** Would you prefer to pipe the raw JSON directly back into an AI agent (like Claude Code, Cursor, or Aider) to auto-generate code patches? Or do you treat it purely as a human-readable review log?
* **What are you looking for in a security SKILL?** Is your biggest pain point token bloat, lack of source-to-sink reachability analysis, or AI hallucination?

Would love to hear how you integrate security playbooks into your agentic coding workflows right now.

Existing skills: [https://www.reddit.com/r/claudeskills/comments/1t224au/a_massive_security_skill_pack_for_claude_29/](https://www.reddit.com/r/claudeskills/comments/1t224au/a_massive_security_skill_pack_for_claude_29/) and [https://github.com/Eliran-Turgeman/code-security-skills](https://github.com/Eliran-Turgeman/code-security-skills)
Six agents running. Three are paused waiting for me. I haven't written a line of code in two hours.

I've been running parallel Claude Code agents for a few months. The promise was speed - 5× the throughput because 5× the agents.

What actually happens by hour two: One agent stops on a yes/no. You alt-tab to it, approve, alt-tab back. Two more pause within the next minute. You scroll through their context, lose your place in the first one. Now there are four waiting. You're not writing code anymore - you're processing a decision queue you accidentally built for yourself.

The agents aren't slow. You are.

I started calling this the bottleself: the point where parallelism stops adding output and starts adding approvals you can't process fast enough. The ceiling on your system isn't tokens, model speed, or context window. It's the human in the loop.

So I built a layer above the agents - a planner that:

* takes a high-level goal
* decomposes it into parallel subtasks
* spawns parallel Claude Code sub-agents - one per task
* has a QA sub-agent review the output
* pings you only when it actually can't decide

Right now it's Claude Code only. Codex / Cursor / Aider integrations next. For a fresh repo with Claude Code, the planner handles decomposition + parallel execution end-to-end without me touching the keyboard.

Source: [github.com/gekto-dev/gekto](http://github.com/gekto-dev/gekto)

Try: npx gekto

Honest question to anyone running 5+ agents: how much of your day is actually writing code vs clearing the queue your agents created? Where does the bottleself hit for you?
We built a free tool that generates a DESIGN.md from any live URL, keeps AI coding agents on-brand

The Google Labs DESIGN.md spec launched last month. It's a machine-readable markdown file your AI coding agent reads to understand your design system. This tool automates creating it.

Paste any public URL: the tool extracts CSS variables, typography, Tailwind classes, and component patterns, then an AI assembles them into a spec-compliant DESIGN.md. A visual editor lets you fine-tune tokens before you download. Drop the file in your repo root and your agent has a consistent design reference across every session.

Works with Cursor, Claude Code, GitHub Copilot, Aider, and Continue. Free, no signup.

[https://www.masumi.network/tools/design-md](https://www.masumi.network/tools/design-md)

https://reddit.com/link/1tb2tki/video/tlqzrvm1sp0h1/player
I built a “Living Docs” system for long-term AI coding workflows

English is not my first language. AI actually told me to post this here, and also helped write this post 😅

After months of AI-assisted coding, I kept running into the same problems:

- repeating architecture context every session
- stale docs
- conflicting rules
- context drift
- AI modifying wrong parts of the project
- knowledge disappearing between sessions

So I started building a documentation system specifically for AI workflows. The idea became something I now call “Living Docs”.

Core idea: The same agent that changes the code is also responsible for maintaining the documentation and operational memory.

But there is one important constraint: Documentation is NOT updated automatically after every task. The human confirms the code is correct first. Then the agent performs a deliberate “doc sweep” to sync the docs. Otherwise wrong code can mutate the docs, and then future sessions start treating incorrect behavior as truth.

Some core rules from the system:

One file owns each rule. No duplication. If a rule exists in two places, you now have two sources of truth, which means you have none.

Code is primary truth for behavior. Docs are primary truth for intent.

The docs are not static reference material. They act as institutional memory shared between humans and AI across sessions.

The architecture has 3 layers:

- codebase
- LLM-maintained docs
- governance/schema layer

The governance layer tells the agent:

- which docs to load
- which file owns what
- when documentation updates are allowed
- how to prevent duplication and context drift

Still experimental, but it already improved long-session stability a lot for me on larger projects.

Repo: https://github.com/Diew/living-docs

Would genuinely love feedback from people working with Cursor, Claude Code, Aider, Roo, OpenHands, etc.
For system designers
Open-source spec studio for Claude Code. Draft a Markdown spec + an architecture diagram in the browser, then hand it off three ways: paste your API key, copy to claude.ai, or run a generated CLI snippet if you only have Claude Code. Optional: drop a GitHub PAT and it pushes CLAUDE.md straight to a branch + PR. I built the whole thing with Claude Code — the Vite migration, the BYOK integration, the pluggable storage layer, 95 tests, the wiki, even the screenshots (Playwright drives the real app). Free, MIT, no signup, no telemetry, keys stay in your browser. https://github.com/Hesper-Labs/architect
I got tired of AI agents destroying my codebase and eating tokens, so I built a self-bootstrapping Markdown protocol to fix their memory.

Hi everyone,

If you use Claude, Cursor, Copilot, or Gemini for large projects, you know the pain: after 20 messages, the AI's context window gets bloated. It forgets the architecture, hallucinates features, or worse, overwrites perfectly good code because it didn't read the right files.

I realized the problem isn't the models; it's how we manage their memory.

So I created **BEMYAGENT**: a single, lightweight Markdown file (`BEMYAGENT.md`) that acts as an "Agent OS". You just drop it into your project root, tell your AI to "Execute BEMYAGENT.md bootstrap", and it automatically generates a strictly separated file structure:

* `docs/` (Immutable truth): `01-overview`, `02-architecture`, `03-code-map`. The AI is forced to use **Lazy Loading** (it's instructed *never* to read feature specs unless strictly required for the current task).
* `work/` (Volatile memory): Uses a **Fractal TTE (Think-Task-Execute)** workflow based on Hierarchical Task Networks (HTN). If a task is too big, the AI must decompose it into sub-folders instead of executing blindly.

**The coolest feature? Model Handoff / Pacing.** I built a configuration state right into the rules. You can tell the AI to switch to `INTERACTIVE` mode. It will use a heavy model (like o1 or Claude 3.5 Sonnet) to write the `01_think.md` strategy, then it **pauses**. You swap to a fast/cheap model (like Haiku or Flash) in your UI or CLI, and tell it to execute the code. Massive token/cost savings.

It works with any AI UI or CLI tool (Aider, Cline, etc.) because it's just Markdown.

I’d love for you to try it out or tear the architecture apart. Repo here: [https://github.com/vitotafuni/bemyagent](https://github.com/vitotafuni/bemyagent)
I built a free local MCP server that cut my Claude Code PR review prompt from 63K to 8.7K tokens

Every time I asked Claude Code something about my codebase ("how does the v2 pipeline work?", "what calls this function?", "is this PR safe?"), the agent walked the repo from scratch. Glob, Grep, Read, Read, 8–10 sequential tool calls per question. Same structure rediscovered every time, and the input-token bill kept growing.

So I built graphify-ts. It builds a local knowledge graph of your code at index time (tree-sitter AST + Louvain communities + BM25 + optional local ONNX rerank) and exposes it as an MCP stdio server. Instead of 8–10 tool calls, Claude Code makes one `retrieve` call and gets the relevant slice back. Fully local: your code never leaves the laptop.

Numbers I actually measured (verify.sh in the repo re-derives all of them from committed evidence):

Real production NestJS + Next.js codebase, 1,268 files, same Claude Opus 4.7 question both runs:

- Tool-call turns: 9 → 3
- Input tokens: 615,190 → 233,508 (2.6× fewer)
- Latency: 96 sec → 35 sec (2.8× faster)
- Both numbers from `claude --output-format json` usage field, not local estimates

Real 36-file production PR review:

- Prompt tokens: 63,024 → 8,690 (7.25× smaller)
- Same reviewer, same diff, same review depth; both runs flagged the same hotspots

Multi-repo question across 3 repos:

- Estimated naive prompt: ~1.5M tokens (literally couldn't fit in any window)
- With graphify-ts: 2,800 tokens
- Caveat up front: the 1.5M is a structural estimate, not a sent prompt. Calling that out so it's not buried.

Install:

    npm install -g @mohammednagy/graphify-ts
    cd your-project
    graphify-ts generate .
    graphify-ts claude install

Also works with Cursor, Copilot, Gemini CLI, Aider, OpenCode via `<agent> install`.

Honest trade-offs:

- Cold-start sessions cost about 13% more than no-graph baseline because the MCP server adds ~5K of tool-schema overhead at session init. Multi-question sessions amortize this. The default `core` profile ships 6 tools to keep that overhead small; opt into the full 21-tool surface with `GRAPHIFY_TOOL_PROFILE=full`.
- Deep extraction is best on JS/TS with framework-aware passes for Express, NestJS, Next.js, Redux Toolkit, React Router. Python/Ruby/Go/Java/Rust use plain tree-sitter AST. C/Kotlin/C#/Scala/PHP/Swift/Zig use a generic structural extractor.
- It's a structural map for an agent, not a complete program-analysis database. Heavily meta-programmed routes fall back to the base AST.

GitHub: [https://github.com/mohanagy/graphify-ts](https://github.com/mohanagy/graphify-ts) (MIT, Node 20+)

I'd genuinely like counterexamples: the cases where structural slicing breaks. If you've got a repo where this approach should fail, I want to know before someone bets a real review pipeline on it.
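
The headline ratios in this post are plain before/after quotients, so they are easy to recompute as a sanity check. A small sketch using only the figures quoted above; the helper name `reduction` is mine and is not part of graphify-ts.

```python
def reduction(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


# (before, after) pairs exactly as quoted in the post.
measurements = {
    "input tokens": (615_190, 233_508),
    "latency (s)": (96, 35),
    "PR review prompt tokens": (63_024, 8_690),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {reduction(before, after):.2f}x")
```

The token ratios match the post (about 2.6× and 7.25×). With the rounded 96 and 35 seconds the latency ratio comes out nearer 2.7× than 2.8×, presumably because the quoted figure was computed from unrounded timings.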
I built a tool that cut my Claude Code token bill 89%. v3.4 just shipped, works in 8 IDEs.

Quick context: I have been hitting Claude Code Max 5x limits in under 2 hours on real work. The session counter goes from 21% to 100% on a single complex prompt. If you have been on the recent threads, you know exactly what I mean.

So I built engramx. It is an MCP server plus a SQLite knowledge graph that intercepts file reads at the agent boundary. When Claude is about to read a file engram has indexed, the hook returns a structural summary instead of the raw content. Same edit, same diff, far fewer tokens consumed in the round trip.

The benchmark is committed to the repo. On a real 87-file codebase, the aggregate reduction is 89.1%. Best-case file dropped from 18,820 tokens to 306. The bench script is `bench/real-world.ts`; you can run it on any project you own.

v3.4 shipped Friday and all the install paths are live now. The same engram works across 8 IDEs natively: Claude Code (hooks plus the official plugin in review), Cursor (MDC plus MCP plus a VS Code extension on OpenVSX), Cline, Continue.dev, Aider, Windsurf, Zed, OpenAI Codex CLI. One install, one graph, every tool benefits.

It is local-first. SQLite database lives at `.engram/graph.db` in your repo. Nothing leaves your machine. Apache 2.0. No account, no telemetry.

    npm install -g engramx
    cd ~/your-project
    engram setup

Cursor users can install the extension directly:

    code --install-extension nickcirv.engram-vscode

Heads up on what comes next. v4.0 "Mesh + Spine" lands May 25. Adds an opt-in federation layer so engram instances on different machines exchange mistakes and ADRs without sharing source. Phase 1 foundation already merged this week (ed25519 identity, 14-category PII gate, 1007 tests). Subscribe via the GitHub Discussions page if you want updates.

There is also an `engram cost` command that tracks how many tokens it has saved you, per project per week. After 24 hours of normal use the digest shows real numbers.

Repo and benchmark: [github.com/NickCirv/engram](http://github.com/NickCirv/engram)

Happy to answer questions. If you have hit the new rate limits and want a second pair of hands on it, comment your stack and I will help.
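
To make the interception idea concrete, the sketch below swaps a file's raw text for a short structural summary when the file is in an index, and falls back to the raw read otherwise. It is an illustration only, not engram's implementation: it uses Python's standard `ast` module on Python source, keeps the index in a plain dict rather than a SQLite graph, and `structural_summary` and `read_for_agent` are hypothetical names.

```python
import ast
from typing import Callable, Dict


def structural_summary(source: str) -> str:
    """List top-level functions and classes instead of the full file body."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = ", ".join(arg.arg for arg in node.args.args)
            lines.append(f"def {node.name}({params})  # line {node.lineno}")
        elif isinstance(node, ast.ClassDef):
            methods = [
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            lines.append(f"class {node.name}  # methods: {', '.join(methods)}")
    return "\n".join(lines)


def read_for_agent(path: str, index: Dict[str, str], read_raw: Callable[[str], str]) -> str:
    """Return the indexed summary when one exists, else the raw file content."""
    summary = index.get(path)
    return summary if summary is not None else read_raw(path)


# Hypothetical file content, used only to exercise the two functions.
source = (
    "class Cart:\n"
    "    def add(self, item):\n"
    "        self.items.append(item)\n"
    "\n"
    "def total(cart, tax):\n"
    "    return sum(cart.items) * (1 + tax)\n"
)
index = {"cart.py": structural_summary(source)}
print(read_for_agent("cart.py", index, lambda p: source))
```

The trade-off is the one the post implies: the agent sees names and shapes rather than bodies, so the savings depend on how often a task needs only the structure. For scale, the best-case file quoted above (18,820 → 306 tokens) is a reduction of roughly 98.4%.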
I kept losing track of work, insights, and improvement ideas I deferred mid-task. Built a Claude Code skill to track, surface, and manage them across scattered project files.

Every project I work on accumulates deferred items in several places: a Deferred.md at the repo root, plan files in some "deferred" folder, audit-tool ledgers, code comments like TODO: come back to this, memory entries for AI assistants, and paused plan files in ~/.claude/plans/. Later, when I have time to address deferred items, I find some have gone stale. Some got fixed when other things got fixed. Some probably will sit forever because I didn't remember them.

I worked with Claude Code to find patterns that fixed this for my app (Stuffolio, a Universal Swift codebase shipping to iOS, iPadOS, and macOS), and developed the results into a standalone Claude Code skill: unforget, a single source of truth for deferred work.

The full format (four sections, ten columns, color-coded ratings) is in the README and SKILL.md. Quick, but worth a read if you want to see the structure.

The skill is functional today via Claude Code's /skill invocation. Drop SKILL.md in your skill path, then run /skill unforget init (or /skill unforget add "...") in any session. Claude follows the seven-phase spec to do the work. Same pattern as other SKILL.md-based tools like /skill humanizer or /skill prompter. The seven-phase init flow has been validated against two real projects (one complex Universal app, one minimal third-party skill).

v0.2 will ship as a polished Claude Code plugin (.claude-plugin/ install) so you can invoke /unforget add without the /skill prefix. Functionality unchanged; ergonomics improved.

Beta testers willing to try the format on their own projects, especially:

- Minimal repos (small libraries, single-purpose tools). The format was designed against a complex codebase; I want to catch where it doesn't fit small projects.
- Non-Apple-platform projects (web, Android, backend services, libraries). The Target/release-cycle column is most natural for App Store submission cadences; want to validate it works for other deploy patterns too.
- Projects using non-CLAUDE.md AI instruction files (Warp's WARP.md, Cursor's .cursorrules, Aider's .aider.conf.yml). Early testing already revealed the wiring step shouldn't hardcode filenames; want more variety in what the format encounters.
- Continuous-deployment workflows. The spec has a "Continuous" preset (Window column instead of Target with NOW / THIS WEEK / THIS MONTH / SOMEDAY values) but it's the least field-tested of the three presets.

If you try it and something in the skill breaks down, opening an issue describing the failure mode is the highest-value feedback right now. Real-project gaps shape what v0.2's runtime implementation actually does.

Repo: https://github.com/Terryc21/unforget

Apache 2.0 licensed. The README has the full caveats and a Companion Skills section linking to the other skills the same project family produced. Happy to answer questions in the comments.

Engagement plan after posting

If the post lands and gets traction, the highest-value comment threads to engage with:

- "Why not just use [GitHub Issues / Linear / Jira / etc.]?": those are for tracked work; this is for deferred work that doesn't deserve a ticket but shouldn't be lost. The Target column is the differentiator.
- "What's v0.2 going to add then?": v0.2 packages the skill as a Claude Code plugin so you can run /unforget add "..." without the /skill prefix. The functionality is the same; v0.2 is about install ergonomics. The seven-phase flow, the four sections, the 10-column table, the promotion ritual are all working today via /skill unforget.
- "How does this differ from [other Claude Code skills doing similar things]?": likely no direct competitor exists; the closest things are general task trackers (not deferred-specific) or per-project Deferred.md conventions (not standardized). The single-source-of-truth plus Target-column promotion ritual is genuinely novel as far as I've seen.
- "Is this just a todo system?": see "Why a Target column" section in the body. Most todo systems collapse Urgency and Release into Priority. This skill keeps them separate, which is the actual mental model for "we know it's bad but the calendar says next sprint."

submitted by /u/BullfrogRoyal7422
View originalLessons from building a coding agent for 8k context windows: token budgeting, parallel executors, and per-file isolation
Most AI coding tools (Cursor, Aider, Claude Code) assume you have a 200k-token model. If you're running local LLMs through Ollama or LM Studio, or hitting free-tier cloud APIs like Groq or OpenRouter, you've got around 8k tokens to work with. That doesn't fit a whole project, barely fits a single large file. I spent the last few weeks building a CLI coding agent that's designed around the 8k constraint instead of fighting it. Wanted to share what I learned, because some of it surprised me. The core insight: the LLM never needs to see your whole project. Most agents try to stuff as much context as possible into a single call. With 8k tokens that's a non-starter. The approach that worked for me is splitting the work into roles: A planner call that only sees a lightweight project map (Markdown summaries of each folder, ~300-500 tokens for the whole project) plus the user's request, and outputs a task list. Executor calls that each see exactly one file plus one task. Never two files in the same call. An orchestrator that's pure code, absolutely no LLM, building a dependency graph between tasks and deciding what runs in parallel vs sequential. This split means the LLM only ever reasons about a small, bounded amount of code at any one time. The planner doesn't need to see code at all (just file summaries), and the executor only sees one file. Multi-file refactors stop being a context-window problem and become a scheduling problem. Token budgeting has to be enforced in code, not promised in a prompt. Every LLM call goes through a canFit() check that measures: system prompt + reserved output tokens + memory + actual code. If the code doesn't fit, the agent automatically falls back to a per-file line index (generated once for files over ~150 lines) and pulls only the relevant section. 
Concrete budget math for 8192 tokens: System prompt + instructions: ~1000 Reserved for response: ~2000 Short-term memory (4 entries): ~360 Available for actual code: ~4800 (about 140-190 lines) Parallel execution is the speed multiplier that makes 8k usable. Because each executor sees only one file, independent edits across files can run simultaneously. A 5-file refactor that would be slow if run sequentially completes in roughly the time of the longest single edit. The dependency graph (built in pure code from the planner's task list) decides which tasks have to wait for which. A few things that tripped me up along the way: Question-style requests overwriting files. The first version had no concept of read-only operations, so asking "how many lines does X have?" caused the executor to write the answer into the file. Fixed by adding an action_type: "query" field to the planner's output that routes through a separate code path that never touches disk. Stale project maps causing silent misroutes. If the user named a file in their request that wasn't in the context map (because they just renamed it, or hadn't refreshed), the planner would silently route the action to the closest match. Now the orchestrator validates that mentioned file paths actually exist on disk and throws a clear error if they don't. Markdown fences in executor output. Even when explicitly told not to, smaller models love wrapping code in triple backticks. Strip them in post-processing rather than fighting the prompt. Memory token cost. Initially didn't budget for it; persistent memory is great but it's another ~80-90 tokens per entry that has to come out of the code budget. Now folder context is dropped first when the budget is tight, then memory, before the actual code gets cut. What I'm still figuring out: Whether the planner/executor split scales cleanly to codebases over 50 files. 
The dependency graph stays manageable, but the project map starts costing real tokens once you have enough folders. Currently dropping folder context first when budget is tight, but that means deeper edits get less context. Curious if anyone else has run into this and how they handle it.

Open-sourced the implementation if anyone wants to dig in: https://github.com/razvanneculai/litecode

submitted by /u/BestSeaworthiness283
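The pure-code orchestrator described in the post above, which turns the planner's task list into a dependency graph and decides what can run in parallel, could be sketched roughly as follows. This is an illustration under assumptions, not the linked project's implementation: the task ids, the `plan_waves` helper, and the dependency format are hypothetical, and it uses Python's standard-library `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter


def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all dependencies met.

    deps maps a task id to the set of task ids it must wait for.
    Raises graphlib.CycleError if tasks depend on each other circularly.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical task list: one task per file, matching the one-file-per-call rule.
tasks = {
    "edit:models.py": set(),
    "edit:utils.py": set(),
    "edit:api.py": {"edit:models.py"},
    "edit:cli.py": {"edit:api.py", "edit:utils.py"},
}
waves = plan_waves(tasks)
# waves == [["edit:models.py", "edit:utils.py"], ["edit:api.py"], ["edit:cli.py"]]
```

Tasks inside one wave are independent, so an orchestrator could hand each wave to a thread pool and wait for it to finish before starting the next; fully independent edits all land in the first wave.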
I added voting to my AI tools library, now the ratings are community-driven, not just mine
a few weeks ago I posted about building a library that tracks 120+ AI coding tools by how long their free tier actually lasts. the response was good but the most common feedback was "your scores are subjective." fair point.

so I rebuilt the rating system. you can now sign in with Google and vote on any tool directly. the scores update in real time based on actual user votes, not just my personal assessment. if you think I rated something wrong, you can now do something about it instead of just commenting.

also shipped dark mode because apparently I was the only person who thought the default looked fine.

**what Tolop actually is if you're new:**

every AI tool claims to be free. most aren't, or at least not for long. Tolop tracks the real limits: how many completions, how many requests, how long until you hit the wall under light use vs heavy use vs agentic sessions. it also flags the tools where "free" means you're still paying Anthropic or OpenAI through your own API key.

120+ tools across coding assistants, browser builders, CLI agents, frameworks, self-hosted tools, local models, and a new niche tools category for single-purpose utilities that don't fit anywhere else.

**a few things the data shows that I found genuinely interesting:**

* Gemini Code Assist offers 180,000 free completions per month. GitHub Copilot Free offers 2,000. same category, 90x difference
* several of the most popular tools (Cline, Aider, Continue) are free to install but require paid API keys, so "free" is misleading
* self-hosted tools have by far the most generous free tiers because the cost is on your hardware, not a server

would genuinely appreciate votes on tools you've actually used, the more real usage data behind the scores, the more useful the ratings get for everyone.

[tolop.space](http://tolop.space): no account needed to browse, Google login to vote.
Got into Anthropic's Opus 4.7 hackathon - pushing Verified Skill (security + evals + package manager for AI agent skills, 49 platforms) this week
Approved at 1:39 AM this morning. 500 builders, $100K pool, virtual, judges from the Claude Code team. Apr 21-28.

The product (already shipping, this week I push harder)

Verified Skill is what every AI agent ecosystem is missing: security + quality + distribution for AI skills.

- Security: skills execute code, touch your tools, read your files. 52 known attack patterns. We scan every skill and grade it into one of 3 tiers (Scanned / Verified / Certified) before install.
- Quality: Skill Studio (npx vskill studio) is a 100% local eval framework. Plain-English test cases. A/B vs baseline. Multi-model (Claude, GPT, Gemini, Llama, Ollama). Nothing similar exists for AI skills today.
- Distribution: vskill CLI. Universal package manager. Works across 49 agent platforms (Claude Code, Cursor, Copilot, Windsurf, Codex, Gemini CLI, Cline, Aider, and more).

The bet

Every agent platform runs SKILL.md now. The question isn't "which format wins" - it has. The question is who builds the infrastructure around it.

This week with Opus 4.7

- Agent-aware generation: one skill source → tailored outputs per agent
- Smarter routing based on target-agent capabilities
- Tighter eval loops
- Daily ships

Stack: Node.js ESM CLI, Cloudflare Workers + D1 + Prisma, Next.js 15 dashboard. Orchestrated through SpecWeave, my spec-driven dev framework (open source): https://spec-weave.com

Links

- Verified Skill: https://verified-skill.com
- SpecWeave: https://spec-weave.com

Swap notes

Anyone else in the cohort? Anyone shipping developer tooling who wants to compare notes this week?

submitted by /u/OwenAnton84
Repository Audit Available
Deep analysis of Aider-AI/aider — architecture, costs, security, dependencies & more
Aider itself is open source and free to install; the running cost comes from the LLM API keys you supply. Visit their website for current details.
Key features include: Cloud and local LLMs, Maps your codebase, 100+ code languages, Git integration, In your IDE, Images & web pages, Voice-to-code, Linting & testing.
Aider is commonly used for: Pair programming with LLMs for new project initiation, Enhancing existing codebases with AI suggestions, Automating code linting and testing processes, Generating documentation for codebases, Translating voice commands into code snippets, Integrating AI-generated images and web pages into projects.
Aider integrates with: GitHub, GitLab, Bitbucket, Jira, Slack, Trello, Visual Studio Code, JetBrains IDEs, Eclipse, Notion.
Aider has a public GitHub repository with 42,600 stars.
Marc Raibert
Founder at Boston Dynamics
1 mention
Based on user reviews and social mentions, the most common pain points are: token cost, token usage, API costs, large language model.
Based on 38 social mentions analyzed, 21% of sentiment is positive, 76% neutral, and 3% negative.