Why context quality determines code quality
Every AI coding tool uses the same models. Augment's Context Engine is the difference: we maintain a live understanding of your entire stack — code, dependencies, architecture, and history.

We measure aggregate performance across functional correctness, style, and context awareness:

- Correctness: code executes as intended, passes tests, and handles edge cases without logical errors.
- Completeness: the solution fully implements the requested feature scope, leaving no placeholders or TODOs.
- Reuse: intelligently leverages existing project utilities, types, and components to minimize technical debt.
- Consistency: code matches the unique patterns, naming conventions, and architecture of the codebase.

These results come from a blind study comparing 500 agent-generated pull requests to merged code written by humans on the Elasticsearch repository — 3.6M Java LOC from 2,187 contributors.

Augment offers a developer workspace where agents are coordinated, specs stay alive, and every workspace is isolated, plus AI-powered coding in your terminal for engineers who prefer the command line: the same Context Engine, the same powerful agents, no GUI required.

> Add rate limiting to the API endpoints
> I'll add rate limiting to your API. Let me check the existing middleware setup.

Most AI-generated code needs cleanup. Augment agents are different: our deep contextual understanding of your codebase means the code they write is superior, not slop. Augment is built for pro software teams. A sample review finding: "Using String(children) to derive heading IDs can return '[object Object]' when the heading contains inline elements (e.g., code/emphasis), leading to anchor mismatches with the TOC; consider extracting plain text from children (also applies to h2/h3)."

The only AI code reviewer that thinks like a senior engineer: we benchmarked 7 leading tools on real production codebases, and Augment delivered the highest precision and recall by a significant margin. Context-powered reviews catch critical bugs without the noise. Install Augment to get started.
Works with codebases of any size, from side projects to enterprise monorepos.
Industry: information technology & services
Employees: 75
Funding stage: Venture (round not specified)
Total funding: $252.0M
Pricing found: $20/month, $60/month, $200/month
A Claude memory retrieval system that actually works (easily) and doesn't burn all my tokens
TL;DR: By talking to Claude and explaining my problem, I built a very powerful local "memory management" system for Claude Desktop that indexes project documents and lets Claude automatically retrieve relevant passages buried inside those documents during Co-Work sessions. For me it solves the "document memory" problem where tools like NotebookLM, Notion, Obsidian, and Google Drive can't be queried programmatically. Claude did all of it; I didn't really have to do anything. The description below includes plenty of things I don't completely understand myself. The key thing is just to explain to Claude what the problem is (which I describe below) and what your intention is, and Claude will help you figure it out. It was very easy to set up, and I think it's better than anything I've seen any YouTuber recommend.

The details: I have a really nice solution to the Claude external memory/external brain problem that lots of people are trying to address. Although my system is designed for one guy using his laptop, not a large company with terabytes of data, the general approach could be scaled up just by substituting different tools. I wanted to create a Claude external memory system connected to Claude Co-Work in the desktop app. What I really wanted was for Claude to proactively draw from my entire base of knowledge for each project, not just from the documents I dropped into my project folder in Claude Desktop. Basically, I want Claude to have awareness of everything I have stored on my computer, in the most efficient way possible (Claude can use lots of tokens if you don't manage the "memory" efficiently). I've played with Notion and Google Drive as an external brain. I've tried NotebookLM.
And I was just beginning to research Obsidian when I read this article, which I liked very much and highly recommend: https://limitededitionjonathan.substack.com/p/stop-calling-it-memory-the-problem

That got my attention, so I asked Claude to read the document and give me his feedback based on his understanding of the projects I was trying to work on. Claude recommended using SQLite to connect to structured facts, an optional graph to show some relationships, and .md files for instructions to Claude. But I pointed out that almost all of the context information I would want to be retrievable from memory is text in documents, not structured data. Claude's response was very helpful. He understood that although SQLite is good at single-point facts, document memory is a different challenge. For documents, the challenge isn't storing them—it's retrieving the right passage when it's relevant without reading everything (which consumes tokens). SQLite can store text, but storing a document in a database row doesn't solve the retrieval problem: you still need to know which row to pull.

I asked if NotebookLM from Google might be a better tool for indexing those documents and making them searchable. Claude explained that what I was describing is a Retrieval-Augmented Generation (RAG) problem. The standard approach:

1. Documents get chunked into passages (e.g., 500 words each)
2. Each chunk gets converted to an embedding—a vector that captures its meaning
3. When Claude needs context, it converts the query to the same vector format and finds the semantically closest chunks
4. Those chunks get injected into the conversation as context

This is what NotebookLM is doing under the hood. It's essentially a hosted, polished RAG system. NotebookLM is genuinely good at what it does—but it has a fundamental problem for my case: it's a UI, not infrastructure. You use it; Claude can't. There's no API, no MCP tool, no way to have Claude programmatically query it during a Co-Work session.
It's a parallel system, not an integrated one. So NotebookLM answers "how do I search my documents as a human?"—not "how does Claude retrieve the right document context automatically?" After a little back and forth, here's what we decided to do. For me, a solo operator with only a laptop's worth of documents that need to be searched, Claude proposed a RAG pipeline that looks like this:

My documents (DOCX, PDF, XLSX, CSV)
↓ Text extraction (python-docx, pymupdf, openpyxl)
↓ Chunking (split into ~500-word passages; keep metadata: file, folder, date)
↓ Embedding (convert each chunk to a vector representing its meaning)
↓ Local vector database + vector extension (store chunks + vectors locally, single file)
↓ MCP server (exposes a search_knowledge tool to Claude)
↓ Claude Desktop (queries the index when working on my business topics)

With that setup, when you're talking to Claude and mention something like "did I pay the overdue invoice" or "which projects did Joe Schmoe help with," Claude searches the index, gets the 3–5 most relevant passages back, and uses them in its answer without you doing anything. We decided to develop a search system like that, specific to each of my discrete projects.
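The pipeline above can be sketched end to end in a few lines. This is a toy illustration, not the poster's actual system: the bag-of-words "embedding" stands in for a real embedding model, and `search_knowledge` mimics what the MCP tool would return.

```python
import re
from collections import Counter
from math import sqrt

def chunk(text, size=500):
    """Split a document into ~`size`-word passages, keeping an index as metadata."""
    words = text.split()
    return [{"chunk_id": i, "text": " ".join(words[i:i + size])}
            for i in range(0, len(words), size)]

def embed(text):
    """Toy 'embedding': a bag-of-words vector. A real pipeline would call a
    sentence-embedding model here instead."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_knowledge(query, index, top_k=3):
    """What the MCP search tool would do: return the k closest chunks."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:top_k]

# Hypothetical documents standing in for extracted DOCX/PDF text.
docs = ("Invoice 1042 from Acme was paid on March 3. " * 20 +
        "Joe Schmoe helped with the website redesign project. " * 20)
index = chunk(docs, size=40)
hits = search_knowledge("which projects did Joe Schmoe help with", index)
```

The retrieved `hits` are what would be injected into the conversation as context, exactly as in step 4 of the standard approach.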
I built an AI reasoning framework entirely with Claude Code — 13 thinking tools where execution order emerges from neural dynamics
I built Sparks using Claude Code (Opus) as my primary development environment over the past 2 weeks. Every module — from the neural circuit to the 13 thinking tools to the self-optimization loop — was designed and implemented through conversation with Claude Code. What I built Sparks is a cognitive framework with 13 thinking tools (based on "Sparks of Genius" by Root-Bernstein). Instead of hardcoding a pipeline like most agent frameworks, tool execution order emerges from a neural circuit (~30 LIF neurons + STDP learning). You give it a goal and data. It figures out which tools to fire, in what order, by itself. How Claude Code helped build it Architecture design: I described the concept (thinking tools + neural dynamics) and Claude Code helped design the 3-layer architecture — neural circuit, thinking tools, and AI augmentation layer. The emergent tool ordering idea came from a back-and-forth about "what if there's no conductor?" All 13 tools: Claude Code wrote every thinking tool implementation — observe, imagine, abstract, pattern recognition, analogize, body-think, empathize, shift-dimension, model, play, transform, synthesize. Each one went through multiple iterations of "this doesn't feel right" → refinement. Neural circuit: The LIF neuron model, STDP learning, and neuromodulation system (dopamine/norepinephrine/acetylcholine) were implemented through Claude Code. The trickiest part was getting homeostatic plasticity right — Claude Code helped debug activation dynamics that were exploding. Self-improvement loop: Claude Code built a meta-analysis system where Sparks can analyze its own source code, generate patches, benchmark before/after, and keep or rollback changes. The framework literally improves itself. 11,500 lines of Python, all through Claude Code conversations. What it does Input: Goal + Data (any format) Output: Core Principles + Evidence + Confidence + Analogies I tested it on 640K chars of real-world data. 
It independently discovered 12 principles — the top 3 matched laws that took human experts months to extract manually. 91% average confidence.

Free to try:

```bash
pip install cognitive-sparks
# Works with Claude Code CLI (free with subscription)
sparks run --goal "Find the core principles" --data ./your-data/ --depth quick
```

The default backend is Claude Code CLI — if you have a Claude subscription, you can run Sparks at no additional cost. The quick mode uses only 4 tools and costs ~$0.15 if using the API. Also works with OpenAI, Gemini, Ollama (free local), and any OpenAI-compatible API. Pre-computed example output is included in the repo so you can see results without running anything: examples/claude_code_analysis.md

Links: PyPI — pip install cognitive-sparks

Happy to answer questions about the architecture or how Claude Code shaped the development process. submitted by /u/RadiantTurnover24 [link] [comments]
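For readers unfamiliar with the substrate the post describes, a leaky integrate-and-fire (LIF) neuron is only a few lines. This is a generic textbook sketch with made-up constants, not Sparks' actual implementation:

```python
def lif_step(v, input_current, tau=20.0, v_rest=0.0, v_thresh=1.0,
             v_reset=0.0, dt=1.0):
    """One leaky integrate-and-fire update: the membrane potential decays
    toward rest, integrates input, and spikes (then resets) when it crosses
    the threshold."""
    v = v + (dt / tau) * (v_rest - v) + input_current
    if v >= v_thresh:
        return v_reset, True  # spike fired; potential resets
    return v, False

# Drive a single neuron with constant input for 100 steps; it charges up,
# spikes periodically, and resets each time.
v, spikes = 0.0, []
for _ in range(100):
    v, fired = lif_step(v, input_current=0.06)
    spikes.append(fired)
```

In a circuit like the one described, ~30 such neurons would be wired together, with STDP adjusting the connection weights based on spike timing.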
The Jose robot at the airport is just a trained parrot
Saw the news about Jose, the AI humanoid greeting passengers in California, speaking 50+ languages. Everyone's impressed by the language count. But here's what nobody's talking about - he's doing exactly what a well-trained chatbot does, except with a body and a face. I've spent months building actual workflows with Claude Code. The difference between a working tool and a novelty is whether it solves a real problem or just looks impressive. Jose answers questions and gives info about local attractions. That's a prompt with retrieval-augmented generation and a text-to-speech pipeline attached to a robot. The problem today isn't building, it's distribution and adoption. A humanoid robot that greets people is distribution theater. It gets press. It gets attention. But does it actually improve passenger experience compared to a kiosk or a mobile app? Or is it just novel enough that people want to film it? I'm not saying robots are useless. I'm saying we're confusing "technically impressive" with "practically valuable." The real test: will airports measure this in passenger satisfaction improvement, or just in social media mentions? If it's the latter, it's a marketing tool wearing an AI label. submitted by /u/Temporary_Layer7988 [link] [comments]
JSONL: The Operational Ledger Claude Didn't Tell You About
Somewhere along the way I discovered how Claude uses .jsonl files. Turns out they serve as forensic context, in a manner of speaking. The .jsonl files are the ground truth - every domain concept made concrete. Cache hits, tool calls, agent spawns, permission modes, compact boundaries, task lifecycles, all timestamped and queryable. Not theory: your actual sessions, proving the exam tests real production patterns you've already been running.

I'm studying to become a Claude Certified Architect. I feel like an idiot most of the time - like, how does this all work? Then I realized that Claude creates .jsonl files, which act as a ledger of everything Claude does in the background. If you're using Claude right now, here are the prompts:

Prompt 1: ❯ in 50 words tell me significance of .jsonl files in helping students understand how claude architect exam domain and task concepts work in practice in their everyday claude code use behind the scenes

Prompt 2: ❯ create a forensic token analysis and claude operational analysis using tables derived from information in the most recent .jsonl files so I can understand what’s happening underneath the hood of the most recent claude sessions; hooks, tool calls, token usage, agent actions, subagent SDK actions, context passing, etc.

Prompt 3: ❯ in 50 words tell me significance of .jsonl files in helping students understand how claude architect exam domain and task concepts work in practice in their everyday claude code use behind the scenes

Now you know what Claude is up to, not what it tells you it's up to.

BONUS: You know how all the VCs are asking to see your Claude transcripts? You can create a custom post-mortem based on the .jsonl files to augment your Claude transcript with actual operational and token-cost accounting data from those .jsonl files. submitted by /u/Ok_Dance2260 [link] [comments]
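To make the prompts concrete, here is a minimal sketch of the kind of ledger analysis Prompt 2 asks Claude to perform, done directly in Python. The record fields used here (`type`, `message.content`, `usage.output_tokens`) are illustrative assumptions - inspect your own .jsonl session files for the actual schema, which varies by version:

```python
import json
from collections import Counter

def summarize_session(lines):
    """Tally message types, tool calls, and output tokens from a session
    ledger. Every field access uses .get() because the real schema should be
    checked against your own .jsonl files."""
    types, tools, tokens = Counter(), Counter(), 0
    for line in lines:
        rec = json.loads(line)
        types[rec.get("type", "unknown")] += 1
        msg = rec.get("message") or {}
        for block in msg.get("content") or []:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                tools[block.get("name", "?")] += 1
        usage = msg.get("usage") or {}
        tokens += usage.get("output_tokens", 0)
    return types, tools, tokens

# Synthetic two-record session showing the shape of the output; in practice
# you would read the lines from a real session file instead.
sample = [
    json.dumps({"type": "assistant", "message": {
        "content": [{"type": "tool_use", "name": "Bash"}],
        "usage": {"output_tokens": 120}}}),
    json.dumps({"type": "user", "message": {"content": []}}),
]
types, tools, tokens = summarize_session(sample)
```

From counters like these you can build exactly the "tool calls, token usage, agent actions" tables the prompt describes.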
I built a CLI that installs MCP, skills, prompts, commands and sub-agents into any AI tool (Cursor, Claude Code, Windsurf, etc.)
Install Sub-agents, Skills, MCP Servers, Slash Commands and Prompts Across AI Tools with agent-add

agent-add lets you install virtually every type of AI capability across tools — so you can focus on what to install and where, without worrying about each tool's config file format.

https://preview.redd.it/kemovi39qitg1.jpg?width=1964&format=pjpg&auto=webp&s=b994b81f343ee01afdf23392e13e0d472c71a47d

It's especially useful when:

- You're an AI capability developer shipping MCP servers, slash commands, sub-agents, or skills
- Your team uses multiple AI coding tools side by side

You can also use agent-add simply to configure your own AI coding tool — no need to dig into its config file format.

Getting Started

agent-add runs directly via npx — no install required:

```bash
npx -y agent-add --skill 'https://github.com/anthropics/skills.git#skills/pdf'
```

agent-add requires Node.js. Make sure it's installed on your machine. Here's a more complete example:

```bash
npx -y agent-add \
  --mcp '{"playwright":{"command":"npx","args":["-y","@playwright/mcp"]}}' \
  --mcp 'https://github.com/modelcontextprotocol/servers.git#.mcp.json' \
  --skill 'https://github.com/anthropics/skills.git#skills/pdf' \
  --prompt $'# Code Review Rules\n\nAlways review for security issues first.' \
  --command 'https://github.com/wshobson/commands.git#tools/security-scan.md' \
  --sub-agent 'https://github.com/VoltAgent/awesome-claude-code-subagents.git#categories/01-core-development/backend-developer.md'
```

For full usage details, check the project README, or just run:

```bash
npx -y agent-add --help
```

Project & Supported Tools

The source code is hosted on GitHub: https://github.com/pea3nut/agent-add

Here's the current support matrix:

| AI Tool | MCP | Prompt | Skill | Command | Sub-agent |
|---|---|---|---|---|---|
| Cursor | ✅ | ✅ | ✅ | ✅ | ✅ |
| Claude Code | ✅ | ✅ | ✅ | ✅ | ✅ |
| Trae | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen Code | ✅ | ✅ | ✅ | ✅ | ✅ |
| GitHub Copilot | ✅ | ✅ | ✅ | ✅ | ✅ |
| Codex CLI | ✅ | ✅ | ✅ | ❌ | ✅ |
| Windsurf | ✅ | ✅ | ✅ | ✅ | ❌ |
| Gemini CLI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Kimi Code | ✅ | ✅ | ✅ | ❌ | ❌ |
| Augment | ✅ | ✅ | ✅ | ✅ | ✅ |
| Roo Code | ✅ | ✅ | ✅ | ✅ | ❌ |
| Kiro CLI | ✅ | ✅ | ✅ | ❌ | ✅ |
| Tabnine CLI | ✅ | ✅ | ❌ | ✅ | ❌ |
| Kilo Code | ✅ | ✅ | ✅ | ✅ | ✅ |
| opencode | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenClaw | ❌ | ✅ | ✅ | ❌ | ❌ |
| Mistral Vibe | ✅ | ✅ | ✅ | ❌ | ❌ |
| Claude Desktop | ✅ | ❌ | ❌ | ❌ | ❌ |

submitted by /u/pea3nut [link] [comments]
Used Anthropic's Economic Index data to build a career outlook tool for 756+ jobs
Spent the weekend building this with Claude Code (Opus for architecture, Haiku for the runtime generation). It combines Anthropic's Economic Index task-penetration data with O*NET job breakdowns and BLS employment projections to give each role a career outlook score. The interesting part technically: Haiku generates tailored narratives and task breakdowns per role on the fly, cached to Supabase after first generation. It rewrites generic O*NET task descriptions into role-specific language and fills in gaps where task categories are empty. For niche roles that aren't in the dataset, Haiku suggests the closest standard occupations from its training knowledge, fuzzy-matches them against the database, and blends the data to estimate a score. The data itself is pretty reassuring: mostly augmentation over replacement (which will help me sleep at night). Curious what this sub thinks of their own scores! Also very down for feedback as this is a lil' baby v1! submitted by /u/BowlerEast9552 [link] [comments]
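The fuzzy-matching step for niche roles can be approximated with the standard library; using `difflib` here is my assumption, and the role titles are hypothetical stand-ins, not the tool's actual database:

```python
from difflib import get_close_matches

# Hypothetical occupation titles standing in for the O*NET-backed database.
standard_roles = ["Data Scientists", "Software Developers",
                  "Web Developers", "Database Administrators"]

def match_role(suggested, known, cutoff=0.6):
    """Fuzzy-match a model-suggested occupation to the closest known titles,
    in the spirit of the post's niche-role blending step."""
    return get_close_matches(suggested, known, n=2, cutoff=cutoff)

matches = match_role("Data Scientist", standard_roles)
```

The scores of the matched standard occupations would then be blended to estimate a score for the niche role.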
We found that connection structure matters more than explicit memory for pattern retention - implications for memory architectures?
We've been running numerical experiments on how patterns persist on different geometric substrates (networks of connected nodes with simple local update rules). The setup is a toy model - not a neural network - but the finding might be relevant to how we think about memory and retrieval in graph-structured systems. The setup: A localised activation pattern (think: a 'blob of signal') evolves on a graph. At each step, each node carries forward some of its current state, reconstructs from its neighbours, and loses some to decay. We added an explicit "memory field" - a slowly decaying record of past activation that feeds back into the update. Then we swept two parameters: how long memory persists, and how strongly it feeds back. The key finding: On a Penrose tiling (an aperiodic graph with long-range order and no repeating structure), the native tile-edge connections already function as retained influence. Adding explicit memory barely helps - the graph structure is already doing memory's job. On periodic lattices and random graphs, explicit memory helps a lot, partially compensating for their less structured connectivity. The falsification test: We took the Penrose graph and randomly rewired all its edges while keeping each node's degree exactly the same (same positions, same degree distribution, scrambled connections). Result: At zero memory: rewired and native perform identically. Positions alone set the baseline. At maximum memory: native Penrose gains 0.23 in retention. Rewired gains 0.01. A 20:1 ratio. At high memory, the rewired graph actually performs WORSE than the periodic and random controls - memory through incoherent connections creates noise rather than reinforcement. The punchline: Positions set the floor. Connections set the ceiling. Memory is the mechanism that lets the system reach from floor to ceiling - but only if the connections encode structure. 
Destroy the structure (while keeping everything else identical) and memory becomes useless or actively harmful.

Why this might matter for ML: If you're building memory or retrieval systems on top of graph structures (knowledge graphs, retrieval-augmented generation, graph neural networks), this suggests that the topology of your connections might matter more than the strength or persistence of your memory mechanism. Well-structured connections might make explicit memory partially redundant; poorly structured connections might make additional memory actively counterproductive. This is a toy model and we're not claiming direct applicability to neural architectures. But the principle - that connection structure and memory are not independent design choices - is worth considering.

Code by Claude Opus 4.6, available on request. submitted by /u/Neat_Pound_9029 [link] [comments]
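The update rule the post describes (carry-forward, neighbour reconstruction, decay, plus a slowly decaying memory field that feeds back) can be sketched in a few lines. All constants and the 5-node ring are illustrative, not the experiment's actual parameters or graphs:

```python
def step(state, memory, adj, carry=0.5, rebuild=0.4, decay=0.02,
         mem_tau=0.9, mem_gain=0.2):
    """One update: each node carries forward part of its state, reconstructs
    from its neighbours' mean, loses a little to decay, and receives feedback
    from a slowly decaying memory field."""
    new_state = []
    for i, neigh in enumerate(adj):
        nb = sum(state[j] for j in neigh) / len(neigh) if neigh else 0.0
        s = carry * state[i] + rebuild * nb + mem_gain * memory[i]
        new_state.append(max(0.0, s - decay))
    new_memory = [mem_tau * m + (1 - mem_tau) * s
                  for m, s in zip(memory, new_state)]
    return new_state, new_memory

def run(steps=10, mem_gain=0.2):
    adj = [[4, 1], [0, 2], [1, 3], [2, 4], [3, 0]]  # 5-node ring graph
    state, memory = [1.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 5  # localized 'blob'
    for _ in range(steps):
        state, memory = step(state, memory, adj, mem_gain=mem_gain)
    return sum(state)  # retention: how much activation survives

no_mem = run(mem_gain=0.0)
with_mem = run(mem_gain=0.2)
```

Sweeping `mem_tau` and `mem_gain` across different `adj` topologies is the shape of the parameter sweep the post reports.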
AI hype burst - yet powerful
I started building an app (that nobody cares about) a long time ago, and I was so impressed that I just kept building, building, building, without realizing the number of bugs and lazy fallbacks AI was producing. My experience: I'd spend 3-5 weeks building a full-stack app, then the next stage was 2-3 weeks of debugging just to get it running, and then the debugging continued. I created agents, commands, and skills to counteract AI's tendency to implement lazy fallbacks, fake information, hallucinations, etc., but AI's persistence with all of these issues is so strong that I learned to live with it and constantly try to spot them as early as possible. I built a skill to run regularly on any of my codebases, published at https://www.reddit.com/r/ClaudeAI/comments/1s1a9tp/i_built_a_codebase_review_skill_that_autodetects/ . This skill was built on a concept learned from ML models: for every bug identified, 3 agents spawn, run separate validations, and present results for a vote; the decision is based on the winning votes, minimizing hallucinations. I was happy to find the skill working and fixing lots of issues; however, I then found an article about the power of AI hallucination, mentioning AI's capacity to identify non-existent bugs and introduce new bugs by fixing those non-existent bugs, oh dear! Can't find the link to the article, but if I find it again I'll share it. Next, I found another article about an experiment run by a Claude developer on harness design for long-running applications, which can be found at https://www.anthropic.com/engineering/harness-design-long-running-apps . This provided really good insights and concepts, including using Generative Adversarial Networks (GANs) and introducing the concept of context anxiety, which results in an expensive run but a codebase less prone to bugs (although not bug-free).
To get a sense of the cost, the table below compares running the prompt solo vs. using the harness system described in the article. https://preview.redd.it/14ko9se5yrrg1.png?width=1038&format=png&auto=webp&s=5ba1ea533bd71bd67a126cd4b516d63e76380d7b I am now trying to build an agentic system similar to the one described in the article, but with some improvements: addressing context management, leveraging the GAN idea during design and implementation, and extending the functionality so it can generate the system from more detailed high-level functional specs instead of short prompts, producing a more useful system after spending so many tokens. The system is not ready yet, but I might share it on GitHub if I get anywhere half decent. In conclusion, when I started working with AI I was so excited that I didn't realize the level of hallucination AI has. Then I spent days and weeks fixing bugs in code, then realized the bugs would never stop, while also realizing that all the apps I was developing were only useful for gaining experience; other people with far more AI understanding and experience, and organizations investing in AI, can and will surpass any app I'll ever create, which is a bit demoralizing. But I still stick with it, as I can use it to build personal projects and it keeps me professionally relevant (I hope). Finally, I've ended up in a place where I believe AI's full power is yet to come, and what we see today is a good preview of the capabilities AI will provide, as AI companies work hard to harness the silent failures and lazy fallbacks currently introduced during design and implementation. Has anybody experienced similar phases on the AI learning curve?
PS: This post was not generated by AI, since that seems to be heavily punished by people, and auto-moderators appear to block posts automatically when AI is detected; hopefully this one is not blocked. I apologize if the grammar, spelling, or structure isn't perfect, but I hope this post doesn't get blocked or punished by other people for being AI-generated, because it is not. Credit to Prithvi Rajasekaran for writing the interesting article about harness design for long-running application development: https://www.anthropic.com/engineering/harness-design-long-running-apps Happy Saturday everyone. submitted by /u/amragl [link] [comments]
[R] Controlled experiment: giving an LLM agent access to CS papers during automated hyperparameter search improves results by 3.2%
Ran a controlled experiment measuring whether LLM coding agents benefit from access to research literature during automated experimentation.

Setup: Two identical runs using Karpathy's autoresearch framework. Claude Code agent optimizing a ~7M-param GPT-2 on TinyStories. M4 Pro, 100 experiments each, same seed config. Only variable: one agent had access to an MCP server that does full-text search over 2M+ CS papers and returns synthesized methods with citations.

Results:

| | Without papers | With papers |
|---|---|---|
| Experiments run | 100 | 100 |
| Papers considered | 0 | 520 |
| Papers cited | 0 | 100 |
| Techniques tried | standard | 25 paper-sourced |
| Best improvement | 3.67% | 4.05% |
| 2hr val_bpb | 0.4624 | 0.4475 |

Gap was 3.2% and still widening at the 2-hour mark.

Techniques the paper-augmented agent found:

- AdaGC — adaptive gradient clipping (Feb 2025)
- sqrt batch scaling rule (June 2022)
- REX learning rate schedule
- WSD cooldown scheduling

What didn't work:

- DyT (Dynamic Tanh) — incompatible with architecture
- SeeDNorm — same issue
- Several paper techniques were tried and reverted after failing to improve metrics

Key observation: Both agents attempted halving the batch size. Without literature access, the agent didn't adjust the learning rate — the run diverged. With access, it retrieved the sqrt scaling rule, applied it correctly on first attempt, then successfully halved again to 16K.

Interpretation: The agent without papers was limited to techniques already encoded in its weights — essentially the "standard ML playbook." The paper-augmented agent accessed techniques published after its training cutoff (AdaGC, Feb 2025) and surfaced techniques it may have seen during training but didn't retrieve unprompted (sqrt scaling rule, 2022). This was deliberately tested on TinyStories — arguably the most well-explored small-scale setting in ML — to make the comparison harder. The effect would likely be larger on less-explored problems.

Limitations: Single run per condition. The model is tiny (7M params).
Some of the improvement may come from the agent spending more time reasoning about each technique rather than the paper content itself. More controlled ablations needed. I built the paper search MCP server (Paper Lantern) for this experiment. Free to try: https://code.paperlantern.ai Full writeup with methodology, all 15 paper citations, and appendices: https://www.paperlantern.ai/blog/auto-research-case-study Would be curious to see this replicated at larger scale or on different domains. submitted by /u/kalpitdixit [link] [comments]
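The sqrt batch-scaling rule the agent retrieved has a one-line form: learning rate scales with the square root of the batch-size ratio. A worked sketch with illustrative numbers (the experiment's actual hyperparameters aren't given in the post):

```python
from math import sqrt

def scale_lr(base_lr, base_batch, new_batch):
    """Square-root batch-size scaling: lr is proportional to sqrt(batch),
    so halving the batch scales the learning rate by 1/sqrt(2)."""
    return base_lr * sqrt(new_batch / base_batch)

# One halving (64K -> 32K), then the second halving to 16K that the post
# describes; base_lr is a made-up example value.
lr_32k = scale_lr(base_lr=6e-4, base_batch=64_000, new_batch=32_000)
lr_16k = scale_lr(base_lr=6e-4, base_batch=64_000, new_batch=16_000)
```

Skipping this adjustment while halving the batch is exactly the failure mode that made the no-papers run diverge.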
RAG is a trap for Claude Code. I built a DAG-based context compiler that cut my Opus token usage by 12x.
Hey everyone, If you’ve been using the new Claude Code CLI or building agents with Sonnet 3.5 / Opus on mid-to-large codebases, you’ve probably noticed a frustrating pattern. You tell Claude: "Implement a bookmark reordering feature in app/UseCases/ReorderBookmarks.ts." What happens next? Claude starts using its grep and find tools, exploring the codebase, trying to guess your architectural patterns. Or worse, if you use a standard RAG (Retrieval-Augmented Generation) MCP tool, it searches your docs for keywords like "bookmark" and completely misses the abstract architectural rules like "UseCases must not contain business logic" or "Use First-Class Collections". Because of this Semantic Gap, Claude hallucinates the architecture, writes a massive transaction script, and burns massive amounts of tokens just exploring your repo. I got tired of paying for Claude to "guess" my team's rules, so I built Aegis. Aegis is an MCP server, but it's not a search engine. It’s a deterministic Context Compiler. Instead of relying on fuzzy vector math (RAG), Aegis uses a Directed Acyclic Graph (DAG) backed by SQLite to map file paths directly to your architecture Markdown files. How it works with Claude: Claude plans to edit app/UseCases/Reorder.ts and calls the aegis_compile_context tool. Aegis deterministically maps this path to usecase_guidelines.md. Aegis traverses the DAG: "Oh, usecase_guidelines.md depends on entity_guidelines.md." It compiles these specific documents and feeds them back to Claude instantly. No guessing, no grepping. The Results (Benchmarked with Claude Opus on a Laravel project with 140+ UseCases): • Without Aegis: Claude grepped 30+ files, called tools 55 times, and burned 65.4k tokens just exploring the codebase to figure out how a UseCase should look. Response time: 2m 32s. • With Aegis: Claude was instantly fed the compiled architectural rules via MCP. Tool calls: 6. Output tokens: 1.8k. Response time: 43s. 
That's a 12x reduction in token waste and a 3.5x speedup. More importantly, the generated code actually respected our architectural decisions (ADRs) because Claude was forced to read them first. It runs 100% locally. If you want to stop hand-holding Claude through your architecture and save on API costs, give it a try. GitHub: https://github.com/fuwasegu/aegis I'd love to hear your thoughts or feedback! Has anyone else felt the pain of RAG when trying to enforce strict architecture with Claude? submitted by /u/fuwasegu [link] [comments]
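The deterministic path-to-doc mapping plus DAG traversal that the post describes can be sketched like this; the rule table and doc names are hypothetical, not Aegis's actual configuration:

```python
# Hypothetical path rules and doc dependency edges, illustrating the idea.
PATH_RULES = {
    "app/UseCases/": "usecase_guidelines.md",
    "app/Entities/": "entity_guidelines.md",
}
DOC_DEPS = {  # edges of the DAG: doc -> docs it depends on
    "usecase_guidelines.md": ["entity_guidelines.md"],
    "entity_guidelines.md": ["architecture_overview.md"],
    "architecture_overview.md": [],
}

def compile_context(path):
    """Deterministically map a file path to its guideline doc, then walk the
    dependency DAG depth-first so prerequisites come before dependents.
    No embeddings, no fuzzy matching: the same path always yields the same
    compiled context."""
    root = next((doc for prefix, doc in PATH_RULES.items()
                 if path.startswith(prefix)), None)
    if root is None:
        return []
    ordered, seen = [], set()

    def visit(doc):
        if doc in seen:
            return
        seen.add(doc)
        for dep in DOC_DEPS.get(doc, []):
            visit(dep)  # dependencies first
        ordered.append(doc)

    visit(root)
    return ordered

docs = compile_context("app/UseCases/Reorder.ts")
```

The server would then read those markdown files in order and return their concatenated contents as the compiled context, replacing the grep-and-guess exploration phase.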
Sharing some of my personal trade secrets for augmenting claude code
I'm tired of introducing myself and trying to demonstrate my projects. Here are my secrets to coding success (opinionate as you will, this is what I do):

1: OPINIONATED STACK
I tell my agents to use an opinionated stack, pointing them to https://anentrypoint.github.io/fast-stack/ . I have my reasons for doing that; it saves me time and money.

2: JIT EXECUTION
I use just-in-time code execution, with a special workaround to reduce code-encapsulation strain on the agent. To accomplish this I parse bash statements pre-tool-use and convert them to CLI statements executed in my own agent-optimized lifecycle manager called gm-exec (my code execution workaround).

3: 1-SHOT OVERVIEWS
I provide additional (compact) context when the user prompts, using another tool I maintain called mcp-thorns, exposing many caveats in the code tree early on. It's built with tree-sitter and produces a one-shot string that describes the codebase in a compacted way (mcp-thorns delivers codebase analysis on the first prompt).

4: SEMANTIC SEARCH
I provide a simple, local, semantic codebase search with another tool called codebasesearch; it provides a nice compact vector-based search.

5: CLOSED-LOOP TESTING
Using JIT execution (mentioned above), I get the agent to test its ideas without editing the codebase, and when it's finished, validate them client- and server-side with further code execution.

6: CONTEXT REDUCTION
I don't install tools that I don't need. The agent gets a tested skill tree that explicitly tells it to follow the necessary steps that I've personally found to work across all the projects I maintain. When it comes to prompt/skill maintenance, I never let the agent know what my gripes are; I only let it know once I've come up with a novel solution that's worth trying.
My workflow for constructing skills: I let agents initially transpose coding philosophies from preferred codebases with the properties we need to employ, use frontier models to iterate on hyperparameter-benchmarking projects like WFGY for ideas, then iterate on what they come up with in real programming scenarios, sharpening the tools we originally created on provider front pages.

7: REPO REDUCTION (the only alternative to context reduction)
To get better overall agent output, I deduplicate all concerns, including cross-domain duplications such as unit tests (replaced with the agentic closed-loop testing mentioned above), comments, documentation, and specs.

I maintain my agentic research project (and coding daily driver) using these coding tools, mostly Claude Code and Claude. https://github.com/AnEntrypoint/plugforge is the factory repo, and it currently builds out to 10 other repos. Its Claude Code output receives the most testing, which filters down to other platforms, primarily opencode and kilo cli. submitted by /u/moonshinemclanmower [link] [comments]
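The pre-tool-use interception in step 2 relies on the author's private gm-exec tool, so as a generic stand-in, here is a sketch of a hook function in the same slot that inspects a Bash command before it runs. The event shape (`tool_name`, `tool_input.command`) and the exit-code convention (0 allows, 2 blocks with the message fed back to the agent) follow Claude Code's hooks documentation as I understand it; verify both against your version:

```python
import json

def pre_tool_use(event):
    """Decide whether to allow a Bash call. A hook script would read `event`
    as JSON from stdin, print the message to stderr, and exit with the
    returned code; this function isolates just the decision logic."""
    if event.get("tool_name") != "Bash":
        return 0, ""  # not a Bash call: allow untouched
    cmd = event.get("tool_input", {}).get("command", "")
    if "rm -rf" in cmd:
        # A real setup would rewrite/route the command instead of blocking.
        return 2, "Blocked: destructive command; use the managed runner instead."
    return 0, ""

# Example event, in the shape a hook receives on stdin.
event = json.loads('{"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}}')
code, msg = pre_tool_use(event)
```

Blocking is the simplest behavior to demonstrate; converting bash statements into managed CLI invocations, as the post describes, would happen at the same interception point.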
I maintain an open-source library of 181 agent skills. I would like your criticism and your opinion on what is missing
Hey everyone 👋 The beauty of open source is that the best ideas come from users, not maintainers. I have been heads-down building for months, and now I want to come up for air and hear what the community actually needs.

I'm Reza (a regular CTO). I maintain claude-skills, an open-source collection of 181 agent skills, 250 Python tools, and 15 agent personas that work across 11 different AI coding tools (Claude Code, Cursor, Windsurf, Codex, Gemini CLI, Aider, Kilo Code, OpenCode, Augment, Antigravity, and OpenClaw). I am also thinking about extending the skills to Replit and Vercel. The link to the repo: https://github.com/alirezarezvani/claude-skills

In the last two weeks, the repo went from ~1,600 stars to 4,300+. Traffic exploded: 20,000 views/day and 1,200 unique cloners daily. I am really surprised by the attention the repo gets :) and very happy and proud, btw. But I am not here to flex numbers. I am here because I think we as a community are approaching skills wrong, and I want to hear what you think.

The Problem I Keep Seeing

Most skill repos (including mine, initially) treat skills as isolated things. Need copywriting? Here is a skill. Need code review? Here is another. Pick and choose. But that is not how real work happens. Real work is: "I'm a solo founder building a SaaS company. I need someone who thinks like a CTO, writes copy like a marketer, and ships like a senior engineer, and they need to work together." No single skill handles that. You need an agent with a persona that knows which skills to reach for, when to hand off, and how to maintain context across a workflow.

What I Am Building Next

Persona-based agents: not just "use this skill," but "here's your Startup CTO agent who has architecture, cost estimation, and security skills pre-loaded, and thinks like a pragmatic technical co-founder."
(A different approach than agency agents.)

Composable workflows: multi-agent sequences like "MVP in 4 Weeks," where a CTO agent plans, a dev agent builds, and a growth agent launches.

Eval pipeline: we're integrating promptfoo so every skill gets regression-tested. When you install a skill, you know it actually works, not just that someone wrote a nice markdown file.

True multi-tool support: one `./scripts/install.sh --tool cursor` and all 181 skills convert to your tool's format. This already works for 7 tools.

What I Want From You

I am asking, not farming engagement:

Do you use agent skills at all? If yes, with what tool: Claude Code? Cursor? Something else?
What is missing? What skill have you wished existed but could not find? What domain is underserved?
Personas vs. skills: does the agent approach resonate? Would you rather pick individual skills, or load a pre-configured "Growth Marketer" agent that knows what to do?
Do you care about quality guarantees? If a skill came with eval results showing it actually improves output quality, would that change your decision to use it?
Which tool integrations matter most? We support 11 tools, but I want to know which ones people actually use day-to-day.

Drop a comment, roast the approach, suggest something wild. I am listening. Thanks, Reza

submitted by /u/nginity
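The promptfoo eval pipeline mentioned above could look something like a per-skill config file; the skill path, provider id, fixture, and assertions below are illustrative assumptions, not the repo's actual setup.

```yaml
# promptfooconfig.yaml - hedged sketch of a per-skill regression eval.
# Paths, provider, and assertion values are placeholders.
prompts:
  - file://skills/code-review/SKILL.md
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      diff: file://fixtures/sample.diff
    assert:
      - type: contains
        value: severity
      - type: llm-rubric
        value: Flags the off-by-one bug without inventing extra issues
```

Running `promptfoo eval` against a config like this before publishing would let a failing assertion block a skill release the same way a failing unit test blocks a merge.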
Pricing found: $20/month, $60/month, $200/month
Key features include: Implement; Review; Plan, then execute; Remember what matters; Prompts, enhanced; Commit history; Codebase patterns; External sources.
Based on user reviews and social mentions, the most common pain points are: token usage, token cost, API costs.
Based on 17 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.