Your engineers have agents. Your organization doesn't. Cosmos is the platform that closes the gap.
Users generally appreciate Augment Code for its effective integration with AI programming tools, like Claude, enhancing coding capabilities and facilitating complex projects. However, there are concerns about occasional bugs and the software not fully meeting user expectations in execution consistency. Pricing sentiment seems neutral, as there are few mentions or complaints about costs in reviews. Overall, Augment Code has a solid reputation for being a powerful tool, especially in data analysis and AI reasoning frameworks, but it needs more refinement to address performance issues reported by some users.
Mentions (30d)
12
Reviews
0
Platforms
2
Sentiment
20%
7 positive
Features
Use Cases
Industry
information technology & services
Employees
75
Funding Stage
Venture (Round not Specified)
Total Funding
$252.0M
Pricing found: $20 /month, $60 /month, $200 /month, $20, $60
If you've ever wondered how rigorous data analysis+social science research can look with AI, I've finally launched a nice website for my open-source Claude Code researcher's toolkit: the Data Analyst Augmentation Framework! Equal parts interactive explainer on agentic orchestration + free tool
submitted by /u/brhkim [link] [comments]
Claude Got Gaslit by a Discord Bot
Lol submitted by /u/Comprehensive-Bet-83 [link] [comments]
Claude Full Stack 2.0 – 80+ Production-Grade Claude Skills
Hey r/ClaudeAI,

Over the past few weeks I’ve turned my experiments with Claude into something much more ambitious: Claude Full Stack 2.0, a structured, production-oriented collection of AI engineering skills and end-to-end workflows. Instead of treating AI as a fancy chatbot, this repository turns Claude into a real AI-augmented software engineering operating system that can help you go from idea all the way to production.

What’s inside:
- 80+ skills organized into:
  - Technology-agnostic architecture decision domains (skills/architecture/)
  - Ecosystem-specific implementations (skills/implementations/): Spring Boot, FastAPI, Node.js, React, Flutter, Postgres, Kubernetes, AWS, Terraform, GitHub Actions, etc.
- Strong focus on DevOps, SRE, observability, security, and production readiness
- Clean standards, architecture patterns, quality gates, and consistent documentation
- Now available as an installable Claude Code plugin

Useful for:
- Founders building MVPs
- Developers & indie hackers

The entire repo is open source under the MIT license. Contributions and feedback are very welcome!

Repository: claude-full-stack-2.0
submitted by /u/Past-Pirate3335 [link] [comments]
Where I'm at with AI Assisted Building + Current and Future Workflow Overview
I've been in an AI dive bomb for probably a couple of years now, since the early days when models couldn't be trusted for more than 5% of the code you wrote. Over the last 2 years that's evolved so quickly that I now write nearly 0% of my code by hand, on personal projects and at work. I've used all kinds of tools in that time too: OpenCode, Zed, Claude Code, Codex, Cursor, Windsurf, OpenCLAW, Lovable, and probably a bunch more I can't recall in the haze that's been AI ADHD for me. I started with just copy-pasting code between ChatGPT's interface and my IDE, almost like a slightly faster Stack Overflow search. That evolved quite a bit with Cursor: I went from prompt engineering to something closer to a human relay pattern. Then, with Plan Mode becoming a thing, I naturally gravitated towards planning everything, because planning felt so cheap. I used to think that architectural discussion and planning were reserved for larger features, but once I could research faster, orient myself within a codebase, and know which tools to reach for, writing technical specifications for everything felt reasonable. From the human relay pattern I evolved toward more autonomy, especially when Claude Code came out earlier last year. With the combination of Cursor and Claude Code I started orchestrating, using skills more heavily, and creating actual agent personas that could replace some of my common prompt chains. It was around then that I started going all in on true context engineering, utilizing sub-agents and optimizing cache reads, and it's probably when many of my first (what I call) sophisticated commands were born. All of this converged pretty rapidly in November of 2025 with what was probably the biggest step increase in AI code quality, the release of Opus 4.5 and Codex 5.3. The Codex app and Codex CLI were quickly growing.
Claude Code was improving at a breakneck pace, introducing all kinds of new ways to add deterministic gates within the autonomy of the harness. Fast forward to today: I have a pretty sophisticated workflow with a combination of agents that do everything within the SDLC, commands for almost every type of entry point for work, and skills for just about everything I could possibly do in my day-to-day. With some of the latest tools, the workflow can run quite autonomously overnight and do large feature implementations, minimally supervised, while producing production-worthy code quality. Probably a month and a half ago or so, I realized I had reached a point where I needed to figure out a way to remove myself even more from the loop without jeopardizing the determinism that I bring to what is effectively a probabilistic LLM. The models are exceptional, and they seem to have a massive step increase each release, but continuous execution, strict instruction rigor, and preventing hallucinations are still very difficult to achieve. That's predominantly what I've been doing. I've effectively offloaded a lot of thinking to the agents and LLMs that I use, but none of the understanding. I've asked myself, "How do I maintain that understanding, and maintain the determinism from my steering, without actually physically being there to steer?" This was essential, and I had a bit of an aha moment: it's just like how I manage teams of engineers working on numerous projects, most of which I can never really go too deeply on. Even though they do most of the thinking, most of the building, and even most of the implementation planning, I am still there, very close to the architecture. I can speak to enough breadth and enough depth to keep us out of trouble and keep things moving. I started thinking more about what the shape of me was within the agentic harness and how I could replicate that. More on what I landed on a little bit later.
My Setup and How I Work Today
To start, I'll talk a little bit about my current working setup. I am predominantly in the terminal nowadays, using Claude Code. Claude Code orchestrates the Claude models, of course, and I also use it to orchestrate Codex through a series of runbooks, skills, and commands that I have set up on several hooks, so that Codex, when it gets dispatched, has access to the same skills and agent personas Claude does. I use Ghostty as my terminal of choice and use the IDE integration in Claude Code pretty heavily to review Markdown or HTML files in my IDE. I also use it to review code snippets and diffs, although lately I find myself only really looking at the code once it's hit a merge request. Some of my adjacent tools are Wispr Flow for faster steering, since I can speak a lot faster than I can type, and quite a few MCPs and tools to improve my token usage, but the big ones are I have a custom doc maintenance suite of
Arkon: turning Claude from a personal chatbot into a managed organizational resource
Sharing a project I've been building. Not asking for anything in particular - just thought the problem and approach might be interesting to some folks here.

The problem
Most companies adopting LLMs hit the same wall: every employee uses ChatGPT or Claude individually, copy-pastes confidential docs into random chats, and the org has zero visibility or control. The "AI rollout" is really just a license purchase plus a prayer. On the other end, the heavy enterprise solutions (custom RAG platforms, Glean-style tools) are expensive, complex, and overkill for most mid-sized teams. There's a missing middle: small-to-medium organizations that want their employees to use Claude productively, but with proper access control, shared knowledge, and no manual context-pasting every single time.

The approach
Arkon sits between the org and Claude. Admins manage knowledge centrally. Employees connect to Arkon via MCP (Model Context Protocol) and automatically get the right context for who they are, without configuring anything. Two realms:
- Global Knowledge - org-wide docs and wiki, scoped by department. A finance person sees finance docs, an engineer sees engineering docs. Admins decide who sees what.
- Workspaces - smaller scopes for projects, teams, or cross-functional initiatives. Membership-gated. Your global role doesn't bleed into workspaces - you only see workspaces you're a member of.
The MCP integration means employees keep using Claude the way they already do (Claude Desktop, Claude Code, whatever client they prefer). They don't learn a new tool. They just suddenly have org context available when they need it.

How wiki generation actually works
This is the part I think is interesting and slightly different from typical RAG setups. Arkon isn't a retrieval-augmented chatbot. It's an LLM-generated wiki layer. When you upload a document - say a 300-page handbook - Arkon uses an LLM to analyze the structure and produce a hierarchical wiki. If the source has clear headings, the wiki follows them. If not, the LLM clusters content by topic semantically. The output is a browsable, organized internal reference, not a linear summary. I'm honest with users about the tradeoff: LLM-generated content has no guarantee of accuracy, especially for deep domain material. So there's a human-in-the-loop layer in the roadmap - employees can flag, annotate, and edit wiki content. The LLM does the organizational heavy lifting; humans own final correctness.

Permissioning lessons learned
The biggest design pivot so far: I initially had roles carry both what you can do and what you can do it on in one bag. This led to a classic bug - give a user "read documents" and suddenly they could read every document in the org, ignoring department scope. Fixed it by splitting cleanly:
- Permissions are scoped strings: doc:read:own_dept vs doc:read:all
- Workspaces are pure membership checks - global roles cannot grant workspace access, ever
- Two realms, fully independent
If anyone is building org-level permission systems, that separation is worth getting right early. Retrofitting it is painful.

Repo: github.com/nduckmink/arkon
Happy to answer questions about architecture, MCP integration, or the permission model. Feedback and criticism welcome - especially from anyone who has built or used internal knowledge systems and seen what works and what doesn't.
submitted by /u/Glass-Statistician97 [link] [comments]
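The permission split described in the post can be sketched roughly as follows. This is a minimal illustration, not Arkon's actual code: the `User` and `Document` classes and both function names are hypothetical, and only the scoped strings `doc:read:own_dept` and `doc:read:all` come from the post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    # Hypothetical document record: only the owning department matters here.
    doc_id: str
    department: str


@dataclass
class User:
    # Hypothetical user record; the field names are assumptions for this sketch.
    name: str
    department: str
    permissions: set = field(default_factory=set)  # scoped strings, e.g. "doc:read:own_dept"
    workspaces: set = field(default_factory=set)   # ids of workspaces the user is a member of


def can_read_document(user: User, doc: Document) -> bool:
    """Global Knowledge realm: the scope suffix decides which documents qualify."""
    if "doc:read:all" in user.permissions:
        return True
    if "doc:read:own_dept" in user.permissions:
        return doc.department == user.department
    return False


def can_access_workspace(user: User, workspace_id: str) -> bool:
    """Workspace realm: membership only; global permissions are never consulted."""
    return workspace_id in user.workspaces


alice = User("alice", "finance", {"doc:read:own_dept"}, {"q3-close"})
admin = User("admin", "it", {"doc:read:all"})
budget = Document("budget", "finance")
runbook = Document("runbook", "engineering")

print(can_read_document(alice, budget))        # own department: allowed
print(can_read_document(alice, runbook))       # other department: denied
print(can_access_workspace(admin, "q3-close")) # broad global scope still grants no workspace access
```

Keeping the action and its scope in one string means a bare "read documents" grant cannot exist, which is the bug the post describes; the workspace check has no code path that reads `permissions` at all.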
What's new in CC 2.1.124 (+166 tokens) and CC 2.1.126 (-87 tokens)
CC 2.1.124
- NEW: System Reminder: File modification detected (budget exceeded): Tells the agent when a user or linter changed a file but the diff was omitted because other modified files already exceeded the snippet budget, and directs it to read the file if current content is needed.
- System Prompt: Harness instructions: Replaces the core-identity function call with explicit introductory-line and security-note insertion points before the shared harness instructions.
- System Prompt: REPL tool usage and scripting conventions: Clarifies that thenable shorthand results are auto-awaited only at return time, so inline uses such as concatenation, templates, or arguments to another call must be awaited first.
Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.124

CC 2.1.126
- REMOVED: System Reminder: Malware analysis after Read tool call: Removed the reminder that asked agents to consider whether each file read is malware and to analyze malware without improving or augmenting it.
Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.126
submitted by /u/Dramatic_Squash_3502 [link] [comments]
Releasing the Data Analyst Augmentation Framework (DAAF) version 2.1.0 today -- still fully free and open source! In my very biased opinion: DAAF is now finally the best, safest, AND easiest way to get started using Claude Code for responsible and rigorous data analysis
https://preview.redd.it/o74lppqd86zg1.png?width=1456&format=png&auto=webp&s=3a904bae42b8130e2c6382be55debe8f6ef4d6ca When I launched the Data Analyst Augmentation Framework v2.0.0 six weeks ago, I wrote that the major update was about going “from usable to useful” -- rebuilding the orchestrator system for maximum flexibility and efficiency, adding a variety of more responsive engagement modes, and deepening the roster of methodological knowledge that DAAF could pull upon as needed for causal inference, geospatial analysis, science communication and data visualization, supervised and unsupervised machine learning, and much, much more. But while DAAF continued to get more capable and more useful for those actually using it… Well, it was still extremely annoying to use, generally obtuse, and hard to get started with, which means a lot of people who were interested were simply bouncing off of it. That all changes with the v2.1.0 update, which I’m cheekily calling the Frictionless Update for three key reasons: 1. Installation happens in one line now From a fresh computer to talking with a DAAF-empowered Claude Code in no more than ten minutes on a decent internet connection. This is really it: https://preview.redd.it/tiglwl3f86zg1.png?width=1038&format=png&auto=webp&s=3ec92cf797af5e0b91a2d46ef8cfb2976cbff802 Which means it’s easier than ever to get started with Claude Code and DAAF in a highly curated, secure environment. To that point, you still need Docker Desktop installed (I’ll talk about that more in a sec), but no more faffing about with a bunch of ZIP file downloads and commands in the terminal. The simplicity of this is even crazier, given that… 2. 
DAAF now comes bundled with everything you need to make it your main AI-empowered research environment
No more messing around with external programs, installations, extensions, etc. It just works from the get-go, with everything you need to thrive in your new AI-empowered research workflows with Claude from the moment you run the install line.
https://preview.redd.it/q3pdj36g86zg1.png?width=1456&format=png&auto=webp&s=56ed822da68e773a9b7253ce6aa5a95abc057788
Thanks to code-server, DAAF automatically installs a fully-featured version of VSCode in the container, accessible in your favorite browser: file editing, version control management, file uploads and downloads, markdown document previews, smart code editing and formatting, the works. Reviewing and editing whatever you work on with DAAF has never been easier. DAAF also now comes with an in-depth and interactive session log browser that tracks everything Claude Code does every step of the way. See its thinking, what files it loads and references, which subagents it runs, and look through any code it has written, read, or edited across any project/session/etc. Full auditability and transparency is absolutely mission-critical when using AI for any research work, so you can truly verify everything it's doing on your behalf and form a much more refined and critical intuition for how it works (and how/when/why it fails!). One of the most important failure modes I’ve discovered with AI assistants (DAAF included) is that they simply don’t load the proper reference materials or follow workflow instructions; this is the single most important diagnostic tool to identify and fight said issues, and I frankly think everyone should be using something like it in any context with LLM assistants. This took a lot of elbow grease, but I think it’s the single most important thing I could do to help people actually understand what the heck Claude Code gets up to and review its work more thoroughly.
https://preview.redd.it/jkocy45h86zg1.png?width=1456&format=png&auto=webp&s=6848b5a01ef958fa051a3246a1e6b13beef91e80 These two big new bundled features are in addition to installing Claude Code, the entire DAAF orchestration system, bespoke references to facilitate Claude’s rigorous application of pretty much every major statistical methodology you’ll need, deep-dive data documentation for 40+ datasets from the Urban Institute Education Data Portal, curated Claude permissioning systems and security defenses, automatic context and memory management protocols designed for reproducible research workflows, and a high-performance and fully reproducible Python data science/analysis environment that just works -- no need to worry about dependencies, system version conflicts, or package management hell. https://preview.redd.it/wzaotr5i86zg1.png?width=1456&format=png&auto=webp&s=91390402dfe3666a90472f6e878364ddcd1fb740 With the magic of Docker, everything above happens instantly and with zero effort in one line of code from your terminal. And perhaps most importantly (and why I will keep dying on the hill of trying to get people to use Docker): setting up DAAF and Claude Code in this Docker environment offers critical guardrails (like firewalling off its file access to only those things you explicitly allow) and security (like creating a convenient sy
half-deployed AI projects haunt my github
Got 47 repos that start with 'just playing with Claude' or 'testing Llama 4 on'. Every single one dead after three commits. Like you get this spark, right? Midnight scrolling leads to some random implementation of retrieval-augmented generation for your personal notes. Brain goes full steam. You're already planning the deployment pipeline while pip installing transformers. Then day two hits. The model's hallucinating your grocery lists into poetry (weirdly beautiful but useless). Your GPU's crying. And suddenly you remember you have actual work that pays actual money. But here's the thing that gets me. These aren't just abandoned experiments, they're digital ghosts of pure optimism. Each one represents that exact moment when everything seemed possible, when you thought you'd crack the code this time, when the future felt close enough to touch. Now I scroll past them looking for that one functional script I actually need. Graveyard of good intentions, all named some variation of 'ai-helper-v2-final-actually-final'. Anyone else got a git log that reads like a museum of broken dreams? submitted by /u/NefariousnessLow9273 [link] [comments]
Recommended Plugins/Tooling/Tips for managing Ansible ( Code Base Hygiene/Documentation Management/Workflow) via Claude?
I'm a Linux Sysadmin rather than a Dev, and I have recently discovered how much Claude has levelled up, and can see many different ways it can help not just with code writing and debugging but also with workflow optimisation and admin toil. I work mainly in Ansible for automation and have one primary git repo for my codebase at work; we're a relatively small team/environment. I work in quite a toil-heavy, reactive environment and have had a creeping documentation backlog for the last few months. How I'm planning to use Claude is to:
1. Analyse my code base, track down inconsistencies and errors, and flag potential security risks.
2. Also hook into my AWX server's API and other APIs to gather information on the setup there. (Both of the above will then form the basis of a scripted weekly team code hygiene report.)
3. Read my existing documentation to get an idea of document template structure, formatting and my writing style.
4. Whilst it is doing all the above, maintain ongoing tracking and recording of pertinent reference information on coding style and standards, in-use conventions and code structures, cross-referenced with information in the docs, to build a cohesive technical understanding of my code base.
5. Leverage this to draft process documents, fed back into Claude to further clarify and improve its understanding (for values of LLM).
6. As I am working with it on new projects and actively discussing design choices, further use this context in fresh documentation, with any changes in process or standard config then backported to other common areas of code and documentation, to ensure I have a coherent whole at both the technical and documentation level.
7. Further branch out my documentation into standards and processes, training materials, reference guides for dev teams and other stakeholders, quick reference materials, you name it.
It's light years ahead of Copilot/ChatGPT in terms of depth of technical comprehension for troubleshooting and debugging in and out of code (again for values of LLM), but I'm actually even more excited about its potential as a workflow optimisation tool. This is not only going to help dig me out of my current toil backlog but fill in the hole and concrete over it afterwards. I've been optimising my setup to be token efficient already and have created a number of dynamically loading custom skills, such as a coding-mode that loads all my technical conventions, coding best practices and structure templates, a doc-mode that loads comprehension within the scope of documentation writing, and other skills for updating files containing Claude's tracking of any changes, and another for triggering consistency checks across multiple documents. I am however relatively unfamiliar with the wealth of 3rd party plugins and other tooling to augment Claude, so my question is: can anybody recommend any extra tooling or features out there that I might use to further leverage or optimise what I'm trying to achieve here, or otherwise offer any useful tips or suggestions I may not be aware of, before I go reinventing any wheels too much? Thanks in advance! submitted by /u/motorleagueuk-prod [link] [comments]
How to give Claude Code 'Cursor AI' goggles
Recently used Cursor AI (free tier for 3 free queries a month) to resolve an issue in 10 mins that Claude Code Opus could not resolve in 2 hours. Simple reason was that Cursor quickly got a grasp on meaningful end-to-end parity relationships across my entire codebase and quickly hunted down the culprit. I was impressed, and then I had questions. Cursor charges almost the SAME sub cost $ as Claude Code, yet it is NOT an LLM. It's a bunch of powerful proprietary toolsets designed to make your LLM "see" your code correctly. Cursor is a "holistic" augmented IDE that uses real-time indexing and background linting to assist your active coding flow, blah blah blah. Claude Code on the other hand is a top-down autonomous agent that plans and executes sequentially. They both do the same 'sort' of thing but try to get to similar results very differently. Disclaimer: by the way, CC is way more useful and powerful overall, let's not kid ourselves. Being the 'resourceful' person I like to pretend I always am, I tried to approximate this type of capability in Claude Code. Here's what I got below. PS I used AI to format this table and content below so don't drag me over the coals.

| MCP Server | Functional Benefit | Cursor AI Equivalent |
| --- | --- | --- |
| mcp-code-search | Semantic Index: Maps the "meaning" of your code so you can search for concepts (e.g., "how we handle phase") rather than just exact text. | @Codebase / Semantic Search |
| lsp (via clangd) | Symbolic Map: Understands the "laws" of C++. It traces ripples, finds every reference of a function, and jumps to definitions with 100% precision. | "Go to Definition" / Symbol Indexing |
| mcp-memory | Persistent Brain: Remembers architectural decisions and project rules across different days and sessions so I don't have to "re-learn" your project. | (Cursor lacks persistent memory) |
| filesystem | Direct Access: Gives me high-speed read/write access to your local project folders without me having to "ask" for file contents repeatedly. | Integrated Explorer |
| sequential-thinking | Logic Scratchpad: Allows me to break down complex bugs (like your IPC state-machine issues) into steps before I touch a single line of code. | "Advanced Reasoning" mode |

I used Opus to run some comparison tests and apparently I am at about 70-80% functional parity with Cursor AI, although that's hard to actually quantify. I also ask it stuff at the conclusion of my conversation like 'how much longer would this have taken you without the so-and-so MCPs / Cursor AI powers you've now got?' and get mostly very positive 'reviews' from Claude Code and comparative proof (which are really just estimations, I know!).

Few more notes
- Use Claude Code itself to install/configure these MCPs. You'll save yourself a lot of stuffing around, TRUST ME!
- Use a Post-Edit Re-index Hook to keep your data fresh (avoids having to remember to reindex your codebase manually every new session).
- Update your claude.md file to prioritise your nav tools so that it can take advantage of your newly added search tools (example-only text below):
  Navigation: LSP first, then MCP (`juce-docs`, `memory`, `code-search`), then Grep/Glob as fallback.

What I have personally noticed in 4 weeks of use?
Let me preface by saying I know my codebase, and I've got a good grasp on what is considered implementation 'success' for MY project and what baseline methods I used to help CC get me there as accurately and fast as possible for the last 6 months. What have I noticed now? Snappier, more contextual processing and graph-based searching of my codebase (no blind grepping: it actually 'walks the graph' rather than doing a keyword search, and jumps to relevant files rather than scanning my whole repo every time), better ripple edits (less guessing, and it quickly detects cross-file impact), better total hit rates, more tailored, targeted responses, plus just peace of mind that I've got that 'extended' type of capability when and if helpful.
I'm sure at least some of this is placebo, but if I trust Opus to help me write entire applications then I should technically also be taking it at face value when it's outright telling me that these tools have proven measurably useful in getting faster, more accurate results at the end of the session. Anyway, thought to post here in case someone else was interested in giving it a go and seeing what mileage they may get out of it. Peace..... submitted by /u/ThesisWarrior [link] [comments]
Codebase-scale retrieval using AST-derived graphs + BM25: reducing LLM context from 100K to 5K tokens [D]
Wanted to share an approach I've been using for retrieval-augmented generation over large codebases and get feedback from people thinking about similar problems.

The problem
Naive codebase RAG typically works by chunking files into text segments and embedding them for similarity search. This breaks down on code because semantic similarity at the chunk level doesn't capture structural relationships: a function in file A calling a type defined in file C won't surface that dependency through embedding proximity alone.

The approach: AST-derived typed graphs
Instead of chunking, I parse every file using Tree-sitter into its AST, then extract a typed node/edge graph:
- Nodes: functions, classes, interfaces, types, modules
- Edges: imports, exports, call relationships, inheritance, composition
This gets stored in SQLite as a persistent graph. Parse cost is one-time per project.

Retrieval: BM25 over graph nodes
At query time, instead of embedding similarity, I run BM25 scoring over node metadata (names, signatures, docstrings, file paths). Top-scoring nodes get passed to the LLM. The graph structure means a retrieved function automatically pulls in its direct dependencies via edge traversal. Empirically this lands at ~5K tokens per query on medium-large codebases that would otherwise require ~100K tokens with naive full-context approaches.

Hierarchical fallback for complex queries
For multi-file reasoning tasks:
- A Mermaid diagram of the full graph serves as a persistent architectural map, always in context
- BM25 node retrieval handles targeted lookup
- At 70% context capacity, a fast model compresses least-relevant nodes before passing to the primary model

Why BM25 over embeddings here
Code identifiers (function names, type names, module paths) are highly distinctive lexically. BM25 outperforms embedding similarity on exact and near-exact identifier matching, which is the dominant retrieval pattern in code queries. Embeddings would likely help more for natural language docstring queries; I haven't benchmarked that comparison rigorously yet.

Open questions I'm still thinking about:
- Better edge-weighting strategies for the graph (currently all edges are unweighted)
- Whether re-ranking with a cross-encoder would meaningfully improve precision over BM25 alone
- Handling dynamic languages where call graphs can't be fully resolved statically

Has anyone tackled codebase-scale RAG differently? Particularly curious if anyone's compared AST-graph approaches against embedding-based chunk retrieval on real codebases with quantitative benchmarks.
submitted by /u/Altruistic_Night_327 [link] [comments]
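To make the retrieval step concrete, here is a minimal sketch of BM25 scoring over node metadata followed by one hop of edge traversal. It is not the poster's implementation: the Tree-sitter parsing and SQLite storage are replaced by two small in-memory dictionaries, the node names and edges are invented for illustration, and the BM25 variant (Okapi with a non-negative IDF, `k1=1.5`, `b=0.75`) is an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical graph nodes: node id -> searchable metadata (name, signature, path).
NODES = {
    "auth.login": "function login(user, password) authenticate user session auth/login.py",
    "auth.Session": "class Session token expiry auth/session.py",
    "db.connect": "function connect(dsn) open database connection db/core.py",
    "billing.charge": "function charge(customer, amount) billing/charge.py",
}

# Hypothetical edges: node id -> ids of its direct dependencies.
EDGES = {
    "auth.login": ["auth.Session", "db.connect"],
    "billing.charge": ["db.connect"],
}


def tokenize(text):
    """Lowercase and split on non-alphanumerics so identifiers and paths break into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return an Okapi BM25 score for every document in `docs` against `query`."""
    tokenized = {node_id: tokenize(text) for node_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = tokenize(query)

    scores = {}
    for node_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[node_id] = score
    return scores


def retrieve(query, top_k=1):
    """Pick the top-k matching nodes, then append each hit's direct dependencies."""
    scores = bm25_scores(query, NODES)
    matches = [node_id for node_id, score in scores.items() if score > 0]
    hits = sorted(matches, key=lambda node_id: scores[node_id], reverse=True)[:top_k]
    context = list(hits)
    for node_id in hits:
        for dep in EDGES.get(node_id, []):
            if dep not in context:
                context.append(dep)
    return context


print(retrieve("login"))
```

In this toy graph a query for `login` matches only `auth.login`, and the edge traversal then pulls in `auth.Session` and `db.connect` even though neither contains the query term, which is the structural dependency that chunk-level embedding search would tend to miss.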
Taught my 60-year-old dad (zero coding exp) Claude and Git in Feb. Today he built a RAG solution. I finally get "vibe coding."
My father teaches geology and has literally zero coding expertise. Back in February, I introduced him to Claude and taught him the absolute basics of how Git works. Fast forward to today: he actually implemented a functional RAG (Retrieval-Augmented Generation) solution for analyzing and querying his mineral documents. Seeing this happen made me finally understand why "vibe coding" has become such a thing. Don't get me wrong, I know a proper end-to-end solution engineer or architect is still leagues ahead of someone just prompting an AI. But it is surprisingly impressive how Claude Code can take a 60-year-old with absolutely zero experience and elevate him to the level of an average developer. submitted by /u/Longjumping-Host-617 [link] [comments]
Open-source 9-task benchmark for coding-agent retrieval augmentation. Per-task deltas +0.010 to +0.320, all evals reproducible [P]
Sharing an open-source benchmark suite (paper-lantern-challenges) that measures coding-agent performance with vs without retrieval-augmented technique selection across 9 everyday software tasks. Disclosure: I'm the author of the retrieval system under test (paperlantern.ai/code); the artifact being shared here is the benchmark suite itself, not the product. Every prompt, agent code path, and prediction file is in the repo and reproducible.

Setup. Same coding agent (Claude Opus 4.6 as the planner, Gemini Flash 3 as the task model), same input data, same evaluation scripts across all 9 tasks: test generation (mutation score), text-to-SQL (execution accuracy), PDF extraction, contract extraction, PR review, text classification, few-shot prompt selection, LLM routing, summarization evaluation. Independent variable: whether the agent could call a retrieval tool over CS literature before writing its solution. One pass per task, no retries, no manual filtering of outputs.

Task selection. Tasks were chosen to span the everyday-engineering surface a coding agent actually faces, not specialized ML scenarios. Selection criteria: (1) unambiguous quantitative metric, (2) baseline performance well below ceiling, (3) standard datasets where they exist, (4) eval reproducible on a free Gemini API key in roughly 10 minutes per task.

Eval methodology. Each task uses its task-standard quantitative metric (mutation score for test_generation, execution accuracy for text_to_sql, F1 on labeled spans for the extraction tasks, weighted F1 for classification, etc.). Full per-task scripts and dataset choices are in the repo: one directory per task, evaluate.py as the entry point, README.md per task documenting methodology and dataset.

Retrieval setup. The "with retrieval" agent has access to three tool calls:

- explore_approaches(problem) returns ranked candidate techniques from the literature
- deep_dive(technique) returns implementation steps and known failure modes for a chosen technique
- compare_approaches(candidates) gives a side-by-side comparison when multiple options look viable

The agent decides when and how often to call them. Latency is roughly 20s per call; results cache across sessions. The baseline agent has none of these tools but otherwise identical scaffolding.

Comparability. Both agents share the same task-specific user prompt; the only system-prompt difference is the retrieval agent's tool-call grammar. Predictions and per-task prompts are diffable in the repo (baseline/ and with_pl/ subdirectories per task).

Results.

Task                  Baseline  With retrieval  Delta
extraction_contracts  0.444     0.764           +0.320
extraction_schemas    0.318     0.572           +0.254
test_generation       0.625     0.870           +0.245
classification        0.505     0.666           +0.161
few_shot              0.193     0.324           +0.131
code_review           0.351     0.395           +0.044
text_to_sql           0.650     0.690           +0.040
routing               0.744     0.761           +0.017
summeval              0.623     0.633           +0.010

The test-generation delta came from the agent discovering mutation-aware prompting (the techniques are MuTAP and MUTGEN), which enumerates every AST-level mutation of the target and requires one test per mutation. Baseline wrote generic tests from pretraining priors. The contract extraction delta came from BEAVER (section-level relevance scoring) and PAVE (post-extraction validation), both 2026 techniques that post-date the agent's training. 10 of the 15 most-cited sources across the experiments were published in 2025 or later, which is the conservative argument for why retrieval matters: the agent could not have reached these techniques from parametric memory.

Failure modes. Self-refinement hurt text-to-SQL (the agent second-guessed correct queries after reading work on SQL ambiguity). Two suggested techniques (DyT, SeeDNorm) were architecture-incompatible in the autoresearch experiment and got discarded. Retrieval surfaces better options, not guaranteed wins.

Reproducibility. Every prompt, every line of agent code, every prediction file, every eval script is in the repo. Each task directory has a README documenting methodology and an approach.md showing exactly what the retrieval surfaced and which technique the agent chose.

Repo: https://github.com/paperlantern-ai/paper-lantern-challenges
Writeup with detailed per-task discussion: https://www.paperlantern.ai/blog/coding-agent-benchmarks

Happy to share additional design choices in comments.

submitted by /u/kalpitdixit
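As a quick arithmetic check on the results reported in the post above, the following Python sketch recomputes each per-task delta from the baseline and with-retrieval scores. The scores are copied from the post; the unweighted mean at the end is an added summary that the author does not report, and because each task uses a different metric it should be read only as a rough indicator.

```python
# (baseline, with retrieval) scores per task, as reported in the post.
results = {
    "extraction_contracts": (0.444, 0.764),
    "extraction_schemas": (0.318, 0.572),
    "test_generation": (0.625, 0.870),
    "classification": (0.505, 0.666),
    "few_shot": (0.193, 0.324),
    "code_review": (0.351, 0.395),
    "text_to_sql": (0.650, 0.690),
    "routing": (0.744, 0.761),
    "summeval": (0.623, 0.633),
}

# Absolute improvement per task, rounded to the precision used in the post.
deltas = {
    task: round(with_retrieval - baseline, 3)
    for task, (baseline, with_retrieval) in results.items()
}

# Unweighted mean across tasks (an added summary, not from the post).
mean_delta = round(sum(deltas.values()) / len(deltas), 3)

for task, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{task:22s} {delta:+.3f}")
print(f"{'mean (unweighted)':22s} {mean_delta:+.3f}")
```

The recomputed deltas match the reported column, and the range agrees with the +0.010 to +0.320 span given in the post title.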
Key features include: Fragmented setups, Trapped expertise, No quality signal, Review bottleneck, Work Dispatcher, PR Author, Pair Review, Deep Code Review.
Augment Code is commonly used for: automating code reviews to ensure functional correctness and style adherence; enhancing collaborative coding sessions by providing contextual suggestions; maintaining a comprehensive understanding of legacy code for easier refactoring; integrating with CI/CD pipelines to streamline deployment processes; facilitating knowledge transfer among team members through documented tribal knowledge; and identifying and suggesting improvements based on codebase patterns.
Augment Code integrates with: GitHub, GitLab, Bitbucket, JIRA, Slack, Trello, Visual Studio Code, JetBrains IDEs, CircleCI, Travis CI.
Based on user reviews and social mentions, the most common pain points are: token usage, budget exceeded, token cost, API costs.

Introducing SwipeReview™ by Augment Code
Apr 1, 2026
Based on 35 social mentions analyzed, 20% of sentiment is positive, 80% neutral, and 0% negative.