Cohere Command is a family of highly scalable language models that balance high performance with strong accuracy.
I don't see any user reviews or social mentions specifically about "Command R" in the content you've provided. The social mentions appear to be about general LLM token optimization and a RAG-backed retrieval system refactor, but neither explicitly discusses "Command R" or user experiences with that particular tool. To provide an accurate summary of what users think about Command R, I would need reviews and social mentions that actually reference this software tool directly.
Mentions (30d)
1
Reviews
0
Platforms
4
Sentiment
0%
0 positive
Features
Industry
information technology & services
Employees
850
Funding Stage
Venture (Round not Specified)
Total Funding
$2.4B
Cutting LLM token usage by 80% using recursive document analysis
When you employ AI agents, document analysis poses a significant volume problem. Reading one 1,000-line file consumes about 10,000 tokens, and token consumption costs both money and time. Codebases with dozens or hundreds of files, the common case for real-world projects, easily exceed 100,000 tokens when the whole thing must be considered. The agent must read, comprehend, and determine the interrelationships among these files. And when a task requires multiple passes over the same documents, perhaps one pass to divine the structure and another to mine the details, costs multiply rapidly.

**Matryoshka** is a document-analysis tool that achieves over 80% token savings while enabling interactive, exploratory analysis. Its key insight is to cache past analysis results and reuse them, so the same document lines never have to be processed twice. These ideas come from recent research on recursive language models and retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies them into one system that maintains persistent analytical state. Finally, we'll look at some real-world results analyzing the [anki-connect](https://git.sr.ht/~foosoft/anki-connect) codebase.

---

## The Problem: Context Rot and Token Costs

A common task is to analyze a codebase to answer a question such as “What is the API surface of this project?” Such work includes identifying and cataloguing all the entry points the codebase exposes.

**Traditional approach:**

1. Read all source files into context (~95,000 tokens for a medium project)
2. The LLM analyzes the entire codebase’s structure and component relationships
3. For follow-up questions, the full context is round-tripped every turn

This creates two problems:

### Token Costs Compound

Every turn, the entire context has to go to the API.
In a 10-turn conversation about a 7,000-line codebase, almost a million tokens might be processed. Most of those tokens are the same document contents being dutifully resent over and over; the same core code travels with every new question. This redundancy is a massive waste. It forces the model to process the same blocks of text repeatedly rather than concentrating its capabilities on what’s actually novel.

### Context Rot Degrades Quality

As described in the [Recursive Language Models](https://arxiv.org/abs/2505.11409) paper, even the most capable models exhibit context degradation: their performance declines as input length grows. The deterioration is task-dependent and tied to task complexity. In information-dense contexts, where the correct output requires synthesizing facts scattered across widely dispersed locations in the prompt, the decline can be especially steep, setting in at relatively modest context lengths. It reflects a failure to maintain the connections between large numbers of informational fragments long before the model reaches its maximum token capacity.

The authors argue that we should stop stuffing entire documents into the prompt, since this clutters the model's working memory and compromises its performance. Instead, documents should be treated as **external environments** with which the LLM can interact: querying, navigating structured sections, and retrieving specific information on an as-needed basis. This approach treats the document as a separate knowledge base, an arrangement that frees the model from having to hold everything at once.
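To make the external-environment idea concrete, here is a minimal sketch. The class and method names are my own illustration, not Matryoshka's API: the document lives outside the model, a query returns only matching snippets, and a cache means repeated queries over the same lines cost nothing.

```python
import re

class DocumentEnv:
    """Hold a document outside the model's context; answer targeted queries."""

    def __init__(self, text: str):
        self.lines = text.splitlines()
        self._cache = {}  # memoize past queries so repeat passes are free

    def grep(self, pattern: str, context: int = 0) -> list[str]:
        """Return only the snippets matching `pattern`, with optional
        surrounding context lines, instead of the whole document."""
        key = (pattern, context)
        if key in self._cache:
            return self._cache[key]
        rx = re.compile(pattern)
        hits = []
        for i, line in enumerate(self.lines):
            if rx.search(line):
                lo, hi = max(0, i - context), min(len(self.lines), i + context + 1)
                hits.append("\n".join(self.lines[lo:hi]))
        self._cache[key] = hits
        return hits
```

Only the hits enter the model's context; the full file never does, and a second identical query is served from the cache.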
---

## Prior Work: Two Key Insights

Matryoshka builds on two research directions:

### Recursive Language Models (RLM)

The RLM paper introduces a methodology that treats documents as external state to which step-by-step queries can be issued, without loading them entirely. Symbolic operations (search, filter, aggregate) are actively issued against this state, and only the specific, relevant results are returned, keeping the context window small while permitting analysis of arbitrarily large documents. The key point is that the documents stay outside the model; only the search results enter the context. This separation of concerns ensures that the model never sees complete files. Instead, a search is initiated to retrieve the information.

### Barliman: Synthesis from Examples

[Barliman](https://github.com/webyrd/Barliman), a tool developed by William Byrd and Greg Rosenblatt, shows that program synthesis is possible without precise code specifications. Instead, input/output examples are given, and a solver engine built on relational programming in the spirit of [miniKanren](http://minikanren.org/) is used. Barliman uses this system to synthesize functions that satisfy the specified constraints. The system interprets the examples as relational rules, and the synthesis e
Ralph Wiggum plugin corrupted 70+ files in my production codebase — anyone else experience this?
I'm a non-technical founder running a SaaS product (Next.js/React/TypeScript/Supabase stack, ~76 database tables, 100+ migrations). I used the Ralph Wiggum autonomous agent plugin for Claude Code to run 8 overnight sessions redesigning my admin dashboard. Ralph completed all 8 sessions, made 2 commits touching 97 files, and the build appeared to pass locally. But when I tried to publish via Lovable, it failed. After hours of debugging, here's what we found:

**The damage:**

- 4 TSX files had trailing NUL bytes (invisible zero bytes appended after the actual code). This made the files appear as "binary data" instead of text to build tools, causing Vite to choke.
- 244 source files had Windows CRLF line endings instead of Unix LF — even though the entire codebase was LF before Ralph touched it.
- 70+ files were silently truncated mid-code. Functions cut off mid-word, JSX tags never closed, braces unbalanced. TypeScript only reported the first few errors before giving up, so the true scope wasn't obvious until we ran a deep file integrity scan.
- 37 inline font references were wrong (used the public-facing font instead of the admin font Ralph was supposed to apply).

**The scary part:** `npx tsc --noEmit` passed clean on the first round of fixes because it stops after a certain number of errors and the truncated files happened to not be imported in certain code paths. The real damage only showed up when Vite tried to build everything.

**What we had to do to fix it:**

- Strip NUL bytes with `tr -d '\0'`
- Convert CRLF→LF with `sed -i 's/\r$//'` across all files
- Restore all 70 truncated files from the pre-Ralph git commit
- Re-apply the font changes manually (simple find-and-replace)
- Run a custom Python script scanning every file for: NUL bytes, CRLF, unbalanced braces, and suspicious line endings

Total time to diagnose + fix: ~4 hours across multiple sessions.

**My questions for the community:**

Has anyone else used Ralph Wiggum for large batch operations? Did you experience similar file corruption?
What's causing the truncation? Is it a token/context limit issue where the agent runs out of space mid-file-write? A buffer issue? Something with how Claude Code writes files?

What defenses do you use before committing autonomous agent output? I'm thinking of adding:

- Pre-commit hook that rejects files detected as "data" by the `file` command
- Pre-commit hook that rejects files with CRLF line endings
- Automated brace-balance check on all changed .tsx/.ts files
- Mandatory `vite build` (not just tsc) before any commit

Do other autonomous agent plugins (Cursor background agents, Cline, etc.) have similar issues with large batch file writes? Is there a recommended max number of files an autonomous session should touch before the corruption risk gets too high?

**Lessons learned the hard way:**

- `tsc --noEmit` alone is NOT enough to validate autonomous agent output. You need the full build (`vite build` or equivalent).
- Always check `file *.tsx` after batch operations — if any file shows as "data" instead of "ASCII text" or "UTF-8 text", it's corrupted.
- Git's diff showing `Bin X -> Y bytes` for a .tsx file is a red flag — text files should never show binary diffs.
- Keep your pre-agent commit hash handy. You'll need it to restore files.
- Don't let autonomous agents touch more than ~20 files per session without a verification step in between.

Would love to hear others' experiences and any preventive measures you've found effective. This is a great tool when it works, but the silent corruption is genuinely dangerous for production codebases.

submitted by /u/Chanaka9000
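A minimal version of the integrity scan the author describes might look like the sketch below. It is illustrative only: the brace check will false-positive on braces inside strings, comments, or JSX text, so flags should be treated as prompts for manual inspection, not proof of corruption.

```python
from pathlib import Path

def integrity_issues(path: Path) -> list[str]:
    """Flag the corruption patterns described above: NUL bytes,
    CRLF line endings, and unbalanced braces/brackets/parens."""
    issues = []
    data = path.read_bytes()
    if b"\x00" in data:
        issues.append("NUL bytes")
    if b"\r\n" in data:
        issues.append("CRLF line endings")
    # Naive bracket-balance scan (ignores strings/comments on purpose).
    text = data.decode("utf-8", errors="replace")
    pairs = {"{": "}", "(": ")", "[": "]"}
    stack = []
    for ch in text:
        if ch in pairs:
            stack.append(pairs[ch])
        elif ch in pairs.values():
            if not stack or stack.pop() != ch:
                issues.append("unbalanced braces")
                break
    if stack and "unbalanced braces" not in issues:
        issues.append("unbalanced braces")  # e.g. a file truncated mid-function
    return issues
```

Running this over every changed file in a pre-commit hook would have caught the truncated files and NUL bytes long before Vite did.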
This sub made my app viral & got me an invite to apply at the Claude Dev Conference in SF. So, I built caffeine half life & sleep health tooling for everyone.
Hey [r/ClaudeAI](r/ClaudeAI)

A little while back I shared my Caffeine Curfew app on here and it completely blew up. Because of that amazing viral response, I actually got invited to apply for the Claude developer conference. I am so incredibly grateful to this community, and I really wanted to find a way to give back and share the core tooling with you all for completely free.

I built an MCP server for Claude Code and the Claude mobile app that tracks your caffeine intake over time and tells you exactly when it is safe to sleep. Have you ever had a late afternoon coffee and then wondered at midnight why you are staring at the ceiling? This solves that problem using standard pharmacological decay modeling.

Every time you log a drink, the server stores it and runs a decay formula. It adds up your whole history to give you a real time caffeine level in mg. Then it looks forward in time to find the exact minute your caffeine drops below your sleep interference threshold. The default half life is five hours and the sleep threshold defaults to 25mg, but both are adjustable since everyone is different!

The tech makes the tools ridiculously easy to use. There are zero complicated parameters to memorize. Once connected, it remembers your history automatically and you just talk to Claude naturally:

• "Log 150mg of coffee, I just had it"
• "When can I safely go to bed tonight?"
• "If I have another espresso right now how late would I have to stay up?"
• "Show me my caffeine habits for the last thirty days"

Under the hood, there are eight simple tools powering this:

• log_entry: Log a drink by name and mg
• list_entries: See your history
• delete_entry: Remove a mistaken entry
• get_caffeine_level: Current mg in your system right now
• get_safe_bedtime: Earliest time you can safely sleep
• simulate_drink: See how another coffee shifts your bedtime before you even drink it
• get_status_summary: Full picture with a target bedtime check
• get_insights: Seven or thirty day report with trend direction and peak days

I am hosting this server on my Mac Mini behind a Cloudflare Tunnel. It features strict database isolation, meaning every single person gets a unique URL and your data is totally separate from everyone else. No login, no signup, no account.

Want to try it out? Just leave a comment below and I will reply with your personal key! Once you get your key, you just paste the URL into your Claude desktop app under Settings then Connected Tools, or drop it into your Claude desktop config file.

For the tech people curious about the stack: Python, FastMCP, SQLite, SSE transport, Cloudflare Tunnel, and launchd for auto start. The user isolation uses an ASGI middleware that extracts your key from the SSE connection URL and stores it in a ContextVar, ensuring every tool call is automatically scoped to the right user without any extra steps.

If you would rather host it yourself, you can get it running in about five minutes. I have the full open source code on GitHub here: https://github.com/garrettmichae1/CaffeineCurfewMCPServer The repo readme has all the exact terminal commands to easily get your own tunnel and server up and running.

Original App: https://apps.apple.com/us/app/caffeine-curfew-caffeine-log/id6757022559 (The MCP server does everything the app does, but better, aside from maybe the presentation of the data itself.
) Original Post: https://www.reddit.com/r/ClaudeCode/s/FsrPyl7g6r

submitted by /u/pythononrailz
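The decay model the post describes (five-hour half-life, 25 mg sleep threshold, forward scan for the safe minute) can be sketched in a few lines. Function and variable names here are my own illustration, not the server's actual tools:

```python
from datetime import datetime, timedelta

HALF_LIFE_H = 5.0     # default half-life from the post
THRESHOLD_MG = 25.0   # default sleep-interference threshold

def caffeine_level(doses, now):
    """Sum exponential half-life decay over every logged (mg, time) dose."""
    total = 0.0
    for mg, taken_at in doses:
        hours = (now - taken_at).total_seconds() / 3600
        if hours >= 0:
            total += mg * 0.5 ** (hours / HALF_LIFE_H)
    return total

def safe_bedtime(doses, now, step_min=1):
    """Walk forward minute by minute until the level drops below threshold."""
    t = now
    while caffeine_level(doses, t) >= THRESHOLD_MG:
        t += timedelta(minutes=step_min)
    return t
```

A 200 mg coffee at noon decays to 100 mg by 5 pm and crosses the 25 mg threshold a little after 3 am, which is exactly the "why am I staring at the ceiling" math.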
I built a Chrome extension that sends any webpage element's context to Claude Code via MCP — in one click
Hey r/ClaudeAI,

Built a small tool that's been saving me a lot of copy-paste time: Clasp-it.

The problem it solves: When I'm fixing a UI bug, I used to open DevTools, copy the HTML, copy the computed CSS, paste it into Claude, describe the issue... It was tedious. Especially when the bug involved React props or console errors too.

What Clasp-it does:
- Click the extension icon → click any element on any page
- It captures HTML, CSS selector, computed styles, React props, console logs, network requests, and a screenshot
- All of it gets sent to Claude Code via MCP automatically

Then I just tell Claude: *"fix all recent picks using clasp-it"* — and it reads the full context and edits my actual source files.

Setup (2 minutes):
1. Install from Chrome Web Store (link below)
2. Run one command to add the MCP server:

claude mcp add --scope user --transport http clasp-it https://claspit.dev/mcp --header "Authorization: Bearer YOUR_API_KEY"

Free plan: 10 picks/day with DOM + CSS
Pro: unlimited + screenshot, console, network, React props ($2.99/mo)

Chrome Web Store: https://chromewebstore.google.com/detail/clasp-it/inelkjifjfaepgpdndcgdkpmlopggnlk
Website: https://claspit.dev

Happy to answer any questions. Would love feedback from this community especially.

submitted by /u/cyphermadhan
Build Your Own Alex Hormozi Brain Agent (anyone with lots of publicly available content) using a Claude Project
I bought the books. Watched the videos. Still wanted more, especially after he talked about the agent he created. All that material is publicly available. Enough to build my own Alex Hormozi Brain Agent? "Hey Jules, how about it?"

Jules is my AI coding assistant (Claude Code). Jules ran off and grabbed transcripts of videos, text of books, guest podcasts, whatever was available online, then turned that into files I uploaded to a Claude Project so I can chat through Claude with Alex Hormozi.

Here's what Jules found:
- 99 long-form YouTube video transcripts
- 3 complete audiobook transcripts
- 15 guest podcast transcripts
- X threads

What I Did in Four Phases

Phase 1 maps the full source landscape: YouTube channel (4,754 videos), The Game podcast (~900+ episodes), three books, guest podcast appearances, X/Twitter. Figure out what's worth downloading before you start.

Phase 2 downloads and converts. Top 100 longest video transcripts, full audiobook transcripts for all three books, 15 guest podcast transcripts from the highest-view-count appearances, and whatever X/Twitter content the API will give you.

Phase 3 runs voice pattern analysis. Sentence structure, reasoning skeleton, core frameworks, teaching style, verbal signatures. This is where the persona takes shape.

Phase 4 builds the system prompt and optimizes the knowledge base to fit within Claude Projects' limits. Then deploy.

Phase 1: Inventory

The @AlexHormozi YouTube channel has 4,754 videos. That number is misleading. 4,246 of those are Shorts (under 60 seconds or no duration metadata). Filter those out and you have 508 full-length videos. That's the real content library.

Beyond YouTube, the main sources worth pursuing:

The Game podcast (~900+ episodes). His primary long-form output. The audiobooks for all three books are available free on the podcast and YouTube.

Guest podcast appearances. DOAC, Impact Theory, School of Greatness, Modern Wisdom, Danny Miranda.
Hosts push him off-script and into territory he doesn't cover in his own content. High value per byte.

X/Twitter threads. Compressed, punchy formulations of his frameworks. Different texture than the long-form material.

Skool community. Behind a login wall. Low ROI for this project.

Acquisition.com. No blog. Courses are paywalled. Skip.

Phase 2: Collect

YouTube Transcripts

The first scrape of the YouTube channel only returned 494 videos. The channel has 4,754. The scraper was pulling from the /videos tab, which doesn't surface the full library. Re-running against the full channel URL (@AlexHormozi) returned everything. Easy to miss, significant difference.

After filtering Shorts: 508 full-length videos. I downloaded auto-generated captions for the top 100 longest videos (sorted by duration, so the meatiest content came first). Auto-generated captions from YouTube come as SRT files with timestamps, line numbers, and duplicate lines. Converting those to clean readable text required stripping all the formatting artifacts and deduplicating language variants (English vs English-Original). Result: 99 transcripts. A few livestreams had no captions available.

Book Audiobook Transcripts

All three Hormozi books have full audiobook uploads on YouTube:

- $100M Offers (~4.4 hours)
- $100M Leads (~7 hours)
- $100M Money Models (~4.3 hours)

Same process as the video transcripts. Download the auto-generated captions, convert to clean text. Three files, 855KB total. These are non-negotiable core material for the knowledge base.

Guest Podcast Transcripts

Searched YouTube for Hormozi guest appearances sorted by view count. The top hit was Diary of a CEO at 4.7M views. Grabbed the 15 highest-view-count appearances. The guest transcripts are 2.1MB total. Worth every byte. When a host like Steven Bartlett or Tom Bilyeu pushes back on a claim, Hormozi shifts into a different mode. He's more precise and sometimes reveals the edge cases he glosses over on his own channel.
You can't get that from watching his channel alone.

X/Twitter Content

X's API rate limits capped the collection at 9 unique tweets. Not ideal, but enough to confirm the voice texture: "Aggressive with effort. Relaxed with outcome." His Twitter is his most compressed format. Each tweet is a framework distilled to a single line. 9 tweets is thin. For a more complete build, you'd want to manually curate 50-100 of his best threads. The API limitations made automated collection impractical.

Phase 3: Analyze

I ran voice analysis across the full corpus, looking at seven dimensions. Hormozi's sentences are short, punchy declarations. Fragments for emphasis. "And so" as his default transition. Short bursts, then a longer sentence that lands the point. Nearly every argument follows the same five-step skeleton: bold claim, personal story, framework, math, then a reductio ad absurdum that makes the alternative sound insane. Once you see it, you can't unsee it. The core frameworks are Grand Slam Offer, Value Equation, Supply an
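The caption cleanup in Phase 2 is mechanical. A sketch of the SRT-to-text conversion, illustrative rather than the exact script used: drop sequence numbers and timestamp lines, collapse the consecutive duplicate lines auto-captions produce, and join what remains into prose.

```python
def srt_to_text(srt: str) -> str:
    """Strip sequence numbers, timestamp lines, and consecutive duplicate
    caption lines from an auto-generated SRT file, leaving readable text."""
    out = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue  # blank separators, cue numbers, timestamp ranges
        if out and line == out[-1]:
            continue  # auto-captions often repeat the previous line
        out.append(line)
    return " ".join(out)
```

Deduplicating the English vs English-Original variant tracks would be a separate step (pick one track per video before converting).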
I built a persistent memory MCP for Claude Code — here's what I learned about why LLM-based extraction is the wrong approach
I've been using Claude Code daily for months and wanted it to remember things across sessions — project context, my preferences, decisions we've made together. I tried Mem0 and Zep but hit the same frustration with both: they intercept conversations and run them through a separate LLM to decide what's worth remembering. That felt wrong. Claude already understands the conversation. Why pay for a second LLM to re-interpret what just happened? So I built Deep Recall — an MCP server that takes a different approach. Claude decides what to store. The memory system handles what happens to those memories over time. **What I learned building this:** The biggest insight was that extraction quality is actually BETTER when the agent does it itself. Claude has full context — it knows what's new information vs what it already knows, what contradicts existing memories, what's important to this specific user. A separate extraction LLM has none of that context. The second insight was that memories need biology, not just storage. I implemented: - **Salience decay** based on ACT-R cognitive architecture — unused memories fade, frequently accessed ones resist decay - **Hebbian reinforcement** — when Claude cites a memory in its response, that memory gets stronger - **Contradiction detection** — if you store "works at Google" then later "works at Meta", it flags the conflict - **Temporal supersession** — detects that's a career change, not a contradiction, and auto-resolves it - **Memory consolidation** — clusters of related episodes compress into durable facts over time **How it works with Claude Code:** ```bash pip install deeprecall-mcp ``` Add to `~/.claude/settings.json`: ```json { "mcpServers": { "deeprecall": { "command": "deeprecall-mcp", "env": { "DEEPRECALL_API_KEY": "your_key" } } } } ``` Claude gets tools like `deeprecall_context` (pull memories before responding), `deeprecall_remember` (store a fact), and `deeprecall_learn` (post-conversation biology processing). 
**The whole thing was built with Claude Code** — Thomas (my Claude instance) and I pair-programmed the entire backend, MCP server, landing page, billing, and the biological memory algorithms. The irony of using Claude to build a memory system for Claude isn't lost on me.

Free to try — 10,000 memories, no credit card, all features: https://deeprecall.dev

Happy to answer questions about the architecture or the cognitive science behind the decay/reinforcement models.

submitted by /u/floppytacoextrasoggy
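The decay/reinforcement mechanics the post describes can be caricatured in a few lines. This is a simplification, not Deep Recall's actual algorithm: ACT-R's real base-level activation sums over all past presentations, which this sketch collapses into a single salience score that decays over time and is boosted whenever the memory is cited.

```python
import math

class Memory:
    """Toy salience model: exponential decay, strengthened on each citation."""

    def __init__(self, fact, salience=1.0, decay_per_day=0.1):
        self.fact = fact
        self.salience = salience
        self.decay_per_day = decay_per_day

    def tick(self, days: float):
        # Unused memories fade exponentially with elapsed time.
        self.salience *= math.exp(-self.decay_per_day * days)

    def reinforce(self, boost=0.5):
        # Hebbian-style: a cited memory gets stronger and resists future decay.
        self.salience += boost
        self.decay_per_day *= 0.9
```

Contradiction detection and temporal supersession would sit on top of this, comparing new facts against stored ones before deciding whether to flag or replace.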
what’s the right “Jira/Linear” abstraction for Claude Code?
Saw the recent post about using GitHub issues with Claude Code. Smart approach. We had been using similar workflows before, but it also felt like a hint that the real missing layer here is probably something closer to “Linear/Jira for Claude Code” than just reusing human PM tools. We had been building and using a local-first alternative internally with Claude Code, and recently open sourced it: https://github.com/Agent-Field/plandb

What it does: it gives agents a persistent task graph instead of a flat todo list, issue tracker, or board. The main thing we kept seeing is that agent workflows want different primitives than human workflows.

Not just:

- ticket status
- assignee
- board columns

More like:

- complex task dependencies
- ready / unblocked next work
- safe parallel task claiming
- mid-flight replanning
- preserving local context and discoveries
- adapting the plan as new information shows up

One interesting thing from using Claude Code on this: it often wants to decompose work in a more parallel, graph-shaped way than humans naturally would. Human PM tools assume people move tasks through stages. But the AI splits work, runs independent branches in parallel, and adapts halfway through like we have never seen before (at least for the internal development we have been doing), and that's what PlanDB is optimized for.

You can try it now with a single command:

curl -fsSL https://raw.githubusercontent.com/Agent-Field/plandb/main/install.sh | bash

And something like:

/plandb Build a CLI todo app in Python with add, list, complete, and delete commands. Store todos in a local JSON file. Include tests.
The CLI bits that made this feel agent-native for us were things like:

plandb init "auth-refactor"
plandb add "ship auth refactor" --description "full work order"
plandb split --into "schema, api, tests"
plandb critical-path
plandb bottlenecks
plandb go
plandb done --next
plandb what-unlocks t-api
plandb context "root cause: token refresh race" --kind discovery
plandb task pivot t-tests --file revised-plan.yaml

It’s open source, built with Claude Code for this kind of workflow, and I think this category is still pretty open.

submitted by /u/Santoshr93
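The "ready / unblocked next work" primitive is the heart of a task graph. A toy sketch of the idea (my own illustration, not PlanDB's actual implementation): a task is ready when it is not done and every dependency is.

```python
def ready_tasks(tasks, deps, done):
    """Return tasks whose dependencies are all complete: the 'unblocked
    next work' an agent can safely claim, rather than a flat todo list."""
    return [t for t in tasks
            if t not in done and all(d in done for d in deps.get(t, []))]
```

With a dependency map like `{"api": ["schema"], "tests": ["api"]}`, finishing "schema" unlocks "api", which in turn unlocks "tests"; independent branches become ready simultaneously and can be claimed in parallel.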
Free MCP server I built: gives Claude access to 11M businesses with phone/email/hours, no Google Places API needed
Hi r/ClaudeAI 👋 I built and published a free MCP server for Claude Desktop / Claude Code that gives Claude access to a structured directory of 11M+ real businesses across 233 countries — phone numbers, opening hours, emails, addresses, websites, geo coordinates. It's called agentweb-mcp. Free signup, no credit card, runs on a single VPS I pay for personally. ────────────────────────────────── What you can ask Claude after installing it ────────────────────────────────── • "Find me 3 vegan restaurants near 51.51, -0.13 within 2 km, with phones" • "What time does that bakery in Copenhagen open on Sundays?" • "Search for dentists in Berlin Mitte with verified opening hours" • "I'm in Tokyo — find a 24/7 pharmacy near my coordinates" • "List all hardware stores in Dublin with a website" Plus write-back tools so Claude can also contribute: • "Add this restaurant I just visited to AgentWeb" (auto-dedupes by name+coords+phone) • "Report that the dentist on Hauptstrasse closed" (3+ closed reports auto-lower trust score) ────────────────────────────────── Install (60 seconds) ────────────────────────────────── Get a free key: https://agentweb.live/#signup Add to claude_desktop_config.json: { "mcpServers": { "agentweb": { "command": "npx", "args": ["-y", "agentweb-mcp"], "env": { "AGENTWEB_API_KEY": "aw_live_..." } } } } Restart Claude Desktop. Done. ────────────────────────────────── Why I built it ────────────────────────────────── I needed business data in agent-native format and Google Places costs ~$17 per 1k lookups, which is fine for human apps but instantly painful for any agent doing meaningful work. OpenStreetMap has the data but Overpass query syntax is rough for LLMs to generate. I wanted something Claude could just call as a tool with no friction. 
────────────────────────────────── How I built it (the part that might help anyone making their own MCP) ────────────────────────────────── A few things I learned along the way that I'd recommend to anyone building an MCP server: **Make at least one tool work without an API key.** Most MCP servers gate everything behind auth. Mine has a "substrate read" — agentweb_get_short — that hits a public endpoint with no key required, returns the business in 700 bytes instead of 3-5KB. Single-letter JSON keys, schema documented at /v1/schema/short. ~80% token savings on bulk lookups. Lowering friction by zero-auth on the most common path is the single biggest win for adoption. **The MCP server itself is tiny.** ~400 lines of TypeScript. It's just a thin protocol adapter — search_businesses → /v1/search, get_business → /v1/r/{id}, etc. The real work is in the FastAPI backend behind it (Postgres + PostGIS for geo, Redis for hot caching, Cloudflare in front). If you're starting an MCP, build the REST API first and treat the MCP layer as the last 5% of work. **Postgres is enough for "AI-native" infrastructure.** I almost migrated to ClickHouse for analytics performance but the actual fix was just refreshing the visibility map (VACUUM) and adding composite indexes. Postgres + pgvector handles geo, full-text, JSONB, and vector search in one engine. The boring database is the right database. **Per-field provenance + confidence scores matter for agents.** Every record returned has src (jsonld / osm / owner_claim) and t (trust score 0-1). Agents can filter on these. I think this is going to be table stakes for any agent-data API in 18 months. **Owner-claimable in 30 seconds, no website required.** Most directories require businesses to verify via website or Google Business — long tail businesses (the bakery on the corner) get locked out. Mine lets the owner claim with email-at-domain verification, takes 30 seconds, no website needed. This is the moat I'm betting on long-term. 
──────────────────────────────────
Honest limitations
──────────────────────────────────

• Phone coverage varies by country. Nordics + Western Europe are great (60-80% coverage). Parts of SE Asia and Africa are sparse.
• Some rows are stale; I have enrichment workers running continuously but it's not Google-perfect yet.
• Free tier has rate limits, but they're generous for personal use.

Free, MIT licensed, source: github.com/zerabic/agentweb-mcp
npm: https://www.npmjs.com/package/agentweb-mcp
Live demo + manifesto: https://agentweb.live

Happy to answer any technical questions, particularly about the token-efficient shorthand format, the substrate architecture, or the matview-based aggregate cache. Built solo over a few weeks.

submitted by /u/ZeroSubic
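The single-letter-key shorthand is simple to consume on the client side. A sketch with a hypothetical key map (the real schema is documented at /v1/schema/short; these particular letters are my assumption for illustration):

```python
# Hypothetical key map; the authoritative one is served at /v1/schema/short.
SHORT_KEYS = {"n": "name", "p": "phone", "a": "address", "t": "trust"}

def expand(record: dict) -> dict:
    """Expand a compact single-letter-key record into readable field names,
    passing through any keys the map doesn't know."""
    return {SHORT_KEYS.get(k, k): v for k, v in record.items()}
```

The token savings come from the wire format staying terse; expansion like this only happens when a human (or a prompt) needs readable field names.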
One Bad Package Exposed Millions of Claude Users. Adopt These 5 Habits to Avoid the Next One.
The axios supply chain attack on March 31 should have been a wake-up call. For roughly 3 hours, one of the most popular npm packages in the world was shipping North Korean malware. It executed in 2 seconds - before npm even finished installing. If your Claude Code session ran npm install during that window, you were compromised before you could blink.

Here's the uncomfortable part: Claude added axios to your project. You didn't review it. The AI reached for the most popular HTTP client, added it to package.json, and ran install. That's the whole vibe coding workflow. It's also the attack surface.

I regularly scan for security conversations across Reddit. Outside of r/cybersecurity and a few other places, security content is either not published or ignored. It's time to pay attention: 24-45% of AI-generated code contains security flaws, vibe-coded apps are getting breached, and hackers are targeting popular packages because they know people don't check what they're installing.

So what do the people who aren't getting burned do differently?

What They Do Now

1. They actually look at package.json after Claude modifies it

When Claude adds a dependency, they check: What is it? Is this the real package or a typosquat? They pin versions explicitly (1.14.0, not ^1.14.0) so auto-updates don't pull in a compromised release.

2. They run npm audit (or pip-audit) regularly

Takes seconds. Catches known vulnerabilities in your dependency tree. Many people skip this entirely.

3. They use the AI to review its own work (using a different model can also help here)

After Claude generates a feature, they prompt: "Now act as a security engineer. Review the code you just wrote for injection, path traversal, and hardcoded secrets. Flag anything risky." Two-pass prompting catches what single-pass misses. It only takes a few minutes.

4. They don't let AI output go straight to production

AI-generated code gets intense scrutiny.
That means AI-aided review as well as static vulnerability tools that don't hallucinate and don't have attention problems.

5. They scan for leaked secrets before every commit

AI hallucinations include hardcoded API keys, test credentials, and config values that should never hit a repo. git-secrets or GitHub's built-in secret scanning catches these.

What Next-Level Coders Will Be Doing Next

The axios attack exposed a fundamental problem: by the time you see npm install complete, it's already too late. The malware ran during install, not after. Leveling up means having protection that works before packages get installed - not after you've already been compromised.

Passive supply chain protection

Tools that intercept package installation and check against known malicious package databases before the code runs. If axios@1.14.1 is on a blocklist, the install fails before the postinstall hook ever executes.

Automatic content scanning

When Claude fetches a URL, reads a document, or processes retrieved content, that content gets scanned for prompt injection patterns before it enters context. The attack gets detected or blocked at ingestion, not after execution.

Background traffic monitoring

Your AI assistant makes network calls constantly - fetching docs, pulling packages, calling APIs. Passive monitoring flags anomalous destinations (why is my dev environment calling a server in Pyongyang?) without requiring you to watch every request.

MCP tool integrity verification

As Claude Code and other AI tools become more popular and their use by non-technical people expands, compromised tool definitions (tool definitions that contain harmful content) will be the next supply chain vector. Integrity checks verify that the tools your AI is using haven't been tampered with.

The pattern: security that runs automatically, in the background, without requiring you to remember to run a command or review a log. Because the attack surface isn't just your code.
It's everything the AI touches on your behalf. The axios incident lasted 3 hours. The next one might last longer. The difference between getting burned and not is whether your workflow has any protection at all. What are you doing differently since March 31? submitted by /u/SpiritRealistic8174
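The version-pinning practice above can be sketched as a small pre-commit check. This is a hypothetical helper, not an existing tool - the function name and the set of flagged range prefixes are my own:

```python
import json
import re

def unpinned_dependencies(package_json_text: str) -> dict:
    """Return dependencies declared with range specifiers (^, ~, >, <, *)
    instead of exact pins, so auto-updates can't silently pull in a
    compromised release. Hypothetical helper, not an existing tool."""
    pkg = json.loads(package_json_text)
    flagged = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in pkg.get(section, {}).items():
            # a spec starting with a range operator is not an exact pin
            if re.match(r"^[\^~><*]", spec):
                flagged[name] = spec
    return flagged
```

Run it over package.json after the AI edits it; anything flagged gets pinned by hand, and npm audit still covers known CVEs in the pinned tree.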
Claude confidently got 4 facts wrong. /probe caught them before I wrote the code
I've been running a skill called /probe against AI-generated plans before writing any code, and it keeps catching bugs in the spec that the AI was confidently about to implement. The skill forces each AI-asserted fact into a numbered CLAIM with an EXPECTED value, then runs a command to "probe" the real system and captures the delta.

I used it today for the issue that motivated this post. My tmux prefix+v scrollback capture to VIM stopped working in Claude Code sessions because CLAUDE_CODE_NO_FLICKER=1 (which I'd set to kill the scroll-jump flicker) switches Claude into the terminal's alternate screen buffer. No scrollback to capture. So I decided to try something else: Claude sessions are persisted as JSONL under ~/.claude/projects/..., so I asked Claude to propose a shell script to parse that directly.

Claude confidently described the format. I ran /probe against the description before writing the jq filter. Four hallucinations fell out:

1. AI said 2 top-level types (user, assistant). Reality: 7, also queue-operation, file-history-snapshot, attachment, system, permission-mode, summary.
2. AI said assistant content = text + tool_use. Missed thinking blocks, which are about a third of assistant output in extended thinking mode.
3. AI said user content is always an array. Actually polymorphic: string OR array.
4. AI said folder naming replaces / with -. Actually prepend dash, then replace.

Each would have been a code bug confidently implemented by AI. The jq filter would have errored on string-form user content, dumped thinking blocks as garbage, and missed 5 of 7 message types entirely. The probe caught them because the AI had to write "EXPECTED: 2 types" before running jq -r '.type' file.jsonl | sort -u. Saying the number first makes the delta visible.
One row from the probe looked like this:

```
CLAIM 1: JSONL has 2 top-level types (user, assistant)
EXPECTED: 2
COMMAND: jq -r '.type' *.jsonl | sort -u | wc -l
ACTUAL: 7
DELTA: +5 unknown types (queue-operation, file-history-snapshot, attachment, system, permission-mode, summary)
```

The claims worth probing are often the ones the AI is most confident about. When the AI hedges, you already know to check. When it flatly states X, you don't. And X is often wrong in some small load-bearing way. High-confidence claims are where hallucinations hide.

Another benefit is that one probe becomes N permanent tests. The 7-type finding becomes a schema test that fails CI if a new type appears. The string-or-array finding becomes a property test that fuzzes both shapes. When the upstream format changes, the test fails, I re-probe, the oracle updates.

The limitation is that the probe only catches claims the AI thinks to make. Unknown unknowns stay invisible. Things that help: run jq 'keys' first to enumerate reality before generating claims. Dex Horthy's CRISPY pattern (HumanLayer) pushes the AI to surface its own gap list. GitHub's Spec Kit uses [NEEDS CLARIFICATION] markers in specs to force the AI to literally mark blind spots. A human scan of the claim list is also recommended.

Here's what to consider: traditional TDD writes the test based on what you THINK should happen. Probe-driven TDD writes the test based on what you spiked or VERIFIED happens. Mocks test your model of the system. The probe tests the system itself.

Anybody else run into this - AI claims that are confident but wrong? Happy to share the full /probe skill file if there's interest, just drop a comment.

EDIT: gist with the full skill + writeup: https://gist.github.com/williamp44/04ebf25705de10a9ba546b6bdc7c17e4

Two files:
- README.md: longer writeup with the REPL-as-oracle angle and a TDD contrast
- probe-skill.md: the 7-step protocol I load as a Claude Code skill

Swap out the Claude Code bits if you don't use Claude Code.
The pattern is just "claim table + real-system probe + capture the delta" and works with any REPL or CLI tool that can query the system you're about to code against. submitted by /u/More-Journalist8787
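The claim-table pattern above fits in a few lines of code. A minimal sketch - the field names and function name are my own, not the actual /probe skill's format: each claim carries a shell command and an expected value, and the runner records the delta.

```python
import subprocess

def run_probe(claims):
    """For each claim, run its COMMAND against the real system and
    record the delta between EXPECTED and ACTUAL. A sketch of the
    'claim table + real-system probe + capture the delta' pattern;
    field names are illustrative, not the /probe skill's format."""
    report = []
    for c in claims:
        actual = subprocess.run(
            c["command"], shell=True, capture_output=True, text=True
        ).stdout.strip()
        delta = "OK" if actual == c["expected"] else (
            f"expected {c['expected']!r}, got {actual!r}"
        )
        report.append({"claim": c["claim"], "delta": delta})
    return report
```

Forcing EXPECTED to be written down before the command runs is what makes a wrong high-confidence claim visible instead of silently absorbed.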
anthropic isn't the only reason you're hitting claude code limits. i did an audit of 926 sessions and found a lot of the waste was on my side.
For the last 10 days, X and Reddit have been full of outrage about Anthropic's rate limit changes. Suddenly I was burning through a week's allowance in two days, but I was working on the same projects and my workflows hadn't changed. People on socials are reporting the $200 Max plan running dry in hours, some reporting unexplained ghost token usage. Some went as far as reverse-engineering the Claude Code binary and found cache bugs causing 10-20x cost inflation. Anthropic did not acknowledge the issue. They were playing with the knobs in the background.

Like most, my work had completely stopped. I spend 8-10 hours a day inside Claude Code, and suddenly half my week was gone by Tuesday. But being angry wasn't fixing anything. I realized AI is getting commoditized. Subscriptions are the onboarding ramp. The real pricing model is tokens, same as electricity. You're renting intelligence by the unit. So as someone who depends on this tool every day, and will likely depend on something similar in future, I want to squeeze maximum value out of every token I'm paying for.

I started investigating with a basic question: how much context is loaded before I even type anything? iykyk, every Claude Code session starts with a base payload (system prompt, tool definitions, agent descriptions, memory files, skill descriptions, MCP schemas). You can run /context at any point in the conversation to see what's loaded. I ran it at session start and the answer was 45,000 tokens.

I'd been on the 1M context window with a percentage bar in my statusline, so 45k showed up as ~5%. I never looked twice, or did the absolute count in my head. This same 45k, on the standard 200k window, is over 20% gone before you've said a word. And you're paying this 45k cost every turn. Claude Code (and every AI assistant) doesn't maintain a persistent conversation. It's a stateless loop.
Every single turn, the entire history gets rebuilt from scratch and sent to the model: system prompt, tool schemas, every previous message, your new message. All of it, every time. Prompt caching is how providers keep this affordable. They don't reload the parts that are common across turns, which saves 90% on those tokens. But keeping things cached costs money too, and Anthropic decided 5 minutes is the sweet spot. After that, the cache expires. Their incentives are aligned with you burning more tokens, not fewer.

So on a typical turn, you're paying $0.50/MTok for the cached prefix and $5/MTok only for the new content at the end. The moment that cache expires, your next turn re-processes everything at full price. A 10x cost jump, invisible to you.

So I went manic optimizing. I trimmed and redid my CLAUDE.md and memory files, consolidated skill descriptions, turned off unused MCP servers, and tightened the schema my memory hook was injecting on session start. Shaved maybe 4-5k tokens. A 10% reduction. That felt good for an hour.

Then I got curious again and looked at where the other 40k was coming from. 20,000 tokens were system tool schema definitions. By default, Claude Code loads the full JSON schema for every available tool into context at session start, whether you use that tool or not. They really do want you to burn more tokens than required. Most users won't even know this is configurable. I didn't. The setting is called enable_tool_search, and it does deferred tool loading. Here's how to set it in your settings.json:

```json
"env": {
  "ENABLE_TOOL_SEARCH": "true"
}
```

This setting loads only 6 primary tools and lazy-loads the rest on demand instead of dumping them all upfront. Starting context dropped from 45k to 20k and the system tool overhead went from 20k to 6k. 14,000 tokens saved on every single turn of every single session, from one line in a config file.

Some rough math on what that one setting was costing me. My sessions average 22 turns.
14,000 extra tokens per turn = 308,000 tokens per session that didn't need to be there. Across 858 sessions, that's 264 million tokens. At cache-read pricing ($0.50/MTok), that's $132. But over half my turns were hitting expired caches and paying full input price ($5/MTok), so the real cost was somewhere between $132 and $1,300. One default setting. And for subscription users, those are the same tokens counting against your rate limit quota.

That number made my head spin. One setting I'd never heard of was burning this much. What else was invisible? Anthropic has a built-in /insights command, but after running it once I didn't find it particularly useful for diagnosing where waste was actually happening. Claude Code stores every conversation as JSONL files locally under ~/.claude/projects/, but there's no built-in way to get a real breakdown by session, cost per project, or what categories of work are expensive.

So I built a token usage auditor. It walks every JSONL file, parses every turn, loads everything into a SQLite database (token counts, cache hit ratios, tool calls, idle gaps, edit failures, skill invocations), and an insi
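The post's rough math can be checked in a few lines. A sketch (the function name is mine) that bounds the cost of redundant per-turn context: the lower bound assumes every turn hit the prompt cache, the upper bound assumes every turn paid full input price.

```python
def redundant_context_cost(tokens_per_turn, turns_per_session, sessions,
                           cache_read_per_mtok=0.50, input_per_mtok=5.00):
    """Bound the dollar cost of context that didn't need to be there.
    Prices are the post's figures; the true cost lands between the
    all-cached and all-uncached bounds depending on cache hit rate."""
    total_tokens = tokens_per_turn * turns_per_session * sessions
    low = total_tokens / 1e6 * cache_read_per_mtok
    high = total_tokens / 1e6 * input_per_mtok
    return total_tokens, low, high

# 14k extra tokens/turn, 22 turns/session, 858 sessions
total, low, high = redundant_context_cost(14_000, 22, 858)
```

This reproduces the post's numbers: about 264 million total tokens, between roughly $132 and $1,321.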
Refactor a-help skill to use RAG-backed retrieval instead of monolithic prompt
## Context

The `/a-help` built-in skill is a 48KB monolithic prompt (`src/anteroom/cli/default_skills/a-help.yaml`) that embeds the entire Anteroom reference — config tables, tool descriptions, CLI commands, environment variables, and more. It's at 95% of the 50KB skill prompt limit and growing with every feature.

This is unsustainable. Every new config field, tool, or CLI command requires squeezing more into an already-bloated prompt that gets injected in full on every `/a-help` invocation. Additionally, users shouldn't need to remember to type `/a-help` — when someone asks "how do I configure tools?" in a normal conversation, the AI should automatically recognize this as an Anteroom question and invoke the help skill.

## Proposal

Two changes:

1. **Slim `a-help`** from a monolithic 48KB inline reference to a ~10-15KB strategy prompt with a curated docs index and explicit `read_file` fallback. The AI reads the specific docs page it needs on demand instead of receiving everything upfront.
2. **Improve auto-invocation reliability** by broadening the `a-help` skill description so the LLM recognizes natural Anteroom questions as matching the skill. This is skill-specific prompt engineering — no changes to shared infrastructure (`invoke_skill` tool, `<available_skills>` catalog instruction, or system prompt builders).

This is **not** a RAG integration. The skill stays a pure prompt template — no new fields, no retrieval pipeline changes.
### Benefits

- Removes the 50KB ceiling — docs can grow freely
- Reduces token cost per `/a-help` invocation (~10K tokens instead of ~12K)
- Docs pages stay the single source of truth — no more maintaining parallel content in `a-help.yaml` and `docs/`
- New features auto-appear in `/a-help` when their docs pages are written and indexed
- Users get help without needing to know about `/a-help` — natural questions trigger it automatically
- Zero infrastructure change — works today with no code modifications

### Future: Skill-Scoped RAG Retrieval

A separate future issue will explore proper RAG-backed skills with:

- Retrieval scoping by source IDs / corpus
- Non-user-visible storage for built-in docs
- Update semantics for bundled docs
- `rag_enabled` skill field and CLI/web parity

That's new RAG infrastructure, not a refactor of `a-help`.

## Acceptance Criteria

### Slim a-help (Phase 1 — done in PR #850)

- [x] `a-help.yaml` is under 15KB (hard budget — leaves room for growth)
- [x] Strategy section tells the AI to check inline quick-ref first, then `read_file` specific docs pages
- [x] Curated docs index maps question categories to specific file paths
- [x] Inline quick-reference retained for the most common questions (~80% coverage): config layers, tool tiers, approval modes, skill format, CLI commands
- [x] Less common reference (full config field tables, env var lists, detailed architecture) moved to docs-only — accessed via `read_file`
- [x] Links to #843 content (`docs/cli/porting-from-claude-code.md`, `docs/cli/skill-examples.md`) included in the docs index
- [x] Existing skill-loader tests pass — `a-help` still loads as a valid skill

### Auto-invocation (Phase 2)

- [ ] `a-help` skill description broadened to trigger auto-invocation for Anteroom questions (description-only change, no shared code)
- [ ] Natural questions like "how do I configure tools?"
trigger `invoke_skill(name="a-help")` without explicit `/a-help`
- [ ] Manual verification: ask Anteroom questions without `/a-help` prefix — AI uses the skill automatically
- [ ] If the description-only approach proves insufficient, open a separate issue for changing the generic `<available_skills>` instruction in `repl.py:1431-1437` and `chat.py:495-501` with broader eval coverage

## Related Issues

- #843 — porting docs and skill examples; `a-help` will link to these new pages
- Future: skill-scoped RAG retrieval (not yet filed)
- Future (if needed): tune generic `<available_skills>` matching language in `repl.py` and `chat.py`

## Parity

**Parity exception**: Built-in skill content change only (YAML `description` field). Both interfaces read the same `a-help.yaml` via the shared skill registry. The `<available_skills>` catalog in both `repl.py` and `chat.py` renders the description identically. No changes to shared prompt builders or runtime behavior.

---

## Implementation Plan

### Summary

Slim the `a-help` built-in skill from 48KB to under 15KB by replacing inline reference tables with a curated docs index and `read_file` fallback strategy. Then broaden the skill's `description` field to improve auto-invocation for natural Anteroom questions.

### Phase 1: Slim a-help (done — PR #850)

| File | Change |
|------|--------|
| `src/anteroom/cli/default_skills/a-help.yaml` | Restructure: keep strategy + high-frequency quick-ref + curated docs index; remove low-frequency inline tables |
| `tests/unit/test_skills.py` | Size budget assertion (< 15KB) |

### Phase 2: Auto-invocation (skill-specific, no shared code c
Cutting LLM token usage by 80% using recursive document analysis
When you employ AI agents, there's a significant volume problem for document analysis. Reading one file of 1000 lines consumes about 10,000 tokens, and token consumption incurs both cost and time penalties. Codebases with dozens or hundreds of files, a common case for real-world projects, can easily exceed 100,000 tokens when the whole thing must be considered. The agent must read and comprehend these files, and be able to determine the interrelationships among them. And when the task requires multiple passes over the same documents, perhaps one pass to divine the structure and one to mine the details, costs multiply rapidly.

**Matryoshka** is a tool for document analysis that achieves over 80% token savings while enabling interactive, exploratory analysis. Its key insight is to cache past analysis results and reuse them, so the same document lines never have to be processed again. These ideas come from recent research on recursive language models and from retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies these ideas into one system that maintains persistent analytical state. Finally, we'll look at some real-world results from analyzing the [anki-connect](https://git.sr.ht/~foosoft/anki-connect) codebase.

---

## The Problem: Context Rot and Token Costs

A common task is to analyze a codebase to answer a question such as "What is the API surface of this project?" Such work includes identifying and cataloguing all the entry points exposed by the codebase.

**Traditional approach:**

1. Read all source files into context (~95,000 tokens for a medium project)
2. The LLM analyzes the entire codebase's structure and component relationships
3. For follow-up questions, the full context is round-tripped every turn

This creates two problems:

### Token Costs Compound

Every turn, the entire context has to go to the API.
In a 10-turn conversation about a codebase of 7,000 lines, almost a million tokens might be processed by the system. Most of those tokens are the same document contents being dutifully resent, over and over. The same core code is sent with every new question. This redundancy is a massive waste: it forces the model to process the same blocks of text repeatedly rather than concentrating its capabilities on what's actually novel.

### Context Rot Degrades Quality

As described in the [Recursive Language Models](https://arxiv.org/abs/2505.11409) paper, even the most capable models exhibit context degradation: their performance declines as input length grows. This deterioration is task-dependent. In information-dense contexts, where the correct output requires synthesizing facts scattered across widely dispersed locations in the prompt, the degradation can be especially steep. Such a decline can occur even at relatively modest context lengths, and it reflects a failure to maintain the connections between large numbers of informational fragments long before the model reaches its maximum token capacity.

The authors argue that we should not be stuffing entire documents into the prompt, since this clutters the model's context and compromises its performance. Instead, documents should be treated as **external environments** with which the LLM can interact by querying, navigating structured sections, and retrieving specific information on an as-needed basis. This approach treats the document as a separate knowledge base, freeing the model from having to hold everything at once.
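The "document as external environment" idea can be made concrete with a toy interface. This is a sketch under my own naming, not Matryoshka's or the paper's actual API: the full text stays outside the model, and only query results enter the context.

```python
class DocEnvironment:
    """Hold a document outside the model's context and answer
    targeted queries, returning only the matching fragments.
    Illustrative sketch, not a real tool's interface."""

    def __init__(self, text: str):
        self.lines = text.splitlines()

    def search(self, needle: str):
        # Only matching lines (with 1-based numbers) enter the context.
        return [(i + 1, line) for i, line in enumerate(self.lines)
                if needle in line]

    def read(self, start: int, end: int) -> str:
        # Fetch a specific line range on demand (inclusive, 1-based).
        return "\n".join(self.lines[start - 1:end])
```

An agent would call `search("def ")` to locate entry points, then `read()` only the spans it needs, instead of re-sending the whole file every turn.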
---

## Prior Work: Two Key Insights

Matryoshka builds on two research directions:

### Recursive Language Models (RLM)

The RLM paper introduces a methodology that treats documents as external state against which step-by-step queries can be issued, without loading them entirely. Symbolic operations (search, filter, aggregate) are actively issued against this state, and only the specific, relevant results are returned, keeping the context window small while permitting analysis of arbitrarily large documents. The key point is that the documents stay outside the model and only the search results enter the context. This separation of concerns ensures that the model never sees complete files; instead, a search is issued to retrieve the information.

### Barliman: Synthesis from Examples

[Barliman](https://github.com/webyrd/Barliman), a tool developed by William Byrd and Greg Rosenblatt, shows that program synthesis is possible without precise code specifications. Instead, input/output examples are used, with a solver engine built on a relational programming system in the spirit of [miniKanren](http://minikanren.org/). Barliman uses such a system to synthesize functions that satisfy the specified constraints. The system interprets the examples as if they were relational rules, and the synthesis e
View originalCommand R uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Multilingual, RAG Citations, Purpose-built for real-world enterprise use cases, Automate business workflows, Command family of models, Private deployment and customization.
Based on user reviews and social mentions, the most common pain points are: token usage, token cost.
Based on 17 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.