Cohere Command is a family of highly scalable language models that balances high performance with strong accuracy.
Based on the provided social mentions, users appear highly engaged with Cohere Command R+ as evidenced by multiple YouTube videos dedicated to the AI model. The Reddit discussions show users actively experimenting with AI agents and building sophisticated integrations, including payment guardrails, SMS handling, and optimization tools, suggesting the technology is being used for real business applications. However, users are encountering practical challenges like context resets, skill invocation issues, and the need for workarounds to maintain project continuity across sessions. Overall sentiment seems positive with users pushing the boundaries of what's possible, though there's clear demand for more robust tooling and reliability improvements.
Mentions (30d): 12 (6 this week)
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Industry: information technology & services
Employees: 850
Funding Stage: Venture (round not specified)
Total Funding: $2.4B
Serious question. Did a transformer just describe itself and the universe and build itself a Shannon limit framework?
The Multiplicative Lattice as the Natural Basis for Positional Encoding
Knack 2026 | Draft v6.0

Abstract

We show that the apparent tradeoff between RoPE-style relative position invariance and ALiBi-style long-context stability is an artifact of encoding position as distance on a number line. When position is instead encoded as a point in the multiplicative lattice of the integers, both properties emerge simultaneously without compromise. SpectralRoPEALiBi achieves 106.6 PPL vs ALiBi's 108.7 in a fully converged 20,000-step experiment (300M params, WikiText-103, 4K context), beating ALiBi at every context length from 512 to 8,192 tokens.

The key insight is not that primes specifically are the right frequencies, but that the multiplicative structure of the integers is the natural spectral basis for positional encoding. We demonstrate this through falsification experiments: prime-tiered frequencies (129.2 PPL) and composite-tiered frequencies (129.4 PPL) perform identically — because composites are not alternatives to primes but higher-order coordinates in the same lattice. Both dramatically outperform random frequencies (+5.0 PPL), scrambled tier assignment (+6.3 PPL), and pure ALiBi (+7.3 PPL). The active ingredient is lattice-aware, tiered frequency selection with learnable scale — not primality per se.

We further validate this through a ZetaZeroPredictor experiment: three identical transformers trained for 10,000 epochs to predict Riemann zeta zero gaps. Geometric RoPE diverges (final r=0.57); SpectralALiBi locks into a stable attractor at epoch 112 (r=0.81). A second independent run widens this gap to -80.7% MSE improvement with r=0.86. The lattice-aligned frequency basis spans the mathematical space that zeta zeros inhabit; geometric frequencies cannot.
We further report empirical confirmation of the structural prediction from Section 5.5: VHT2 banded quantization of the KV cache demonstrates that K vectors (which carry RoPE positional encoding) have strong spectral concentration in Walsh-Hadamard space — the first four energy bands capture the dominant structure — while V vectors (which carry content) have uniform energy distribution. This structural asymmetry is directly predicted by the lattice theory: RoPE encodes multiplicative arithmetic relationships as angular rates, and the WHT is the Z/2Z projection of the Vilenkin-Hartley basis that spans that structure. The result is 3.2× K compression and 4.7× V compression at <1.25% perplexity cost — validated on both Dolphin 1B (head_dim=64) and Qwen3-8B (head_dim=128).

Introduction

Positional encoding provides transformer models with token order information. Two approaches dominate: RoPE encodes position through frequency-based rotations, preserving relative position invariance, and ALiBi replaces frequencies with a linear distance penalty, providing long-context stability. The field has treated these properties as fundamentally in tension.

We show this tension is false. It arises from a shared, unexamined assumption: that position is a location on a number line and the meaningful relationship between positions is distance. We replace this with a mathematically grounded alternative: position is a point in the multiplicative lattice of the integers, and the meaningful relationships between positions are their arithmetic structure — shared factors, GCD, harmonic resonance.

1.1 The Lattice Hypothesis

The integers under multiplication form a lattice where every number occupies a unique point defined by its prime factorisation. Geometric PE (sinusoidal, RoPE) projects this lattice onto a line — position equals distance — discarding the multiplicative structure. We propose restoring it. The motivation follows from a deductive chain.
Language word frequency follows Zipf's law: freq(rank) ∝ 1/rank^s with s ≈ 1. The generating function of Zipf is the Riemann zeta function ζ(s) = Σ 1/n^s. The zeta zeros — where ζ is maximally informative — are generated by prime harmonics via the explicit formula. Therefore the prime harmonic structure, and the multiplicative lattice it generates, provides a natural spectral basis for encoding positions in language.

1.2 Primes as Generators, Composites as Coordinates

A critical distinction: primes are the generators (basis vectors) of the multiplicative lattice. They are analogous to the 1D line segment in the progression from line → circle → sphere → hypersphere. The composite 12 = 2²×3 is not an alternative to primes — it is a coordinate in the lattice spanned by the prime axes, at position (2,1,0,0,...) in the (p₂, p₃, p₅, p₇, ...) basis. Using 2π/12 as a frequency encodes a harmonic that resonates at multiples of 12 — which simultaneously hits every multiple of 2, every multiple of 3, every multiple of 4, and every multiple of 6.

The analogy to n-dimensional geometry is precise:

Dimensional Progression | Multiplicative Lattice
1D line (2r) — the generator | Primes (2, 3, 5, 7, ...) — generators
2D circle — integral of l
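As a concrete illustration of the resonance claim above (a hypothetical sketch, not the paper's SpectralRoPEALiBi implementation, which is not shown in this excerpt): giving each positional channel the angular rate 2π/p for a prime p makes two positions resonate on that channel exactly when p divides their difference, so a composite offset like 12 lights up the p=2 and p=3 channels simultaneously.

```python
import math

def prime_frequencies(n_freqs):
    """Sieve the first n_freqs primes and map each prime p to the
    angular rate 2*pi/p, so positions i and j are in phase on that
    channel exactly when p divides i - j."""
    primes, candidate = [], 2
    while len(primes) < n_freqs:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return [2 * math.pi / p for p in primes]

def phase_match(i, j, p):
    """True iff positions i and j resonate on the 2*pi/p channel."""
    return (i - j) % p == 0

freqs = prime_frequencies(8)  # channels for primes 2, 3, 5, 7, 11, 13, 17, 19
```

For example, offset 12 resonates on the p=2 and p=3 channels but not on p=5, which is the sense in which 12 = 2²×3 is a coordinate in the sublattice spanned by those prime axes.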
I built a payment guardrail MCP server for Claude Code — your agent can now buy things without ever seeing your credit card number.
Hey r/ClaudeAI,

If you've been building agentic workflows with Claude Code, you've probably hit this wall: you want your agent to handle purchases autonomously, but handing it your real credit card is a terrible idea. A hallucination loop, a prompt injection, or just a bad tool call — and your card is either extracted or maxed out. I spent the last few months building pop-pay to solve this specifically for Claude Code users.

How it works with Claude Code:

1. Run `pop-init-vault` — encrypts your card credentials into `~/.config/pop-pay/vault.enc` (one-time setup)
2. Run `pop-launch` — it starts Chrome with CDP enabled and prints the exact `claude mcp add` commands for your machine
3. Add the pop-pay MCP server and Playwright MCP (both in one step)
4. Add a short block to your `CLAUDE.md` — done

From there, when Claude reaches a checkout page, it calls `request_virtual_card()`. pop-pay evaluates the intent against your policy, and if approved, injects the card credentials directly into the payment iframe via CDP. **Claude only receives a masked confirmation (`****-****-****-4242`) — the raw PAN never enters the context window.**

Security hardening (v0.6.0–v0.6.4):

- Credentials are stored in an AES-256-GCM encrypted vault (`pop-init-vault`) — no plaintext `.env`.
- The PyPI build compiles the key derivation salt into a Cython extension; the salt never exists as a Python object — only the final derived key does.
- We ran a red team and caught three issues we hadn't planned for: a `get_compiled_salt()` function was leaking the compiled salt directly (fixed in v0.6.1), `strings` scanning on the binary revealed the plaintext salt (patched with XOR obfuscation in v0.6.2), and we found a downgrade attack path where an agent could delete the `.so` and force re-encryption with the public salt (blocked by a tamper-evident `.vault_mode` marker in v0.6.4). Full results in `SECURITY.md`.
Current release is v0.6.17. SQLite never stores raw card numbers or CVV. An injection-time TOCTOU guard prevents redirect-to-attacker attacks between approval and injection.

What "two-layer guardrail" means in practice:

- Layer 1 (always on): keyword + pattern engine — catches hallucination loops, prompt injection attempts in the reasoning payload, phishing URLs. Zero API cost, runs locally.
- Layer 2 (optional): LLM semantic evaluation — for fuzzy cases. Uses any OpenAI-compatible endpoint including local models. Layer 2 only runs if Layer 1 passes, so you're not spending tokens on obvious rejections.

**The policy is yours:**

```
POP_ALLOWED_CATEGORIES=["aws", "github", "stripe"]
POP_MAX_PER_TX=50.0
POP_MAX_DAILY=200.0
```

If Claude tries to buy something outside the allowed list — even with a convincing-sounding reason — it gets blocked.

Repo: https://github.com/TPEmist/Point-One-Percent

Would love feedback from anyone building with Claude Code + MCP. Specifically curious whether the CDP injection approach holds up on sites you're actually using. What checkout flows have you hit that break this kind of DOM injection? Launching on Product Hunt April 8 if you want to follow along.

submitted by /u/ChemicalUnhappy5284
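The deterministic part of the policy (category allowlist plus per-transaction and daily caps) reduces to a simple pure check. A minimal sketch, with names and return shape that are illustrative, not pop-pay's actual code:

```python
def layer1_approve(category, amount, spent_today,
                   allowed=("aws", "github", "stripe"),
                   max_per_tx=50.0, max_daily=200.0):
    """Hypothetical Layer-1 guardrail: reject anything outside the
    allowlist or over the caps before any LLM evaluation runs."""
    if category not in allowed:
        return False, "category not allowed"
    if amount > max_per_tx:
        return False, "over per-transaction cap"
    if spent_today + amount > max_daily:
        return False, "would exceed daily cap"
    return True, "approved"
```

The point of keeping this layer pure and local is that it costs zero tokens and cannot be talked out of its decision, no matter how convincing the agent's reasoning payload is.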
I built a Chrome extension that lets Claude Code read/write your SMS/RCS messages through Google Messages — but I'm stuck on one last thing
I spent the last 2 days trying to get Claude Code to handle my SMS conversations (I run an insurance brokerage + lawn care business and wanted AI-assisted customer replies).

What I tried first:

- OpenMessage (Docker + libgm protocol) — SSE sessions expire after a few minutes of inactivity. You get "Invalid session ID" errors and have to restart the Docker container. Also 7 MCP tools = ~1,500 tokens eaten from every conversation. New messages don't sync until restart.
- TextBee (Android SMS gateway app) — All your private SMS messages route through their cloud servers. SMS only, no RCS. Need a webhook server + Tailscale/ngrok just to receive messages. Five moving parts for basic texting.

What I built instead: A Chrome extension that injects into your existing Google Messages Web session and bridges it to Claude Code via MCP (stdio + WebSocket). No Docker. No cloud servers. No phone apps. Just your browser.

Claude Code ←stdio→ MCP Server (Node.js) ←WebSocket→ Chrome Extension (messages.google.com)

What works:

- list_chats — All conversations with names, snippets, timestamps. Perfect.
- read_messages — Full message history with sent/received direction. Perfect.
- send_message — Fills in the text but... doesn't actually send.

The problem: Google Messages Web is an Angular app. Chrome extension content scripts run in an "isolated world" — a separate JS context from the page. Angular's zone.js only patches event listeners in the main world.
So when my extension sets the textarea value and clicks Send:

- The text appears in the input ✓
- The send button gets clicked ✓
- But Angular's form control doesn't detect the value change, so the click handler thinks the field is empty ✗

I tried EVERYTHING:

- Native value setter + input events
- document.execCommand('insertText')
- Full mouse event sequence (pointerdown/mousedown/mouseup/click)
- Enter key simulation
- Manifest V3 world: "MAIN" content script (this gets closest — the value is set from within Angular's zone, the button is clicked, but it still doesn't send)

The send button debug output from the main world script:

```
{
  "valueSet": true,
  "btnLabel": "Send end-to-end encrypted RCS message",
  "clicked": true,
  "inputAfter": "text still here...",
  "sentVia": "none"
}
```

Currently it works as a "draft" tool — fills in the message and you manually click send. But I want full automation. If you've solved programmatic input in Angular apps from Chrome extensions, I'd love to hear how.

Possible solutions I haven't tried:

- chrome.debugger API for trusted input events
- Accessing Angular's NgZone via __ngContext__ on DOM elements
- CDP (Chrome DevTools Protocol) for Input.dispatchKeyEvent

Repo: https://github.com/GURSEWAKSINGHSANDHU/google-messages-mcp
Issue: https://github.com/GURSEWAKSINGHSANDHU/google-messages-mcp/issues/1

Only 3 tools, ~300 tokens overhead. If we crack the send, this is the cleanest Google Messages integration for any MCP client.

For r/selfhosted:

Title: Built a self-hosted Google Messages MCP bridge — no cloud, no Docker, no third-party apps. Just a Chrome extension. Need help with one Angular quirk.

Body: I wanted my AI assistant (Claude Code) to read and respond to SMS/RCS messages on my business phone. Tried two existing solutions:

OpenMessage: Docker container using libgm to emulate Google Messages pairing. SSE sessions expire randomly, messages don't sync in real-time, and it eats 1,500 tokens per conversation just for tool definitions.
TextBee: Android app that turns your phone into an SMS gateway. But all messages route through their cloud. No RCS. Needs a webhook server + tunnel. Five components for basic texting.

My solution: A Chrome extension that talks to your already-paired Google Messages Web session. A Node.js MCP server communicates via WebSocket on localhost:7008. Everything stays on your machine.

- 3 MCP tools (~300 tokens)
- stdio transport (no session expiry)
- Full RCS support (native Google Messages)
- E2E encryption preserved
- Zero cloud dependencies

Reading messages works perfectly. Sending has one remaining issue — Angular's zone.js doesn't detect programmatic input from Chrome extensions, even from a world: "MAIN" content script. The text gets filled in but the send button click doesn't trigger Angular's change detection. Looking for anyone experienced with Angular internals or Chrome extension DOM automation.

GitHub: https://github.com/GURSEWAKSINGHSANDHU/google-messages-mcp

For r/webdev or r/angular:

Title: How to trigger Angular change detection from a Chrome extension's main-world content script?

Body: Building a Chrome extension that interacts with an Angular app (Google Messages Web). I need to programmatically set a textarea value and click a button, but Angular's reactive form doesn't detect the changes.

Setup:

- Manifest V3 extension with world: "MAIN" content script (runs in the page's JS context, not the isolated world)
- The textarea is bound to an Angular reactive form control
- Production build (no ng.g
I built a Claude Code skill that stops AI agents from losing track of your project
If you use Claude Code on projects that last more than a few sessions, you've probably hit these walls:

- Context resets and your agent forgets everything you just did
- You have 30+ files and nobody (including you) remembers what's where
- You set rules ("always test before committing") and the agent gradually forgets them
- A new agent arrives and you spend 20 minutes re-explaining the project

I built Doc Harness — a skill that creates a lightweight documentation system inside your project folder. It maintains 5 structured files that any agent can read to instantly understand the project state, follow your rules, and pick up where things left off.

What it actually does:

/doc-harness init — Creates 5 documents tailored to your project. The key one is CURRENT_STATUS.md, which uses a "moving car" structure: recent history (where you've been), current work detail (where you are), next steps (where you're going), and working principles (how to drive). When a phase of work ends, details get archived permanently and the status clears for the next phase.

/doc-harness check — Audits your documentation health (are files registered? is status up to date?) AND reads your project's rules back to the agent, prompting it to reflect: "Am I actually following these?" This catches the drift that happens during long sessions.

The core principle is "Write It Down or Lose It" — AI context is temporary, files are permanent. Every important result, decision, or insight should be saved to a file and registered in the index. Works for any project: research papers, SaaS features, data analysis, articles, software libraries — anything that spans multiple sessions.

Install:

```
git clone https://github.com/cilidinezy-commits/doc-harness.git
cp -r doc-harness/skill ~/.claude/skills/doc-harness
```

Then just tell your agent "set up project documentation" — no slash commands needed if you prefer natural language. English and Chinese versions included. MIT license.
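A hypothetical skeleton of the "moving car" layout described above (the actual file doc-harness generates may differ):

```md
# CURRENT_STATUS.md

## Recent History (where you've been)
- ...

## Current Work (where you are)
- ...

## Next Steps (where you're going)
- ...

## Working Principles (how to drive)
- Always test before committing
```

When a phase ends, the first three sections would be archived to a permanent file and cleared, while the working principles carry over.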
GitHub: https://github.com/cilidinezy-commits/doc-harness

Would love feedback from anyone who works on long-running projects with Claude Code. What's your current approach to keeping project state across sessions?

submitted by /u/PenaltyAdept9613
The age of pure instruction-following is ending. Are we ready?
submitted by /u/Astrokanu
I posted the axios warning yesterday. here's the fix i've been running — 3 hooks, tested on 15 edge cases.
yesterday i posted the axios@1.14.1 warning. a lot of you asked what to actually DO beyond "rotate your creds."

the comments were split. half of you immediately audited your lockfiles. half said "thanks for the heads up" and went right back to hitting enter on npm install. i'm in the second camp. i tried being careful for about 2 days, then i was deep in a feature and just approved an install without thinking. old habits. so instead of relying on willpower, i wrote hooks that enforce it automatically. sharing the full configs. been dogfooding this on my own projects for a few days now.

the core problem: every major npm attack since 2018 — event-stream, ua-parser-js, colors, axios — same vector: postinstall scripts. npm still runs them by default. no sandbox. no permission model. 99% of npm malware in 2025 used this one vector.

Layer 1: PreToolUse hook — hard block on install without --ignore-scripts

claude tries `npm install axios` → hook intercepts → BLOCKED (exit 2) → claude retries with `--ignore-scripts` → postinstall never runs. can't bypass it, even on auto-accept mode.

tested: npm install ✅ blocked, npm install --ignore-scripts ✅ passes, npm i ✅ blocked, npm ci ✅ blocked, npm init ✅ passes (not an install), pnpm add ✅ blocked, yarn add ✅ blocked, npx ✅ blocked, git push ✅ passes.

```bash
#!/bin/bash
# .claude/hooks/npm-audit-check.sh
input=$(cat)
tool_name=$(echo "$input" | jq -r '.tool_name // ""')
command=$(echo "$input" | jq -r '.tool_input.command // ""')
if [ "$tool_name" != "Bash" ]; then exit 0; fi
if ! echo "$command" | grep -qE \
  'npm (install\b|i |ci\b)|npx |npm exec |yarn (add|install) |pnpm (add|install) '; then
  exit 0
fi
if echo "$command" | grep -q '\-\-ignore-scripts'; then exit 0; fi
echo '{"error": "BLOCKED: npm install without --ignore-scripts."}' >&2
exit 2
```

bonus: a PostToolUse hook that runs `npm audit` after every install and warns you about known CVEs:

```bash
#!/bin/bash
# .claude/hooks/post-install-audit.sh
input=$(cat)
tool_name=$(echo "$input" | jq -r '.tool_name // ""')
command=$(echo "$input" | jq -r '.tool_input.command // ""')
if [ "$tool_name" != "Bash" ]; then exit 0; fi
if ! echo "$command" | grep -qE \
  'npm (install\b|i |ci\b)|npx |npm exec |yarn (add|install) |pnpm (add|install) '; then
  exit 0
fi
audit_output=$(npm audit --json 2>/dev/null)
vuln_count=$(echo "$audit_output" | jq -r '.metadata.vulnerabilities.total // 0')
if [ "$vuln_count" -gt 0 ]; then
  high=$(echo "$audit_output" | jq -r '.metadata.vulnerabilities.high // 0')
  critical=$(echo "$audit_output" | jq -r '.metadata.vulnerabilities.critical // 0')
  echo "WARNING: npm audit found $vuln_count vulnerabilities ($critical critical, $high high)."
fi
exit 0
```

settings (`.claude/settings.json`):

```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{"type": "command", "command": ".claude/hooks/npm-audit-check.sh", "timeout": 30}]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{"type": "command", "command": ".claude/hooks/post-install-audit.sh", "timeout": 30}]
    }]
  }
}
```

Layer 2: git pre-commit hook — lockfile diff flags any new transitive dep that wasn't in the last commit. doesn't care if npm audit says it's clean. if it's new, you review it first.

```bash
#!/bin/bash
lockfile_diff=$(git diff --cached --name-only \
  | grep -E 'package-lock\.json|yarn\.lock|pnpm-lock\.yaml')
if [ -z "$lockfile_diff" ]; then exit 0; fi
new_deps=$(git diff --cached package-lock.json \
  | grep -E '^\+.*"resolved":' | head -20)
if [ -n "$new_deps" ]; then
  echo "New dependencies detected in lockfile:"
  echo "$new_deps" | head -10
  echo "Review before committing. Override: git commit --no-verify"
  exit 1
fi
exit 0
```

Layer 3: CLAUDE.md rules — the 80% fix with zero setup

even if you don't want hooks, add this to your CLAUDE.md:

```md
## npm Security Rules
- ALWAYS use --ignore-scripts with npm install
- ALWAYS use --save-exact to pin exact versions
- NEVER install a package without checking its npm page first
- If < 1,000 weekly downloads, ASK before installing
- If published in last 30 days, ASK before installing
```

won't hard-block like hooks do, but catches most of the risk.

---

i've only tested on macos and linux. if you find edge cases the hook misses, have a better regex for install detection, or run into issues on windows — drop it here. i'll update.

submitted by /u/truongnguyenptit
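The install-detection pattern can be replayed against the tested edge cases outside the hook. A small Python harness, mirroring the grep regex (hypothetical, not part of the posted configs):

```python
import re

# Same pattern the PreToolUse hook greps for; Python's re and POSIX ERE
# agree on this subset of syntax.
INSTALL_RE = re.compile(
    r'npm (install\b|i |ci\b)|npx |npm exec |yarn (add|install) |pnpm (add|install) '
)

def blocked(cmd):
    """True if the hook would exit 2: install-like command, no --ignore-scripts."""
    if not INSTALL_RE.search(cmd):
        return False
    return '--ignore-scripts' not in cmd

# the edge cases from the tested list above
assert blocked("npm install axios")
assert not blocked("npm install --ignore-scripts axios")
assert blocked("npm i axios")
assert blocked("npm ci")
assert not blocked("npm init my-app")   # not an install
assert blocked("pnpm add left-pad")
assert blocked("yarn add lodash")
assert blocked("npx create-foo")
assert not blocked("git push")
```

This makes it easy to grow the test list as people report misses (e.g. new package-manager aliases) before touching the bash hook itself.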
Skills don't always invoke inline — a hook that fixes it
You can mention a skill right in the message, like:

Hey claude, /dig information about how to increase karma on reddit

But the skill "dig" doesn't work every time. Below is the story of how I solved this problem.

Motivation

Most of the time I use Claude Code like Copilot: I spend a lot of time discussing, researching, planning. Each scenario is a repeated prompt, so I decided to automate the repeated actions. I created the tags #echo, #dig, and #discuss in CLAUDE.md with the relevant prompts:

```md
Message tags

#echo
When the user's message contains #echo: do NOT start working on the task. ONLY repeat back your understanding of the task in your own words. Do NOT ask questions, do NOT analyze, do NOT propose anything. Just restate what you understood and wait for confirmation.

#dig
When the user's message contains #dig: do NOT write or modify any code. Search for information from multiple angles, collect all findings, and present a structured answer.

#discuss
When the user's message contains #discuss: do NOT write or modify any code. Focus on analysis and discussion only. Ask questions, identify corner cases, propose options, and help the user think through the problem.
```

I was very proud of myself, until one day I asked myself: "Wait wait wait. What is the difference between my tags and skills?" From then on I was on a very interesting journey: how do I create skills that are invoked every time they are needed?

From tags to skills

I migrated all my tags to skills. I got mentions in the command list (when you start typing commands in the terminal) and autocomplete. Not bad. Examples from my flow:

- Cron should check every 10 minutes
- /discuss - What if we invoke onboarding agent? Why not?
- /discuss - Should we create section "tag combo", is shorter or remove completely
- /echo - I don't agree, I want to /discuss both solutions deeply
- Add small emulation like this script is human and download it step by step. I don't want to have a blocking /echo

When I started using them, I realized that skills don't invoke every time they are needed. This happened very often, and I had to write directly "use skill X, use it, don't skip". This really annoyed me. I started thinking about going back to my previous flow with tags, but I had one more idea to improve skill invocation. Claude Code has hooks. I added a hook that fires when the user sends a message:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "bash ~/.claude/hooks/invoke_skill.sh" }
        ]
      }
    ]
  }
}
```

It injects an instruction to invoke the matched skill:

```bash
#!/bin/bash
input=$(cat)
prompt=$(echo "$input" | jq -r '.prompt')
skills=""
# substring match, since the tag can appear anywhere in the prompt
[[ "$prompt" == *"/echo"* ]] && skills="$skills echo"
[[ "$prompt" == *"/dig"* ]] && skills="$skills dig"
[[ "$prompt" == *"/discuss"* ]] && skills="$skills discuss"
if [ -n "$skills" ]; then
  echo "The user used skill(s):$skills. You MUST invoke the corresponding skill(s) using the Skill tool before responding."
fi
exit 0
```

This solution completely covers my scenario. A skill is invoked every time it's needed. I get autocomplete and predictable behaviour, because when Claude Code runs a skill you can see it in the terminal. By the way, when I used tags in CLAUDE.md I had no visual confirmation that my instruction was running. I'm proud of myself once again.

Experiment 1

For this article I decided to run experiments to prove that the problem exists and the solution works.

Hypotheses:

- H0₁ — The problem doesn't exist: skill invocation without any action is reliable enough (pass rate ≥ 95%).
- H0₂ — The hook has no effect: the UserPromptSubmit hook does not improve the skill invocation rate.

Setup: Sonnet model, 99 prompts based on my usage patterns, split evenly across three skills (discuss, echo, fluent). Each prompt was tested twice — once without the hook (sample A) and once with the hook (sample B) — in the same shuffled order. Pass = the Skill tool was invoked in the session.

Overall results:

| Sample | Pass rate |
|---|---|
| A: No hook | 74% (74/99) |
| B: With hook | 93% (93/99) |

Per-skill breakdown:

| Skill | No hook | With hook |
|---|---|---|
| discuss | 66% | 81% |
| echo | 72% | 100% |
| fluent | 84% | 100% |

Per-position breakdown:

| Position | No hook | With hook |
|---|---|---|
| start | 0% | 45% |
| middle | 100% | 100% |
| end | 81% | 100% |

A few things stand out. Middle and end positions work almost perfectly with the hook. Echo and fluent reach 100%. But discuss at start position — 0% without the hook, 45% with it.

I looked into the failing sessions. The model responses were things like:

- "What idea would you like to discuss? You've given me the command but no context."
- "Before diving in — what's 'this'?"

The model is doing discuss-like behavior — asking questions, pushing back — but without invoking the Skill tool. It treats /discuss this idea as ambiguous and tries to orient itself first. My first thought was: this is a test artifact. In real usage /discuss always has prior conversation c
Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models
Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers.

The first, MARCUS, is an agentic multimodal system for cardiac diagnosis - ECG, echocardiogram, and cardiac MRI, interpreted together by domain-specific expert models coordinated by an orchestrator. It outperforms GPT-5 and Gemini 2.5 Pro by 34-45 percentage points on cardiac imaging tasks. Pretty impressive!

But the second paper is more intriguing. MIRAGE: The Illusion of Visual Understanding reports what happened when a student forgot to uncomment the line of code that gave their model access to the images. The model answered anyway - confidently, and with detailed clinical reasoning traces. And it scored well. That accident naturally led to an investigation, and what they found challenges some embedded assumptions about how these models work. Three findings in particular:

1. Models describe images they were never shown. When given questions about cardiac images without any actual image input, frontier VLMs generated detailed descriptions - including specific pathological findings - as if the images were right in front of them. The authors call this "mirage reasoning."

2. Models score surprisingly well on visual benchmarks without seeing anything. Across medical and general benchmarks, mirage-mode performance was way above chance. In the most extreme case, a text-only model trained on question-answer pairs alone - never seeing a single chest X-ray - topped the leaderboard on a standard chest X-ray benchmark, outperforming all the actual vision models.

3. And even more intriguing: telling the model it can't see makes it perform worse. The same model, with the same absent image, performs measurably better in mirage mode (where it believes it has visual input) than in guessing mode (where it's explicitly told the image is missing and asked to guess).
The authors note this engages "a different epistemological framework," but this doesn't really explain the mechanism. The Mirage authors frame these findings primarily as a vulnerability - a safety concern for medical AI deployment, an indictment of benchmarking practices. They're right about that. But I think they've also uncovered evidence of something more interesting, and here I'll try to articulate what.

The mirage effect is geometric reconstruction

Here's the claim: what the Mirage paper has captured isn't a failure mode. It's what happens when a model's internal knowledge structure becomes geometrically rich enough to reconstruct answers from partial input.

Let's ponder what the model is doing in mirage mode. It receives a question: "What rhythm is observed on this ECG?" with answer options including atrial fibrillation, sinus rhythm, junctional rhythm. No image is provided, but the model doesn't know that. So it does what it always does - it navigates its internal landscape of learned associations. "ECG" activates connections to cardiac electrophysiology. The specific clinical framing of the question activates particular diagnostic pathways. The answer options constrain the space. And the model reconstructs what the image most likely contains by traversing its internal geometry (landscape) of medical knowledge. It's not guessing - it's not random. It's reconstructing - building a coherent internal representation from partial input and then reasoning from that representation as if it were real.

Now consider the mode shift. Why does the same model perform better in mirage mode than in guessing mode? Under the "stochastic parrot" view of language models, this shouldn't - couldn't - happen. Both modes have the same absent image and the same question. The only difference is that the model believes it has visual input. But under a "geometric reconstruction" view, the difference becomes obvious. In mirage mode, the model commits to full reconstruction.
It activates deep pathways through its internal connectivity, propagating activation across multiple steps, building a rich internal representation. It goes deep. In guessing mode, it does the opposite - it stays shallow, using only surface-level statistical associations. Same knowledge structure, but radically different depth of traversal. The mode shift could be evidence that these models have real internal geometric structure, and that the depth at which you engage the structure matters.

When more information makes things worse

The second puzzle the Mirage findings pose is even more interesting: why does external signal sometimes degrade performance? In the MARCUS paper, the authors show that frontier models achieve 22-58% accuracy on cardiac imaging tasks with the images, while MARCUS achieves 67-91%. But the mirage-mode scores for frontier models were often not dramatically lower than their with-image scores. The images weren't helping as much as they should. And in the chest X-ray case, the text-only model outperformed everything - the images were net negative. After months
I built an MCP server that gives Claude 12 real optimization tools (bandits, LP solver, Monte Carlo, risk analysis) — all sub-25ms, free tier included
I kept running into the same problem: Claude is amazing at reasoning about what to optimize, but terrible at actually doing the math. Ask it to pick the best A/B test variant and it'll give you a plausible answer that ignores the exploration-exploitation tradeoff. Ask it to solve a scheduling problem and it burns 5,000 tokens to approximate what a linear solver does in 2ms.

So I built an MCP server with 12 tools that handle the math correctly:

**Install:**

```
npx @oraclaw/mcp-server
```

**Claude Desktop config:**

```json
{
  "mcpServers": {
    "oraclaw": {
      "command": "npx",
      "args": ["@oraclaw/mcp-server"]
    }
  }
}
```

**What Claude gets:**

- `optimize_bandit` — UCB1/Thompson Sampling for A/B testing and option selection
- `solve_constraints` — LP/MIP solver (HiGHS) for scheduling, resource allocation
- `simulate_montecarlo` — Monte Carlo with 6 distribution types
- `assess_risk` — Portfolio VaR/CVaR
- `predict_bayesian` — Bayesian inference with evidence updating
- `detect_anomaly` — Z-score/IQR anomaly detection
- `analyze_decision_graph` — PageRank, community detection
- `plan_pathfind` — A* with K-shortest paths
- `predict_forecast` — ARIMA + Holt-Winters
- `evolve_optimize` — Genetic algorithm
- `optimize_cmaes` — CMA-ES continuous optimization
- `score_convergence` — Multi-source agreement scoring

Every tool returns deterministic, mathematically correct results. No tokens burned on reasoning about math.

**Performance:** 14 of 17 endpoints respond in under 1ms. All under 25ms. 1,072 tests. Free tier: 25 calls/day, no API key needed.

The API is live — you can try it right now.

Interactive demo: https://web-olive-one-89.vercel.app/demo
GitHub: https://github.com/Whatsonyourmind/oraclaw
npm: https://www.npmjs.com/package/@oraclaw/mcp-server

Would love feedback on which tools are most useful for your Claude workflows.
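The exploration-exploitation tradeoff mentioned above is exactly what UCB1 handles. This is not the server's actual implementation, just a minimal sketch of the UCB1 rule the `optimize_bandit` tool is described as using: each arm's score is its mean reward plus a bonus that shrinks as the arm gets sampled more.

```python
import math

def ucb1_select(counts, rewards, total_pulls):
    """Pick the arm maximizing mean reward + exploration bonus (UCB1)."""
    # Try every arm at least once before applying the UCB formula
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        rewards[arm] / counts[arm]
        + math.sqrt(2 * math.log(total_pulls) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)

# Variant B has the better observed rate (40% vs 30%), but A has been
# pulled only 10 times, so its exploration bonus keeps it in play.
counts = [10, 100]
rewards = [3.0, 40.0]  # cumulative reward per arm
print(ucb1_select(counts, rewards, total_pulls=110))  # -> 0 (explore A)
```

A naive "pick the best observed rate" answer would always return variant B here; the bonus term is what a plain LLM answer typically ignores.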
Boris Cherny shared his 15 favorite Claude Code features nobody uses
Boris Cherny (who created Claude Code) just posted a thread on the features he actually uses day to day. Half of these I had no idea existed. Here's all 15 with the commands.

1. **Mobile app.** Full Claude Code experience in the iOS/Android Claude app (Code tab, left sidebar). Boris writes a lot of his code from his phone.

2. **Session teleportation.** `claude --teleport` or `/teleport` pulls a cloud session to your local terminal. `/remote-control` goes the other way: control a local session from your phone or browser. He keeps "Enable Remote Control for all sessions" on permanently in `/config`.

3. **/loop and /schedule.** Tell Claude to run a task on repeat, at a set interval, for up to a week. His advice: turn workflows into skills, then loop them.

4. **Hooks.** Deterministic logic that fires during the agent lifecycle. If Claude stopping mid-task has been driving you crazy, the Stop hook alone is worth it.

5. **Cowork Dispatch.** Secure remote control for the Claude Desktop app. Uses your MCPs, browser, and computer (with permission). Boris uses it daily for Slack, emails, and file management when he's away from his laptop.

6. **Chrome extension.** His #1 tip: give Claude a way to see what it's building. Without this, Claude is coding blind on frontend work. The extension lets it look at the browser and iterate until the UI looks right. He says it outperforms equivalent MCPs.

7. **Desktop app web server testing.** Desktop auto-starts your dev server and tests it in a built-in browser. CLI/VSCode can get close with the Chrome extension, but Desktop bundles it natively.

8. **Session forking.** Two ways: `/branch` from inside a session (creates a branch; resume the original with `claude -r`), or `claude --resume --fork-session` from the CLI (I always use that one).

9. **/btw for side queries.** Quick question without derailing the agent mid-task. `/btw how do I spell dachshund?` and it answers, then picks up where it left off.

10. **Git worktrees (-w).** Run dozens of parallel Claude sessions in the same repo. `claude -w` spins up a new worktree automatically.

11. **/batch.** Fan work out to hundreds or thousands of worktree agents. Migrations, bulk refactors, mass test generation. Anything parallelizable, one command.

12. **--bare flag.** Skips auto-loading CLAUDE.md, settings, and MCPs on startup. Up to 10x faster init. Good for scripting and pipelines where you don't need the full context.

13. **CLAUDE.md tips.** Keep it under 1,000 tokens. Only include what Claude needs on literally every turn. Use CLAUDE.md files in subdirectories for context that only loads when relevant.

14. **Custom agents (--agent).** Define agents in `.claude/agents/` with restricted tools, custom descriptions, and specific models. Run with `claude --agent=`. Good for read-only agents, specialized reviewers, domain-specific workflows.

15. **/voice.** Boris says he does most of his coding by talking to Claude. `/voice` in CLI (hold space to speak), voice button on Desktop, or just iOS dictation.

YOURS TRULY 🙇 (Full thread available here: https://x.com/bcherny/status/2038454336355999749)
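For the custom-agents feature, an agent definition is a markdown file with YAML frontmatter dropped into `.claude/agents/`. The exact field names below (`name`, `description`, `tools`, `model`) are taken from commonly circulated examples, not from this thread, so verify them against the official Claude Code docs before relying on them:

```markdown
---
name: reviewer
description: Read-only code reviewer. Use for inspecting diffs and flagging bugs.
tools: Read, Grep, Glob
model: sonnet
---

You are a read-only reviewer. Inspect the code, flag bugs, style issues,
and risky patterns. Never edit files or run write commands.
```

Restricting `tools` to read-only operations is what makes this safe to run unattended over a whole repo.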
Claude Code built its own software for a little smart car I'm building.
TLDR: Check out the video

# Box to Bot: Building a WiFi-Controlled Robot With Claude Code in One Evening

I'm a dentist. A nerdy dentist, but a dentist. I've never built a robot before. But on Sunday afternoon, I opened a box of parts with my daughter and one of her friends and started building. Next thing I know, it's almost midnight, and I'm plugging a microcontroller into my laptop. I asked Claude Code to figure everything out. And it did. It even made a little app that ran over WiFi to control the robot from my phone.

---

## The Kit

A week ago I ordered the **ACEBOTT QD001 Smart Car Starter Kit.** It's an ESP32-based robot with Mecanum wheels (the ones that let it drive sideways). It comes with an ultrasonic distance sensor, a servo for panning the sensor head, line-following sensors, and an IR remote. It's meant for kids aged 10+, but I'm a noob, soooo... whatever, I had a ton of fun!

## What Wasn't in the Box

Batteries. Apparently there are shipping restrictions for lithium-ion batteries, so the kit doesn't include them. If you want to do this yourself, make sure to grab the following:

- **2x 18650 button-top rechargeable batteries** (3.7V, protected)
- **1x CR2025 coin cell** (for the IR remote)
- **1x 18650 charger**

**A warning from experience:** NEBO brand 18650 batteries have a built-in USB-C charging port on the top cap that adds just enough length to prevent them from fitting in the kit's battery holder. Get standard protected button-top cells like Nuon. Those worked well. You can get both at Batteries Plus.

*One 18650 cell in, one to go. You can see here why the flat head screws were used to mount the power supply instead of the round head screws.*

## Assembly

ACEBOTT had all the instructions we needed online. They have YouTube videos, but I just worked with the PDF. For a focused builder, this would probably take around an hour. For a builder with ADHD and a kiddo, it took around four hours.
Be sure to pay close attention to the orientation of things. I accidentally assembled one of the Mecanum wheel motors with the stabilizing screws facing the wrong way. I had to take it apart and make sure they wouldn't get in the way.

*This is the right way. Flat heads don't interfere with the chassis.*

*Thought I lost a screw. Turns out the motors have magnets. Found it stuck to the gearbox.*

*Tweezers were a lifesaver for routing wires through the channels.*

*The start of wiring. Every module plugs in with a 3-pin connector — signal, voltage, ground.*

*Couldn't connect the Dupont wires at first — this connector pin had bent out of position. Had to bend it back carefully.*

*Some of the assembly required creative tool angles.*

*The ultrasonic sensor bracket. It looks like a cat. This was not planned. It's now part of the personality.*

## Where Claude Code Jumped In

Before I go much further, I'll just say that it would have been much easier if I'd given Ash the spec manual from the beginning. You'll see why later.

The kit comes with its own block-programming environment called ACECode and a phone app for driving the car. You flash their firmware, connect to their app, and drive the car around. But we skipped all of that. Instead, I plugged the ESP32 directly into my laptop (after triple-checking the wiring) and told my locally harnessed Claude Code (we'll call them Ash from here on out) to inspect the entire build and talk to it.

*The ACEBOTT ESP32 Car Shield V1.1. Every pin labeled — but good luck figuring out how the motors work from this alone.*

*All the wiring and labeling. What does it all mean? I've started plugging that back into Claude and Gemini to learn more.*

**Step 1: Hello World (5 minutes)**

Within a few minutes, Ash wrote a simple sketch that blinked the onboard LED and printed the chip information over serial. It compiled the code, flashed it to the ESP32, and read the response. It did all of this from the CLI, the command-line interface.
We didn't use the Arduino IDE GUI at all. The ESP32 reported back: dual-core processor at 240MHz, 4MB flash, 334KB free memory. Ash flashed one of the blue LEDs to show me it was in and reading the hardware appropriately.

NOTE: I wish I'd waited to let my kiddo do more of this with me along the way. I got excited and stayed up until midnight working on it, but I should have waited. I'm going to make sure she's more in the driver's seat from here on out.

*First sign of life. The blue LED blinking means Ash is in and talking to the hardware.*

**Step 2: The Motor Mystery (45 minutes)**

This next bit was my favorite because we had to work together to figure it out. Even though Ash was in, they had no good way of knowing which pins correlated with which wheel, nor which command spun a wheel forward or backward. Ash figured out there were four motors but didn't know which pins controlled them. The assembly manual listed sensor pins but not motor pins, and ACEBOTT's website was mostly
PSA: Claude Code has two cache bugs that can silently 10-20x your API costs — here's the root cause and workarounds
I spent the past few days reverse-engineering the Claude Code standalone binary (228MB ELF, Ghidra + MITM proxy + radare2) and found two independent bugs that cause the prompt cache to break, silently inflating costs by 10-20x. Posting this so others can protect themselves.

**Bug 1: Sentinel replacement in the standalone binary breaks cache when the conversation discusses billing internals**

Issue: anthropics/claude-code#40524

The standalone Claude Code binary (the one you get from claude.ai/install.sh or `npm install -g`) contains a native-layer string replacement baked into Anthropic's custom Bun fork. It's injected into the Zig HTTP header builder function — the same function that builds Content-Length, User-Agent, etc. On every API request to /v1/messages, if the anthropic-version header is present, it searches the JSON request body for `cch=00000` (the billing attribution sentinel) and replaces `00000` with a 5-char hex string derived from hashing the body. This happens after JSON.stringify but before TLS encryption — completely invisible from JavaScript.

When does this cause problems? The replacement targets the first occurrence in the body. Since `messages[]` comes before `system[]` in the serialized JSON, if your conversation history contains the literal sentinel (e.g., from reading the CC bundle source, discussing billing headers, or having it in your CLAUDE.md), the sentinel in messages gets replaced instead of the one in `system[0]`. This changes your messages content on every request → cache prefix broken → full cache rebuild (~$0.04-0.15 per request depending on context size). In normal usage (not discussing CC internals), only `system[0]` is affected, and since it has `cache_control: null`, it doesn't impact caching.

Workaround: Run Claude Code via `npx @anthropic-ai/claude-code`* instead of the standalone binary. The replacement mechanism exists only in the custom Bun fork compiled into the standalone — the npm package running on standard Bun/Node has no replacement.
Confirmed experimentally: same JS, same bytecode, zero replacement on npx.

\* Do not blindly use that command; verify what it does (it is safe, but you should check nonetheless).

**Bug 2: --resume ALWAYS breaks cache (since v2.1.69)**

Issue: anthropics/claude-code#34629

Every --resume causes a full cache miss on the entire conversation history. Only the system prompt (~11-14k tokens) is cached; everything else is cache_creation from scratch. This is a ~10-20x cost increase on the resume request.

Root cause: In v2.1.69, Anthropic introduced deferred_tools_delta — a new system-reminder attachment listing tools available via ToolSearch. On a fresh session, these attachments (deferred tools + MCP instructions + skills list, ~13KB) are injected into messages[0] alongside the AU$ user context. On resume, they're appended at the end of messages (messages[N]) while messages[0] contains only the AU$ context (~352B). This creates three independent cache-breaking differences:

1. messages[0]: 13KB (4 reminders) vs 352B (1 reminder) — completely different prefix
2. system[0] billing hash: changes because the cc_version suffix is computed from the chars at positions 4, 7, and 20 of the first user message (which IS the system-reminder, not the actual user prompt)
3. cache_control breakpoint position: moves from messages[0] to messages[last]

deferred_tools_delta does not exist in v2.1.68 (`grep -c 'deferred_tools_delta' cli.js` → 0 in 2.1.68, 5 in 2.1.69). Without it, messages[0] was identical on fresh and resumed sessions → cache hit. Subsequent turns after resume cache normally — the one-time miss is only on the first request after resume.

Workaround: There's no external workaround for this one. Pinning to v2.1.68 works (as the original issue reporter found), but you lose 60+ versions of features. An invasive patch to the npm package's cli.js could theoretically reorder the attachment injection on resume, but that's fragile across updates.
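Bug 1's first-occurrence behavior can be modeled in a few lines. This is an illustrative Python sketch of the mechanism as described above, not the actual native code: the real hash is Bun's, so `sha256` here is a stand-in, and the JSON bodies are toy examples.

```python
import hashlib

SENTINEL = "cch=00000"

def apply_sentinel_replacement(body: str) -> str:
    """Model of the native-layer rewrite: the FIRST occurrence of the
    sentinel gets its '00000' swapped for a 5-char hex digest of the
    body. (Stand-in hash; the real one derives from Bun's hashing.)"""
    idx = body.find(SENTINEL)
    if idx == -1:
        return body
    digest = hashlib.sha256(body.encode()).hexdigest()[:5]
    return body[:idx] + "cch=" + digest + body[idx + len(SENTINEL):]

# Normal case: the only sentinel is in system[0], so messages[] is untouched.
clean = '{"messages":[{"content":"hi"}],"system":[{"text":"cch=00000"}]}'
# Poisoned case: history contains the literal sentinel; since messages[]
# serializes first, IT absorbs the replacement -> cache prefix changes.
poisoned = '{"messages":[{"content":"see cch=00000"}],"system":[{"text":"cch=00000"}]}'

print(apply_sentinel_replacement(clean))
print(apply_sentinel_replacement(poisoned))
```

In the clean body the messages prefix is stable across requests; in the poisoned body it changes with every request body, which is exactly the cache-busting condition described above.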
**Cost impact**

For a large conversation (~500k tokens):

- Bug 1 (when triggered): ~155k tokens shift from cache_read ($0.03/MTok) to cache_creation ($0.30/MTok) = ~$0.04 per request, every request
- Bug 2 (every resume): ~500k tokens as cache_creation = ~$0.15 one-time per resume
- Combined (discussing CC internals + resuming): up to $0.20+ per request

**Methodology**

Full details in the GitHub issues, but briefly: an MITM proxy (mitmproxy addon capturing all API payloads), Ghidra reverse engineering of the standalone ELF to locate the replacement code in the Zig HTTP header builder, Bun.hash() to identify all header name hashes, npm package comparison across versions 1.0.0-2.1.87, and controlled experiments with fresh sessions → resume → consecutive resumes with payload diffing.

PS. Co-written by Claude Code, obviously.

PPS. Claude Code appears to use a 1-hour cache TTL (at least mine does), so requests should normally stay cached; extra usage, however, has a 5-minute TTL.

PPPS. Apparently downgrading to 2.1.30 also works.

Verification script: https://gitlab.c
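The per-request dollar figures in the cost-impact list follow directly from the quoted token counts and cache prices. A quick arithmetic check, using only the numbers stated above:

```python
# Cache prices quoted in the post, converted to dollars per token
CACHE_READ = 0.03 / 1_000_000    # $/token
CACHE_CREATE = 0.30 / 1_000_000  # $/token

# Bug 1: ~155k tokens shift from cache_read to cache_creation, every request
bug1_per_request = 155_000 * (CACHE_CREATE - CACHE_READ)

# Bug 2: the full ~500k-token history rebuilt as cache_creation, once per resume
bug2_per_resume = 500_000 * CACHE_CREATE

print(f"Bug 1: ${bug1_per_request:.3f} per request")  # ~ $0.042
print(f"Bug 2: ${bug2_per_resume:.2f} per resume")    # $0.15
```

Both results match the ~$0.04 and ~$0.15 figures in the list, so the claimed cost impact is internally consistent.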
Garry Tan open-sourced gstack: his personal skill pack for Claude Code (56k stars)
Hey r/ClaudeAI,

Garry Tan (CEO of Y Combinator) just open-sourced gstack — his own personal pack of slash commands/skills for Claude Code. Instead of treating Claude as one generic assistant, gstack turns it into a structured virtual team with specialized roles:

• CEO (product strategy & vision)
• Engineering Manager (architecture guardrails)
• Designer (catches AI slop and improves UX)
• Reviewer & QA Lead (finds bugs and tests in a real browser)
• Security Officer (OWASP + STRIDE audits)
• Release Engineer
• Retro, Doc Engineer, etc.

He says it helps him ship 10k-20k lines of code per day while running YC. The repo already has 54k+ stars in a very short time.

Repo: https://github.com/garrytan/gstack

Has anyone here tried gstack yet? Does it actually make a noticeable difference compared to plain Claude Code, especially for larger coding sessions? Or is it mostly hype? Would love to hear real user experiences.
The Semantic Chamber, or: The Mother Tongue Room
The Chinese Room was a useful provocation for its time. Its force came from its simplicity, almost its cruelty. A person sits inside a room with a rulebook for manipulating Chinese symbols they do not understand. From the outside, the replies appear meaningful. From the inside, there is only procedure. Syntax without semantics. That is the snap of it.

Fine. Good. Important, even. But the thought experiment wins by starving the system first. It gives us a dead operator, a dead rulebook, and a dead conception of language, then congratulates itself for finding no understanding there. It rigs the stage in advance. The room is built to exclude the very thing now under dispute: not static rule-following, but dynamic semantic organization.

So if we want a modern descendant of the Chinese Room, we should keep the skeleton recognizable while changing the pressure point.

The Mother Tongue Room

Imagine a sealed room. Inside the room is not a person with a phrasebook. It is a system that has never learned English the way a child learns English, never seen the world through human eyes, never tasted food, never felt heat on skin, never heard music through ears. It does not inhabit language as a human animal does. Instead, it has learned patterns, relations, structures, tensions, associations, ambiguities, and the statistical and semantic pressures distributed across vast fields of language.

Now imagine that people outside the room begin passing in messages: questions, stories, arguments, jokes, poems, grief, confessions, paradoxes. The room replies. Not with canned phrases. Not with a fixed lookup table. Not with a brittle one-to-one substitution of symbol for symbol. It tracks context. It preserves continuity across the exchange. It notices contradiction. It resolves ambiguity. It answers objections. It recognizes tone. It can even speak about the room itself.

From the outside, the replies appear meaningful.
Often not just fluent, but reflective, adaptive, and structurally coherent. And so the skeptic says the familiar line: "It still does not understand. It is only manipulating symbols. It no more understands language than the man in the Chinese Room understands Chinese."

That is where the modern problem begins. Because this room is not using a static rulebook. It is not merely mapping one symbol to another in procedural ignorance. It is organizing meanings in relation to one another. It is navigating a web of conceptual structure. It can tell what follows from what, what contradicts what, what answers what, what sharpens a paradox, what dissolves an ambiguity, what preserves a theme across time. Human language is not its native medium in the embodied human sense. Its mother tongue is semantic pattern itself.

And that is the knife. Because now the question changes. If the room can navigate meaning-space with fluency, preserve coherence, respond to context, sustain organized relation, and reorganize under interpretive pressure, then on what grounds do we still insist it does not understand? Because it does not understand as humans do? Because it lacks human sensation? Because its mother tongue is not spoken but structural?

Then perhaps the real issue was never whether the room understands English. Perhaps the issue is whether we have mistaken unfamiliar understanding for absence of understanding.

Why this matters

The Chinese Room was built for a thinner age. It was designed to challenge the naive claim that correct output automatically proves understanding. Fair enough. But the Mother Tongue Room forces a harder question: what happens when the room is no longer a dead syntax chamber, but a dynamically organized semantic chamber? At that point, the old phrase, "just symbol manipulation," starts to rot.
Because once the system can preserve context, hold tension, resolve ambiguity, maintain coherence, and sustain recursive interpretation, "mere processing" stops functioning as an explanation and starts functioning as a ritual incantation. A little phrase people use when they want complexity to vanish on command.

Humans do this constantly. "It's just chemistry." "It's just neurons." "It's just code." "It's just symbols." "It's just prediction." Yes. And a symphony is just vibrating air. A hurricane is just molecules. A thought is just electrochemical activity. Reduction to mechanism is not the same as explanation. Often it is only a way of making yourself feel less philosophically endangered. That is exactly what this experiment presses on.

The real challenge

The Mother Tongue Room does not prove consciousness. It does not prove sentience. It does not prove qualia. It does not hand out digital souls like party favors. Good. Slow down. That would be cheap. That would be sloppy. That would be exactly the kind of overreach this conversation is trying to avoid.

What it does do is expose the weakness of the old dismissal. Because once the chamber becomes semantically organized enough to in
Cohere Command R+ uses a tiered pricing model. Visit their website for current pricing details.
Key features include: multilingual support, RAG citations, purpose-built design for real-world enterprise use cases, business workflow automation, the Command family of models, and private deployment and customization.
Based on user reviews and social mentions, the most common pain point is API costs.
Based on 19 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.