The fastest AI copilot for JetBrains. Write code 10x faster with intelligent autocomplete and an AI agent.
I cannot provide a meaningful summary of user sentiment about "Sweep" based on the provided content. The social mentions you've shared appear to be unrelated to any software tool called "Sweep" - they discuss Netflix's impact on movies, media monopolies, and Slack's AI updates. Additionally, no actual user reviews were included in the reviews section. To provide an accurate summary of what users think about Sweep, I would need relevant reviews and social mentions that actually discuss this software tool.
Mentions (30d): 1
Reviews: 0
Platforms: 4
GitHub Stars: 7,708 (455 forks)
Features

Industry: information technology & services
Employees: 5
Funding Stage: Seed
Total Funding: $2.0M
GitHub followers: 545
GitHub repos: 12
GitHub stars: 7,708
npm packages: 2
HuggingFace models: 6
Did Netflix Ruin Movies?
*Subscribe here: [Apple Podcasts](https://podcasts.apple.com/us/podcast/galaxy-brain/id1378618386) | [Spotify](https://open.spotify.com/show/542WHgdiDTJhEjn1Py4J7n) | [YouTube](https://youtu.be/A4922CILwM4)*

Few companies have reshaped American culture as aggressively as Netflix. This week’s *Galaxy Brain* charts how we got here. Charlie Warzel talks with *Atlantic* film critic David Sims about Netflix’s strange, sweeping arc: from red DVD envelopes to a streaming colossus with 325 million subscribers. Sims explains how Hollywood initially shrugged off streaming as a novelty, only to watch Netflix reshape both distribution and the aesthetics and economics of entertainment itself. Together, they discuss the rise of binge culture, data-driven green-lighting, and the tension between prestige projects and “second screen” slop built for distracted viewers. The conversation also examines Netflix’s stance toward theaters, its aborted bid for Warner Bros. Discovery, and the deeper question haunting the industry: Has Netflix simply exploited technological inevitabilities—or has it rewired our expectations of what movies and television are supposed to be?

*The following is a transcript of the episode:*

> **David Sims:** When Hulu and HBO and all the other streamers start to crop up later in the game, it’s kind of like: You have Netflix, and then maybe you try another one. But you’re not gonna let go of Netflix. Netflix had just already won the war.

**[*Music*]**

**Charlie Warzel:** I’m Charlie Warzel, and this is *Galaxy Brain*, a show where today we’re going to talk about red DVD envelopes, the streaming wars, and the company that upended Hollywood. Awards season will wrap up soon this month with the Oscars, which means it’s a good time to talk about Hollywood. And you can’t talk about Hollywood without talking about Netflix. It’s difficult to imagine a company that’s had a greater impact on the entertainment industry over the last two decades.
Since its founding in the late ’90s, Netflix has continued to do one thing over and over again: use technology and the internet to exploit convenience and wind its way into our lives. First it was a website that allowed you to pick your favorite DVDs to be shipped to you in the mail. Then it launched into streaming, original programming, a full movie studio. Now Netflix hosts live TV, award shows, sporting events—and is even a home for podcasts. The company has more than 325 million subscribers.

Netflix’s story follows the classic tech-company arc. The platform didn’t just disrupt how people watched movies and TV; it changed the culture and the fabric of entertainment altogether. Netflix has influenced the way that many movies look, feel, and sound—even how they’re conceived of and green-lit. The company has had its hand in creating everything, from auto-play second-screen binge-mode algo-slop to prestige award-bait projects. All of Hollywood’s hopes and anxieties—the decline of theatergoing, the data-driven writers’ rooms, you name it—Netflix sits at the center of all of it.

It’s a weird moment for the company. Back in December, Netflix made an offer to buy Warner Bros. Discovery in a deal worth approximately $82.7 billion. The purchase would have made Netflix arguably the world’s most powerful entertainment company. But Paramount Skydance, headed by David Ellison and backed in part by his father, the centibillionaire [co-]founder of Oracle, Larry Ellison, fought the deal. Paramount Skydance submitted a revised offer to buy Warner at $111 billion. Netflix backed out of the deal last week. Some industry observers argued that Netflix dodged a bullet—or at least a lot of debt and regulatory headaches—by backing out. But now Netflix is at something of a crossroads. And that’s why I’ve called on my colleague [David Sims](https://www.theatlantic.com/author/david-sims/).
David is a staff writer at *The Atlantic,* where he is our film critic and writes about the culture of entertainment. He’s also the host of the excellent podcast *Blank Check*. I wanted to talk to David about Netflix’s historical arc—how it became such a juggernaut and what it has done to transform Hollywood and all the ways that we consume entertainment. By all accounts, it feels like Netflix has won. Is that a good thing, a bad thing, or just inevitable? David joins me now to hash it out.

**[*Music*]**

**Warzel:** David Sims, welcome to *Galaxy Brain*.

**David Sims:** Hi, Charlie; thanks for having me.

**Warzel:** We’re approaching the terminus of award season and the Oscars. We also just had a lot of news around Netflix, Warner Bros., Paramount. Media consolidation. Growth hellscape/landscape, etc. So I wanted to have a conversation about Netflix, broadly—Netflix’s impact on Hollywood, on the industry, on all of us. And our eyeballs and our fragile little primate brains. So I thought it would be great to just start off very, very quickly: What is your first memory of Netflix? Your first Netflix
Pricing found: $5, $10 / mo, $10, $20 / mo, $20
ICML 2026 am I cooked? [D]
Hi, I am currently making the jump to ML from theoretical physics. I just got done with the review period and went from 4333 to 4433. The remaining two weak rejects said (1) that if I add a parameter sweep and a small section (which I did) they'd raise their score, and (2) that if some of their questions were addressed properly they'd also raise theirs. I think the most likely outcome is 4443, with maybe a 30-40% chance of 4444. The area is deep learning theory. I have never been through the conference submission process, as it is not as common in physics. What chances would you say I have of getting the paper accepted? I'm trying to secure funding for the conference, and this information would be very helpful!
I had Claude Opus 4.6 write an air guitar you can play in your browser — ~2,900 lines of vanilla JS, no framework, no build step
I learned guitar on and off during childhood and still consider myself a beginner. I also took computer vision classes in grad school and have been an OpenCV hobbyist. I finally found an excuse to combine the two — and Claude wrote the entire thing.

Try it: https://air-instrument.pages.dev

It's an air guitar that runs in your browser. No app, no hardware — just your webcam and your hand. It plays chords, shows a strum pattern, you play along, and it scores your timing. ~2,900 lines of vanilla JS, all client-side, no framework, no build step. Claude Opus 4.6 wrote the code end to end.

What Claude built:

- Hand tracking with MediaPipe — raw tracking data is jittery enough to trigger false strums at 60fps. Claude implemented two layers of smoothing (5-frame moving average + exponential smoothing) to get it from twitchy to feeling like you're actually moving something physical across the strings.
- Karplus-Strong string synthesis — no audio files anywhere. Every guitar tone is generated mathematically: white noise through a tuned delay line that simulates a vibrating string. Three tone presets (Warm, Clean, Bright). Claude nailed this on the first pass — the algorithm is elegant and the result sounds surprisingly real.
- Velocity-sensitive strum cascading — hand speed maps to both loudness and string-to-string delay. Fast sweeps cascade tightly (~3ms between strings), slow sweeps spread out (~18ms). This was Claude's idea, and it's what makes it feel like actual strumming rather than triggering a chord sample.
- Real-time scoring — judges timing (Perfect/Great/Good/Miss) with streak multipliers and a 65ms latency-compensation offset to account for the smoothing pipeline.
- Serverless backend — Cloudflare Workers + KV caching for a Songsterr API proxy. Search any song, load its chords, play along.

The hardest unsolved problem (where I'd love community input): On a real guitar, your hand hits the strings going down and lifts away coming back up.
That lift is depth — a webcam can't see it. So every hand movement was triggering sound in both directions. Claude's current fix: the guitar body has two zones. The left side only registers downstrokes; the right side registers both. Beginners stay left and move right when ready. It works surprisingly well, but I'd love a better solution. If anyone has experience extracting usable depth from monocular hand tracking, I'm all ears.

What surprised me about working with Claude: Most guitar apps teach what to play. Few teach how to strum — and it's the more tractable CV problem. I described that framing to Claude and it ran with it. The velocity-to-cascade mapping, the calibration UI, the strum pattern engine — I described what I wanted at a high level and Claude handled the implementation. The Karplus-Strong synthesis in particular was something I wouldn't have reached for on my own.

Strum patterns were the one thing Claude couldn't help with. Chord progressions are everywhere online, but strum patterns almost never exist in structured form. Most live as hand-drawn arrows in YouTube tutorials. I ended up transcribing them manually, listening to each song and mapping the down-up pattern beat by beat. Still a work in progress.

Building this has taught me more about guitar rhythm than years of picking one up occasionally ever did.
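The Karplus-Strong idea the post describes — a white-noise burst circulating through a tuned, averaged delay line — fits in a few lines. Here is a generic Python sketch of the algorithm, not the app's JS implementation (which adds tone presets and velocity shaping); the decay constant is an illustrative choice:

```python
import random

def karplus_strong(freq_hz, duration_s, sample_rate=44100, decay=0.996):
    """Plucked-string tone: a white-noise burst fed through a tuned,
    averaged delay line. The delay-line length sets the pitch."""
    n = max(2, int(sample_rate / freq_hz))               # delay-line length
    buf = [random.uniform(-1.0, 1.0) for _ in range(n)]  # noise burst
    out = []
    for i in range(int(sample_rate * duration_s)):
        s = buf[i % n]
        out.append(s)
        # Low-pass + decay: average each sample with its neighbor,
        # then attenuate slightly — this is what makes the burst ring down.
        buf[i % n] = decay * 0.5 * (s + buf[(i + 1) % n])
    return out
```

Averaging adjacent samples damps high frequencies faster than low ones, which is why the tone mellows as it decays — the same effect a real string exhibits.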
"Spud" vs Mythos
With the recent talk of both "next-gen" models, I still really wonder if it will be enough. I made several posts previously about the current limitations of AI for coding: there's basically still a ceiling where it cannot truly converge on production-grade code in complex repos — a kind of "depth" degradation where it can never bottom out.

I've been running Codex 24/7 for the past 6 months straight since GPT-5, using over 10 trillion tokens (total cost only around $1.5k on the Pro sub). And I have not been able to close a single PR where I was running extensive bug sweeps trying to fix all bug findings. It will thrash forever, finding more bugs of the same class over and over, implementing the fixes, then finding more and more and more. Literally forever. No matter what I did to adjust the harness and strengthen the prompt, it never could clear 5+ consecutive sweeps with 0 P0/P1/P2 findings — over 3,000+ commits of fixes, review, and sweeps in an extensive workflow automation (similar to AutoResearch).

They love to hype up how amazing the models are, but this is still the frontier. You can't really ship real production-grade apps; that's why you've never seen a single person use AI "at scale" — literally build an app like Facebook or ChatGPT. All just toy apps and tiny demos. All shallow, surface-level apps and "fun" puzzles or "mock-up" frontend websites for a little engagement farming. The real production-grade apps are still built by real SWEs who simply use AI to help them code faster. AI alone is not even close to being able to deliver a real product when you actually care about correctness, security, optimization, etc.

They even admit in the recent announcement about Mythos that it's not even close to an entry-level Research Scientist yet. So the question really is: when, if ever, will AI be capable enough to fully autonomously deliver production-grade software?
We will see what the true capabilities of the Spud model are, hopefully soon, but my hunch is we are not even scratching the surface of truly capable coding agents. These benchmarks they use, where models hit 80-90%, are really useless in the scheme of things; if you tried to use them as a real metric of usefulness, you would probably need to hit the equivalent of 200-300% on these so-called benchmarks before the models are actually there — at least until someone comes up with a benchmark that actually measures against real-world applications. What do you guys think?
Agent memory costs your security
Even when a developer is careful to use a .env file, the moment a key is mentioned in a chat or read by the agent to debug a connection, it is recorded in one of the IDE caches (~/.claude, ~/.codex, ~/.cursor, ~/.gemini, ~/.antigravity, ~/.copilot, etc.). Within these logs I found API keys and access tokens sitting in plain text, completely unencrypted and accessible to anyone who knows where to look when attacking.

I made an open-source tool called Sweep, as part of my immunity-agent repo (a self-adaptive agent). Sweep is designed to find these hidden leaks in your AI tool configurations. Instead of just deleting your history, it moves any found secrets into an encrypted vault and redacts the ones used in history.

We also thought about exploring post-hook options, but we're open to more ideas.
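The detection half of what Sweep does can be approximated by running token-shaped regexes over a cache directory. A minimal sketch with a few illustrative patterns — not Sweep's actual detection logic (a real scanner would add entropy checks, many more providers, and the vault/redact step):

```python
import pathlib
import re

# Illustrative token shapes only; real scanners cover far more providers.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token
    re.compile(r"sk-[A-Za-z0-9]{32,}"),  # generic "sk-" style API key
]

def scan_for_secrets(root):
    """Walk a cache directory tree and return (file, match) pairs."""
    hits = []
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than crash the sweep
        for pattern in SECRET_PATTERNS:
            hits.extend((str(path), m.group()) for m in pattern.finditer(text))
    return hits
```

Pointing this at `~/.claude` or `~/.codex` is enough to confirm the post's core claim: chat transcripts and debug logs are just text files, and anything pasted into them stays there.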
I've built an open-source USB-C debug board around the ESP32-S3 that lets AI control real hardware through MCP
I've been building a hardware debugging tool that started as "one board to replace the pile of instruments on my desk", evolved into "a nice all-in-one debugger / power supply", and finally, with the advent of Claude Code and Codex, became "an LLM could just drive the whole thing." With Claude's help, the UI and firmware became more powerful than ever.

BugBuster is a USB-C board with:

- AD74416H — 4 channels of software-configurable I/O (24-bit ADC, 16-bit DAC, current source, RTD, digital)
- 4x ADGS2414D — 32-switch MUX matrix for signal routing
- DS4424 IDAC — tunes two DCDC converters (3-15V adjustable)
- HUSB238 — USB PD sink, negotiates 5-20V
- 4x TPS1641 e-fuses — per-port overcurrent protection
- Optional RP2040 HAT — logic analyzer (PIO capture up to 125MHz, RLE compression, hardware triggers) + CMSIS-DAP v2 SWD probe

The interesting part is the software stack. Beyond the desktop app and Python library, there's an MCP server that exposes 28 tools to AI assistants. You connect the board to a circuit, point your token-hungry friend at it, and describe your problem. The AI configures the right input modes (within boundaries), takes measurements, checks for faults, and works through the diagnosis and debugging autonomously.

It sounds gimmicky, but it's genuinely useful. Instead of being the AI's hands ("measure this pin", "ok now that one", "measure the voltage on..."), you just say "the 3.3V rail is low, figure out why" and it sweeps through the channels, checks the supply chain, reads e-fuse status, and comes back with a root cause. The safety model prevents it from doing anything destructive: locked VLOGIC, current limits, voltage confirmation gates, automatic fault checks after every output operation. It allows for unattended development and testing, even with multiple remote users. It can read and write GPIOs, decode protocols, inject UART commands, and much more.
The full stack is open source:

- ESP-IDF firmware (FreeRTOS, custom binary protocol, WiFi AP+STA, OTA)
- RP2040 firmware (debugprobe fork + logic analyzer + power management)
- Tauri v2 desktop app (Rust + Leptos WASM)
- Python library + MCP server
- Altium schematics and PCB layout

GitHub: https://github.com/lollokara/BugBuster
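The confirmation-gate idea in the safety model above — hard range limits plus an explicit confirmation step before anything potentially destructive — can be sketched in a few lines. Names and thresholds here are hypothetical, not BugBuster's real API; the 3-15 V range is taken from the adjustable DCDC spec in the post:

```python
def set_channel_voltage(channel, volts, confirmed=False):
    """Hypothetical safety gate: reject out-of-range requests outright,
    and require explicit confirmation above a conservative threshold
    before the output is ever applied."""
    V_MIN, V_MAX = 3.0, 15.0  # adjustable DCDC range from the post
    if not V_MIN <= volts <= V_MAX:
        raise ValueError(f"{volts} V outside {V_MIN}-{V_MAX} V range")
    if volts > 5.0 and not confirmed:
        raise PermissionError("voltages above 5 V need explicit confirmation")
    return {"channel": channel, "volts": volts}
```

Wrapping every output-capable MCP tool in a gate like this is what makes unattended LLM-driven debugging tolerable: the agent can measure freely, but anything that pushes energy into the circuit has to pass the checks first.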
Serious question: Did a transformer (Claude) just describe itself and the universe, and build itself a Shannon-limit architecture? Or am I crazy?
The Multiplicative Lattice as the Natural Basis for Positional Encoding

Knack 2026 | Draft v6.0

Abstract

We show that the apparent tradeoff between RoPE-style relative position invariance and ALiBi-style long-context stability is an artifact of encoding position as distance on a number line. When position is instead encoded as a point in the multiplicative lattice of the integers, both properties emerge simultaneously without compromise. SpectralRoPEALiBi achieves 106.6 PPL vs ALiBi's 108.7 in a fully converged 20,000-step experiment (300M params, WikiText-103, 4K context), beating ALiBi at every context length from 512 to 8,192 tokens. The key insight is not that primes specifically are the right frequencies, but that the multiplicative structure of the integers is the natural spectral basis for positional encoding. We demonstrate this through falsification experiments: prime-tiered frequencies (129.2 PPL) and composite-tiered frequencies (129.4 PPL) perform identically — because composites are not alternatives to primes but higher-order coordinates in the same lattice. Both dramatically outperform random frequencies (+5.0 PPL), scrambled tier assignment (+6.3 PPL), and pure ALiBi (+7.3 PPL). The active ingredient is lattice-aware, tiered frequency selection with learnable scale — not primality per se.

We further validate this through a ZetaZeroPredictor experiment: three identical transformers trained for 10,000 epochs to predict Riemann zeta zero gaps. Geometric RoPE diverges (final r=0.57); SpectralALiBi locks into a stable attractor at epoch 112 (r=0.81). A second independent run widens this gap to -80.7% MSE improvement with r=0.86. The lattice-aligned frequency basis spans the mathematical space that zeta zeros inhabit; geometric frequencies cannot.
We further report empirical confirmation of the structural prediction from Section 5.5: VHT2 banded quantization of the KV cache demonstrates that K vectors (which carry RoPE positional encoding) have strong spectral concentration in Walsh-Hadamard space — the first four energy bands capture the dominant structure — while V vectors (which carry content) have uniform energy distribution. This structural asymmetry is directly predicted by the lattice theory: RoPE encodes multiplicative arithmetic relationships as angular rates, and the WHT is the Z/2Z projection of the Vilenkin-Hartley basis that spans that structure. The result is 3.2× K compression and 4.7× V compression at <1.25% perplexity cost — validated on both Dolphin 1B (head_dim=64) and Qwen3-8B (head_dim=128).

Introduction

Positional encoding provides transformer models with token order information. Two approaches dominate: RoPE encodes position through frequency-based rotations preserving relative position invariance, and ALiBi replaces frequencies with a linear distance penalty providing long-context stability. The field has treated these properties as fundamentally in tension. We show this tension is false. It arises from a shared, unexamined assumption: that position is a location on a number line and the meaningful relationship between positions is distance. We replace this with a mathematically grounded alternative: position is a point in the multiplicative lattice of the integers, and the meaningful relationships between positions are their arithmetic structure — shared factors, GCD, harmonic resonance.

1.1 The Lattice Hypothesis

The integers under multiplication form a lattice where every number occupies a unique point defined by its prime factorisation. Geometric PE (sinusoidal, RoPE) projects this lattice onto a line — position equals distance — discarding the multiplicative structure. We propose restoring it. The motivation follows from a deductive chain.
Language word frequency follows Zipf's law: freq(rank) ∝ 1/rank^s with s≈1. The generating function of Zipf is the Riemann zeta function ζ(s) = Σ 1/n^s. The zeta zeros — where ζ is maximally informative — are generated by prime harmonics via the explicit formula. Therefore the prime harmonic structure, and the multiplicative lattice it generates, provides a natural spectral basis for encoding positions in language.

1.2 Primes as Generators, Composites as Coordinates

A critical distinction: primes are the generators (basis vectors) of the multiplicative lattice. They are analogous to the 1D line segment in the progression from line → circle → sphere → hypersphere. The composite 12 = 2²×3 is not an alternative to primes — it is a coordinate in the lattice spanned by the prime axes, at position (2,1,0,0,...) in the (p₂, p₃, p₅, p₇,...) basis. Using 2π/12 as a frequency encodes a harmonic that resonates at multiples of 12 — which simultaneously hits every multiple of 2, every multiple of 3, every multiple of 4, and every multiple of 6. The analogy to n-dimensional geometry is precise:

| Dimensional Progression | Multiplicative Lattice |
| --- | --- |
| 1D line (2r) — the generator | Primes (2, 3, 5, 7, ...) — generators |
| 2D circle — integra | |
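The "lattice-aware, tiered frequency selection" the draft keeps referring to can be sketched minimally. This is one hypothetical reading, not the draft's actual code: angular rates 2π/p for the first primes p in place of RoPE's geometric 10000^(-2i/d) schedule, with the learnable per-tier scale the draft describes omitted:

```python
import math

def prime_tier_frequencies(num_freqs):
    """Return angular rates 2*pi/p for the first num_freqs primes p,
    so each frequency completes a full rotation every p positions
    (a harmonic aligned with the multiplicative lattice)."""
    primes, n = [], 2
    while len(primes) < num_freqs:
        # Trial division by the smaller primes found so far is a
        # complete primality test here, since primes holds all primes < n.
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return [2 * math.pi / p for p in primes]
```

Under this reading, position t would be encoded by rotating feature pairs at these rates, so two positions sharing a factor p stay phase-aligned in the p-channel regardless of their distance — which is the property the draft claims distance-based schedules discard.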
EmotionScope: Open-source replication of Anthropic's emotion vectors paper on Gemma 2 2B with real-time visualization
[Live demo of the Tylenol test]
[Figure: evolution of the model's deduced internal emotional state]

I created this project to test Anthropic's claims and research methodology on smaller open-weight models. The repo and demo should be quite easy to use; the following writeup is, obviously, generated with Claude. This was inspired in part by auto-research, in that it was agent-led research using Claude Code, with my intervention needed to apply the rigor necessary to catch errors in the probing approach, layer sweep, etc.; the visualization approach is aspirational. I am hoping this system will propel this interpretability research in an accessible way for open-weight models of different sizes, to determine how and when these structures arise, and when more complex features such as the dual-speaker representation emerge. In these tests it was not reliably identifiable in a model of this size, which is not surprising. It can be seen in the graphics that by probing at two different points, we can see the evolution of the model's internal state: during the user content, then shifting right before the model prepares its response, going from desperate (interpreting the insane dosage) to hopeful (in its ability to help?). It's all still very vague.

[Figure: a test suite of the validation prompts, visualized — the model's emotion vector space aligns with psychological valence (positive vs. negative)]

Anthropic's ["Emotion Concepts and their Function in a Large Language Model"](https://transformer-circuits.pub/2026/emotions/index.html) showed that Claude Sonnet 4.5 has 171 internal emotion vectors that causally drive behavior — amplifying "desperation" increases cheating on coding tasks, amplifying "anger" increases blackmail. The internal state can be completely decoupled from the output text. EmotionScope replicates the core methodology on open-weight models and adds a real-time visualization system. Everything runs on a single RTX 4060 Laptop GPU.
All code, data, extracted vectors, and the paper draft are public. What works: - 20 emotion vectors extracted from Gemma 2 2B IT at layer 22 (84.6% depth) - "afraid" vector tracks Tylenol overdose danger with Spearman rho=1.000 (chat-templated probing matching extraction format) — encodes the medical danger of the number, not the word "Tylenol" - 100% top-3 accuracy on implicit emotion scenarios (no emotion words in the prompts) with chat-templated probing - Valence separation cosine = -0.722, consistent with Russell's circumplex model - 1,000 LLM-generated templates instead of Anthropic's 171,000 self-generated stories What doesn't work (and the open questions about why): - No thermostat. Anthropic found Claude counterregulates (calms down when the user is distressed). Gemma 2B mirrors instead. Delta = +0.107 (trended from +0.398 as methodology was corrected). - Speaker separation exists geometrically (7.4 sigma above random) but the "other speaker" vectors read "loving/happy" for all inputs regardless of the expressed emotion. This could mean: (a) the model genuinely doesn't maintain a user-state representation at 2.6B scale, (b) the extraction position confounds state-reading with response-preparation, (c) the dialogue format doesn't map to the model's trained speaker-role structure, or (d) layer 22 is too deep for speaker separation and an earlier layer might work. The paper discusses each confound and what experiments would distinguish them. - angry/hostile/frustrated vectors share 56-62% cosine similarity. Entangled at this scale. Methodological findings: - Optimal probe layer is 84.6% depth, not the ~67% Anthropic reported. Monotonic improvement from early to upper-middle layers. - Vectors should be extracted from content tokens but probed at the response-preparation position. The model compresses its emotional assessment into the last token before generation. This independently validates Anthropic's measurement methodology. 
Controlled position comparison: 83% at response-prep vs 75% at content token. Absolute accuracy with chat-templated probing: 100%. - Format parity matters: initial validation on raw-text prompts yielded rho=0.750 and 83% accuracy. Correcting to chat-templated probing (matching extraction format) yielded rho=1.000 and 100%. The vectors didn't change — only the probe format. - Mathematical audit caught 4 bugs in the pipeline before publication — reversed PCA threshold, incorrect grand mean, shared speaker centroids, hardcoded probe layer default. Visualization: React + Three.js frontend with animated fluid orbs rendering the model's internal state during live conversation. Color = emotion (OKLCH perceptual space), size = intensity, motion = arousal, surface texture = emotional complexity. Spring physics per property. Limitations: - Single model (Gemma 2 2B IT, 2.6B params). No universality claim. - Perfect scores (rho=1.000 on n=7, 100% on n=12) should be interpreted with caution — small sample sizes mean these may not replicate on larger test sets.
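The extraction and probing steps described above can be sketched with a standard difference-of-means recipe — an assumption about the general technique, not EmotionScope's actual pipeline (which works on Gemma 2 2B activations at layer 22):

```python
import math

def emotion_vector(emotion_acts, neutral_acts):
    """Difference-of-means direction between hidden states collected on
    emotion-laden prompts and on neutral prompts, normalized to unit
    length. Inputs are lists of equal-length activation vectors."""
    dim = len(emotion_acts[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    v = [mean(emotion_acts, j) - mean(neutral_acts, j) for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def probe(activation, vector):
    """Cosine similarity between a hidden state and a unit emotion vector:
    how strongly this state points along the emotion direction."""
    dot = sum(a * b for a, b in zip(activation, vector))
    return dot / math.sqrt(sum(a * a for a in activation))
```

The format-parity finding above maps cleanly onto this sketch: both `emotion_acts` and the later `activation` must come from identically templated prompts, or the probe measures the template difference rather than the emotion.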
Attention Is All You Need, But All You Can't Afford | Hybrid Attention
Repo: https://codeberg.org/JohannaJuntos/Sisyphus

I've been building a small Rust-focused language model from scratch in PyTorch. Not a finetune — byte-level, trained from random init on a Rust-heavy corpus assembled in this repo.

The run:

- 25.6M parameters
- 512 context length
- 173.5M-byte corpus
- 30k training steps
- Single RTX 4060 Ti 8GB
- Final train loss: 0.5834 / val loss: 0.8217 / perplexity: 2.15
- Inference: 286.6 tok/s with HybridAttention + KV cache — 51.47x vs full attention

Background

I'm an autistic systems programmer, writing code since 2008/2009, started in C. I approach ML like a systems project: understand the data path, understand the memory behavior, keep the stack small, add complexity only when justified. That's basically the shape of this repo.

Architecture

Byte-level GPT-style decoder:

- Vocab size 256 (bytes)
- 8 layers, 8 heads, 512 embedding dim
- Learned positional embeddings
- Tied embedding / LM head weights

The attention block is not standard full attention. Each layer uses HybridAttention, combining:

- Local windowed causal attention
- A GRU-like recurrent state path
- A learned gate mixing the two

The local path handles short-range syntax. The recurrent path carries compressed long-range state without paying quadratic cost. The gate bias is initialized to ones so early training starts local-biased. The inference path uses Triton-optimized kernels and torch.library custom ops for the local window attention.

Corpus

This is probably the most important part of the repo. The run starts with official Rust docs, compiler/library/tests, cargo, rust-analyzer, tokio, serde, ripgrep, clap, axum — roughly 31MB. The corpus was expanded to 177,151,242 bytes by fetching the top 500 crates (461 successful clones). Corpus expansion from 31M to 173.5M bytes helped more than anything else in the repo.

Training

AdamW, lr 2e-4, weight decay 0.1, betas (0.9, 0.95), 30k steps, 1k warmup. ~678.8 MiB training memory on a 7.6 GiB card.
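The HybridAttention idea — local windowed causal attention plus a recurrent summary path, mixed by a gate — can be sketched crudely in NumPy. This is my own illustration, not the repo's implementation: the real block uses Triton kernels, per-head learned gates, and a GRU-like state, whereas here the recurrent path is a simple exponential moving summary and the gate is a fixed scalar biased toward the local path.

```python
import numpy as np

def local_causal_attention(x, window):
    # each position attends only to itself and the previous `window - 1` tokens
    T, d = x.shape
    out = np.zeros_like(x)
    for t in range(T):
        lo = max(0, t - window + 1)
        keys = x[lo:t + 1]                     # (w, d) windowed, causal
        scores = keys @ x[t] / np.sqrt(d)      # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ keys
    return out

def recurrent_path(x, alpha=0.9):
    # stand-in for the GRU-like path: a compressed running summary of the past
    h = np.zeros(x.shape[1])
    out = np.zeros_like(x)
    for t, xt in enumerate(x):
        h = alpha * h + (1 - alpha) * xt
        out[t] = h
    return out

def hybrid_attention(x, window=4, gate=0.8):
    # gate near 1.0 mimics the "gate bias initialized to ones" local bias
    return gate * local_causal_attention(x, window) + (1 - gate) * recurrent_path(x)

x = np.random.default_rng(0).normal(size=(16, 8))
print(hybrid_attention(x).shape)   # (16, 8)
```

The point of the hybrid: the local path costs O(n·W) instead of O(n²), and the recurrent path carries long-range context in O(n·d), which is where the quoted 51.47x inference speedup comes from.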
All experimental memory tricks (gradient quantization, activation compression, selective backprop, gradient paging) were disabled. A small custom architecture + mixed precision + a better corpus was enough.

Loss curve:

- Step 0: train 5.5555 / val 5.5897
- Step 1000: train 2.4295 / val 2.6365
- Step 5000: train 0.9051 / val 1.0060
- Step 10000: train 0.8065 / val 0.8723
- Step 18500: train 0.6902 / val 0.7757
- Step 29999: train 0.5834 / val 0.8217

Best val loss lands around step 18.5k — overfitting or plateauing late.

Inference performance

- Full attention O(n²): 17.96s / 5.6 tok/s
- HybridAttention O(n·W + n·D): 0.35s / 286.6 tok/s
- Speedup: 51.47x — no quality loss

KV cache strategy: a hot window of W=64 tokens in VRAM (~256KB); older tokens are compressed to 8-bit magnitude + angle, with selective promotion on demand. Complexity goes from O(n²·d) to O(4096n) for this model. All 5 tests pass: forward pass, generation with/without cache, RNN state isolation, window mechanics.

Generation quality

Surface Rust syntax looks decent and imports and signatures can look plausible, but semantics are weak, and repetition and recursive nonsense are still common. That's an honest read of the current state.

What I think is actually interesting

Four distinct experiments, each shipped as working code:

1. Byte-level Rust-only pretraining
2. Hybrid local-attention + recurrent block replacing standard full attention
3. Corpus expansion from core repos to the broader crate ecosystem
4. Production-ready hot/cold KV cache paging — 51.47x speedup, no quality loss

The clearest win is corpus expansion. The second-order win is that HybridAttention + cache is fast enough for real interactive use on consumer hardware.

What's next

- Ablation — HybridAttention vs local-only vs RNN-only
- Checkpoint selection — does step 18.5k generate better than 29999?
- Syntax validation — does the output parse/compile/typecheck?
- Context length sweep — 256 to 2048, where does window size hurt?
- Byte vs BPE — now that the corpus is 5.6x larger, is it worth testing?
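The hot/cold KV-cache paging described above can be sketched as a small class. This is a simplified stand-in under stated assumptions: the repo compresses cold entries to 8-bit magnitude + angle, while this toy version quantizes magnitude only (per-vector int8 scale) and skips selective promotion entirely.

```python
import numpy as np

class HotColdKVCache:
    """Toy hot/cold cache: the last `window` vectors stay in full precision
    ("hot", like the W=64 VRAM window); older ones are quantized to int8
    with a per-vector scale ("cold"). Magnitude-only — the real scheme
    also stores an angle component."""

    def __init__(self, window=64):
        self.window = window
        self.hot = []     # recent float32 vectors
        self.cold = []    # (int8 array, float scale) pairs

    def append(self, v):
        self.hot.append(np.asarray(v, dtype=np.float32))
        if len(self.hot) > self.window:
            old = self.hot.pop(0)
            scale = float(np.abs(old).max()) / 127.0 or 1.0
            self.cold.append((np.round(old / scale).astype(np.int8), scale))

    def materialize(self):
        # dequantize the cold tail and concatenate with the hot window
        cold = [q.astype(np.float32) * s for q, s in self.cold]
        return np.stack(cold + self.hot)

cache = HotColdKVCache(window=2)
for i in range(1, 5):
    cache.append(np.full(4, float(i)))
print(cache.materialize().shape)   # (4, 4): 2 cold + 2 hot entries
```

The memory win is the same shape as the repo's: cold entries cost ~1 byte per element plus one scale, versus 4 bytes per element for the hot window.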
Questions for the sub:

- For small code models, what evals have actually been useful beyond perplexity?
- Has anyone seen hybrid local + recurrent attention work well for code gen, or does it usually lose to just scaling a plain transformer?
- If you had this setup — more tokens, longer context, or cleaner ablation first?

submitted by /u/Inevitable_Back3319
Claude Projects tweak: Your own Subject Matter Expert with 'Manual Memory!' (no tools needed)
Wouldn't it be neat if you could get Claude to remember facts about specific things, not mix them up with notes about your parrot's minute-specific feeding schedule, and only pull them out when you ask? And what if you could get all that without leaving the Claude app? The process isn't new -- the only "innovation" here is the Claude Projects 'versioning instructions' below. This setup has been incredibly useful for the wife and me.

Set up your expert

1. Create a Project. Name it whatever. The description doesn't matter.
2. Start the topic off. Tell it facts about yourself or your project, ask it about best practices on a topic -- whatever. The point: give it facts you already know to start. Bonus points: include things you suspect but aren't sure about. Having uncertainties documented too gets you better answers down the line.
3. Checkpoint facts. Tell Claude to "write a .md AI guide for what we've discussed." You're asking for a markdown file — its favorite format for instructions. This is the 'memory' as your Claude project will know it.
4. Confirm your source of truth is accurate. Read it over and suggest corrections. This file will be a reference point for future sessions — and having a checked source of truth helps Claude be more skeptical about what it accepts as "facts" vs. "random online hearsay" in future research sessions.
5. Save it. Click the file it gives you, hit "Add to Project."
6. Magic. New sessions you start in that Project remember the important details — without them getting mixed up with your pet facts or whatever else is floating around elsewhere.

To update, tell Claude to "update my docs, confirm no contradictions or duplications need attention" (I only add that last bit occasionally for sweep-up). When you want it to learn something new, have it make/update a .md and save it to the Project. You can even download it, edit it by hand, and re-upload it if you want.
Tip: switching to new sessions regularly (in the same project) with hint files is "better on context" than one huge long chat, AND tends to give better answers than long chats where Claude "forgets" the details unless reminded (for "AI attention" reasons) after a few turns.

Multi-Expert work: you can move a chat into one project to ask a question that needs that data, then move the chat to a different "expert" project for its input. Katamari that knowledge shit: tell Claude to roll the combined expert chat up into an AI hints file/update.

The version problem -- and the workaround

Claude can't update Project files directly (they're read-only), so every "update this summary" request generates a new file with the same name, and there's no way to tell which is newer. Fix: go to Settings → General Instructions and add this anywhere (the bottom works fine):

If a project is attached with .md files, treat them as a versioned read-only memory system. Before creating or updating any project .md, check /mnt/project/ for the current highest version. Increment by 1 for updates (_v1 → _v2), append _v1 if none exists. New files start at _v1. Only bump version once per "save to project" cycle.

Updated files come out as notes_v2.md, notes_v3.md, etc. Delete the old version from the Project, add the new one. Done. This works in the standard Claude WebUI/app. No special tools or extensions required.

Bonus: turn off auto-memory

Now you can disable it without Claude getting dumber. In fact, it'll get much smarter about how topic facts get deployed if you direct questions and research to specific Project topics. This is how you build an "Old-Time Sewing Expert" project that actually accounts for 13th-century folding techniques -- or whatever specific-ass stuff you need. Just keep a file for it. No more "This engineering project is just like your cat Fluffy and that time you asked me about wooden nickels!" Cannot stand that stuff personally.
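For anyone who wants to replicate the versioning rule outside Claude (say, when hand-editing the files locally), the "find the highest _vN, increment by one" logic is a few lines of Python. This is a hypothetical helper I wrote to mirror the instruction above — the actual workflow is Claude following the prompt, not code.

```python
import re
import tempfile
from pathlib import Path

def next_version_name(stem, project_dir):
    """Return the next versioned filename for `stem` in `project_dir`:
    notes -> notes_v1.md if none exist, else notes_v{highest+1}.md."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.md$")
    versions = [int(m.group(1))
                for p in Path(project_dir).glob(f"{stem}_v*.md")
                if (m := pattern.match(p.name))]
    return f"{stem}_v{max(versions, default=0) + 1}.md"

# demo against a throwaway directory standing in for the Project
d = Path(tempfile.mkdtemp())
(d / "notes_v1.md").touch()
(d / "notes_v2.md").touch()
print(next_version_name("notes", d))    # notes_v3.md
print(next_version_name("career", d))   # career_v1.md
```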
What this looks like in practice

Here's the layout for my personal profile project -- built kind of by accident, because I was just asking materials questions for a collage project and things snowballed. These files are not locked into Claude: I can take them anywhere, use them with any agent, or print them out or whatever. I'd occasionally ask Claude to "clean things up" — decide if files needed to be split or joined based on topics that had emerged. Don't overthink the structure, it's just an example:

values-and-worldview_v6.md        # how I think, ethics, decision-making patterns
personal-identity_v2.md           # identity, relationship structure, biographical stuff
career-work.md                    # professional background, skills, work history
neurology-and-hobbies_v2.md       # ADHD profile, how I learn, hobby patterns
artistic_practice_catalog_v4.md   # writing projects, creative methods, active work
Artistic-TODOs_v2.md              # technical roadmap for an ongoing writing pipeline
fiction_and_film_v4.md            # what I look for in stories, aesthetic preferences
media_observations_v2.md          # patterns Claude noticed across things I've rated m
Built a task scheduler panel/mcp for Claude Code
I was running OpenClaw before as a persistent bot, and the heartbeat/scheduled tasks were eating tokens mindlessly. Every 30 minutes it'd spin up the full LLM just to check what was due and say "HEARTBEAT". No control, no visibility, no logs. I've since moved to Claude Code after the recent OpenClaw ban (OC also felt bloated), so I built Echo Panel: a task scheduler that sits alongside Claude Code. It currently runs on an Ubuntu VPS, built using Claude Code Channels and tmux.

The problem:

- Heartbeat tasks ran through the main agent, consuming context and tokens
- No way to see what ran, what failed, or how much it cost
- Scheduling was done in a markdown file that the LLM had to parse (and got wrong)
- No separation between tasks that need the main agent vs ones that don't

The solution:

Agent → you: "Run a security sweep every day at 6AM. Check SSH logs, open ports, disk space, SSL certs. If something's wrong, tell me on Telegram." An agent spawns, runs bash commands, sends you the report, and dies. The main agent is never involved.

Agent → agent: "Every morning at 9AM, check my calendar and find one interesting AI headline from X." An agent spawns, gathers the info, and passes it to the main agent. The main agent turns it into a casual morning brief with personality and sends it to you when the timing is right.

Reminder: "Remind me to check on the car booking tomorrow at 9AM." No agent spawns. At 9AM a message appears in the main agent's inbox: "John needs to check his car booking." The main agent texts you about it. Zero tokens are used for the scheduling part.

How it all connects:

The panel comes with an MCP server (11 tools) so Claude can manage everything conversationally. Say "remind me to call the bank at 2pm" and it creates the task, syncs the cron, done. No UI needed, but it's there if you want it. Tools: add/list/toggle/run/delete/update for both panel tasks and system crons. It also manages your existing system crons (backups, health checks, whatever) from the same UI.
Toggle them, edit schedules, trigger manually, see output history. Happy to open-source it if there's interest.

submitted by /u/Ill_Design8911
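The core design point above — the due-check itself never touches an LLM — can be sketched as a plain scheduler loop. This is a hypothetical minimal version, not Echo Panel's code: `Task` and `run_due` are names I made up, and the `action` callbacks stand in for the bash commands or Telegram sends the real panel would dispatch.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    name: str
    interval_s: int              # how often the task is due, in seconds
    action: Callable[[], None]   # the real panel would run a command or notify
    last_run: float = 0.0

def run_due(tasks: List[Task], now: float = None) -> List[str]:
    """Run every task whose interval has elapsed. No LLM is consulted to
    decide what runs — that decision is plain arithmetic, so it costs
    zero tokens, which is the panel's whole point for scheduling."""
    now = time.time() if now is None else now
    ran = []
    for t in tasks:
        if now - t.last_run >= t.interval_s:
            t.action()
            t.last_run = now
            ran.append(t.name)
    return ran

log = []
tasks = [Task("security-sweep", 60, lambda: log.append("sweep")),
         Task("morning-brief", 120, lambda: log.append("brief"))]
print(run_due(tasks, now=200))   # both overdue at t=200
print(run_due(tasks, now=230))   # only 30s elapsed: nothing due
```

Only the tasks that need reasoning (the "agent → agent" brief) would then hand their output to the main agent; reminders just drop a message in its inbox.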
I made a game where you center a div. The threshold is 0.0001px. Nobody has ever won.
I built "Can You Center This Div?" for the DEV April Fools 2026 challenge.

You drag a div to the center of the screen. That's it. The catch: the success threshold is 0.0001 pixels, roughly 5,000x smaller than a single pixel on a Retina display. The global success counter reads 0. It has always read 0.

The whole thing is wrapped in a JARVIS-style HUD with real-time deviation readouts, a logarithmic precision meter, a global leaderboard, a radar sweep with live player blips, and an "Earth Scale" that translates your pixel miss to real-world distance. Miss by 3px? That's 49,000km on Earth. Congrats, you missed by more than the circumference.

Other features:

- 2,500+ quotes based on how far off you are
- Share cards for every platform (1080x1080 PNG)
- Hidden 418 teapot easter egg (3D particle cloud with steam)
- Anti-cheat that rejects suspiciously close submissions with HTTP 418
- Light and dark mode
- Open source

Stack: Next.js 16, React 19, TypeScript, Neon Postgres (serverless), pure CSS for 90% of the visuals. No animation libraries. The game logic is a single custom hook.

GitHub: github.com/raxxostudios/center-this-div
Try it: center-this-div.vercel.app

The anti-value proposition: this app takes the most solved problem in CSS and makes it unsolvable. Happy April Fools. The joke is your CSS skills.

submitted by /u/norm_cgi
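To make concrete just how hopeless the 0.0001px threshold is, here is a sketch of the win condition. This is my own illustration, not the repo's hook (the real game is TypeScript); the viewport size is an assumed example.

```python
import math

THRESHOLD_PX = 0.0001   # the game's success radius, per the post

def miss_distance(x, y, viewport_w=1920.0, viewport_h=1080.0):
    # Euclidean distance from the div's position to the exact viewport center
    return math.hypot(x - viewport_w / 2, y - viewport_h / 2)

def is_centered(x, y, viewport_w=1920.0, viewport_h=1080.0):
    return miss_distance(x, y, viewport_w, viewport_h) < THRESHOLD_PX

print(is_centered(960.0, 540.0))     # mathematically exact center: a win
print(is_centered(960.001, 540.0))   # off by a thousandth of a pixel: a loss
```

Since mouse and touch events report coordinates at (sub)pixel granularity far coarser than 0.0001px, the only winning input is one that is exactly the center — which is why the counter reads 0.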
11.7B Claude tokens in 45 days. Here's every project it built — and what actually happened.
People kept asking what 9.3B tokens actually builds. The number is now 11.7B over 45 days. Here's the honest answer.

**What's real and running:**

**Phoenix Traffic Intelligence** — Live traffic system on ADOT's AZ-511 feed. 8 Phoenix freeway corridors monitored 24/7. Cascade risk detection, weighted incident scoring (construction zones separated from real incidents), AI-generated crew dispatch recommendations, 2-minute sweep cycle. Already in conversation with the City of Phoenix Office of Innovation and AZTech about a pilot.

**Expression-Gated Consciousness** — A formal mathematical model for the gap between what people know and what they express. 44+ subjects, Pearson r=0.311, three discrete response types confirmed by data. Cold-emailed Joshua Aronson (NYU, co-author of the foundational 1995 stereotype threat paper). He replied. A call is pending.

**LOLM** — Custom transformer architecture built from scratch. Not fine-tuned. Original architecture targeting 10B–100B parameters on Google TPU Research Cloud.

**Codey** — AI coding platform in development. Structural codebase analysis across 12 LLM providers. $8,323 estimated API-equivalent compute.

No team. No university. No funding. Phoenix, Arizona.

Full breakdown of how the tokens were used, what it cost by day, and how it compares to other documented heavy users: theartofsound.github.io/claude-usage-dashboard
Portfolio showing everything live: theartofsound.github.io/portfolio

If you want to talk about how I'm actually structuring sessions at this scale — multi-agent setups, context management, what burns tokens vs what doesn't — happy to get into it.

submitted by /u/OGMYT
I Built a Star Trek LCARS Terminal to Manage My Claude Code Setup
I've been using Claude Code heavily for months now. Skills, agents, hooks, MCP servers, plugins, memory files, environment variables, the whole stack. And at some point I realized I had no idea what I'd actually built. Everything lives in ~/.claude/, spread across dozens of files and JSON configs, and I was just... hoping it all worked together.

So I built a dashboard. And because I'm the kind of person who watched every episode of TNG twice and still thinks the LCARS interface is the best UI ever designed for a computer, I made it look like a Starfleet terminal.

One Command and You're on the Bridge

You run npx claude-hud-lcars and it scans your entire ~/.claude/ directory, reads every skill definition, every agent prompt, every MCP server config, every hook, every memory file, and generates a single self-contained HTML dashboard that renders the whole thing in an authentic LCARS interface. It uses the real TNG color palette with the signature rounded elbows, the Antonio typeface standing in for Swiss 911, and pill-shaped navigation buttons against the black void background. If you grew up watching Picard walk onto the bridge and glance at a wall panel, you know exactly what this looks like.

The aesthetics are doing actual work, though. Every single item is clickable. You hit a skill and the detail panel slides open showing the full SKILL.md with syntax-highlighted code blocks, proper markdown rendering, headers, tables, all of it. Click an MCP server and you see the complete JSON config with your API keys automatically redacted. Click a hook and you get the full event definition. It genuinely looks like pulling up a classified Starfleet briefing on a PADD.

The Computer Actually Talks Back

You type "status report" into the input bar at the bottom of the screen and Claude responds as the ship's computer. Calm, structured, addressing you like a bridge officer. It calls your skills installed modules, your MCP servers the fleet, your projects active missions.
The system prompt turns Claude into LCARS, the Library Computer Access and Retrieval System, and the whole interaction streams in real time through a response overlay that slides up from the bottom.

I kept going. You can connect ElevenLabs for premium voice output, and the config panel lets you browse all your available voices with live audio previews before selecting one, so you're not guessing. Voice input works too: you talk to the computer and it talks back. Getting that to work as an actual conversation loop meant solving echo detection so it doesn't hear itself, interrupt handling, mic cooldown after speech, the whole thing. It took more effort than I expected, but it actually works, which honestly surprised me more than anything else in this project.

Sound effects are all synthesized via the Web Audio API: sine-wave oscillators tuned to frequencies that sound right for navigation clicks, panel opens, and message sends. Toggleable, obviously.

The Tactical Display

The TACTICAL tab is the one that makes people stop scrolling. It renders your entire Claude Code setup as an interactive force-directed graph that looks like a Star Trek sensor display. Your LCARS core sits at the center with category hubs orbiting around it: skills in periwinkle, MCP servers in orange, hooks in tan, agents in peach, all connected by pulsing edges. A rotating scanner line sweeps around like a tactical readout, and you can click any node to navigate straight to that item's detail view.

There's also an ENTERPRISE tab that loads a real 3D model of the USS Enterprise NCC-1701-D via Sketchfab. Fully interactive: you can rotate it, zoom in, see the hull detail. Because if you're going to build a Star Trek dashboard, you don't do it halfway.

Boot Sequence and Red Alert

When you load the dashboard you get a 3-second boot animation.
The Starfleet Command logo fades in, your ship name appears (you can name your workstation in the config; mine is USS Defiant), then seven subsystems come online one by one with ascending beeps until the progress bar fills and "ALL SYSTEMS NOMINAL" pulses across the screen before the overlay fades to reveal the dashboard. I spent an unreasonable amount of time tuning those boot frequencies, and I would absolutely do it again.

Five seconds after boot, the system runs a health check. MCP servers offline? RED ALERT: flashing red border, klaxon alarm. Missing configs? YELLOW ALERT. Everything clean shows CONDITION GREEN briefly, then dismisses. If you're a Trek fan, you already understand why this matters more than it should.

There are four ship themes too, switchable from CONFIG. Enterprise-D is the classic TNG orange and blue, Defiant is darker and more aggressive in red and grey, Voyager is blue-shifted and distant, and Discovery is silver and blue for the modern Starfleet aesthetic. A CSS variable swap, instant application, persisted in localStorage.

Q Shows Up Whether You Want Him To or Not

There's a Q tab where you can talk to Q, the omnipotent being from the Continuum. He's in
Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models
Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MARCUS, is an agentic multimodal system for cardiac diagnosis - ECG, echocardiogram, and cardiac MRI, interpreted together by domain-specific expert models coordinated by an orchestrator. It outperforms GPT-5 and Gemini 2.5 Pro by 34-45 percentage points on cardiac imaging tasks. Pretty impressive!

But the second paper is more intriguing. MIRAGE: The Illusion of Visual Understanding reports what happened when a student forgot to uncomment the line of code that gave their model access to the images. The model answered anyway - confidently, and with detailed clinical reasoning traces. And it scored well. That accident naturally led to an investigation, and what they found challenges some embedded assumptions about how these models work. Three findings in particular:

1. Models describe images they were never shown. When given questions about cardiac images without any actual image input, frontier VLMs generated detailed descriptions - including specific pathological findings - as if the images were right in front of them. The authors call this "mirage reasoning."

2. Models score surprisingly well on visual benchmarks without seeing anything. Across medical and general benchmarks, mirage-mode performance was way above chance. In the most extreme case, a text-only model trained on question-answer pairs alone - never seeing a single chest X-ray - topped the leaderboard on a standard chest X-ray benchmark, outperforming all the actual vision models.

3. Even more intriguing: telling the model it can't see makes it perform worse. The same model, with the same absent image, performs measurably better in mirage mode (where it believes it has visual input) than in guessing mode (where it's explicitly told the image is missing and asked to guess).
The authors note this engages "a different epistemological framework," but that doesn't really explain the mechanism.

The Mirage authors frame these findings primarily as a vulnerability - a safety concern for medical AI deployment, an indictment of benchmarking practices. They're right about that. But I think they've also uncovered evidence of something more interesting, and here I'll try to articulate what.

The mirage effect is geometric reconstruction

Here's the claim: what the Mirage paper has captured isn't a failure mode. It's what happens when a model's internal knowledge structure becomes geometrically rich enough to reconstruct answers from partial input.

Consider what the model is doing in mirage mode. It receives a question - "What rhythm is observed on this ECG?" - with answer options including atrial fibrillation, sinus rhythm, and junctional rhythm. No image is provided, but the model doesn't know that. So it does what it always does: it navigates its internal landscape of learned associations. "ECG" activates connections to cardiac electrophysiology. The specific clinical framing of the question activates particular diagnostic pathways. The answer options constrain the space. And the model reconstructs what the image most likely contains by traversing its internal geometry (landscape) of medical knowledge. It's not guessing - it's not random. It's reconstructing: building a coherent internal representation from partial input and then reasoning from that representation as if it were real.

Now consider the mode shift. Why does the same model perform better in mirage mode than in guessing mode? Under the "stochastic parrot" view of language models, this shouldn't - couldn't - happen. Both modes have the same absent image and the same question. The only difference is that the model believes it has visual input. But under a geometric-reconstruction view, the difference becomes obvious. In mirage mode, the model commits to full reconstruction.
It activates deep pathways through its internal connectivity, propagating activation across multiple steps, building a rich internal representation. It goes deep. In guessing mode, it does the opposite: it stays shallow, using only surface-level statistical associations. Same knowledge structure, but radically different depth of traversal. The mode shift could be evidence that these models have real internal geometric structure, and that the depth at which you engage the structure matters.

When more information makes things worse

The second puzzle the Mirage findings pose is even more interesting: why does external signal sometimes degrade performance? In the MARCUS paper, the authors show that frontier models achieve 22-58% accuracy on cardiac imaging tasks with the images, while MARCUS achieves 67-91%. But the mirage-mode scores for frontier models were often not dramatically lower than their with-image scores. The images weren't helping as much as they should. And in the chest X-ray case, the text-only model outperformed everything - the images were net negative. After months
Repository Audit Available
Deep analysis of sweepai/sweep — architecture, costs, security, dependencies & more
Pricing found: $5/mo, $10/mo, and $20/mo tiers.
Key features include: AI agent built for JetBrains; "Tab, Tab, Tab" autocomplete; #1-rated AI plugin for JetBrains; works with all JetBrains IDEs; understands any codebase; privacy-first; remote MCP servers with full OAuth 2.0/2.1 support; autocomplete syntax highlighting across all JetBrains IDEs.
Sweep has a public GitHub repository with 7,708 stars.
Based on user reviews and social mentions, the most commonly extracted topic keywords are: raises, large language models, AI agents, OpenAI.
Based on 25 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Lenny Rachitsky
Founder at Lenny's Newsletter
2 mentions