Build AI responsibly to benefit humanity
Based on the limited social mentions provided, users view Google DeepMind as a legitimate AI research organization, though there's some debate about what constitutes a genuine "research lab" in the AI space. The mentions show positive technical engagement, with users discussing and building upon DeepMind's research like the Aletheia project and integrating their work into practical applications. DeepMind appears to maintain credibility in the research community, with users referencing their papers and methodologies as valuable resources for AI development projects. However, the sample size is quite small and lacks comprehensive user feedback on specific products or pricing.
Mentions (30d): 13 (1 this week)
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
I built an open-source defense layer for Claude Code's browser tools after reading the DeepMind "Agent Traps" paper
Google DeepMind published a paper last month showing how hidden HTML content can hijack AI agents browsing the web. The stats are wild: hidden injections alter agent behavior 15-29% of the time, and data-exfil attacks succeed 80%+ across five different agents. The core problem: when your agent reads a web page, it parses the raw HTML, including content hidden from humans via CSS (display:none, opacity:0, offscreen positioning, etc.). Attackers can embed instructions in these hidden elements. I built a two-layer Python library that sanitizes web content before it reaches the agent:

- DOM layer: JavaScript that strips hidden elements, comments, and offscreen content before text extraction
- Pattern layer: regex scanner for 15+ known injection patterns (instruction overrides, role hijacking, data-exfil attempts, etc.)

Tested it against a page with 19 embedded injection vectors; all were caught at Layer 1 before the regex even fired. It drops into any MCP browser server in ~10 lines of code. No dependencies for the core lib. Repo + demo: github.com/sysk32/trapwatch Inspired by: "AI Agent Traps" by Franklin et al., Google DeepMind (March 2026), SSRN 6372438

submitted by /u/g0trekt
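The two-layer idea described in the post can be sketched in a few dozen lines. This is an illustrative Python approximation, not the actual trapwatch code: the hidden-style regex, the two example injection patterns, and the `sanitize` function name are all assumptions.

```python
import re
from html.parser import HTMLParser

# Illustrative sketch, not trapwatch itself. Layer 1 drops elements hidden
# via inline CSS; Layer 2 regex-scans whatever text survives.

HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|opacity\s*:\s*0", re.I)

INJECTION_PATTERNS = [  # illustrative subset of "known injection patterns"
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
]

class VisibleTextExtractor(HTMLParser):
    """Collects text only from subtrees not hidden by inline style."""

    def __init__(self):
        super().__init__()
        self.depth_hidden = 0   # >0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if self.depth_hidden or HIDDEN_STYLE.search(style):
            self.depth_hidden += 1

    def handle_endtag(self, tag):
        if self.depth_hidden:
            self.depth_hidden -= 1

    def handle_data(self, data):
        if not self.depth_hidden and data.strip():
            self.chunks.append(data.strip())

def sanitize(html: str) -> tuple[str, list[str]]:
    """Return (visible text, list of injection patterns that still matched)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    hits = [pat.pattern for pat in INJECTION_PATTERNS if pat.search(text)]
    return text, hits
```

In trapwatch the DOM layer runs as JavaScript in the browser before extraction; a pure-Python pass like this can only see inline styles, which is one reason a real implementation works at the DOM level.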
[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on Blackwell
Google DeepMind dropped Gemma 4 today:

- Gemma 4 31B: dense, 256K context, redesigned architecture targeting efficiency and long-context quality
- Gemma 4 26B A4B: MoE, 26B total / 4B active per forward pass, 256K context

Both are natively multimodal (text, image, video, dynamic resolution). We got both running on MAX on launch day across NVIDIA B200 and AMD MI355X from the same stack. On B200 we're seeing 15% higher output throughput vs. vLLM (happy to share more on methodology if useful). Free playground if you want to test without spinning anything up: https://www.modular.com/#playground

submitted by /u/carolinedfrasca
I started a local business completely with Claude.
I started a business and it's making money. A handful of clients, a pipeline of prospects, two side projects (one that I have big dreams for). No employees. No funding. Just Claude. I don't want to shill the business on here because I want to see its natural growth. It is not the one on my profile; that's just my side project.

Over the past few months I've built a system on top of Claude that goes way beyond "write me an email" or "debug this function." It's closer to a second brain that operates across every part of my life, including fitness and personal finance, with persistent memory, custom skills, and compositions that chain skills together automatically. Here's what it actually looks like:

13 custom skills that trigger by name. I call the skill and Claude enters build mode: it reads my project architecture file, checks the changelog, saves a version snapshot, then starts coding. I call a reflect skill and it runs a full 7-domain product audit across my entire operation. I call a problem-solving skill and it recalibrates, shifting into tactical mode for a negotiation or client conversation, building scripts and leverage maps. Each skill has defined inputs and outputs, and knows which projects it applies to.

Persistent memory that compounds across sessions. Not just "remember my name." The system maintains resonance files: deep records of how I think, what frameworks I use, how I make decisions, what feedback I've given. When I start a new session, Claude doesn't start from zero. It knows my business architecture, my communication style, my client pipeline status, and the patterns I've validated or rejected over dozens of sessions. This is the part most people miss: Claude gets dramatically better when you invest in teaching it who you are, not just what you want.

A daily intelligence briefing that runs at 5AM. I call it the Mind's Eye.
Every morning before I wake up, a scheduled task queries my entire operation: pipeline movements, invoice statuses, task drift, finance pressure, stalled prospects. It compares all of that to yesterday's state, finds cross-domain connections, and sends me a Gmail digest. I wake up knowing exactly what changed overnight and what needs my attention. It's more than a dashboard; I see it as perception.

Skill compositions: chaining skills together. Individual skills are powerful alone, but the real unlock is combining them. I have chains (sequential pipelines: research a prospect, draft outreach, schedule follow-up) and fusions (multiple skills running in parallel whose outputs converge into something none of them could produce alone). A market-entry fusion might run competitive analysis, prospect research, pricing strategy, and messaging drafts simultaneously, then synthesize them into a complete go-to-market package.

An "Infinity Barrier" between the system and reality. This is the part I'm most proud of. The system runs everything at full speed: compositions fire, research compiles, outreach drafts, proposals generate. But nothing touches a client, a prospect, or the real world until I explicitly approve it. Everything deposits into a staging layer. I review it on my own schedule, then I approve, edit, or kill. The system is never bottlenecked by me being busy, but I never lose control over what becomes real. To put it how it feels: it lets the mind run faster than the body without the body tripping over itself.

What it runs on: Supabase (database, auth, edge functions), Cloudflare Pages (hosting, DNS, email routing), a Claude Pro subscription, Gmail MCP, Google Calendar MCP, Cloudflare MCP, and Chrome automation. No frameworks. Vanilla HTML/CSS/JS. The whole dashboard is browser-native ES modules.
What it actually does for my business:

- Automated prospect research generates new leads daily
- Outreach drafts are personalized per prospect with research-backed angles
- Client invoicing flows through Stripe with automated commission calculations
- Every client site deploys through scripted pipelines
- The morning briefing catches things I'd miss: a stalled prospect, a deadline I forgot, a cross-domain pattern like "this client's invoice is overdue AND their project tasks are incomplete"

The whole philosophy is: automate the mechanical, amplify the strategic, leave the human judgment in my hands. It's extending how many contexts I can hold at once while ensuring nothing happens without my explicit will. The distinction from orchestration (which can still run alongside this) is that orchestration runs repeatedly with an expected outcome, while this system adapts between sessions and puts the human at the end of the funnel. Based on the decisions I've made previously, it will also learn which decisions need to be routed to me and which do not.

I'm not a traditional developer. I didn't go to school for CS. I built this because I needed it. Running a business with no budget forces you to figure out what the tools can actually become when you push the
I built a Claude Code skill that combines DeepMind's Aletheia and Anthropic's harness design research into a single pipeline
Two papers dropped within weeks of each other not too long ago: DeepMind's Aletheia and an Anthropic blog post on multi-agent coding architecture. Aletheia is Google DeepMind's math research agent. It matters because it crosses the line from convergent thinking, where AI reproduces known solutions and arrives at established answers, to divergent thinking, where AI generates original, novel mathematical results that didn't exist before. That is the fundamental capability gap that has separated AI from genuine scientific contribution.

What's interesting is neither team borrowed from the other. Aletheia had no planner. Anthropic's harness had no chain-of-thought decoupling in the evaluator. There was an obvious synthesis sitting there. So I built it as a Claude Code skill: a Planner → Generator → Evaluator → Reviser pipeline that combines both approaches and adds one thing I haven't seen elsewhere: blind pre-analysis. The evaluator reasons about the correct approach before it ever sees the candidate code. It forms its own expectations first, then grades the solution against them. It's an extension of Aletheia's decoupling idea, but instead of just hiding the chain-of-thought, the evaluator goes in cold. After that it runs the code, grades against concrete criteria (correctness, completeness, security, resilience, quality), and returns a structured verdict (CORRECT / FIXABLE / WRONG) that drives targeted revision.

**Install:**

```bash
mkdir -p ~/.claude/skills/aletheia
# clone repo, copy SKILL.md + evaluator.md + planner.md
```

**Usage:**

```
/aletheia Build a rate limiter middleware for Fastify using Redis
/aletheia review src/routes/auth.ts
/aletheia quick Fix the N+1 query in the dashboard
```

→ https://github.com/zhadyz/aletheia-harness

submitted by /u/PlanWeak
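As a sketch, the Planner → Generator → Evaluator → Reviser loop with blind pre-analysis might look like the following. All function names and the control flow are illustrative assumptions, not the skill's actual internals:

```python
from typing import Callable

# Hypothetical control-flow sketch of the described pipeline. The key detail
# is that pre_analyze (the "blind pre-analysis" step) sees only the spec,
# never the candidate, so the evaluator forms expectations cold.

def run_pipeline(task: str,
                 plan: Callable[[str], str],
                 generate: Callable[[str], str],
                 pre_analyze: Callable[[str], str],   # blind: spec only
                 grade: Callable[[str, str], str],    # (expectation, candidate) -> verdict
                 revise: Callable[[str, str], str],
                 max_rounds: int = 3) -> tuple[str, str]:
    spec = plan(task)
    candidate = generate(spec)
    expectation = pre_analyze(spec)   # formed with no access to the candidate
    verdict = "WRONG"
    for _ in range(max_rounds):
        verdict = grade(expectation, candidate)
        if verdict == "CORRECT":
            break
        if verdict == "WRONG":
            candidate = generate(spec)              # regenerate from scratch
        else:                                       # "FIXABLE": targeted revision
            candidate = revise(candidate, expectation)
    # note: if rounds run out, the last revision is returned ungraded
    return candidate, verdict
```

Wiring in an LLM for each callable is left to the harness; the skeleton only fixes the decoupled order of operations.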
Where’s the Chat in ChatGPT?
To preface, I dislike 4o. 5.1 and 5.4 I really like. However, since the release of the 5-series models, we've seen:

Custom Instructions are soft-disabled: It will not alter its tone, structure, style, or complexity. What you can change is the number of em-dashes, emojis, robotic vs. warm register, bullet points vs. paragraphs. It defaults to a didactic, moralizing tone that usually structures responses like this: a one-sentence agreement/disagreement/short answer; elaboration for 3-4 sentences; a caveat; a reiteration of agreement or disagreement plus a "tiny tweak"; a one-sentence conclusion; an opt-in reply ("If you want, next...").

Removal of the Edit Prompt button: This is mentioned in the latest release notes as intentional. Essentially, you cannot edit your prompt beyond the latest message, forcing you to either use branching (which populates Projects or Chat History) or simply not backtrack so much.

UX/UI glitches: The page auto-scrolls (on Safari and Chromium-based browsers) to the end of a response even while you're still reading it as it's being printed. This is admittedly minor in relative terms but still annoying.

Unreliable Memory: First it was general memory being affected, then cross-thread (Project) memory. Unless prompted specifically to remember, it will not remember, which defeats the purpose of a memory, because I'm reminding it to remember.

Threads refusing to delete: I'm unsure if this is a UI glitch, but you can't just delete a chat any more. It will disappear, then show up again moments later. This creates a lot of clutter.

Adult Mode and overzealous safety: Yeah, I haven't forgotten. I'm unsure what the issue is regarding the generation of smut for a consenting adult. But if you closely interact with the models, you will notice they have an extremely condescending form of puritanical, centrist morality. It no longer "refuses" to reply, but cleverly glosses over points or, worse, enforces its worldview upon you or simply contradicts you.
This isn't intellectual rigor, really; it's simple contrarianism. That said, I think I can theorize why this is happening, as a layman:

SWE/STEM tasks require robustness and non-determinism over malleability. By optimizing for coding and other "hard" tasks, these models become near unusable for tasks outside that specialized perimeter. Benchmaxxxing creates graphs, hype on Twitter/Reddit, and, most importantly, numbers for investors to weigh two companies against each other.

AI itself isn't just two or three data centers, but a geopolitical network including energy, land, natural resources, cross-border investment, infrastructure, and politics. OpenAI and Anthropic are burning cash. They don't enjoy the massive reserves DeepMind does via Google, or the network/data benefits Grok gets via Twitter. They must not only control burn, manage runway, lower costs, and build capability, but also justify themselves to each investor in a space that remains skeptical of scalable AI-induced cost reduction.

Inference costs increase when the model actually needs to, well, infer. OpenAI seems to be brute-forcing the illusion that the model can infer user intent, while Claude has gone the opposite direction: limiting usage rates but being far more "intelligent" to speak to while staying neck and neck on SWE tasks.

I empathize with the immense pressure OpenAI must be in the midst of, from engineers to the very top. I also think a lot of the hate the company in specific gets is unwarranted at best and suspicious at worst, when most other companies engage in similar behaviors. However, I wish these models would go back to being a joy to use, productively or otherwise. After Claude and Gemini leapfrogged ChatGPT in late December of last year, OpenAI focused heavily on ChatGPT: an emergency they have only now declared over. The results are not models that are any more enjoyable to chat with, but rather simply ones to code with.
That sprint should have been correctly described as a focus on Codex and STEM-adjacent usage, not "Chat." Myself, I'm not looking for the revival of 4o. Please. That model was as annoying to talk to as 5.2, just in the opposite direction. My favorite models remain 5.4, 5.1, 4.5, and 4.1. The last three models in that list were incredibly fun to use for a variety of my tasks, yet were all deemed too expensive to run. I'm wondering, then, which models fit my use case best? I don't code, I consult. I also use ChatGPT as an assistant for fitness, cooking, art, and music. I think those days are increasingly gone. Claude is great but far too restrictive in its limits. Gemini just gets worse every time I use it. Grok is absolutely unhinged. GPT models were the best middle ground between all of them.

submitted by /u/Goofball-John-McGee
[D] Has "AI research lab" become completely meaningless as a term?
Genuinely asking because I've been thinking about this a lot lately. Like, OpenAI calls itself a research lab. So does Google DeepMind. So do a bunch of much smaller orgs doing actual frontier research with no products at all. And so do many institutes operating out of universities. Are these all the same thing? Because, to use an analogy, it feels like calling both a university biology department and Pfizer "research organizations." This is technically true but kind of useless as a category. My working definition has started to be something like: a real AI research lab is primarily organized around pushing the boundaries of what's possible, not around shipping products for mass markets. The moment your research agenda is downstream of your product roadmap, you're a tech company with an R&D team, which is fine! But it's different. Curious where people draw the line. Is there a lab you'd defend as still genuinely research-first despite being well-known?

submitted by /u/Shoddy_Society_4481
I built a Claude skill that redesigns any job for the AI Agent era — 6-layer methodology, tested on a real HR case with 61 sub-tasks. Looking for testers!
TL;DR: I built a Claude skill called "Future Work Paradigm Designer" (未来工作范式设计师) that helps you take any job, break it down into granular sub-tasks, figure out which parts AI should handle vs. humans, design a multi-agent collaboration system, and generate an implementation roadmap. Free .skill file attached; looking for people to test it with their own jobs and give feedback.

Who am I

I work in HR / organizational development at a large company in Asia. I'm not a developer; I'm someone who's been exploring how AI can actually change the way teams work, not just make individual tasks faster. This skill is the product of months of iteration on a core question: what does a job actually look like when you have an AI team working alongside you?

The core insight

Most people think about AI as a tool: "use ChatGPT to write my email faster." But in the AI Agent era, the real shift is bigger: you go from doing everything yourself to commanding an AI team. The analogy I keep coming back to: a general's value isn't in knowing how to use a sword, it's in knowing how to deploy troops. This means the core skills shift from "being good at execution" to three new competencies:

1. Task decomposition: can you break complex work into pieces AI can handle?
2. Resource orchestration: can you coordinate multiple AI agents effectively?
3. Critical judgment: can you make the right call at the moments AI can't?

What the skill does: 6-layer methodology

The skill walks you through a structured process to redesign any job for human-AI collaboration:

1. Task outline: map out end-to-end tasks
2. SOP deep decomposition: break each task down to the sub-task level, uncovering the real process
3. Human-AI division: label each sub-task as AI autonomous / AI draft + human review / human-led + AI assist / pure human
4. Orchestration design: design the AI team structure, including agent roles, coordination, and approval gates
5. Future paradigm visualization: generate a system-level view of how everything runs together
6. Implementation roadmap: a 4-phase path from "start tomorrow" to "full AI team"

At the end, it can also generate a presentation PPT plus an interactive HTML deep-reference, so you can actually take the output to your boss.

Test case: Annual talent review (OD Director role)

I tested it on a real HR scenario: an OD Director running the annual talent review process. Results:

- 8 major tasks decomposed into 61 sub-tasks
- 73% of sub-tasks theoretically AI-drivable (30% fully autonomous + 43% AI-drafts-human-reviews)
- 27% pure human: calibration facilitation, executive communication, political navigation
- A 6-agent system design: Planner, Data Officer, Analyst, Dispatcher, Communications Officer, plus an Orchestrator (think of it as the "chief of staff" who coordinates all agents)
- A 4-phase implementation roadmap with 15 specific AI skills to build

The key finding: the OD Director's role transforms from "person who does everything" to "commander who only does judgment, decisions, and human communication." Not faster at busywork: freed from busywork entirely.

What I'm looking for

I'd love for people to test this with their own jobs: any role, any industry. The skill works in both English and Chinese. Specific feedback I'm interested in:

- Decomposition accuracy: does the SOP breakdown feel true to how you actually work? Does it catch the hidden prep/follow-up work that people usually forget to mention?
- Human-AI division: do you agree with where it draws the line between AI and human? Any sub-tasks where you think the assignment is wrong?
- Orchestration design: does the multi-agent structure make sense? Is the "approval gate" concept (审批门) useful?
- Output usefulness: could you actually take the final PPT/HTML to your boss? Is the "paradigm → methodology → case study" narrative structure convincing?
- Methodology transferability: does the 6-layer approach work for non-HR jobs? I've only tested it deeply in HR contexts.

How to install

1. Download the .skill file (link below)
2. In Claude.ai, go to your profile → Skills/Features
3. Upload/install the .skill file
4. Start a conversation and say something like: "Help me design the future work paradigm for my role" (我想看看AI时代我的工作应该是什么样)

Download link: https://drive.google.com/file/d/1dSlUaIBHgn8GKS99es77VjtqhbgmZSzf/view?usp=sharing

A few honest caveats

This skill makes theoretical projections, not proven results. The "73% AI-drivable" figure is based on analysis, not actual implementation. I've deliberately kept the language cautious; it says "theoretically" and "expected to,"
GitHub - Ephyr: Ephemeral infrastructure access for AI agents.
www.ephyr.ai/ https://github.com/EphyrAI/Ephyr

Hey everyone, I wanted to introduce Ephyr, because giving an autonomous agent a permanent API key or an open SSH session is pretty suboptimal.

Goal: To start, I'd like to say I'm not pitching this as a production-ready, polished tool. It is a prototype. I think it's ready for the self-hosted community, r/homelab, and similar. But I'm really hoping to get input on the architecture and the technical approach to make sure I have no glaring holes. With that said, the tool:

Current state: If an orchestrator agent spawns a sub-agent to handle a subtask, it usually just passes down its own credentials. The Model Context Protocol (MCP) is a great transport layer, but it completely lacks a permission-propagation and identity layer.

How I got here: I had been working on a simple access broker for SSH keys so I could use Claude Code to manage infra in my homelab (initially internal as 'Clauth'). A few weeks ago, Google DeepMind published Intelligent AI Delegation (arXiv:2602.11865), and I saw some interesting similarities.

Solution: Their paper highlights this gap and proposes the use of Macaroons as "Delegation Capability Tokens". Ephyr is an open-source implementation of that architecture. It sits between agent runtimes and infrastructure, replacing standing credentials with task-scoped, cryptographically attenuated Macaroons.

A few architectural decisions I thought folks might appreciate:

Pure-stdlib Macaroons: To minimize supply-chain risk on the hot path, I dropped external macaroon libraries and wrote the HMAC-SHA256 caveat chaining from scratch using only Go's crypto/hmac and crypto/sha256. The core HMAC chain is ~300 lines of stdlib, with the full macaroon package coming in around 3,600 lines. The entire broker has exactly 3 direct dependencies. I'm incredibly proud of this; I wanted lean, efficient code to be one of the core pillars. You can literally run Ephyr on an rPi.
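For readers unfamiliar with macaroons, the caveat-chaining construction the post describes (each caveat folded in with HMAC-SHA256 keyed by the previous signature) can be illustrated in a few lines. This is a hedged Python sketch of the classic scheme, not Ephyr's Go code:

```python
import hashlib
import hmac

# Minimal sketch of classic macaroon HMAC-SHA256 caveat chaining.
# Function names are illustrative, not Ephyr's API.

def mint(root_key: bytes, identifier: bytes) -> bytes:
    """Initial signature over the macaroon identifier."""
    return hmac.new(root_key, identifier, hashlib.sha256).digest()

def add_caveat(sig: bytes, caveat: bytes) -> bytes:
    """Fold a caveat into the chain: the previous signature is the key."""
    return hmac.new(sig, caveat, hashlib.sha256).digest()

def verify(root_key: bytes, identifier: bytes,
           caveats: list[bytes], sig: bytes) -> bool:
    """Recompute the chain from the root key and compare in constant time."""
    expect = mint(root_key, identifier)
    for caveat in caveats:
        expect = add_caveat(expect, caveat)
    return hmac.compare_digest(expect, sig)
```

Because each caveat's HMAC is keyed by the previous signature, a holder can always add caveats (attenuate) but can never strip one without invalidating the chain, which is the property the broker's verification relies on.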
The Effective Envelope Reducer: Macaroons natively prove caveat accumulation, but not semantic attenuation. Ephyr solves this with a deterministic reducer that enforces strict narrowing across delegation hops using set intersections, minimums, and boolean ANDs. The HMAC chain proves no caveats were stripped; the reducer proves the authority actually shrank. Pairing this with task-level filtering makes a powerful combo.

Epoch watermarking: Traditional JTI blocklists for revocation require O(N) memory growth and make cascading revocation a nightmare. Ephyr uses an epoch watermark map keyed by task ULID. Validation walks the token's lineage array in O(depth), meaning revoking a parent instantly kills all descendants with a single map entry. Again, incredibly fast and efficient.

Proof-of-Possession (PoP): Because Macaroons are bearer tokens, I implemented a two-phase delegation bind to kill replay attacks. The parent creates an unbound token; the child independently generates an ephemeral Ed25519 keypair and binds its public key to the task. All subsequent requests require a PoP signature over a nonce and the request body hash.

The broker currently supports ephemeral SSH certificate issuance, HTTP credential injection, and federated MCP server routing. Performance-wise, auth takes <1ms, Macaroon verification takes ~32µs, and the full PoP pipeline runs in ~132µs. I've included highly detailed security and identity whitepapers (in docs/whitepapers/) and a full threat model (docs/THREAT_MODEL.md) in the repository.

Caveats: I think it goes without saying in this sub, but I did use AI and agentic development tools in the process (namely CC). Professionally, though, I have spent most of my career in the cybersec/machine learning/data science space, so I try to get into the minutiae and code as much as possible. The architecture is my own, but built on fundamental building blocks and research that came before me.

submitted by /u/-Crash_Override-
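The strict-narrowing reducer the post describes (set intersections, minimums, boolean ANDs across delegation hops) can be sketched as follows. The `Envelope` fields here are illustrative assumptions, not Ephyr's actual schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of an "effective envelope reducer": authority across
# delegation hops can only shrink. Field names are illustrative.

@dataclass(frozen=True)
class Envelope:
    scopes: frozenset      # allowed actions: combined by intersection
    ttl_seconds: int       # lifetime: combined by minimum
    can_delegate: bool     # capability flag: combined by boolean AND

def reduce_envelope(parent: Envelope, child: Envelope) -> Envelope:
    """Compute the effective authority of a child token under its parent."""
    return Envelope(
        scopes=parent.scopes & child.scopes,
        ttl_seconds=min(parent.ttl_seconds, child.ttl_seconds),
        can_delegate=parent.can_delegate and child.can_delegate,
    )
```

Applying the reducer at every hop guarantees monotonic shrinkage: a child may request anything, but its effective envelope is always the intersection with its parent's, which is the semantic-attenuation property the HMAC chain alone cannot prove.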
I’m not a developer, a doctor, or a writer. AI — and Claude specifically — gave me a seat at the table anyway.
I keep seeing the same take recycled: AI is making people dumber, lazier, more dependent. And every time, I think — that criticism is coming from people who already had access to the things AI gives me for the first time. Let me be specific. I’ve had health issues since my early twenties that I just lived with. Morning joint stiffness, chest heaviness waking up, chronically bad sleep I didn’t know was bad because I had nothing to compare it to. I’m not someone who can casually schedule specialist appointments every time something feels off. That’s not how life works for a lot of people. AI helped me put words to what I was experiencing in medical terms I didn’t have. It helped me understand what my Apple Watch data was actually telling me — that my deep sleep was consistently low, that my respiratory rate was spiking at night, that these were patterns worth investigating, not just “bad sleep.” When I ended up in the ER with a serious blood pressure spike, I already had context for the conversation with the doctor. That’s not replacing a doctor. That’s showing up as a better patient. I’m not in therapy. Maybe I should be, but I’m not, and that’s the reality for a lot of people. AI gives me a space to process things without performing for another human. No judgment, no social cost, no score being kept. I can think out loud, contradict myself, come back three days later and pick up where I left off. That’s not replacing professional help — it’s a pressure valve for someone who otherwise has nothing. But the biggest thing is communication. I have complex thoughts. The gap between what I think and what I can actually get out of my mouth or onto a page has always been brutal. Conversations move on before I’ve found my words. Once something is said wrong, it can’t be unsaid. I’ve stayed silent in discussions I had real contributions to because I couldn’t find the entry point. AI helps me get what’s already in my head into a form other people can engage with. 
That’s not dependency. That’s accessibility. And when you don’t know what you don’t know, you can’t Google it. You can’t search for something you don’t have the vocabulary for. AI bridges that gap. I’ve gone deep into philosophy of mind, hardware engineering, institutional theory, long-form writing — not because AI spoon-fed me answers, but because it helped me ask better questions. I want to be honest about something though. AI lets me do things I couldn’t do before — write code, build systems, draft things that would normally take years of specialized training. And I want to acknowledge the people who actually learned those crafts. The developers, the engineers, the writers, the people who earned real expertise through years of work. I am not operating at their level and I know that. That’s a fair criticism. But working with AI isn’t just typing “build this for me” and copying whatever comes out. It’s choreographed. It’s back and forth — “that’s kind of what I wanted but let’s bring it closer to this,” “that’s not my voice, I sound more like…,” “no, that function needs to do this instead.” The vision is mine. The direction is mine. The decisions about what stays, what goes, what gets refined — those are mine. AI is the instrument, but I’m still the one playing it. The result is something genuinely mine even if the process looks different than how it’s traditionally been done. Now let me give credit where it’s due. Most AI models can do these things to some degree. But there’s a difference between a model that can do them and one that does them well enough that you actually want to keep coming back. For me that’s Claude and it’s not close. I’ve used other models. They work. But Claude is the most well-rounded experience I’ve found. It knows when to be personable and when to be formal. It doesn’t talk down to me but it doesn’t assume I already know everything either. When I go deep it goes with me. 
When I’m wrong it tells me without making me feel stupid for being wrong. That tone matters more than people realize — because if the tool doesn’t feel good to use, you stop using it. And for someone who depends on it for health awareness, mental health, communication, and learning, the difference between a model I tolerate and a model I trust is the difference between having access and not. Anthropic got that right and they deserve to hear it. The “AI makes people dumber” critique isn’t wrong about everyone. But it’s being applied with a broad brush that erases people like me — people who aren’t starting from a position of advantage, who don’t have professionals on speed dial, who have always been one step behind because the tools everyone else had were never built for how we process. AI isn’t making me dumber. It’s the first tool that’s ever made the playing field something close to level.

submitted by /u/AbbreviationsNice810
[P] Karpathy's autoresearch with evolutionary database.
Integrated an evolutionary database into Karpathy's autoresearch project, replacing the simple TSV-file-based logging in the original project. Evolutionary algorithms have proven to be a powerful tool for autonomously discovering optimal solutions to problems with large search spaces. Famously, Google DeepMind's AlphaEvolve system uses evolutionary algorithms to discover state-of-the-art matrix multiplication algorithms. The implementation of the evolutionary database itself is based heavily on the implementation in OpenEvolve. Would love thoughts and suggestions from the community. Check it out: https://github.com/hgarud/autoresearch

submitted by /u/hgarud
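For intuition, an evolutionary archive in the MAP-Elites style (one best candidate kept per niche, in the spirit of OpenEvolve-like systems) can be sketched in a few lines of Python. Everything here, including the function names, is an illustrative assumption, not the repo's actual implementation:

```python
import random

# Illustrative MAP-Elites-style archive loop: candidates are binned into
# niches, and each niche keeps only its best-scoring candidate so far.

def evolve(seed, mutate, score, niche, generations=100, rng=None):
    """Return {niche_key: (best_candidate, best_fitness)} after evolution."""
    rng = rng or random.Random(0)
    archive = {niche(seed): (seed, score(seed))}
    for _ in range(generations):
        parent, _ = rng.choice(list(archive.values()))  # pick any elite
        child = mutate(parent, rng)
        key, fitness = niche(child), score(child)
        if key not in archive or fitness > archive[key][1]:
            archive[key] = (child, fitness)             # replace only if better
    return archive
```

The archive doubles as the "database": because losers in each niche are discarded, it stays small while preserving diverse stepping stones for future mutations, which is the property that file-based flat logs lack.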
Creative Writing - DeepSeek-V3.2 vs. Claude Sonnet 4.6 vs. Claude Sonnet 4.5
Prompt:

Can you write a short story. In this story Ayn Rand mysteriously finds herself reanimated in the year 2026 in America. She's immediately angry to discover that death wasn't 'the end' and becomes increasingly angry as she explores the modern world and sees how things have changed. For the writing style, favor longer paragraphs and slightly lyrical prose. Strong language is permitted where appropriate. For the tone blend deep philosophical gravitas with dry absurdist humor.

DeepSeek-V3.2:

The first thing Ayn Rand became aware of was the absence. Not of light or sound, but of the finality she had counted on. Death, it seemed, was not the end, but a rather rude interruption, a cosmic oversight now being corrected without her consent. She was standing on a sidewalk, the peculiar, frictionless ground of it unsettlingly smooth. The air was warm, unnaturally so for what her internal clock insisted was a late autumn morning, and thick with a scent she didn't recognize—a cloying sweetness of synthetic vanilla and burnt sugar. She looked down at her hands. Solid. Her mind, a fortress of logic and will, was intact, its gates slammed shut against the chaos that was already beginning to press against them. The first feeling, before the anger, before the philosophical horror, was a profound and personal sense of betrayal. She had wagered everything on the nothingness of death, on the dignity of a universe that ultimately didn't care. To be dragged back was not just an inconvenience; it was a logical fallacy made flesh. Her surroundings were a low, sprawling temple to some new god. A vast parking lot, empty save for a few gleaming, silent vehicles, surrounded a building of glass and corrugated metal. Above its wide entrance, a giant, stylized coffee cup, rendered in pixels, emitted a pulsating, sickly green light. The door didn't open when she approached. It slid aside with a faint, hydraulic sigh. Inside, the assault on her senses intensified.
The air was a cacophony of competing noises: a hissing steam wand, a relentless thumping beat of what she assumed was music but sounded like a malfunctioning engine, and a symphony of beeps, chimes, and robotic voices announcing the readiness of food items. And the people. They sat in a state of abject, bovine surrender, their eyes fixed on rectangles of glowing glass propped before them. They did not talk to one another. They did not read, not in any real sense, their eyes merely scrolling. They stared, fingers twitching, occasionally lifting a cup to their lips with the mechanical obedience of well-programmed automatons. One of them, a young man with a face full of metal rings and hair the color of a cartoon strawberry, glanced up, his gaze passing over her without a flicker of recognition or interest. He looked back at his glowing rectangle. She had been dismissed by a creature who had clearly dismissed himself first. She approached a counter where a list of items and their prices glowed on another screen. A girl with a vacant expression and a voice devoid of affect asked, "Welcome to Grind. What can we get started for you today?" Rand stared at her. The girl's eyes were slightly unfocused, looking just past Rand's shoulder. "Coffee," Rand said, the word tasting like a relic. "Black." The girl tapped the screen. "That'll be seven fifty-two." Seven dollars and fifty-two cents. For a cup of coffee. The currency of her America, the tangible representation of one's earned value, had apparently become a plaything. She had no money. The absurdity of it—reanimated from the dead and undone by the price of a beverage—was a flicker of dark humor she immediately crushed. She turned and walked out, the girl's attention already returned to whatever phantom was beckoning from her own rectangle. The city beyond was a fever dream of American decline, a grotesque caricature of the collectivist nightmares she had spent her life dissecting. 
The streets were choked with vehicles that moved not with the confident roar of combustion, but with a sinister, electric hum. They drove themselves. People sat inside them, also staring at rectangles. The pinnacle of human achievement, the act of piloting a machine, of mastering a path through space, had been outsourced to a machine so they could consume more drivel. On the corners, people in ill-fitting clothes made incomprehensible gestures at their wrists, speaking into the air. "I'm literally dying," one of them said, her face slack with boredom as she articulated her own non-existent mortality. A group of tourists, their bodies soft and uniformly dressed, blocked the sidewalk, each one holding a rectangle at arm's length to capture an image of a mundane building across the street. They weren't seeing the building; they were seeing it on their screens. They were mediating reality through a device, ensuring they never actually had to experience it. She found a public bench and sat, the sheer volume of the irrational threatening to overwhe
[NEWS] TECHNICAL UPDATE: THE COALITION AGAINST THE PENTAGON BLACKLIST
TL;DR: The confrontation between Anthropic and the Trump administration has escalated into a rare industry-wide alliance. Following two federal lawsuits from Anthropic, a coalition of OpenAI and Google researchers has filed in support of their rival, while major cloud providers (AWS, Google, Microsoft) have signaled a landmark defiance of the Pentagon’s commercial blacklist.

TECHNICAL UPDATE: THE COALITION AGAINST THE PENTAGON BLACKLIST (MARCH 10, 2026)

As of 10:45 EST, the fallout from the supply chain risk designation has moved beyond a procurement dispute and into a full-scale industry revolt. The narrative is no longer just about one lab’s safety rules; it is about whether the federal government can legally use national security tools to punish American companies for their ethical red lines.

THE “RIVALS UNITE” AMICUS BRIEF

In an unprecedented move, 30+ researchers from OpenAI and Google DeepMind—traditionally Anthropic’s fiercest competitors—filed an amicus brief on Monday evening.

* The Google Signal: Google Chief Scientist Jeff Dean signed the brief in a personal capacity, a move widely seen as a rejection of the administration’s "security risk" framing.
* The “Chilling Effect”: The brief argues that weaponizing the FASCSA (supply chain risk) label to punish safety guardrails will effectively silence the technical community, deterring experts from speaking openly about AI risks to avoid federal retaliation.
* Alternative Remedies: The researchers pointed out that if the Pentagon was unhappy with Anthropic’s terms, it could have simply canceled the contract rather than issuing an industry-wide blacklist typically reserved for foreign adversaries.
THE CLOUD PROVIDER REVOLT

In a direct challenge to the administration’s threat to ban “any commercial activity” with Anthropic, the world’s three largest cloud providers have issued quiet but firm assurances to their customers:

* Microsoft, AWS, and Google Cloud have all confirmed that Claude will remain available on their platforms (Azure, Bedrock, and Vertex AI, respectively) for all non-defense commercial and academic workloads.
* Legal teams at these giants have concluded that the Pentagon’s authority is limited to federal procurement and cannot legally sever private commercial relationships between American firms. This effectively walls off the “Department of War” from the rest of the global economy.

THE “IRAN” PARADOX

New reports indicate a massive contradiction in the government’s case: Anthropic’s technology was reportedly used for intelligence analysis and targeting in operations related to Iran right up until the ban was issued.

* The Contradiction: The administration is labeling Anthropic a “security risk” while simultaneously relying on its precision and reliability for active military theaters.
* The Targeting Gap: Military officials are reportedly scrambling to replace Claude’s specific “targeting suggestions” capabilities, as the 6-month phase-out creates an immediate void in intelligence processing.

LITIGATION DEEP DIVE: THE TWO-FRONT WAR

Anthropic's legal counter-offensive is targeting two different legal "levers":

1. Northern District of California (Civil Complaint): Focuses on First and Fifth Amendment violations. It alleges the administration is engaging in “unlawful viewpoint-based retaliation” by trying to destroy the company’s economic value because it refused to allow Claude to be used for mass domestic surveillance.
2. D.C. Circuit Court of Appeals (FASCSA Review): Challenges the supply chain risk label itself.
Anthropic argues the Pentagon bypassed mandatory procedures and applied a tool meant for foreign adversaries (like Huawei) to a domestic firm with no ties to hostile nations.

Sources:

* AP News – Anthropic sues Trump administration seeking to undo 'supply chain risk' designation
* WIRED – OpenAI and Google Workers File Amicus Brief in Support of Anthropic
* Lawfare – Anthropic Challenges the Pentagon's Supply Chain Risk Determination
* The-Decoder – Despite Pentagon ban, Google, AWS, and Microsoft stick with Anthropic's AI models

submitted by /u/Acceptable_Drink_434 [link] [comments]
I analyzed 15 competitors in the AI coding assistants space — here's what I found
I built a Claude Code skill that dispatches 6 parallel research agents to analyze any market in ~20 minutes. Ran it on the AI coding assistants space (the tools we all use every day) and the results were... eye-opening. 15 competitors analyzed with real web data — reviews, pricing, forums, funding, hiring signals. Here's what the agents found.

1. Every single competitor has pricing complaints — it's the #1 pain point in the entire market

Cursor's June 2025 credit switch was the most discussed negative event. Users report $40-50/mo effective cost vs. advertised $20. One team's $7,000 annual subscription depleted in a single day. JetBrains users report credits consumed when AI isn't even active. Augment users calculated a 10x+ hidden price increase after their October 2025 credit switch. Not one competitor has figured out pricing that developers actually trust.

2. Cursor hit $2B ARR with near-zero marketing spend — the fastest-growing SaaS company ever

From $100M to $2B in 12 months. No ads, no content marketing, just product-led growth and word-of-mouth. They didn't hire a single enterprise sales rep until after $200M ARR. 60% of revenue now comes from enterprise contracts that started as individual developers bringing Cursor into their teams.

3. Developers use 2-3 tools simultaneously — nobody owns the full workflow

The dominant pattern in forums: Cursor or Copilot for daily autocomplete, Claude Code for hard reasoning problems, Copilot for GitHub integration, Cline as a budget fallback. No tool is "the one." The market is segmenting into 5 tiers: autocomplete, editor-native agents, execution-depth agents, enterprise codebase tools, and open-source BYOK tools.

4. 46% of developers don't trust AI coding output — despite 73% using it daily

Stack Overflow 2025: experienced developers have the lowest "highly trust" rate (2.6%) and highest "highly distrust" rate (20%). A controlled study showed developers were 19% slower with AI tools despite believing they were 20% faster. AI-generated code has 41% higher churn than human-written code. The productivity illusion is real.

5. Windsurf no longer exists as an independent company

The founders went to Google DeepMind in a $2.4B reverse-acquihire. The remaining team/product was acquired by Cognition (Devin) for ~$250M. An attempted $3B OpenAI acquisition collapsed due to Microsoft IP concerns. The brand lives on but under entirely different leadership.

Summary Table

| Competitor | Pricing | Best For | Biggest Weakness |
|---|---|---|---|
| Cursor | $20-200/mo (credits) | AI-native IDE with fastest features | Pricing backlash, security gaps |
| GitHub Copilot | $10-39/mo (per-seat) | Teams in GitHub ecosystem | Poor codebase context awareness |
| Windsurf | $15-60/mo (credits) | Budget AI IDE | Instability, acquisition uncertainty |
| Cline | Free (BYOK) | Cost control, open-source trust | No autocomplete, UX for non-power-users |
| Augment | $20-200/mo (credits) | Large enterprise codebases | Pricing controversy, low brand awareness |
| Tabnine | $39-59/user/mo | Air-gapped / regulated industries | Feature gap widening vs. agentic competitors |
| Sourcegraph Cody | $59/user/mo (enterprise) | Massive legacy codebases | Fragmentation, individual plans discontinued |
| Amazon Q | Free-$19/mo | AWS-native development | Useless outside AWS ecosystem |
| JetBrains AI | $10-30/mo (credits) | Existing JetBrains IDE users | Credit consumption crisis |

I published the full analysis (competitors report, pricing landscape, feature matrix, and 9 battle cards) here: github.com/ferdinandobons/startup-skill-examples/analyses/ai-coding-assistants/ Generated with an open-source skill I built for Claude Code. If you want to analyze your own market: github.com/ferdinandobons/startup-skill submitted by /u/ferdbons [link] [comments]
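The post doesn't publish the skill's internals, but the fan-out/fan-in pattern it describes — one worker per research angle, merged into a single report — can be sketched as follows. `ANGLES`, `research_task`, and `analyze_market` are hypothetical stand-ins for illustration, not the actual skill's code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical research angles mirroring the post's data sources
ANGLES = ["reviews", "pricing", "forums", "funding", "hiring", "features"]

def research_task(angle: str, market: str) -> dict:
    # Stand-in for a real agent/web-research call for one angle
    return {"angle": angle, "findings": f"summary of {market} {angle}"}

def analyze_market(market: str) -> dict:
    # Fan out one worker per research angle, then merge the reports
    with ThreadPoolExecutor(max_workers=len(ANGLES)) as pool:
        reports = list(pool.map(lambda a: research_task(a, market), ANGLES))
    return {r["angle"]: r["findings"] for r in reports}

report = analyze_market("AI coding assistants")
```

Because each angle is independent, running them concurrently means the wall-clock time is roughly that of the slowest single agent rather than the sum of all six — which is how a full market analysis can fit in ~20 minutes.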
Google DeepMind uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Gemma 4, Nano Banana 2 🍌, Lyria 3, Genie 3, Gemini 3, WeatherNext 2, and Gemini Robotics.
A highlighted Google DeepMind offering: Gemini 3.1 Flash-Lite, built for intelligence at scale.
Based on 19 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.