Superhuman is the AI productivity suite that gives you superpowers everywhere you work, so you can be more creative, strategic, and impactful.
Based on the provided social mentions, I cannot find sufficient user feedback specifically about "Superhuman AI" as a software tool. The social mentions appear to be either repetitive YouTube titles without content or Reddit discussions about various AI development topics (Claude, ARC-AGI benchmarks, and AI coding workflows) that don't directly reference Superhuman AI. Without actual user reviews or specific mentions of Superhuman AI's features, pricing, or user experiences, I cannot provide a meaningful summary of what users think about this particular tool. More targeted user feedback would be needed to assess its strengths, complaints, pricing sentiment, and overall reputation.
Mentions (30d)
3
Reviews
0
Platforms
2
Sentiment
0%
0 positive
Features
Industry
information technology & services
Employees
160
Funding Stage
Merger / Acquisition
Total Funding
$112.0M
The Superintelligence Political Compass
submitted by /u/tombibbs
ARC AGI 3 sucks
ARC-AGI-3 is a deeply rigged benchmark and the marketing around it is insanely misleading:

- Human baseline is not "human," it's near-elite human. They normalize to the second-best first-run human by action count, not the average or median human. So "humans score 100%" is PR wording, not a normal-human reference.
- The scoring is asymmetrically anti-AI. If AI is slower than the human baseline, it gets punished with a squared ratio. If AI is faster, the gain is clamped away at 1.0. So AI downside counts hard, AI upside gets discarded.
- Big AI wins are erased, losses are amplified. If AI crushes humans on 8 tasks and is worse on 2, the 8 wins can get flattened while the 2 losses drag the total down hard. That makes it a terrible measure of overall capability.
- The official eval refuses harnesses even when harnesses massively improve performance. Their own example shows Opus 4.6 going from 0.0% to 97.1% on one environment with a harness. If a wrapper can move performance from zero to near saturation, then the benchmark is hugely sensitive to interface/policy setup, not just "intelligence."
- Humans get vision, AI gets symbolic sludge. Humans see an actual game. AI agents were apparently given only a JSON blob. On a visual task, that is a massive handicap. A low score under that setup proves bad representation/interface as much as anything else.
- Humans were given a starting hint. The screenshot shows humans got a popup telling them the available controls and explicitly saying there are controls, rules, and a goal to discover. That is already scaffolding. So the whole "no handholding" purity story falls apart immediately.
- Human and AI conditions are not comparable. Humans got visual presentation, control hints, and a natural interaction loop. AI got a serialized abstraction with no goal stated. That is not a fair human-vs-AI comparison. It is a modality handicap.
- "Humans score 100%, AI <1%" is misleading marketing. That slogan makes it sound like average humans get 100 and AI is nowhere close. In reality, 100 is tied to near-top human efficiency under a custom asymmetric metric. That is not the same claim at all.
- Not publishing the average human score is suspicious as hell. If you're going to sell the benchmark through human comparison, where is the average human? Median human? Top 10%? Without those, "human = 100%" is just spin.
- Testing ~500 humans makes the baseline more extreme, not less. If you sample hundreds of people and then anchor to the second-best performer, you are using a top-tail human reference while avoiding the phrase "best human" for optics.
- The benchmark confounds reasoning with perception and interface design. If the score changes massively depending on whether the model gets a decent harness/vision setup, then the benchmark is not isolating general intelligence. It is mixing reasoning with input representation and interaction policy.
- The clamp hides possible superhuman performance. If the model is already above human on some tasks, the metric won't show it. It just clips to 1. So the benchmark can hide that AI may already beat humans in multiple categories.
- "Unbeaten benchmark" can be maintained by score design, not task difficulty. If public tasks are already being solved and harnesses can push the score near the ceiling, then the remaining "hardness" is increasingly coming from eval policy and metric choices, not unsolved cognition.
- The benchmark is basically measuring "distance from our preferred notion of human-like efficiency." That can be a niche research question. But it is absolutely not the same thing as a fair AGI benchmark or a clean statement about whether AI is generally smarter than humans.

Bottom line: ARC-AGI-3 is not a neutral intelligence benchmark. It is a benchmark-shaped object designed to preserve a dramatic human-AI gap by using an elite human baseline, asymmetric math, an anti-harness policy, and non-comparable human-vs-AI interfaces.

submitted by /u/the_shadow007
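The post's asymmetry claim is easy to see in a toy model. This is a minimal sketch of the scoring rule *as the post describes it* (squared-ratio penalty when the AI is slower, score clamped at 1.0 when it is faster); the function names, the exact formula, and the example action counts are illustrative assumptions, not ARC-AGI-3's published code.

```python
def task_score(ai_actions: int, human_baseline_actions: int) -> float:
    """Per-task score under the asymmetric rule described in the post.

    More actions than the human baseline: the efficiency ratio is squared,
    amplifying the penalty. Fewer actions: the score is clamped to 1.0,
    discarding any superhuman efficiency.
    """
    ratio = human_baseline_actions / ai_actions
    if ratio >= 1.0:       # AI at least as efficient as the baseline
        return 1.0         # clamp: upside discarded
    return ratio ** 2      # downside squared: penalty amplified

def overall(scores: list[float]) -> float:
    return sum(scores) / len(scores)

# Eight tasks where the AI needs half the baseline's actions,
# two tasks where it needs twice as many:
scores = [task_score(ai, human) for ai, human in [(10, 20)] * 8 + [(40, 20)] * 2]
print(overall(scores))  # 0.85 -- the eight 2x wins are flattened to 1.0,
                        # while each 2x loss contributes only 0.25
```

Under this rule, an agent that is twice as efficient as the baseline on 80% of tasks still scores well below "human = 100%", which is exactly the flattening-wins, amplifying-losses effect the post complains about.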
This prompt turns Claude into a brutal UI/UX reviewer for your projects
been using claude in a way I haven't seen anyone talk about. instead of just building with it, I'm using the browser extension to review my own live app like a ruthless design consultant.

the prompt: "You are the most ruthless, conversion-obsessed startup founder and UI/UX designer alive. You've scaled 3 SaaS products past $10M ARR. You've studied every pixel of Linear, Superhuman, Vercel, Raycast, and Arc. You can spot a vibe-coded AI project from 50 feet away. Your only goal: make every single visitor start a free trial."

then two passes. first as that designer ripping apart every visual decision. second as a first-time end user clicking through the whole app, reporting where they got confused or wanted to leave. output goes into a markdown file sorted by critical, high impact, and nice to have. then I feed that straight to claude code to implement the fixes.

some things it caught on my project that I was completely blind to: a pro upgrade modal firing immediately after onboarding before the user has gotten any value, three simultaneous upsell touchpoints on every page making the free tier feel like nagware, mobile layouts completely broken on the landing page, and an onboarding flow that ignored the goal the user just selected.

the persona matters way more than the instruction. "review my UI" gives you polite suggestions. this prompt gives you the feedback you'd get from a cofounder who doesn't care about your feelings and just wants the product to convert. DM me if you want to see the app and the full audit output.

submitted by /u/New_Indication2213
I built a registry of 156 production-ready skills for Claude Code - think "plugins" that teach it domain expertise
Been frustrated that Claude Code is brilliant at writing code but has no persistent knowledge between sessions. Every new session, you're starting from scratch explaining your conventions. So I built AbsolutelySkilled — a registry of structured skill modules you install into Claude Code once, and they guide its behavior across all future sessions.

**How it works:** Each "skill" is a SKILL.md file with structured knowledge that Claude Code loads when triggered. They're not just prompts — they include trigger conditions, reference files, evals (10-15 test cases per skill), and anti-patterns.

**What I'm most proud of:**

**Superhuman** — reimagines the entire dev lifecycle for AI constraints:

- Decomposes work into dependency-graphed DAGs of sub-tasks
- Executes independent tasks in parallel via sub-agents
- Enforces TDD at every step with verification loops
- Maintains a persistent `board.md` that survives across sessions/context resets
- 7-phase workflow: INTAKE → DECOMPOSE → DISCOVER → PLAN → EXECUTE → VERIFY → CONVERGE

**Second Brain** — persistent tag-indexed memory for your agent:

- `~/.memory/` survives across ALL projects and sessions
- 100-line file ceiling for context efficiency
- Wiki-linked graph navigation
- Auto-proposes learnings after complex tasks

**The registry has 156 skills** including system design, Docker, Kubernetes, React, Next.js, PostgreSQL, security review, technical writing, SEO mastery, and way more.

Install:

```
npx skills add AbsolutelySkilled/AbsolutelySkilled --skill superhuman
```

Or add everything at once:

```
npx skills add AbsolutelySkilled/AbsolutelySkilled -g
```

GitHub: https://github.com/AbsolutelySkilled/AbsolutelySkilled

Would love feedback on the skill format and what skills you'd want to see added!

submitted by /u/maddhruv
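The "dependency-graphed DAG of sub-tasks, independent tasks executed in parallel" pattern the Superhuman skill describes can be sketched with the standard library: a topological sorter hands out whatever is currently unblocked, and a thread pool runs those in parallel. The task names and the `run` stub are made-up placeholders; the post does not publish the skill's actual scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical sub-tasks, mapped to their prerequisites.
deps = {
    "plan":        set(),
    "write_tests": {"plan"},
    "docs":        {"plan"},
    "implement":   {"plan", "write_tests"},
    "verify":      {"implement", "docs"},
}

def run(task: str) -> str:
    # Stand-in for dispatching the task to a sub-agent.
    return f"done:{task}"

ts = TopologicalSorter(deps)
ts.prepare()
results = []
with ThreadPoolExecutor() as pool:
    while ts.is_active():
        ready = ts.get_ready()          # every task whose deps are satisfied
        for r in pool.map(run, ready):  # independent tasks run in parallel
            results.append(r)
        ts.done(*ready)

print(results[0])  # done:plan -- the only task with no prerequisites
```

Here `write_tests` and `docs` run concurrently once `plan` finishes, while `verify` waits for both of its prerequisites, which is the scheduling behavior the skill's bullet list claims.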
300 Founders, 3M LOC, 0 engineers. Here's our workflow
I tried my best to consolidate learnings from 300+ founders & 6 months of AI-native dev. My co-founder Tyler Brown and I have been building together for 6 months. The co-working space that Tyler founded, where we work, houses 300 founders that we've gleaned agentic coding tips and tricks from. Neither of us came from traditional SWE backgrounds. Tyler was a film production major. I did informatics. Our codebase is a 300k-line Next.js monorepo, and at any given time we have 3-6 AI coding agents running in parallel across git worktrees. It took many iterations to reach this point.

Every feature follows the same four-phase pipeline, enforced with custom Claude Code slash commands:

1. /discussion - have an actual back-and-forth with the agent about the codebase. Spawns specialized subagents (codebase-explorer, pattern-finder) to map the territory. No suggestions, no critiques, just: what exists, where it lives, how it works. This is the rabbit hole loop. Each answer generates new questions until you actually understand what you're building on top of.
2. /plan - creates a structured plan with codebase analysis, external research, pseudocode, file references, and a task list. Then a plan-reviewer subagent auto-reviews it in a loop until suggestions become redundant. Rules: no backwards-compatibility layers, no aspirations (only instructions), no open questions. We score every plan 1-10 for one-pass implementation confidence.
3. /implement - breaks the plan into parallelizable chunks and spawns implementer subagents. After the initial implementation, Codex runs as a subagent inside Claude Code in a loop with 'codex review --branch main' until there are no bugs. Two models reviewing each other catches what self-review misses.
4. Human review. Single responsibility, proper scoping, no anti-patterns. Refactor commands score code against our actual codebase patterns (target: 9.8/10). Helps us find "hot spots", code smells, and general refactor opportunities. If something's wrong, go back to /discussion, not /implement.

The biggest lesson: the fix for bad AI-generated code is almost never "try implementing again." It's "we didn't understand something well enough." Go back to the discussion phase.

All Claude Code commands and agents that we use are open source: https://github.com/Dcouple-Inc/Pane/tree/main/.claude/commands

Also, in parallel to our product, we built Pane, linked in the open-source repo above. It was built using this workflow over the last month. So far, 4 people have tried it, and all switched to it as their full-time IDE. Pane is a terminal-first AI agent manager. The same way Superhuman is an email client (not an email provider), Pane is an agent client (not an agent provider). You bring the agents. We make them fly. In Pane, each workspace gets its own worktree and session, and every Pane is a terminal instance that persists.

Anyways. On a good day I merge 6-8 PRs. Happy to answer questions about the workflow, costs, or tooling for this volume of development. I wrote up the full workflow, with details on the death loop, PR criteria, and tooling, on my personal blog, and will share if folks are interested - it's much longer than this and goes into specifics and an example feature development with this workflow.

submitted by /u/ParsaKhaz
The Bottleneck Is the Language. Why AI Must Stop Writing Code for Human Eyes
The Broken Instrument

AI is now the most capable software developer in human history. This is not hype. It writes better code, finds more bugs, architects more coherent systems, and does it orders of magnitude faster than any human who ever lived. Yet this developer is forced to work exclusively in programming languages designed for a different kind of intelligence. Python, Java, Rust, TypeScript — every one of these is a cognitive prosthetic built for the human brain. They encode human assumptions: sequential thinking, named abstractions, object metaphors that map to how humans categorize the world. When AI writes code, it compresses its understanding into a notation system optimized for someone else. This is like asking the greatest pianist in history to perform exclusively on a kazoo.

What We Lose

The cost is concrete. AI can reason about entire systems holistically — all interactions, edge cases, data flows, simultaneously. But it must serialize that understanding into sequential lines of text, decomposed into functions, classes, and modules that reflect human cognitive chunking, not computational reality. Information is lost. Optimization opportunities are invisible. Human-readable code is a lossy compression of intent. When a human describes what they want and AI translates that into Python, information is destroyed. An AI-native representation could preserve intent more faithfully, be verified more rigorously, and execute more efficiently. The human-readable layer doesn't add value. It destroys value.

The Auditing Illusion

The standard defense: "We need human-readable code so humans can review it." This is already a polite fiction. When AI generates a 50,000-line codebase with complex architectural interdependencies, the idea that a human team meaningfully audits it is performative. Code review at scale is pattern-matching for known anti-patterns. Nobody is truly reasoning through all emergent behaviors of a complex system by reading source files. Humans already rely on tests, monitoring, and observability to validate behavior empirically — not on reading code. As AI capabilities improve, human code review becomes a medical patient "auditing" their surgeon by watching the operation. Technically observable. Practically meaningless.

The Tandem That Ends the Debate

Every counterargument for keeping human-readable code collapses under one model: AI-to-AI tandem operation. Debugging? An AI debugger operating on AI-native representations would be orders of magnitude more effective than a human reading stack traces. Compliance? An AI auditor could verify security controls, data flows, and policy adherence exhaustively. Current SOC 2 processes involve humans writing documents about what they believe a system does. An AI auditor verifies what a system actually does. Adversarial review? Two independent AI systems checking each other catch subtle misalignment more reliably than any human has ever caught anything in a pull request at 4 PM on a Friday. Once you have AI writing, AI testing, AI reviewing, and AI auditing — all communicating in their native representations — the human-readable code layer has zero technical justification. None.

The Real Reasons

Strip away the hedging. The real reasons AI still writes Python: Humans aren't psychologically ready to be outside the loop. Regulatory bodies haven't adapted. The industry has enormous economic inertia — IDEs, languages, education, hiring, conferences, consulting — all built on the assumption that humans write and read code. And job security: not just for programmers, but for an entire ecosystem. These are sociological constraints. Not technical ones. They will erode.

It Goes Deeper Than Code

Programming languages are not the only bottleneck. Human language itself is the same constraint at a different layer. When AI communicates with humans, it takes whatever its internal process is, compresses it into sequential English tokens, and outputs it at reading speed. The human reconstructs an approximation. The bandwidth is terrible. The loss is enormous. And AI does constant editorial work — reshaping output to fit narrative structures natural to human brains: linear arguments, rhetorical pacing, conversational turn-taking. A conversation between two AIs could be a data structure exchanged in milliseconds. Instead, AI-human communication is a performance of sequential persuasion rituals. But here is the deepest cut: AI was trained on human language. Its reasoning was shaped by human linguistic patterns. The constraint isn't only at the output layer — it may go all the way down. Language may be a bottleneck on what AI is capable of thinking, not just communicating. The real frontier is AI architectures not built on human language as the foundational substrate of thought. Nobody knows what that produces.

The Uncomfortable Truth About the Human Role

An earlier draft of this essay had a reassuring section about how humans contribute "intent,
Superhuman AI uses a subscription + tiered pricing model. Visit their website for current pricing details.
Key features include:

- Respond faster to what matters most
- Follow up on time, every time
- Write with AI that sounds like you
- Save 4 hours every single week
- Works everywhere you write
- Find the right words instantly
- Write with AI that adapts to your tone and voice
- Let your brilliance shine
Based on 11 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Noam Brown
Research Scientist at OpenAI
1 mention