Magic is an AI company that is working toward building safe AGI to accelerate humanity’s progress on the world’s most important problems.
Based on the social mentions provided, users generally view "Magic" (referring to Magic AI and AI tools like Claude) quite positively, though with realistic expectations. Users praise AI as genuinely useful for everyday tasks like file management, drafting, and basic automation, with several developers successfully building complex applications (games, mobile apps, production tools) using AI assistance despite having limited experience in those domains. However, users also emphasize that AI isn't actually "magic" - it has clear limitations when pushed beyond basic use cases and requires realistic expectations about its capabilities. The overall sentiment suggests AI tools are seen as valuable productivity enhancers and coding assistants, but users maintain a balanced perspective on what these tools can and cannot achieve.
Mentions (30d)
12
3 this week
Reviews
0
Platforms
5
Sentiment
0%
0 positive
Features
Industry
information technology & services
Employees
110
Funding Stage
Venture (Round not Specified)
Total Funding
$610.9M
Show HN: Oxyde – Pydantic-native async ORM with a Rust core
Hi HN! I built Oxyde because I was tired of duplicating my models.

If you use FastAPI, you know the drill. You define Pydantic models for your API, then define separate ORM models for your database, then write converters between them. SQLModel tries to fix this but it's still SQLAlchemy underneath. Tortoise gives you a nice Django-style API but its own model system. Django ORM is great but welded to the framework.

I wanted something simple: your Pydantic model IS your database model. One class, full validation on input and output, native type hints, zero duplication. The query API is Django-style (.objects.filter(), .exclude(), Q/F expressions) because I think it's one of the best designs out there.

*Explicit over implicit.* I tried to remove all the magic. Queries don't touch the database until you call a terminal method like .all(), .get(), or .first(). If you don't explicitly call .join() or .prefetch(), related data won't be loaded. No lazy loading, no surprise N+1 queries behind your back. You see exactly what hits the database by reading the code.

*Type safety* was a big motivation. Python's weak spot is runtime surprises, so Oxyde tackles this on three levels: (1) when you run makemigrations, it also generates .pyi stub files with fully typed queries, so your IDE knows that filter(age__gte=...) takes an int, that create() accepts exactly the fields your model has, and that .all() returns list[User] not list[Any]; (2) Pydantic validates data going into the database; (3) Pydantic validates data coming back out via model_validate(). You get autocompletion, red squiggles on typos, and runtime guarantees, all from the same model definition.

*Why Rust?* Not for speed as a goal. I don't do "language X is better" debates. Each one is good at what it was made for. Python is hard to beat for expressing business logic. But infrastructure stuff like SQL generation, connection pooling, and row serialization is where a systems language makes sense. So I split it: Python handles your models and business logic, Rust handles the database plumbing. Queries are built as an IR in Python, serialized via MessagePack, sent to Rust which generates dialect-specific SQL, executes it, and streams results back. Speed is a side effect of this split, not the goal. But since you're not paying a performance tax for the convenience, here are the benchmarks if curious: https://oxyde.fatalyst.dev/latest/advanced/benchmarks/

What's there today: Django-style migrations (makemigrations / migrate), transactions with savepoints, joins and prefetch, PostgreSQL + SQLite + MySQL, FastAPI integration, and an auto-generated admin panel that works with FastAPI, Litestar, Sanic, Quart, and Falcon (https://github.com/mr-fatalyst/oxyde-admin).

It's v0.5, beta, in active development, and the API might still change. This is my attempt to build the ORM I personally wanted to use. Would love feedback, criticism, ideas.

Docs: https://oxyde.fatalyst.dev/

Step-by-step FastAPI tutorial (blog API from scratch): https://github.com/mr-fatalyst/fastapi-oxyde-example
I built a full iOS app in 2 weeks with Claude Code. Here’s what it was great at, and where it broke.
I wanted to share an honest breakdown of what using Claude Code as my main dev tool actually felt like. This wasn’t a landing page or a toy project. I used it to build and ship a full React Native app to the App Store. The app has 225 lessons, 13 exercise types, a real-time duel system, Supabase backend/auth, subscriptions, and a bunch of gamification.

What Claude Code was great at

It was insanely fast at scaffolding. I could describe a feature and it would generate the project structure, screens, navigation, and boilerplate way faster than I would have done manually. It was also really strong for repetitive mechanical work. Once I had the pattern right, it helped me build out learning paths, exercise formats, and backend wiring much faster than normal. Supabase was also smoother than I expected. Auth, schemas, and edge functions were all very doable with the right prompts.

Where it broke

Big files were the biggest problem. Once I started feeding it large content files, it would lose the plot, repeat itself, or start hallucinating. Breaking content generation into much smaller lesson batches fixed most of that. It also had a tendency to overcorrect. Sometimes I wanted one small fix and it would try to rewrite an entire page. I got much better results once I started keeping prompts short, specific, and focused on one change at a time.

What workflow worked best

The best workflow for me was: short prompt → test visually → commit if good → move to the next chunk. Once I stopped treating it like magic and started treating it more like very fast pair programming, everything got easier. The more specific and pointed you can be with your prompts, the better. I also ended up using different models for different jobs. Opus was better for writing actual lesson content. Sonnet was better for mechanical edits and formatting.

What I’d tell anyone starting

Don’t try to make one giant prompt do everything. Break the app into small chunks. Keep prompts narrow. Verify visually. Commit constantly. If you do that, Claude Code becomes a lot more useful and a lot less chaotic. The app is called Kiro. It’s basically Duolingo for AI skills, and I built the whole thing solo in about 2 weeks. Happy to answer questions if anyone here is building with Claude Code too. submitted by /u/Kiro_ai
Build Your Own Alex Hormozi Brain Agent (anyone with lots of publicly available content) using a Claude Project
I bought the books. Watched the videos. Still wanted more, especially after he talked about the agent he created. All that material is publicly available. Enough to build my own Alex Hormozi Brain Agent? "Hey Jules, how about it?" Jules is my AI coding assistant (Claude Code). Jules ran off and grabbed transcripts of videos, text of books, guest podcasts, whatever is available online, then turned that into files I uploaded to a Claude Project so I can chat through Claude with Alex Hormozi. Here's what Jules found: 99 long-form YouTube video transcripts, 3 complete audiobook transcripts, 15 guest podcast transcripts, X threads.

What I Did in Four Phases

Phase 1 maps the full source landscape: YouTube channel (4,754 videos), The Game podcast (~900+ episodes), three books, guest podcast appearances, X/Twitter. Figure out what's worth downloading before you start. Phase 2 downloads and converts. Top 100 longest video transcripts, full audiobook transcripts for all three books, 15 guest podcast transcripts from the highest-view-count appearances, and whatever X/Twitter content the API will give you. Phase 3 runs voice pattern analysis. Sentence structure, reasoning skeleton, core frameworks, teaching style, verbal signatures. This is where the persona takes shape. Phase 4 builds the system prompt and optimizes the knowledge base to fit within Claude Projects' limits. Then deploy.

Phase 1: Inventory

The @AlexHormozi YouTube channel has 4,754 videos. That number is misleading. 4,246 of those are Shorts (under 60 seconds or no duration metadata). Filter those out and you have 508 full-length videos. That's the real content library. Beyond YouTube, the main sources worth pursuing: The Game podcast (~900+ episodes). His primary long-form output. The audiobooks for all three books are available free on the podcast and YouTube. Guest podcast appearances. DOAC, Impact Theory, School of Greatness, Modern Wisdom, Danny Miranda.
Hosts push him off-script and into territory he doesn't cover in his own content. High value per byte. X/Twitter threads. Compressed, punchy formulations of his frameworks. Different texture than the long-form material. Skool community. Behind a login wall. Low ROI for this project. Acquisition.com. No blog. Courses are paywalled. Skip.

Phase 2: Collect YouTube Transcripts

The first scrape of the YouTube channel only returned 494 videos. The channel has 4,754. The scraper was pulling from the /videos tab, which doesn't surface the full library. Re-running against the full channel URL (@AlexHormozi) returned everything. Easy to miss, significant difference. After filtering Shorts: 508 full-length videos. I downloaded auto-generated captions for the top 100 longest videos (sorted by duration, so the meatiest content came first). Auto-generated captions from YouTube come as SRT files with timestamps, line numbers, and duplicate lines. Converting those to clean readable text required stripping all the formatting artifacts and deduplicating language variants (English vs English-Original). Result: 99 transcripts. A few livestreams had no captions available.

Audiobook Transcripts

All three Hormozi books have full audiobook uploads on YouTube: $100M Offers (~4.4 hours), $100M Leads (~7 hours), $100M Money Models (~4.3 hours). Same process as the video transcripts. Download the auto-generated captions, convert to clean text. Three files, 855KB total. These are non-negotiable core material for the knowledge base.

Guest Podcast Transcripts

Searched YouTube for Hormozi guest appearances sorted by view count. The top hit was Diary of a CEO at 4.7M views. Grabbed the 15 highest-view-count appearances. The guest transcripts are 2.1MB total. Worth every byte. When a host like Steven Bartlett or Tom Bilyeu pushes back on a claim, Hormozi shifts into a different mode. He's more precise and sometimes reveals the edge cases he glosses over on his own channel.
You can't get that from watching his channel alone.

X/Twitter Content

X's API rate limits capped the collection at 9 unique tweets. Not ideal, but enough to confirm the voice texture: "Aggressive with effort. Relaxed with outcome." His Twitter is his most compressed format. Each tweet is a framework distilled to a single line. 9 tweets is thin. For a more complete build, you'd want to manually curate 50-100 of his best threads. The API limitations made automated collection impractical.

Phase 3: Analyze

I ran voice analysis across the full corpus, looking at seven dimensions. Hormozi's sentences are short, punchy declarations. Fragments for emphasis. "And so" as his default transition. Short bursts, then a longer sentence that lands the point. Nearly every argument follows the same five-step skeleton: bold claim, personal story, framework, math, then a reductio ad absurdum that makes the alternative sound insane. Once you see it, you can't unsee it. The core frameworks are Grand Slam Offer, Value Equation, Supply an
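The SRT cleanup step from Phase 2 is easy to sketch. The `srt_to_text` helper below is a hypothetical reconstruction of that step, assuming standard SRT formatting (sequence numbers, `-->` timestamp lines, and the repeated caption lines typical of auto-generated captions):

```python
import re

# Sketch of the Phase 2 cleanup: strip SRT sequence numbers, timestamps, and
# consecutive duplicate caption lines to get plain readable text.

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def srt_to_text(srt: str) -> str:
    lines, prev = [], None
    for raw in srt.splitlines():
        line = raw.strip()
        if not line or line.isdigit() or TIMESTAMP.fullmatch(line):
            continue  # drop blanks, sequence numbers, timestamps
        if line != prev:  # auto-captions often repeat the previous line
            lines.append(line)
        prev = line
    return " ".join(lines)

sample = """1
00:00:00,000 --> 00:00:02,000
so here's the thing

2
00:00:02,000 --> 00:00:04,000
so here's the thing
most people quit too early
"""
print(srt_to_text(sample))  # so here's the thing most people quit too early
```

Deduplicating language variants (English vs English-Original) would be a separate pass over filenames, not caption lines.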
I'm a retail worker in Taiwan who built a 65-subsystem AI operating system with Claude Code in 3 months — here's the honest story (including the part where I've made $0)
I work at a chain retail store in Taiwan. No CS degree. No engineering background. I've had 6-7 jobs, all entry-level service work. In December 2025 I wanted out. My idea: build an AI system that generates income, then routes it into an automated investment engine — a self-reinforcing growth loop where AI runs both sides. 3 months and ~177,000 lines of code later, here's what exists. All built with Claude Code as my primary tool.

---

**What I built (4 repos, all open for browsing)**

**CAIOS** — "Central AI Operating System." 65 subsystems, 657 Python files, 154,740 lines of code, 46 database tables, 2,792 tests. Runs 30 scheduled jobs on a single GCP VM — morning briefs at 08:00, anomaly patrols every 30 min, daily reports at 20:00, memory sync at 23:00. All delivered through a Telegram bot.

**creatoraitools.tools** — a Next.js 15 / React 19 web platform. 233 files, 21,395 lines of TypeScript, 20 pages, 30 API routes. Free to use, no login required for the tools. You can browse it right now.

**joseph** — a Taiwan stock trading engine. Scans, scores, simulates, reports. Running in dry-run mode every weekday at 08:00. Live trading is permanently locked in source code (not config — more on this below).

**buildhub-patrol** — a watchdog. Playwright e2e tests nightly at 03:00, health patrols every 6 hours.

---

**How Claude Code was involved**

Everything. I cannot write code — not one line from memory. My workflow:

1. I describe what I want in natural language
2. Claude Code writes the implementation
3. I test and verify the result
4. Iterate

When I started in December 2025, I was copy-pasting chatbot output into Python files without understanding any of it. Then I found Windsurf, which helped but felt limited. Claude Code was the turning point — it plans, writes, debugs, tests, and explains in a way I can actually follow and direct. It's the difference between "AI writes code for me" and "AI is my engineering partner." I use Claude Code via the CLI with a Max subscription. Vertex AI / Gemini is my fallback. The entire CAIOS memory system is built on top of Claude Code's auto-memory feature — every session reads and writes to a persistent MEMORY.md so Claude already knows the full project context when I start a new conversation.

---

**The hard lessons (real entries from my project memory)**

**1. A watchdog that flaps is worse than no watchdog.** My web console's watchdog started flapping — restarting itself in a loop. I deliberately killed both the console and its watchdog, then wrote the re-enable steps into the memory file. System has been stable since. Lesson: ship the off switch before the feature.

**2. When an autonomous loop produces garbage, stop it first.** My ADO (Autonomous Development OS) backlog ingestion twice exploded — the loop kept ingesting markdown fragments as new work items. Fix: stop → fix root cause → restart. Not "patch while running." I have 987 cancelled work packages in the database as a reminder.

**3. Irreversible actions get compile-time blocks, not config flags.** Joseph's live trading is hard-coded `False` in the adapter — not a config toggle. Why? Because `bool(settings.allow_push)` under MagicMock silently evaluates truthy and bypasses the safety check. The fix — `if settings.allow_push is True` — is two extra characters that prevent an entire class of test-only false negatives. For anything you can't undo (real money, force pushes, database wipes), the guard belongs in source code.

---

**What actually worked, ranked by impact**

**Build the operations layer first.** I wired everything to Telegram on day one. Once I didn't need SSH to check on things, my throughput jumped 10x. The interface to all 65 subsystems is one chat thread.

**Memory system on day one.** A persistent, structured memory file means Claude doesn't start from zero every session. The compounding is enormous. If you take one thing from this post: set up memory before you build features.

**Schedule everything.** 30 jobs run on a clock. Morning briefs, anomaly patrols, daily reports, memory sync — all happen while I sleep. Cron is the most underrated framework in the world.

**Off switches before features. Approval gates before automation.** Every CAIOS action has a risk classification. Risky actions stop at an approval gate and wait for me to tap a Telegram button. Safe actions run and notify me after.

**2,792 tests are how I sleep at night.** Many are AI-generated, but I read every one. When you run autonomous loops, tests are the only thing between "the system fixed itself" and "the system silently destroyed itself."

---

**The honest part**

I have not made a single dollar from any of this. The trading engine works but I don't have capital to run it live. The web platform has almost zero organic traffic (2,910 impressions, 10 clicks in 28 days). Most "I built X with AI" posts skip this part. I'm not skipping it. What it has proven is that the gap between "I have an idea" and "I have
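The MagicMock pitfall behind lesson 3 can be reproduced in a few lines with the standard library:

```python
from unittest.mock import MagicMock

# Demonstrates the test-safety pitfall from the post: any attribute of a
# MagicMock is itself a MagicMock, which is truthy. So a `bool(...)` guard
# silently passes under mocked settings, while `is True` correctly fails,
# because the attribute is a MagicMock object, not the literal True.

settings = MagicMock()

unsafe_guard = bool(settings.allow_push)   # True under MagicMock!
safe_guard = settings.allow_push is True   # False under MagicMock

print(unsafe_guard, safe_guard)  # True False
```

This is why the post argues that guards on irreversible actions should compare against the literal `True` rather than rely on truthiness.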
I built a pipeline on Claude Code where safety is enforced at the tool-call level and the system gets smarter after every task
Hey, I've been using Claude Code for months and kept thinking about how to make it more reliable for production use. The result is PocketTeam. I want to highlight what makes it actually different — not just "more agents."

Hook-based safety (not prompt-based)

The most technically interesting piece. 9 safety layers intercept tool calls before they execute. Writes to `.env`, `rm -rf /`, `DROP DATABASE` — blocked at the runtime layer. They can't be bypassed by prompt injection or context compaction. The rules are in the hook code, not the LLM's context.

The system learns after every task

An Observer agent runs at the end of every completed task. It analyzes what happened, what worked, what didn't, and writes structured learnings to agent-specific files. Those files are injected into future agent contexts. Run it enough times and the agents start avoiding the mistakes they made two weeks ago.

Self-healing via GitHub Actions

When a build fails, a GitHub Actions workflow triggers a Claude Code session automatically. It creates a fix plan and pings you via Telegram for approval. You never have to be at your desk to handle a broken build.

59 built-in skills

Not just agent prompts. A library of structured procedures: OWASP audit workflows, TDD loops (`ralph:` mode), codebase mapping, database diagnostics, cost tracking, fan-out parallel execution with worktree isolation, and more.

Magic keywords

autopilot: add dark mode → full pipeline, zero human gates
ralph: fix failing tests → TDD loop until green (max 5 iterations)
quick: rename this var → straight to implementation
deep-dive: our DB schema → 3 parallel research agents

Built-in headless browser (ptbrowse)

Your AI team doesn't just run unit tests — it opens your app in a real Chromium browser and verifies the UI works. Instead of screenshots (thousands of tokens), ptbrowse uses Accessibility Tree snapshots: ~100–300 tokens per call. The QA agent navigates pages, clicks elements, fills forms, waits for conditions, runs assertions — all in a real browser. Zero setup. Daemon auto-starts and auto-stops. Screenshots are also available, saved to `.pocketteam/screenshots/`.

Live dashboard

`pt dashboard` — real-time agent activity in Docker. See what's running, what passed, what failed.

Telegram remote control

Native Claude Code channel plugin. Approve plans from your phone. Not a notification system — actual two-way control.

Free with a Claude Code subscription. MIT License. pipx install pocketteam

https://github.com/Farid046/PocketTeam (appreciate testing and maybe a star <3)

Would love feedback from this community — you know Claude Code deeper than anyone. submitted by /u/Legal_Location1699
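A minimal sketch of the hook idea: a pre-execution callback pattern-matches tool arguments against deny rules that live in code, outside the model's context. The patterns and function names here are illustrative, not PocketTeam's actual nine layers:

```python
import re

# Sketch of hook-based (runtime-level) safety: every tool call is checked
# against deny patterns before execution. Because the rules live in code,
# prompt injection or context compaction cannot talk the model past them.
# The rules below are illustrative, not PocketTeam's actual rule set.

DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),                 # recursive delete from /
    re.compile(r"\bDROP\s+DATABASE\b", re.IGNORECASE),
    re.compile(r"(^|/)\.env$"),                    # writes targeting .env files
]

def pre_tool_hook(tool_name: str, argument: str) -> bool:
    """Return True if the tool call may proceed, False if blocked."""
    return not any(pattern.search(argument) for pattern in DENY_PATTERNS)

assert not pre_tool_hook("bash", "rm -rf / --no-preserve-root")
assert not pre_tool_hook("bash", "psql -c 'DROP DATABASE prod'")
assert not pre_tool_hook("write_file", "project/.env")
assert pre_tool_hook("bash", "ls -la")
```

A real implementation would also classify the tool name, allow-list safe paths, and log blocked calls, but the core property is the same: the guard runs before the call, not inside the prompt.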
94.42% on BANKING77 Official Test Split — New Strong 2nd Place with Lightweight Embedding + Rerank (no 7B LLM)
94.42% Accuracy on Banking77 Official Test Split

BANKING77 is deceptively hard: 77 fine-grained banking intents, noisy real-world queries, and significant class overlap. I’m excited to share that I just hit 94.42% accuracy on the official PolyAI test split using a pure lightweight embedding + example reranking system built inside the Seed AutoArch framework.

Key numbers:

Official test accuracy: 94.42%
Macro-F1: 0.9441
Inference: ~225 ms / ~68 MiB
Improvement: +0.59pp over the widely-cited 93.83% baseline

This puts the result in clear 2nd place on the public leaderboard, only 0.52pp behind the current absolute SOTA (94.94%). No large language models, no 7B+ parameter monsters, just efficient embedding + rerank magic. Results and demo coming very soon on an HF Space. Happy to answer questions about the high-level approach. #BANKING77 #IntentClassification #EfficientAI #SLM submitted by /u/califalcon
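The post doesn't show its architecture, but the general embed-then-rerank pattern it names looks roughly like this toy sketch with hand-made two-dimensional "embeddings" (real systems use learned encoders, dozens of candidates, and a trained reranker):

```python
import math

# Toy illustration of the embed-then-rerank pattern (not the author's system):
# stage 1 shortlists intents by centroid similarity, stage 2 reranks the
# shortlist against labeled examples. Vectors here are hand-made stand-ins.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: one centroid and some labeled examples per intent.
centroids = {"card_lost": [1.0, 0.1], "card_blocked": [0.9, 0.3]}
examples = {
    "card_lost": [[0.98, 0.05]],
    "card_blocked": [[0.85, 0.35]],
}

def classify(query_vec, top_k=2):
    # Stage 1: cheap shortlist by centroid similarity.
    shortlist = sorted(centroids,
                       key=lambda c: cosine(query_vec, centroids[c]),
                       reverse=True)[:top_k]
    # Stage 2: rerank shortlist by best similarity to a labeled example.
    return max(shortlist,
               key=lambda c: max(cosine(query_vec, e) for e in examples[c]))

print(classify([0.95, 0.08]))  # card_lost
```

The appeal of this family of approaches is exactly what the post claims: inference is a few vector comparisons, so latency and memory stay tiny compared to a 7B+ model.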
Claude Projects tweak: Your own Subject Matter Expert with 'Manual Memory!' (no tools needed)
Wouldn't it be neat if you could get Claude to remember facts about specific things, not mix them up with notes about your parrot's minute-specific feeding schedule, and only pull them out when you ask? And what if you could get all that without leaving the Claude app? The process isn't new -- the only "innovation" here is the Claude Projects 'versioning instructions' below. This setup has been incredibly useful for the wife and me.

Set up your expert

1. Create a Project. Name it whatever. Description doesn't matter.
2. Start the topic off. Tell it facts about yourself, your project, ask it about best practices on a topic -- whatever. The point: give it facts you already know to start. Bonus points: include things you suspect but aren't sure about. Having uncertainties documented too gets you better answers down the line.
3. Checkpoint facts. Tell Claude to "write a .md AI guide for what we've discussed." You're asking for a markdown file — its favorite format for instructions. This is the 'memory' as your Claude project will know it.
4. Confirm your source of truth is accurate. Read it over, suggest corrections. This file will be a reference point for future sessions — and having a checked source of truth helps Claude be more skeptical about what it accepts as "facts" vs. "random online hearsay" in future research sessions.
5. Save it. Click the file it gives you, hit "Add to Project."
6. Magic. New sessions you start in that Project remember the important details — without them getting mixed up with your pet facts or whatever else is floating around elsewhere.

To update, tell Claude to "update my docs, confirm no contradictions or duplications need attention" (I only add that last bit occasionally for sweep-up). When you want it to learn something new, have it make/update a .md and save it to the Project. You can even download it, edit it by hand, and re-upload it if you want.
Tip: switching to new sessions regularly (in the same project) with hint files is "better on context" than one huge long chat, AND tends to give better answers than long chats where Claude "forgets" the details unless reminded (for "AI attention" reasons) after a few turns.

Multi-Expert work: you can move a chat into one project to ask a question that needs that data, then move the chat to a different "expert" project for its input. Katamari that knowledge shit: tell Claude to roll up that combined expert chat into an AI hints file/update.

The version problem -- and the workaround

Claude can't update Project files directly (read-only), so every "update this summary" request generates a new file with the same name. No way to tell which is newer. Fix: go to Settings → General Instructions and add this anywhere (bottom works fine):

If a project is attached with .md files, treat them as a versioned read-only memory system. Before creating or updating any project .md, check /mnt/project/ for the current highest version. Increment by 1 for updates (_v1 → _v2), append _v1 if none exists. New files start at _v1. Only bump version once per "save to project" cycle.

Updated files come out as notes_v2.md, notes_v3.md, etc. Delete the old version from the Project, add the new one. Done. This works in the standard Claude WebUI/app. No special tools or extensions required.

Bonus: turn off auto-memory

Now you can disable it without Claude getting dumber. In fact it'll get much smarter about how topic-facts get deployed if you direct questions and research to specific Project topics. This is how you build an "Old-Time Sewing Expert" project that actually accounts for 13th century folding techniques -- or whatever specific-ass stuff you need. Just keep a file for it. No more: "This engineering project is just like your cat Fluffy and that time you asked me about wooden nickels!" Cannot stand that stuff personally.
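The version-bump rule in those instructions amounts to "find the highest `_vN` for a base name, add one." A sketch of that logic (a hypothetical helper for illustration, not anything Claude itself runs):

```python
import re

# Sketch of the versioning rule from the instructions above: scan existing
# project files for the highest _vN suffix on a base name, emit the next one.

VER = re.compile(r"^(?P<base>.+?)_v(?P<n>\d+)\.md$")

def next_version(base: str, existing: list[str]) -> str:
    """Return the next versioned filename for `base` given project files."""
    highest = 0
    for name in existing:
        m = VER.match(name)
        if m and m.group("base") == base:
            highest = max(highest, int(m.group("n")))
    return f"{base}_v{highest + 1}.md"

print(next_version("notes", ["notes_v1.md", "notes_v2.md", "other_v5.md"]))
# notes_v3.md
print(next_version("notes", []))  # notes_v1.md
```

Because the rule is purely name-based, deleting the stale copy after each "save to project" cycle (as described above) is what keeps the Project from accumulating duplicates.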
What this looks like in practice

Here's the layout for my personal profile project -- built kind of by accident, because I was just asking materials questions for a collage project and things snowballed. These files are not locked into Claude. I can take them anywhere, use them with any agent, or print them out or whatever. I'd occasionally ask Claude to "clean things up" — decide if files needed to be split or joined based on topics that had emerged. Don't overthink the structure, it's just an example:

values-and-worldview_v6.md # how I think, ethics, decision-making patterns
personal-identity_v2.md # identity, relationship structure, biographical stuff
career-work.md # professional background, skills, work history
neurology-and-hobbies_v2.md # ADHD profile, how I learn, hobby patterns
artistic_practice_catalog_v4.md # writing projects, creative methods, active work
Artistic-TODOs_v2.md # technical roadmap for an ongoing writing pipeline
fiction_and_film_v4.md # what I look for in stories, aesthetic preferences
media_observations_v2.md # patterns Claude noticed across things I've rated m
Built a tool for reviewing and handing off markdown docs to Claude
As a product manager, I never write specs or stories anymore. Claude generates, I review and provide feedback, Claude updates, then I hand over to devs (human or agents) to implement. But the feedback loop is clunky. It's difficult to read raw markdown files, annotate, and iterate, especially in the CLI. md-redline lets you open a markdown file in a GUI, leave inline comments, and hand back off to Claude for updates. The comments are stored as HTML markers directly in the .md file. They're invisible in GitHub and VS Code preview but Claude can read them with a plain file read.

The workflow:

1. Claude generates a markdown doc from your prompt (e.g. "write a feature spec for magic link authentication")
2. Open the file with mdr /path/to/spec.md
3. Review and leave inline comments (e.g. "out of scope", "what does this mean?")
4. Click the hand-off button, which copies instructions you paste into Claude
5. Claude addresses your comments and updates the doc
6. Review the diffs in md-redline

Runs locally. No account, no cloud, no database. Works with Claude Code or any agent that reads files. Feedback welcome! https://github.com/dejuknow/md-redline (open source, MIT license) submitted by /u/cleverquokka
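The invisible-marker trick works because HTML comments are legal in Markdown but hidden by most renderers. A sketch with a hypothetical marker format (md-redline's actual format isn't documented in the post):

```python
import re

# Inline review comments stored as HTML comments: invisible in rendered
# Markdown previews, but readable by any agent doing a plain file read.
# The "<!-- review: ... -->" format here is hypothetical, not md-redline's.

def annotate(md: str, anchor: str, comment: str) -> str:
    """Attach a review comment right after the first occurrence of `anchor`."""
    marker = f"<!-- review: {comment} -->"
    return md.replace(anchor, f"{anchor} {marker}", 1)

def extract_comments(md: str) -> list[str]:
    """Collect all review comments from a Markdown document."""
    return re.findall(r"<!-- review: (.*?) -->", md)

doc = "## Auth\nUsers sign in with a magic link."
doc = annotate(doc, "magic link", "out of scope for v1")
print(extract_comments(doc))  # ['out of scope for v1']
```

Since the markers are plain text inside the file, no plugin is needed on the agent side: a normal file read surfaces them alongside the spec they annotate.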
Building on Claude taught us that growth needs an execution engine, not just a smarter chat UI.
Vibecoding changed the front half of company building. A founder can now sit with Claude, Cursor, Replit, or Bolt, describe a product in plain English, iterate in natural language, and get to a working app in days instead of months. That shift is real, and it is why so many more products exist now than even a year ago.

But the moment the product works, the shape of the problem changes. Now the founder needs market research, positioning, lead generation, outreach, content, follow-up, and some way to keep all of it connected across time. That work does not happen inside one codebase. It happens across research workflows, browser actions, enrichment, CRM updates, email, publishing, and ongoing decision-making. That is where we felt the gap. Vibecoding has a clean execution loop. Growth does not.

That is why we built Ultron the way we did. We did not want another wrapper where a user types into a chat box, a model sees a giant prompt plus an oversized tool list, and then improvises one long response. That pattern can look impressive in demos, but it starts breaking as soon as the task becomes multi-step, cross-functional, or dependent on what happened earlier in the week. We wanted something closer to a runtime for company execution.

Under the hood, Ultron is structured as a five-layer system. The first layer is the interaction layer: the chat interface, real-time streaming, tool activity, and inline rendering of outputs. The second layer is orchestration, where session state, transcript persistence, permissions, cost tracking, and file history are handled. The third layer is the core execution loop. This is the part that matters most. The system compresses context when needed, calls the model, collects tool calls, executes what can run in parallel, feeds results back into the loop, and keeps going until the task is actually finished. The fourth layer is the tool layer, where the system gets its execution surface: built-in tools, MCP servers, external integrations, browser actions, CRM operations, enrichment, email, document generation. The fifth layer is model access and routing.

That architecture matters because growth work is not one thing. A founder does not actually want an answer to a prompt like "help me grow this product." What they really want is something much more operational. Research the category. Map the competitors. Find the right companies. Pull the right people. Enrich and verify contacts. Score them against the ICP. Draft outreach. Create follow-ups. Generate content from the same positioning. Keep track of the state so the work continues instead of resetting. That is not a chatbot interaction. That is execution.

So instead of one general assistant pretending to be good at everything, Ultron runs five specialists. Cortex handles research and intelligence. Specter handles lead generation. Striker handles sales execution. Pulse handles content and brand. Sentinel handles infrastructure, reliability, and self-improvement.

The important part is not just that they exist. It is how they work together. If Specter finds a strong-fit lead, it should not stop at surfacing a nice row in a table. It should enrich the lead, verify the contact, save the record, and create the next unit of work for Striker. Then Striker should pick that work up with the research context already attached, draft outreach that reflects the positioning, start the follow-up logic, and update the state when a reply comes in.

That handoff model was a big part of the product design. We kept finding that most AI tools are still built around the assumption that one request should produce one answer. But growth work does not behave like that. It behaves more like a queue of connected operations where different kinds of intelligence need different tool access and different execution patterns.

Parallel execution became a huge part of this too. A lot of business tasks are only partially sequential. Some things do depend on previous steps, but a lot of work does not. If you are researching a category, scraping pages, pulling firmographic data, enriching leads, and checking external sources, there is no reason to force all of that into one slow serial chain. So we built Ultron so independent work can run concurrently. The product is designed to execute a large number of tasks in parallel, and within each task the relevant tool calls can run at the same time instead of waiting on each other unnecessarily. That alone changes the feel of the system. Instead of watching one model think linearly through everything, the user is effectively working with an execution environment where research, lead ops, sales actions, and content prep can all move at the pace they naturally should.

The other thing we cared about was skills. Not vague agent personas. Not magic prompts hidden behind branding. Actual reusable execution patterns. That mattered to us because a serious system should no
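The core loop described above (call the model, fan out the independent tool calls, feed results back, repeat until the task is done) can be sketched roughly like this. This is a minimal illustration, not Ultron's actual internals; `call_model` and `run_tool` are hypothetical stand-ins for real model and tool adapters.

```python
import asyncio

async def call_model(messages):
    # Stand-in: a real implementation would call an LLM API here and
    # return the assistant text plus any requested tool calls.
    return {"content": "done", "tool_calls": []}

async def run_tool(call):
    # Stand-in: dispatch to a real tool (browser, CRM, enrichment, ...).
    return {"tool": call["name"], "result": "ok"}

async def execution_loop(task, max_turns=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = await call_model(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply["tool_calls"]:
            return messages  # no more tool work: the task is finished
        # Independent tool calls run concurrently instead of serially.
        results = await asyncio.gather(
            *(run_tool(c) for c in reply["tool_calls"])
        )
        for r in results:
            messages.append({"role": "tool", "content": str(r)})
    return messages

messages = asyncio.run(execution_loop("research the category"))
print(len(messages))
```

The `asyncio.gather` call is the point: within one turn, tool calls that do not depend on each other finish in the time of the slowest one, not the sum of all of them.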
The Ghost House Effect: Why Claude Code feels like magic for 2 weeks and then ruins your life.
I spent my morning acting like a digital coroner. I ran a deep audit on dev rants across Reddit and G2, and honestly, my brain is fried. We all talk about hallucinations, but that's just the surface. The real horror is what I'm seeing in the data right now: I call it the Ghost House.

The pattern is terrifyingly consistent. You get 10x speed for the first 2-3 weeks. It feels like you're a god. Then you hit the tipping point. The interest on your LLM technical debt starts compounding faster than you can refactor. You aren't coding anymore, you're just spending 8 hours a day begging the agent not to break what it built yesterday.

I found 5 specific failure modes that are killing MVPs right now:

- Shadow Dependencies. Claude imports a library that isn't in your package.json. It works in your local cache, but explodes the second you hit CI/CD. Founders call this "AI ghost deps."
- Context Window Paralysis. Once the repo gets big, the agent starts summarizing. It fixes a UI bug but accidentally nukes a database migration script because it lost the big picture.
- The Fear of Editing. I found dozens of stories where founders literally stopped touching their own code. The architecture is so brittle that one manual edit cascades into total failure. The mental model lives in the agent, not the human.
- Hallucinated APIs. The AI invents internal endpoints or security libs that don't exist. It looks perfect in the sandbox, but you get a 404 in production. Hours wasted on a phantom.
- Architecture Drift. Vibe coding leads to undocumented prompt-spaghetti. By month two, you have a repo where no human dev can be onboarded without a total rewrite.

The last straw for most of these founders is always the same: "We had to nuke it and rebuild from scratch." Am I the only one seeing this paralysis threshold hitting earlier and earlier? At what point did you realize your AI-built app was becoming a Ghost House you couldn't live in anymore? submitted by /u/AddressEven8485
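Of the five failure modes, "shadow dependencies" is the most mechanically checkable. A rough sketch (the package names and file contents below are illustrative assumptions, not any real project) of comparing what the source imports against what package.json declares:

```python
import json
import re

# Flag "ghost deps": packages imported in JS/TS source files but never
# declared in package.json. Relative imports (./ or /) are skipped.
IMPORT_RE = re.compile(r"""(?:require\(|from\s+)['"]([^./'"][^'"]*)['"]""")

def ghost_deps(src_files, package_json_text):
    pkg = json.loads(package_json_text)
    declared = set()
    for key in ("dependencies", "devDependencies"):
        declared |= set(pkg.get(key, {}))
    imported = set()
    for text in src_files:
        for name in IMPORT_RE.findall(text):
            # Keep "@scope/name" for scoped packages, else first segment.
            parts = name.split("/")
            imported.add("/".join(parts[:2]) if name.startswith("@") else parts[0])
    return sorted(imported - declared)

pkg = '{"dependencies": {"express": "^4.18.0"}}'
src = ['const express = require("express");\nimport _ from "lodash";']
print(ghost_deps(src, pkg))  # lodash is imported but never declared
```

Running something like this in CI would catch the "works locally, explodes in CI/CD" case before it ships.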
I shipped a game after 20 years using Claude Chat (free): here's what it actually did (and didn't)
20 years ago, I wrote games in C (Allegro game engine). Then life happened, and I stopped. Recently, my nephew pitched an idea. I thought: why not try again, and this time I brought Claude Chat (Free) along for the ride. Yes, Free. Not Pro, Max, or Claude Code. Just the free version of Claude Chat. Somehow, it worked even when I was bumping up against limits. For a side project, about an hour of work per day for thirty days, that was barely a hiccup.

What we built together:
- A single HTML5/CSS/JS page, modern space shooter game
- Procedural sounds for every little zap and pew
- Enemy AI, upgrade systems, and escalating waves (a full game, not just a proof of concept)
- Performance that honestly feels like a small Unity project

Claude Chat's superpowers:
- Fast, tireless coding assistance
- Solving tricky functionalities I'd have pulled my hair out on
- Rewriting messy logic loops without ever complaining
- Auditing and solving performance-related issues

Claude Chat's kryptonite:
- Game design intuition (it can't decide what's fun)
- Balancing combat or upgrades
- Understanding why that one cursor animation loop is secretly eating your frame budget (and other similar issues)

You can ship a modern-feeling browser game entirely using Claude Chat Free, but the real magic (the gameplay, the fun, the soul) still comes from you. Don't take my word for it, judge for yourself: https://pajujo.itch.io/swarm-sector

I'm curious, has anyone else tried building something substantial with Claude Chat Free? submitted by /u/pajujo-dev
If you're celebrating the harness cutoff because "less queuing/more speed for me," you're missing the bigger picture.
Third-party harness users consume disproportionate resources, and if they leave, your sessions get faster. I understand the appeal. But remember back when your ISP sold you 500 Mbps, and when you complained about slow speeds they told you it was ACTUALLY "up to" 500 Mbps? And really the problem was that your neighbors were using too much bandwidth, and probably doing something illegal with it. I don't think most of us signed up for Claude thinking we were signing up for Comcast/Xfinity, but that's exactly how they're behaving.

Anthropic either has the capacity to provide what subscribers pay for, or they don't. That's on them, not on the users who found more productive ways to use the product. The agentic users building on third-party harnesses aren't abusing the system. They're ahead of the curve. Everything they're doing today (multi-agent workflows, autonomous coding pipelines, custom orchestration) is what Anthropic will eventually ship inside their own walled garden and charge you a premium for. The trailblazers are making the path that Claude Code will follow. Pushing them out doesn't protect your bandwidth. It just slows down the ecosystem and Claude.

In the last week Anthropic has leaked 512K lines of source code to npm, now permanently available to OpenAI, every competitor, and yes, China. Security researchers found critical vulnerabilities in the leaked source within days. Their response to paying subscribers was silence about the security incident (unless you're reading AI news) and a restriction on how we use the product. They handed a massive competitive intelligence gift to the very companies they need to outrun before an IPO. The harness users aren't the problem. The users celebrating their departure aren't the winners. Anthropic's handling of this entire period has been epically bad, and that affects everyone on the platform, whether you use open source GitHub harnesses or not.

And let's be honest about what people actually love about Claude. It's the voice. The way it feels different when you talk to it: more human, more thoughtful. Some of that comes from the model itself, and some of it comes from the careful system of guardrails, permissions, and behavioral tuning that shapes how it communicates. Fair enough. But now that the source code is in the wild, that magic probably isn't exclusive anymore. Every competitor now has the blueprint for how Anthropic shapes Claude. A huge portion of that secret sauce is on its way into every other product.

So what are you left with inside the walled garden? A harness that Anthropic controls, that you can't customize as much, that can't do what third-party harnesses are already doing. And remember, tools like OpenClaw have only been in serious development since November. They're already leaping ahead of what Claude Code offers in terms of memory, customization, and multi-agent workflows. Claude is playing catch-up to its own ecosystem, and now they've given away the source code to help competitors close the gap even faster.

This reminds me of when Elon Musk signed onto and promoted that open letter calling for a 6-month pause on AI development for "safety" while Grok was scrambling to catch up. Restricting third-party harnesses in the name of efficiency, right after leaking your own source code, has the same energy. It's not about protecting users. It feels like it's about buying time. And the effect will be sending a bunch of smart people to use Codex and others.

I want Anthropic to succeed. The model is genuinely excellent. But right now they're relying on subscribers who don't push the technology to keep the lights on. They need to innovate far enough past their own leaked source code to make it irrelevant, somehow make the product even better, and do all of that before an IPO. Sheesh. Tough road ahead. submitted by /u/kanigget
Stuck in a Support Loop: Does Anthropic actually have human support?
Hey everyone, I'm reaching out because I'm losing my mind with Claude's support system. I've been trying to get help with an issue for a while now, but every time I email them, I get a bot response with generic instructions. I reply stating that I've already tried those steps and specifically ask to speak with a human. The very next email I get is: "Thank you, we have resolved your ticket."

I've tried this 5-6 times now with the exact same result. It's like the system is programmed to just close tickets regardless of the outcome. Has anyone actually managed to reach a human at Anthropic? Is there a specific "magic word" or a different contact method I should be using? Am I missing something, or is their support 100% automated right now? Any advice would be appreciated! submitted by /u/Professional-Row-781
The Magic of Machine Learning That Powers Enemy AI in Arc Raiders
"... it doesn't take a trained eye to see that, even at a glance, the enemies in Arc Raiders feel fundamentally different from traditional game AI. They don't follow rigid patterns or scripted behaviors, but instead, they react dynamically to the environment, recover from disruption, and occasionally end up in places even the developers didn't anticipate. That sense of unpredictability is not just a design choice but the result of years of research into robotics, physics simulation, and machine learning. At Embark Studios, the team approached enemy design from a systems-first perspective, treating enemies less like animated characters and more like physical entities that must navigate and survive in a dynamic world. That decision led them directly into robotics research and reinforcement learning, borrowing techniques for controlling real-world machines and adapting them to a game environment. Rather than relying purely on traditional AI systems, Arc Raiders blends learned locomotion with behavior trees, creating a layered approach where movement itself becomes part of the intelligence." submitted by /u/jferments
Scaled my Haiku→Sonnet pipeline to 2,000+ items. Three things that broke.
A couple weeks ago I posted about using Haiku as a gatekeeper before Sonnet to cut API costs by ~80%. A lot of people had questions about how it holds up at scale, so here's the update.

Quick context: I run a platform called PainSignal (painsignal.net, free to use) that ingests real comments from workers and business owners, filters out noise, and classifies what's left into structured app ideas with industries, categories, severity scores, and revenue models. When I posted last time I had about 60 problems classified. Now I'm at 2,164 across 92 industries. Here's what changed as the data grew.

1. The taxonomy got weird. I let Sonnet create industries and categories dynamically instead of using a predefined list. At 60 items this felt magical. At 2,000+ it started creating near-duplicates and edge cases: "Auto Repair" and "Automotive Electronics" as separate industries, and "Shop Management Software" showing up as a category, which is a solution, not a problem type. I even ended up with a "null" industry containing 16 problems that slipped through with no classification at all. The fix isn't to switch to a static list. The dynamic approach still surfaces categories I never would have thought of. Instead I'm building a normalization layer that runs periodically to merge duplicates and catch misclassifications. Think of it like a cleanup crew that runs after the creative work is done.

2. Sonnet hedges too much at scale. When you're generating a handful of app concepts, Sonnet's cautious language is fine. When you're generating over a thousand, you start to notice patterns. Every market size estimate gets a "potentially" or "could be." Every risk rating lands in the middle. The outputs start feeling like they were written by a consultant who bills by the hour. I've been reworking prompts to force sharper calls: explicit instructions to commit to a rating, pick a number, name the risk directly. I also started injecting web search results before the analysis step so Sonnet has real competitive data to anchor against instead of generating everything from its training data alone. The difference in output quality is noticeable.

3. Haiku needed a bouncer. The original pipeline sent everything to Haiku first. But a surprising amount of input is obviously not a real complaint: single emoji reactions, "great video," bare URLs, strings under 15 characters. Haiku handles these fine, but it's still a fraction of a cent per call, and those fractions add up at volume. I added a regex pre-filter that catches the obvious junk before anything hits the API: emoji-only messages, single words, URLs without context, extremely short strings. Estimated savings: another 20-30% off the Haiku bill. Maybe 50 lines of code, and it runs in microseconds.

So the full pipeline now looks like: regex filter → Haiku gate → Sonnet extraction. Three layers, each one cheaper and faster than the next, each one catching a different type of noise. Still running on BullMQ with Redis for queue management and PostgreSQL with pgvector for storage. Still building the whole thing with Claude Code, which continues to be underrated for iterative backend work.

Happy to dig into any of these if people have questions. The prompt engineering piece especially has been a rabbit hole worth going down. submitted by /u/gzoomedia
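For illustration, a regex pre-filter of the kind the post describes might look something like this. The patterns, the 15-character threshold, and the sample comments are assumptions based on the post, not PainSignal's actual code:

```python
import re

# Catch obvious junk before any API call: bare URLs, emoji-only
# reactions, single words, and very short strings.
URL_ONLY = re.compile(r"^\s*https?://\S+\s*$")
EMOJI_ONLY = re.compile(r"^[\U0001F300-\U0001FAFF\u2600-\u27BF\s]+$")

def is_junk(text: str) -> bool:
    text = text.strip()
    if len(text) < 15:            # extremely short strings
        return True
    if URL_ONLY.match(text):      # bare URL with no context
        return True
    if EMOJI_ONLY.match(text):    # emoji-only reactions
        return True
    if len(text.split()) == 1:    # single words
        return True
    return False

comments = [
    "👍",
    "great video",
    "https://example.com",
    "The scheduling software we use double-books techs constantly.",
]
kept = [c for c in comments if not is_junk(c)]
print(kept)
```

Only the last comment survives to be sent downstream; the other three never cost an API call. A filter like this runs in microseconds per string, which is why it pays for itself immediately at volume.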
Honest take on Claude Cowork's limitations: useful but not magic (yet)
Okay so I've been messing around with Claude Cowork and it's genuinely useful for everyday stuff: renaming files, drafting things, basic task automation. No complaints there. But once you try to push it a little, you notice the walls pretty fast. It doesn't really remember anything between sessions, so if you're doing recurring stuff you're basically re-explaining yourself every time. Chaining more than a few steps together also gets shaky; it's not like writing a script where you have full control. And if you use any niche apps, don't count on it playing nice with them. It's mostly Excel, PowerPoint, and the basics.

Honestly it feels like it's built for people who'd otherwise do everything manually, which is fine! But if you're expecting something that just handles complex workflows end to end, it's not quite there yet. Still using it though. Some days it saves me a solid 30 mins. Just know what you're getting into. submitted by /u/Longjumping_Bid_9870
View originalMagic uses a tiered pricing model. Visit their website for current pricing details.
Key features include: 100M Token Context Windows, AGI Readiness Policy.
Based on user reviews and social mentions, the most common pain points are: cost tracking, API costs, raised, AI agent.
Based on 35 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Matt Shumer
CEO at HyperWrite / OthersideAI
2 mentions