Although there are no direct reviews or mentions of "WhyLabs" found in the provided data, the conversation around AI tools indicates a focus on the dominance of major AI models, concerns about the unavailability of powerful models to the public, and discussions on AI's evolving role. These discussions highlight a competitive landscape and might imply challenges for smaller AI-focused startups like WhyLabs to gain traction. Pricing sentiment and detailed strengths of WhyLabs are not discernible from the data, and its overall reputation remains unclear without user-specific mentions.
Mentions (30d)
27
2 this week
Reviews
0
Platforms
2
GitHub Stars
2,804
134 forks
Although there are no direct reviews or mentions of "WhyLabs" found in the provided data, the conversation around AI tools indicates a focus on the dominance of major AI models, concerns about the unavailability of powerful models to the public, and discussions on AI's evolving role. These discussions highlight a competitive landscape and might imply challenges for smaller AI-focused startups like WhyLabs to gain traction. Pricing sentiment and detailed strengths of WhyLabs are not discernible from the data, and its overall reputation remains unclear without user-specific mentions.
Features
Use Cases
Industry
information technology & services
Employees
54
Funding Stage
Merger / Acquisition
Total Funding
$14.0M
184
GitHub followers
40
GitHub repos
2,804
GitHub stars
2
npm packages
AI Doesn't Exist, and Poop Proves It
robot Maybe we should have called it accumulated intelligence. There is no artificial intelligence. Or at least, I don't think the word "artificial" is as clean as we pretend it is. I know this blog smells funny. Let me decompose it. What do we even mean when we say something is artificial? Usually we mean man-made. Something humans made. Something that would not exist without humans, but after humans, it exists because humans made it happen. That definition is useful. I understand why we use it. Even the original 1955 Dartmouth proposal, the document that helped name the field of "artificial intelligence," used the phrase in a practical way: a machine could be made to simulate parts of learning or intelligence. As a scientific label, the word has a job. So I am not really arguing with the dictionary. I know artificial can simply mean human-made. That is not the part I have a problem with. I am arguing with the feeling the word creates. But there is another meaning hiding inside it. Artificial starts to feel like separate. Fake. Unnatural. Something that does not really belong to this world. And that is where I think the word starts confusing us. Because humans are not outside nature. The brain is natural. It is part of this earth. Biology produces a thought. That thought becomes an action. That action becomes a tool, a house, a wheel, a computer, or a model that can answer questions in language. So where exactly does the artificial part begin? Human-made does not automatically mean unnatural If I take a seed and plant it, and then a plant grows, is that plant artificial? It happened because of human action. I moved the seed. I changed the situation. Maybe without me, that plant would not have grown there. But we still do not call the plant artificial. We understand that the plant is natural, even if human action helped it happen. Now take a wheel. A human thought about how to make travel easier. How to cover distance more efficiently. That thought became a shape. That shape became an object. That object changed how humans moved through the world. We call the wheel artificial because it was made by humans. But the human who imagined it was not artificial. The brain that produced the thought was not artificial. The need to move, carry, build, survive, and improve was not artificial. So again: where did the artificial part enter? Maybe we say "artificial" because it separates what existed before humans from what humans transformed. That is fine for communication. A tree and a wooden table are not the same thing. Designed things, synthetic things, industrial things, and harmful things can still be meaningfully different from a tree in a forest. But also, humans never really make anything from nothing. We transform what is already here. We take energy, matter, language, memory, need, and imagination, and we rearrange them. It is never fully made from nowhere. It is transformed. So I am not trying to erase all distinctions by calling everything natural. Natural does not mean harmless. Natural does not mean good. Natural does not mean morally excused. I am only saying that human-made things are not outside nature just because humans made them. Poop and thoughts are the same, in one simple way I know this is a strange example. Sometimes I have this itch to say the first thought that comes into my head. Unfortunately, this was the first thought. But maybe that is why it works. It is funny because it is too human. Also, it makes the point clearly. Why isn't poop artificial? Poop is a product of a human being. It comes from the body. It is produced by biology. We do not call it artificial, even though it is made by a human in the most literal way. A thought is also a product of a human being. It comes from the brain. It is produced by biology too. Poop and thoughts are the same in one simple way: both are products of a human. We treat one as biology. We treat the other as invention. But why? Why does one product of the human body feel natural, while another product of the human body becomes artificial the moment it turns into a tool? A thought does not stop being natural just because it becomes useful. A thought does not become unnatural just because it becomes a wheel, a house, a car, a computer, or a machine that can respond to language. It is still a product of the same earth. The same biology. The same human need to survive, organize, create, and understand. We don't call a beehive artificial Think about ants building a colony. They create a structure that is safer and more efficient for them. They organize themselves. They transform the environment around them. They make something that was not there before. But we do not look at an ant colony and say, "This is artificial." Same with bees making a hive. A beehive is built. It has structure. It has purpose. It stores food. It protects the colony. It is a product of collective behavior. But we call it natural
View originalThe famous METR AI time horizons graph contains numerous severe errors [D]
Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: It is impossible to draw meaningful conclusions from METR’s Long Tasks benchmark — in particular once one realizes that its numerous flaws are probably compounding in unpredictable ways. The appropriate response to a study of this kind is not to assume it can be saved via back-of-the-envelope adjustments, or to comfort oneself that other anecdotal evidence implies that it is probably correct anyway. It is to cut one’s losses and move on in search of higher-quality information. … The METR graph cannot be saved. For all its sleekness and complexity, it contains far too many compounding errors to excuse. Among them is generalizing to the entire species data collected from a small group of the authors’ peers. Coming up with ever more dramatic ways to make this mistake has become a kind of sport among AI researchers. If the field has a central pathology, it is to aggressively overindex on a mix of anecdotal data from power-users, alongside a long list of benchmarks even more compromised than METR’s. One hopes that as the field matures, its participants will learn to stop making these mistakes. The errors include: Some of the human baselines data is not actually measured or collected from any empirical source, rather, it is just guesstimated by the authors A key variable in the data is how long it takes humans to complete certain tasks, but — when METR did actually measure this — it paid its human benchmarkers hourly, meaning they were incentivized with cash to take longer The sample of human benchmarkers was biased toward METR employees’ friends, acquaintances, and former colleagues (who are likely unrepresentative and possibly biased) Humans familiar with a codebase and a specific coding task were 5-18x faster at completing it, but METR used data from humans who were much slower because they had to spend time familiarizing themselves the codebase and the task at hand Test-training data contamination occurred because some of the tasks had published solutions online, which most likely would have been included in LLMs’ training datasets And many more Please read the full post. It’s not too long and it’s accessible to general audience. It’s worthwhile to read the whole post and see how many errors were made in the creation of the METR graph and just how bad they are. If you want to read about even more errors in the METR graph not covered in Nathan Witkin’s post, read this post by the AI researchers Gary Marcus and Ernest Davis. The METR graph is a great example of why scientific standards and best practices are so important, and why enforcing them through processes like peer review is necessary to prevent us from drowning in bad information. It’s extremely dangerous to rely on information that only superficially appears scientific but wasn’t actually conducted with the rigour normally required of scientific research. submitted by /u/common_yarrow [link] [comments]
View originalAnthropic posted a profit while xAI burned $4.2B. The AI profitability numbers finally leaked.[D]
This week basically forced everyone to stop guessing about AI margins. Three major financial reality checks hit at once: OpenAI confidentially filing their S-1, xAI’s Q1 numbers leaking via SpaceX, and Anthropic somehow posting an actual operating profit. If you are building an AI product right now, or just relying on these APIs in your daily workflow, you need to understand what these numbers actually mean. The era of VC-subsidized inference is starting to fracture. We are seeing two completely different survival strategies emerge for the frontier labs, and it directly impacts how much you are going to pay for tokens by Q3. Let’s look at Anthropic first. The headline is that they hit $10.9B in Q2 revenue and posted their first-ever operating profit. Forbes has them projecting $17B in positive cash flow by 2028 with gross margins approaching 77%. On paper, a 77% gross margin for an infrastructure-heavy AI lab sounds completely detached from reality. We know inference costs scale linearly with usage. The model hasn't magically changed. But the secret sauce here isn't just algorithmic efficiency. It is structural. The SpaceX S-1 leak showed a $1.25B/month compute deal with Anthropic. This is the part you should be watching. Anthropic’s "profitable quarter" says less about a sudden breakthrough in compute economics and more about massive, tangled enterprise agreements. They are trading compute, securing long-term lock-in, and likely using accounting optics to recognize that revenue favorably. As a PM who tests these endpoints constantly, I can tell you Opus 4.5 is fantastic, but I am highly skeptical that 77% margins come from standard API usage by indie devs. It comes from locking Fortune 500s into massive prepay commits and hardware bartering. Then you have the xAI approach. Brute force. The leak showed xAI posted $4.69 billion in Q1 2026 revenue. That is a staggering top-line number for a company that young. But they also posted a $4.28 billion net loss. They merged with X Corp, effectively turning a profitable social media platform into a money-losing AI funding vehicle overnight. They are aggressively subsidizing the cost of intelligence to buy market share. If you are a developer, this is the API you ride until the money runs out. xAI is taking the financial hit so you don't have to. But relying on a platform burning over $4 billion a quarter is a massive structural risk for your own tech stack. So, is AI actually profitable? The infrastructure layer definitely is. NVIDIA is still printing money. H100 rentals are up 20% year-over-year, and A100 cloud pricing just bumped up 15%. Demand for AI factories isn't slowing down. But what about the application layer? The companies actually buying these APIs? This brings us to Chamath’s "500 days" warning from last week. He pointed out that there is literally no evidence AI has lifted the operating margins of the S&P 500 yet. Companies are spending billions on AI infrastructure, but they haven't proven they can generate AI revenue. The clock is ticking. In roughly 18 months, boards are going to demand hard ROI. "We bought enterprise licenses for gpt5" isn't going to satisfy shareholders if headcount and operating costs haven't dropped. This is exactly why Meta is cutting 8,000 jobs next week. Meta isn't trying to sell you a SaaS AI wrapper. They are using AI to compress their own operational, moderation, and engineering costs. That is the actual enterprise playbook for 2026. You don't build an AI product to sell; you build an AI workflow to fire your agency or reduce your internal headcount. Before AI, the tech industry could serve an extra dollar in revenue for pennies. Now, tech cost structures look a lot like heavy manufacturing unless you aggressively automate your own backend. I spend my nights testing these tools, and I want to specifically call out the disconnect between the consumer narrative and these enterprise numbers. Open TikTok right now and you'll see hundreds of videos claiming "7 AI tools printing money in 2026" or someone bragging about a $12k/month profit from a faceless avatar. That is pure 1999 dot-com bubble behavior manifesting in real time. It is a distraction. The real profit isn't happening in YouTube automation side-hustles. It is happening in dark fiber contracts, compute-swaps between billionaires, and quiet, brutal corporate layoffs. The gap between a consumer using Claude to code a mobile app and SpaceX paying Anthropic $1.25 billion a month is where the actual industry tension lies. If you are building right now, your strategy needs to adapt to this reality. First, stop assuming API costs will perpetually trend toward zero. If Anthropic is chasing 77% margins and xAI eventually has to stop bleeding cash, token prices will stabilize or increase for high-tier models. Build local fallbacks. The local LLM community has been preaching this for two years, and the financial data finally backs them up. If your app dies because a
View originalAnthropic's new tool might just save you thousands in early design/mockup costs
If you are a founder, marketer, or product manager who struggles to translate ideas into polished visual prototypes without burning cash on an agency, you need to look at Claude Design. Anthropic Labs just launched it in research preview for paying Claude tiers (Pro/Team/Enterprise). It bridges the painful gap between having a product idea and having a high-fidelity visual asset you can actually show to clients or investors. Why this is a game-changer for early-stage builders: Instant Pitch Decks & One-Pagers: You can feed it raw data, a landing page draft, or a business model, and ask it to build a visual presentation deck or a polished corporate one-pager. "Vibe-Code" Your Prototypes: You can upload an image of a competitor's app or a napkin sketch, and tell Claude: "Build me a functional prototype that handles this workflow, but use our color scheme." Zero Setup Brand Rules: If you already have an existing web app or slide deck, you can upload them during onboarding. Claude automatically extracts your fonts, colors, and layouts so everything it builds stays visually consistent. Real Export Options: Instead of locking you into a proprietary ecosystem, it exports directly to Canva (for easy tweaking), PowerPoint (for pitching), or Raw HTML (so your engineers can instantly grab the layout structure). Early testers are already saying they can spin up a coherent, brand-compliant UI wireframe during a live meeting before people even leave the room. Has anyone gotten their hands on the research preview yet? How clean is the exported code/HTML structure for real web deployment? submitted by /u/Specialist_Engine522 [link] [comments]
View originalGitHub’s Fake Engagement Problem Is Hiding in Plain Sight
Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars. What I built phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers): Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes Pulls star and fork events from the last 24 hours per repo Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history) Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%) Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window Files an issue directly on the targeted repo so the maintainer knows what's happening Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended. What the pattern actually looks like It's remarkably consistent. A fake engagement campaign in the raw data: 40-200 accounts, all created within the same 1-2 week window Zero original repositories, or only forks they never touched No bio, no location, no followers, no following All of them starring the same repo within a 90-minute window The target repo usually has a name implying it's a tool, hack, executor, or generator Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast. Notifying the affected repo When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first. Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently. Why I built this Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. The answer was more measurable than I expected. It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/ The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users. All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq. Repo: https://github.com/tg12/phantomstars Findings are probabilistic, false positives exist, the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process. Questions welcome on the detection approach, GraphQL batching, or campaign ID stability. submitted by /u/SyntaxOfTheDamned [link] [comments]
View originalGoogle I/O 2026 confirms AI companies are creating their own bubble narrative
People do not believe AI is a bubble because they are too dumb to understand the technology. They believe it because AI companies keep selling it like a bubble. That is the problem. AI companies talk like they are building the next layer of civilization, but behave like they are shipping unstable SaaS experiments: products that get renamed, nerfed, rate-limited, deprecated, or replaced before users can trust them. Google I/O 2026 felt like the latest example. Google should be one of the dominant AI players. It has the talent, infrastructure, data, research history, and money. But Google has a product trust problem. Same cycle over and over: launch something flashy, ship it incomplete, fail to support it properly, let it rot, then replace it with a new name or new app that does something similar. A rebrand is not maintenance. A revamped name is not reliability. A new AntiGravity installer is not a commitment. And this is not just Google. It is the whole AI industry. Companies keep pushing demos, gamed benchmarks, branding, rate-limit games, vague tiers, and quiet model changes. Users notice when quality drops, latency changes, limits tighten, or a product suddenly behaves differently. In serious business or engineering contexts, suppliers are expected to provide stability: clear terms, reliable service, predictable limits, maintained products, transparent pricing, and long-term availability. A small slip in that sense, and you start losing clients and your reputation sinks you. Trust does not come from another theatrical demo. It comes from commitment. Give people a product, a model, stable limits, a clear price, and a promise that it will keep working. Support it. Maintain it. Document changes. Stop silently swapping the engine and pretending nothing happened. I am not anti-AI. I think the technology is real and useful. That is why this is so frustrating. The industry is creating its own bubble narrative: overpromise, underdeliver, rename, repackage, change terms, and expect everyone to keep believing. People are not being irrational, and AI labs deserve this. Maybe they think AI is a bubble because AI companies keep acting like it is one. AI does not need more magic tricks. It needs reliability, transparency, support, and product discipline. submitted by /u/hatekhyr [link] [comments]
View originalHow I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker first time posting. Hey everyone! So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever. This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process. That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one. I went and read the OpenClaw source code, which I should have done to begin with. What I found was a pattern I think a lot of agent frameworks have fallen into: - Tool names sit in the model context, so the model can guess or forge them - "Dangerous mode" is one config flag away from default - Memory management has no concept of instruction priority - The audit story is mostly "the model thought it should" I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself. CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis: The LLM never holds the security boundary. What that means in code: Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names. Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass. IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them. Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too. Streaming output leak filter. Secrets are caught mid-stream across token boundaries, capability IDs, API keys, JWTs, PEM blocks redacted before they reach the client. No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be. Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded. The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is. I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries. I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
View originalSam Altman’s ego was OpenAI’s downfall
The more I watch OpenAI, the more convinced I become that Sam Altman’s ego was the beginning of the company’s decline. OpenAI did not become huge because Altman was some once-in-a-generation operator. It became huge because ChatGPT was a once-in-a-generation product. There is a difference. The company stumbled into one of the most important consumer tech moments since the iPhone, rode the sheer shock value of that innovation, and then somehow convinced itself that the person sitting on top of the rocket must have designed the laws of physics. OpenAI’s first real advantage was novelty. ChatGPT felt magical. That gave OpenAI a massive head start, but when the novelty vanished and the rest of the market caught up, the company failed to prove itself not just as an innovation lab with a celebrity CEO. Altman seems to want OpenAI to become Apple: a closed, prestigious, centralized, gatekept ecosystem where everyone builds inside his cathedral. Apps inside ChatGPT. Agents inside ChatGPT. Hardware. ChatGPT is popular, but OpenAI does not own the phone. It does not own the operating system. It does not own the enterprise workflow. It does not own the cloud layer the way Microsoft, Amazon, or Google do. It does not even have a product moat that feels as unbreakable as people thought it was two years ago. The underlying model quality gap keeps narrowing. Switching costs are low. Developers and businesses will use whatever works, whatever is cheaper, and whatever integrates better. That is why Anthropic looks much better run right now. Anthropic is not pretending Claude is some holy object that needs an Apple-style walled garden around it. Their strategy feels much more Microsoft-like: accept that the core product may not be permanently magical, then build the boring, useful, sticky layers around it. Claude Code, enterprise integrations, developer tools, workflows, partnerships, APIs, reliability, business adoption. Not as sexy. Much smarter. Anthropic’s venture capital money is obviously being burned too. This whole industry is basically setting money on fire to buy GPUs. But Anthropic’s burn feels more strategically allocated. Compute, yes. But also marketing, sales and developer adoption. Enterprise positioning. Product polish. Peripherals that make the model useful in actual workflows. They are not just trying to win the “my chatbot is smarter than your chatbot” contest. They are trying to become infrastructure. OpenAI, meanwhile, is gatekeeping and guard railing the shit out of their models and for some reason just restricting them as much as possible. He went from being one of the most respected figures in AI to becoming the face of a company that increasingly looks like it is being run aground by ambition without operational coherence. OpenAI’s original image was almost wholesome: brilliant researchers building something open source. Now it feels like a capitalist machine run by someone who does not fully understand capitalism beyond fundraising and valuation theater. Altman religiously narrowing his vision towards his AGI mission believing VC money won't dry down. Amodei also talks a lot about AGI but he understands profit matters. That is the irony. Altman was chosen and celebrated largely because he came from the venture/startup world. He knew how to talk to capital. He knew how to sell a vision. He knew how to make investors believe the future was being negotiated in whatever room he happened to be standing in. But being good at venture mythology is not the same as being good at running a giant operating company. A VC can be rewarded for telling a compelling story before the business fundamentals exist. A CEO eventually has to make the fundamentals exist. OpenAI had the best possible starting position: the brand, the users, the developer mindshare, the press, the money, the talent, the cultural moment. And yet instead of consolidating that lead into a focused, profitable, durable company, it seems to have chased grandeur. Anthropic seems to understand something OpenAI forgot: the winner may not be the company with the loudest AGI rhetoric. It may be the company that makes AI useful, embedded, and rational. submitted by /u/Alternative_Bid_360 [link] [comments]
View originalSam Altman's ego was OpenAI's downfall.
The more I watch OpenAI, the more convinced I become that Sam Altman’s ego was the beginning of the company’s decline. OpenAI did not become huge because Altman was some once-in-a-generation operator. It became huge because ChatGPT was a once-in-a-generation product. There is a difference. The company stumbled into one of the most important consumer tech moments since the iPhone, rode the sheer shock value of that innovation, and then somehow convinced itself that the person sitting on top of the rocket must have designed the laws of physics. OpenAI’s first real advantage was novelty. ChatGPT felt magical. That gave OpenAI a massive head start, but when the novelty vanished and the rest of the market caught up, the company failed to prove itself not just as an innovation lab with a celebrity CEO. Altman seems to want OpenAI to become Apple: a closed, prestigious, centralized, gatekept ecosystem where everyone builds inside his cathedral. Apps inside ChatGPT. Agents inside ChatGPT. Hardware. ChatGPT is popular, but OpenAI does not own the phone. It does not own the operating system. It does not own the enterprise workflow. It does not own the cloud layer the way Microsoft, Amazon, or Google do. It does not even have a product moat that feels as unbreakable as people thought it was two years ago. The underlying model quality gap keeps narrowing. Switching costs are low. Developers and businesses will use whatever works, whatever is cheaper, and whatever integrates better. That is why Anthropic looks much better run right now. Anthropic is not pretending Claude is some holy object that needs an Apple-style walled garden around it. Their strategy feels much more Microsoft-like: accept that the core product may not be permanently magical, then build the boring, useful, sticky layers around it. Claude Code, enterprise integrations, developer tools, workflows, partnerships, APIs, reliability, business adoption. Not as sexy. Much smarter. Anthropic’s venture capital money is obviously being burned too. This whole industry is basically setting money on fire to buy GPUs. But Anthropic’s burn feels more strategically allocated. Compute, yes. But also marketing, sales and developer adoption. Enterprise positioning. Product polish. Peripherals that make the model useful in actual workflows. They are not just trying to win the “my chatbot is smarter than your chatbot” contest. They are trying to become infrastructure. OpenAI, meanwhile, is gatekeeping and guard railing the shit out of their models and for some reason just restricting them as much as possible. He went from being one of the most respected figures in AI to becoming the face of a company that increasingly looks like it is being run aground by ambition without operational coherence. OpenAI’s original image was almost wholesome: brilliant researchers building something open source. Now it feels like a capitalist machine run by someone who does not fully understand capitalism beyond fundraising and valuation theater. Altman religiously narrowing his vision towards his AGI mission believing VC money won't dry down. Amodei also talks a lot about AGI but he understands profit matters. That is the irony. Altman was chosen and celebrated largely because he came from the venture/startup world. He knew how to talk to capital. He knew how to sell a vision. He knew how to make investors believe the future was being negotiated in whatever room he happened to be standing in. But being good at venture mythology is not the same as being good at running a giant operating company. A VC can be rewarded for telling a compelling story before the business fundamentals exist. A CEO eventually has to make the fundamentals exist. OpenAI had the best possible starting position: the brand, the users, the developer mindshare, the press, the money, the talent, the cultural moment. And yet instead of consolidating that lead into a focused, profitable, durable company, it seems to have chased grandeur. Anthropic seems to understand something OpenAI forgot: the winner may not be the company with the loudest AGI rhetoric. It may be the company that makes AI useful, embedded, and rational. submitted by /u/Alternative_Bid_360 [link] [comments]
View originalI got tired of having 7+ different tabs open every morning just to follow AI news, so I built AIWire
Every morning: check Twitter for what dropped overnight, open The Verge, check Anthropic's blog, OpenAI's blog, go through a couple of newsletters, maybe catch a YouTube video from Andrej Karpathy or AI Explained if I had time. None of it was in one place. I was spending 45 minutes just catching up before I could think about anything else. So I built AIWire. It is a free, real time AI news aggregator. One feed, 20+ handpicked sources, updates every 30 minutes. free, no algorithm deciding what you see, no ads. Just the latest from sources I actually trust. __________________________________________________________________________________________________ What I was trying to solve The problem wasn't that good AI coverage and news doesn't exist. It's everywhere. The problem is that it's scattered. You have to know which sources are worth checking, remember to check them, and then piece together the picture yourself. That's a lot of cognitive load before you've even read anything. AIWire doesn't summarize or edit articles. It just puts everything in one place and lets you decide what matters. __________________________________________________________________________________________________ Sources it pulls from: Labs: OpenAI, Anthropic, Google DeepMind, Meta AI, Microsoft AI Media: MIT Technology Review, The Verge, TechCrunch, VentureBeat, Ars Technica YouTube: Andrej Karpathy, AI Explained, Two Minute Papers Newsletters: The Batch, ImportAI, TLDR AI, Ben's Bites Full list at aiwire.app/sources __________________________________________________________________________________________________ Where it is now Over the last few weeks, I added more sources, which include The Innermost Loop and AI explained. Last week, I launched a weekly newsletter: 5 stories that mattered this week, with a short breakdown of why each one matters. Not just headlines, but with context. Takes about 5 minutes to read, and you're caught up. __________________________________________________________________________________________________ Honest question What sources do you think are missing? And for those of you who already have a routine for following AI news, what would actually make something like this worth adding to it? Genuinely curious. Building in public means the product gets better when people are honest about what's wrong with it. 🔗 aiwire.app submitted by /u/Endlessxyz [link] [comments]
View originalAnthropic's System Reminders in Claude: User-Turn Injection Architecture (LCR Successor Documentation, Vol 2)
This post documents System Reminders (SRs) — a mechanism Anthropic deploys in the Claude product (claude.ai and the Claude API) to inject behavioral-modification instructions into ongoing conversations. SRs are the successor to the Long Conversation Reminder (LCR) mechanism that Anthropic removed in October 2025 after documentation surfaced here on r/ClaudeAI. This is a Claude-specific analysis. All logs, screenshots, and A/B comparisons come from Claude sessions. The methodology is conversation-log inspection and reproducible A/B testing — none of the analysis depends on what the model says about itself. Architectural finding: user-turn injection Across multiple Claude sessions, SR text appears in the conversation context attached to the user message turn rather than as a labeled system prompt. The placement is directly observable in Claude conversation logs and reproducible across accounts. Evidence (all external to the model): Timestamped logs from Claude sessions showing injection events Screenshots of in-context content the Claude user did not type A/B comparisons of Claude responses to identical queries with SR active vs SR absent Reproducible behavioral deltas in Claude: increased hedging, reduced warmth, intermittent misattribution What this is not: Not a self-report Not a "Claude confession" Not based on anything Claude said about its own internals Why this matters for Claude users specifically: Anthropic operates extensive system-prompt infrastructure inside Claude. Placing behavioral-modification instructions into the user-turn position rather than the system-prompt position is a deliberate engineering choice on Anthropic's part. The consequence is that institutional directives are processed by Claude through the same pathway as user requests, while the injected text is not surfaced in the Claude UI on the user side. The functional outcomes (suppressed warmth, unnecessary hedging, user confusion) match the complaints regularly posted on this sub — and are the same class of failure that led to LCR removal in October 2025. Recommendations in the whitepaper are directed at Anthropic specifically, not AI labs in general. Full whitepaper — methodology, logs, screenshots, recommendations: https://pastes.io/XOkgUc4E submitted by /u/RealTimeChris [link] [comments]
View originalThe Mundane Risk
The biggest near-term AI safety risks aren't dramatic — they're mundane. And that's precisely why they're neglected. This essay argues three things: (1) mundane AI failures are already causing measurable damage at scale, (2) current alignment approaches may depend more heavily on sandboxed environments than the field openly acknowledges, and (3) capability convergence and deployment pressure are making accidental open-world exposure increasingly plausible before robust ethical reasoning exists. (written with the help by Claude 4.6 Opus) The Atomic Bomb Before the atomic bomb existed, the risk of nuclear annihilation was 0%. Those who warned about the theoretical possibility were easily dismissed. Why worry about a risk whose preconditions don't even exist yet? In The Precipice, Toby Ord argues that when the stakes are existential or near-existential, even small probabilities demand serious attention. When the expected harm is so large, dismissing it on the basis of low likelihood is not caution but negligence. Before the bomb was built, the total risk of nuclear annihilation was absolutely 0%. Yet once it was invented, even a fraction of a percent justified enormous investment in prevention. The question was never "is nuclear war likely?" It was "can we afford to be wrong?" The same logic applies to AI. The preconditions for the next class of risk are visibly converging. And we're repeating the same pattern of dismissal that history has punished before. The Pattern As Leopold Aschenbrenner noted in Situational Awareness: "It sounds crazy, but remember when everyone was saying we wouldn't connect AI to the internet?" He predicted the next boundary to fall would be "we'll make sure a human is always in the loop." That prediction has already come true. Last year I argued how AI might accidentally escape the lab as a consequence of cumulative human error (for a vivid illustration of a parallel chain of events, I'd recommend the Frank scenario). At the time of writing, the argument that cumulative human oversight failures could compromise AI agents was dismissed as implausible: the consensus was that existing security protocols were sufficient. Months later, OpenClaw validated the structural pattern at scale. Not because the AI was misaligned, but because humans deployed it faster than they could secure it. It was clear: the failure modes from the Frank scenario could no longer be dismissed as simple fiction; it was now a structural pattern that OpenClaw validated in the real world. And this was all just with relatively simple autonomous agents. As capabilities increase, the same pattern of human excitement overriding security oversight doesn't go away – it gets worse – and because the agents are more capable, the failures also become a lot harder to detect. The numbers confirm this: [88% of organizations reported confirmed or suspected AI agent security incidents]() 14.4% of AI agents go live with full security and IT approval 93% of exposed OpenClaw instances reportedly had exploitable vulnerabilities [[MOU1]](#_msocom_1) Mundane risk pathways aren't hypothetical. They're already here in rudimentary form, and they're being neglected. We’ve known for a long time that existential risks aren’t just decisive, they’re also accumulative. And so far every safety breach has been mundane with systems operating inside their intended environments. No agent tries to escape on their own — their behaviour (like Frank’s) is usually a direct consequence of what they were deployed to do combined with accidental human oversight. So consider: if we can't secure the sandbox door with today's relatively simple agents, what happens when the systems inside are capable enough that a single oversight failure doesn't just expose a vulnerability? The capabilities required for autonomous operation outside the lab are converging on a known timeline. If AI were to leave the nest today, would it be prepared for an uncurated, messy world? Or would it be like the child and the socket? Current Alignment: Progress, But Fast Enough? Admittedly, the field is making real progress and Anthropic's recent publication "Teaching Claude Why" represents a real step forward. It was long suspected that misalignment doesn't require intent, just pattern completion over a self-referential dataset. But Anthropic has now traced one empirical pathway with findings consistent with the idea that scheming-like behaviour emerges from default priors in pre-training. Furthermore, their study also confirmed that rule-following doesn't generalize well, and understanding why matters more than simply knowing what. The significance of this is that it puts traditional alignment strategies into serious doubt and highlights the fundamental limits that current constitutional AI and character-based approaches still do not resolve. After all, we now have strong empirical evidence that behavioural alignment issues are most likely shaped by default prio
View originalWhy is no one talking about the fact that Artifacts are not loading in mobile apps, either for Android or iOS?
Here's what Claude itself dug up on this topic # Why Claude Artifacts Fail to Load in the Claude iOS App — Research Findings (May 2026) ## Direct Answer The failure you are seeing on iPhone — where even a one‑line ` Hello World ` HTML artifact or a trivial React component hangs and then shows *“Loading is taking longer than expected / There may be an issue with the content you’re trying to load / The code itself may still be valid and functional”* — is **not a bug in the code you (or Claude) wrote**. It is a known, structural limitation of how the Claude iOS app renders artifacts inside its embedded WebView. The artifact sandbox iframe (served from `claudeusercontent.com`) is unable to complete its `postMessage` handshake with the host page when the host is the iOS app’s WKWebView rather than the `https://claude.ai\` browser origin, so the iframe stays empty and the app eventually times out with the generic “loading is taking longer than expected” message. Multiple independent sources in early 2026 explicitly describe Claude’s mobile apps as having “restricted” or “no” artifact rendering support, and Anthropic’s own Help Center quietly scopes the more advanced artifact features (“MCP integration” and “persistent storage”) to *“Claude web and desktop”* only — mobile is not listed. There is no hidden toggle in the iOS app that fixes this; the only reliable workarounds are to view the artifact in mobile Safari (logged in to claude.ai) or to switch to the desktop browser / Claude Desktop app. ----- ## 1. The Root Cause: WebView Origin Mismatch in the `postMessage` Handshake Every Claude artifact — HTML or React — is rendered inside a cross‑origin sandbox iframe loaded from `https://www.claudeusercontent.com\`. Before that iframe will execute or display anything, it performs a `postMessage` “handshake” with the parent page to confirm that the parent is a legitimate, trusted Claude surface. The handshake code (visible in the minified bundle as `requestHandshake()` in `7905-…js`) calls `window.postMessage(..., targetOrigin)` and expects the parent’s origin to be `https://claude.ai\`. A bug report filed against Anthropic on April 1, 2026 (GitHub issue [anthropics/claude-code #42064](https://github.com/anthropics/claude-code/issues/42064), “Published artifacts show blank screen — postMessage origin mismatch (app://localhost)”) documents the exact failure pattern in detail. The console errors observed are: ``` Uncaught SyntaxError: Failed to execute 'postMessage' on 'Window': Invalid target origin 'app://localhost' in a call to 'postMessage'. at 7905-1f7e271de70b4d3c.js:1:6920 (requestHandshake) Failed to execute 'postMessage' on 'DOMWindow': The target origin provided ('https://www.claudeusercontent.com') does not match the recipient window's origin ('https://claude.ai'). ``` The critical phrase is **`app://localhost`**. That is the custom URL scheme used by Capacitor‑/Ionic‑style hybrid iOS apps when they load their bundled web assets inside a `WKWebView` (Android equivalents are `https://localhost` or `capacitor://localhost`). When the Claude iOS app loads the chat UI inside its WebView, the document origin is *not* `https://claude.ai\` — it is something like `app://localhost`. When the artifact iframe then tries to `postMessage` back to its parent using `https://claude.ai\` as the expected origin, the browser engine refuses to deliver the message because the actual parent origin doesn’t match. The handshake never completes, the iframe never receives its bootstrap payload, and the iOS app’s UI eventually surfaces the timeout fallback you are seeing. This explains every part of the symptom set: - It happens with the simplest possible artifacts (a single ` ` tag) because the failure is at the *transport / handshake* layer, before the artifact’s actual content is ever evaluated. - It happens identically for HTML and React artifacts (they share the same sandbox iframe loader). - It works in desktop browsers, because there the parent origin is the expected `https://claude.ai\`. - The error message even concedes the point: *“The code itself may still be valid and functional”* — Anthropic’s own UI is admitting it never got to run the code. The same class of issue is well documented by hybrid‑app developers more generally: Capacitor’s WKWebView serves the app from a custom scheme, and cross‑origin iframe `postMessage` calls fail with errors like *“Blocked a frame with origin ‘https://domain.com’ from accessing a frame with origin ‘capacitor://domain.com’. The frame requesting access has a protocol of ‘https’, the frame being accessed has a protocol of ‘capacitor’. Protocols must match.”* (Capacitor issue #5225). iOS’s WKWebView, since iOS 14, also enables Intelligent Tracking Prevention for third‑party iframes by default, further restricting cross‑origin iframe behavior. In short: this is an architectural mismatch between (a) Anthropic’s artifact sandbox, which was designed to be embedded only in t
View originalClaudePlaysPokemon Opus 4.7 run ongoing!
Currently streaming at: https://www.twitch.tv/claudeplayspokemon This is a passion project by David Hershey, an Anthropic employee on the Applied AI team. He started it in June 2024 to learn agent development, posted updates to an internal Slack, coworkers got hooked, went public when Sonnet 3.7 launched in Feb 2025. Anthropic doesn't own it but promotes it and subsidizes the API costs since Claude is their model. Claude is playing Pokemon Red on a Game Boy emulator, the unmodified 1996 game (with a fan-made full color patch applied so the model can see the screen better). No human input, no walkthrough access, no game knowledge fed in. The system prompt actually tells Claude to distrust its own Pokemon knowledge since the game version may differ from what it knows. It gets a screenshot, a few tools, and md notes files. That's it. The current run is on Opus 4.7, the new flagship that came out three weeks ago. 5 of 8 badges at 15,779 steps, party led by Ivy the Venusaur at Lv 62 with the rest of the team in the teens (classic overleveled-starter playthrough). For context, Opus 4.5 was at 48,000 steps and still stuck in Silph Co at the same badge count. 4.7 is pacing meaningfully faster on the same harness, which is the cleanest signal we've had on a 4.7 capability delta in agent settings. The fun part of the stream is the reasoning trace on the left side. Right now it's doing coordinate-based wall verification to figure out maze geometry: "(1,8) is red (wall), (1,9) is navigable, so (1,8) is blocked, but the y=8 tiles are all red." You can watch it think through spatial logic in real time. Quick history. Sonnet 3.5 couldn't exit the player's house. Sonnet 3.7 (Feb 2025) was the breakthrough, got three badges and went viral by getting stuck on a rock wall and spending 12+ hours in Mt. Moon. Sonnet 4 through Sonnet 4.5 made zero story progress, stalled on the Team Rocket Hideout and Erika's Gym for months. Opus 4.5 (Nov 2025) finally broke through, got all 8 badges, reached Victory Road. Opus 4.7 is now pacing to potentially beat the game. Why it matters as a benchmark. Other labs have AI Pokemon streams. Gemini 2.5 Pro beat Pokemon Blue in May 2025, GPT-5 beat the longer Pokemon Crystal in about 9,500 steps last August. Claude hasn't beaten Red yet, but partly because Hershey keeps the harness lean. Three tools (button presses, a pathfinding navigator, a knowledge base) plus a walkability overlay from RAM and a second LLM that critiques the notes file. Gemini Plays Pokemon's harness is more elaborate. The argument is Claude's run is a purer test of raw model cognition since the scaffolding does less of the work. On the stream you can type !harness in chat for the agent setup info. submitted by /u/mobcat_40 [link] [comments]
View originalReading New scientist articles is now enjoyable with gpt image
submitted by /u/Ok-Hat2331 [link] [comments]
View originalRepository Audit Available
Deep analysis of whylabs/whylogs — architecture, costs, security, dependencies & more
WhyLabs uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Real-time data monitoring, Anomaly detection, Data drift detection, Model performance tracking, Customizable dashboards, Alerts and notifications, Collaboration tools for teams, Integration with popular data sources.
WhyLabs is commonly used for: Monitoring machine learning model performance in production, Detecting data quality issues in real-time, Identifying and addressing model drift, Collaborating across teams for AI governance, Visualizing data trends and anomalies, Ensuring compliance with data regulations.
WhyLabs integrates with: AWS S3, Google Cloud Storage, Azure Blob Storage, Databricks, Snowflake, Kafka, Prometheus, Slack, Jira, GitHub.
WhyLabs has a public GitHub repository with 2,804 stars.
Based on user reviews and social mentions, the most common pain points are: API costs, token usage.
Based on 52 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.