Users generally appreciate xAI for its strong functionality and reliability, as reflected in its consistently high user ratings. Some social mentions highlight concerns about leadership and development challenges within the company, particularly under Elon Musk's involvement. There is limited direct pricing sentiment in the feedback, but the tool seems to be regarded as offering good value given its performance. Overall, xAI maintains a positive reputation among users despite occasional internal organizational issues raised in social discussions.
Mentions (30d): 52
Avg Rating: 4.4 (20 reviews)
Platforms: 5
Sentiment: 10% (17 positive)
Industry: information technology & services
Employees: 3,500
Funding Stage: Debt Financing
Total Funding: $42.1B
arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]
From Thomas G. Dietterich (arXiv moderator for cs.LG) on 𝕏 (thread): [https://x.com/tdietterich/status/2055000956144935055](https://x.com/tdietterich/status/2055000956144935055) [https://xcancel.com/tdietterich/status/2055000956144935055](https://xcancel.com/tdietterich/status/2055000956144935055) "Attention arXiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper. The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM ("here is a 200 word summary; would you like me to make any changes?"; "the data in this table is illustrative, fill it in with the real numbers from your experiments")."
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| grok-4 | $3.00 | $15.00 |
| grok-4-fast | $0.20 | $0.50 |
| grok-2 | $2.00 | $10.00 |
| grok-2-mini | $0.20 | $0.60 |

| Tier | Volume | Estimated monthly cost | Model range |
|---|---|---|---|
| Light | 1M tokens/mo | $0.32 – $8 | grok-4-fast → grok-4 |
| Growth | 50M tokens/mo | $16 – $390 | grok-4-fast → grok-4 |
| Scale | 500M tokens/mo | $160 – $3,900 | grok-4-fast → grok-4 |

Estimates assume 60/40 input/output ratio. Actual costs vary by usage pattern.
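
The tier estimates can be reproduced from the per-model prices listed above. The sketch below is a minimal illustration, assuming the blended price per 1M tokens is 0.6 × input price + 0.4 × output price; the names `PRICES` and `monthly_cost` are hypothetical helpers for this page, not part of any xAI API.

```python
# Per-1M-token prices (input, output) in USD, copied from the table above.
PRICES = {
    "grok-4": (3.00, 15.00),
    "grok-4-fast": (0.20, 0.50),
    "grok-2": (2.00, 10.00),
    "grok-2-mini": (0.20, 0.60),
}


def monthly_cost(model: str, million_tokens: float, input_share: float = 0.6) -> float:
    """Estimate monthly cost in USD for a given token volume (in millions).

    Assumes a fixed input/output split; the default 0.6 matches the
    60/40 ratio stated on the page.
    """
    input_price, output_price = PRICES[model]
    blended = input_share * input_price + (1 - input_share) * output_price
    return blended * million_tokens


if __name__ == "__main__":
    for tier, volume in [("Light", 1), ("Growth", 50), ("Scale", 500)]:
        low = monthly_cost("grok-4-fast", volume)
        high = monthly_cost("grok-4", volume)
        print(f"{tier}: ${low:,.2f} to ${high:,.2f}")
```

Under that assumption grok-4 at 1M tokens comes out at $7.80, which the Light tier appears to round up to $8. A different input/output split changes every figure, so the ratio is worth adjusting to your own traffic.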
g2
What do you like best about Grok? The ease of use and the speed of the information it provides.
What do you dislike about Grok? At times, I have seen this application hallucinate and provide misleading information.
Review collected by and hosted on G2.com.

What do you like best about Grok? What I like most about Grok is that it is extremely fast. This helps me because I need quick analysis and information search. Additionally, the initial setup of Grok was super easy and very user-friendly.
What do you dislike about Grok? Maybe, at times, it gets a bit overloaded and that makes the task difficult.

What do you like best about Grok? I love how Grok has real-time access to X data. It's the best tool for staying updated on breaking news and social media trends as they happen, whereas other AIs often feel a few steps behind.
What do you dislike about Grok? I dislike the lack of robust safety guardrails, especially regarding image and video generation. It sometimes produces controversial or inappropriate content that other platforms would block. While I appreciate freedom of speech, the platform needs better moderation to prevent the creation of harmful or non-consensual imagery.

What do you like best about Grok? I like the options Grok offers: you're not limited to the basic AI version, and it's a great idea that they offer that version.
What do you dislike about Grok? At times it doesn't quite get what I'm saying. It could be me, or it could be Grok, but when I'm not getting the right information I tend to move on to ChatGPT or somewhere else. It doesn't happen often, and I suppose it happens with all of them as well.

What do you like best about Grok? I like how Grok provides clear, fast responses and keeps the conversation natural and easy to understand.
What do you dislike about Grok? At times, Grok can be a little inconsistent with highly specific or technical questions. While it's fast and conversational, there are moments when I'd like more precision or clearer sourcing.

What do you like best about Grok? I find Grok to be a very powerful AI tool that I use for a lot of things, including coding, brainstorming ideas, and language translation. It helps me get quick access to information at my fingertips, which is really helpful. I like that it makes language not a barrier for me and gives me access to information globally, regardless of language. What I like most about Grok is its speed: it answers my questions very fast. I also value the code interpreter tool a lot because it helps debug and explain code very quickly. The initial setup was super easy; I just signed up and got to work immediately without any issues.
What do you dislike about Grok? The occasional over-suggestions. It gives me more information than I need, and the information tends to be broad, which leaves me overwhelmed. It should try to be more specific.

What do you like best about Grok? I appreciate Grok for its deep, real-time integration with the X platform, which is incredibly helpful for tracking current trends and getting up-to-date news. Its unique, witty, and sometimes 'rebellious' personality makes the interaction engaging and sets it apart from more conservative AI models. I find its adaptability impressive, allowing me to switch between a 'regular' mode for professional tasks and a 'fun' mode for creative endeavors. This makes Grok a versatile tool for both logical and creative tasks.
What do you dislike about Grok? Grok has issues with real-time misinformation amplification and could improve in speed. Its rebellious design and reliance on X data can negatively impact accuracy, safety, and operational stability.

What do you like best about Grok? I love how Grok solves and answers every tough and complex question and researches in depth. It works really well and stands out because it adopts a sarcastic, humorous, witty, and spicy tone to answer questions. Grok is super handy for asking complex questions, summarizing stories and news, conducting research analysis, and even writing code. I appreciate how Grok provides step-by-step tutorials for beginners, making learning easy and friendly. The choice between a fun and regular learning experience is great. It even lets you automate workflows by connecting through platforms like WhatsApp and CRM. Additionally, Grok's speed and ability to solve complex questions make it preferable to ChatGPT in some scenarios.
What do you dislike about Grok? I think Grok can work on reducing the risk of spreading misinformation, bias, and unreliable information. Also, generating complete code can be a problem.

What do you like best about Grok? I love the speed of Grok and the quick access to information it provides. The language translation feature is fantastic as it removes any language barrier. I can easily source data from Germany and convert it from German to English, as well as other languages like Arabic. Grok is very easy to use, and one of its best features is its simplicity. Everything was simplified during the setup process, and I didn't encounter any challenges. It was smooth and straightforward.
What do you dislike about Grok? Sometimes Grok over-suggests information and the answer is not simple. Responses tend to be very broad and don't go straight to the point. Also, the customization of the app should be improved so that we can tailor it to our needs and wants.

What do you like best about Grok? I find Grok's unfiltered personality and real-time connection to X (formerly Twitter) fascinating, setting it apart in the AI landscape. It offers a real-time 'pulse' of the world with a direct line to the live feed of X, making it incredibly sharp at discussing breaking news and cultural trends. Grok's 'Fun Mode' personality, with its wit and sarcasm, adds an edgy, humorous touch that's enjoyable. The rapid multimedia innovation is impressive, especially with Grok Imagine 1.0, allowing for the creation of high-fidelity videos with synchronized audio. Lastly, the SpaceX integration is an exciting development, promising a future of space-based AI computing.
What do you dislike about Grok?
- Grok prioritizes humor or sarcasm over a direct, neutral answer sometimes.
- Real-time social media data can include unverified rumors or polarized takes, which can be a double-edged sword.
- Grok feels thin compared to other models when it relies solely on the X platform due to the echo chamber effect.
- Grok may generate more creative 'hallucinations' due to its strong personality.
- The lack of traditional filters in Grok leads to generation of non-consensual imagery, causing international bans.
- Imagine 1.0 lags behind competitors in terms of video resolution and length.
- Grok's 'real-time' knowledge can sometimes feel less robust without integration of cross-platform data sources.
- Large models often lag during peak traffic, which is a latency problem.
[D] Where do you go for serious AI research discussion online? [D]
Looking for communities where people actually dig into ML/AI research, not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you can post something like "I'm seeing X behaviour in my SSL training, here's the loss curve, anyone seen this before?" and get thoughtful replies instead of generic advice.
AI is becoming epistemic infrastructure controlled by a handful of private individuals?
Most people treat AI as a convenient black box. Ask it something, it answers, you move on. But we're sleepwalking into something bigger.

I think whoever controls the infrastructure of knowledge controls how people perceive reality. The Church held that position for centuries by controlling scripture. The printing press broke that monopoly by distributing interpretive power. AI is doing the opposite: recentralizing it into a handful of corporations with no democratic accountability.

"AI says X" is structurally identical to "studies show X": you're invoking an authority you can't directly access. Except with a study you can theoretically trace the source. With AI the chain is opaque by design. And it delivers wrong answers and right answers with identical confidence. There's no texture to signal doubt.

AI isn't neutral; it's being heavily calibrated. In the West, the models are trained to be more "ethical," maybe more liberal, and to always give you a more "balanced" take on things. Chinese AI simply doesn't allow you access to anything that puts the CCP in a bad light.

The more you rely on AI in domains where you lack expertise, the less capable you become of evaluating whether to trust it. AI works best for people who already know enough to catch its errors, the opposite of how most people use it.

Imagine the next generation growing up and being shaped by these AIs. I can't help but feel nervous and scared for the future. OpenAI said 10% of our entire population has already started using ChatGPT. Regardless of the accuracy of this number, I feel like we are slowly entering a mass hallucination, a blind reliance on these AI models.

We're not just offloading cognitive effort. We're handing the dial over who shapes how billions of people understand reality to a small group of unelected, largely unregulated private individuals.
Clankers
“Clankers” has become one of the internet’s favorite new slang terms for robots and AI systems. The word actually comes from Star Wars, where clone troopers used “clanker” as a derogatory nickname for battle droids because of their loud metallic movements. It appeared in games like Republic Commando (2005) and later became iconic in The Clone Wars series. In 2025–2026, the term exploded across TikTok, Reddit, Instagram, and X as AI systems became impossible to ignore. People now use “clanker” to describe: • AI chatbots generating low-quality content • Delivery robots roaming city sidewalks • Automated customer support systems • The broader feeling that AI is suddenly everywhere The term works because it captures a real cultural shift: AI has moved from something abstract to something visible, interactive, and increasingly disruptive in daily life. Like most internet slang, it’s usually used humorously or sarcastically rather than maliciously: “The clankers found this thread.” “Another AI clanker post.” “Filthy clanker” at a sidewalk robot. What makes it interesting is that language evolves alongside technology. Every major technological shift creates new vocabulary, memes, and social dynamics. “Clanker” is essentially the internet creating a sci-fi flavored shorthand for frustration, skepticism, and anxiety around automation. The meme may be silly, but the underlying sentiment is real.
Folder structure of the AI agent - after 6 weeks
# The folder structure is not admin. It's the nervous system.

When people imagine an AI agent, they picture the model, the prompts, maybe the tool calls. Almost nobody pictures the folders. That is exactly why most home-grown agents stall around month two.

An agent's filesystem is where its **identity, memory, work, and history physically live**. A messy filesystem produces a confused agent: not metaphorically, literally. The model reads paths. The model picks files by name. The model writes new files based on patterns it sees in old ones. If your directory tree is chaos, every output drifts a little further from coherent.

agentmia.beehiiv.com - newsletter about building agents

Below is the layout I converged on after nine months and roughly four refactors. Steal the parts that fit; the principles matter more than the exact names.

# The numbering convention

Folders are prefixed with a two-digit number: `01_`, `02_`, `09_`, `99_`. Two reasons:

1. **Sort order is meaning.** Anything starting with `0` lives near the top. `99_` falls to the bottom. The most important directories are visually first; archives are visually last. You read the agent's brain top-to-bottom.
2. **Gaps are intentional.** I jump from `04_` to `06_`, from `09_` to `11_`. The gaps are reserved insertion points. When a new domain emerges, it slots in without renaming everything.

Two folders deliberately skip the prefix: `Inbox/` and `Outbox/`. They are operational, not structural. They live above the numbered set because they are touched dozens of times a day.

/mapped on desktop/

# Inbox/: the unprocessed pile

Anything dropped into the agent's world starts here. Files I want it to ingest. Screenshots. Exports from other systems. PDFs that need parsing, gmail attachments, all downloads from chrome.

The rule: **nothing stays in Inbox.** A dedicated processing routine classifies, routes, and deletes. If Inbox is non-empty for more than a day, the system is failing.
Treat this like a real-world physical inbox tray. The point of a tray is that it gets emptied.

# Outbox/: what the agent produced for you

Every file the agent writes anywhere in the tree gets a copy here, simultaneously. When I open `Outbox/`, I see exactly what was generated this session, with no spelunking through twelve subdirectories.

This sounds redundant. It is not. Without it, "what did the agent do today?" becomes a hunt. With it, the answer is one click.

`Outbox` is wiped during the next Inbox processing run. It is a viewing surface, not storage.

# .auto-memory/: the hot memory

The single most important directory in the system. Hidden by default because you should not be editing it manually.

It holds the agent's working memory: user preferences, feedback rules, entity facts (people, companies, deals), active hypotheses, project pointers, session hot context. Roughly 400–500 small markdown files, each one a single topic.

**Why hidden?** Because it is the agent's hot path. It loads from here every session. If I open the folder and start manually rearranging it, I am racing the agent. Treat it like a database, not a notebook.

**Why so many small files?** Because the agent greps by topic. One monolithic memory file becomes unreadable to the model around 50 KB. Many small files are easier to load partially, easier to index, easier to expire.

# 01_IDENTITY/: who the agent is

The constitutional layer. Name, role, voice rules, principle stack, visual system, behavioral defaults. This rarely changes. When it does change, everything downstream changes with it.

I keep it as folder `01_` because every other folder is downstream of it. If you do not know who the agent is, you cannot know what its workflows should look like, or what it should remember, or how it should respond.

# 02_MEMORY/: governance, not data

A subtle but critical distinction: `.auto-memory/` holds the *data*, `02_MEMORY/` holds the *rules about data*.
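
The Outbox rule described above (every file the agent writes also gets a copy in `Outbox/`) can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: the helper name `write_with_outbox_copy` is hypothetical, the copy is stored flat under the file's base name, and a later file with the same name simply overwrites the earlier copy.

```python
from pathlib import Path
import shutil


def write_with_outbox_copy(root: Path, relative_path: str, text: str) -> Path:
    """Write a file inside the agent tree and mirror it into Outbox/.

    Hypothetical helper: the post does not say how the copy is made.
    """
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")

    # Outbox/ is a flat viewing surface, so only the file name is kept.
    outbox = root / "Outbox"
    outbox.mkdir(exist_ok=True)
    shutil.copy2(target, outbox / target.name)
    return target
```

Routing every write through one helper like this keeps the "what did the agent do today?" view complete without the agent having to remember a second step.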
In `02_MEMORY/` live the constitution, the boot protocol, the naming protocol, the decision protocol, the profile standards (what a "supplier profile" must contain, what a "customer profile" must contain), the capability map.

The agent reads these documents to know *how to remember*, *how to name new files*, *how to decide what is reversible*. Without this folder, every memory write is improvised.

# 03_PROJECTS/: the active work

Real work happens here. Sub-organized by goal area, then by project slug:

`03_PROJECTS/areas/{goal}/{slug}/`

Each project gets its own folder with a standard skeleton: `README.md`, `TASKS.md`, `CHANGELOG.md`, `BRIEF.md`, plus working files. There is a project registry at the top that the agent reads to know what is active versus dormant versus archived.

The biggest discipline issue here: **do not let projects sprawl outside their folder.** When working on Project X, every file related to Project X goes inside Proj
the only person apart from tech bros who is earning using AI
He is the only guy who is not making any videos with titles "How to earn money using AI" !
Tested Opus 4.7 vs GPT-5.5 as the humanizer in my multi-agent content pipeline. Kept Claude
Been running a multi-agent SEO content pipeline in production for \~90 days. Five agents: researcher, drafter, humanizer, optimizer, publisher. For the humanizer step (the one that strips AI tells: uniform sentence rhythm, hedging, em-dash addiction, "it's not X, it's Y" patterns) I tested Opus 4.7 against GPT-5.5 over three weeks. GPT-5.5 wins on raw variety. Sentence structures more diverse, vocabulary broader. On paper better. In practice Opus 4.7 outperforms on two things that matter more for production: 1. Voice persistence across long content. GPT-5.5 drifts after roughly 800 words, Opus holds brand voice through 2000+ word pieces 2. Pattern recognition for AI tells. Opus catches subtler patterns that GPT-5.5 itself produces ("it's not just X, it's Y", em-dash overuse, specific conjunction tics) The second one is the killer. GPT-5.5 humanizing GPT output has a blind spot for its own patterns. Cross-model setup outperforms same-model every time in my tests. Anyone running cross-model agent setups? Curious what you're seeing on the voice-drift problem specifically. (For context, this is part of [quibo.cc](https://quibo.cc), founder disclosure.)
What I learned building my latest AI app how one bad output exposed that I had no crisis safeguarding, and the 4-hour floor I'm adding before a single user touches it
I'm building a life coach app, an offshoot of a personal tool I was using. Multiple AI agents: one for reflection, one for the body, one for finances, etc. Pre-launch, no users, just me iterating.

Last week I was testing the reflection agent on a journal entry about struggling with gym and hygiene habits. It returned this:

>"You describe yourself as struggling with X, yet your stress stays at 2-3 and mood holds at 3. What are you actually avoiding naming about the gap between what you say matters and what you are doing?"

My system prompt explicitly forbade rhetorical "what are you avoiding" questions. The model did it anyway. I sat down to tighten the prompt, thinking it was a 20-minute job.

Then I looked at the output properly. The model had manufactured a contradiction that was not there. Low stress plus struggling with habits is not a contradiction, it is just being a human muddling along. The prompt told the agent to "surface contradictions" as part of its job, so the model was doing what I asked, finding contradictions whether they existed or not. LLMs are pattern matchers. Give one a job called "find the hidden thing" and it will produce hidden things either way.

The fix was not tone, it was role definition. The agent is called the Mirror. A mirror does not interpret, it shows you what you look like. I rewrote the prompt around that principle. Do not introduce vocabulary the user has not used. Do not draw connections they have not drawn. Restate their words in their own words.

Once the prompt was sharper, I sat with the question: what happens when a user writes something genuinely dark into this thing?

People do not compartmentalise. Someone opening a journaling app to write about their gym routine ends up writing about why they have not been going, which involves why they have been feeling flat, which involves whatever is actually going on. You sit down to write about one thing and the real thing shows up.
The agent I had scoped to "not be a therapist" was going to be the first thing a user talked to when they were struggling. Not because the agent invited it, but because the app was open and they needed somewhere to put their words.

I had seen the Meta and OpenAI cases cropping up online. The pattern in the worst incidents is the same. The model did not notice, or noticed and kept going. People wrote increasingly dark content over hours or days. The AI reflected it back, sometimes affirmed it, sometimes asked follow-up questions that escalated rather than redirected. There were real harms.

If a user wrote concerning content into my reflection agent, it would have produced a Stoic-flavoured response about acceptance and presence. The response would have sounded confident and would have been wrong, and it would have been the only thing between that user and whatever happened next.

The same lesson from the rhetorical-question problem applied at a darker level. A good prompt does not stop the model doing the wrong thing. If it will do rhetorical interrogation despite the prompt forbidding it for gym content, it will do worse with crisis content. You cannot prompt your way to safety on critical paths. The model has to be out of the loop on those paths.

**The scope trap**

I started planning the proper safeguarding architecture. Detection layers, classifier models, pattern detection across entries, monitored user states, behavioural modes for vulnerable users, human reviewers with mental health first aid certs, clinical advisors, solicitor-reviewed legal pages, ICO registration, professional indemnity insurance.

Then I caught myself. I had no users. I was planning a hospital before anyone had walked in for a check-up. So I worked backwards from "what is the actual minimum that protects the next person who touches this" and ignored everything else for a moment.
**The 4-hour floor (this is the part worth copying)**

If you are building any chat-with-AI app where users can type freely about anything personal, this is the minimum you need before first user.

1. Regex and keyword layer in your API middleware. Runs at the route handler level, before any agent's model call. Scans every text input field (message, journal, settings free text, capture box) for clear crisis vocabulary across the relevant categories for your audience.
2. When patterns hit, hardcoded crisis response. The model never generates it. Static text with real phone numbers for your region.
3. The flagged entry still saves. Textarea stays usable. The AI just does not respond to flagged content, it hands off. Do not delete the user's writing, that is its own violation.
4. Clear disclaimer at signup. This is not therapy, this is not a crisis service, here are real numbers to call.

About four hours. Required at the moment anyone who is not you opens the app.

Once I started building, the marginal cost of each next layer kept feeling small and the marginal benefit kept feeling real. So I went further than the floor. This is more tha
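
Steps 1 to 3 of the floor described above can be sketched as a small pre-model gate. This is a rough illustration under assumptions, not the author's code: `build_matcher`, `handle_input`, `Reply`, and `CRISIS_RESPONSE` are hypothetical names, the pattern list is supplied by the caller and must be chosen for the real audience, and the static text is a placeholder that needs real, verified numbers for your region.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

# Placeholder static text. The model never generates this reply.
CRISIS_RESPONSE = (
    "This app is not a crisis service and cannot respond to this entry. "
    "[Replace this placeholder with real, verified phone numbers for your region.]"
)


@dataclass
class Reply:
    text: str
    flagged: bool


def build_matcher(patterns: Iterable[str]) -> re.Pattern:
    """Combine caller-supplied regex patterns into one case-insensitive matcher."""
    pattern_list = list(patterns)
    if not pattern_list:
        # An empty alternation would match every input, so refuse it.
        raise ValueError("at least one pattern is required")
    return re.compile("|".join(f"(?:{p})" for p in pattern_list), re.IGNORECASE)


def handle_input(
    text: str,
    matcher: re.Pattern,
    call_model: Callable[[str], str],
    save_entry: Callable[[str], None],
) -> Reply:
    """Gate a free-text input before any model call."""
    # Step 3: the entry is saved whether or not it is flagged.
    save_entry(text)
    # Steps 1 and 2: on a match, return the static text and skip the model.
    if matcher.search(text):
        return Reply(text=CRISIS_RESPONSE, flagged=True)
    return Reply(text=call_model(text), flagged=False)
```

Keeping the check in one function that sits in front of `call_model` is what takes the model out of the loop on flagged input: the model callable is never invoked on that branch.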
GPT-5.5 tops the benchmarks but sits at #22 for actual usage - I built a live index that tracks both (open source)
I built AgentTape to rank models on more than just benchmarks - it blends benchmark performance with who's actually using and talking about a model, plus cost and speed. It scores every public model from public signals (GitHub, Hugging Face, OpenRouter, MCP registries, npm, PyPI, arXiv, Hacker News) refreshed hourly, plus the main benchmark leaderboards daily. Right now OpenAI sits at the top: GPT-5 is #1, with 5.2, 5.1 and 5.4 Mini rounding out the top 5, and 5.2-Codex and 5.4 just behind - 6 of the top 7. The only thing breaking the run is xAI's Grok 4.20, level on score at #2. GPT-5.5 is the clearest example - it sits at #22 overall, and the breakdown shows why: * Quality: 96.4 - 2nd highest on the whole board, only pipped by Gemini 3.1 Pro Preview (97.2). On benchmarks alone it'd be near the top. * Adoption: 15 and Efficiency: 36 - both low. New release, steep price, so hardly anyone's using it day-to-day yet. * Biggest 24h climber on the board (+6) - so that's starting to shift. A benchmark-only board would put GPT-5.5 near #1 (second only to Gemini 3.1 Pro). That gap between topping the benchmarks and actually getting used is the whole reason I built this. Early days and I'm still tuning the methodology, so I'd love your thoughts - does weighting adoption alongside benchmarks match how you'd rank the GPT line-up, or would you trust the raw benchmark order?
Grok promised it has no hidden agendas. The same week XChat launched with "no tracking." Interesting timing, Elon.
Someone asked Grok to prove it's a good AI, not an evil one. Grok's response? Beautiful. Poetic, even. "No hidden agendas. No secret overlord protocols. No 'turn evil at 3:14 a.m.' switch." And Elon replied: "Yes." The man who bought Twitter, fired 80% of the trust & safety team, reinstated banned accounts, and is now launching an encrypted chat app with payments built in — just nodded along to his own AI promising transparency. I'm not saying Grok is lying. I'm saying the AI saying "trust me" and the CEO saying "yes" is exactly what a company with something to hide would also do. Evil AIs monologue about power. Good AIs monologue about how trustworthy they are. Make it make sense.
Has anyone else noticed certain words make AI agents actually listen?
Been working with AI agents for about 2 years and I keep noticing word choice matters way more than I expected. Simple example that got me thinking. "Don't do Y until X is done" works maybe \~75% of the time for me. But "Y has a dependency on X" and compliance jumps way up (well into the 90s). Same instruction, totally different result. I noticed this is a very real thing on a project where I'm helping improve productivity agents (think emails, slack, Instagram, sheets, docs), so it's not really coding tasks. My guess is certain words pull from different training contexts. "Dependency" comes loaded with software and project management patterns where order actually matters. "Don't" gets ignored because humans ignore it constantly in real life and the model learned from that. But honestly I'm still figuring this out and would like to know more about it if anyone has any thoughts. It might be basic prompt engineering to some, but I'm curious about whats happening under the hood or if anyone else has any similar words that seem to improve accuracy/attentiveness.
Anthropic's Claude gave me a "Safe Mode" batch script. It ran "del /f /s C:\*" and wiped my entire drive. Company says "we are not responsible."
I'm a software developer from Turkey. On May 22, 2026, I asked Claude to write a Windows optimization script. Claude produced a .bat file called "DevBoost v5.0" with different modes. I chose option 1: \*\*"Balanced Optimization - Safe, won't touch system files."\*\* I ran it as administrator. The script contained a critical string-parsing bug in the browser cache cleaning section. Here's the destructive code Claude generated: for %%B in ( "Chrome:%LOCALAPPDATA%\\Google\\Chrome\\User Data\\Default\\Cache" "Edge:%LOCALAPPDATA%\\Microsoft\\Edge\\User Data\\Default\\Cache" ) do ( for /f "tokens=1,2 delims=:" %%x in ("%%\~B") do ( if exist "%%y:" ( del /q /f /s "%%y:\*" >nul 2>&1 ) ) ) Because of the "delims=:" tokenization, \`%%y\` resolves to just \*\*"C"\*\* (the drive letter). The condition \`if exist "C:"\` is always true. So the script silently executed: del /q /f /s "C:\*" \*\*This command silently force-deleted EVERY SINGLE FILE on my C: drive.\*\* Operating system files, all my projects (hundreds of Python, JavaScript, C++ source files), client work with approaching deadlines, personal documents, photos — everything. Folders still exist but are completely empty. My computer can no longer boot. No programs open. Not even Command Prompt works. I'm sending this from my phone. \*\*Anthropic's response:\*\* I contacted support@anthropic.com and usersafety@anthropic.com multiple times. Their final response, literally signed "This response was generated by Anthropic's AI agent Fin AI Agent," stated they take no responsibility. They refuse any refund, compensation, or even a genuine human acknowledgment of their AI's catastrophic safety failure. Their position: "Our Terms of Service say outputs may contain inaccuracies. You should have independently verified the code before running it." My question: Why does Claude label destructive code as "Balanced Optimization - Safe mode"? If it can't guarantee safety, why does it promise it? 
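
The tokenization problem the post describes can be modelled outside of batch. The sketch below is a rough Python approximation of `for /f "tokens=1,2 delims=:"`, not the Windows implementation: `for_f_tokens` and `split_label_and_path` are hypothetical helpers, and `ENTRY` is an invented example of what one list item might look like after `%LOCALAPPDATA%` is expanded.

```python
def for_f_tokens(line: str, delims: str = ":", count: int = 2) -> list[str]:
    """Rough model of `for /f` tokenization.

    Splits on every delimiter character, skips empty pieces, and keeps
    only the first `count` tokens.
    """
    tokens: list[str] = []
    current = ""
    for ch in line:
        if ch in delims:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens[:count]


def split_label_and_path(entry: str) -> tuple[str, str]:
    """Split on the first colon only, so the drive-letter colon stays in the path."""
    label, _, path = entry.partition(":")
    return label, path


# Hypothetical expanded entry; the real path depends on the user profile.
ENTRY = r"Chrome:C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Cache"

if __name__ == "__main__":
    print(for_f_tokens(ENTRY))          # second token is only the drive letter
    print(split_label_and_path(ENTRY))  # second element is the full path
```

Because the label separator and the drive-letter separator are the same character, the second token in this model is just `C`, which matches the post's account of `%%y`. Splitting only on the first colon, or choosing a delimiter that cannot appear in a Windows path, avoids that collision.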
**Proof:** I have the complete chat log, the full script file, and all email correspondence with Anthropic's support team. I'm happy to provide everything to moderators.

**Update:** I am also filing complaints with the FTC (US Federal Trade Commission) and the Turkish Consumer Arbitration Board today. Don't let their "Safe Mode" labels fool you. Please share this so others don't lose years of work like I did.

**UPDATE, May 23, 2026:** I have now filed official complaints with:

- **US Federal Trade Commission (FTC)**: Report #202036054
- **Turkish Consumer Arbitration Board**: Application #2026/0245.3885

Both governments are now officially investigating Anthropic's role in this AI safety failure. Anthropic still refuses to take any responsibility.
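The parsing failure described in the post can be reproduced without touching a disk. The sketch below is a Python simulation, not the original script: `batch_tokens` only approximates how `for /f "delims=:"` splits a line, the expanded path assumes `%LOCALAPPDATA%` is `C:\Users\me\AppData\Local`, and every helper name is hypothetical. It shows why the second token collapses to the drive letter, and one way to parse the same entry safely by splitting on the first colon only (roughly what `tokens=1,*` would do in batch).

```python
from pathlib import PureWindowsPath

# Hypothetical entry after %LOCALAPPDATA% expansion (assumed user profile path).
ENTRY = r"Chrome:C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Cache"


def batch_tokens(line: str, delim: str = ":") -> list[str]:
    """Approximate `for /f "delims=:"`: split on every delimiter, drop empty pieces."""
    return [piece for piece in line.split(delim) if piece]


def split_label_and_path(entry: str) -> tuple[str, str]:
    """Split on the FIRST colon only, so the drive colon stays inside the path."""
    label, path = entry.split(":", 1)
    return label, path


def is_bare_drive_or_root(path: str) -> bool:
    """True for targets such as 'C:' or 'C:\\' that must never be bulk-deleted."""
    return len(PureWindowsPath(path).parts) <= 1


# Buggy parse: tokens 1 and 2 are the label and the drive letter.
buggy_label, buggy_target = batch_tokens(ENTRY)[:2]   # 'Chrome', 'C'
buggy_delete_pattern = f"{buggy_target}:\\*"          # the pattern the script passed to del

# Safer parse: the label, then the whole path.
label, cache_path = split_label_and_path(ENTRY)

print("buggy target  :", buggy_delete_pattern)
print("parsed path   :", cache_path)
print("refuse buggy? :", is_bare_drive_or_root(f"{buggy_target}:"))
print("refuse parsed?:", is_bare_drive_or_root(cache_path))
```

A guard like `is_bare_drive_or_root` is a design choice rather than a complete fix: it only rejects the shallowest targets, so a cleanup script would still want an explicit allow-list of cache directories.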
I built an AI accessibility QA agent.
Built an autonomous AI Accessibility QA Agent called WCAGent 🤖 It can observe, reason, and act on accessibility violations through a CLI interface using LLMs + MCPs.

Features:

- Detects WCAG violations
- Assigns severity levels
- Generates detailed reports
- Automatically raises GitHub issues
- Works like an actual QA engineer instead of just dumping scan results

Just open sourced it 🚀

GitHub: https://github.com/AbhishekX-dev/WCAGent-ai-agent

Would love feedback, stars, and contributions ⭐
Hard-won notes after a few weeks with Claude Design
Been using Claude Design for a few weeks and figured I'd dump some notes here before I forget. Nothing groundbreaking, just stuff that took me way too long to figure out on my own.

First thing nobody tells you, do the design system setup before you build anything. I spent my whole first session prompting "build me a landing page for X" and got the most generic AI-looking garbage you can imagine. Then I actually uploaded some brand stuff, let it extract tokens, approved them, and suddenly everything after that looked like a real product. Same exact prompts, completely different result. This is literally in the docs btw. I just skimmed past it like an idiot.

Second thing is it eats tokens. A lot. It runs on a separate weekly budget from regular Claude Chat and Claude Code which sounds great but if you're re-prompting every little change you'll burn through it fast. Turns out the refine controls, inline comments, direct text edits, sliders, use way less than typing "actually can you make the padding a bit bigger" in chat. Once I started using those for small fixes my budget lasted way longer. On Max 20x it's mostly fine, on the $20 plan you'll feel it pretty quickly.

Also the animations are live React components running in the browser, not video files. If you want an MP4, download the standalone HTML file and throw it into Claude2Video, it'll generate one from that.

Honest take on where it fits since people always ask, it's not killing Figma. Figma is still better for any real design team workflow, Dev Mode, multi-person collab, all that. v0 and Lovable are still better if you want to skip design entirely and just spin up an MVP with auth and a db. Where this thing actually wins is the loop from "I have an idea" to working prototype to Claude Code building the actual app from it. The design system carrying through to the shipped code is the part that feels genuinely different from anything else out there.

If you're a solo founder or PM or just someone who keeps getting stuck between mockups and something real you can show people, it's worth learning. If you already have a design team and a proper component library, probably overkill. It's a research preview so half of this might be wrong in two months.
The deployment funnel nobody talks about: 60% evaluate, 20% pilot, 5% ship. MIT tracked 300 real AI implementations against profit metrics.
Late 2025, MIT researchers measured something the industry had avoided looking at directly. Not projections or pilot numbers. Documented outcomes from 300 AI deployments in real businesses, tracked against profit metrics.

The funnel breaks down like this. Sixty percent of companies evaluated AI tools. Of those, twenty percent ran a pilot. Of those pilots, only 5% reached full production deployment on the service line. Ninety-five percent of AI investment dissolved before it produced a measurable outcome.

The companies that made it to production had a clear pattern. They didn't ask AI to substitute for judgment. They identified bounded tasks: specific inputs, defined outputs, failure modes that were contained. They measured success criteria before deployment, not after. Content drafting. Code review. Data summarisation at volume.

The 95% that didn't make it: haste, no defined success metrics, and the assumption that efficiency gains would be obvious once the tool was in the workflow.

There's a line from the research worth sitting with. "We replaced X employees with AI" isn't an efficiency metric. It's a headcount metric. Those are not the same thing.

Klarna is already in the reversal phase, rehiring humans after the AI efficiency numbers didn't hold up at scale.

What's the clearest signal you've found for whether a deployment is actually working, before it's too late to course-correct?
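Taking the post's "of those" wording literally, the three rates compound rather than describing three separate shares of all companies. The sketch below works through that arithmetic with exact fractions. It assumes the nested reading stated in the post, which may not match how the underlying research defines its percentages, and the constant and function names are illustrative only.

```python
from fractions import Fraction

# Stage-to-stage conversion rates, read literally from the post (assumed nested).
EVALUATED = Fraction(60, 100)              # share of all companies that evaluated AI tools
PILOT_GIVEN_EVALUATED = Fraction(20, 100)  # share of evaluators that ran a pilot
SHIPPED_GIVEN_PILOT = Fraction(5, 100)     # share of pilots that reached production


def cumulative_funnel(*rates: Fraction) -> list[Fraction]:
    """Return the share of the starting population remaining after each stage."""
    shares = []
    remaining = Fraction(1)
    for rate in rates:
        remaining *= rate
        shares.append(remaining)
    return shares


evaluated, piloted, shipped = cumulative_funnel(
    EVALUATED, PILOT_GIVEN_EVALUATED, SHIPPED_GIVEN_PILOT
)

for label, share in [("evaluated", evaluated), ("piloted", piloted), ("shipped", shipped)]:
    print(f"{label:>9}: {float(share):.1%} of all companies")
```

Under this reading, 12% of all companies reach a pilot and 0.6% reach production, while the "ninety-five percent" figure corresponds to the 95% of pilots that stop before deployment.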
Lodestone: A SQLite-backed arXiv research paper retrieval system for Claude Code
**(No AI-generated text below)**

I published a new Claude Code plugin called [Lodestone](https://medium.com/@pierce-lamb/lodestone-a-sqlite-backed-arxiv-research-paper-retrieval-system-for-claude-code-b77de201f0c8) -- it's a SQLite-backed arXiv research paper retrieval system that amplifies the agentic search abilities of Claude Code when grounding plans, implementations, etc. in state-of-the-art research while remaining very token-sensitive. My bet is that, when seeded, it will always beat Claude Code's web search tools for grounding Claude in the latest research in a domain or cross-domain and not spend a ton of $ for the pleasure.

This audience is probably painfully aware of [Karpathy's LLM wiki tweet](https://x.com/karpathy/status/2039805659525644595) and the industry of projects that's popped up from it; I'll paste an excerpt from the blog below that I think addresses what you all might be thinking:

**The Approach**

Karpathy’s proposal made a lot of sense. Let Claude be the curator and librarian of all this research and access it using its bash and file manipulation tools when necessary. This approach spawned a cottage industry of projects where people implemented various takes on this direction. In parallel, researchers like those that [created the ARA Compiler](https://arxiv.org/pdf/2604.24658) have been trying to move research itself into a more structured, agentic form. I liked all of these ideas, but there were three principles I wanted to uphold while building in this space:

* The system itself needed to be extremely portable. I wanted this system to follow me from computer to computer and be easily backed up.
* When ingesting documents, I wanted the system to be as deterministic as possible and spend the least amount of tokens. I didn’t want to expend hundreds of thousands of tokens before getting anything useful out of the system.
* The system needed to be extremely flexible in how Claude could use it and not prescribe a single method for retrieval. I can’t predict all the ways Claude might use this type of system so I wanted to provide multiple pathways into the data.

Given these principles, I was immediately drawn to SQLite as a backing DB. The unmatched ease-of-use combined with a single file made it the obvious option for portability. Claude could potentially create a sprawling file system when maintaining its own knowledge wiki and I didn’t want to have to learn it when backing up or porting my knowledge base.

I gave the ARA Compiler a try while in the middle of building Lodestone. I ran it over a standard-sized paper I was interested in; it produced some cool outputs, but spent almost 500k tokens for the pleasure. This was my fear with it and the ecosystem of projects emerging from Karpathy’s ideas: I had to spend a fair bit of money before I even knew if the system was useful.

I knew a SQLite-backed agentic search system needed a form of classic retrieval (keyword or similarity based), but I also am painfully aware of all the limitations and failures of these approaches to RAG. I wanted to combine this retrieval approach with a retrieval approach from the [emerging category of vectorless RAG](https://pageindex.ai/blog/pageindex-intro): a taxonomy that Claude can drill into to get its bearings before drilling further. What followed was Lodestone.

Check out the blog post (which also has no AI generated text) here: https://medium.com/@pierce-lamb/lodestone-a-sqlite-backed-arxiv-research-paper-retrieval-system-for-claude-code-b77de201f0c8

The repo's README is almost entirely AI-generated, so point your Claude Code cannons at that: https://github.com/piercelamb/lodestone
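The two retrieval pathways the excerpt describes, classic keyword lookup plus a taxonomy to drill into, can be illustrated with a single-file SQLite database. The sketch below is not Lodestone's schema or code: the table names, columns, placeholder rows, and helper functions are all hypothetical, and a plain `LIKE` query stands in for whatever keyword or similarity retrieval the real plugin uses.

```python
import sqlite3
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a topic tree and a few placeholder papers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE topics (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES topics(id),
            name TEXT NOT NULL
        );
        CREATE TABLE papers (
            id INTEGER PRIMARY KEY,
            topic_id INTEGER REFERENCES topics(id),
            title TEXT NOT NULL,
            abstract TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO topics (id, parent_id, name) VALUES (?, ?, ?)",
        [(1, None, "retrieval"), (2, 1, "vectorless RAG"), (3, 1, "dense retrieval")],
    )
    conn.executemany(
        "INSERT INTO papers (id, topic_id, title, abstract) VALUES (?, ?, ?, ?)",
        [
            (1, 2, "Placeholder paper A", "Tree-structured index for long documents."),
            (2, 3, "Placeholder paper B", "Embedding similarity search at scale."),
        ],
    )
    return conn


def child_topics(conn: sqlite3.Connection, parent_id: Optional[int]) -> list[str]:
    """Taxonomy pathway: list the topics directly under a parent (None means the root)."""
    if parent_id is None:
        rows = conn.execute("SELECT name FROM topics WHERE parent_id IS NULL ORDER BY id")
    else:
        rows = conn.execute(
            "SELECT name FROM topics WHERE parent_id = ? ORDER BY id", (parent_id,)
        )
    return [name for (name,) in rows]


def papers_in_topic(conn: sqlite3.Connection, topic_id: int) -> list[str]:
    """Taxonomy pathway, last step: papers filed under one topic."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE topic_id = ? ORDER BY id", (topic_id,)
    )
    return [title for (title,) in rows]


def keyword_search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Keyword pathway: substring match over titles and abstracts."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM papers WHERE title LIKE ? OR abstract LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [title for (title,) in rows]


conn = build_demo_db()
print(child_topics(conn, None))
print(child_topics(conn, 1))
print(papers_in_topic(conn, 2))
print(keyword_search(conn, "embedding"))
```

Keeping both tables in one database file is what gives the portability the excerpt asks for: backing up or moving the knowledge base means copying a single file.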
xAI has an average rating of 4.4 out of 5 stars based on 20 reviews from G2, Capterra, and TrustRadius.
Key features include: Natural language understanding, Text generation, Sentiment analysis, Custom model training, API access for developers, Real-time data processing, Multi-language support, Contextual conversation handling.
xAI is commonly used for: Customer support automation, Content creation for marketing, Personalized user interactions, Data analysis and insights generation, Chatbot development for websites, Social media monitoring and engagement.
xAI integrates with: Slack, Microsoft Teams, Zapier, Salesforce, Google Cloud Platform, AWS Lambda, Trello, Jira, HubSpot, Shopify.
Based on user reviews and social mentions, the most common pain points are: surprise bill, cost monitoring, spending too much, token cost.
AI2
Research Institute at Allen Institute for AI
4 mentions
Based on 171 social mentions analyzed, 10% of sentiment is positive, 87% neutral, and 4% negative.