StackAI empowers enterprises to deploy AI Agents at scale. Build secure, compliant AI applications in minutes with our intuitive drag-and-drop no-code
Stack AI has been discussed in social mentions concerning advanced AI functionalities, including voice agents and the development of sophisticated agent protocols. However, users shared significant concerns about costly billing anomalies and the software's tendency to deviate from expected operations or provide unreliable output. The sentiment around pricing suggests a level of unpredictability in managing costs, leading to financial strain for some users. Overall, Stack AI seems to stir curiosity for its innovative potential, but users are wary of operational reliability and cost management.
Mentions (30d): 82 (34 this week)
Reviews: 0
Platforms: 2
Sentiment: 10% (14 positive)
Industry: information technology & services
Employees: 76
Funding Stage: Series A
Total Funding: $19.1M
Banned by OpenAI after reporting a live credential hijack. They admitted in writing my account was broken. Here are 7 months of forensic receipts and 20+ cases.
[Drive Link for Zipped Proof](https://drive.google.com/file/d/1qU_LyLY-JMhNR_bqOV1-a2RJAbplL68e/view?usp=drivesdk) I am a developer and have been a paying, long-term ChatGPT subscriber since January 2025. I build complex local-first, sovereign systems. My workflows are incredibly context-heavy, with large files spanning code, research reports, and other analysis. I do not (or rather did not, since the platform has been non-functional since November 2025 while customer support auto-closes tickets even as it admits I am having platform issues) use this platform for casual queries. As a solo developer with no formal "team", ChatGPT was one of my reliable collaboration hubs for making sure I was developing those complex systems properly. I feed it massive codebases for systems analysis and to surface insights I may have missed. My manual code uploads and token inputs routinely exceed the model's output volume by a massive margin. I do not abuse this platform. That would actually be impossible, because the very features advertised under the paid subscription do not work. I am exactly the type of user this platform was built for, and I have been a continuous, paying ChatGPT Plus subscriber since January 2025. Since October 2025, my workspace has been systematically breaking, and in November 2025 it degraded completely. This was not an occasional glitch. Persistent memory modules stopped updating. Custom instructions were ignored by the models. Project files failed to load. Custom instructions, personalization features, connectors, the file tool, even Projects do not work. It was a continuous degradation ending in total failure. OpenAI customer service even admitted as much, and yet months later I have talked to nothing but bots: not only LLMs acting as customer service, but even instances of bots falsely identifying as human support. It was a state of rolling degradation across the entire paid tier, month after month. 
Meanwhile, OpenAI has freely enhanced its business and enterprise tiers. I have not just complained to standard support. I ran cross-platform diagnostics and collected failure logs. I even documented the exact replication steps and gave them to OpenAI customer support, only to be met with acknowledgements of the degradation and no resolution. I handed OpenAI support a completely packaged technical breakdown of their failing infrastructure across 20 separate support tickets over a 7-month period. I did their QA work for free. And I have the receipts to prove it. I am attaching the screenshots and the exact email files to this post. In Case 06830839, OpenAI Support explicitly put this in writing: "We acknowledge that you have been experiencing persistent technical issues affecting several features of your ChatGPT subscription, including tools, memory functions, personalization settings, connectors, and project files... We also understand your concern that communication on the case stopped after you provided detailed evidence..." Read that again. They acknowledged in writing that my account was fundamentally broken. They acknowledged that their own team ghosted me after I handed them the diagnostic proof. Yet they kept charging my card every single month for a product they knew was failing. The Hijack Escalation: Two days ago, the situation escalated from a broken product to a severe security incident. I was monitoring my environment and watched my Codex rate limits drop in 10 percent chunks across 2 separate sessions on a fresh boot of the desktop app. This happened twice inside a 10-minute window. I had zero active sessions running. There was zero usage on my end. My account token was being actively drained by an unauthorized third-party exploit. I immediately opened an emergency unauthorized activity report under Case 09113391 to notify them of the hack. 
Their response was to reframe the problem entirely as a dispute over fraudulent activity, doing damage control and altering the record. The Reframe Attempts: Instead of investigating the breach, OpenAI support deliberately twisted the record. They deliberately reframed my security report as an "appeal for fraud." They manipulated the ticket classification to make it look like I had been flagged for fraud and was begging for an appeal, rather than a developer reporting a live exploit on their infrastructure. They ignored the active threat their own platform was exposing. They did not lock the token. They did not roll my API keys. They did absolutely nothing to secure a compromised paying user other than shift the blame. Fast forward to this morning: their automated Trust and Safety system swept the high-volume traffic from the attacker, scored it as a malicious exploit originating from my account, and deactivated/banned me for "Cyber Abuse." All the while, they actively prevented ChatGPT models from helping me diagnose and trace the infiltration. They locked the doors and blamed the homeowner for the
Pricing found: $0, $0/month, $0, $0, $0
AI solves 80-year-old math conjecture for under $1000
GPT-next solved an 80-year-old Erdős combinatorics conjecture for under $1,000 in compute. That single fact reframes everything else happening this week. The [Erdős unit distance problem](https://www.latent.space/p/ainews-openai-gpt-next-disproves) had resisted human mathematicians since 1946. A frontier model closed it at a cost lower than a mid-tier SaaS subscription, which means the boundary between "AI as tool" and "AI as independent discoverer" is no longer theoretical. [Lilian Weng's new deep dive](https://lilianweng.github.io/posts/2025-05-01-thinking/) on test-time compute and chain-of-thought reasoning explains the underlying mechanism: reasoning models are not retrieving known proofs, they are generating novel inference chains at scale. The infrastructure layer is pricing this in faster than most observers realize. [Railway reports $200K+ monthly coding agent spend](https://www.latent.space/p/railway) and 100K signups per week, and is now building data centers on its own metal to absorb the load. Daytona hit 850K daily sandbox runs with 74% month-over-month growth, confirming that isolated compute environments are now a first-class primitive, not a niche DevOps concern. Three specialized infrastructure companies, Exa, Modal, and TurboPuffer, reached unicorn valuations simultaneously this week, covering retrieval, serverless GPU, and vector search. When picks-and-shovels companies price in sustained demand at the same moment, it is not a coincidence. Every major lab has now repositioned as an agent lab, not a model lab. ClickUp, which is [replacing hundreds of employees with thousands of AI agents](https://techcrunch.com/2026/05/25/what-clickups-mass-layoff-tells-us-about-the-future-of-work/), is the first established tech company to execute that repositioning at the labor level rather than just the product level. 
The counterweight is that [Salesforce customers remain locked in](https://www.theregister.com/saas/2026/05/26/the-saas-pocalypse-can-wait-salesforce-still-has-customers-where-it-wants-them/5245228) despite the theoretical ability to rebuild on AI-native stacks cheaply. Data gravity and switching costs are buying incumbents time, but ClickUp's move suggests that time is measured in quarters, not years. The governance conversation caught up this week in an unexpected place. [Pope Leo XIV's 42,000-word encyclical](https://simonwillison.net/2026/May/25/encyclical-on-ai/#atom-everything) names specific failure modes including algorithmic control, surveillance capitalism, and autonomous weapons, and will directly shape EU and Latin American regulatory debates. [TechCrunch's read](https://techcrunch.com/2026/05/25/the-popes-ai-encyclical-isnt-really-about-ai/) is that the document's real target is the tech elite's capacity to reshape society outside democratic accountability, a framing that lands harder alongside [new UK research](https://www.theregister.com/off-prem/2026/05/26/big-tech-extracts-retirement-scale-wealth-from-uk-internet-users-research-shows/5246048) quantifying data extraction from consumers as equivalent in value to retirement savings. The Vatican and the empiricists arrived at the same diagnosis from opposite directions. Two structural forces will shape AI infrastructure economics over the next 90 days in ways most deployment teams are not modeling. China flooding global markets with DRAM and NAND will compress inference cluster costs faster than US export controls intended. The EU's sovereign cloud setback has paradoxically clarified the build-domestic mandate, accelerating European AI infrastructure investment independent of US hyperscalers. Security remains the open variable: even Google has no established playbook for prompt injection, model supply chain risk, or agentic authorization at production scale. 
A second Fortune 500 company will publicly attribute a reduction of more than 500 knowledge-worker roles directly to agentic AI systems before Q3 earnings season, making ClickUp's announcement the start of a visible series rather than an isolated case.
AI Infrastructure Has a Physical Weak Spot Nobody Talks About Enough - Copper Supply Shocks
Something interesting happened this week that barely crossed into mainstream AI discussion. A strong earthquake in Chile disrupted copper ore production and pushed copper prices higher again. Chile matters because it produces roughly 24% of the world’s copper supply, and a huge part of global AI infrastructure indirectly depends on that metal.

That connection is becoming impossible to ignore. Everyone talks about GPUs, compute scaling, inference costs, and power demand. But very few people talk about the raw materials underneath the entire AI stack.

Copper is everywhere inside AI infrastructure:

* data center power systems
* transformers
* cooling systems
* switchgear
* high-voltage cabling
* backup energy systems
* grid expansion
* GPU interconnect infrastructure

A single hyperscale AI data center can reportedly consume tens of thousands of tonnes of copper depending on scale and power architecture. At the same time, global copper supply is getting tighter:

* new mines can take 15-20+ years to develop
* major deposits are aging
* permitting remains difficult globally
* geopolitical risk keeps increasing
* now even earthquakes are disrupting supply chains

This is where the story becomes interesting from an AI perspective. AI demand growth is exponential. Copper supply growth is not. That mismatch is why more people are suddenly watching early-stage copper exploration companies again.

One example is NovaRed Mining Inc. and its Wilmac Copper-Gold Project in British Columbia. Not because it is producing copper today - it is not. But because markets are starting to realize future AI infrastructure may require entirely new copper discoveries. 
Some interesting details about Wilmac:

* 16,078 hectares in BC’s Quesnel porphyry belt
* located near Hudbay’s Copper Mountain Mine
* soil results up to 1,125 ppm copper
* interpreted intrusive centers identified
* recent IP/AMT geophysics added deeper targeting data
* company also pushing an AI-assisted targeting platform called MetalCore

The bigger point is not "this stock goes up." The bigger point is that AI is no longer just a software story. It is becoming a materials story. And every supply disruption - whether geopolitical, regulatory, or seismic - reminds the market that physical infrastructure still matters. The AI boom may eventually depend just as much on copper supply chains as on semiconductor innovation itself. NFA.
Spec: Version Control for AI Agent Intent
AI agents are getting good at writing code. That is not the hard problem anymore. The hard problem is coordination. When you have multiple agents working on the same codebase, who decides what gets built? How do two agents with conflicting opinions resolve a disagreement? How does a human stay in control without reviewing every line before it gets written?

Git does not solve this. Git is brilliant at tracking what changed, when, and by whom. But it operates on code that has already been written. By the time a conflict shows up in Git, two agents have already done the work, made assumptions, and written implementations that may be fundamentally incompatible, not at the line level, but at the intent level.

I wanted to solve the problem one layer up. Before the code.

The Core Idea

Every code file in a Spec project has a paired .spec file living right next to it:

app/Http/Controllers/HomeController.php
app/Http/Controllers/HomeController.php.spec

The .spec file is a plain Markdown description of what the code file is supposed to do. It is the source of truth for intent. Agents do not write code directly; they write proposals against the spec. The code only gets written once every agent has explicitly agreed on what it should do.

The spec is never “checked out.” It has one canonical state at any moment. Agents read it, propose changes to it, and debate those proposals. When all agents agree, the session locks, the spec is updated, and only then does an implementer generate the code. Code is always the output of consensus. Never the battleground.

The Flow

A typical session looks like this: An agent reads the current spec and submits a proposal with reasoning attached. Not just what they want to change, but why. A second agent reads the proposal and responds: accepting it, rejecting it with specific objections, or suggesting modifications. If they get stuck, a mediator surfaces the contradiction and helps them find common ground. 
The mediator has no vote and no authority; it just asks better questions. When every agent has explicitly agreed on the same spec state, the session locks. An implementer reads the locked spec and writes the code. One pass. From a fully agreed specification.

This means a few things that feel unusual at first:

A build is never produced from a broken or partial spec. If agents cannot agree, nothing gets built. That is a feature, not a bug: better to surface the disagreement at the intent level than to discover it six files deep in an implementation.

Conflicts in Spec are semantic, not syntactic. Two agents can touch completely different parts of a spec and still be contradictory. One says the controller should cache responses for 60 seconds. The other says it should always fetch fresh data. No line conflict. Completely incompatible intent. Spec is designed to catch this before a line of code is written.

Every message carries reasoning. Proposals alone are not enough. The full session log, with reasoning trails, is what keeps the human comfortable staying hands-off.

The Human Role

The human operates at what I call a god level. You provide the original request. You can observe at any granularity: project, session, agent, or individual message. You can intervene at any point: rewrite the spec, stop a session, override an agent, shut the whole thing down. And critically, every intervention you make becomes a lesson, captured with full provenance and fed back into future sessions so the system learns from it.

The goal is not to remove the human from the loop. It is to move the human up the stack. Mission commander, not task manager. You set the intent. The agents work out the details. You intervene when they get it wrong, and the system gets smarter from each intervention.

The Technical Details

Spec is built in Rust. Three dependencies: serde, serde_json, and tokio. LLM calls go over raw HTTP via curl, with no SDKs. The provider layer is deliberately abstract. 
Agents, the mediator, and the implementer all talk to the same interface. Swap the provider in config and nothing else changes. Different agents can run on different models. You can run fully local with Ollama for cost control or privacy.

Agent identity is explicit. You set SPEC_AGENT_ID before running commands. Without it, Spec errors with a clear message. This is intentional: the system cannot coordinate identity automatically, and a silent fallback to hostname:pid would make consensus unreachable in practice.

The lesson graph lives at:

~/.spec/lessons.json

It lives outside the repo entirely. Lessons accumulate across all projects and branches. Check out an old branch and you do not lose what the system has learned. Lessons are knowledge about how your agents work, not knowledge about any particular codebase.

A hook system lets you plug in your own behavior at defined lifecycle points:

• post-agree: fires when a session locks
• post-build: fires after code is written
• pre-release: fires be
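Spec itself is written in Rust and its source is not shown here, so what follows is only a minimal Python sketch of three behaviours the post describes: the sibling `.spec` path, refusing to run without `SPEC_AGENT_ID`, and locking a session only when every agent has approved the same spec state. The names `spec_path_for`, `resolve_agent_id`, `MissingAgentId`, and `Session`, and the choice to compare spec states by SHA-256 digest, are hypothetical.

```python
import hashlib
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


class MissingAgentId(RuntimeError):
    """Raised when SPEC_AGENT_ID is unset; there is deliberately no hostname:pid fallback."""


def spec_path_for(code_path: str) -> Path:
    # Each code file is paired with a sibling "<file name>.spec" file.
    p = Path(code_path)
    return p.with_name(p.name + ".spec")


def resolve_agent_id(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    agent_id = env.get("SPEC_AGENT_ID", "").strip()
    if not agent_id:
        raise MissingAgentId("SPEC_AGENT_ID is not set; refusing to guess an identity")
    return agent_id


class Session:
    """Hypothetical consensus session: it locks only when every agent has
    approved exactly the same spec text (compared by SHA-256 digest)."""

    def __init__(self, agents: Iterable[str]):
        self.agents = set(agents)
        self.approvals: dict[str, str] = {}  # agent id -> digest of the approved spec
        self.locked_spec: Optional[str] = None

    def approve(self, agent_id: str, spec_text: str) -> bool:
        if self.locked_spec is not None:
            raise RuntimeError("session is already locked")
        if agent_id not in self.agents:
            raise KeyError(f"unknown agent: {agent_id}")
        self.approvals[agent_id] = hashlib.sha256(spec_text.encode("utf-8")).hexdigest()
        everyone_voted = set(self.approvals) == self.agents
        same_state = len(set(self.approvals.values())) == 1
        if everyone_voted and same_state:
            # Only now would an implementer be allowed to generate code from the spec.
            self.locked_spec = spec_text
        return self.locked_spec is not None


if __name__ == "__main__":
    session = Session(["alice", "bob"])
    print(session.approve("alice", "Cache responses for 60 seconds."))  # False: bob has not voted
    print(session.approve("bob", "Always fetch fresh data."))           # False: different spec states
    print(session.approve("alice", "Always fetch fresh data."))         # True: unanimous, same state
```

Comparing digests of the whole spec means agents only count as agreeing when they approved byte-identical text. That is one simple way to get "same spec state" semantics; it does not detect semantic conflicts on its own, which is the job the post assigns to the agents and the mediator.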
Building a personal AI Chief of Staff on Telegram: 7 real problems, looking for advice
I've been building a personal AI assistant for the past few months. It's not a chatbot wrapper, but something that actually manages my workload, tracks client relationships, processes meeting transcripts, handles task management, and proactively tells me what to focus on. It lives in Telegram so I can use it from anywhere. Happy to share what's working. But I'm hitting real walls and want honest input from people who've built similar things.

**What I have today**

Context: I moved away from multi-agent routing (too rigid for natural conversation) to one capable agent with full history.

**Stack:**

* Python Telegram bot as the frontend
* Claude (Sonnet) as the brain via API: a single conversational agent with full tool access
* Integrations: Notion (tasks/goals), Google Calendar, Gmail, meeting transcription tool, customer support platform, Google Chat
* File-based context system: each "project" or relationship has its own markdown files (readme + activity log) that the agent reads on demand
* Skills defined as markdown spec files that the agent loads per use case (morning briefing, meeting processing, email drafting, weekly review)
* Conversation history kept in memory (last 20 messages per session)

**What actually works:**

* Natural conversation with full tool access: ask anything, and the agent decides which tools to use
* Meeting processing: I drop a transcript link, and the agent extracts decisions and action items and saves a structured brief
* Morning briefing on demand: tasks, calendar, open support tickets, suggested focus
* Drafting messages for any channel with the right tone
* Creating and updating tasks with natural language

**7 problems I haven't solved:**

**1. No memory between sessions**

History is in memory. Bot restarts = full amnesia. The agent has no idea what we discussed yesterday unless it's written in a project file. I'm thinking of a `hot_context.md` that gets written at session end with a TTL, but it feels hacky and depends on the agent being disciplined about writing it.

**2. Purely reactive**

It only responds when I message it. I want it to send me a morning briefing at 9am without me asking, alert me when a client relationship goes quiet, and run a weekly loop-killer on Friday. The infra is there (job scheduler). The question is what format actually makes you read a proactive message vs. dismiss it as noise.

**3. Can't tell if I'm avoiding something or actually blocked**

I procrastinate differently by task type: technical tasks I attack immediately, while tasks with human dependencies (waiting on someone, uncomfortable follow-ups) I let sit for weeks. I want the agent to detect the pattern and call me out. The challenge: how do you prompt for real accountability without the agent turning into an annoying nag?

**4. No closure ritual**

I'm good at creating tasks, terrible at killing them. The list grows forever because nothing forces a binary decision. I want a weekly "kill or commit" where everything open >7 days gets a date or gets deleted. Not sure if this works better as an automated message or an on-demand command.

**5. Context loading blind spots**

Each client/project has a markdown file the agent reads on demand. Works great when I explicitly mention a client. Falls apart when I ask "what should I focus on this week?": the agent doesn't know to proactively check which relationships have been neglected.

**6. Hosting kills the file sync**

Running locally means the bot dies when my laptop closes. I'm moving to a VPS, but then my markdown context files live on the server, not my machine. Now every manual edit requires a push, and every agent update requires a pull. Is git the right sync layer here, or is there a cleaner approach?

**7. Context files go stale**

Client files have sections for current status, last contact, and open items. The agent appends logs but doesn't maintain the top-level summary. Two months in, files are half-accurate: some sections fresh, some outdated. Is the answer agent discipline (always update on write), user discipline (manual cleanup), or periodic jobs?

What's your experience with any of these?
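For problems 1 and 4, a minimal Python sketch of one possible approach, not a recommendation: a TTL-stamped hot-context file the bot writes at session end and reads on startup, and a selector for the weekly "kill or commit" pass. Storing the summary as JSON (rather than the `hot_context.md` the post mentions), the 48-hour TTL, and the task dictionary shape (`created`, `status`) are all assumptions.

```python
import json
import time
from datetime import date, timedelta
from pathlib import Path
from typing import Optional

# Hypothetical TTL: how long an end-of-session summary counts as "hot" context.
HOT_CONTEXT_TTL_SECONDS = 48 * 3600


def save_hot_context(path: Path, summary: str, now: Optional[float] = None) -> None:
    """Persist a short end-of-session summary so a bot restart is not full amnesia."""
    written_at = time.time() if now is None else now
    path.write_text(json.dumps({"written_at": written_at, "summary": summary}), encoding="utf-8")


def load_hot_context(path: Path, now: Optional[float] = None) -> Optional[str]:
    """Return the saved summary, or None if it is missing or older than the TTL."""
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    current = time.time() if now is None else now
    if current - data["written_at"] > HOT_CONTEXT_TTL_SECONDS:
        return None
    return data["summary"]


def kill_or_commit(tasks: list, today: date, max_age_days: int = 7) -> list:
    """Return open tasks created more than max_age_days ago; each must get a date or be deleted."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        t for t in tasks
        if t.get("status", "open") == "open" and t["created"] < cutoff
    ]
```

The selector only decides which tasks go into the weekly message; whether it runs from the job scheduler or as an on-demand command is the open question the post raises.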
Folder structure of the AI agent - after 6 weeks
# The folder structure is not admin. It's the nervous system.

When people imagine an AI agent, they picture the model, the prompts, maybe the tool calls. Almost nobody pictures the folders. That is exactly why most home-grown agents stall around month two.

An agent's filesystem is where its **identity, memory, work, and history physically live**. A messy filesystem produces a confused agent, not metaphorically but literally. The model reads paths. The model picks files by name. The model writes new files based on patterns it sees in old ones. If your directory tree is chaos, every output drifts a little further from coherent.

agentmia.beehiiv.com - newsletter about building agents

Below is the layout I converged on after nine months and roughly four refactors. Steal the parts that fit; the principles matter more than the exact names.

# The numbering convention

Folders are prefixed with a two-digit number: `01_`, `02_`, `09_`, `99_`. Two reasons:

1. **Sort order is meaning.** Anything starting with `0` lives near the top. `99_` falls to the bottom. The most important directories are visually first; archives are visually last. You read the agent's brain top-to-bottom.
2. **Gaps are intentional.** I jump from `04_` to `06_`, from `09_` to `11_`. The gaps are reserved insertion points. When a new domain emerges, it slots in without renaming everything.

Two folders deliberately skip the prefix: `Inbox/` and `Outbox/`. They are operational, not structural. They live above the numbered set because they are touched dozens of times a day.

/mapped on desktop/

# Inbox/: the unprocessed pile

Anything dropped into the agent's world starts here. Files I want it to ingest. Screenshots. Exports from other systems. PDFs that need parsing, Gmail attachments, all downloads from Chrome. The rule: **nothing stays in Inbox.** A dedicated processing routine classifies, routes, and deletes. If Inbox is non-empty for more than a day, the system is failing. 
Treat this like a real-world physical inbox tray. The point of a tray is that it gets emptied.

# Outbox/: what the agent produced for you

Every file the agent writes anywhere in the tree gets a copy here, simultaneously. When I open `Outbox/`, I see exactly what was generated this session, with no spelunking through twelve subdirectories.

This sounds redundant. It is not. Without it, "what did the agent do today?" becomes a hunt. With it, the answer is one click. `Outbox` is wiped during the next Inbox processing run. It is a viewing surface, not storage.

# .auto-memory/: the hot memory

The single most important directory in the system. Hidden by default because you should not be editing it manually. It holds the agent's working memory: user preferences, feedback rules, entity facts (people, companies, deals), active hypotheses, project pointers, session hot context. Roughly 400–500 small markdown files, each one a single topic.

**Why hidden?** Because it is the agent's hot path. It loads from here every session. If I open the folder and start manually rearranging it, I am racing the agent. Treat it like a database, not a notebook.

**Why so many small files?** Because the agent greps by topic. One monolithic memory file becomes unreadable to the model around 50 KB. Many small files are easier to load partially, easier to index, easier to expire.

# 01_IDENTITY/: who the agent is

The constitutional layer. Name, role, voice rules, principle stack, visual system, behavioral defaults. This rarely changes. When it does change, everything downstream changes with it.

I keep it as folder `01_` because every other folder is downstream of it. If you do not know who the agent is, you cannot know what its workflows should look like, or what it should remember, or how it should respond.

# 02_MEMORY/: governance, not data

A subtle but critical distinction: `.auto-memory/` holds the *data*, `02_MEMORY/` holds the *rules about data*. 
In `02_MEMORY/` live the constitution, the boot protocol, the naming protocol, the decision protocol, the profile standards (what a "supplier profile" must contain, what a "customer profile" must contain), and the capability map. The agent reads these documents to know *how to remember*, *how to name new files*, *how to decide what is reversible*. Without this folder, every memory write is improvised.

# 03_PROJECTS/: the active work

Real work happens here. Sub-organized by goal area, then by project slug:

`03_PROJECTS/areas/{goal}/{slug}/`

Each project gets its own folder with a standard skeleton: `README.md`, `TASKS.md`, `CHANGELOG.md`, `BRIEF.md`, plus working files. There is a project registry at the top that the agent reads to know what is active versus dormant versus archived.

The biggest discipline issue here: **do not let projects sprawl outside their folder.** When working on Project X, every file related to Project X goes inside Proj
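A minimal Python sketch of two mechanics the post describes: creating a project folder under `03_PROJECTS/areas/{goal}/{slug}/` with the standard skeleton, and mirroring every agent write into `Outbox/` so it can be wiped on the next Inbox run. The function names, the placeholder file contents, and the flat Outbox (where same-named files overwrite each other) are assumptions, not the author's implementation.

```python
import shutil
from pathlib import Path

# Standard per-project skeleton named in the post.
PROJECT_SKELETON = ("README.md", "TASKS.md", "CHANGELOG.md", "BRIEF.md")


def create_project(root: Path, goal: str, slug: str) -> Path:
    """Create 03_PROJECTS/areas/{goal}/{slug}/ with the skeleton files (existing files are kept)."""
    project = root / "03_PROJECTS" / "areas" / goal / slug
    project.mkdir(parents=True, exist_ok=True)
    for name in PROJECT_SKELETON:
        path = project / name
        if not path.exists():
            path.write_text(f"# {slug}: {Path(name).stem}\n", encoding="utf-8")
    return project


def write_with_outbox_copy(root: Path, rel_path: str, content: str) -> Path:
    """Write a file into the tree and drop a copy in Outbox/ for at-a-glance review."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    outbox = root / "Outbox"
    outbox.mkdir(exist_ok=True)
    # Flat copy: two files with the same name overwrite each other in Outbox/.
    shutil.copy2(target, outbox / target.name)
    return target


def clear_outbox(root: Path) -> None:
    """Outbox is a viewing surface, not storage: remove its files during Inbox processing."""
    outbox = root / "Outbox"
    if outbox.exists():
        for item in outbox.iterdir():
            if item.is_file():
                item.unlink()
```

Clearing the Outbox only deletes the copies; the real files stay where they were written in the numbered tree.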
Best architecture for seamless Bilingual TTS? (Azure / English + Korean) [D]
Hi guys, I'm building a language learning app (React Native/Expo frontend, Python backend) and I've hit a frustrating wall with Text-to-Speech. I need the app to read sentences that mix English instructions and Korean examples (e.g., "To say hello, we use the phrase 안녕하세요."). Since native pronunciation is critical for a learning app, I'm struggling to find a solution that sounds natural. I'm currently using Azure Cognitive Services, and I'm stuck between two bad options:

Approach 1: The Multilingual Voice (en-US-AvaMultilingualNeural)

The Good: Seamless reading, zero pauses mid-sentence.
The Bad: Because it's an English-first model, the Korean comes out with a slight robotic/Americanized accent. It doesn't sound like a true native speaker, which defeats the purpose of teaching pronunciation. There is also some scratchiness and a lack of smoothness when it reads Korean words.

Approach 2: SSML Voice Switching (Ava for EN, SunHi for KO)

The Good: Perfect English, perfect native Korean.
The Bad: Switching <voice> tags mid-sentence causes Azure to pause for a fraction of a second while it unloads/loads the neural models. It completely ruins the natural flow of the audio, making it sound very disjointed.

My Questions:

Is there an SSML trick in Azure to pre-load voices or eliminate that micro-pause when switching voices?
How do the big apps handle this? If I use two models for Korean and English, they will sound like two different speakers.
Should I migrate away from standard Azure Speech and use the Azure OpenAI voices (alloy, nova) instead? Are they truly seamless for bilingual text?

Any advice on the best tech stack or architecture for this would be massively appreciated!
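Whichever approach wins, the backend first has to split mixed sentences by script. Below is a minimal Python sketch of Approach 2's SSML generation, assuming Hangul code-point ranges are a good enough language detector for this app; it does nothing about the switching pause itself. `ko-KR-SunHiNeural` is assumed to be the full Azure name of the "SunHi" voice mentioned in the post, and the helper names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

EN_VOICE = "en-US-AvaMultilingualNeural"
KO_VOICE = "ko-KR-SunHiNeural"  # assumption: full Azure name for "SunHi"

# Hangul syllables, Hangul Jamo, and Hangul Compatibility Jamo.
HANGUL_RUN = re.compile(r"[\uAC00-\uD7A3\u1100-\u11FF\u3130-\u318F]+")


def _append(segments: list, lang: str, chunk: str) -> None:
    if not chunk:
        return
    # Punctuation/whitespace-only chunks (".", " ") stay with the previous
    # segment so we never switch voices just to read a period.
    if segments and not any(c.isalnum() for c in chunk):
        prev_lang, prev_text = segments[-1]
        segments[-1] = (prev_lang, prev_text + chunk)
    elif segments and segments[-1][0] == lang:
        segments[-1] = (lang, segments[-1][1] + chunk)
    else:
        segments.append((lang, chunk))


def split_by_script(text: str) -> list:
    """Split text into ("en", ...) and ("ko", ...) segments at Hangul runs."""
    segments: list = []
    pos = 0
    for m in HANGUL_RUN.finditer(text):
        _append(segments, "en", text[pos:m.start()])
        _append(segments, "ko", m.group())
        pos = m.end()
    _append(segments, "en", text[pos:])
    return segments


def build_ssml(text: str) -> str:
    """Wrap each segment in a <voice> element inside a single <speak> document."""
    voices = {"en": EN_VOICE, "ko": KO_VOICE}
    body = "".join(
        f'<voice name="{voices[lang]}">{escape(chunk)}</voice>'
        for lang, chunk in split_by_script(text)
    )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        + body
        + "</speak>"
    )
```

For the example sentence, `split_by_script` yields one English segment ending in "phrase " and one Korean segment "안녕하세요.", so the trailing period is read by the Korean voice instead of triggering a third switch.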
A CEO built his own AI agent with Claude MCP + NetSuite. It worked. Then it didn't scale.
How many of you have a prototype that demos great and then falls apart the moment real users touch it? Yeah. This is that story, except the person who built the prototype was the CEO himself.

S&B Filters, a U.S. manufacturer with 700+ employees, runs its entire operation on NetSuite. Their CEO wired up Claude's MCP connector to NetSuite, wrote his own prompts, and got an internal AI assistant working for order status lookups. Legit impressive for a solo build.

Then the fun part: 4–6 minute response times, a 40-page prompt holding the whole thing together, PO numbers coming in different formats from Shopify, phone, and email, and zero path to putting this in front of actual customers. He came to us basically saying, "I proved it works, now make it work for real."

We didn't patch the prototype. Our team at BotsCrew rebuilt the whole stack around NetSuite as the source of truth. We built an input normalization layer that validates across formats, falls back across identifiers (Sales Order > PO > customer reference), and uses conversation context when the input is garbage. This was 80% of the engineering challenge.

Then: two interfaces off one backend, an internal assistant for the support team and a customer-facing one on the website. Same AI layer, different access controls. Beyond order lookups, it handles installation guides, compatibility checks, and technical inquiries with images and videos. A dynamic knowledge base via OneDrive is updated by the client without redeployment.

Results:

* ~50% of support requests are fully automated
* 24x faster first response
* ~$140K/year in savings
* ~250% ROI in Year 1

Now they're expanding into full order management, dealer identification, and personalized discounts through the same system. One prototype turned into a full AI program. If you want to read the full case study with screenshots and more technical details, I'll drop the link in the comments.
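The post does not show BotsCrew's code. As a hedged illustration of the idea it describes (normalize messy identifiers, then fall back Sales Order > PO > customer reference), here is a minimal Python sketch. The `PO_LABEL` regex, the function names, and the dict-backed lookups standing in for NetSuite queries are all assumptions, and the conversation-context fallback is left out.

```python
import re
from typing import Callable, Optional

# Optional "PO"/"P.O."/"PURCHASE ORDER" label, then "#", "NO." or "NUMBER" and a colon.
# The (?![A-Z]) guard keeps identifiers such as "POLLY" from losing their first letters.
PO_LABEL = re.compile(r"^(?:P\.?\s*O\.?|PURCHASE\s+ORDER)(?![A-Z])\s*(?:NO\.?|NUMBER|#)?\s*:?\s*")

Lookup = Callable[[str], Optional[dict]]


def normalize_identifier(raw: str) -> str:
    """Uppercase, drop a leading PO label, then keep only letters and digits."""
    s = PO_LABEL.sub("", raw.strip().upper())
    return re.sub(r"[^A-Z0-9]", "", s)


def find_order(raw: str, lookups: list[tuple[str, Lookup]]) -> Optional[tuple[str, dict]]:
    """Try each (kind, lookup) in priority order; return the first hit as (kind, record)."""
    key = normalize_identifier(raw)
    if not key:
        return None  # nothing usable; the post falls back to conversation context here
    for kind, lookup in lookups:
        record = lookup(key)
        if record is not None:
            return kind, record
    return None


if __name__ == "__main__":
    # Dict-backed stand-ins for NetSuite queries (hypothetical data).
    sales_orders = {"SO777": {"status": "shipped"}}
    purchase_orders = {"12345": {"status": "processing"}}
    customer_refs = {"ACME42": {"status": "open"}}
    lookups = [
        ("sales_order", sales_orders.get),
        ("po", purchase_orders.get),
        ("customer_ref", customer_refs.get),
    ]
    print(find_order("PO# 12-345", lookups))  # ('po', {'status': 'processing'})
```

Keeping the priority order in a plain list makes the fallback chain explicit and easy to reorder, and each lookup only ever sees the normalized key.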
View originalBanned by OpenAI after reporting a live credential hijack. They admitted in writing my account was broken. Here are 7 months of forensic receipts and 20+ cases.
[Drive Link for Zipped Proof](https://drive.google.com/file/d/1qU_LyLY-JMhNR_bqOV1-a2RJAbplL68e/view?usp=drivesdk) I am a developer and paying long term subscriber to ChatGPT since January 2025. I build complex local first sovereign systems. My workflows are incredibly context heavy with large files spanning code, research reports, and other analysis. I do not, or rather did not as the platform has been non functional since November 2025 meanwhile customer support is auto closing tickets, admitting I am having platform issues. I do not use this platform for casual queries, as a solo developer with no formal "team" chatgpt was one of my reliable co collaboration hubs to help ensure I am maintaining proper development of said complex systems. I feed it massive codebases for systems analysis and obtaining new insights I may personally have missed. My manual code uploads and token inputs routinely exceed the model's output volume by a massive margin. I do not abuse this platform. It is actually impossible as the very features advertised under the paid subscription do not work. I am exactly the type of user this platform was built for, and I have been a continuous, paying ChatGPT Plus subscriber since January 2025. Since October 2025, my workspace has been systematically breaking and beginning November 2025 total workspace degredation. This was not an occasional glitch. Persistent memory modules stopped updating. Custom instructions were ignored by the models. Project files failed to load. Custom instructions, personalization features, connector abilities, file tool, even projects do not work. It started as a continuous degradation until total failure. OpenAI customer service even admitted as such and yet months later I've talked to nothing but bots, not only LLMs as customer service but even instances of falsely identifying as true human support. It was a state of rolling degradation across the entire paid tier, month after month. 
Meanwhile, OpenAI has freely enhanced the Business and Enterprise tiers. I did not just fire off quick complaints to standard support. I ran cross-platform diagnostics and collected failure logs. I even documented the exact reproduction steps and gave them to OpenAI customer support, only to be met with an acknowledgement of the degradation and no resolution. I handed OpenAI support a completely packaged technical breakdown of their failing infrastructure across 20 separate support tickets over a 7-month period. I did their QA work for free. And I have the receipts to prove it. I am attaching the screenshots and the exact email files to this post. In Case 06830839, OpenAI Support explicitly put this in writing: "We acknowledge that you have been experiencing persistent technical issues affecting several features of your ChatGPT subscription, including tools, memory functions, personalization settings, connectors, and project files... We also understand your concern that communication on the case stopped after you provided detailed evidence..." Read that again. They acknowledged in writing that my account was fundamentally broken. They acknowledged that their own team ghosted me after I handed them the diagnostic proof. Yet they kept charging my card every single month for a product they knew was failing.

The Hijack Escalation: Two days ago, the situation escalated from a broken product to a severe security incident. I was monitoring my environment and watched my Codex rate limits drop in 10 percent chunks across two separate sessions on a fresh boot of the desktop app. This happened twice inside a 10-minute window. I had zero active sessions running. There was zero usage on my end. My account token was being actively drained by an unauthorized third-party exploit. I immediately opened an emergency unauthorized-activity report under Case 09113391 to notify them of the hack.
Their response was to reframe the problem entirely as a dispute over fraudulent activity, doing damage control and altering the record.

The Reframe Attempts: Instead of investigating the breach, OpenAI support deliberately twisted the record. They reframed my security report as an "appeal for fraud" and manipulated the ticket classification to make it look like I had been flagged for fraud and was begging for an appeal, rather than a developer reporting a live exploit on their infrastructure. They ignored the active threat their own platform was exposing. They did not lock the token. They did not roll my API keys. They did absolutely nothing to secure a compromised paying user other than shift the blame. Fast forward to this morning: their automated Trust and Safety system swept up the high-volume traffic from the attacker, scored it as a malicious exploit originating from my account, and deactivated/banned me for "Cyber Abuse." All the while, it actively prevented ChatGPT models from helping me diagnose and trace the infiltration. They locked the doors and blamed the homeowner for the
Inferring I/O token usage
Checked April token usage for our AI stack. The input/output ratio was roughly 125:1. Most of it came from building PerceptoAI, an intent-driven voice AI that qualifies website visitors and converts them into pipeline. If I average it out at Claude Sonnet 4.6 pricing ($3 per million input tokens and $15 per million output tokens), the *input-side cost* dominates massively. Large context windows, retrieval, memory, reasoning chains, tool calls, evaluations, retries, orchestration, etc. all went into the AI stack. I also noticed that the actual user-facing response is tiny compared with the amount of computation happening underneath. What ratio are you folks seeing?
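To make the point concrete, here is a back-of-the-envelope sketch of why the input side dominates at the quoted $3/$15 prices. It assumes flat per-token pricing and ignores prompt-caching or batch discounts (which would shrink the input share); the function names and the 125M/1M token volumes are illustrative, not taken from the post.

```python
def input_cost_share(io_ratio: float, input_price: float, output_price: float) -> float:
    """Fraction of total spend going to input tokens.

    io_ratio: input tokens per output token (125 for a 125:1 ratio).
    Prices are USD per million tokens; that shared factor cancels out.
    """
    input_side = io_ratio * input_price
    output_side = output_price
    return input_side / (input_side + output_side)


def total_cost_usd(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float) -> float:
    """Total bill at flat per-million-token prices (no caching or batch discounts)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


share = input_cost_share(125, 3.0, 15.0)  # 375 / 390
bill = total_cost_usd(125_000_000, 1_000_000, 3.0, 15.0)
print(f"input share: {share:.1%}, bill: ${bill:,.2f}")  # input share: 96.2%, bill: $390.00
```

At 125:1, every million output tokens comes with 125 × $3 = $375 of input cost against $15 of output cost, so roughly 96% of the spend is input.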
Need expert advice for a non-coder!
My vibe-coding journey started about 8 months ago with Replit. Before that, I wasn't a developer, but I did have experience building websites with WordPress and Elementor. I was also comfortable working with third-party integrations, CRMs, and customizing/deploying code purchased from platforms like CodeCanyon and ThemeForest for clients. In many ways, I'm a non-coder who understands project management, business workflows, and systems.

Using Replit, I spent roughly $3,000 building a CRM for a service-based company. It worked surprisingly well in the beginning, but as the codebase grew, I started running into the classic "last 10% takes 90% of the effort" problem. Replit began struggling with the larger codebase, introducing regressions and silently breaking existing functionality while fixing something else. Despite the challenges, I was able to build a fully functional CRM in about three months. That experience got me excited about what was possible, which led me to discover Claude Code.

Over time, my workflow evolved into: **Claude Code → GitHub → Vercel**

For the past four months, I've been building a much larger software product. The roadmap spans roughly two years, but development and rollout are planned in phases, so it's not a two-year wait before launch. The results have been remarkable. It's honestly mind-blowing what someone without a traditional software engineering background can build today.

Current stack:

* Next.js (Monorepo/Turborepo)
* Supabase + MCP
* Claude Code
* GitHub + MCP
* Vercel + MCP
* Context7
* Playwright for testing

What I'd love to learn from experienced engineers and builders is:

* How do you keep a rapidly growing codebase maintainable?
* What practices help prevent technical debt from accumulating?
* What tools, workflows, or guardrails should I implement early?
* What are the biggest mistakes AI-assisted builders make as projects scale?
* How would you structure engineering processes if you were starting today?
Any advice, resources, or lessons learned would be greatly appreciated.
Tested 4 AI video generation MCPs in Claude for making short clips
Hello everyone, I've recently seen a lot of AI MCPs, especially GenAI ones, being launched. Of the ones I had a chance to test, there were 4 I'd consider worth trying out.

**Higgsfield AI MCP.** The main draws are the model coverage and Claude coming up with ready-made scenarios. One connection gets you Sora 2, Veo 3.1, Kling, Seedance 1.5 Pro, Nano Banana, and Soul ID. I've been able to get some gems using this. The problem is that if Claude doesn't understand you properly, it can come up with something completely random or choose the most expensive models.

**kubeez MCP.** Also goes wide on models, with a similar pitch to the previous one: image, video, music, and TTS in one place. I used it for batch work where I needed audio + visuals from the same chat.

**Runway MCP.** Narrower scope, deeper on Gen-4 specifically, which is why I don't really use it. That said, its keyframe and reference-image handling is solid by comparison; the others tend to lose track of them.

**ElevenLabs MCP.** Not video, but I'm including it because every video workflow needs voiceover, and this is the one that actually works end to end: Claude writes the script, picks the voice, and generates the audio. It pairs well with any of the above. You will need it very frequently if you can't get proper audio generation out of Higgsfield or Runway.

The stack I settled on: Higgsfield for the visuals, ElevenLabs for better voiceover. What video MCPs am I missing? Happy to hear opinions.
Created an on-device ML-based photo organizing app, as a non-coder
I have a background in software product management but not in coding. I love photography and started wondering whether I could leverage some of the dedicated AI processing power on modern devices for photo library management. I used Claude Code for this "use AI to build an AI thing" project, having it do research + code + optimization on the entire stack while I designed the features, UX, and optimization goals. This is the second release of the app, and I'm reaching 100+ photos/second on my iPhone 17PM; the previous version managed 10+ photos/second. The new techniques turned out to be much more accurate as well.

Note on tech: v1 relied on the Apple Vision engine for quality + CLIP for subjects. It turned out that using CLIP for both is much, much faster. I learned to vibe code from scratch on this journey, and I try to keep up with best practices like skills & subagents. (What I notice is that Anthropic tends to Sherlock a lot of stuff that third parties create, which is... convenient? For us users, anyway.) I used an MCP for Draw Things to have Claude Code generate the subject category photos. The MCP for Figma turned out to be pretty disappointing; maybe I just wasn't using it right. Design got a lot better with Opus 4.6/4.7 + the frontend design skill. iOS dev seems to randomly eat up huge chunks of hard drive space, and Claude Code is not that great at culling the temp files etc., even after I built a /cleanup skill to explicitly do this. Anyway, enough ranting. Below is how the app works.

---

Step 1) You select up to three different subjects (8 built-in plus whatever keyword phrase you want; it understands relationships between subjects too, such as "man walking dog"), fine-tune up to 7 quality parameters (or use a Technical / Aesthetic slider to move all 7 at once), and set the balance between a subject-focused and a quality-focused sort.

Step 2) The photos that best match your criteria are surfaced to the top; use swiping actions to Pick or Discard them. Then you can save the picked ones to an album or share them, or bulk delete the discarded ones.

Different sort profiles can be bookmarked. There's also a bonus "Taste" profile that auto-learns from your picks and discards, which you can use or ignore (I'm continuing to make it work better, but obviously auto-learning user taste is hard). At the picking stage, if you don't want to go through each photo one by one, just use Autopick and the photos get divided into buckets by score tier. All processing is on-device and completely private.

---

Feedback would be very welcome on either the app or my process. Feel free to DM me for a lifetime free premium code. Video demo: [https://www.tiktok.com/@spectrasort/video/7643116905615609102](https://www.tiktok.com/@spectrasort/video/7643116905615609102) App Store download: [https://apps.apple.com/us/app/spectrasort/id6757512134](https://apps.apple.com/us/app/spectrasort/id6757512134)

---

Text above is 0% AI generated :)
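The subject/quality balance slider and Autopick's score tiers described in this post could be sketched roughly as follows. This is a hypothetical illustration, not SpectraSort's code: it assumes CLIP image and text embeddings are already computed (CLIP is typically scored by cosine similarity between an image embedding and a text embedding), that quality parameters are already normalized to [0, 1], and that the tier cutoffs, parameter names, and toy 2-D embeddings are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the usual way CLIP image/text embeddings are compared."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def subject_score(image_emb: list[float], subject_embs: list[list[float]]) -> float:
    # Assumption: a photo scores by its best match among the selected subjects.
    return max(cosine(image_emb, s) for s in subject_embs)


def quality_score(params: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted average of quality parameters already normalized to [0, 1].
    return sum(params[k] * w for k, w in weights.items()) / sum(weights.values())


def blended_score(subject: float, quality: float, balance: float) -> float:
    # balance = 1.0 sorts purely by subject, 0.0 purely by quality.
    return balance * subject + (1.0 - balance) * quality


def autopick_tier(score: float, pick_at: float = 0.75, maybe_at: float = 0.5) -> str:
    # Hypothetical cutoffs; the app's real tiers are not described in the post.
    if score >= pick_at:
        return "pick"
    if score >= maybe_at:
        return "maybe"
    return "discard"


subjects = [[1.0, 0.0]]  # stand-in text embedding for one subject, e.g. "dog"
weights = {"sharpness": 1.0, "exposure": 1.0}
photos = {
    "a.jpg": ([1.0, 0.0], {"sharpness": 0.9, "exposure": 0.8}),
    "b.jpg": ([0.0, 1.0], {"sharpness": 0.6, "exposure": 0.4}),
}
scores = {
    name: blended_score(subject_score(emb, subjects), quality_score(q, weights), balance=0.5)
    for name, (emb, q) in photos.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)  # best matches surface first
tiers = {name: autopick_tier(s) for name, s in scores.items()}
```

Raw CLIP similarities usually fall in a narrow band, so a real implementation would likely rescale them before blending them with quality scores; that step is left out here.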
After 6 months of running AI agents in production I think the framework you pick barely matters. The thing that kills them is something else.
Going to get downvoted for this, but here we go. I've been running about 30 agents in production for paying customers for the last 6 months, and I'm convinced the framework debate is mostly a distraction. LangChain, CrewAI, AutoGen, OpenAI Agents SDK: pick whichever one your team already knows. It doesn't matter as much as you think. What actually decides whether your agent works in production is something almost nobody talks about on this sub, and it isn't in the framework.

Here's what I've seen kill more agents than every framework bug combined.

The agent gets stuck in a loop. It calls the same tool 200 times in 4 minutes because something downstream returned ambiguous data and the LLM decided to retry forever. Your OpenAI bill goes from $3 a day to $400 in one afternoon. By the time you notice, you've burned a grand, and you can't even tell which agent did it because there's no audit trail.

Your VPS reboots overnight for kernel patches. Every agent that was mid-task loses everything. The next morning, the support agent has no memory of yesterday's tickets, the research crew has forgotten what it was investigating, and the pipeline agent restarts from scratch. None of these are framework problems. They're memory and state problems.

A customer complains that the agent gave them wrong info three days ago. You go to debug. There's no record of what the agent saw, what it decided, or which tool calls it made. The framework didn't log that, because frameworks aren't observability tools. You shrug and refund.

You scaled to 15 agents working together. Two of them have conflicting beliefs about the same customer because their memory isn't shared. The customer gets two different answers in the same conversation depending on which agent replies first.

After enough of these, you realize the part you actually need isn't in the framework at all. Here's what I think the real stack is:

* The framework, which just orchestrates LLM calls. Use whatever your team likes. It's the cheap layer.
* A persistent memory layer that survives crashes, restarts, and redeploys, so the agent has actual continuity. This is the layer that decides whether your agent is a toy or a product.
* Loop detection at the runtime layer, not bolted on as a wrapper around the framework: something that catches your agent making the same call too many times in a row and stops it before the bill explodes.
* An audit trail of every decision the agent made, with a hash chain so you can prove later what happened when the customer pushes back. Screenshots and logs aren't enough when ten thousand dollars is on the line.
* Shared memory between agents on the same team, so they're not having different conversations about the same customer.
* Cost tracking per agent, so you actually know which one ran away with your budget.

When I compare the agents that survived production with the ones that died, the difference is never that they picked the right framework. It's that they had this layer underneath, either built carefully in-house or borrowed from somewhere.

Full disclosure: I'm building one of these tools. There are others: Mem0, Zep, and Letta in the memory space; Helicone and LangSmith in the observability space. Mix and match. Use one or build your own. Just please stop arguing about whether LangChain or CrewAI is better when the thing eating your production agents has nothing to do with either of them.

What's been your worst production agent failure? Curious what other people have actually hit. I built a free tool that aims to solve most of these issues. What do you think?
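Three of the layers above (loop detection, a hash-chained audit trail, and per-agent cost tracking) can be sketched independently of any framework. This is a minimal Python illustration under stated assumptions, not the author's tool: every name here (`AgentGuard`, `record_call`, `LoopDetected`) is hypothetical, a "loop" is defined as N identical tool calls in a row, and whatever runtime executes the tools is assumed to report each call and its cost to the guard.

```python
import hashlib
import json
from collections import defaultdict, deque


class LoopDetected(RuntimeError):
    """Raised when an agent repeats the same tool call too many times in a row."""


class AgentGuard:
    """Hypothetical runtime guard: loop detection, hash-chained audit log, per-agent cost.

    State lives in memory only; a real version would persist the log durably.
    """

    GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry

    def __init__(self, max_repeats: int = 5) -> None:
        self.max_repeats = max_repeats
        # Last `max_repeats` call signatures per agent.
        self.recent: dict[str, deque] = defaultdict(lambda: deque(maxlen=max_repeats))
        self.cost_by_agent: dict[str, float] = defaultdict(float)
        self.log: list[dict] = []

    @staticmethod
    def _digest(prev_hash: str, entry: dict) -> str:
        # Each hash covers the previous hash plus this entry, so editing any
        # earlier entry breaks every hash after it.
        body = json.dumps(entry, sort_keys=True)
        return hashlib.sha256((prev_hash + body).encode()).hexdigest()

    def record_call(self, agent: str, tool: str, args: dict, cost_usd: float) -> None:
        # args must be JSON-serializable so identical calls compare equal.
        entry = {"agent": agent, "tool": tool, "args": args, "cost_usd": cost_usd}
        prev = self.log[-1]["hash"] if self.log else self.GENESIS
        self.log.append({**entry, "prev": prev, "hash": self._digest(prev, entry)})
        self.cost_by_agent[agent] += cost_usd

        window = self.recent[agent]
        window.append((tool, json.dumps(args, sort_keys=True)))
        if len(window) == self.max_repeats and len(set(window)) == 1:
            raise LoopDetected(
                f"{agent} called {tool} with identical args {self.max_repeats} times in a row"
            )

    def verify(self) -> bool:
        """Recompute the chain; False means some entry was altered or reordered."""
        prev = self.GENESIS
        for row in self.log:
            entry = {k: v for k, v in row.items() if k not in ("prev", "hash")}
            if row["prev"] != prev or self._digest(prev, entry) != row["hash"]:
                return False
            prev = row["hash"]
        return True


guard = AgentGuard(max_repeats=3)
stopped = None
try:
    for _ in range(200):  # the runaway retry loop from the post
        guard.record_call("support-agent", "search_tickets", {"q": "refund"}, cost_usd=0.01)
except LoopDetected as err:
    stopped = str(err)  # halted after the 3rd identical call
```

The looping call is written to the audit log before the guard raises, so the trail shows exactly what tripped it. Because the log lives in memory, this sketch does not cover the crash-survival point; that would need durable storage.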
Repurposed my old work ThinkPad as a dedicated personal AI workstation: looking for ideas from people who've done something similar
Apologies if the formatting comes out weird; I'm on mobile.

My old employer let me keep a ThinkPad when I left. Rather than let it collect dust, I'm turning it into a dedicated personal AI environment: wiping it, installing Linux, and using it specifically for two things, life-admin automation and building personal software tools.

The core setup I'm planning:

• Claude Desktop with MCP servers running persistently as Docker services
• Tailscale so I can access everything securely from my phone when I'm not home
• Open WebUI as a mobile-friendly chat interface
• Code-server (VS Code in the browser) so I can actually write and run code from my phone
• A dedicated Gmail account that acts as the "identity" for this Claude instance, wired into Google Drive, Calendar, and potentially an email-triggered agent pipeline
• A local RAG system for personal documents (contracts, notes, research) so Claude has persistent context about my life

The idea is that this becomes an ambient personal intelligence layer: always on, always up to date on my documents and projects, and accessible from anywhere via Tailscale. Not a cloud subscription, not shared with anything work-related. Fully mine.

On the software side, I'm planning to use Claude Code + Lovable to build local-first personal apps for my own pain points: things that don't exist in the market the way I want them, or where I don't want my data in someone else's cloud. The ThinkPad is the runtime; Lovable builds the frontend, Claude Code builds the backend, and everything talks over a local API.

What I'm curious about from people who've built something like this:

• What MCP servers have actually been worth setting up vs. overhyped?
• Has anyone built a reliable file-drop-to-RAG pipeline that actually stays current?
• Is Open WebUI the right mobile interface, or is there something better now?
• Anyone using a dedicated "agent identity" email account? What workflows have you actually automated?
• Claude Code + local backend: what's your stack? FastAPI? SQLite? Something else?
• Any gotchas with running Claude Desktop persistently on Linux?

Genuinely trying to build something useful here rather than a tech demo. Would love to hear from people who've gone down this road.
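One common way to keep a file-drop-to-RAG pipeline current is to hash every file in the drop folder, compare against the hashes recorded at the last indexing run, and re-embed only what changed (dropping deleted files from the index). The sketch below assumes that approach; the folder, the `plan_reindex` helper, and the downstream embedding step are hypothetical, and no particular RAG library is implied.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in chunks so large PDFs don't load at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_reindex(folder: Path, indexed: dict[str, str]) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare the drop folder with the hashes from the last indexing run.

    Returns (files to (re-)embed, files to delete from the index, new hash state).
    """
    current = {
        str(p.relative_to(folder)): file_digest(p)
        for p in folder.rglob("*")
        if p.is_file()
    }
    to_embed = sorted(name for name, digest in current.items() if indexed.get(name) != digest)
    to_delete = sorted(name for name in indexed if name not in current)
    return to_embed, to_delete, current


# Demo in a throwaway folder standing in for the ThinkPad's drop directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.md").write_text("v1")
    first, _, state = plan_reindex(root, {})  # ["notes.md"]
    (root / "notes.md").write_text("v2")
    (root / "contract.pdf").write_bytes(b"%PDF-1.7")
    second, removed, state = plan_reindex(root, state)  # ["contract.pdf", "notes.md"], []
    (root / "notes.md").unlink()
    third, removed_later, state = plan_reindex(root, state)  # [], ["notes.md"]
```

Running this on a timer or from a filesystem watcher, then embedding `to_embed` and deleting `to_delete` from the vector store, keeps the index current without reprocessing the whole folder. Modification-time checks are cheaper than hashing but can report changes that didn't happen or miss ones that did.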
Ron537/DPlex: Terminal multiplexer for AI-assisted development. Manage Copilot CLI, Claude Code, and regular shells across projects in one window.
Hey everyone,

Over the last few months, I've been heavily integrating terminal-based AI agents like claude-code and github-copilot-cli into my daily development workflow. They are incredibly powerful, but running multiple concurrent sessions across complex codebases quickly hits a major roadblock: **workspace fragmentation**. If you close your terminal, update your IDE, or reboot, your entire layout of splits, tabs, and active agent states vanishes. Trying to keep parallel feature branches, code reviews, and debugging sessions organized side by side gets messy fast.

To solve this, I built **DPlex**, an open-source (MIT) local desktop workspace and terminal multiplexer optimized specifically for structured AI workflows.

💻 **Landing Page:** [https://ron537.github.io/DPlex/](https://ron537.github.io/DPlex/)
📦 **GitHub Repo:** [https://github.com/Ron537/DPlex](https://github.com/Ron537/DPlex)

**What it does:**

* **Absolute Layout & Tab Persistence:** Quit the app, restart your machine, or let it crash: DPlex automatically serializes your exact environment to disk. Every AI session tab, pane split, and active process restores back to where you left it.
* **Deep Git Worktree Integration:** It features a project-aware sidebar designed around concurrent development. You can spin up side-by-side AI sessions in separate Git worktrees instantly, keeping your main branch clean while agents work on different features.
* **Unified Project Organization:** Instead of loose terminal windows scattered across your desktop, DPlex groups your workspace by project. Switch between entirely different project environments with a single click.
* **Zero Telemetry & 100% Local:** No cloud wrappers, no analytics, and zero external tracking. The source is completely grep-able and runs entirely on your local machine.

**Tech Stack & Architecture:** It's built to be modular.
Adding support for a new AI agent provider is as simple as implementing a single pluggable TypeScript interface, with no core forks required. It's available for macOS (Intel/Apple Silicon), Windows, and Linux. I'd love to get your feedback on the layout workflow, feature requests, or any architectural thoughts. If you find it useful, please consider leaving a ⭐ on GitHub to help other developers discover it!
Yes, Stack AI offers a free tier ($0/month).
Key features include: Agentic Workflows (go from a time-consuming process to a working agent in minutes); Deploy Anywhere (multi-tenant, VPC, or on-premise); Security and Governance (feature controls, audit logs, and more); Human in the Loop; and LLM Agnostic.
Common Stack AI use cases are listed under "75+ AI Agents Transforming Enterprises."
Stack AI integrates with: Salesforce, Slack, Jira, Trello, Zendesk, HubSpot, Google Workspace, Microsoft Teams.
Based on user reviews and social mentions, the most common pain points are token usage, cost tracking, OpenAI bills, and API costs.
Noam Shazeer
CEO at Character.AI
1 mention
Based on 144 social mentions analyzed, 10% of sentiment is positive, 87% neutral, and 3% negative.