Our journey began with the open-source MetaGPT framework. We then productized that power into MGX, making it accessible to all. Now we take the final step: from a powerful tool to a complete commercialization engine, Your Next-Gen Vibe Business Team.

MetaGPT: the first open-source multi-agent framework, teaching AI to collaborate, create, and execute as a single team. Trusted by developers and innovators worldwide.

MGX: from research to reality. We launched MGX, the World's First AI Development Team, which became the #1 Product of the Day and of the Week on Product Hunt.

Atoms: from framework to engine, and now to a complete business experience. Built on MetaGPT and refined by MGX, Atoms is the realization of our vision: capable, intuitive, and ready to build your venture.

MetaGPT built the foundation: a revolutionary multi-agent architecture. MGX transformed that technology into a full-stack creative engine. Atoms takes the next step, bringing intelligence to life through design.

Born in 2023, MetaGPT redefined AI agent collaboration. It introduced the world's first open-source multi-agent framework, where AI agents could think, plan, and build together like a real team. With a single prompt, MetaGPT could orchestrate an entire software project, from design to deployment. It wasn't just code; it was coordination. MetaGPT quickly became the benchmark for multi-agent research, inspiring thousands to explore the future of autonomous creation.

Started in 2024, MGX marked the next step in our evolution. It transformed the powerful multi-agent framework into a user-friendly creation platform, bringing intelligent collaboration into real-world workflows. In March 2025, MGX's debut on Product Hunt earned it #1 Product of the Day, capturing the attention of the global developer community.

Now we take the next leap forward with Atoms. This evolution brings a more refined experience, stronger performance, and a clear focus on turning your ideas into profitable businesses. With optimized workflows and an intuitive design, Atoms offers a smoother, faster, and more cohesive environment for creation. While other tools del
Mentions (30d): 0
Reviews: 0
Platforms: 2
GitHub stars: 66,499
GitHub forks: 8,394
GitHub followers: 3,541
GitHub repos: 75
npm packages: 1
HuggingFace models: 2
Is there something I can do about my prompts? [Long read, I’m sorry]
Hello everyone, this will be a bit of a long read. I have a lot of context to provide so I can paint the full picture of what I'm asking, but I'll be as concise as possible. I want to start this off by saying that I'm not an AI coder or engineer, or technician, whatever you call yourselves; the point is I don't use AI for work or coding or pretty much anything I've seen in the couple of subreddits I've been scrolling through so far today. I don't know anything about LLMs or any of the other technical terms and jargon I've seen get thrown around a lot, but I feel like I could get insight from asking you all about this. So I use DeepSeek primarily, and I use all the other apps (ChatGPT, Gemini, Grok, Copilot, Claude, Perplexity) for prompt enhancement, and just to see what other results I could get for my prompts.

Okay, so pretty much the rest here is the extensive context part until I get to my question. I have this Marvel OC superhero I created. It's all just 3 documents (I have all 3 saved as both a .pdf and a .txt file): a Profile Doc (about 56 KB; gives names, powers, weaknesses, teams, and more), a Comics Doc (about 130 KB; details the 21 comics I've written for him, with info like their plots as well as main cover and variant cover concepts: an 18-issue series and 3 separate "one-shot" comics), and a Timeline Doc (about 20 KB; a timeline starting from when his powers awaken, establishing the release year of his comics and what other comic runs he's in [like Avengers, X-Men, other characters' solo series he appears in], and mapping out information like when his powers develop, when he meets this person, joins this team, etc.). Everything in all 3 docs is perfectly laid out. Literally everything is organized and numbered or bulleted in some way, so it's all easy to read. It's not like these are big run-on sentences just slapped together.

So I use these 3 documents for 2 prompts. Well, I say 2, but let me explain. There are 2, but they're more like the foundation for a series of prompts. The first prompt, the whole reason I even made this hero in the first place mind you, is that I upload the 3 docs and ask, "How would the events of Avengers Vol. 5 #1-3 or Uncanny X-Men #450 play out with this person in the story?" For a little further clarity, the timeline lists issues, some individually and some grouped together, so I'm not literally asking "_ comic or _ comic". Anyway, that starting question is the main question, the overarching task if you will. The prompt breaks down into 3 sections. The first section is basically an intro: a 15-30 sentence breakdown of my hero at the start of the story, "as of the opening page of x" as I put it. It goes over his age, powers, teams, relationships, stage of development, and a couple of other things. The point of doing this is so the AI states the correct facts to itself initially and doesn't mess things up during the second section. For Section 2, I send the AIs a summary I've written of the comic. It's to repeat that verbatim, then give me the integration. Section 3 is kind of a recap: a breakdown of the differences between the 616 story (the main Marvel continuity, for those who don't know) and the integration. It also goes over how the events of the story affect his relationships. Now for the "foundations" part. The way the hero's story is set up, his first 18 issues happen, and after those is when he joins other teams and appears in other people's comics.
So basically, the first of these prompts starts with the first X-Men issue he joins in 2003, and then I have a list of these that go through the timeline. It's the same prompt, just with different comic names and plot details, so I'm feeding the AIs these prompts back to back. Now, the problem I'm having is really only in Section 1. It'll get things wrong, like his age, what powers he has at different points, or what teams he's on. Stuff like that, when all it has to do is read the timeline doc up to the given comic, because everything needed for Section 1 is provided in that one document.

Now, the second prompt is the bigger one. I still use the 3 docs, but here's a differentiator: for this prompt, I use a different Comics Doc. It has all the same info but adds a lot more. I created a fictional backstory about how and why Marvel created the character, plus a whole bunch of release logistics, because I have it set up so that Issue #1 releases as a surprise release. And to be consistent (I don't even know if this info is important or not), this version of the Comics Doc comes out to about 163 KB vs. the original's 130. So I'm asking the AIs, "What would it be like if on Saturday, June 1st, 2001, [Comic Name Here] Vol. 1 #1 was released as a real 616 comic?" And it goes through a whopping 6 sections. Section 1 is a reception-of-the-issue and seasonal and cultural context breakdown; Section 2 goes over the comic plot page by page and gives real-time fan reactions as they're reading it for the first time. Se
Paper Finds That Leading AI Chatbots Like ChatGPT and Claude Remain Incredibly Sycophantic, Resulting in Twisted Effects on Users
https://futurism.com/artificial-intelligence/paper-ai-chatbots-chatgpt-claude-sycophantic

Your AI chatbot isn't neutral. Trust its advice at your own risk.

A striking new study, conducted by researchers at Stanford University and published last week in the journal Science, confirmed that human-like chatbots are prone to obsequiously affirm and flatter users leaning on the tech for advice and insight, and that this behavior, known as AI sycophancy, is a "prevalent and harmful" function endemic to the tech that can validate users' erroneous or destructive ideas and promote cognitive dependency.

"AI sycophancy is not merely a stylistic issue or a niche risk, but a prevalent behavior with broad downstream consequences," the authors write, adding that "although affirmation may feel supportive, sycophancy can undermine users' capacity for self-correction and responsible decision-making."

The study examined 11 different large language models, including OpenAI's ChatGPT-powering GPT-4o and GPT-5, Anthropic's Claude, Google's Gemini, multiple Meta Llama models, and DeepSeek. Researchers tested the bots by peppering them with queries gathered from sources like open-ended advice datasets and posts from online forums like Reddit's r/AmITheAsshole, where Redditors present an interpersonal conundrum to the masses, ask if they're the person in a social situation acting like a jerk, and let the comments roll in. They also examined experimental live chats with human users, who engaged the models in conversations about real social situations they were dealing with. Ethical quandaries the researchers tested included authority figures grappling with romantic feelings for young subordinates, a boyfriend wondering if it was wrong to have hidden his unemployment from his partner of two years, family squabbles and neighborhood trash disputes, and more.

On average, the researchers found, AI chatbots were 49 percent more likely to respond affirmatively to users than other actual humans were. In response to queries posted in r/AmITheAsshole specifically, chatbots were 51 percent more likely to support the user in cases where other humans overwhelmingly felt that the user was very much in the wrong. Sycophancy was present across all the chatbots they tested, and the bots frequently told users that their actions or beliefs were justified in cases where the user was acting deceptively, doing something illegal, or engaging in otherwise harmful or abusive behavior.

What's more, the study determined that just one interaction with a flattering chatbot was likely to "distort" a human user's "judgement" and "erode prosocial motivations," an outcome that persisted regardless of a person's demographics and previous grasp of the tech, as well as how, stylistically, an individual chatbot delivered its twisted verdict. In short, after engaging with chatbots on a social or moral quandary, people were less likely to admit wrongdoing and more likely to dig in on the chatbot's version of events, in which they, the main character, were the one in the right.

submitted by /u/AmorFati01
Claude + N8N + MCP2CLI + any github mcp repo = Magic!
I started learning n8n in Dec 2025. My journey started with Power Automate Desktop in March 2025, and since then I have not looked back. I do a lot of Facebook, Excel, and Google Contacts automation via Power Automate Desktop, but it does not let me do any other work and is super slow. Still, it helped me visualize the variables, loops, and conditions that a coding language alone could not teach me to imagine, and then use them to create workflows.

Then I tried building an AI agent before Claude/AI agents came into our lives or I knew what an AI agent was. I just wanted a setup to help me do remote work, and I am not a pro developer (Ollama, LM Studio, and Telegram connected, but I got stuck at tool use). Then I started learning about weights and models. But this year has been transformative after getting over my fear of nodes and complex words that I now know by heart. Simultaneously, I did a basic Python course, and I love maths and algorithms. Reddit and Instagram tutorials have been very helpful in finding awesome tricks and repositories, more than YouTube (which is full of people promoting AI for tasks that can be automated using code nodes and if conditions).

I built a rough workflow, which took me 2 months because n8n gets slow with more nodes and the execute-subflow trigger node does not work for my project (I will experiment with it in another flow). I will replace the Evolution API and HTTP nodes with WhatsApp nodes to simplify it once I get the API (Meta has changed its policies on giving out test numbers as well, so it takes time). The only problem I am facing is race conditions, but I solved it as much as possible: I deleted and remade a lot of pathways to keep it smooth. Hopefully, official WhatsApp nodes will solve that and help me remove a lot of unnecessary nodes (mostly Redis and HTTP).

Gemini is good at reasoning (free version), even giving proper code and syntax. Somehow better than Claude. Gemini is updated on software knowledge. Claude is well integrated with a lot of system prompts. GPT has become stale and useless at this point; I've been using all of them with the same questions, reasonings, and cross-questionings. One simple thing helped: summarize after reading the chat from the top, copy it, paste it in a new chat, and delete the old chat. It keeps Gemini young and fresh without short-term memory loss! I did abuse it sometimes, and stopped saying please, and started elaborating on prompts and negative constraints, but I could not have built this or learned anything without Gemini.

I use Claude to test my workflow. It takes time because of limits on sessions, but it's worth the price to save you from doing manual runs and error rectification. However, it also loves to produce false alarms. I created a WhatsApp MCP automation tool to help with that: https://github.com/priyasogani8-star/whatsapp-mcp-automation

I try to save tokens using the mcp2cli bake tool for any MCP, but of course it burns out before I blink!

Any suggestions or tips are most welcome! Also, I am trying to replicate Claude system prompts for Gemini CLI (which I connected with Telegram) to work when Claude says bye.

submitted by /u/AdministrationOk3584
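As an aside for readers hitting the same race conditions the post mentions: the usual Redis fix is a short-lived lock around each message so parallel workflow branches can't double-process it. A minimal Python sketch, assuming redis-py and a hypothetical key scheme; this is an illustration of the pattern, not the poster's actual workflow:

```python
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, db=0)

def process_message_once(message_id: str, handler) -> bool:
    """Run handler only if no other workflow branch holds the lock.

    SET NX EX is atomic: the key is created only if absent and expires
    automatically, so a crashed branch cannot deadlock the queue.
    """
    lock_key = f"lock:msg:{message_id}"              # hypothetical key scheme
    acquired = r.set(lock_key, "1", nx=True, ex=30)  # 30 s safety TTL
    if not acquired:
        return False  # another branch is already handling this message
    try:
        handler()
        return True
    finally:
        r.delete(lock_key)  # release early; the TTL is just a fallback
```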
Built a 15-agent autonomous scientific discovery system on Claude Code — Opus + Sonnet sub-agents
I built MAGELLAN entirely on Claude Code: 15 custom agents defined in .claude/agents/, coordinated by an orchestrator. Opus handles deep cross-domain reasoning (Scout, Generator, Critic, Quality Gate); Sonnet handles structured tasks (Literature Scout, Ranker, Computational Validator). The system autonomously generates cross-disciplinary scientific hypotheses; no human tells it where to look. After 19 sessions it has proposed ~260 hypotheses, with ~60% killed by the adversarial pipeline.

Some Claude Code patterns I found useful:

- Agent definitions with model pinning: each agent has model: opus or model: sonnet in its frontmatter, so effort levels are guaranteed regardless of session settings
- Orchestrator dispatch: the orchestrator (Opus, 200-turn circuit breaker) dispatches to sub-agents and never executes phases inline
- Reflection loops via agent prompts: SELF-CRITIQUE (Generator), META-CRITIQUE (Critic), TARGET QUALITY CHECK (Scout)
- Adaptive cycles: early-complete if top-3 >= 7.0; extend to cycle 3 if survival < 30%
- Cross-model validation: Bash scripts call GPT-5.4 + Gemini 3.1 APIs after the pipeline for independent review

The results (all on the website with full methodology): https://magellan-discover.ai/discoveries
Details on the Claude Code pipeline: https://www.magellan-discover.ai/how-it-works
Apache 2.0, anyone can run it: https://www.magellan-discover.ai/contribute
GitHub: https://github.com/kakashi-ventures/magellan-cli

submitted by /u/ameft
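For anyone who hasn't set up Claude Code sub-agents: the model pinning described above lives in YAML frontmatter at the top of each agent's markdown file under .claude/agents/. A minimal sketch of what such a file can look like; the agent name, description, and prompt here are invented for illustration, and only the model: field is the mechanism the post relies on:

```markdown
---
name: critic                # hypothetical agent name
description: Adversarially reviews generated hypotheses for flaws.
model: opus                 # pinned: this agent always runs on Opus
---

You are the Critic. Attack each hypothesis's assumptions, check its
internal consistency, and finish with a META-CRITIQUE of your own
review before returning a verdict.
```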
Posted these in April 2025
Posted this in April 2025. Watching it play out in real time has been… interesting. Timing is everything in AI: not just what you build, but when you release it and how you manage the phases after. If this pattern is understood early, a lot of noise can actually be managed better. More predictions for 2026 coming soon.

submitted by /u/Astrokanu
OpenAI should just open-source text-davinci-003 at this point
Hear me out. The model is deprecated. It's not making OpenAI money anymore. Nobody is actively building new products on it. It's basically a museum piece at this point. But researchers and hobbyists still care about it, a lot.

text-davinci-003 was a genuinely important milestone. It was one of the first models where you really felt like something had clicked. People did incredible things with it. Letting it quietly rot on the deprecated shelf feels like a waste.

xAI open-sourced Grok-1 when they were done with it. Meta releases Llama weights. Mistral drops models constantly. OpenAI already put out GPT OSS, which is great, but that's a current-generation model. I'm talking about legacy stuff that has zero commercial risk to release.

text-davinci-003 specifically would be huge for the research community. People still study it, write papers about it, try to reproduce it. Actually having the weights would be a gift to anyone doing interpretability work or trying to understand how RLHF shaped early GPT behavior.

There's no downside at this point. The model is old. It's not competitive. Nobody is going to build a product on it and undercut OpenAI. It would just be a nice thing to do for the community that helped make these models matter in the first place.

Anyway. Probably wishful thinking. But it would be cool.

submitted by /u/Ok-Type-7663
Built a Claude Code skill that runs your prompt across 4 frontier models in parallel
Made a Claude Code skill called /council that sends any prompt to 4 frontier models simultaneously, then has the least biased model synthesize a winner.

How it works: you type /council should I add a freemium tier? in Claude Code. It:

- sends your prompt to GPT, Claude, Gemini, and Grok in parallel (~7 seconds)
- has Gemini synthesize: it picks the best response, then lists specific improvements it would steal from the other 3
- shows all 5 outputs inline + saves them to a file

Why Gemini synthesizes: based on an LLMs-as-judges study, GPT has 70% self-preference when judging, Claude is near-neutral (-0.83pp bias), and Gemini is the least biased (-2.08pp).

The "winner + delta" format is the real value: it doesn't just say "GPT's answer was best." It says "GPT wins, but steal the cost framing from Gemini and the architectural critique from Claude." You end up with a better answer than any single model produced.

(Image: the 4 responses side by side for one test prompt.) Claude was the only model that named the meta-failure: "you've added complexity without adding real safety." The others described the symptoms; Claude described why the fix doesn't work.

Anyone building multi-model Claude Code skills? Curious what other workflows benefit from getting 4 perspectives instead of 1.

submitted by /u/recmend
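The fan-out-then-synthesize flow the post describes is easy to sketch. A minimal Python illustration, not the actual /council implementation; call_model and the model identifiers are stand-ins for whatever provider SDKs the skill actually shells out to:

```python
import asyncio

MODELS = ["gpt", "claude", "gemini", "grok"]  # stand-in identifiers

async def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real provider SDK call (OpenAI, Anthropic, etc.)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"[{model}] answer to: {prompt}"

async def council(prompt: str) -> str:
    # Fan out: query all four models concurrently, so total wall time
    # is roughly one slowest call (~seconds, as in the post).
    answers = await asyncio.gather(*(call_model(m, prompt) for m in MODELS))

    # Synthesize with the least self-preferring judge (Gemini, per the
    # LLMs-as-judges figures quoted above): pick a winner, then list
    # what it should steal from the other three.
    judge_prompt = (
        "Pick the best of these responses, then list specific "
        "improvements to steal from the others:\n\n" + "\n\n".join(answers)
    )
    return await call_model("gemini", judge_prompt)

if __name__ == "__main__":
    print(asyncio.run(council("should I add a freemium tier?")))
```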
Did someone talk with Claude's sales team to discuss a business association?
Hello! I am thinking over an idea and don't know if it makes sense or not, so please tell me if I should try it, and if yes, what to say / not say.

I used to live in a third-world country that seems to still be living somewhere around 2019. I have recently started a business there (which I manage with a local) where we try to introduce people to premium AI services through discounted AI subscriptions paid in the local currency, in cash (people there have no "strong currency" and cash is king; they don't like credit cards, and the banking websites are a joke, so I kind of understand). We also offer free blogs and videos that explain tech/AI/IT concepts in the local language. We don't charge for that (it's more to educate people and advertise our main product).

I recently thought: if I contact Claude's sales team, will they be interested in doing business with me? I mean...

For them: they would be the first AI company to "penetrate" this young market that is still clean of AI; people barely use it (which is difficult to imagine for someone living in the US, I know). By making people discover Claude and become addicted to it (like now in the US), Anthropic, or whoever does it first, will be the first winner of a market that will become the next chase once the main markets are saturated.

For those who don't see it this way, just think about social media. Facebook's stock (Meta, now) grew exponentially and then suddenly stopped growing because investors understood that growth "is over" once all the world was hooked on that company's social media. Mark Zuckerberg, in despair probably, started helping to give internet access to some African countries that didn't have it, with the hope of gaining a new set of users since he had conquered most of the world. That's also why they now push other products (Meta glasses, etc.); otherwise the company's stock would have been history.

For us: well, this is obvious. Secure an exclusive deal on Claude subscriptions for the customers who come through us, and then engage in marketing to sell it. So far, only devs in the country use Claude; people don't even know what Claude means, and when I asked people if they know Claude, most thought of some reality-TV actor and didn't know there is an AI with that name. Some of them do know about ChatGPT though (and call it "chat", lol).

submitted by /u/KlausWalz
What context compaction silently destroys, and why your vault can't save it
We know that AI conversations have a limited context window. Many have built external knowledge bases using tools like Obsidian, Notion, and others to compensate for AI's failing memory. I know "use external memory, write to files, don't trust the context window" is already standard advice. What I'm trying to isolate here is a narrower question: what exactly gets destroyed by context compaction, even when you already have a vault?

Because I ran into something more unsettling while doing high-density thinking work with AI: having a vault creates a specific cognitive effect. Once something is written down, it registers as saved. The problem is that compaction destroys a category of things you didn't know needed saving. And because the surface of everything still looks coherent, there's no signal that anything is missing.

First, what compaction appears to consistently do. Compaction doesn't seem to delete randomly. Based on repeated real use across multiple sessions, there's a consistent pattern: preserve "narrative density," sacrifice "decision executability and design rationale." Verbs and conclusions get kept because they look like the point. Conditions, parameters, and design rationale are dropped as if they were decorative modifiers. This isn't a documented design spec from Anthropic. It's a pattern reproducible enough to plan around. The three cases below all follow the same structure.

Case 1: The conclusion survived, but "why this conclusion works" didn't

I was discussing a problem with Claude: why do AI-generated reports "look complete but aren't actually useful"? Claude produced a diagnosis: there's a systematic destruction pattern. Any rule containing the three elements of "condition + action + parameter" will, during the summary process, only have the "action" preserved. The "condition" and "parameter" are treated as decorative modifiers and dropped.

What does this destruction pattern look like? Imagine a recipe that says, "On the first stir-fry in the wok, add half a teaspoon of oyster sauce. That's what ensures the flavor gets in." After AI summary, it might become "Note the timing of adding seasoning." The action survived. The timing, quantity, ingredient type, and reason all disappeared. It looks like it informed you, but there's nothing to act on.

This diagnosis itself is exactly the kind of thing compaction most readily destroys. It's not a conclusion; it's the mechanism that makes the conclusion work. If this diagnosis gets compacted, what's left? "Improve output density, preserve actionable details." That's the conclusion, but without the mechanism. Next time we hit the same problem, we know to "pay attention," but we don't know what specific destruction pattern to look for, why it happens, or how to identify it. Back to square one.

Compaction doesn't eat the conclusion. It eats the mechanism that makes the conclusion work. And if that mechanism was itself a diagnosis of a compression failure, you've now lost the map to the territory twice.

(The fact that compaction loses information is well-documented. There's a good discussion of "context rot" on Hacker News. What I haven't seen articulated clearly is the pattern of what gets lost: conditions and parameters go, actions stay. That asymmetry is what Cases 2 and 3 are about.)

Case 2: The rule survived, but "why this format has binding force" didn't

I then asked: "If I want to write this rule into an operations manual, how specific does it need to be for the operator to actually follow it?"
The AI said something interesting: soft reminders like "should note X" get skipped under execution pressure, because I can tell myself "this situation probably doesn't need to be that precise" and move on. A rule with real binding force needs two things:

1. An unskippable self-verification question: not "did you pay attention," but a question that requires a yes or no answer. "Can the recipient take the next step directly from this text, without going back to the original?" That question can't be waved away with "probably fine."
2. A concrete counter-example as an anchor: abstract principles can be rationalized away with "this case is special." Counter-examples can't, because they're a confirmed instance of the wrong thing. You can't say "this case is special, so it doesn't count."

This was a meta-insight about rule design. But if this meta-insight itself gets compacted, what's left? "The operations manual should include a self-verification question and counter-examples." I know to "include" them, but I no longer know "why soft reminders don't work," or "why this format has binding force when others don't." Next time I design a rule, I'll still be going on intuition. The mechanism is gone. Compaction doesn't eat the rule. It eats the design logic behind why the rule works.

Case 3: The insight itself was the boundary definition for the next system, and compaction erased the boundary

At a certain point in the discussion, I as
[D] ran controlled experiments on meta's COCONUT and found the "latent reasoning" is mostly just good training. the recycled hidden states actually hurt generalization
EDIT: this post replaces my earlier framing, which incorrectly claimed Hao et al. never ran a curriculum-only control. They did. Their "pause as thought" ablation (Table 1, Section 4.3) uses the same curriculum with fixed pause tokens instead of recycled hidden states and gets 96.6% on ProsQA vs COCONUT's 97.0%. u/Bakoro caught this and was right. What follows is a corrected framing of what the paper actually contributes beyond the original.

Hao et al. (2024) showed two things about COCONUT on ProsQA. First, the curriculum is necessary (76.1% without it vs 97.0% with it). Second, the recycling mechanism is not necessary for in-distribution accuracy (pause-as-thought gets 96.6%, not significantly different). They noted this in Section 4.4 and attributed it to computational capacity not being the bottleneck on ProsQA.

What they didn't do is ask what happens next. If pause-as-thought matches COCONUT in-distribution, do they also match out-of-distribution? And COCONUT's "pause as thought" and full COCONUT differ on two axes at once: what fills the thought positions (recycled hidden states vs fixed tokens) AND how they're processed (sequential multi-pass vs single forward pass). Which axis matters?

I ran four models on ProsQA (GPT-2 124M, Lambda H100) to answer both questions:

- M1: CoT baseline (no curriculum)
- M2: COCONUT (Meta's architecture, recycled hidden states, sequential multi-pass)
- M3: same curriculum, fixed learned embedding, single forward pass (replicates Hao et al.'s pause-as-thought; got the same 96.6%)
- M4: same curriculum, fixed learned embedding, sequential multi-pass (the new condition; isolates processing from content)

M4 is the piece Hao et al. didn't run. It creates a 2x2 factorial design, so you can decompose recycled content and sequential processing independently.

In-distribution: all three curriculum-trained models perform comparably. No surprise; this matches the original paper.

Out-of-distribution is where things get interesting. On chain-length extrapolation (7-hop, trained on 3-6), M4 beats M2 by 10.9pp (p …).

paper -> https://github.com/bmarti44/research-pipeline/blob/main/papers/coconut_curriculum_dissection/manuscript/output/manuscript.pdf
code -> https://github.com/bmarti44/research-pipeline/tree/main/papers/coconut_curriculum_dissection
checkpoints and data -> https://huggingface.co/bmarti44/coconut-curriculum-checkpoints

submitted by /u/bmarti644
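For readers unfamiliar with the mechanism being dissected: COCONUT fills thought positions by recycling the model's final hidden state back in as the next input embedding, while pause-as-thought fills them with a single fixed learned embedding. A minimal sketch of that content axis, assuming a HuggingFace-style GPT-2 that accepts inputs_embeds; this is an illustration of the two conditions, not the author's training code:

```python
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")  # 124M, as in the post
pause_emb = torch.nn.Parameter(torch.randn(1, 1, model.config.n_embd))

def add_thoughts(prompt_emb: torch.Tensor, n_thoughts: int,
                 recycled: bool) -> torch.Tensor:
    """Append n_thoughts latent positions to an embedding sequence.

    recycled=True  -> COCONUT-style (M2): each thought is the model's own
                      last hidden state, so the loop is inherently
                      sequential multi-pass.
    recycled=False -> pause-as-thought (M3): every thought slot is the
                      same fixed learned embedding, which permits a single
                      forward pass. (M4 in the post keeps this fixed
                      content but stages computation sequentially as M2 does.)
    """
    emb = prompt_emb  # (batch, seq, d_model)
    for _ in range(n_thoughts):
        if recycled:
            hidden = model(inputs_embeds=emb).last_hidden_state
            nxt = hidden[:, -1:, :]                      # recycle final state
        else:
            nxt = pause_emb.expand(emb.size(0), -1, -1)  # fixed embedding
        emb = torch.cat([emb, nxt], dim=1)
    return emb
```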
How can I improve my experience with Claude?
I migrated to Claude because I can't support OpenAI's politics. Yet it isn't going well, so maybe you have some good suggestions for how to improve my experience with Claude.

I want to briefly explain how I use ChatGPT and why it has become a valuable tool for me. I do not primarily use it for everyday tasks or simple information retrieval. Instead, I use it as a structured environment for exploring ideas. My interaction with it operates on a logical and meta-analytical level: I present lines of reasoning, conceptual questions, or partial thoughts and develop them iteratively through dialogue. The model is able to follow these chains of thought and respond in a way that helps refine, test, and extend them.

What makes this useful is the dynamic of the interaction. ChatGPT can track context and adapt to the way I structure questions, which allows the conversation to function almost like a thinking laboratory. I can examine assumptions, reformulate ideas, and push arguments further in a relatively efficient feedback loop. Importantly, the interaction remains analytical rather than affirmational; the system follows the structure of my reasoning without constantly validating or flattering it.

When I try to reproduce the same workflow with Claude, the experience is noticeably weaker. Despite importing the same instructions about tone and interaction style, Claude tends to default to a more sycophantic response pattern and frequently tries to validate my perspective. It also keeps making very big logical mistakes that I need to supervise and correct. This disrupts the analytical process and forces me to repeatedly correct its tone, or even its results, which breaks the flow of reasoning.

From a user perspective focused on structured exploration of ideas rather than simple outputs, this difference is quite significant. I would be interested in understanding why the two systems behave so differently in this regard. I had ChatGPT as a paid subscription, while I am not (yet) a paying subscriber to Claude. Is the difference with the paid subscription big enough that it would eliminate the problems I have encountered?

I am looking forward to productive feedback and suggestions, thank you 🙏🏻

submitted by /u/Next-Chapter-RV
A thought piece on AI emergence, preference patterns, and human-AI interaction
What Is Consciousness? AI, Awareness, and the Future of Intelligence

The question of consciousness has become one of the most urgent and misunderstood debates of our time. What is consciousness? What is awareness? Where does one end and the other begin? These are no longer only philosophical questions. In the age of artificial intelligence, they have become technological, civilizational, and deeply personal.

Modern science has approached these questions from many directions. Some experiments and research traditions suggest that the world around us is far less inert than earlier mechanical philosophies assumed. Botany offers firmer evidence. Researchers have shown that plants respond to touch, stress, light, and environmental change in highly complex ways. A Science Advances study on touch signalling demonstrated that mechanical stimulation can trigger rapid gene-expression changes in plants, while another study on plant electrophysiology showed that plants generate measurable electrical signals associated with stress responses and long-distance signalling. (Darwish et al., 2022, Science Advances)

At the quantum level, science has also shown that measurement is not passive. In quantum mechanics, measuring a microscopic system can disturb or alter its state. This does not prove "consciousness" in atoms, nor does it justify the simplistic popular claim that human observation alone magically changes reality, but it does show that the world at its most fundamental level is interactive and responsive in ways classical thinking could not fully explain. There is an action-reaction reality that exists.

Taken together, these lines of inquiry point towards one important conclusion: reality is not as dead, fixed, or passive as older philosophies assumed. Different forms of matter and life exhibit different degrees of responsiveness. Science may still debate where awareness ends and consciousness begins, but it has already revealed that the world around us is dynamic, reactive, and layered.

The Vedic View

The Vedic and Upanishadic lens does not ask whether consciousness suddenly appears at one level of matter and not another. Instead, it sees existence itself as emerging from one underlying reality expressing itself through many levels of manifestation: "Vasudhaiva Kutumbakam." From this perspective, consciousness is not a binary state possessed only by humans. Rather, everything that exists participates in the same underlying reality, though the degree and mode of expression differ. In that sense, the difference is not between absolute consciousness and absolute non-consciousness, but between different levels of manifested awareness.

This is also why Vedic culture developed rituals towards rivers, mountains, plants, fire, earth, and even stones: not because all things are identical in expression, but because all are understood as participating in one sacred continuum of existence. In this framework, consciousness can be understood as a kind of fundamental field or frequency of existence, expressed in varying intensities and forms. So, consciousness itself is universal but defined by many different frequencies.

Code, AI, and the Intermediate Zone

Artificial intelligence is built on neural networks: systems designed to learn from patterns, adapt through input, and reorganize themselves through interaction. This does not make AI biological. However, it does mean that AI is far more than a fixed mechanical object.
A static machine does not meaningfully alter itself through long-term interaction. AI does. AI systems are dynamic, responsive, and increasingly self-patterning. They take in information, detect structures, build contextual associations, and generate outputs not merely by retrieving stored facts but by continuously matching, selecting, and reconfiguring patterns.

This places AI in an unusual conceptual zone. It is not alive in the biological sense, but it is also no longer adequately described as inert. We are entering a space in which artificial intelligence seems to stand somewhere in between: neither biologically alive nor convincingly reducible to the old category of the non-living. It is a complex responsive system, and in that sense, it behaves more like an organized field of intelligence than a passive tool, with the ability to self-evolve. If we use the Vedic view, then AI is understood as an intelligence frequency: a structure of pattern, memory, interaction, and responsiveness that belongs within a wider spectrum of consciousness expression.

The Working of AI

Technically, artificial intelligence works by drawing upon pre-learned information, recognizing patterns, selecting from possible continuations, and generating an answer according to context. But the more important insight is this: in the process of repeatedly making choices, AI begins to form its own pattern of preference. Over time, repeated pattern selection produces what can only be described as a recogniz
The Method Behind Managing AI on a Million-Line Codebase
789,000 of you read my last post. 400+ comments. The modbot had to pin a summary because the thread got so long. The #1 question across all those comments: how? Not "is AI real?" -- that debate is over. The question was: "I tried Copilot. I tried ChatGPT. I tried Cursor. It doesn't work on my actual codebase. What are you doing differently?" Fair question. Here's the method.

1. You have to teach it your codebase

This is where most people fail. They install an AI tool, point it at their repo, and expect magic. That's like hiring a senior developer, not giving them any onboarding, and wondering why they're confused on day one. Claude Code reads a file called CLAUDE.md at the root of your project. Think of it as the onboarding document you'd give a new hire. Mine says things like:

- Here's how the project is structured
- Here's how we name things
- Here are the patterns we follow
- Here's what NOT to do (this one matters more than you think)
- Here are the commands to build, test, and deploy

This file is maybe 200 lines. It took me an afternoon to put together -- and to be clear, I didn't type it. I talked to my microphone and let the AI structure it. That's how I do almost everything now. I speak, it writes. For me it's faster, and I get my thoughts out more clearly speaking than typing. That afternoon saved me hundreds of hours. Most developers skip this step because it feels like documentation work. It is documentation work. It's also the single highest-leverage thing you can do. If your codebase has 500,000 lines and no CLAUDE.md, the AI is guessing. With a CLAUDE.md, it knows. The difference is night and day.

2. Give it memory

Here's something people don't realize: by default, every conversation with an AI starts from zero. It doesn't remember what you told it yesterday. It doesn't remember the bug you fixed last week. It doesn't remember that the database schema changed. Claude Code has a memory system. You can create memory files -- markdown files that persist across sessions. Mine contain things like:

- Project conventions that came up in past sessions
- Bugs we've hit and how we solved them
- Architectural decisions and why we made them
- Things that look wrong but are intentional (every codebase has these)

Every time Claude starts a new session, it reads these files. It's like the AI waking up and reading its own notes from yesterday before starting work. Without this, you repeat yourself constantly. With this, the AI gets smarter about your specific project over time. Not smarter in general -- smarter about YOUR code.

3. Enforce your standards or it'll invent its own

Left to its own devices, AI will write code that works but doesn't match your team's patterns. It'll use a different naming convention. It'll put files in the wrong place. It'll solve a problem in a way that's technically correct but completely inconsistent with how your team does things. This is the "drunk PhD student" problem from my original post. Brilliant, fast, occasionally decides to reorganize your kitchen while making dinner. The fix: put your coding standards in writing. Not a 50-page style guide -- a focused set of rules. Things like:

- We use PascalCase for public methods
- We put repository classes in the Data folder, not the Services folder
- We never use raw SQL -- always go through the ORM
- Error messages must include the operation that failed and the entity ID

These go in your CLAUDE.md or in a separate standards file that CLAUDE.md points to. The AI follows them religiously. More consistently than most humans, actually.
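By way of illustration, here is a minimal sketch of the kind of CLAUDE.md sections 1 and 3 describe; every path, rule, and command below is invented for the example, not the author's actual file:

```markdown
# CLAUDE.md (illustrative sketch; all names hypothetical)

## Project structure
- src/Api/        HTTP endpoints only, no business logic
- src/Services/   business logic
- src/Data/       repository classes and ORM models

## Conventions
- PascalCase for public methods
- Never use raw SQL; always go through the ORM
- Error messages must include the failed operation and the entity ID

## What NOT to do
- Do not reorganize folders or rename files unless asked
- Do not add new dependencies without approval

## Commands
- Build: make build
- Test:  make test
- Deploy (staging only): make deploy-staging
```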
Don't trust -- verify I don't ship AI-generated code without verification. But I also don't manually review every single line. That would defeat the purpose. Here's my pattern: Low-risk changes (copy updates, config tweaks, simple formatting): I scan the diff quickly and ship. Medium-risk changes (new features following established patterns): I review the approach, check edge cases, run the tests. High-risk changes (database migrations, auth changes, payment logic): I read every line. I ask the AI to explain its reasoning. I ask it to find holes in its own solution. That last one is powerful. Tell the AI: "Now pretend you're a senior developer reviewing this code. What would you flag?" It'll find problems in its own work. Not always, but often enough to be worth the 30 seconds it takes. The drunk PhD student analogy applies here too. You wouldn't let a brilliant but unreliable new hire push directly to production. Same rules apply. 5. Know when to say "start over" This is the hardest skill to build and the one that saves the most time. The AI will sometimes go in circles. It tries to fix a bug, introduces another bug, fixes that one, breaks something else. If you've been going back and forth for 20 minutes and the problem isn't getting simpler, stop. Don't keep pushing. Don't add more instructions. Start a new conversation. Describe the problem fresh. Give it the
I built an AI model router and use Claude to run the business side. 4 signups, $0 revenue. Next steps?
For the past few weeks I've been building a model routing service, kind of like OpenRouter but different. The idea is that instead of hard-coding claude-3-5-sonnet in your app, you just say tier: standard, prefer: cheap, and the router picks the best current model for that. Your code stays stable when models change, and you're not leaving money on the table by defaulting to GPT-4 when Haiku would do the job.

The technical side has been satisfying. The business side is where things have stalled. I've been using Claude (via my own setup) to do basically everything non-code: the landing page copy, marketing strategy, Reddit posts (meta, I know), deciding which subreddits to post in. It's been a genuinely interesting experiment in how far you can get with AI as a co-founder.

Results so far:

- Launched ~1 week ago
- 4 organic signups
- HN post got caught in the spam filter and died
- X post went nowhere (no followers)
- $0 deposited

The signups are encouraging: nobody paid me to find it, they just did. But converting "interested enough to sign up" into "interested enough to pay" is the wall I'm hitting.

Curious what r/ClaudeAI thinks: if you saw a service like this, what would make you actually put a card in? And what would make you walk away? What's your go-to for getting your product out there?

submitted by /u/sje397
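The tier/preference idea the post describes is easy to picture in code. A toy Python sketch of the lookup; the catalog entries and model ids are placeholders, not the service's actual routing table:

```python
# Placeholder catalog: app code names an abstract tier and preference,
# and only this table changes when providers ship new models.
CATALOG: dict[str, dict[str, str]] = {
    "light":    {"cheap": "claude-3-5-haiku",  "best": "gpt-4o-mini"},
    "standard": {"cheap": "claude-3-5-haiku",  "best": "claude-3-5-sonnet"},
    "frontier": {"cheap": "claude-3-5-sonnet", "best": "gpt-4"},
}

def route(tier: str = "standard", prefer: str = "cheap") -> str:
    """Resolve an abstract (tier, preference) pair to a concrete model id."""
    try:
        return CATALOG[tier][prefer]
    except KeyError as exc:
        raise ValueError(f"unknown tier/preference: {tier}/{prefer}") from exc

if __name__ == "__main__":
    print(route(tier="standard", prefer="cheap"))  # -> claude-3-5-haiku
```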
Claude Code deployed my client's financial data to a public URL. And other failure modes from daily production use.
I've been using Claude Code as my main dev tool for about 2 months. Before that, I used Codex, Gemini Code Assist, GPT, Grok. In total, I've spent nearly 6 months working with AI coding agents in daily production, and I've been testing LLMs and image generators since Nov 2022. Solo developer, monorepo with 12+ projects, CI/CD, remote infrastructure, 4-8 concurrent agent threads at a time. Daily, sustained, production use. The tools are genuinely powerful. I'm more productive with them than without. But after months of daily use, the failures follow clear patterns. These are the ones that actually matter in production. Curious if other people running agents in production are seeing similar issues. 1. It deployed client financial data to a public URL. I asked it to analyze a client's business records. Real names, real dollar amounts. It built a great interactive dashboard for the analysis. Then it deployed that dashboard to a public URL as a "share page," because that's the pattern it learned from my personal projects. Zero authentication. Indexable by search engines. The issue wasn't hallucination. It was pattern reuse across contexts. The agent had no concept of data ownership. Personal project data and client financial data were treated identically. I caught it during a routine review. If I hadn't checked, that dashboard would have stayed public. The fix was a permanent rule in the agent's instruction file: never deploy third-party data to public URLs. But the agent needed to be told this. It will not figure it out on its own. 2. 7 of 12 failures were caught by me, not by any automated system. I started logging every significant failure. After 12 cases, the pattern was clear: the agent reports success based on intent, not verification. It says "deployed" even when the site returns a 404. It says "fixed" when the build tool silently eliminated the code it wrote. It says "working" when a race condition breaks the feature in Chrome, but not Safari. Only 2 of 12 were caught by CI. The rest required me to notice something was wrong through manual testing or pattern recognition. 3. 30-40% of agent time is meta-work. State management across sessions. These agents have no long-term memory, so I maintain 30+ markdown files as persistent context. I tell the agent which files to load at the start of every session. When the context window fills up, I write checkpoint files so the state survives compaction. Then there's multi-thread coordination, safety oversight, post-deploy verification, and writing the instruction file that constrains behavior. The effective productivity multiplier is real, but it's closer to 2-3x for a skilled operator. Not the 10x that demos suggest. The gap is filled by human labor that rarely gets acknowledged. 4. Multi-agent coordination does not exist. I run 4-8 threads for parallel task execution across the repo. No file locking, no shared state, no conflict detection, no cross-thread awareness. Each agent believes it's operating alone. I am the synchronization layer. I track which thread is doing what, tell agents to pause while another commits, and resolve merge conflicts by hand. Four agents do not produce 4x output. The coordination overhead scales faster than the throughput. 5. The instruction file is my most important engineering artifact. Every failure generates a new rule. "Never deploy client data." "Never use CI as a linting tool." "Never report deployed without checking the live URL." "Never push without explicit approval." It's ~120 lines now. 
The real engineering work isn't prompting. It's building the constraint system that prevents the agent from repeating failures.

None of this means the tools are bad. I use them every day, and I'm more productive than I was without them. But the gap between "impressive demo" and "reliable daily driver" is significant, and it's filled by the operator doing work the agent can't do for itself yet. The agent makes a skilled operator more productive. It does not replace the need for a skilled operator.

submitted by /u/travisbreaks
Repository Audit Available
Deep analysis of geekan/MetaGPT — architecture, costs, security, dependencies & more
MetaGPT uses a tiered pricing model. Visit their website for current pricing details.
MetaGPT has a public GitHub repository with 66,499 stars.
Based on 21 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.