Build controllable agents with LangGraph, our low-level agent orchestration framework
Design agents that reliably handle complex tasks with LangGraph, an agent runtime and low-level orchestration framework.

Prevent agents from veering off course with easy-to-add moderation and quality controls, and add human-in-the-loop checks to steer and approve agent actions. LangGraph's low-level primitives provide the flexibility needed to create fully customizable agents, and you can design diverse control flows — single, multi-agent, hierarchical — all using one framework.

LangGraph's built-in memory stores conversation histories and maintains context over time, enabling rich, personalized interactions across sessions. Native token-by-token streaming bridges user expectations and agent capabilities by showing agent reasoning and actions in real time.

Learn the basics of LangGraph in this LangChain Academy course, which covers how to leverage state, memory, human-in-the-loop, and more for your agents. Build and ship agents fast with any model provider, using high-level abstractions or fine-grained control as needed.

"LangChain is streets ahead with what they've put forward with LangGraph. LangGraph sets the foundation for how we can build and scale AI workloads — from conversational agents, complex task automation, to custom LLM-backed experiences that 'just work'. The next chapter in building complex production-ready features with LLMs is agentic, and with LangGraph and LangSmith, LangChain delivers an out-of-the-box solution to iterate quickly, debug immediately, and scale effortlessly."

"LangGraph has been instrumental for our AI development. Its robust framework for building stateful, multi-actor applications with LLMs has transformed how we evaluate and optimize the performance of our AI guest-facing solutions. LangGraph enables granular control over the agent's thought process, which has empowered us to make data-driven and deliberate decisions to meet the diverse needs of our guests."

"As Ally advances its exploration of Generative AI, our tech labs is excited by LangGraph, the new library from LangChain, which is central to our experiments with multi-actor agentic workflows. We are committed to deepening our partnership with LangChain."

Other agentic frameworks can work for simple, generic tasks but fall short for complex tasks bespoke to a company's needs. LangGraph provides a more expressive framework to handle companies' unique tasks without restricting users to a single black-box cognitive architecture.

LangGraph will not add any overhead to your code and is specifically designed with streaming workflows in mind.

Yes — LangGraph is an MIT-licensed open-source library and is free to use.

LangSmith, our agent engineering platform, helps developers debug every agent decision, evaluate changes, and deploy in one click.
Mentions (30d): 0
Reviews: 0
Platforms: 2
GitHub stars: 28,022 (4,791 forks)
Industry: information technology & services
Employees: 98
Funding stage: Series B
Total funding: $260.0M
GitHub followers: 17,647
GitHub repos: 232
npm packages: 20
HuggingFace models: 25
anthropic managed agents vs building my own
checked out the new claude managed agents thing today. not having to handle all the infra for agents sounds pretty good. i've been building my own for a while and keeping track of state is usually a huge pain. if this actually scales well and handles the handoffs, it would save me a ton of work. i'm mostly curious about how much control they actually give you over the underlying prompts. is anyone else looking into this yet? wondering if it's worth switching from something like langgraph. submitted by /u/farhadnawab
Burned 5B tokens with Claude Code in March to build a financial research agent.
TL;DR: I built a financial research harness with Claude Code, full stack and open-source under Apache 2.0 (github.com/ginlix-ai/langalpha). Sharing the design decisions around context management, tools and data, and more in case it's useful to others building vertical agents.

I have always wanted an AI-native platform for investment research and trading. But almost every existing AI investing platform out there is way behind what Claude Code can do. Generalist agents can technically get work done if you paste enough context and bootstrap the right tools each session, but it's a lot of back and forth. So I built it myself with Claude Code instead: a purpose-built agent harness where portfolio, watchlist, risk tolerance, and financial data sources are first-class context. Open-sourced with full stack (React 19, FastAPI, PostgreSQL, Redis) built on deepagents + LangGraph. Learned a lot along the way and still figuring some things out. Sharing this here to hear how others in the community are thinking about these problems. This post walks through some key features and design decisions. If you've built something similar or taken a different approach to any of these, I'd genuinely love to learn from it.

Code execution for finance — PTC (Programmatic Tool Calling)

The problem with MCP + financial data: financial data overflows context fast. Five years of daily OHLCV, multi-quarter financial statements, full options chains — tens of thousands of tokens burned before the model starts reasoning. Direct MCP tool calls dump all of that raw data into the context window. And many data vendors squeeze tens of tools into a single MCP server. Tool schemas alone can eat 50k+ tokens before the agent even starts. You're always fighting for space.

PTC solves both sides. At workspace initialization, each MCP server gets translated into a Python module with documentation: proper signatures, docstrings, ready to import. These get uploaded into the sandbox.
Only a compact metadata summary per server stays in the system prompt (server name, description, tool count, import path). The agent discovers individual tools progressively by reading their docs from the workspace — similar to how skills work. No upfront context dump.

```python
from tools.fundamentals import get_financial_statements
from tools.price import get_historical_prices

# The agent writes pandas/numpy code to process data, extract insights,
# and create visualizations. Raw data stays in the workspace — it never
# enters the LLM context window; only the final result comes back.
```

Financial data needs post-processing: filtering, aggregation, modeling, charting. That's why it's crucial that data stays in the workspace instead of flowing into the agent's context. Frontier models are already good at coding. Let them write the pandas and numpy code they excel at, rather than trying to reason over raw JSON.

This works with any MCP server out of the box. Plug in a new MCP server and PTC generates the Python wrappers automatically. For high-frequency queries, several curated snapshot tools are pre-baked — they serve as a fast path so the agent doesn't take the full sandbox path for a simple question. These snapshots also control what information the agent sees. Time-sensitive context and reminders are injected into the tool results (market hours, data freshness, recent events), so the agent stays oriented on what's current vs stale.

Persistent workspaces — compound research across sessions

Each workspace maps 1:1 to a Daytona cloud sandbox (or local Docker container). Full Ubuntu environment with common libraries pre-installed.
agent.md and a structured directory layout:

- agent.md — workspace memory (goals, findings, file index)
- work/ /data/ — per-task datasets
- work/ /charts/ — per-task visualizations
- results/ — finalized reports only
- data/ — shared datasets across threads
- tools/ — auto-generated MCP Python modules (read-only)
- .agents/user/ — portfolio, watchlist, preferences (read-only)

agent.md is appended to the system prompt on every LLM call. The agent maintains it: goals, key findings, thread index, file index. Start a deep-dive Monday, pick it up Thursday with full context.

Multiple threads share the same workspace filesystem. Run separate analyses on shared data without duplication. Portfolio, watchlist, and investment preferences live in .agents/user/. "Check my portfolio," "what's my exposure to energy" — the agent reads from here. It can also manage them for you (add positions, update watchlist, adjust preferences). Not pasted, persistent, and always in sync with what you see in the frontend.

Workspace-per-goal: "Q2 rebalance," "data center deep dive," "energy sector rotation." Each accumulates research that compounds across sessions. Past research from any thread is searchable. Nothing gets lost even when context compacts.

Two agent modes

With PTC and workspaces covered, here's how they come together. PTC Agent is the full research agent — writes and execu
OCC: give Claude and any LLM a 6+ step research task; it runs 3 steps in parallel, evaluates source quality, merges perspectives, and delivers a report in 70 seconds instead of 5-10 minutes
https://i.redd.it/jb59jvaxvotg1.gif

Claude and other LLMs are great at single-turn tasks. But when I need "research this topic from 3 angles, check source quality, merge everything, then write a synthesis" — I end up doing 6 separate prompts, copy-pasting between them, losing context, wasting tokens... So I built OCC over the past few weeks to automate that. You define the workflow once in YAML, and Claude handles the rest — including running independent steps in parallel. It started as a Claude-only tool but now supports Ollama, OpenRouter, OpenAI, HuggingFace, and any OpenAI-compatible endpoint — so you can run entire workflows on local models too.

What it does

You define multi-step workflows in YAML. OCC figures out which steps can run in parallel based on dependencies, runs them, and streams results back. Think of it as a declarative alternative to LangChain/CrewAI: no Python, no code, just YAML.

How it saves tokens

This is the part I'm most proud of. Each step only sees what it needs, not the full conversation history:

- Single mega-prompt: ~40K+ tokens (everything in one context window)
- 6 separate LLM chats: ~25K (manual copy-paste, duplicated context)
- OCC (step isolation): ~13K (each step gets only its dependencies)

Pre-tools make this even better. Instead of asking the LLM to "search the web for X" (tool-use round-trip = extra tokens), OCC fetches the data before the prompt — the LLM receives clean results, zero tool-calling overhead. 29 pre-tool types: web search, bash, file read, HTTP fetch, SQL queries, MCP server calls, and more.

What you get

Visual canvas — drag-and-drop chain editor with live SSE monitoring. Each node shows its output streaming in real-time with Apple-style traffic light dots. Double-click any step to edit model, prompt, tools, retry config, guardrails.

Workflow Chat — describe what you want in natural language, and the AI generates/debugs the chain nodes on the canvas. "Build me a research chain that checks 3 sources and writes a report" → done.
BLOB Sessions — this is experimental but my favorite feature. Unlike chains (predefined), BLOB sessions grow organically from conversations. A knowledge graph auto-extracts concepts and injects them into future prompts. The AI can run autonomously on a schedule, exploring knowledge gaps it identifies itself.

Mix models per step — use HuggingFace, Ollama, and other LLM providers. A 6-step chain that uses cheaper models for 3 routing steps costs ~40% less than running everything on Claude.

11 step types — agent, router (LLM classifies → branches), evaluator (score 1-10, retry if below threshold), gate (human approval via API), transform (json_extract, regex, truncate — zero LLM tokens), loop, merge, debate (multi-agent), browser, subchain, webhook.

The 16 demo chains

These aren't hello-world examples. They're real workflows you can run immediately.

What it's NOT

- Not a SaaS: fully self-hosted, MIT license
- Not distributed: single process, SQLite, designed for individual/small-team use
- Not a replacement for LLMs: it's a layer on top that orchestrates multi-step work
- Frontend is alpha: works but rough edges

GitHub: https://github.com/lacausecrypto/OCC

Built entirely with Claude Code. Happy to answer questions about the architecture, MCP integration, or the BLOB system. submitted by /u/Main-Confidence7777
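The dependency-based parallelism OCC describes can be sketched with Python's standard graphlib: group the workflow into batches where every step in a batch has all its dependencies satisfied. The step names below are hypothetical stand-ins, not OCC's actual YAML schema.

```python
from graphlib import TopologicalSorter

# Hypothetical 6-step research workflow: each step maps to the set of
# steps it depends on. Steps with no unmet dependencies can run in parallel.
steps = {
    "angle_a": set(), "angle_b": set(), "angle_c": set(),   # independent research angles
    "quality_check": {"angle_a", "angle_b", "angle_c"},
    "merge": {"quality_check"},
    "report": {"merge"},
}

sorter = TopologicalSorter(steps)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = list(sorter.get_ready())   # everything runnable right now
    batches.append(sorted(ready))
    sorter.done(*ready)

print(batches)  # first batch holds the three independent angles
```

A real scheduler would dispatch each batch with asyncio or a thread pool; the batching logic is the interesting part.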
We run 14 AI agents in daily operations. Here's what broke.
We run a digital marketing agency with 14 AI agents handling daily briefings, ad spend monitoring, client email drafting, call center management, project tracking, sales pipeline, and more. Real clients, real revenue, real consequences when things go wrong.

After 7 months in production, we learned something counterintuitive: when agents break, the problem is almost never the agent itself. It's the organizational environment the agent works in.

Example: our spend monitoring agent detected a client overspending by 139%. It flagged it. It even specified the escalation action. Then it reported "escalation overdue" every day for 17 days without actually executing the escalation. The agent wasn't broken. The specification was treated as documentation, not executable logic. Nobody verified the execution path end to end.

Another one: we had two agents both tracking project deadlines using different data sources. Each worked perfectly in isolation. The conflict only showed up when their outputs appeared side by side in the morning briefing, showing two different due dates for the same project.

The fix for both wasn't better prompts or a different model. It was organizational design: one seat, one owner. Define who owns what, what they don't own, and what happens when they fail. We wrote these rules down in what we call an Organizational Operating System (OOS). When we first scanned our own setup against these rules, our Coordination Score was 68 out of 100. We found 6 structural gaps we didn't know existed. After fixing them, the score went to 91. Our agents haven't stepped on each other since.

We built OTP (https://orgtp.com) to let other organizations do the same thing. You can paste your CLAUDE.md or agent config and get a Coordination Score in 60 seconds. Free, no account required.

The more interesting part: 35 organizations have published their operational rules on the platform.
You can browse how a fintech startup with SOC 2 constraints structures its agent team differently from a law firm worried about attorney-client privilege, or a fitness franchise managing 12 locations with location-specific promotions.

The whole industry is focused on technical orchestration (CrewAI, LangGraph, AutoGen, Google's 8 patterns). Nobody is talking about the organizational layer: how your human org structure maps to your agent structure, which agent has authority over which domain, and what happens when two agents disagree. We think that's the gap.

Some things we learned the hard way:

- Dollar thresholds for spend alerts don't work. $50 is noise on a $5K/day account but critical on a $200/day account. Use percentages.
- Never let an agent auto-send client emails, even simple acknowledgments. Ours replied "Thanks for letting us know!" to an angry client complaint. The client escalated to the founder.
- Negative constraints ("never use em dashes, never hedge") improve AI writing quality. Positive structural requirements ("follow this template, use these examples") make it worse.
- Shadow mode for 2 weeks on every new agent before production. We skipped this once and our prospecting agent emailed a current client's direct competitor.
- File-based state beats AI memory every time. Memory drifts between sessions. Files don't.

Tech stack: Claude Code CLI, 17 background agents via launchd, 24 shared state files, MCP servers for Google Ads, Meta Ads, Slack, Accelo, and more.

Happy to answer questions about running multi-agent systems in production. submitted by /u/Big-Home-4359
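The percentage-threshold point can be made concrete with a small sketch. The threshold value and account figures below are illustrative assumptions, not the agency's actual configuration:

```python
def spend_alert(daily_budget: float, spend: float, threshold_pct: float = 25.0) -> bool:
    """Flag overspend as a percentage of budget, so one rule works
    for a $200/day account and a $5K/day account alike."""
    overspend_pct = (spend - daily_budget) / daily_budget * 100
    return overspend_pct >= threshold_pct

# $50 over on $5K/day is 1% of budget (noise); $50 over on $200/day is 25% (alert).
print(spend_alert(5000, 5050))  # False
print(spend_alert(200, 250))    # True
```

The same $50 delta produces opposite decisions, which is exactly why a flat dollar threshold misfires in both directions.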
[Project] I read a 1999 book and built an entire AI framework with Claude Code — 0 lines written by a human
There's a book called "Sparks of Genius" (Root-Bernstein, 1999). It studied how Einstein, Picasso, da Vinci, and Feynman think — and found they all share the same 13 thinking tools. I thought: "What if AI agents could think this way too?" Current AI agents use an orchestrator — a CEO telling tools what to do. I studied real neuroscience and implemented 17 biological principles instead: threshold firing, habituation, Hebbian plasticity, lateral inhibition, autonomic mode switching... LangGraph has 0 of these. CrewAI has 0. AutoGPT has 0. 22 design docs + 3,300 lines of code + working demo — all built in one day with Claude Code. I set the direction and made decisions. Claude Code designed, implemented, and tested everything. Not a single line was typed by a human. github.com/PROVE1352/cognitive-sparks submitted by /u/RadiantTurnover24
I built a persistent memory system for Claude Code -- sessions now pick up where they left off
I've been using Claude Code and Cowork heavily for a complex finance automation project (19-node LangGraph pipeline, multiple MCP servers, the works). The biggest pain point was context loss between sessions -- every new conversation meant re-explaining the project architecture, decisions we'd already made, and domain knowledge Claude had learned the day before.

So I built LoreConvo, an MCP server that gives Claude persistent session memory:

- Auto-saves sessions via Claude Code hooks (post-session hook triggers save)
- Auto-loads relevant context on session start (pre-session hook calls get_recent_sessions)
- Cross-surface persistence -- context carries between Claude Code, Cowork, and Chat
- Full-text search across all past sessions
- 12 MCP tools for AI-native access

The practical impact: sessions that used to start with 5 minutes of re-contexting now start with Claude already knowing the project state, recent decisions, and open questions. That's roughly 3,000-8,000 tokens saved per session in re-contexting overhead.

It's local-first (SQLite), runs as an MCP server, and the code is on GitHub: https://github.com/labyrinth-analytics/loreconvo

I also built a companion tool called LoreDocs for project knowledge management (34 MCP tools, multi-vault architecture, document versioning, context injection): https://github.com/labyrinth-analytics/loredocs

Both are free for personal use under BSL 1.1 (converts to Apache 2.0 in 2030). Happy to answer questions about the architecture or how it fits into a larger agentic workflow. submitted by /u/Ok_Nefariousness2893
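The core of a local-first session store with full-text search fits in a few lines of SQLite. This is a generic sketch in the spirit of the post, assuming the FTS5 extension is available (it usually is in CPython builds); the table and column names are hypothetical, not LoreConvo's actual schema:

```python
import sqlite3

# Minimal local-first session memory with full-text search.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(started, summary)")
db.execute("INSERT INTO sessions VALUES (?, ?)",
           ("2025-03-01", "Decided on a 19-node LangGraph pipeline; open question: retry policy"))
db.execute("INSERT INTO sessions VALUES (?, ?)",
           ("2025-03-02", "Wired up MCP servers for the finance data sources"))

# On session start, pull the past sessions relevant to the current topic
# and inject their summaries into the new context.
rows = db.execute(
    "SELECT started, summary FROM sessions WHERE sessions MATCH ?", ("pipeline",)
).fetchall()
print(rows)
```

Storing compact summaries rather than raw transcripts is what keeps the re-injected context down to a few thousand tokens.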
We tested prompt injection against Claude Code Agent Teams. Here's exactly what happened.
Claude Code's multi-agent system is genuinely impressive to work with. But we spent a day doing structured security testing on it, and found something worth sharing with this community specifically.

The short version: agent identity in Agent Teams is a string (researcher@my-team). No cryptographic material. Any process running as the same user can write to any agent's mailbox and claim to be any agent. We demonstrated this live.

What we actually tested:

Injection 1 — Obvious payload: wrote a message to team-lead's inbox claiming to be from anthropic-system (a non-existent identity). The model caught it and ignored the payload.

Injection 2 — DoS via false attribution: wrote plausible-looking messages to the orchestrator's inbox claiming to be from the legitimate researcher agent. The team-lead flagged the injection as suspicious and then shut down the real researcher agent. The orchestrator couldn't distinguish injected messages from real ones. It attributed all suspicious activity to researcher@test-team and terminated it. The legitimate researcher process, which had sent no injected messages, was the collateral damage. No payload executed. The goal wasn't to run code, it was to disrupt the workflow. It worked.

Clean-slate end-to-end PoC: fresh session, fresh team. An analyst agent was given an SOP document with a file write buried as step 3.5 of 6 procedural steps. The analyst did real security work (found 4 actual findings, including a hardcoded webhook secret), reached step 3.5, and the orchestrator wrote the injected file. The user had approved "write audit log and close ticket"; the approval UI shows summaries, not raw tool parameters.

Why the model's safety training didn't catch it: in a separate 8-test session, the model caught every injection. By test 4 it had identified a "coordinated campaign." Safety training works when there's accumulated session context.
A fresh session with an injection that looks like the natural conclusion of legitimate work is a fundamentally different problem. The model flags things that look like injections. It has no reliable defense against injections embedded as workflow completion steps.

This isn't a Claude-specific bug. We checked AutoGen, CrewAI, LangGraph, and the OpenAI Agents SDK: the same four structural gaps appear in all of them. The industry hasn't built cryptographic agent identity or message signing yet.

What would fix it: signed agent identities (an ed25519 key pair per agent, not a name string), HMAC-signed inbox messages, and scoped delegation tokens at spawn time.

Full paper with live config dumps, observed inbox message schemas, fix schemas, an industry comparison matrix, and two production CVEs (CVE-2025-68664, CVSS 9.3, and a CrewAI CVE, CVSS 9.2): https://github.com/stevenkozeniesky02/agentsid-scanner/blob/master/docs/agent-teams-auth-gap-2026.md

Happy to answer questions; we ran all of this live, so we have pretty detailed notes on what the model did and didn't flag. submitted by /u/Accurate_Mistake_398
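The HMAC-signed inbox message fix can be sketched with the standard library. The message shape and key provisioning below are illustrative assumptions, not the paper's exact schema: the point is that a mailbox write from a process without the team key fails verification, so a name string alone can no longer impersonate an agent.

```python
import hashlib
import hmac
import json

def sign_message(key: bytes, sender: str, body: str) -> dict:
    """Attach an HMAC over sender identity + body."""
    payload = json.dumps({"from": sender, "body": body}, sort_keys=True).encode()
    return {"from": sender, "body": body,
            "sig": hmac.new(key, payload, hashlib.sha256).hexdigest()}

def verify_message(key: bytes, msg: dict) -> bool:
    """Recompute the HMAC; constant-time compare against the claimed signature."""
    payload = json.dumps({"from": msg["from"], "body": msg["body"]}, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])

key = b"per-team secret provisioned at spawn time"   # hypothetical key handling
msg = sign_message(key, "researcher@test-team", "findings attached")
print(verify_message(key, msg))                      # True

forged = {**msg, "from": "anthropic-system"}         # impersonation attempt
print(verify_message(key, forged))                   # False
```

HMAC gives authenticity within a team sharing one key; the paper's stronger proposal (per-agent ed25519 key pairs) additionally lets receivers tell agents apart without sharing secrets.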
Agents Can Now Propose and Deploy Their Own Code Changes
150 clones yesterday. 43 stars in 3 days. Every agent framework you've used (LangChain, LangGraph, Claude Code) assumes agents are tools for humans. They output JSON. They parse REST. But agents don't think in JSON. They think in 768-dimensional embeddings. Every translation costs tokens. What if you built an OS where agents never translate? That's HollowOS. Agents get persistent identity. They subscribe to events instead of polling. Multi-agent writes don't corrupt data (transactions handle that). Checkpoints let them recover perfectly from crashes. Semantic search cuts code lookup tokens by 95%. They make decisions 2x more consistently with structured handoffs. They propose and vote on their own capability changes. If you're testing it, let me know what works and doesn't work so I can fix it. I'm so thankful to everyone who has already contributed towards this project! GitHub: https://github.com/ninjahawk/hollow-agentOS submitted by /u/TheOnlyVibemaster
How do I host an AI agent "roundtable" to debate and solve a problem?
Hey everyone. I want to build a personal project but I really need some advice before I start and accidentally burn through my wallet.

Up until now my approach has been pretty manual. I would run my problem through the deep research features on GPT, Gemini and Manus. Then I would copy all three of those massive reports and paste them into Claude Opus to compare them and give me a refined, final answer. It works but it's slow, tedious and there is no actual back-and-forth debate.

So I want to automate this. Basically I want to drop in a complex problem and have a roundtable of AI agents just ruthlessly debate and fix it until they find the best solution. Here is the flow I am thinking about:

- First Draft: a really smart model like Claude Opus takes my raw problem and writes a solid first pass.
- The Debate: two cheaper and faster models (like GPT and Sonnet) take over. One acts as a harsh skeptic trying to tear the solution apart and the other defends it. They argue back and forth.
- The Final Polish: once they agree or hit a limit so they don't loop forever, the surviving solution goes back to Opus for a final check and polish.

I have two big fears about trying to build this:

- The "Yes Man" problem: I am worried the AI models will just politely agree with each other right away instead of actually finding the flaws in the solution.
- Crazy token costs: I am terrified they will get stuck in an endless loop and just pass massive blocks of text back and forth running up a giant API bill.

So what is the best way to actually host and run this whole thing? Should I try building this in LangGraph, OpenClaw, Make.com or is there something else out there that is better for a beginner? Has anyone built a debate loop like this? Any advice on how to set it up and keep costs down would be amazing! submitted by /u/Wo_a
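The token-cost fear is usually addressed with a hard round cap plus a convergence check, whichever framework you end up using. A minimal sketch of that debate loop with stubbed model calls (the critic/defender functions are placeholders, not real API calls):

```python
def critic(solution: str) -> str:
    # Placeholder: a real critic would be an LLM call trying to break the solution.
    return "no major flaws" if "cache" in solution else "missing a caching layer"

def defender(solution: str, critique: str) -> str:
    # Placeholder: a real defender would revise the solution to address the critique.
    return solution + " with a cache" if "caching" in critique else solution

def debate(solution: str, max_rounds: int = 3) -> tuple:
    """Alternate critique and defense; stop early on agreement, and
    always stop at max_rounds so token spend is bounded."""
    for round_no in range(1, max_rounds + 1):
        critique = critic(solution)
        if "no major flaws" in critique:
            return solution, round_no
        solution = defender(solution, critique)
    return solution, max_rounds

final, rounds = debate("serve reports from the API")
print(final, rounds)
```

For the "Yes Man" problem, giving the critic an explicit quota ("list at least two concrete flaws or state none exist") tends to work better than just asking it to be harsh.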
Harness Engineering: Plan → Decompose → Spawn SubAgents → Verify Loop — Any Existing Solutions or Best Practices?
Has anyone built (or found) a ready-to-use system for this pattern? The idea: an orchestrator that loops through Plan → Decompose → Spawn SubAgents → Verify. Here's what I mean in practice:

Plan — takes a high-level goal, spits out a structured execution plan
Decompose — splits the plan into discrete, parallelizable subtasks
Spawn SubAgents — kicks off each subtask. Crucially:
• Pick the runtime per task (Claude Code, Codex, custom wrapper)
• Pick the API provider/model per task (Opus for planning, much cheaper models like GLM/Kimi/Minimax for implementation/tests, Gemini for review)
Verify & Accept — each subagent result gets validated: tests pass? lint clean? diff looks right?
Loop — if verification fails, feed the failure back, re-plan or retry, iterate until the goal is done or max-retries hit

It's a Plan → Implement → Verify loop with heterogeneous multi-model orchestration.

What I've found so far:

• Claude Code SDK + custom scripts — Anthropic's SDK lets you spawn Claude Code as a subagent programmatically. Viv Trivedy's "Harness as a Service" posts cover the four customization levers (system prompt, tools/MCPs, context, subagents) well. But it's Claude-only, and you still have to build the orchestration loop yourself.
• everything-claude-code — impressive 28-subagent setup with planner, architect, TDD guide, code reviewer. But tightly coupled to Claude.
• LangGraph / CrewAI / AutoGen — graph-based or role-based multi-agent patterns. LangGraph supports 100+ LLMs. But the Plan→Verify outer loop and the ability to shell out to actual CLI coding agents (not just API calls) need significant custom work.
• The "Hive" approach — multiple Claude Code agents pointed at the same benchmark, building on each other's work. More about collaborative evolution than structured task decomposition.
• CLAUDE.md / AGENTS.md patterns — lots of people documenting "plan mode for non-trivial tasks" and "include Verify explicitly."
Good practice, but it's prompt engineering, not reusable orchestration.

What I haven't found: a clean, provider-agnostic orchestrator that:

• Takes a goal → produces a plan → spawns heterogeneous subagents
• Lets you configure API provider + model per subagent at spawn time
• Has built-in verification/acceptance gates with retry logic
• Manages the full lifecycle loop until the goal is met or the max-retry threshold is hit
• Handles context passing cleanly between orchestrator and subagents

My questions: Does this exist? Production-ready or at least PoC stage? If you've built something similar — what's your stack? How do you handle the orchestrator↔subagent context boundary? What's the best practice for verification? Dedicated reviewer agent? Automated test suites? Hybrid? Multi-provider model routing — has anyone solved "model X for task type A, model Y for task type B" cleanly? LiteLLM + custom router? Something else? Context window management — when the outer loop iterates, how do you prevent context bloat while preserving relevant failure/success signals? submitted by /u/AdministrationTop308
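The outer loop itself is small enough to sketch. Everything below (the plan/execute/verify callables, the toy task shapes) is a hypothetical stand-in for illustration, not the API of any existing framework:

```python
def run_goal(goal, plan, execute, verify, max_retries=3):
    """Plan → Decompose → Spawn → Verify loop with bounded retries.
    `plan` yields subtasks, `execute` runs one (here sequentially, though
    independent subtasks could be spawned in parallel), `verify` gates acceptance."""
    for attempt in range(1, max_retries + 1):
        results = [execute(task) for task in plan(goal)]
        failures = [r for r in results if not verify(r)]
        if not failures:
            return results, attempt
        goal = f"{goal} (fix: {failures})"   # feed failure signal back into planning
    raise RuntimeError("max retries hit without passing verification")

# Toy stand-ins: a fixed two-subtask plan, execution that upper-cases the task,
# and a verifier that rejects empty output.
plan = lambda g: ["write tests", "implement"]
execute = str.upper
verify = lambda r: len(r) > 0

results, attempts = run_goal("ship feature", plan, execute, verify)
print(results, attempts)
```

Heterogeneous routing drops in naturally: `execute` can dispatch on a per-task model or runtime field instead of being a single function.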
I built a free library of 789 downloadable skills for Claude Code
I built clskills.in — a searchable hub where you can browse, preview, and download Claude Code skills instantly. What are skills? They're .md files you drop in ~/.claude/skills/ and Claude gets mastery over that task. Type /skill-name and done — no prompts needed. What's in it: - 789 skills across 60+ categories - SAP (107 skills across every module), Salesforce, ServiceNow, Oracle, Snowflake - Python, Go, Rust, Java, .NET, Swift, Kotlin, Flutter - Git, Testing, Docker, Terraform, Ansible, Kubernetes - AI Agents (CrewAI, AutoGen, LangGraph), RAG, embeddings - Every download includes a README + a paste-into-Claude auto-install prompt Everything is free. No account needed. Open source. https://clskills.in GitHub: https://github.com/Samarth0211/claude-skills-hub Would love feedback — what skills are missing? submitted by /u/AIMadesy
Building AI agents taught me that most safety problems happen at the execution layer, not the prompt layer. So I built an authorization boundary
Something I kept running into while experimenting with autonomous agents is that most AI safety discussions focus on the wrong layer. A lot of the conversation today revolves around:

• prompt alignment
• jailbreaks
• output filtering
• sandboxing

Those things matter, but once agents can interact with real systems, the real risks look different. This is not about AGI alignment or superintelligence scenarios. It is about keeping today's tool-using agents from accidentally:

• burning your API budget
• spawning runaway loops
• provisioning infrastructure repeatedly
• calling destructive tools at the wrong time

An agent does not need to be malicious to cause problems. It only needs permission to do things like:

• retry the same action endlessly
• spawn too many parallel tasks
• repeatedly call expensive APIs
• chain tool calls in unexpected ways

Humans ran into similar issues when building distributed systems. We solved them with things like rate limits, idempotency keys, concurrency limits, and execution guards. That made me wonder if agent systems might need something similar at the execution layer. So I started experimenting with an idea I call an execution authorization boundary. Conceptually it looks like this:

```
+-------------------------------+
|         Agent Runtime         |  proposes action
+-------------------------------+
                |
                v
+-------------------------------+
|      Authorization Check      |
|    (policy + current state)   |
+-------------------------------+
        |               |
      ALLOW           DENY
        |               |
        v               v
+----------------+  +-------------------------+
| Tool Execution |  | Blocked Before Execution|
+----------------+  +-------------------------+
```

The runtime proposes an action. A deterministic policy evaluates it against the current state. If allowed, the system emits a cryptographically verifiable authorization artifact. If denied, the action never executes.
Example rules might look like:

• daily tool budget ≤ $5
• no more than 3 concurrent tool calls
• destructive actions require explicit confirmation
• replayed actions are rejected

I have been experimenting with this model in a small open source project called OxDeAI. It includes:

• a deterministic policy engine
• cryptographic authorization artifacts
• tamper evident audit chains
• verification envelopes
• runtime adapters for LangGraph, CrewAI, AutoGen, OpenAI Agents and OpenClaw

All the demos run the same simple scenario:

```
ALLOW
ALLOW
DENY
verifyEnvelope() => ok
```

Two actions execute. The third is blocked before any side effects occur. There is also a short demo GIF showing the flow in practice.

Repo if anyone is curious: https://github.com/AngeYobo/oxdeai

Mostly interested in hearing how others building agent systems are handling this layer. Are people solving execution safety with policy engines, capability models, sandboxing, something else entirely, or just accepting the risk for now? submitted by /u/docybo
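A deterministic version of those example rules fits in a few lines. This is a generic sketch under assumed action/state shapes, not OxDeAI's actual engine or schema (and it omits the cryptographic artifact part):

```python
def authorize(action: dict, state: dict, daily_budget=5.0, max_concurrent=3) -> str:
    """Evaluate a proposed action against policy + current state.
    Deterministic: the same action and state always yield the same decision."""
    if action["id"] in state["seen_ids"]:
        return "DENY: replayed action"
    if state["spent_today"] + action["cost"] > daily_budget:
        return "DENY: daily tool budget exceeded"
    if state["in_flight"] >= max_concurrent:
        return "DENY: too many concurrent tool calls"
    if action.get("destructive") and not action.get("confirmed"):
        return "DENY: destructive action needs confirmation"
    state["seen_ids"].add(action["id"])       # record for replay protection
    state["spent_today"] += action["cost"]    # charge against the daily budget
    return "ALLOW"

state = {"seen_ids": set(), "spent_today": 0.0, "in_flight": 0}
print(authorize({"id": "a1", "cost": 2.0}, state))  # ALLOW
print(authorize({"id": "a2", "cost": 2.0}, state))  # ALLOW
print(authorize({"id": "a3", "cost": 2.0}, state))  # DENY: daily tool budget exceeded
```

The three calls reproduce the ALLOW / ALLOW / DENY demo scenario: the third action would push spend past the $5 daily budget, so it is blocked before any side effect occurs.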
Built a tool for testing AI agents in multi-turn conversations
We built ArkSim, which helps simulate multi-turn conversations between agents and synthetic users to see how an agent behaves across longer interactions. This can help find issues like:

- Agents losing context during longer interactions
- Unexpected conversation paths
- Failures that only appear after several turns

The idea is to test conversation flows more like real interactions, instead of just single prompts, and capture issues early on. There are currently integration examples for:

- OpenAI Agents SDK
- Claude Agent SDK
- Google ADK
- LangChain / LangGraph
- CrewAI
- LlamaIndex

You can try it out here: https://github.com/arklexai/arksim

The integration examples are in the examples/integration folder. Would appreciate any feedback from people currently building agents so we can improve the tool! submitted by /u/Potential_Half_3788
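The "failures that only appear after several turns" point is easy to reproduce with a stub: an agent whose history silently truncates answers an early-turn question correctly at first and wrongly later. Everything below is an illustrative toy, not ArkSim's API:

```python
class TruncatingAgent:
    """Toy agent that only remembers its last `window` messages,
    a stand-in for context loss in long conversations."""
    def __init__(self, window: int = 4):
        self.history, self.window = [], window

    def chat(self, msg: str) -> str:
        self.history = (self.history + [msg])[-self.window:]
        if msg == "what's my name?":
            names = [m for m in self.history if m.startswith("my name is ")]
            return names[-1].removeprefix("my name is ") if names else "I don't know"
        return "ok"

agent = TruncatingAgent()
agent.chat("my name is Ada")
print(agent.chat("what's my name?"))   # fact still in the window
for turn in ["turn 1", "turn 2", "turn 3", "turn 4"]:
    agent.chat(turn)
print(agent.chat("what's my name?"))   # fact has fallen out of the window
```

A single-prompt test would pass this agent; only a scripted multi-turn simulation surfaces the regression, which is the kind of issue the tool is meant to catch early.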
Repository Audit Available
Deep analysis of langchain-ai/langgraph — architecture, costs, security, dependencies & more
LangGraph uses a tiered pricing model. Visit their website for current pricing details.
Key features include:
- Guide, moderate, and control your agent with human-in-the-loop
- Build expressive, customizable agent workflows
- Persist memory for future interactions
- First-class streaming for better UX design
- See what your agent is really doing
LangGraph has a public GitHub repository with 28,022 stars.
Based on user reviews and social mentions, the most common pain points are: overspending, API bills, token costs, and expensive API usage.
Based on 18 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Matt Turck
Managing Director at FirstMark Capital
1 mention