LangSmith

observability, evaluation

LangSmith is known for providing observability for AI agents, which matters because running agents in production is risky. The main complaint is that it is a cloud-only, paid service, which does not suit users who prefer open-source alternatives. Sentiment on pricing is somewhat negative, with users expressing a preference for non-commercial options. Overall, LangSmith has a solid reputation for its functionality but draws criticism for its availability and cost.

Sentiment

18%

2 positive

15 integrations, 14 features, Series B

Voices Discussing LangSmith

Harrison Chase

CEO at LangChain

20 mentions

Hamel Husain

Independent Consultant at AI Consulting

1 mention

Shreya Shankar

PhD Researcher at UC Berkeley

1 mention

AI Summary

Features & Use Cases

Features

Agent debugging tools
Performance monitoring dashboards
Real-time observability metrics
Error tracking and reporting
Agent performance evaluation
Deployment management for AI agents
Customizable alerting system
Integration with CI/CD pipelines
User activity tracking
Data loss prevention mechanisms

Use Cases

Monitoring AI agent performance in production
Debugging issues in multi-agent systems
Evaluating the effectiveness of AI agents
Preventing data loss in AI applications
Managing deployment of AI agents
Integrating observability into CI/CD workflows
Tracking user interactions with AI agents
Analyzing agent behavior over time

Company Intel

Industry

information technology & services

Funding Stage

Series B

Total Funding

$260.0M

Top Mention

reddit @DetectiveMindless65216 engagement 5/23/2026

After 6 months of running AI agents in production I think the framework you pick barely matters. The thing that kills them is something else.

Going to get downvoted for this but here we go. I've been running about 30 agents in production for paying customers for the last 6 months and I'm convinced the framework debate is mostly a distraction.

LangChain, CrewAI, AutoGen, OpenAI Agents SDK. Pick whichever one your team already knows. It doesn't matter as much as you think. What actually decides whether your agent works in production is something almost nobody talks about on this sub, and it isn't in the framework.

Here's what I've seen kill more agents than every framework bug combined.

The agent gets stuck in a loop. It calls the same tool 200 times in 4 minutes because something downstream returned ambiguous data and the LLM decided to retry forever. Your OpenAI bill goes from $3 a day to $400 in one afternoon. By the time you notice you've burned a grand. You can't even tell which agent did it because there's no audit trail.

Your VPS reboots overnight for kernel patches. Every agent that was mid-task loses everything. Tomorrow morning the support agent has no memory of yesterday's tickets, the research crew has forgotten what they were investigating, the pipeline agent restarts from scratch. None of these are framework problems. They're memory and state problems.

A customer complains the agent gave them wrong info three days ago. You go to debug. There's no record of what the agent saw, what it decided, or which tool calls it made. The framework didn't log that because frameworks aren't observability tools. You shrug and refund.

You scaled to 15 agents working together. Two of them have conflicting beliefs about the same customer because their memory isn't shared. The customer gets two different answers in the same conversation depending on which agent replies first.

You've been around enough times to realize the part you actually need isn't in the framework at all.

What I think the real stack is.

The framework just orchestrates LLM calls. Use whatever your team likes. It's the cheap layer.

A persistent memory layer that survives crashes, restarts, and redeploys, so the agent has actual continuity. This is the layer that decides whether your agent is a toy or a product.

Loop detection at the runtime layer, not bolted on as a wrapper around the framework. Something that catches your agent making the same call too many times in a row and stops it before the bill explodes.

An audit trail of every decision the agent made, with a hash chain so you can prove later what happened when the customer pushes back. Screenshots and logs aren't enough when ten thousand dollars is on the line.

Shared memory between agents in the same team so they're not having different conversations about the same customer.

Cost tracking per agent so you actually know which one ran away with your budget.

When I look at what makes the agents that survive production look different from the ones that died, it's never that they picked the right framework. It's that they had this layer underneath, either built carefully in-house or borrowed from somewhere.

Full disclosure I'm building one of these tools. There are others. Mem0 and Zep and Letta in the memory space. Helicone and LangSmith in the observability space. Mix and match. Use one or build your own.

Just please stop arguing about whether LangChain or CrewAI is better when the thing eating your production agents has nothing to do with either of them.

What's been your worst production agent failure? Curious what other people have actually hit.

I built a free tool that aims to solve most of this issue, what do you think?
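
Two of the layers the post describes, runtime loop detection and a hash-chained audit trail, can be illustrated with a small Python sketch. This is not the post author's tool and it does not use LangSmith or any framework API. The names `LoopGuard`, `AuditLog`, `lookup_order`, and `support-agent`, along with the thresholds, are hypothetical and chosen only for illustration. The sketch also assumes tool arguments are JSON-serializable.

```python
import hashlib
import json
import time
from collections import deque


class LoopGuard:
    """Refuses a tool call once the same (tool, args) pair has been allowed
    max_repeats times within a sliding time window (hypothetical helper)."""

    def __init__(self, max_repeats=5, window_seconds=60.0):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self._calls = {}  # call key -> deque of timestamps of allowed calls

    def allow(self, tool, args, now=None):
        now = time.monotonic() if now is None else now
        key = (tool, json.dumps(args, sort_keys=True))
        stamps = self._calls.setdefault(key, deque())
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_repeats:
            return False
        stamps.append(now)
        return True


class AuditLog:
    """Append-only log in which each entry's hash covers the previous hash,
    so changing an earlier entry makes verification fail (hypothetical helper)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, agent, event):
        payload = json.dumps(
            {"prev": prev_hash, "agent": agent, "event": event}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, agent, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "agent": agent,
            "event": event,
            "prev": prev_hash,
            "hash": self._digest(prev_hash, agent, event),
        }
        self.entries.append(entry)
        return entry

    def verify(self):
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._digest(prev_hash, entry["agent"], entry["event"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


# Example: an agent retries the same call five times; the guard allows three.
guard = LoopGuard(max_repeats=3, window_seconds=240.0)
log = AuditLog()
decisions = []
for attempt in range(5):
    allowed = guard.allow("lookup_order", {"order_id": 42}, now=float(attempt))
    decisions.append(allowed)
    log.append("support-agent", {"tool": "lookup_order", "allowed": allowed})
```

Keying each log entry by agent name also gives a place to attach per-agent cost figures. A production version would need durable storage for both structures, since in-memory state would be lost on the kind of restart the post describes.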
