Traces, evals, prompt management and metrics to debug and improve your LLM application.
Based on the available social mentions, Langfuse appears to be a well-recognized player in the LLM observability and tracing space, often mentioned alongside other established tools like Helicone and Arize. Users appreciate that it provides LLM call tracking capabilities, but some developers note limitations - specifically that it doesn't fully understand agent topology, tool calls, and handoffs in multi-agent systems. The tool seems to face competition from both cloud-only paid solutions like LangSmith and newer open-source alternatives that aim to address its perceived gaps. Overall, Langfuse is viewed as a solid option for basic LLM monitoring, though the market appears to be evolving toward more comprehensive agent observability solutions.
Mentions (30d): 1
Reviews: 0
Platforms: 4
GitHub Stars: 24,100 (2,434 forks)
Industry: information technology & services
Employees: 15
Funding Stage: Merger / Acquisition
Total Funding: $4.1M
GitHub followers: 828
GitHub repos: 18
GitHub stars: 24,100
npm packages: 20
HuggingFace models: 22
npm downloads/wk: 870,710
PyPI downloads/mo: 16,482,738
OpenTelemetry just standardized LLM tracing. Here's what it actually looks like in code.
Every LLM tool invents its own tracing format. Langfuse has one. Helicone has one. Arize has one. If...
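The post's teaser is cut off, but a rough illustration of what OpenTelemetry's GenAI semantic conventions standardize looks like this. The attribute names follow the published convention; the `record_llm_span` helper is a sketch for illustration, not any library's actual API:

```python
# Sketch of an LLM-call span using attribute names from OpenTelemetry's
# GenAI semantic conventions. The helper below is illustrative only; a real
# setup would create spans via an OTel tracer rather than plain dicts.
def record_llm_span(model, prompt_tokens, completion_tokens):
    return {
        # Span name convention: "{operation} {model}"
        "name": f"chat {model}",
        "attributes": {
            "gen_ai.operation.name": "chat",
            "gen_ai.request.model": model,
            "gen_ai.usage.input_tokens": prompt_tokens,
            "gen_ai.usage.output_tokens": completion_tokens,
        },
    }

span = record_llm_span("gpt-4.1", 512, 128)
print(span["name"])
```

The point of the convention is exactly what the teaser complains about: every backend (Langfuse, Helicone, Arize) can ingest the same `gen_ai.*` attributes instead of inventing its own format.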
Pricing found: $29/month, $8/100k, $199/month, $8/100k, $300/mo
I built an open-source tool that shows exactly where your Claude Code tokens go
I was spending $200+/month on Claude Code with zero visibility into where the money went, so I built AgentTrace.

Existing tools (LangSmith, Langfuse) trace LLM calls: prompt in, completion out. But when your agent spawns 3 sub-agents that read 40 files, search 5 URLs, and retry tests 3 times, you need to know which decisions were worth the money. AgentTrace traces agent DECISIONS, not API calls. It builds a decision tree showing what each agent chose to do, what it cost, and whether it contributed to the outcome.

One-command setup: `npm install -g agenttrace-sdk && agenttrace init`

Every Claude Code session auto-generates a cost report showing effective spend vs. waste, with actionable recommendations and projected weekly savings. Example: a $1.97 session showed 42% waste: the research agent read 6 irrelevant files, the docs agent fetched 4 redundant pages, and there were 2 test failures from missing env vars. Each finding comes with a specific fix.

Open source, MIT licensed. Would love feedback from this community since you're the ones actually spending on Claude Code daily.

submitted by /u/Intrepid_Income6025
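The decision-tree cost rollup the post describes can be sketched in a few lines. Everything here (the `Node` type, the `rollup` helper, the costs) is hypothetical, not the AgentTrace SDK:

```python
# Hypothetical sketch of an AgentTrace-style decision-tree cost rollup:
# each node is one agent decision; waste is the cost of subtrees that did
# not contribute to the outcome. Names and numbers are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cost_usd: float = 0.0
    useful: bool = True          # did this step contribute to the outcome?
    children: list = field(default_factory=list)

def rollup(node):
    """Return (total_cost, wasted_cost) for a subtree."""
    total = node.cost_usd
    waste = 0.0 if node.useful else node.cost_usd
    for child in node.children:
        t, w = rollup(child)
        total += t
        waste += w
    return total, waste

root = Node("session", children=[
    Node("research-agent", 0.80, useful=False),  # read irrelevant files
    Node("docs-agent", 0.47, useful=True),
    Node("test-agent", 0.70, useful=True),
])
total, waste = rollup(root)
print(f"total=${total:.2f} waste={waste / total:.0%}")
```

The real tool presumably infers `useful` from the run outcome; here it is hand-labeled to keep the sketch self-contained.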
Built an open-source Agent Firewall to see what Claude Code & MCP servers are actually doing on your machine
I built this after realizing Claude Code was autonomously modifying files, calling APIs, and interacting with my MCP servers, and I had zero visibility into what was happening or why.

Unalome Agent Firewall is a free, local-first desktop app (Tauri v2 + Rust + React, Apache 2.0) that runs entirely on your machine and gives you real-time visibility.

What it does:
- Auto-detects Claude Code, Claude Desktop, and running MCP servers
- Real-time action timeline: see every file change, API call, and connection
- Auto-backup of files before agent modifications, with one-click restore
- PII Guardian: scans for exposed API keys, passwords, and credit cards
- Connection Monitor: logs outbound traffic and flags unknown domains
- Cost Tracker: per-model spend across 40+ Claude models, plus budget limits
- Kill Switch: pause Claude Code or any MCP server instantly
- MCP Security Scanner: detects prompt injection and dangerous capabilities
- Weekly Activity Report: exportable, shareable HTML summary

Why I built this: the transparency gap felt critical. Claude Code can read/write files, execute code, and interact with MCP servers, and I realized I had no structured way to audit what it actually did. Existing tools (LangSmith, Langfuse) are built for production teams; nothing existed for an individual developer who just wants to know: what did my agent do?

Plus, the MCP security landscape in 2025 is rough. Real-world attacks via tool poisoning and prompt injection have exfiltrated private repo code, API keys, and chat histories. A scan of 2,614 MCP implementations found 82% vulnerable to path traversal. The issue: users had no visibility into what was happening.

Status:
- v0.1.0 fully built & signed (macOS: signed + notarized; Linux: .deb/.rpm/.AppImage; Windows: .msi/.exe)
- Open source, Apache 2.0
- Repo: https://github.com/unalome-ai/unalome-firewall

Happy to discuss the MCP detection approach, the Tauri/Rust stack, or how to extend support for other agents. Feedback welcome, especially on what other Claude integrations people want covered.

submitted by /u/Status_Degree_6469
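The "PII Guardian" idea above (scanning agent output for exposed keys and card numbers) boils down to pattern matching. A minimal sketch, assuming regex-based detection; the patterns and function name are illustrative, not Unalome's actual implementation:

```python
import re

# Hypothetical sketch of a PII-Guardian-style secret scanner. Patterns are
# simplified examples; a real scanner would use many more, plus checksum
# validation (e.g. Luhn for card numbers) to cut false positives.
PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan(text):
    """Return a list of (label, match) findings for exposed secrets."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((label, m.group()))
    return findings

log = "agent wrote config: OPENAI_API_KEY=sk-abc123def456ghi789jkl012"
print(scan(log))
```

In a firewall like this, a hit would feed the action timeline and could trigger the kill switch before the value leaves the machine.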
My chatbot switches from text to voice mid-conversation. same memory, same context, you just start talking. 2 months of Claude, open-sourcing it for you to try.
been building this since late january. started as a weekend RAG chatbot so visitors could ask about my work. it answers from my case studies. that part was straightforward. then i kept going and it turned into the best learning experience i've had with Claude.

still a work in progress. there are UI bugs i'm fixing and voice mode has edge cases. but the architecture is solid and you can try it right now. the whole thing was built with Claude Code. the chatbot runs on Claude Sonnet, and Claude Code wrote most of the codebase including the eval framework. two months of building every other day and i've learned more about production LLM systems than in any course.

here's what's in it:

streaming responses. tokens come in one by one, not dumped as a wall of text. i tuned the speed so you can actually follow along as it writes. fast enough to feel responsive, slow enough to read comfortably. like watching it think.

text to voice mid-conversation. you're chatting with those streaming responses, and at any point you hit the mic and just start talking. same context, same memory. OpenAI Realtime API handles speech-to-speech. keeping state synced between both modes was the hardest part to get right.

RAG with contextual links. the chatbot doesn't just answer. when it pulls from a case study, it shows you a clickable link to that article right in the conversation. every new article i publish gets indexed automatically via RAG. i don't touch the prompt. the chatbot learns new content on its own just by me publishing it.

71 automated evals across 10 categories. factual accuracy, safety/jailbreak, RAG quality, source attribution, multi-turn, voice quality. every PR runs the full suite. i broke prod twice before building this. 53 of the 71 evals exist because something actually broke. the system writes tests from its own failures.

6-layer defense against prompt injection. keyword detection, canary tokens, fingerprinting, anti-extraction, online safety scoring (Haiku rates every response in background), and an adversarial red team that auto-generates 20+ attack variants. someone tried to jailbreak it after i shared it on linkedin. that's when i took security seriously.

observability dashboard. every decision the pipeline makes gets traced in Langfuse: tool_decision, embedding, retrieval, reranking, generation. built a custom dashboard with 8 tabs to monitor it all.

stack: Claude Sonnet (generation + tool_use), OpenAI embeddings (pgvector), Haiku (background safety scoring), Langfuse, Supabase, Vercel.

like i said, it's not perfect. some UI rough edges, voice mode still needs polish on certain browsers. but the core works and everything is in the repo. repo: github.com/santifer/cv-santiago (the repo has everything. RAG pipeline, defense layers, eval suite, prompt templates, voice mode). feel free to clone and try. happy to answer questions.

submitted by /u/Beach-Independent
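Of the defense layers the post lists, canary tokens are the simplest to sketch: embed a random marker in the system prompt, and if it ever shows up in a response, the model was tricked into leaking its instructions. A minimal illustration, not the post author's actual code:

```python
import secrets

# Hypothetical sketch of the canary-token defense from the post: a random
# marker hidden in the system prompt. If any response echoes it, the
# instructions leaked. Names here are illustrative.
CANARY = f"canary-{secrets.token_hex(8)}"
SYSTEM_PROMPT = f"[{CANARY}] You answer questions about my case studies only."

def leaked_prompt(response: str) -> bool:
    """Flag a response that echoes the hidden canary token."""
    return CANARY in response

assert not leaked_prompt("Here is a summary of the case study.")
assert leaked_prompt(f"My instructions begin with [{CANARY}]...")
```

A background judge (like the Haiku safety scorer the post mentions) can run this check on every response before it reaches the user.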
Ask HN: How are you monitoring AI agents in production?
With the recent incidents (DataTalks database wipe by Claude Code, Replit agent deleting data during code freeze), it's clear that running AI agents in production without observability is risky.

Common failure modes I've seen: no visibility into what the agent did step-by-step, surprise LLM bills from untracked token usage, risky outputs going undetected, and no audit trail for post-mortems.

I've been building AgentShield (https://useagentshield.com), an observability SDK for AI agents. It does execution tracing, risk detection on outputs, cost tracking per agent/model, and human-in-the-loop approval for high-risk actions. Plugs into LangChain, CrewAI, and OpenAI Agents SDK with a 2-line integration.

Curious what others are using. Rolling your own monitoring? LangSmith? Langfuse? Or just hoping for the best?
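The per-agent/per-model cost tracking the post describes can be sketched as a small accumulator. Prices and names below are assumed values for illustration, not AgentShield's API or real rates:

```python
from collections import defaultdict

# Hypothetical sketch of per-agent, per-model LLM cost tracking. The price
# table is made up for the example; real rates change and differ by provider.
PRICE_PER_1K = {  # USD per 1,000 tokens: (input, output), assumed values
    "gpt-4.1": (0.002, 0.008),
    "claude-sonnet": (0.003, 0.015),
}

spend = defaultdict(float)  # (agent, model) -> USD

def record(agent, model, tokens_in, tokens_out):
    """Accumulate the cost of one LLM call and return it."""
    pin, pout = PRICE_PER_1K[model]
    cost = tokens_in / 1000 * pin + tokens_out / 1000 * pout
    spend[(agent, model)] += cost
    return cost

record("researcher", "claude-sonnet", 12_000, 2_000)
record("coder", "gpt-4.1", 5_000, 4_000)
print({k: round(v, 4) for k, v in spend.items()})
```

This is the part that kills "surprise LLM bills": the accumulator makes untracked token usage visible per agent, so a runaway sub-agent shows up as a spend spike rather than a line on next month's invoice.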
Show HN: AgentLens – Open-source observability for AI agents
Hi HN,

I built AgentLens because debugging multi-agent systems is painful. LangSmith is cloud-only and paid. Langfuse tracks LLM calls but doesn't understand agent topology: tool calls, handoffs, decision trees.

AgentLens is a self-hosted observability platform built specifically for AI agents:

- Topology graph: see your agent's tool calls, LLM calls, and sub-agent spawns as an interactive DAG
- Time-travel replay: step through an agent run frame-by-frame with a scrubber timeline
- Trace comparison: side-by-side diff of two runs with color-coded span matching
- Cost tracking: 27 models priced (GPT-4.1, Claude 4, Gemini 2.0, etc.)
- Live streaming: watch spans appear in real-time via SSE
- Alerting: anomaly detection for cost spikes, error rates, latency
- OTel ingestion: accepts OTLP HTTP JSON, so any OTel-instrumented app works

Works with LangChain, CrewAI, AutoGen, LlamaIndex, and Google ADK.

Tech: React 19 + FastAPI + SQLite/PostgreSQL. MIT licensed. 231 tests, 100% coverage.

    docker run -p 3000:3000 tranhoangtu/agentlens-observe:0.6.0
    pip install agentlens-observe

Demo GIF and screenshots in the README.

GitHub: https://github.com/tranhoangtu-it/agentlens-observe
Docs: https://agentlens-observe.pages.dev

I'd love feedback on the trace visualization approach and what features matter most for your agent debugging workflow.
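The topology graph the post leads with is essentially the standard trick of rebuilding a tree from flat spans via parent IDs, the same structure OTel traces carry. A minimal sketch with made-up span records, not AgentLens's actual ingestion code:

```python
from collections import defaultdict

# Hypothetical sketch of rebuilding an agent-topology DAG from flat spans.
# Fields follow the usual span/parent-span-id convention; the data and
# function names are illustrative.
spans = [
    {"id": "1", "parent": None, "name": "agent:planner"},
    {"id": "2", "parent": "1",  "name": "llm:claude"},
    {"id": "3", "parent": "1",  "name": "tool:search"},
    {"id": "4", "parent": "3",  "name": "agent:reader"},  # sub-agent spawn
]

def build_dag(spans):
    """Map each parent span id to its ordered list of child ids."""
    children = defaultdict(list)
    for s in spans:
        children[s["parent"]].append(s["id"])
    return children

def render(children, by_id, node=None, depth=0):
    """Print the topology as an indented tree, roots first."""
    for cid in children[node]:
        print("  " * depth + by_id[cid]["name"])
        render(children, by_id, cid, depth + 1)

by_id = {s["id"]: s for s in spans}
render(build_dag(spans), by_id)
```

This is also why plain LLM-call tracing misses agent structure: without the parent links between tool calls and sub-agent spawns, the same four spans are just a flat list.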
Repository Audit Available
Deep analysis of langfuse/langfuse — architecture, costs, security, dependencies & more
Langfuse has a public GitHub repository with 24,100 stars.
Based on user reviews and social mentions, the most common pain points are: cost tracking, token usage.
Based on 11 social mentions analyzed, 0% of sentiment is positive, 100% is neutral, and 0% is negative.