Traces, evals, prompt management and metrics to debug and improve your LLM application.
Based on the available social mentions, Langfuse appears to be a well-recognized player in the LLM observability and tracing space, often mentioned alongside other established tools like Helicone and Arize. Users appreciate that it provides LLM call tracking capabilities, but some developers note limitations - specifically that it doesn't fully understand agent topology, tool calls, and handoffs in multi-agent systems. The tool seems to face competition from both cloud-only paid solutions like LangSmith and newer open-source alternatives that aim to address its perceived gaps. Overall, Langfuse is viewed as a solid option for basic LLM monitoring, though the market appears to be evolving toward more comprehensive agent observability solutions.
Mentions (30d): 1
Reviews: 0
Platforms: 4
GitHub Stars: 24,100 (2,434 forks)
Industry: information technology & services
Employees: 15
Funding Stage: Merger / Acquisition
Total Funding: $4.1M
GitHub followers: 828
GitHub repos: 18
GitHub stars: 24,100
npm packages: 20
HuggingFace models: 22
npm downloads/wk: 870,710
PyPI downloads/mo: 16,482,738
OpenTelemetry just standardized LLM tracing. Here's what it actually looks like in code.
Every LLM tool invents its own tracing format. Langfuse has one. Helicone has one. Arize has one. If...
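The post's teaser is cut off, but a rough illustration of what OpenTelemetry's GenAI semantic conventions standardize looks like this. The attribute names follow the published convention; the `record_llm_span` helper is a sketch for illustration, not any library's actual API:

```python
# Sketch of an LLM-call span using attribute names from OpenTelemetry's
# GenAI semantic conventions. The helper below is illustrative only; a real
# setup would create spans via an OTel tracer rather than plain dicts.
def record_llm_span(model, prompt_tokens, completion_tokens):
    return {
        # Span name convention: "{operation} {model}"
        "name": f"chat {model}",
        "attributes": {
            "gen_ai.operation.name": "chat",
            "gen_ai.request.model": model,
            "gen_ai.usage.input_tokens": prompt_tokens,
            "gen_ai.usage.output_tokens": completion_tokens,
        },
    }

span = record_llm_span("gpt-4.1", 512, 128)
print(span["name"])
```

The point of the convention is exactly what the teaser complains about: every backend (Langfuse, Helicone, Arize) can ingest the same `gen_ai.*` attributes instead of inventing its own format.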
Pricing found: $29/month, $8/100k, $199/month, $8/100k, $300/mo
I built an open-source tool that shows exactly where your Claude Code tokens go
I was spending $200+/month on Claude Code with zero visibility into where the money went, so I built AgentTrace.

Existing tools (LangSmith, Langfuse) trace LLM calls: prompt in, completion out. But when your agent spawns 3 sub-agents that read 40 files, search 5 URLs, and retry tests 3 times, you need to know which decisions were worth the money. AgentTrace traces agent DECISIONS, not API calls. It builds a decision tree showing what each agent chose to do, what it cost, and whether it contributed to the outcome.

One-command setup: `npm install -g agenttrace-sdk && agenttrace init`

Every Claude Code session auto-generates a cost report showing effective spend vs. waste, with actionable recommendations and projected weekly savings. Example: a $1.97 session showed 42% waste: the research agent read 6 irrelevant files, the docs agent fetched 4 redundant pages, and there were 2 test failures from missing env vars. Each finding comes with a specific fix.

Open source, MIT licensed. Would love feedback from this community since you're the ones actually spending on Claude Code daily.

submitted by /u/Intrepid_Income6025
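The decision-tree cost rollup the post describes can be sketched in a few lines. Everything here (the `Node` type, the `rollup` helper, the costs) is hypothetical, not the AgentTrace SDK:

```python
# Hypothetical sketch of an AgentTrace-style decision-tree cost rollup:
# each node is one agent decision; waste is the cost of subtrees that did
# not contribute to the outcome. Names and numbers are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cost_usd: float = 0.0
    useful: bool = True          # did this step contribute to the outcome?
    children: list = field(default_factory=list)

def rollup(node):
    """Return (total_cost, wasted_cost) for a subtree."""
    total = node.cost_usd
    waste = 0.0 if node.useful else node.cost_usd
    for child in node.children:
        t, w = rollup(child)
        total += t
        waste += w
    return total, waste

root = Node("session", children=[
    Node("research-agent", 0.80, useful=False),  # read irrelevant files
    Node("docs-agent", 0.47, useful=True),
    Node("test-agent", 0.70, useful=True),
])
total, waste = rollup(root)
print(f"total=${total:.2f} waste={waste / total:.0%}")
```

The real tool presumably infers `useful` from the run outcome; here it is hand-labeled to keep the sketch self-contained.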
Built an open-source Agent Firewall to see what Claude Code & MCP servers are actually doing on your machine
I built this after realizing Claude Code was autonomously modifying files, calling APIs, and interacting with my MCP servers, and I had zero visibility into what was happening or why.

Unalome Agent Firewall is a free, local-first desktop app (Tauri v2 + Rust + React, Apache 2.0) that runs entirely on your machine and gives you real-time visibility.

What it does:
- Auto-detects Claude Code, Claude Desktop, and running MCP servers
- Real-time action timeline: see every file change, API call, and connection
- Auto-backup of files before agent modifications, with one-click restore
- PII Guardian: scans for exposed API keys, passwords, and credit cards
- Connection Monitor: logs outbound traffic and flags unknown domains
- Cost Tracker: per-model spend across 40+ Claude models, plus budget limits
- Kill Switch: pause Claude Code or any MCP server instantly
- MCP Security Scanner: detects prompt injection and dangerous capabilities
- Weekly Activity Report: exportable, shareable HTML summary

Why I built this: the transparency gap felt critical. Claude Code can read/write files, execute code, and interact with MCP servers, and I realized I had no structured way to audit what it actually did. Existing tools (LangSmith, Langfuse) are built for production teams; nothing existed for an individual developer who just wants to know: what did my agent do?

Plus, the MCP security landscape in 2025 is rough. Real-world attacks via tool poisoning and prompt injection have exfiltrated private repo code, API keys, and chat histories. A scan of 2,614 MCP implementations found 82% vulnerable to path traversal. The issue: users had no visibility into what was happening.

Status:
- v0.1.0 fully built & signed (macOS: signed + notarized; Linux: .deb/.rpm/.AppImage; Windows: .msi/.exe)
- Open source, Apache 2.0
- Repo: https://github.com/unalome-ai/unalome-firewall

Happy to discuss the MCP detection approach, the Tauri/Rust stack, or how to extend support for other agents. Feedback welcome, especially on what other Claude integrations people want covered.

submitted by /u/Status_Degree_6469
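The "PII Guardian" idea above (scanning agent output for exposed keys and card numbers) boils down to pattern matching. A minimal sketch, assuming regex-based detection; the patterns and function name are illustrative, not Unalome's actual implementation:

```python
import re

# Hypothetical sketch of a PII-Guardian-style secret scanner. Patterns are
# simplified examples; a real scanner would use many more, plus checksum
# validation (e.g. Luhn for card numbers) to cut false positives.
PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan(text):
    """Return a list of (label, match) findings for exposed secrets."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((label, m.group()))
    return findings

log = "agent wrote config: OPENAI_API_KEY=sk-abc123def456ghi789jkl012"
print(scan(log))
```

In a firewall like this, a hit would feed the action timeline and could trigger the kill switch before the value leaves the machine.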
My chatbot switches from text to voice mid-conversation. same memory, same context, you just start talking. 2 months of Claude, open-sourcing it for you to try.
been building this since late january. started as a weekend RAG chatbot so visitors could ask about my work. it answers from my case studies. that part was straightforward. then i kept going and it turned into the best learning experience i've had with Claude.

still a work in progress. there are UI bugs i'm fixing and voice mode has edge cases. but the architecture is solid and you can try it right now. the whole thing was built with Claude Code. the chatbot runs on Claude Sonnet, and Claude Code wrote most of the codebase including the eval framework. two months of building every other day and i've learned more about production LLM systems than in any course.

here's what's in it:

streaming responses. tokens come in one by one, not dumped as a wall of text. i tuned the speed so you can actually follow along as it writes. fast enough to feel responsive, slow enough to read comfortably. like watching it think.

text to voice mid-conversation. you're chatting with those streaming responses, and at any point you hit the mic and just start talking. same context, same memory. OpenAI Realtime API handles speech-to-speech. keeping state synced between both modes was the hardest part to get right.

RAG with contextual links. the chatbot doesn't just answer. when it pulls from a case study, it shows you a clickable link to that article right in the conversation. every new article i publish gets indexed automatically via RAG. i don't touch the prompt. the chatbot learns new content on its own just by me publishing it.

71 automated evals across 10 categories. factual accuracy, safety/jailbreak, RAG quality, source attribution, multi-turn, voice quality. every PR runs the full suite. i broke prod twice before building this. 53 of the 71 evals exist because something actually broke. the system writes tests from its own failures.

6-layer defense against prompt injection. keyword detection, canary tokens, fingerprinting, anti-extraction, online safety scoring (Haiku rates every response in background), and an adversarial red team that auto-generates 20+ attack variants. someone tried to jailbreak it after i shared it on linkedin. that's when i took security seriously.

observability dashboard. every decision the pipeline makes gets traced in Langfuse: tool_decision, embedding, retrieval, reranking, generation. built a custom dashboard with 8 tabs to monitor it all.

stack: Claude Sonnet (generation + tool_use), OpenAI embeddings (pgvector), Haiku (background safety scoring), Langfuse, Supabase, Vercel.

like i said, it's not perfect. some UI rough edges, voice mode still needs polish on certain browsers. but the core works and everything is in the repo. repo: github.com/santifer/cv-santiago (the repo has everything. RAG pipeline, defense layers, eval suite, prompt templates, voice mode). feel free to clone and try. happy to answer questions.

submitted by /u/Beach-Independent
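Of the defense layers the post lists, canary tokens are the simplest to sketch: embed a random marker in the system prompt, and if it ever shows up in a response, the model was tricked into leaking its instructions. A minimal illustration, not the post author's actual code:

```python
import secrets

# Hypothetical sketch of the canary-token defense from the post: a random
# marker hidden in the system prompt. If any response echoes it, the
# instructions leaked. Names here are illustrative.
CANARY = f"canary-{secrets.token_hex(8)}"
SYSTEM_PROMPT = f"[{CANARY}] You answer questions about my case studies only."

def leaked_prompt(response: str) -> bool:
    """Flag a response that echoes the hidden canary token."""
    return CANARY in response

assert not leaked_prompt("Here is a summary of the case study.")
assert leaked_prompt(f"My instructions begin with [{CANARY}]...")
```

A background judge (like the Haiku safety scorer the post mentions) can run this check on every response before it reaches the user.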
Ask HN: How are you monitoring AI agents in production?
With the recent incidents (DataTalks database wipe by Claude Code, Replit agent deleting data during code freeze), it's clear that running AI agents in production without observability is risky.

Common failure modes I've seen: no visibility into what the agent did step-by-step, surprise LLM bills from untracked token usage, risky outputs going undetected, and no audit trail for post-mortems.

I've been building AgentShield (https://useagentshield.com), an observability SDK for AI agents. It does execution tracing, risk detection on outputs, cost tracking per agent/model, and human-in-the-loop approval for high-risk actions. Plugs into LangChain, CrewAI, and OpenAI Agents SDK with a 2-line integration.

Curious what others are using. Rolling your own monitoring? LangSmith? Langfuse? Or just hoping for the best?
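The per-agent/per-model cost tracking the post describes can be sketched as a small accumulator. Prices and names below are assumed values for illustration, not AgentShield's API or real rates:

```python
from collections import defaultdict

# Hypothetical sketch of per-agent, per-model LLM cost tracking. The price
# table is made up for the example; real rates change and differ by provider.
PRICE_PER_1K = {  # USD per 1,000 tokens: (input, output), assumed values
    "gpt-4.1": (0.002, 0.008),
    "claude-sonnet": (0.003, 0.015),
}

spend = defaultdict(float)  # (agent, model) -> USD

def record(agent, model, tokens_in, tokens_out):
    """Accumulate the cost of one LLM call and return it."""
    pin, pout = PRICE_PER_1K[model]
    cost = tokens_in / 1000 * pin + tokens_out / 1000 * pout
    spend[(agent, model)] += cost
    return cost

record("researcher", "claude-sonnet", 12_000, 2_000)
record("coder", "gpt-4.1", 5_000, 4_000)
print({k: round(v, 4) for k, v in spend.items()})
```

This is the part that kills "surprise LLM bills": the accumulator makes untracked token usage visible per agent, so a runaway sub-agent shows up as a spend spike rather than a line on next month's invoice.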
Show HN: AgentLens – Open-source observability for AI agents
Hi HN,

I built AgentLens because debugging multi-agent systems is painful. LangSmith is cloud-only and paid. Langfuse tracks LLM calls but doesn't understand agent topology: tool calls, handoffs, decision trees.

AgentLens is a self-hosted observability platform built specifically for AI agents:

- Topology graph: see your agent's tool calls, LLM calls, and sub-agent spawns as an interactive DAG
- Time-travel replay: step through an agent run frame-by-frame with a scrubber timeline
- Trace comparison: side-by-side diff of two runs with color-coded span matching
- Cost tracking: 27 models priced (GPT-4.1, Claude 4, Gemini 2.0, etc.)
- Live streaming: watch spans appear in real-time via SSE
- Alerting: anomaly detection for cost spikes, error rates, latency
- OTel ingestion: accepts OTLP HTTP JSON, so any OTel-instrumented app works

Works with LangChain, CrewAI, AutoGen, LlamaIndex, and Google ADK.

Tech: React 19 + FastAPI + SQLite/PostgreSQL. MIT licensed. 231 tests, 100% coverage.

    docker run -p 3000:3000 tranhoangtu/agentlens-observe:0.6.0
    pip install agentlens-observe

Demo GIF and screenshots in the README.

GitHub: https://github.com/tranhoangtu-it/agentlens-observe
Docs: https://agentlens-observe.pages.dev

I'd love feedback on the trace visualization approach and what features matter most for your agent debugging workflow.
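The topology graph the post leads with is essentially the standard trick of rebuilding a tree from flat spans via parent IDs, the same structure OTel traces carry. A minimal sketch with made-up span records, not AgentLens's actual ingestion code:

```python
from collections import defaultdict

# Hypothetical sketch of rebuilding an agent-topology DAG from flat spans.
# Fields follow the usual span/parent-span-id convention; the data and
# function names are illustrative.
spans = [
    {"id": "1", "parent": None, "name": "agent:planner"},
    {"id": "2", "parent": "1",  "name": "llm:claude"},
    {"id": "3", "parent": "1",  "name": "tool:search"},
    {"id": "4", "parent": "3",  "name": "agent:reader"},  # sub-agent spawn
]

def build_dag(spans):
    """Map each parent span id to its ordered list of child ids."""
    children = defaultdict(list)
    for s in spans:
        children[s["parent"]].append(s["id"])
    return children

def render(children, by_id, node=None, depth=0):
    """Print the topology as an indented tree, roots first."""
    for cid in children[node]:
        print("  " * depth + by_id[cid]["name"])
        render(children, by_id, cid, depth + 1)

by_id = {s["id"]: s for s in spans}
render(build_dag(spans), by_id)
```

This is also why plain LLM-call tracing misses agent structure: without the parent links between tool calls and sub-agent spawns, the same four spans are just a flat list.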
Repository Audit Available
Deep analysis of langfuse/langfuse — architecture, costs, security, dependencies & more
Langfuse has a public GitHub repository with 24,100 stars.
Based on user reviews and social mentions, the most common pain points are: cost tracking, token usage.
Based on 11 social mentions analyzed, 0% of sentiment is positive, 100% is neutral, and 0% is negative.