Based on these limited mentions, LangSmith appears to be positioned as a monitoring and observability tool for AI agents and LLM applications. The main criticism mentioned is that it's "cloud-only and paid," which has led some developers to seek open-source alternatives like AgentLens for debugging multi-agent systems. There's clear demand for AI agent observability tools in the market, particularly given recent production incidents involving AI agents causing data loss. However, the limited social mentions suggest either low adoption or that discussions about LangSmith are happening in more specialized developer communities not captured in these sources.
- Mentions (30d): 0
- Reviews: 0
- Platforms: 3
- Sentiment: 0% (0 positive)
- Industry: information technology & services
- Employees: 98
- Funding Stage: Series B
- Total Funding: $260.0M
Ask HN: How are you monitoring AI agents in production?
With the recent incidents (DataTalks database wipe by Claude Code, Replit agent deleting data during code freeze), it's clear that running AI agents in production without observability is risky.

Common failure modes I've seen: no visibility into what the agent did step-by-step, surprise LLM bills from untracked token usage, risky outputs going undetected, and no audit trail for post-mortems.

I've been building AgentShield (https://useagentshield.com), an observability SDK for AI agents. It does execution tracing, risk detection on outputs, cost tracking per agent/model, and human-in-the-loop approval for high-risk actions. Plugs into LangChain, CrewAI, and OpenAI Agents SDK with a 2-line integration.

Curious what others are using. Rolling your own monitoring? LangSmith? Langfuse? Or just hoping for the best?
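For anyone in the "rolling your own monitoring" camp, the failure modes listed above (no step-by-step visibility, untracked token spend, no audit trail) can be covered by a small decorator. This is a generic sketch, not AgentShield's actual API; the `trace_step` name, the `PRICES_PER_1K` table, and the convention that each step returns `(output, tokens_used)` are all invented for illustration.

```python
import functools
import json
import time

# Hypothetical per-1K-token price table; real prices vary by provider/model.
PRICES_PER_1K = {"gpt-4o": 0.005}

def trace_step(model="gpt-4o"):
    """Minimal roll-your-own tracer: logs each agent step with
    latency, token usage, and an estimated cost as one JSON line."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            # Convention for this sketch: each step returns (output, tokens_used).
            output, tokens = fn(*args, **kwargs)
            record = {
                "step": fn.__name__,
                "latency_s": round(time.time() - start, 3),
                "tokens": tokens,
                "est_cost_usd": round(tokens / 1000 * PRICES_PER_1K[model], 6),
            }
            print(json.dumps(record))  # audit trail: one JSON line per step
            return output
        return wrapper
    return decorator

@trace_step()
def summarize(text):
    return f"summary of {text!r}", 42  # fake LLM call for illustration

summarize("quarterly report")
```

Piping those JSON lines to a log store gives a crude but queryable audit trail and per-step cost ledger.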
Show HN: AgentLens – Open-source observability for AI agents
Hi HN,

I built AgentLens because debugging multi-agent systems is painful. LangSmith is cloud-only and paid. Langfuse tracks LLM calls but doesn't understand agent topology: tool calls, handoffs, decision trees.

AgentLens is a self-hosted observability platform built specifically for AI agents:

- **Topology graph**: see your agent's tool calls, LLM calls, and sub-agent spawns as an interactive DAG
- **Time-travel replay**: step through an agent run frame-by-frame with a scrubber timeline
- **Trace comparison**: side-by-side diff of two runs with color-coded span matching
- **Cost tracking**: 27 models priced (GPT-4.1, Claude 4, Gemini 2.0, etc.)
- **Live streaming**: watch spans appear in real-time via SSE
- **Alerting**: anomaly detection for cost spikes, error rates, latency
- **OTel ingestion**: accepts OTLP HTTP JSON, so any OTel-instrumented app works

Works with LangChain, CrewAI, AutoGen, LlamaIndex, and Google ADK.

Tech: React 19 + FastAPI + SQLite/PostgreSQL. MIT licensed. 231 tests, 100% coverage.

```shell
docker run -p 3000:3000 tranhoangtu/agentlens-observe:0.6.0
pip install agentlens-observe
```

Demo GIF and screenshots in the README.

GitHub: https://github.com/tranhoangtu-it/agentlens-observe
Docs: https://agentlens-observe.pages.dev

I'd love feedback on the trace visualization approach and what features matter most for your agent debugging workflow.
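The "OTel ingestion" claim is worth unpacking: accepting OTLP/HTTP JSON means any app can emit spans with nothing but an HTTP client, no vendor SDK. Below is a minimal sketch of that payload shape. The field names follow the OpenTelemetry OTLP JSON encoding and `/v1/traces` is the standard OTLP/HTTP path, but the host/port are placeholders for wherever your collector actually listens.

```python
import json
import time
import urllib.request

def build_otlp_payload(trace_id, span_id, name, start_ns, end_ns):
    """Build a minimal OTLP/HTTP JSON trace export body with one span."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [{
                "key": "service.name",
                "value": {"stringValue": "my-agent"},
            }]},
            "scopeSpans": [{
                "scope": {"name": "manual-instrumentation"},
                "spans": [{
                    "traceId": trace_id,   # 32 lowercase hex chars
                    "spanId": span_id,     # 16 lowercase hex chars
                    "name": name,
                    "kind": 1,             # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }

now = time.time_ns()
payload = build_otlp_payload("0af7651916cd43dd8448eb211c80319c",
                             "b7ad6b7169203331", "tool_call",
                             now - 5_000_000, now)

# POST to an OTLP/HTTP ingest endpoint; /v1/traces is the standard path,
# the host/port are placeholders for your collector.
req = urllib.request.Request(
    "http://localhost:4318/v1/traces",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment once a collector is running
```

Because this is plain OTLP, the same payload works against any OTLP-compatible backend, which is exactly the portability argument the post is making.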
Engineering the Autonomous Local Enterprise: A Technical Blueprint for Agentic RAG and Sovereign AI Infrastructure
# Engineering the Autonomous Local Enterprise: A Technical Blueprint for Agentic RAG and Sovereign AI Infrastructure

The transition from reactive large language model applications to autonomous agentic workflows represents a fundamental paradigm shift in enterprise computing. In the 2025–2026 technological landscape, the industry has moved beyond simple chat interfaces toward systems capable of planning, executing, and refining multi-step workflows over extended temporal horizons. This evolution is underpinned by the convergence of high-performance local inference, sophisticated document understanding, and multi-agent orchestration frameworks that operate within a "sovereign stack"—an infrastructure entirely controlled by the organization to ensure data privacy, security, and operational resilience. The architecture of such a system requires a nuanced understanding of hardware constraints, the mathematical implications of model quantization, and the systemic challenges of retrieving context from high-volume, complex document sets.

# Executive Summary: The Rise of Sovereign Intelligence

The contemporary AI landscape is increasingly bifurcated between centralized cloud-based services and a burgeoning movement toward decentralized, sovereign intelligence. For organizations managing sensitive intellectual property, legal documents, or healthcare data, the reliance on third-party APIs introduces unacceptable risks regarding data residency, privacy, and long-term cost volatility. The primary mission of this report is to define the architecture for a fully local, production-ready system that leverages the most advanced open-source components from GitHub and Hugging Face. The proposed system integrates high-fidelity document ingestion, a multi-stage RAG pipeline, and an agentic orchestration layer capable of long-horizon reasoning.

By utilizing reasoning models such as DeepSeek-R1 and Llama 3.3, and optimizing them through advanced quantization, the enterprise can achieve performance levels previously reserved for high-cost cloud providers. This architecture is further enhanced by comprehensive observability through the OpenTelemetry standard, ensuring that every reasoning step and retrieval operation is transparent and verifiable.

# Phase 1: The Local Discovery Engine

Identifying the optimal components for a local sovereign stack requires a rigorous evaluation of active maintenance, documentation quality, and community health. The following repositories and transformers represent the current state-of-the-art for local LLM deployment with agentic RAG.

# Top GitHub Repositories for Local Agentic RAG

|**Repository**|**Stars**|**Last Updated**|**Primary Language**|**Key Strength**|**Critical Limitation**|
|:-|:-|:-|:-|:-|:-|
|**langchain-ai/langchain**|125,000|2026-01|Python/TS|700+ integrations; modular agentic workflows.|High abstraction complexity; steep learning curve.|
|**langgenius/dify**|114,000|2026-01|Python/TS|Visual drag-and-drop workflow builder; built-in RAG.|Less flexibility for custom low-level Python hacks.|
|**infiniflow/ragflow**|70,000|2025-12|Python|Deep document understanding; visual chunk inspection.|Resource-heavy; requires robust GPU for layout parsing.|
|**run-llama/llama_index**|46,500|2025-12|Python/TS|Superior data indexing; 150+ data connectors.|Transition from ServiceContext to Settings can be confusing.|
|**zylon-ai/private-gpt**|52,000|2025-11|Python|Production-ready; 100% offline; OpenAI API compatible.|Gradio UI is basic; designed primarily for document Q&A.|
|**Mintplex-Labs/anything-llm**|25,000|2026-01|Node.js|All-in-one desktop/Docker app; multi-user support.|Workspace-based isolation can limit cross-context queries.|
|**DSProject/Docling**|12,000|2026-01|Python|Industry-leading table extraction (97.9% accuracy).|Speed scales linearly with page count (slower than LlamaParse).|

# Top Hugging Face Transformers for Reasoning and RAG

|**Model**|**Downloads**|**Task**|**Base Model**|**Params**|**Hardware (4-bit)**|**Fine-tuning**|
|:-|:-|:-|:-|:-|:-|:-|
|**DeepSeek-R1-Distill-Qwen-32B**|2.1M|Reasoning|Qwen 2.5|32.7B|24GB VRAM (RTX 4090)|Yes (LoRA)|
|**DeepSeek-R1-Distill-Llama-70B**|1.8M|Reasoning|Llama 3.3|70.6B|48GB VRAM (2x 4090)|Yes (LoRA)|
|**Llama-3.3-70B-Instruct**|5.5M|General/RAG|Llama 3.3|70B|48GB VRAM (2x 4090)|Yes|
|**Qwen 2.5-72B-Instruct**|3.2M|Coding/RAG|Qwen 2.5|72B|48GB VRAM|Yes|
|**Ministral-8B-Instruct**|800K|Edge RAG|Mistral|8B|8GB VRAM (RTX 3060)|Yes|

# Phase 2: Hardware Topographies and Inference Optimization

The viability of local intelligence is strictly dictated by the memory bandwidth and VRAM capacity of the deployment target. In 2025, the release of the NVIDIA RTX 5090 introduced a significant leap in local capability, featuring 32GB of GDDR7 memory and a bandwidth of approximately 1,792 GB/s, representing a 77% improvement over its predecessor.

# The Physics of Inference: Bandwidth vs. Compute

A detailed 2025 NVIDIA research pap
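The 4-bit VRAM figures in the transformer table and the RTX 5090 bandwidth number can be sanity-checked with a back-of-the-envelope heuristic: 4-bit weights occupy roughly 0.5 bytes per parameter, and single-stream decoding is memory-bandwidth-bound (each generated token reads the full weight set once), so tokens/s is capped at roughly bandwidth divided by weight bytes. A minimal sketch; the 20% overhead factor for KV cache and runtime buffers is an assumption, not a measured value.

```python
def vram_4bit_gb(params_b, overhead=1.2):
    """Rough 4-bit weight footprint in GB: 0.5 bytes/param,
    plus an assumed ~20% for KV cache, activations, and buffers."""
    return params_b * 0.5 * overhead

def decode_tokens_per_s(bandwidth_gb_s, weights_gb):
    """Bandwidth-bound decode ceiling: every token reads all weights once."""
    return bandwidth_gb_s / weights_gb

# DeepSeek-R1-Distill-Qwen-32B (32.7B params) from the table above:
footprint = vram_4bit_gb(32.7)                       # ~19.6 GB
ceiling = decode_tokens_per_s(1792, 32.7 * 0.5)      # RTX 5090 bandwidth
print(f"~{footprint:.1f} GB footprint -> fits the table's 24 GB budget")
print(f"bandwidth-bound ceiling on an RTX 5090: ~{ceiling:.0f} tok/s")
```

Real throughput lands below this ceiling (the estimate ignores compute, KV-cache reads, and batching effects), but the heuristic is useful for quickly ruling hardware in or out.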
Based on user reviews and social mentions, the most common pain points are cost tracking and token usage.