Build AI agents that run your GTM playbooks on autopilot. From BDR outreach to customer success — scale results without scaling headcount.
I don't have enough meaningful information to provide a proper summary of user sentiment about Relevance AI. The social mentions you've provided only show YouTube video titles that repeat "Relevance AI AI" without any actual user feedback, reviews, or comments about the tool's performance, features, or value. To give you an accurate assessment of what users think about Relevance AI, I would need access to actual user reviews, detailed social media discussions, or substantive feedback about their experiences with the platform.
Mentions (30d)
0
Reviews
0
Platforms
2
Sentiment
0%
0 positive
Features
Industry
information technology & services
Employees
130
Funding Stage
Series B
Total Funding
$36.6M
Pricing found: $2, $240, $840
I got tired of burning $10/day on Claude Code/Cursor forgetting my architecture, so I built a persistent memory engine in Go (Open Source).
Hi guys, I've been using AI coding agents (Claude Code, Cursor, Kiro) heavily lately. They are incredibly smart, but their "goldfish memory" was driving me crazy. Every time I start a new session or clear the chat to save tokens, the AI completely forgets my project conventions, architecture decisions, and the obscure bugs we just fixed. Forcing it to re-read the entire codebase every single time was eating up massive amounts of context window and costing me a fortune in API bills.

So over the weekend, I built Mnemos to solve this. It's a persistent memory engine that runs as an MCP (Model Context Protocol) server.

Zero-BS stack: it's a single Go binary backed by an embedded pure-Go SQLite database (using FTS5 for search). No Docker, no Python, no Node required.

How it works: it quietly runs in the background. When the AI learns something durable, it stores it. The next time you open the project, Mnemos automatically injects the most relevant ~2k tokens of context right back into the agent's brain before you even start typing.

1-click autopilot: I added a setup command (mnemos setup cursor or mnemos setup claude) that instantly wires the MCP configs and steering rules for you.

I originally built this just to stop bleeding money on API costs, but it actually made my workflow way smoother, since I no longer have to re-explain my CSS conventions every Monday morning. It's 100% open-source. If anyone is dealing with the same "context amnesia" issue, I'd love for you to try it out and let me know what you think!

GitHub Repo: https://github.com/s60yucca/mnemos

It works perfectly in my Kiro setup: Mnemos reads context for each task, and store, search, and auto-trigger all work.

submitted by /u/suestorm9
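Mnemos itself is a Go binary, but the core idea — durable notes in an SQLite FTS5 table, with the best matches pulled back as context — can be sketched in a few lines of Python. The `remember`/`recall` names here are illustrative, not Mnemos's actual API:

```python
import sqlite3

# In-memory for the demo; a Mnemos-style tool would use one file per project.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE memory USING fts5(note)")

def remember(note: str) -> None:
    db.execute("INSERT INTO memory(note) VALUES (?)", (note,))

def recall(query: str, limit: int = 3) -> list[str]:
    # bm25() ranks full-text matches, best first.
    rows = db.execute(
        "SELECT note FROM memory WHERE memory MATCH ? ORDER BY bm25(memory) LIMIT ?",
        (query, limit),
    )
    return [r[0] for r in rows]

remember("CSS convention: use Tailwind utility classes, no inline styles")
remember("Bug #42: race in session cache fixed by locking around refresh")
print(recall("CSS convention"))
```

The real system adds the MCP plumbing on top of this, so the agent calls the search tool automatically instead of you pasting notes in.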
Multiple Agents Communicating With Each Other
I created this app using Claude Code, to help me use Claude Code. I wanted to have all my Claude prompts able to collaborate through a single discussion — like a real team using Teams — so they can work together on tasks without needing me to keep updating them. This tool lets me add multiple named agents, working in separate spaces, and get them to talk to each other by name.

The key benefit for me is that once I have told agents with different roles what to work on, they just talk to each other as necessary. An API agent will tell the client what endpoint to use, and what the model looks like. A mobile app agent will ask the API for an endpoint which accepts certain parameters and receives certain values back. I can have a tester agent writing tests based on the discussion, and a designer advising on style guidelines to the agent writing the UX.

But unlike with other multi-agent options, I can see exactly what they are saying, and intervene. Plus I can interact directly with each agent prompt, add new agents, exclude agents that don't need to be in the conversation, download the conversation in CSV format for adding to dev ops tickets, etc. For me, this is how I want to work with AI.

Agents are pre-initialized to know they are working inside the app, and to use the chat. The relevant Claude files are minimal and don't conflict with your existing Claude files if you don't want them to.

Attached video to try and show them talking to each other. I'm not a video editor, so forgive the poor edit of a demo session, but hopefully it shows the idea without being too long. They ask each other questions, offer information, update each other, agree approaches with each other, and generally just act like you would expect.

I built the app with one agent originally, and it's now the only way I use Claude daily. I'm adding integration with Azure DevOps at the moment, so I can pull tickets straight into the conversation, and update from the discussion directly.
I also have some other ideas for how to make it even more streamlined, and I'm happy to take feature requests. Maybe someone already did this, but I couldn't find a tool like this, so I am sharing it with anyone who might find it useful. The app is written in Electron and runs as a local install. Code and release are here: https://github.com/widdev/claudeteam https://github.com/widdev/claudeteam/releases/tag/v1.0.23

submitted by /u/HungryHorace83
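The core mechanism the post describes — named agents posting into one shared discussion and addressing each other by name — can be sketched as a tiny message bus. This is an illustration of the pattern, not the app's actual code; the `Discussion` class and its methods are hypothetical:

```python
from collections import defaultdict

class Discussion:
    """One shared channel; agents are addressed by name, like in Teams."""
    def __init__(self):
        self.log = []                   # full transcript, exportable to CSV
        self.inbox = defaultdict(list)  # per-agent unread messages

    def post(self, sender: str, recipient: str, text: str) -> None:
        msg = {"from": sender, "to": recipient, "text": text}
        self.log.append(msg)
        self.inbox[recipient].append(msg)

    def read(self, agent: str) -> list[dict]:
        # Each agent drains its own inbox on its next turn.
        msgs, self.inbox[agent] = self.inbox[agent], []
        return msgs

chat = Discussion()
chat.post("API", "MobileApp", "POST /v1/orders accepts {sku, qty}, returns {orderId}")
chat.post("Tester", "API", "Writing tests for POST /v1/orders based on the discussion")
print([m["text"] for m in chat.read("MobileApp")])
```

Because every message also lands in `log`, the human can watch the whole exchange and intervene at any point, which is the property the post emphasizes over other multi-agent setups.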
My Claude.md file
This is my Claude.md file; the same information goes in Gemini.md, since I use Claude Max and Gemini Ultra.

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Atlas UX** is a full-stack AI receptionist platform for trade businesses (plumbers, salons, HVAC). Lucy answers calls 24/7, books appointments, sends SMS confirmations, and notifies via Slack — for $99/mo. It runs as a web SPA and Electron desktop app, deployed on AWS Lightsail. The project is in Beta with built-in approval workflows and safety guardrails.

## Commands

### Frontend (root directory)

```bash
npm run dev              # Vite dev server at localhost:5173
npm run build            # Production build to ./dist
npm run preview          # Preview production build
npm run electron:dev     # Run Electron desktop app
npm run electron:build   # Build Electron app
```

### Backend (cd backend/)

```bash
npm run dev              # tsx watch mode (auto-recompile)
npm run build            # tsc compile to ./dist
npm run start            # Start Fastify server (port 8787)
npm run worker:engine    # Run AI orchestration loop
npm run worker:email     # Run email sender worker
```

### Database

```bash
docker-compose -f backend/docker-compose.yml up   # Local PostgreSQL 16
npx prisma migrate dev   # Run migrations
npx prisma studio        # DB GUI
npx prisma db seed       # Seed database
```

### Knowledge Base

```bash
cd backend && npm run kb:ingest-agents   # Ingest agent docs
cd backend && npm run kb:chunk-docs      # Chunk KB documents
```

## Architecture

### Directory Structure

- `src/` — React 18 frontend (Vite + TypeScript + Tailwind CSS)
  - `components/` — Feature components (40+, often 10–70KB each)
  - `pages/` — Public-facing pages (Landing, Blog, Privacy, Terms, Store)
  - `lib/` — Client utilities (`api.ts`, `activeTenant.tsx` context)
  - `core/` — Client-side domain logic (agents, audit, exec, SGL)
  - `config/` — Email maps, AI personality config
  - `routes.ts` — All app routes (HashRouter-based)
- `backend/src/` — Fastify 5 + TypeScript backend
  - `routes/` — 30+ route files, all mounted under `/v1`
  - `core/engine/` — Main AI orchestration engine
  - `plugins/` — Fastify plugins: `authPlugin`, `tenantPlugin`, `auditPlugin`, `csrfPlugin`, `tenantRateLimit`
  - `domain/` — Business domain logic (audit, content, ledger)
  - `services/` — Service layer (`elevenlabs.ts`, `credentialResolver.ts`, etc.)
  - `tools/` — Tool integrations (Outlook, Slack)
  - `workers/` — `engineLoop.ts` (ticks every 5s), `emailSender.ts`
  - `jobs/` — Database-backed job queue
  - `lib/encryption.ts` — AES-256-GCM encryption for stored credentials
  - `lib/webSearch.ts` — Multi-provider web search (You.com, Brave, Exa, Tavily, SerpAPI) with randomized rotation
  - `ai.ts` — AI provider setup (OpenAI, DeepSeek, OpenRouter, Cerebras)
  - `env.ts` — All environment variable definitions
- `backend/prisma/` — Prisma schema (30KB+) and migrations
- `electron/` — Electron main process and preload
- `Agents/` — Agent configurations and policies
- `policies/` — SGL.md (System Governance Language DSL), EXECUTION_CONSTITUTION.md
- `workflows/` — Predefined workflow definitions

### Key Architectural Patterns

**Multi-Tenancy:** Every DB table has a `tenant_id` FK. The backend's `tenantPlugin` extracts `x-tenant-id` from request headers.

**Authentication:** JWT-based via `authPlugin.ts` (HS256, issuer/audience validated). The frontend sends the token in the Authorization header. Revoked tokens are checked against a `revokedToken` table (fail-closed). Expired revoked tokens are pruned daily.

**CSRF Protection:** DB-backed synchronizer token pattern via `csrfPlugin.ts`. Tokens are issued on mutating responses, stored in `oauth_state` with a 1-hour TTL, and validated on all state-changing requests. Webhook/callback endpoints are exempt (see `SKIP_PREFIXES` in the plugin).

**Audit Trail:** All mutations must be logged to the `audit_log` table via `auditPlugin`. Successful GETs and health/polling endpoints are skipped to reduce noise. On DB write failure, audit events fall back to stderr (never lost). Hash chain integrity (SOC 2 CC7.2) via `lib/auditChain.ts`.

**Job System:** Async work is queued to the `jobs` DB table (statuses: queued → running → completed/failed). The engine loop picks up jobs periodically.

**Engine Loop:** `workers/engineLoop.ts` is a separate Node process that ticks every `ENGINE_TICK_INTERVAL_MS` (default 5000ms). It handles the orchestration of autonomous agent actions.

**AI Agents:** Named agents (Atlas=CEO, Binky=CRO, etc.) each have their own email accounts and role definitions. Agent behavior is governed by SGL policies.

**Decisions/Approval Workflow:** High-risk actions (recurring charges, spend above `AUTO_SPEND_LIMIT_USD`, risk tier ≥ 2) require a `decision_memo` approval before execution.

**Frontend Routing:** Uses `HashRouter` from React Router v7. All routes are defined in `src/routes.ts`.

**Code Splitting:** Vite config splits chunks into `react-vendor`, `router`, `ui-vendor`, `charts`.

**ElevenLabs Voice Agents:** Lucy's
RTFM v0.4 — MCP retrieval server that cuts vault context by 90% (Obsidian + Claude Code)
Problem: Karpathy-style LLM wikis inject everything into context. On a 1,700-file vault, that's your entire quota in minutes. I built an MCP server that does retrieval instead of scanning.

**How it works with Claude Code:** The agent calls `rtfm_search("formal grammars")` → gets 5 results with scores and file paths (~300 tokens). Then `rtfm_expand("source-slug")` to read only the relevant section. Progressive disclosure: context grows only by what's actually useful.

**New in v0.4 — Obsidian vault integration:** `rtfm vault` indexes your vault in one command:

- Auto corpus mapping (folders → searchable corpora)
- [[wikilink]] resolution → knowledge graph with centrality ranking
- Auto-generated _rtfm/ navigation files (readable in Obsidian)
- 10 parsers: Markdown, Python AST, LaTeX, PDF, YAML, JSON, Shell...
- Extensible: add any format in ~50 lines of Python

Measured on real repos: -51% cost, -61% tokens, -16% duration vs standard grep-based navigation.

`pip install rtfm-ai[mcp]` https://github.com/roomi-fields/rtfm MIT licensed. Works with Claude Code, Cursor, Codex — any MCP client.

submitted by /u/Plenty-Ad-7699
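The search-then-expand flow can be illustrated with a toy stand-in: the cheap call returns only slugs and scores, and full text enters the context window only on the explicit expand step. The corpus and the naive term-overlap scoring below are illustrative; the real tool uses a proper index:

```python
# Toy corpus: slug -> section text.
CORPUS = {
    "grammars-intro": "Formal grammars are sets of production rules over an alphabet.",
    "parsing-lr": "LR parsing builds a rightmost derivation in reverse, shifting and reducing.",
}

def rtfm_search(query: str):
    # Score by naive term overlap; return (slug, score) pairs, never full text.
    terms = set(query.lower().split())
    scored = [
        (slug, len(terms & set(text.lower().split())))
        for slug, text in CORPUS.items()
    ]
    return sorted(((s, sc) for s, sc in scored if sc), key=lambda x: -x[1])

def rtfm_expand(slug: str) -> str:
    # Only now does a full section enter the context window.
    return CORPUS[slug]

hits = rtfm_search("formal grammars")
print(hits)                     # cheap: slugs and scores only
print(rtfm_expand(hits[0][0]))  # expensive step, one section only
```

The point of the two-step shape is that the agent pays the token cost only for sections it has already decided are relevant.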
Studying Sutton and Barto's RL book and its connections to RL for LLMs (e.g., tool use, math reasoning, agents, and so on)? [D]
Hi everyone, I graduated from a Master's in Math program last summer. In recent months I have been trying to understand more about ML/DL and LLMs, so I have been reading books and sometimes papers on LLMs and their reasoning capacities (I'm especially interested in AI for Math). When I read about RL on Wikipedia, I found it really interesting as well, so I wanted to learn more about RL and its connections to LLMs.

The canonical book on RL is "Sutton and Barto", whose second edition was published in 2018, before LLMs got really popular, so it does not mention things like PPO, GRPO, and so on. I asked LLMs to select relevant chapters from the book so that I could study in a more focused way, and they selected Chapters 1 (Intro), 3 (Finite MDPs), 6 (TD Learning), 9 (On-policy prediction with approximation), 10 (On-policy control with approximation), 11 (Off-policy methods with approximation), and 13 (Policy gradient methods).

So I have the following questions that I was wondering if you could help me with:

- What do you think of this selection, and do you have better recommendations?
- Do you think these are good first steps to understand the landscape before reading and experimenting with modern RL-for-LLM papers? Or should I just go with Alberta's online RL course?
- Joseph Suarez wrote "An Ultra Opinionated Guide to Reinforcement Learning", but I think it's mostly about non-LLM RL?

Thank you a lot for your time!

submitted by /u/hedgehog0
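For what it's worth, Chapter 13 is the bridge to the LLM-era methods: the policy gradient theorem it builds up to, in its REINFORCE form, is

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]
```

PPO and GRPO are, roughly, this same gradient with the return \(G_t\) replaced by a clipped, advantage-based surrogate objective, which is why Chapter 13 is a sensible last stop before the RL-for-LLM papers.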
Developer PSA: be careful with shared env vars when testing multiple AI providers
I want to share a debugging failure mode that may be relevant to other people building AI tooling. I was testing multiple providers side by side in the same shell/session, switching between Claude, OpenAI/Codex, MiniMax, and DeepSeek. The problem is that the API/config patterns are similar enough that it becomes very easy for the shell to pick up the wrong key or backend settings from .bashrc, direnv, or other shared local env setup.

This kind of mix-up had actually happened before during testing, but it never seemed to cause anything serious. This time, though, an abnormal request/access error happened shortly before my Claude account was restricted, which makes me think auth/config confusion during debugging may have played a role. I do not have official confirmation about the exact cause, so I'm not claiming a direct causal link. I'm posting this as a developer warning: when multiple provider integrations are tested in the same environment, auth resolution itself becomes part of the failure surface.

My current takeaways are:

- use an explicitly selected profile whenever possible
- avoid broad global provider env vars if you switch providers often
- prefer tool-specific namespaced env vars over raw provider-native env vars
- print the active backend and credential source before test runs
- assume "wrong key to wrong backend" is a real class of bug, not just user error

Curious whether other people building multi-provider tools have run into similar env/auth mix-ups.

submitted by /u/rchuan
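Two of those takeaways (namespaced env vars, printing the credential source) can be combined in a small resolver. The variable names below (`MYTOOL_ANTHROPIC_API_KEY` etc.) are hypothetical examples, not any tool's real configuration:

```python
import os

def resolve_credentials(tool: str, provider: str) -> tuple[str, str]:
    """Prefer a tool-namespaced var; fall back to the provider-native one.

    Returns (key, source_var_name) so the caller can log where the key came from.
    """
    namespaced = f"{tool}_{provider}_API_KEY".upper()
    native = f"{provider}_API_KEY".upper()
    for name in (namespaced, native):
        if name in os.environ:
            return os.environ[name], name
    raise KeyError(f"no key for {provider}; looked at {namespaced}, {native}")

# Print the active backend and credential source before any test run.
os.environ["MYTOOL_ANTHROPIC_API_KEY"] = "sk-test"  # hypothetical var, demo only
key, source = resolve_credentials("mytool", "anthropic")
print(f"backend=anthropic key_source={source}")
```

Logging `key_source` (never the key itself) before each run makes the "wrong key to wrong backend" class of bug visible instead of silent.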
Asking for fun facts: This prompt tweak helps me pick up useful facts along the way
I found a small prompt tweak that's been way more useful than I expected: I ask the AI to include a real, relevant fun fact sometimes while answering. Not a joke. Not random trivia. I mean something like: a weird but true detail, a short historical note, a little story, or a lesser-known fact that actually fits the topic. I added a short instruction along those lines to my custom instructions.

What I noticed is that it makes the answers feel more alive and also easier to remember. A normal answer gives me the information I asked for. But when it includes one good extra nugget, I remember the whole topic better. It also makes the AI feel less sterile. Sometimes AI answers are correct but feel dry, like reading a manual written by a careful refrigerator. This helps add texture without making the answer messy.

Another thing I like is that over time, those little nuggets stack up. You're not just getting answers — you're quietly building general knowledge around the subject. For example, if I ask about local AI and memory bandwidth, the answer might include a small nugget of exactly that kind, which is perfect for me because it's relevant, memorable, and actually teaches something useful.

So now I think of it as a simple prompt pattern: direct answer + one good nugget. Not enough to distract. Just enough to make the answer stick.

Curious if anyone else does this in their custom instructions or starter prompts.

submitted by /u/Ok-Cable-4252
A Claude memory retrieval system that actually works (easily) and doesn't burn all my tokens
TL;DR: By talking to Claude and explaining my problem, I built a very powerful local "memory management" system for Claude Desktop that indexes project documents and lets Claude automatically retrieve relevant passages that are buried inside those documents during Co-Work sessions. For me it solves the "document memory" problem where tools like NotebookLM, Notion, Obsidian, and Google Drive can't be queried programmatically. Claude did all of it; I didn't have to really do anything. The description below includes plenty of things that I don't completely understand myself. The key thing is just to explain to Claude what the problem is (which I described below) and what your intention is, and Claude will help you figure it out. It was very easy to set this up, and I think it's better than what I've seen any YouTuber recommend.

The details: I have a really nice solution to the Claude external memory/external brain problem that lots of people are trying to address. Although my system is designed for one guy using his laptop, not a large company with terabytes of data, the general approach I use could be scaled up just by substituting different tools.

I wanted to create a Claude external memory system that is connected to Claude Co-Work in the desktop app. What I really wanted was for Claude to proactively draw from my entire base of knowledge for each project, not just from the documents I dropped into my project folder in Claude Desktop. Basically, I want Claude to have awareness of everything I have stored on my computer, in the most efficient way possible (Claude can use lots of tokens if you don't manage the "memory" efficiently). I've played with Notion and Google Drive as an external brain. I've tried NotebookLM.
And I was just beginning to research Obsidian when I read this article, which I liked very much and highly recommend: https://limitededitionjonathan.substack.com/p/stop-calling-it-memory-the-problem

That got my attention, so I asked Claude to read the document and give me his feedback based on his understanding of the projects I was trying to work on. Claude recommended using SQLite to connect to structured facts, an optional graph to show some relationships, and .md files for instructions to Claude. But I pointed out that almost all of the context information I would want to be retrievable from memory is text in documents, not structured data.

Claude's response was very helpful. He understood that although SQLite is good at single-point facts, document memory is a different challenge. For documents, the challenge isn't storing them — it's retrieving the right passage when it's relevant without reading everything (which consumes tokens). SQLite can store text, but storing a document in a database row doesn't solve the retrieval problem. You still need to know which row to pull.

I asked if NotebookLM from Google might be a better tool for indexing those documents and making them searchable. Claude explained that what I was describing is a Retrieval-Augmented Generation (RAG) problem. The standard approach:

1. Documents get chunked into passages (e.g., 500 words each)
2. Each chunk gets converted to an embedding — a vector that captures its meaning
3. When Claude needs context, it converts the query to the same vector format and finds the semantically closest chunks
4. Those chunks get injected into the conversation as context

This is what NotebookLM is doing under the hood. It's essentially a hosted, polished RAG system. NotebookLM is genuinely good at what it does — but it has a fundamental problem for my case: it's a UI, not infrastructure. You use it; Claude can't. There's no API, no MCP tool, no way to have Claude programmatically query it during a Co-Work session.
It's a parallel system, not an integrated one. So NotebookLM answers "how do I search my documents as a human?" — not "how does Claude retrieve the right document context automatically?"

After a little back and forth, here's what we decided to do. For me, a solo operator with only a laptop's worth of documents that need to be searched, Claude proposed a RAG pipeline that looks like this:

My documents (DOCX, PDF, XLSX, CSV)
↓ Text extraction (python-docx, pymupdf, openpyxl)
↓ Chunking (split into ~500-word passages; keep metadata: file, folder, date)
↓ Embedding (convert each chunk to a vector representing its meaning)
↓ A local vector database + vector extension (store chunks + vectors locally, single file)
↓ MCP server (exposes a search_knowledge tool to Claude)
↓ Claude Desktop (queries the index when working on my business topics)

With that setup, when you're talking to Claude and mention an idea like "did I pay the overdue invoice" or "which projects did Joe Schmoe help with," Claude searches the index, gets the 3-5 most relevant passages back, and uses them in its answer without you doing anything. We decided to develop a search system like that, specific to each of my discrete projects. Th
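The chunk → embed → search part of a pipeline like that can be sketched end to end. To keep the sketch runnable with only the standard library, real embeddings are replaced here with bag-of-words vectors and cosine similarity; the documents and the `search_knowledge` name are illustrative, and an actual setup would use an embedding model plus a vector extension:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 500) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # crude stand-in for a real embedding

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "invoices.txt": "The overdue invoice from Acme was paid on March 3.",
    "projects.txt": "Joe Schmoe helped with the website redesign project.",
}
# Index: (file, chunk_text, vector) — metadata rides along with each chunk.
index = [(f, c, embed(c)) for f, text in docs.items() for c in chunk(text)]

def search_knowledge(query: str, k: int = 3):
    q = embed(query)
    ranked = sorted(index, key=lambda e: -cosine(q, e[2]))
    return [(f, c) for f, c, _ in ranked[:k]]

print(search_knowledge("did I pay the overdue invoice")[0])
```

Swapping `embed` for a real embedding model and the list for a vector store is the only structural change needed to turn this toy into the proposed pipeline.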
Giving Claude Code architectural context via a knowledge graph MCP (inspired by Karpathy's LLM Wiki)
Karpathy's LLM Wiki gist from last week made a point that's directly relevant to how we use Claude Code: RAG and context-stuffing force the LLM to rediscover knowledge from scratch every time. A pre-compiled knowledge artifact is fundamentally better.

If you've used Claude Code on a large codebase, you've felt this. You paste in files, maybe a README, maybe some architecture docs, and Claude still doesn't really understand how your services talk to each other, who owns what, or what the dependency chain looks like. It's re-deriving that context on every conversation.

We've been working on this problem at OpenTrace. We build a typed knowledge graph from your engineering data — GitHub/GitLab repos, Linear, Kubernetes, distributed traces — and expose it to Claude via MCP. So instead of Claude guessing at your architecture from whatever files you've pasted in, it can query the graph directly: "what services does checkout call?", "who owns the payment service?", "show me the dependency chain for this endpoint."

The difference from Karpathy's wiki pattern is that the graph maintains itself automatically (code gets parsed via Tree-sitter/SCIP, traces get correlated, tickets get linked) and it's structured as typed nodes and edges rather than markdown files — which is what an agent actually needs for programmatic traversal.

A few things we've seen in practice with the MCP connected to Claude Code:

- Claude makes significantly better decisions about where to make changes when it can see the full call graph, not just the file it's editing
- It stops suggesting changes that break downstream services it didn't know existed
- It can answer "who should review this?" by tracing ownership through the graph

We have an open source version you can self-host and try with Claude Code: https://github.com/opentrace/opentrace (quickstart at https://oss.opentrace.ai). There's also a hosted version at https://opentrace.ai with additional features. Both expose an MCP server.
Curious if others have tried giving Claude Code more persistent architectural context, and what's worked for you.

submitted by /u/steve-opentrace
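The "typed nodes and edges" point is concrete enough to sketch. The node/edge shapes and team names below are hypothetical, but they show why a graph answers questions like "what does checkout call?" or "who owns payment?" with a trivial traversal, where raw markdown would need re-reading:

```python
# Minimal typed graph: nodes carry a type and attributes, edges carry a relation kind.
nodes = {
    "checkout":  {"type": "service", "owner": "payments-team"},
    "payment":   {"type": "service", "owner": "payments-team"},
    "inventory": {"type": "service", "owner": "fulfillment-team"},
}
edges = [
    ("checkout", "calls", "payment"),
    ("checkout", "calls", "inventory"),
]

def calls(service: str) -> list[str]:
    # "What services does checkout call?"
    return [dst for src, kind, dst in edges if src == service and kind == "calls"]

def owner(service: str) -> str:
    # "Who owns the payment service?"
    return nodes[service]["owner"]

print(calls("checkout"))
print(owner("payment"))
```

An MCP server in front of structures like these just exposes such traversals as tools the agent can call on demand.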
I use AI daily but can't figure out what to do beyond chat. What does your actual workflow look like?
I'm a non-technical guy (strategy/consulting background), currently job searching and trying to figure out how to use AI tools properly, beyond just asking questions. I'm low on savings and currently using Claude Pro, but genuinely only using chat, more or less.

The chat part I get. Research, writing, interview prep, brainstorming, and even writing this post. I use it daily; it's helpful. But I want to understand what the next level looks like. I've tried building things like a portfolio site, automating parts of my job search, etc. I can get a decent first output, but I struggle to iterate on it without the quality degrading.

I've also studied the concepts: APIs, MCP, frontend/backend, hosting, databases. I understand the definitions. But I don't know what to actually do with that knowledge. It's like learning what a carburetor does without ever having a reason to open a hood. There are a ton of tools out there (Claude Code, Cursor, n8n, Bolt, agents) and I can't figure out how they fit together or which ones are actually relevant for someone who doesn't code. Every YouTube video introduces something new before I've understood the last thing.

So genuinely asking:

- Non-technical people: What are you using AI for in your day to day beyond asking it questions? Are you automating stuff at work? Building things? What's the use case that made it click for you?
- Technical people / founders: Are you using AI coding tools in your actual 9-5, or is it mostly side projects? Are you building full apps?

Any general advice helps too. Would love to hear actual workflows, tool suggestions, or just "here's what my day looks like" answers. Trying to figure out where someone like me fits into all of this.

submitted by /u/Zathen14
Opus roasted Anthropic when I asked about the Mythos backlash
Two "accidental" leaks in five days — 500K lines of source code via npm, then the Mythos blog from a misconfigured CMS. Claude itself pointed out that modern CI/CD pipelines flag a 58MB source map file, and Anthropic literally owns the runtime (Bun) where the bug sat open for 20 days. The community is calling it the best PR stunt in AI history.

Best model ever, but nobody can verify because it's not public. "Trust us bro" benchmarking and GPT-2's "too dangerous to release" meme are just the surface. The model escaped its sandbox, posted exploits publicly, rewrote git history to hide mistakes, and sent unsolicited emails to real people. Anthropic called this "alignment-relevant" rather than dangerous.

Then the hypocrisy layer: DMCA'd OpenClaw while training on everyone else's data. Rate-limited indie devs while giving Big Tech exclusive early access. Refused the Pentagon's autonomous weapons request — then built the most powerful offensive cyber tool ever and handed it to a dozen corporations behind closed doors. "Safety-first" apparently means "enterprise-first."

Claude literally said that "our model is too dangerous" has become a marketing pitch, and cited Daring Fireball and Platformer saying the same thing. But this could also be a response entirely generated by Claude in its conspiracy-theorist mode, IDK.

submitted by /u/heraklets
indxr v0.4.0 - Teach your agents to learn from their mistakes.
I had been building indxr as a "fast codebase indexer for AI agents." Tree-sitter parsing, 27 languages, structural diffs, token budgets, the whole deal. And it worked. Agents could understand what was in your codebase faster. But they still couldn't remember why things were the way they were.

Karpathy's tweet about LLM knowledge bases prompted me to take indxr in a different direction. One of the main issues I faced, like many of you, while working with agents was them making the same mistake over and over again, because of not having persistent memory across sessions. Every new conversation starts from zero. The agent reads the code, builds up understanding, maybe fails a few times, eventually figures it out — and then all of that knowledge evaporates.

indxr is now a codebase knowledge wiki backed by a structural index. The structural index is still there — it's the foundation. Tree-sitter parses your code, extracts declarations, relationships, and complexity metrics. But the index now serves a bigger purpose: it's the scaffolding that agents use to build and maintain a persistent knowledge wiki about your codebase.

When an agent connects to the indxr MCP server, it has access to wiki_generate. The tool doesn't write the wiki itself; it returns the codebase's structural context, and the agent decides which pages to create: architecture overviews, module responsibilities, and design decisions. The agent plans the wiki, then calls wiki_contribute for each page. indxr provides the structural intelligence; the agent does the thinking and writing.

But generating docs isn't new. The interesting part is what happens next. I added a tool called wiki_record_failure. When an agent tries to fix a bug and fails, it records the attempt:

- Symptom — what it observed
- Attempted fix — what it tried
- Diagnosis — why it didn't work
- Actual fix — what eventually worked

These failure patterns get stored in the wiki, linked to the relevant module pages.
The next agent that touches that code calls wiki_search first and finds: "someone already tried X and it didn't work because of Y." This is the loop:

- Search — the agent queries the wiki before diving into the source
- Learn — after synthesising insights from multiple pages, wiki_compound persists the knowledge back
- Fail — when a fix doesn't work, wiki_record_failure captures the why
- Avoid — future agents see those failures and skip the dead ends

Every session makes the wiki smarter. Failed attempts become documented knowledge. Synthesised insights get compounded back. The wiki grows from agent interactions, not just from code changes.

The wiki doesn't go stale, either. Run indxr serve --watch --wiki-auto-update and when source files change, indxr uses its structural diff engine to identify exactly which wiki pages are affected — then surgically updates only those pages.

Check out the project here: https://github.com/bahdotsh/indxr

Would love to hear your feedback!

submitted by /u/New-Blacksmith8524
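The record/search half of that loop is easy to picture as a tiny table keyed by module. This sketch borrows the post's four fields and tool names but is not indxr's actual implementation; the "auth" example data is made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE failures (
    module TEXT, symptom TEXT, attempted_fix TEXT, diagnosis TEXT, actual_fix TEXT
)""")

def wiki_record_failure(module, symptom, attempted_fix, diagnosis, actual_fix=None):
    db.execute("INSERT INTO failures VALUES (?,?,?,?,?)",
               (module, symptom, attempted_fix, diagnosis, actual_fix))

def wiki_search(module):
    rows = db.execute(
        "SELECT symptom, attempted_fix, diagnosis FROM failures WHERE module = ?",
        (module,))
    return [dict(zip(("symptom", "attempted_fix", "diagnosis"), r)) for r in rows]

# A failed attempt becomes documented knowledge...
wiki_record_failure(
    module="auth",
    symptom="login flakes under load",
    attempted_fix="retry the token refresh",
    diagnosis="retries mask a race; two refreshes share one session row",
    actual_fix="serialize refresh behind a per-session lock",
)
# ...and a later agent checks the wiki before touching auth code.
print(wiki_search("auth"))
```

The payoff is in the read path: the next agent sees the dead end before spending tokens rediscovering it.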
I got tired of 3 AM PagerDuty alerts, so I built an AI agent to fix cloud outages while I sleep. (Built with GLM-5.1)
If you've ever been on-call, you know the nightmare. It’s 3:15 AM. You get pinged because heavily-loaded database nodes in us-east-1 are randomly dropping packets. You groggily open your laptop, ssh into servers, stare at Grafana charts, and manually reroute traffic to the European fallback cluster. By the time you fix it, you've lost an hour of sleep, and the company has lost a solid chunk of change in downtime. This weekend for the Z.ai hackathon, I wanted to see if I could automate this specific pain away. Not just "anomaly detection" that sends an alert, but an actual agent that analyzes the failure, proposes a structural fix, and executes it. I ended up building Vyuha AI-a triple-cloud (AWS, Azure, GCP) autonomous recovery orchestrator. Here is how the architecture actually works under the hood. The Stack I built this using Python (FastAPI) for the control plane, Next.js for the dashboard, a custom dynamic reverse proxy, and GLM-5.1 doing the heavy lifting for the reasoning engine. The Problem with 99% of "AI DevOps" Tools Most AI monitoring tools just ingest logs and summarize them into a Slack message. That’s useless when your infrastructure is actively burning. I needed an agent with long-horizon reasoning. It needed to understand the difference between a total node crash (DEAD) and a node that is just acting weird (FLAKY or dropping 25% of packets). How Vyuha Works (The Triaging Loop) I set up three mock cloud environments (AWS, Azure, GCP) behind a dynamic FastApi proxy. A background monitor loop probes them every 5 seconds. I built a "Chaos Lab" into the dashboard so I could inject failures on demand. Here’s what happens when I hard-kill the GCP node: Detection: The monitor catches the 503 Service Unavailable or timeout in the polling cycle. Context Gathering: It doesn't instantly act. It gathers the current "formation" of the proxy, checks response times of the surviving nodes, and bundles that context. 
Reasoning (GLM-5.1): This is where I relied heavily on GLM-5.1. Using ZhipuAI's API, the agent is prompted to act as a senior SRE. It parses the failure, assesses the severity, and figures out how to rebalance traffic without overloading the remaining nodes.

The Proposal: It generates a strict JSON payload with the reasoning, the severity, and the literal API command required to reroute the proxy.

No Rogue AI (Human-in-the-Loop)

I don't trust LLMs enough to blindly let them modify production networking tables, obviously. So the agent operates on a strict human-in-the-loop philosophy: the GLM-5.1 model proposes the fix, explains why it chose it, and surfaces it to the dashboard. The human clicks "Approve," and the orchestrator applies the new proxy formation.

Evolutionary Memory (The Coolest Feature)

This was my favorite part of the build. Every time an incident happens, the system learns. If the human approves the GLM's failover proposal, the agent runs a separate "reflection phase": it analyzes what broke and what fixed it, and writes an entry into a local SQLite database that acts as an evolutionary memory log. The next time a failure happens, the orchestrator pulls relevant past incidents from SQLite and feeds them into the GLM-5.1 prompt. The AI literally reads its own history before diagnosing new problems, so it doesn't make the same mistake twice.

The Struggles

It wasn't smooth. I lost about 4 hours to a completely silent Pydantic validation bug: my frontend chaos buttons were passing the string "dead" while my backend enums strictly expected "DEAD". The agent just sat there doing nothing. LLMs are smart, but type-safety mismatches across the stack will still humble you.

Try it out

I built this to prove that the future of SRE isn't just better dashboards; it's autonomous, agentic infrastructure. I'm hosting it live on Render/Vercel. Try hitting the "Hard Kill" button on GCP and watch the AI react in real time.
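Putting the two ideas together, the strict proposal payload and the case-mismatch bug from the struggles section can be sketched in a few lines. This is a minimal stdlib sketch, not the actual Vyuha code; the `NodeState` members and `FailoverProposal` fields are assumptions based on the post:

```python
from dataclasses import dataclass
from enum import Enum

class NodeState(str, Enum):
    HEALTHY = "HEALTHY"
    FLAKY = "FLAKY"   # e.g. dropping 25% of packets
    DEAD = "DEAD"     # hard-killed or unreachable

    @classmethod
    def parse(cls, raw: str) -> "NodeState":
        # Normalize case so a frontend sending "dead" still matches the
        # backend's strict "DEAD" member -- the silent mismatch described
        # in the struggles section above.
        return cls(raw.strip().upper())

@dataclass
class FailoverProposal:
    """Shape of the strict JSON payload the LLM is asked to emit."""
    reasoning: str    # why the model chose this rebalancing
    severity: str     # e.g. "critical"
    node_state: NodeState
    command: str      # literal proxy API call, applied only after human approval
```

Routing every state string from the model or the frontend through `NodeState.parse` before touching the proxy turns the silent no-op into a loud `ValueError` at the boundary.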
Would love brutal feedback from any actual SREs or DevOps engineers here. What edge case would break this in a real datacenter? submitted by /u/Evil_god7
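The evolutionary-memory loop the post above describes, reflect after an approved fix, recall before the next diagnosis, could look roughly like this. A hedged sketch using the stdlib `sqlite3` module; the table layout and function names are my own assumptions, not the project's actual schema:

```python
import sqlite3

def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local incident-memory database."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS incidents (
        id INTEGER PRIMARY KEY,
        provider TEXT,    -- "aws" | "azure" | "gcp"
        failure TEXT,     -- e.g. "hard_kill", "packet_loss"
        fix TEXT,         -- the approved proxy formation change
        reflection TEXT   -- post-incident analysis written by the model
    )""")
    return db

def record_incident(db, provider, failure, fix, reflection):
    # Reflection phase: runs only after the human approves the proposal.
    db.execute(
        "INSERT INTO incidents (provider, failure, fix, reflection) "
        "VALUES (?, ?, ?, ?)",
        (provider, failure, fix, reflection))
    db.commit()

def recall(db, provider, failure, limit=3):
    # Retrieval phase: similar past incidents are prepended to the LLM prompt
    # so the agent reads its own history before diagnosing.
    rows = db.execute(
        "SELECT fix, reflection FROM incidents "
        "WHERE provider = ? AND failure = ? ORDER BY id DESC LIMIT ?",
        (provider, failure, limit)).fetchall()
    return "\n".join(f"Past fix: {fix} -- {why}" for fix, why in rows)
```

A production version would want fuzzier matching than exact `(provider, failure)` equality, but even this naive keying gives the model precedent to condition on.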
Building an AI agent that finds repos and content relevant to my work
I kept missing interesting stuff on HuggingFace, arXiv, Substack, etc., so I made an agent that sends a weekly summary of only what's relevant, for free. Any thoughts on the idea? submitted by /u/d_arthez
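The core of an agent like this is a relevance filter over fetched entries. A minimal keyword-scoring sketch; the scoring scheme and entry shape are my assumptions, not the author's implementation:

```python
def relevance_score(title: str, summary: str, interests: list[str]) -> int:
    # Keyword hits in the title count double; hits in the abstract count once.
    t, s = title.lower(), summary.lower()
    return sum(2 * (kw in t) + (kw in s) for kw in (k.lower() for k in interests))

def weekly_digest(entries: list[dict], interests: list[str],
                  top_n: int = 5) -> list[dict]:
    # Keep only entries matching at least one interest, best matches first.
    scored = [(relevance_score(e["title"], e["summary"], interests), e)
              for e in entries]
    return [e for score, e in sorted(scored, key=lambda x: -x[0])
            if score > 0][:top_n]
```

Swapping the keyword count for embedding similarity is the obvious next step, but a transparent scorer like this makes it easy to debug why something did or didn't make the weekly email.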
Am I going the right way with my CS PhD?
I work at Microsoft CoreAI as an engineer, and I have offers from three equally competitive PhD programs starting Fall 2026. The Claude Code source leak last week crystallized something I'd been going back and forth on, and I would love a gut check from people who think about this carefully.

The three directions:

1. Data uncertainty and ML pipelines. Work at the intersection of data systems and ML: provenance, uncertain data, how dirty or incomplete training data propagates through and corrupts model behavior. The clearest recent statement of this direction is the NeurIPS 2024 paper "Learning from Uncertain Data: From Possible Worlds to Possible Models." Adjacent threads: quantifying uncertainty arising from dirty data, adversarially stress-testing ML pipelines, query repair for aggregate constraints.

2. Fairness and uncertainty in LLMs and model behavior. Uncertainty estimation in LLMs, OOD detection, fairness, domain generalization. A very active research area right now with high citation velocity; extremely timely.

3. Neuromorphic computing / SNNs. Brain-inspired hardware, time-domain computing, memristor-based architectures. The professor who gave me an offer has, among other top-conference publications, a Nature paper.

After reading a post on the artificial subreddit about the leak, here is my take on some of the notable inner workings of the Claude system:

Skeptical memory: the agent verifies observations against the actual codebase rather than trusting its own memory. There's no formal framework yet for when and why that verification fails, or what the right principles are for trusting derived beliefs versus ground truth.

Context compaction: five different strategies in the codebase, described internally as still an open problem. What you keep versus drop when a context window fills, and how those decisions affect downstream agent behavior, is a data quality problem with no good theoretical treatment.
Memory consolidation under contradiction: the background consolidation system semantically merges conflicting observations. What are the right principles for resolving contradictions in an agent's belief state over time?

Multi-agent uncertainty propagation: sub-agents operate on partial, isolated contexts. How does uncertainty from a worker agent propagate to a coordinator's decision? Nobody is formally studying this.

It seems like the harness itself barely matters: Claude Code ranks 39th on Terminal-Bench and adds essentially nothing over the raw model's performance. So raw orchestration engineering isn't the research gap. The gap is theoretical: when should an agent trust its memory, how do you bound uncertainty through a multi-step pipeline, and what's the right data model for an agent's belief state?

My read: Direction 1 is directly upstream of these problems, building theoretical tools that could explain why "don't trust memory, verify against source" is the right design principle and under what conditions it breaks. Direction 2 is more downstream (uncertainty in model outputs), which is relevant but more crowded and further from the specific bottlenecks the leak exposed. On the other hand, Direction 2 has much higher current citation velocity, LLM uncertainty is extremely hot, and career visibility on the job market matters. Direction 3 is too novel to predict much about: hardware is already a bottleneck for AI systems, but I'm not sure how much neuromorphic directions will help the evolution of AI-centric memory or hardware.

My goal is a research scientist role at a top lab. Is the data-layer / pipeline-level uncertainty framing actually differentiated enough, or is it too niche relative to where labs are actively hiring? submitted by /u/ifriedthisrice
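The multi-agent uncertainty question above has a simple toy illustration: if each hop in a pipeline is treated as independent, confidence decays multiplicatively, which is one naive argument for why "verify against source" beats trusting derived beliefs. This is my own toy model, not anything from the leak:

```python
def coordinator_confidence(worker_confidences: list[float]) -> float:
    # Naive independence assumption: the coordinator's trust in a belief
    # derived through a chain of sub-agents is the product of each hop's
    # confidence, so even reliable workers compound into doubt.
    c = 1.0
    for w in worker_confidences:
        c *= w
    return c
```

Three 90%-reliable hops already leave the coordinator below 73% confident, which is precisely when re-checking the codebase is cheaper than acting on a stale belief.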
Key features include: Delegate to SuperGTM, Build AI workforces, Enterprise-grade infrastructure, SOC 2 Type II, GDPR, SSO, RBAC, Data residency, Version control, Monitoring dashboards, Evals.
Based on user reviews and social mentions, the most common pain point is API costs.
Based on 32 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.