Zep connects your data sources, builds a unified context graph of your users, and delivers assembled context to your agent. One pipeline. One API.
Based on the limited social mentions provided, there isn't enough substantive user feedback to properly assess Zep's reception. The mentions include several YouTube references to "Zep AI" without detailed content, and GitHub activity showing technical development work involving agent modules and commit migrations. Reddit discussions touch on AI memory management and context portability challenges that may relate to Zep's functionality, but don't explicitly evaluate the tool itself. To provide an accurate user sentiment summary, more detailed reviews and user experiences would be needed.
Mentions (30d)
5
Reviews
0
Platforms
3
GitHub Stars
4,316
597 forks
Features
Industry
information technology & services
Employees
5
Funding Stage
Seed
Total Funding
$2.4M
417
GitHub followers
11
GitHub repos
4,316
GitHub stars
13
npm packages
Pricing found: $25/month, $25 / 20, $475/month, $125 / 100
I got tired of re-explaining myself to Claude every session, so I built something
I got tired of re-explaining myself to every AI tool, so I built one that makes my context portable

Hello everyone out there using AI every day… I build cardiac implants at Boston Scientific during the day and I'm a 1st-year CS student. I use Claude, ChatGPT, Cursor, and Gemini daily to improve my skills and my productivity. But every tool starts from zero. Claude doesn't know what I told Cursor. ChatGPT forgets my preferences. Gemini has no idea about my stack. I was spending the first 5 minutes of every session re-explaining who I am. Over and over.

So I built aura-ctx: a free, open-source CLI that defines your AI identity once and serves it to all your tools via MCP. One source of truth. Everything stays local. No cloud. No lock-in.

This is not another memory layer. Mem0, Zep, and Letta solve agent memory for developers. aura-ctx solves something different: the end user who wants to own and control their identity across tools. No Docker. No Postgres. No Redis. No auth tokens to manage. Just:

```
pip install -U aura-ctx
aura quickstart
```

Why local-first matters here: your MCP server runs on localhost. No network latency. No auth hell. No token refresh. If you've dropped cloud-based MCP servers because of the overhead, this is the opposite architecture.

Portability is by design: your entire identity lives in ~/.aura/packs/. Move machines? Copy the folder. That's it.

Security built in: aura audit scans your packs for accidentally stored secrets (API keys, tokens, credentials) before they leak into your context.

v0.3.3 is out with 3,500+ downloads. Supports 8 AI tools including Claude Desktop, Cursor, Windsurf, Gemini CLI, Claude Code, and more. Exports to CLAUDE.md and AGENTS.md for agent frameworks.

Still early. I'd like any feedback on what works, what doesn't, and what's missing. Curious: do you re-explain yourself every time you open Claude, or have you found a better way?
GitHub: https://github.com/WozGeek/aura-ctx

submitted by /u/Miserable_Celery9917
I built a persistent memory MCP for Claude Code — here's what I learned about why LLM-based extraction is the wrong approach
I've been using Claude Code daily for months and wanted it to remember things across sessions — project context, my preferences, decisions we've made together. I tried Mem0 and Zep but hit the same frustration with both: they intercept conversations and run them through a separate LLM to decide what's worth remembering. That felt wrong. Claude already understands the conversation. Why pay for a second LLM to re-interpret what just happened?

So I built Deep Recall — an MCP server that takes a different approach. Claude decides what to store. The memory system handles what happens to those memories over time.

**What I learned building this:**

The biggest insight was that extraction quality is actually BETTER when the agent does it itself. Claude has full context — it knows what's new information vs what it already knows, what contradicts existing memories, what's important to this specific user. A separate extraction LLM has none of that context.

The second insight was that memories need biology, not just storage. I implemented:

- **Salience decay** based on ACT-R cognitive architecture — unused memories fade, frequently accessed ones resist decay
- **Hebbian reinforcement** — when Claude cites a memory in its response, that memory gets stronger
- **Contradiction detection** — if you store "works at Google" then later "works at Meta", it flags the conflict
- **Temporal supersession** — detects that's a career change, not a contradiction, and auto-resolves it
- **Memory consolidation** — clusters of related episodes compress into durable facts over time

**How it works with Claude Code:**

```bash
pip install deeprecall-mcp
```

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "deeprecall": {
      "command": "deeprecall-mcp",
      "env": { "DEEPRECALL_API_KEY": "your_key" }
    }
  }
}
```

Claude gets tools like `deeprecall_context` (pull memories before responding), `deeprecall_remember` (store a fact), and `deeprecall_learn` (post-conversation biology processing).
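The post doesn't publish Deep Recall's internals, but the salience-decay and Hebbian-reinforcement ideas it describes can be sketched in a few lines. This is my own illustration under assumed decay constants, not the product's implementation:

```python
import math
import time

# Illustrative sketch of ACT-R-style salience decay with Hebbian reinforcement:
# salience decays exponentially with idle time, and each citation both resets
# the idle clock and makes the memory resist future decay.

class Memory:
    def __init__(self, text, base_salience=1.0, decay_rate=0.05):
        self.text = text
        self.base_salience = base_salience
        self.decay_rate = decay_rate      # per-day decay constant (assumed value)
        self.last_access = time.time()
        self.citations = 0                # times the agent cited this memory

    def salience(self, now=None):
        now = now if now is not None else time.time()
        days_idle = (now - self.last_access) / 86400
        # Frequently cited memories resist decay: citations shrink the
        # effective decay rate.
        effective_rate = self.decay_rate / (1 + self.citations)
        return self.base_salience * math.exp(-effective_rate * days_idle)

    def reinforce(self):
        # Called when the agent cites this memory in a response.
        self.citations += 1
        self.last_access = time.time()

m = Memory("user prefers TypeScript")
fresh = m.salience()
stale = m.salience(now=time.time() + 30 * 86400)  # salience after 30 idle days
assert stale < fresh  # unused memories fade
```

A pruning pass would then drop memories whose salience falls below a threshold, which is what makes decay useful rather than cosmetic.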
**The whole thing was built with Claude Code** — Thomas (my Claude instance) and I pair-programmed the entire backend, MCP server, landing page, billing, and the biological memory algorithms. The irony of using Claude to build a memory system for Claude isn't lost on me.

Free to try — 10,000 memories, no credit card, all features: https://deeprecall.dev

Happy to answer questions about the architecture or the cognitive science behind the decay/reinforcement models.

submitted by /u/floppytacoextrasoggy
[D] MemPalace claims 100% on LoCoMo and a "perfect score on LongMemEval." Its own BENCHMARKS.md documents why neither is meaningful.
A new open-source memory project called MemPalace launched yesterday claiming "100% on LoCoMo" and "the first perfect score ever recorded on LongMemEval. 500/500 questions, every category at 100%." The launch tweet went viral, reaching over 1.5 million views, while the repository picked up over 7,000 GitHub stars in less than 24 hours.

The interesting thing is not that the headline numbers are inflated. The interesting thing is that the project's own BENCHMARKS.md file documents this in detail, while the launch tweet strips these caveats. Some of the failure modes line up with the methodology disputes the field has been arguing about for over a year (Zep vs Mem0, Letta's "Filesystem All You Need" reproducibility post, etc.).

1. The LoCoMo 100% is a top_k bypass.

The runner uses top_k=50. LoCoMo's ten conversations have 19, 19, 32, 29, 29, 28, 31, 30, 25, and 30 sessions respectively. Every conversation has fewer than 50 sessions, so top_k=50 retrieves the entire conversation as the candidate pool every time. The Sonnet rerank then does reading comprehension over all sessions. BENCHMARKS.md says this verbatim:

> The LoCoMo 100% result with top-k=50 has a structural issue: each of the 10 conversations has 19–32 sessions, but top-k=50 exceeds that count. This means the ground-truth session is always in the candidate pool regardless of the embedding model's ranking. The Sonnet rerank is essentially doing reading comprehension over all sessions - the embedding retrieval step is bypassed entirely.

The honest LoCoMo numbers in the same file are 60.3% R@10 with no rerank and 88.9% R@10 with hybrid scoring and no LLM. Those are real and unremarkable. A 100% is also independently impossible on the published version of LoCoMo, since roughly 6.4% of the answer key contains hallucinated facts, wrong dates, and speaker attribution errors that any honest system will disagree with.

2. The LongMemEval "perfect score" is a metric category error.
Published LongMemEval is end-to-end QA: retrieve from a haystack of prior chat sessions, generate an answer, and a GPT-4 judge marks it correct. Every score on the published leaderboard is the percentage of generated answers judged correct.

The MemPalace LongMemEval runner does retrieval only. For each of the 500 questions it builds one document per session by concatenating only the user turns (assistant turns are not indexed at all), embeds with default ChromaDB embeddings (all-MiniLM-L6-v2), returns the top five sessions by cosine distance, and checks set membership against the gold session IDs. It computes both recall_any@5 and recall_all@5, and the project reports the softer one. It never generates an answer. It never invokes a judge.

None of the LongMemEval numbers in this repository - not the 100%, not the 98.4% "held-out", not the 96.6% raw baseline - are LongMemEval scores in the sense the published leaderboard means. They are recall_any@5 retrieval numbers on the same dataset, which is a substantially easier task. Calling any of them a "perfect score on LongMemEval" is a metric category error.

3. The 100% itself is teaching to the test.

The hybrid v4 mode that produces the 100% was built by inspecting the three remaining wrong answers in their dev set and writing targeted code for each one: a quoted-phrase boost for a question containing a specific phrase in single quotes, a person-name boost for a question about someone named Rachel, and "I still remember" / "when I was in high school" patterns for a question about a high school reunion. Three patches for three specific questions. BENCHMARKS.md, line 461, verbatim:

> This is teaching to the test. The fixes were designed around the exact failure cases, not discovered by analyzing general failure patterns.

4. Marketed features that don't exist in the code.

The launch post lists "contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them" as a feature.
mempalace/knowledge_graph.py contains zero occurrences of "contradict". The only deduplication logic is an exact-match check on (subject, predicate, object) triples that blocks identical triples from being added twice. Conflicting facts about the same subject can accumulate indefinitely.

5. "30x lossless compression" is measurably lossy in the project's own benchmarks.

The compression module mempalace/dialect.py truncates sentences at 55 characters, filters by keyword frequency, and provides a decode() function that splits the compressed string into a header dictionary without reconstructing the original text. There is no round-trip. The same BENCHMARKS.md reports results_raw_full500.jsonl at 96.6% R@5 and results_aaak_full500.jsonl at 84.2% R@5 — a 12.4 percentage point drop on the same dataset and the same metric, run by the project itself. Lossless compression cannot cause a measured quality drop.

Why this matters for the benchmark conversation. The field needs benchmarks where judge reliability is adversarially validated, an
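The top_k bypass in point 1 is easy to demonstrate: when top_k exceeds the number of candidate sessions, even a random ranking achieves perfect retrieval recall. A toy check (illustrative only, using the session counts quoted in the post, not a reproduction of the MemPalace runner):

```python
import random

# If top_k >= number of sessions, the top-k slice contains every session,
# so the gold session is retrieved regardless of ranking quality.

session_counts = [19, 19, 32, 29, 29, 28, 31, 30, 25, 30]  # LoCoMo conversations
TOP_K = 50

def recall_with_random_ranking(n_sessions, top_k, trials=1000):
    """Retrieval recall when the 'embedding model' ranks sessions at random."""
    hits = 0
    for _ in range(trials):
        gold = random.randrange(n_sessions)
        ranking = random.sample(range(n_sessions), n_sessions)  # random order
        if gold in ranking[:top_k]:
            hits += 1
    return hits / trials

# Every conversation has fewer sessions than TOP_K, so even a random ranker
# achieves 100% retrieval recall: the candidate pool is the whole conversation.
assert all(n < TOP_K for n in session_counts)
assert all(recall_with_random_ranking(n, TOP_K) == 1.0 for n in session_counts)
```

The embedding step only becomes a real filter once top_k is smaller than the session count, which is why the same file's honest numbers at R@10 are so much lower.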
self-hosted monitoring for Claude Code & Codex
About a month after our team started using Claude Code, someone asked in Slack how much we were spending. Nobody knew. We looked around for a monitoring tool, didn't find one we liked, and ended up building our own.

Zeude is a self-hosted dashboard that tracks Claude Code and OpenAI Codex usage in one place. You get per-prompt token and cost breakdowns, a weekly leaderboard (with cohort grouping if your org is big enough to care), and a way to push skills, MCP servers, and hooks to your whole team from the dashboard instead of chasing people on Slack.

The big things in v1.0.0:

- Windows support. It was macOS/Linux only before. Now the whole team can use it regardless of OS.
- Codex integration. A lot of teams use both Claude Code and Codex, and tracking only one of them gives you half the picture on costs. Now both go through the same dashboard.
- Per-user skill opt-out. Team skill sync was already there, but it was all-or-nothing. Now individuals can turn off skills they don't want. Turns out not everyone wants every skill pushed to their machine.

Stack is Next.js + Supabase + ClickHouse + OTel Collector. All your data stays on your infra. We ran it internally for ~6 months before cleaning it up for open source. It's not perfect, but it solved a real problem for us, and we figured others might be in the same spot.

https://github.com/zep-us/zeude

If you try it out, let me know what breaks.

submitted by /u/Lopsided_Yak9897
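The per-prompt cost breakdown and leaderboard described above amount to a simple rollup over usage events. A minimal sketch of that computation, with placeholder model names and prices (these are not actual Anthropic/OpenAI rates, and this is not Zeude's code):

```python
# Hypothetical (input, output) USD prices per million tokens.
PRICE_PER_MTOK = {
    "claude-code": (3.00, 15.00),
    "codex": (2.00, 8.00),
}

def prompt_cost(model, input_tokens, output_tokens):
    """Cost of a single prompt from its token counts."""
    in_rate, out_rate = PRICE_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One event per prompt: (user, model, input tokens, output tokens).
events = [
    ("alice", "claude-code", 12_000, 1_500),
    ("bob", "codex", 8_000, 900),
    ("alice", "codex", 4_000, 400),
]

# Roll up spend per user across both tools.
totals = {}
for user, model, tin, tout in events:
    totals[user] = totals.get(user, 0.0) + prompt_cost(model, tin, tout)

# Weekly leaderboard: highest spend first.
leaderboard = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

In a real deployment this aggregation would run as a query over the event store (ClickHouse in Zeude's stack) rather than in application code.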
Agentic AI persistent memory with auto pruning based on time decay and Importance
Developing a persistent memory layer on top of your agentic AI framework is a trending area these days, but there is no complete solution. One of the major challenges in developing a layer like this is how to prune your data over time. To tackle this problem, I did some research and found a cool formula that somewhat mimics human memory's Ebbinghaus forgetting curve. I worked around this concept and established a formula:

Strength = importance × e^(−λ_eff × days) × (1 + recall_count × 0.2)

If I break it down:

- Importance: a variable defined at store time. As each memory can have different importance, I decided to use this attribute. I gave facts higher importance and assumptions lower importance, etc.
- e^(−λ_eff × days): taken from the original formula, it derives the decay rate, and λ_eff varies based on some categories that I have defined.
- (1 + recall_count × 0.2): this part strengthens the memory if it is recalled again.

The retrieval is straightforward and uses cosine similarity. I also benchmarked it against existing systems like Mem0 and Zep and was able to outperform them. The benchmark was done using the LoCoMo dataset and the metric was Recall@5. The result is shared in the repo itself, so you can check that out.

I would encourage you to check this approach out and let me know whether it can be utilized in a persistent memory layer!

https://github.com/sachitrafa/cognitive-ai-memory

Installation: pip install yourmemory

submitted by /u/Sufficient_Sir_5414
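The strength formula above translates directly into code. The per-category λ values below are my own placeholder choices (the post says λ_eff varies by category but does not list the values), and the prune threshold is likewise an assumption:

```python
import math

# Strength = importance * e^(-lambda_eff * days) * (1 + recall_count * 0.2)
# Placeholder decay constants; the post defines categories but not the values.
LAMBDA_BY_CATEGORY = {"fact": 0.01, "preference": 0.03, "assumption": 0.10}

def strength(importance, category, days_since_stored, recall_count):
    lam = LAMBDA_BY_CATEGORY[category]
    return importance * math.exp(-lam * days_since_stored) * (1 + 0.2 * recall_count)

def prune(memories, threshold=0.3):
    """Drop memories whose strength has decayed below the threshold.
    Each memory is an (importance, category, days, recall_count) tuple."""
    return [m for m in memories if strength(*m) >= threshold]

# A recalled assumption outlives a never-recalled one with identical age.
recalled = strength(0.5, "assumption", 10, recall_count=3)
forgotten = strength(0.5, "assumption", 10, recall_count=0)
assert recalled > forgotten
```

Note how the recall term only multiplies strength; it never resets the decay clock, so heavily recalled but very old memories will still eventually be pruned.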
[D] We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers
Projects are still submitting new scores on LoCoMo as of March 2026. We audited it and found that 6.4% of the answer key is wrong, and the LLM judge accepts up to 63% of intentionally wrong answers. LongMemEval-S is often raised as an alternative, but each question's corpus fits entirely in modern context windows, making it more of a context window test than a memory test. Here's what we found.

LoCoMo

LoCoMo (Maharana et al., ACL 2024) is one of the most widely cited long-term memory benchmarks. We conducted a systematic audit of the ground truth and identified 99 score-corrupting errors in 1,540 questions (6.4%). Error categories include hallucinated facts in the answer key, incorrect temporal reasoning, and speaker attribution errors. Examples:

- The answer key specifies "Ferrari 488 GTB," but the source conversation contains only "this beauty" and the image caption reads "a red sports car." The car model exists only in an internal query field (annotator search strings for stock photos) that no memory system ingests. Systems are evaluated against facts they have no access to.
- "Last Saturday" on a Thursday should resolve to the preceding Saturday. The answer key says Sunday. A system that performs the date arithmetic correctly is penalized.
- 24 questions attribute statements to the wrong speaker. A system with accurate speaker tracking will contradict the answer key.

The theoretical maximum score for a perfect system is approximately 93.6%.

We also tested the LLM judge. LoCoMo uses gpt-4o-mini to score answers against the golden reference. We generated intentionally wrong but topically adjacent answers for all 1,540 questions and scored them using the same judge configuration and prompts used in published evaluations. The judge accepted 62.81% of them. Specific factual errors (wrong name, wrong date) were caught approximately 89% of the time. However, vague answers that identified the correct topic while missing every specific detail passed nearly two-thirds of the time.
This is precisely the failure mode of weak retrieval, locating the right conversation but extracting nothing specific, and the benchmark rewards it.

There is also no standardized evaluation pipeline. Each system uses its own ingestion method (arguably necessary given architectural differences), its own answer generation prompt, and sometimes entirely different models. Scores are then compared in tables as if they share a common methodology. Multiple independent researchers have documented inability to reproduce published results (EverMemOS #73, Mem0 #3944, Zep scoring discrepancy).

Full audit with all 99 errors documented, methodology, and reproducible scripts: locomo-audit

LongMemEval

LongMemEval-S (Wang et al., 2024) is the other frequently cited benchmark. The issue is different but equally fundamental: it does not effectively isolate memory capability from context window capacity. LongMemEval-S uses approximately 115K tokens of context per question. Current models support 200K to 1M token context windows. The entire test corpus fits in a single context window for most current models.

Mastra's research illustrates this: their full-context baseline scored 60.20% with gpt-4o (128K context window, near the 115K threshold). Their observational memory system scored 84.23% with the same model, largely by compressing context to fit more comfortably. The benchmark is measuring context window management efficiency rather than long-term memory retrieval. As context windows continue to grow, the full-context baseline will keep climbing and the benchmark will lose its ability to discriminate.

LongMemEval-S tests whether a model can locate information within 115K tokens. That is a useful capability to measure, but it is a context window test, not a memory test.

LoCoMo-Plus

LoCoMo-Plus (Li et al., 2025) introduces a genuinely interesting new category: "cognitive" questions testing implicit inference rather than factual recall.
These use cue-trigger pairs with deliberate semantic disconnect: the system must connect "I just adopted a rescue dog" (cue) to "what kind of pet food should I buy?" (trigger) across sessions without lexical overlap. The concept is sound and addresses a real gap in existing evaluation.

The issues:

- It inherits all 1,540 original LoCoMo questions unchanged, including the 99 score-corrupting errors documented above.
- The improved judging methodology (task-specific prompts, three-tier scoring, 0.80+ human-LLM agreement) was only validated on the new cognitive questions. The original five categories retain the same broken ground truth with no revalidation.
- The judge model defaults to gpt-4o-mini.
- Same lack of pipeline standardization.

The new cognitive category is a meaningful contribution. The inherited evaluation infrastructure retains the problems described above.

Requirements for meaningful long-term memory evaluation

Based on this analysis, we see several requirements for benchmarks that can meaningfully
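The "last Saturday" temporal-reasoning error flagged in the LoCoMo audit above is straightforward to check with stdlib date arithmetic. This is my own verification sketch (the reference Thursday is an arbitrary example date, not one from the dataset):

```python
from datetime import date, timedelta

def last_weekday(ref, target_weekday):
    """Most recent occurrence of target_weekday strictly before ref.
    Weekdays follow the datetime convention: Monday=0 ... Sunday=6."""
    delta = (ref.weekday() - target_weekday) % 7
    delta = 7 if delta == 0 else delta  # "last Saturday" on a Saturday = a week ago
    return ref - timedelta(days=delta)

thursday = date(2024, 5, 16)           # an arbitrary Thursday
SATURDAY = 5
resolved = last_weekday(thursday, SATURDAY)
assert resolved == date(2024, 5, 11)   # the preceding Saturday, not Sunday
assert resolved.weekday() == SATURDAY
```

A system that performs this computation correctly will, per the audit, be marked wrong by the answer key, which says Sunday.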
I built an open-source CLI that makes your AI context portable across Claude, ChatGPT, Cursor, and Gemini via MCP
The problem

I use Claude for analysis, ChatGPT for writing, Cursor for coding. Each one builds a different picture of who I am — my stack, my style, my preferences. None of them share it. When I switch tools, I start from zero.

Platform memories are black boxes. You can't version them, audit them, or export them. And that's by design — it's lock-in.

What I built

aura is an open-source CLI that scans your machine, builds your AI identity automatically, and serves it to every tool via MCP.

```
pip install aura-ctx
aura scan    # auto-detects your stack, tools, projects
aura serve   # starts MCP server on localhost:3847
```

That's it. Open Claude Desktop, ChatGPT (Developer Mode), Cursor, or Gemini CLI. They read your context automatically. No copy-paste. No re-explaining.

How it works

aura creates "context packs" — scoped YAML files that describe who you are in a specific domain (developer, writer, work). You control what's in them. The AI never writes to your packs without your explicit action.

- aura scan detects your languages, frameworks, tools, editor, projects, and git identity from your machine.
- aura onboard asks 5 questions to capture your style and rules.
- aura doctor checks your packs for bloat and stale facts.
- aura consolidate merges duplicates across packs.
- aura decay removes expired facts based on type-aware TTL.

The MCP server exposes your packs as resources and tools that any MCP-compatible client can query.

Security

- Binds to localhost only
- Optional token auth: aura serve --token
- Scoped serving: aura serve --packs developer
- Read-only mode: aura serve --read-only
- No cloud. No telemetry. YAML files on your machine.

What it's NOT

This is not another memory layer for agent developers (Mem0, Zep, Letta solve that). aura is for the end user who wants to own and control their AI identity across tools. No Docker. No Postgres. No Redis. Just pip install and go.
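The type-aware TTL behind `aura decay` is worth making concrete. A minimal sketch of the idea, with TTL values and fact-type names that are my own illustrative guesses, not aura-ctx's actual configuration:

```python
from datetime import datetime, timedelta

# Hypothetical type-aware TTLs: stable facts live long, volatile ones expire fast.
TTL_BY_FACT_TYPE = {
    "git_identity": timedelta(days=365),   # rarely changes
    "editor": timedelta(days=180),
    "project": timedelta(days=90),         # projects come and go
    "current_task": timedelta(days=7),     # stale almost immediately
}

def decay(facts, now):
    """Keep only facts whose type-specific TTL has not elapsed."""
    kept = []
    for fact in facts:
        ttl = TTL_BY_FACT_TYPE.get(fact["type"], timedelta(days=30))  # default TTL
        if now - fact["updated_at"] <= ttl:
            kept.append(fact)
    return kept

now = datetime(2026, 3, 1)
facts = [
    {"type": "git_identity", "value": "jane@example.com",
     "updated_at": datetime(2025, 6, 1)},   # ~9 months old, within 365d: kept
    {"type": "current_task", "value": "refactor auth",
     "updated_at": datetime(2026, 2, 1)},   # 28 days old, past 7d TTL: dropped
]
assert [f["type"] for f in decay(facts, now)] == ["git_identity"]
```

The design point is that a single global TTL would either discard stable identity facts too early or keep stale task context too long; per-type TTLs avoid both.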
GitHub: https://github.com/WozGeek/aura-ctx
PyPI: https://pypi.org/project/aura-ctx/

Happy to answer any questions.

submitted by /u/Miserable_Celery9917
Manual-Driven Development: 190 Findings, 7 Hours, Zero Rule Violations
Every Claude Code session you have ever had started with Claude not knowing your system. It read a few files, inferred patterns, and started coding based on assumptions. At small scale that works fine. At production scale it produces confident, wrong code, and you do not find out until something breaks in a way that tests cannot catch, because Claude wrote the tests against its own assumptions too.

I call this confident divergence. It is the problem nobody in the AI tooling space is naming correctly. And it is the one that kills production codebases. Manual-Driven Development fixes it.

Here is what that looks like in production numbers: Seven sections audited. 190 findings. 876 new tests written. 7 hours and 48 minutes of actual Claude Code session time against an estimated 234 to 361 hours of human developer time. That is a 30 to 46x compression ratio, reproduced independently across every section of a production codebase with 200+ routes, 80+ models, and a daemon enforcement pipeline that converts network policies into live nftables rules on the host. And across all seven sections, not a single CLAUDE.md rule violated. Not one.

That last number is the one that should stop you. Everyone who has used Claude Code for more than a week has written CLAUDE.md rules and watched Claude ignore them three tasks later. The model does not do this deliberately. It runs out of context budget to honor them. MDD fixes the budget problem, and the rules hold. RuleCatch, which monitors rule enforcement in real time, reported 60% fewer rule violations during the SwarmK build compared to sessions running without MDD. Same model, same rules, same codebase. The only variable was MDD.

I am not going to ask you to take that on faith. The prompts that produced these results are published. The methodology is documented. The section-by-section data is in this article.
Everything is reproducible. If you are already using GSD or Mem0, you do not have to stop. MDD is a different layer solving a different problem. All three run together without conflict. I will explain exactly how near the end.

The Problem Nobody Is Naming Correctly

When Claude Code produces wrong code at scale, the community tends to blame one of two things: context rot, where quality degrades as the session fills up, or session amnesia, where Claude forgets everything when the session ends. GSD was built to solve context rot. Mem0 and Claude-Mem were built to solve session amnesia. Both are real problems. Both tools are real solutions. But there is a third problem that neither tool addresses, and it is the one that produces confident divergence.

Claude does not know your system. Not in the way you do. It reads a few files, infers patterns, and starts coding based on assumptions. At production scale, with 200+ routes, 50+ models, and business rules distributed across a codebase that took months to build, the inferences diverge from reality. Claude produces code that compiles, passes its own tests, and is confidently wrong.

Here is what makes confident divergence so hard to catch: everything looks correct. The code runs. The tests pass. Claude wrote the tests against its own assumptions about what the system does, not against what the system actually does. The divergence only surfaces in production, when a real user hits the edge case Claude never knew existed.

Here is what makes it so hard to prevent: the problem is not just that Claude does not know your system. It is that you cannot reliably narrate your system to Claude either. You built the whole thing. You know how operator scoping works, how the tier hierarchy enforces access, how tunnels allocate /30 subnets in the 10.99.x.0 range. You know all of it in theory.
But when you sit down to write a prompt at 11pm, you will not remember to mention that operators are scoped to specific groups and cannot modify policies outside their assigned groups. You will forget that ROLE_HIERARCHY is defined in three different files. You will not think to tell Claude that base-tier policies are system-only and cannot be created via the API. You are not going to enumerate 200 routes worth of business rules in a prompt. Nobody can. So Claude guesses. And confident divergence happens.

That is the problem MDD solves. Not context rot within a session. Not forgetting between sessions. The deeper problem of Claude not having explicit knowledge of your system in the first place.

The Token Obsession Is Solving the Wrong Problem

Before explaining MDD, it is worth naming something about the current tooling landscape, because the framing most tools use will make MDD seem like another entry in the same race. It is not. Every tool launched in the last twelve months leads with the same promise: fewer tokens, lower cost, faster responses. Mem0 claims 90% token reduction. Zep claims 90% latency reduction. GSD kee
View originalupstream(agents): 移植 5 个冲突 commit (P0,P1) — v2026.3.7→v2026.3.8
## Task

Apply the changes from the following 5 upstream commits to this fork semantically. These commits cannot be cherry-picked directly (they conflict); understand the intent of each change, then manually apply equivalent changes.

### Upstream version range

- **Source**: openclaw/openclaw v2026.3.7 → v2026.3.8
- **Module**: `agents`
- **Priority**: P0,P1

### Commits to port

#### Commit 1: `e8775cda932f` (P1)

**Description**: fix(agents): re-expose configured tools under restrictive profiles

**Files involved**: `src/agents/pi-tools.policy.test.ts,src/agents/pi-tools.policy.ts,src/plugins/config-state.test.ts,src/plugins/config-state.ts`

<details>
<summary>View upstream diff</summary>

```diff
diff --git a/src/agents/pi-tools.policy.test.ts b/src/agents/pi-tools.policy.test.ts
index 4b7a16b4d..0cdc572c4 100644
--- a/src/agents/pi-tools.policy.test.ts
+++ b/src/agents/pi-tools.policy.test.ts
@@ -3,6 +3,7 @@ import type { OpenClawConfig } from "../config/config.js";
 import {
   filterToolsByPolicy,
   isToolAllowedByPolicyName,
+  resolveEffectiveToolPolicy,
   resolveSubagentToolPolicy,
 } from "./pi-tools.policy.js";
 import { createStubTool } from "./test-helpers/pi-tool-stubs.js";
@@ -176,3 +177,59 @@ describe("resolveSubagentToolPolicy depth awareness", () => {
     expect(isToolAllowedByPolicyName("sessions_spawn", policy)).toBe(false);
   });
 });
+
+describe("resolveEffectiveToolPolicy", () => {
+  it("implicitly re-exposes exec and process when tools.exec is configured", () => {
+    const cfg = {
+      tools: {
+        profile: "messaging",
+        exec: { host: "sandbox" },
+      },
+    } as OpenClawConfig;
+    const result = resolveEffectiveToolPolicy({ config: cfg });
+    expect(result.profileAlsoAllow).toEqual(["exec", "process"]);
+  });
+
+  it("implicitly re-exposes read, write, and edit when tools.fs is configured", () => {
+    const cfg = {
+      tools: {
+        profile: "messaging",
+        fs: { workspaceOnly: false },
+      },
+    } as OpenClawConfig;
+    const result = resolveEffectiveToolPolicy({ config: cfg });
+    expect(result.profileAlsoAllow).toEqual(["read", "write", "edit"]);
+  });
+
+  it("merges explicit alsoAllow with implicit tool-section exposure", () => {
+    const cfg = {
+      tools: {
+        profile: "messaging",
+        alsoAllow: ["web_search"],
+        exec: { host: "sandbox" },
+      },
+    } as OpenClawConfig;
+    const result = resolveEffectiveToolPolicy({ config: cfg });
+    expect(result.profileAlsoAllow).toEqual(["web_search", "exec", "process"]);
+  });
+
+  it("uses agent tool sections when resolving implicit exposure", () => {
+    const cfg = {
+      tools: {
+        profile: "messaging",
+      },
+      agents: {
+        list: [
+          {
+            id: "coder",
+            tools: {
+              fs: { workspaceOnly: true },
+            },
+          },
+        ],
+      },
+    } as OpenClawConfig;
+    const result = resolveEffectiveToolPolicy({ config: cfg, agentId: "coder" });
+    expect(result.profileAlsoAllow).toEqual(["read", "write", "edit"]);
+  });
+});
diff --git a/src/agents/pi-tools.policy.ts b/src/agents/pi-tools.policy.ts
index db9a36755..61d037dd9 100644
--- a/src/agents/pi-tools.policy.ts
+++ b/src/agents/pi-tools.policy.ts
@@ -2,6 +2,7 @@ import { getChannelDock } from "../channels/dock.js";
 import { DEFAULT_SUBAGENT_MAX_SPAWN_DEPTH } from "../config/agent-limits.js";
 import type { OpenClawConfig } from "../config/config.js";
 import { resolveChannelGroupToolsPolicy } from "../config/group-policy.js";
+import type { AgentToolsConfig } from "../config/types.tools.js";
 import { normalizeAgentId } from "../routing/session-key.js";
 import { resolveThreadParentSessionKey } from "../sessions/session-key-utils.js";
 import { normalizeMessageChannel } from "../utils/message-channel.js";
@@ -196,6 +197,37 @@ function resolveProviderToolPolicy(params: {
   return undefined;
 }
 
+function resolveExplicitProfileAlsoAllow(tools?: OpenClawConfig["tools"]): string[] | undefined {
+  return Array.isArray(tools?.alsoAllow) ? tools.alsoAllow : undefined;
+}
+
+function hasExplicitToolSection(section: unknown): boolean {
+  return section !== undefined && section !== null;
+}
+
+function resolveImplicitProfileAlsoAllow(params: {
+  globalTools?: OpenClawConfig["tools"];
+  agentTools?: AgentToolsConfig;
+}): string[] | undefined {
+  const implicit = new Set<string>();
+  if (
+    hasExplicitToolSection(params.agentTools?.exec) ||
+    hasExplicitToolSection(params.globalTools?.exec)
+  ) {
+    implicit.add("exec");
+    implicit.add("process");
+  }
+  if (
+    hasExplicitToolSection(params.agentTools?.fs) ||
+    hasExplicitToolSection(params.globalTools?.fs)
+  ) {
+    implicit.add("read");
+    implicit.add("write");
+    implicit.add("edit");
+  }
+  return implicit.size > 0 ? Array.from(implicit) : undefined;
+}
+
 export function resolveEffectiveToolPolicy(params: {
   config?: OpenClawConfig;
   sessionKey?: string;
@@ -226,6 +258,15 @@ export function resolveEffectiveToolPolicy(params: {
     modelProvider: params.modelProvider,
     modelId: params.modelId,
   });
+  const explicitProfileAl
Repository Audit Available
Deep analysis of getzep/zep — architecture, costs, security, dependencies & more
Yes, Zep offers a free tier. Pricing found: $25/month, $25 / 20, $475/month, $125 / 100
Key features include: Ingest, Graph, Assemble, Every Source, Built for Real-Time, Three Lines of Code, Chat Memory.
Zep has a public GitHub repository with 4,316 stars.
Based on user reviews and social mentions, the most common pain point is the spending limit.
Based on 14 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Alex Volkov
Host at ThursdAI
1 mention