Let AI organize your team
I cannot provide a meaningful summary of user sentiment about "Mem" software based on the provided social mentions. The mentions appear to be from various tech discussions on platforms like Hacker News, Lemmy, and TikTok, but they don't contain actual reviews or discussions specifically about Mem as a software tool. The only reference to "mem" appears to be a hashtag in a TikTok post about AI pricing, which doesn't provide substantive feedback about the Mem application itself. To properly analyze user sentiment about Mem, I would need actual user reviews, testimonials, or discussions specifically focused on the software and its features.
Mentions (30d)
23
1 this week
Reviews
0
Platforms
8
Sentiment
0%
0 positive
Industry
information technology & services
Employees
69
Funding Stage
Venture (Round not Specified)
Total Funding
$34.7M
Show HN: Sub-millisecond VM sandboxes using CoW memory forking
I wanted to see how fast an isolated code sandbox could start if I never had to boot a fresh VM.

So instead of launching a new microVM per execution, I boot Firecracker once with Python and numpy already loaded, then snapshot the full VM state. Every execution after that creates a new KVM VM backed by a `MAP_PRIVATE` mapping of the snapshot memory, so Linux gives me copy-on-write pages automatically.

That means each sandbox starts from an already-running Python process inside a real VM, runs the code, and exits.

These are real KVM VMs, not containers: separate guest kernel, separate guest memory, separate page tables. When a VM writes to memory, it gets a private copy of that page.

The hard part was not CoW itself. The hard part was resuming the snapshotted VM correctly.

Rust, Apache 2.0.
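The copy-on-write behavior described above can be observed with an ordinary `MAP_PRIVATE` file mapping. A minimal Python sketch on Linux/macOS (illustrative only, not the project's Rust code; the "snapshot" here is just a temp file):

```python
import mmap
import os
import tempfile

# Write a small stand-in "snapshot" file.
fd, path = tempfile.mkstemp()
os.write(fd, b"snapshot-state")
os.close(fd)

# Map it MAP_PRIVATE: writes land on private copy-on-write pages and are
# never written back to the file -- the same kernel mechanism the sandbox
# relies on for its snapshot memory.
f = open(path, "r+b")
m = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE)
m[0:8] = b"mutated-"                 # triggers a private page copy

print(m[:8])                         # this mapping sees the write
print(open(path, "rb").read()[:8])   # the backing file does not
```

Every new VM in the post gets its own such mapping of the same snapshot, so unmodified pages stay shared and only written pages are copied.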
[D] MemPalace claims 100% on LoCoMo and a "perfect score on LongMemEval." Its own BENCHMARKS.md documents why neither is meaningful.
A new open-source memory project called MemPalace launched yesterday claiming "100% on LoCoMo" and "the first perfect score ever recorded on LongMemEval. 500/500 questions, every category at 100%." The launch tweet went viral, reaching over 1.5 million views, while the repository picked up over 7,000 GitHub stars in less than 24 hours. The interesting thing is not that the headline numbers are inflated. The interesting thing is that the project's own BENCHMARKS.md file documents this in detail, while the launch tweet strips these caveats. Some of the failure modes line up with the methodology disputes the field has been arguing about for over a year (Zep vs Mem0, Letta's "Filesystem All You Need" reproducibility post, etc.).

1. The LoCoMo 100% is a top_k bypass. The runner uses top_k=50. LoCoMo's ten conversations have 19, 19, 32, 29, 29, 28, 31, 30, 25, and 30 sessions respectively. Every conversation has fewer than 50 sessions, so top_k=50 retrieves the entire conversation as the candidate pool every time. The Sonnet rerank then does reading comprehension over all sessions. BENCHMARKS.md says this verbatim: "The LoCoMo 100% result with top-k=50 has a structural issue: each of the 10 conversations has 19–32 sessions, but top-k=50 exceeds that count. This means the ground-truth session is always in the candidate pool regardless of the embedding model's ranking. The Sonnet rerank is essentially doing reading comprehension over all sessions - the embedding retrieval step is bypassed entirely." The honest LoCoMo numbers in the same file are 60.3% R@10 with no rerank and 88.9% R@10 with hybrid scoring and no LLM. Those are real and unremarkable. A 100% is also independently impossible on the published version of LoCoMo, since roughly 6.4% of the answer key contains hallucinated facts, wrong dates, and speaker attribution errors that any honest system will disagree with.

2. The LongMemEval "perfect score" is a metric category error. Published LongMemEval is end-to-end QA: retrieve from a haystack of prior chat sessions, generate an answer, and a GPT-4 judge marks it correct. Every score on the published leaderboard is the percentage of generated answers judged correct. The MemPalace LongMemEval runner does retrieval only. For each of the 500 questions it builds one document per session by concatenating only the user turns (assistant turns are not indexed at all), embeds with default ChromaDB embeddings (all-MiniLM-L6-v2), returns the top five sessions by cosine distance, and checks set membership against the gold session IDs. It computes both recall_any@5 and recall_all@5, and the project reports the softer one. It never generates an answer. It never invokes a judge. None of the LongMemEval numbers in this repository (not the 100%, not the 98.4% "held-out", not the 96.6% raw baseline) are LongMemEval scores in the sense the published leaderboard means. They are recall_any@5 retrieval numbers on the same dataset, which is a substantially easier task. Calling any of them a "perfect score on LongMemEval" is a metric category error.

3. The 100% itself is teaching to the test. The hybrid v4 mode that produces the 100% was built by inspecting the three remaining wrong answers in their dev set and writing targeted code for each one: a quoted-phrase boost for a question containing a specific phrase in single quotes, a person-name boost for a question about someone named Rachel, and "I still remember" / "when I was in high school" patterns for a question about a high school reunion. Three patches for three specific questions. BENCHMARKS.md, line 461, verbatim: "This is teaching to the test. The fixes were designed around the exact failure cases, not discovered by analyzing general failure patterns."

4. Marketed features that don't exist in the code. The launch post lists "contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them" as a feature. mempalace/knowledge_graph.py contains zero occurrences of "contradict". The only deduplication logic is an exact-match check on (subject, predicate, object) triples that blocks identical triples from being added twice. Conflicting facts about the same subject can accumulate indefinitely.

5. "30x lossless compression" is measurably lossy in the project's own benchmarks. The compression module mempalace/dialect.py truncates sentences at 55 characters, filters by keyword frequency, and provides a decode() function that splits the compressed string into a header dictionary without reconstructing the original text. There is no round-trip. The same BENCHMARKS.md reports results_raw_full500.jsonl at 96.6% R@5 and results_aaak_full500.jsonl at 84.2% R@5, a 12.4 percentage point drop on the same dataset and the same metric, run by the project itself. Lossless compression cannot cause a measured quality drop.

Why this matters for the benchmark conversation. The field needs benchmarks where judge reliability is adversarially validated, an
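The gap between the two retrieval metrics in point 2 is easy to state in code. A minimal sketch (illustrative, not the MemPalace runner; the session IDs are made up):

```python
def recall_any_at_k(retrieved, gold, k=5):
    """1.0 if at least one gold session appears in the top-k, else 0.0."""
    top = set(retrieved[:k])
    return float(any(g in top for g in gold))

def recall_all_at_k(retrieved, gold, k=5):
    """1.0 only if every gold session appears in the top-k."""
    top = set(retrieved[:k])
    return float(all(g in top for g in gold))

# A multi-session question with two gold sessions, only one of them retrieved:
retrieved = ["s7", "s2", "s9", "s4", "s1"]
gold = ["s2", "s8"]
print(recall_any_at_k(retrieved, gold))  # 1.0 -- the softer metric
print(recall_all_at_k(retrieved, gold))  # 0.0
```

On multi-session questions, reporting recall_any@5 credits a run that found only one of the required sessions, which is part of why these numbers cannot be compared with end-to-end QA scores.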
The end of 'shadow AI' at enterprises? Kilo launches KiloClaw for Organizations to enable secure AI agents at scale
As generative AI matures from a novelty into a workplace staple, a new friction point has emerged: the "shadow AI" or "Bring Your Own AI (BYOAI)" crisis. Much like the unsanctioned use of personal devices in years past, developers and knowledge workers are increasingly deploying autonomous agents on personal infrastructure to manage their professional workflows. "Our journey with Kilo Claw has been to make it easier and easier and more accessible to folks," says Kilo co-founder Scott Breitenother.

Today, the company dedicated to providing a portable, multi-model, cloud-based AI coding environment is moving to formalize this "shadow AI" layer: it's launching KiloClaw for Organizations and KiloClaw Chat, a suite of tools designed to provide enterprise-grade governance over personal AI agents. The announcement comes at a period of high velocity for the company. Since the company made KiloClaw, its securely hosted, one-click OpenClaw product for individuals, generally available last month, more than 25,000 users have integrated the platform into their daily workflows. Simultaneously, Kilo's proprietary agent benchmark, PinchBench, has logged over 250,000 interactions and recently gained significant industry validation when it was referenced by Nvidia CEO Jensen Huang during his keynote at the 2026 Nvidia GTC conference in San Jose, California.

The shadow AI crisis: Addressing the BYOAI problem

The impetus for KiloClaw for Organizations stems from a growing visibility gap within large enterprises. In a recent interview with VentureBeat, Kilo leadership detailed conversations with high-level AI directors at government contractors who found their developers running OpenClaw agents on random VPS instances to manage calendars and monitor repositories. "What we're announcing on Tuesday is Kilo Claw for organizations, where a company can buy an organization-level package of Kilo Claws and give every team member access," explained Kilo co-founder and head of product and engineering Emili
Claude Code's source code appears to have leaked: here's what we know
Anthropic appears to have accidentally revealed the inner workings of one of its most popular and lucrative AI products, the agentic AI harness Claude Code, to the public. A 59.8 MB JavaScript source map file (.map), intended for internal debugging, was inadvertently included in version 2.1.88 of the @anthropic-ai/claude-code package on the public npm registry, pushed live earlier this morning. By 4:23 am ET, Chaofan Shou (@Fried_rice), an intern at Solayer Labs, had broadcast the discovery on X (formerly Twitter). The post, which included a direct download link to a hosted archive, acted as a digital flare. Within hours, the ~512,000-line TypeScript codebase was mirrored across GitHub and analyzed by thousands of developers.

For Anthropic, a company currently riding a meteoric rise with a reported $19 billion annualized revenue run-rate as of March 2026, the leak is more than a security lapse; it is a strategic hemorrhage of intellectual property. The timing is particularly critical given the commercial velocity of the product. Market data indicates that Claude Code alone has achieved an annualized recurring revenue (ARR) of $2.5 billion, a figure that has more than doubled since the beginning of the year. With enterprise adoption accounting for 80% of its revenue, the leak provides competitors, from established giants to nimble rivals like Cursor, a literal blueprint for how to build a high-agency, reliable, and commercially viable AI agent.

Anthropic confirmed the leak in a spokesperson's e-mailed statement to VentureBeat, which reads: "Earlier today, a Claude Code release included some internal source code. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach. We're rolling out measures to prevent this from happening again."

The anatomy of agentic memory

The most significant takeaway for competitors lies in how Anthropic solved "context entropy": the tendency for AI agents to
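For context on why a .map file exposes source: JavaScript source maps are JSON documents whose optional `sourcesContent` array embeds the original files verbatim, so recovering a codebase from a leaked map is little more than reading that field. A hypothetical sketch (the file names are illustrative, not the leaked artifact; the parent-directory stripping is a naive safeguard):

```python
import json
import pathlib

def dump_sources(map_path, out_dir):
    """Extract any originals embedded in a source map's sourcesContent field."""
    sm = json.loads(pathlib.Path(map_path).read_text())
    out = pathlib.Path(out_dir)
    # sources[i] names the file; sourcesContent[i] (if present) is its text.
    for name, content in zip(sm["sources"], sm.get("sourcesContent") or []):
        if content is None:
            continue  # maps may omit content for some sources
        dest = out / name.replace("../", "")  # naively drop parent-dir segments
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(content)
```

Shipping maps without `sourcesContent` (or not shipping maps at all) is the standard way to avoid exactly this failure mode.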
Imagine if your Teams or Slack messages automatically turned into secure context for your AI agents — PromptQL built it
For the modern enterprise, the digital workspace risks descending into "coordination theater," in which teams spend more time discussing work than executing it. While traditional tools like Slack or Teams excel at rapid communication, they have structurally failed to serve as a reliable foundation for AI agents; a Hacker News thread calling on OpenAI to build its own version of Slack to help empower AI agents went viral in February 2026, amassing 327 comments. That's because agents often lack the real-time context and secure data access required to be truly useful, often resulting in "hallucinations" or repetitive re-explaining of codebase conventions.

PromptQL, a spin-off from the GraphQL unicorn Hasura, is addressing this by pivoting from an AI data tool into a comprehensive, AI-native workspace designed to turn casual, regular team interactions into a persistent, secure memory for agentic workflows. Rather than leaving these conversations by the wayside, or forcing users and agents to hunt for them again later, they are distilled and stored as actionable, proprietary data in an organized format (an internal wiki) that the company can rely on going forward, approved and edited manually as needed.

Imagine two colleagues messaging about a bug that needs to be fixed: instead of manually assigning it to an engineer or agent, your messaging platform automatically tags it, assigns it, and documents it all in the wiki with one click. Now do this for every issue or topic of discussion that takes place in your enterprise, and you'll have an idea of what PromptQL is attempting. The idea is a simple but powerful one: turning the conversation that necessarily precedes work into an actual assignment that is automatically started by your own messaging system.

"We don't have conversations about work anymore," CEO Tanmai Gopal said in a recent video call interview with VentureBeat. "You actually have conversations that do the work." Origina
Ollama taps Apple’s MLX framework to make local AI models faster on Macs
Running large language models (LLMs) locally has often meant accepting slower speeds and tighter memory limits. Ollama’s latest update, built … (This post appeared first on The New Stack.)
IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models
Processing 200,000 tokens through a large language model is expensive and slow: the longer the context, the faster the costs spiral. Researchers at Tsinghua University and Z.ai have built a technique called IndexCache that cuts up to 75% of the redundant computation in sparse attention models, delivering up to 1.82x faster time-to-first-token and 1.48x faster generation throughput at that context length. The technique applies to models using the DeepSeek Sparse Attention architecture, including the latest DeepSeek and GLM families. It can help enterprises provide faster user experiences for production-scale, long-context models, a capability already proven in preliminary tests on the 744-billion-parameter GLM-5 model.

The DSA bottleneck

Large language models rely on the self-attention mechanism, a process where the model computes the relationship between every token in its context and all the preceding ones to predict the next token. However, self-attention has a severe limitation: its computational complexity scales quadratically with sequence length. For applications requiring extended context windows (e.g., large document processing, multi-step agentic workflows, or long chain-of-thought reasoning), this quadratic scaling leads to sluggish inference speeds and significant compute and memory costs.

Sparse attention offers a principled solution to this scaling problem. Instead of calculating the relationship between every token and all preceding ones, sparse attention optimizes the process by having each query select and attend to only the most relevant subset of tokens. DeepSeek Sparse Attention (DSA) is a highly efficient implementation of this concept, first introduced in DeepSeek-V3.2. To determine which tokens matter most, DSA introduces a lightweight "lightning indexer module" at every layer of the model. This indexer scores all preceding tokens and selects a small batch for the main core attention mechanism to process. By doing this, DSA slashes the heavy co
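The indexer-then-attend pattern described above can be sketched in a few lines. A toy scalar version (my simplification, not the DSA implementation: the real indexer is a learned module and the scores here are plain dot products):

```python
import math

def sparse_attend(q, keys, values, k=4):
    """Toy sparse attention: a cheap 'indexer' scores every past token,
    then full attention runs only over the top-k selected ones."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Lightweight indexer pass: one cheap score per past token.
    scores = [dot(key, q) for key in keys]
    # Keep only the k most relevant past tokens.
    keep = sorted(range(len(keys)), key=scores.__getitem__)[-k:]
    # Softmax over the selected subset, then weighted value readout.
    m = max(scores[i] for i in keep)
    w = [math.exp(scores[i] - m) for i in keep]
    z = sum(w)
    dim = len(values[0])
    return [sum(wi / z * values[i][d] for wi, i in zip(w, keep))
            for d in range(dim)]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [2.0, 2.0]]
out = sparse_attend(q, keys, values, k=2)  # attends to 2 of 5 past tokens
```

The full-attention cost per query grows with the whole context length, while the expensive step here only touches k tokens; the cheap scoring pass is the part IndexCache targets for redundancy elimination.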
[D] We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers
Projects are still submitting new scores on LoCoMo as of March 2026. We audited it and found 6.4% of the answer key is wrong, and the LLM judge accepts up to 63% of intentionally wrong answers. LongMemEval-S is often raised as an alternative, but each question's corpus fits entirely in modern context windows, making it more of a context window test than a memory test. Here's what we found.

LoCoMo

LoCoMo (Maharana et al., ACL 2024) is one of the most widely cited long-term memory benchmarks. We conducted a systematic audit of the ground truth and identified 99 score-corrupting errors in 1,540 questions (6.4%). Error categories include hallucinated facts in the answer key, incorrect temporal reasoning, and speaker attribution errors. Examples:

The answer key specifies "Ferrari 488 GTB," but the source conversation contains only "this beauty" and the image caption reads "a red sports car." The car model exists only in an internal query field (annotator search strings for stock photos) that no memory system ingests. Systems are evaluated against facts they have no access to.

"Last Saturday" on a Thursday should resolve to the preceding Saturday. The answer key says Sunday. A system that performs the date arithmetic correctly is penalized.

24 questions attribute statements to the wrong speaker. A system with accurate speaker tracking will contradict the answer key.

The theoretical maximum score for a perfect system is approximately 93.6%.

We also tested the LLM judge. LoCoMo uses gpt-4o-mini to score answers against the golden reference. We generated intentionally wrong but topically adjacent answers for all 1,540 questions and scored them using the same judge configuration and prompts used in published evaluations. The judge accepted 62.81% of them. Specific factual errors (wrong name, wrong date) were caught approximately 89% of the time. However, vague answers that identified the correct topic while missing every specific detail passed nearly two-thirds of the time. This is precisely the failure mode of weak retrieval, locating the right conversation but extracting nothing specific, and the benchmark rewards it.

There is also no standardized evaluation pipeline. Each system uses its own ingestion method (arguably necessary given architectural differences), its own answer generation prompt, and sometimes entirely different models. Scores are then compared in tables as if they share a common methodology. Multiple independent researchers have documented inability to reproduce published results (EverMemOS #73, Mem0 #3944, Zep scoring discrepancy). Full audit with all 99 errors documented, methodology, and reproducible scripts: locomo-audit

LongMemEval

LongMemEval-S (Wang et al., 2024) is the other frequently cited benchmark. The issue is different but equally fundamental: it does not effectively isolate memory capability from context window capacity. LongMemEval-S uses approximately 115K tokens of context per question. Current models support 200K to 1M token context windows. The entire test corpus fits in a single context window for most current models.

Mastra's research illustrates this: their full-context baseline scored 60.20% with gpt-4o (128K context window, near the 115K threshold). Their observational memory system scored 84.23% with the same model, largely by compressing context to fit more comfortably. The benchmark is measuring context window management efficiency rather than long-term memory retrieval. As context windows continue to grow, the full-context baseline will keep climbing and the benchmark will lose its ability to discriminate. LongMemEval-S tests whether a model can locate information within 115K tokens. That is a useful capability to measure, but it is a context window test, not a memory test.

LoCoMo-Plus

LoCoMo-Plus (Li et al., 2025) introduces a genuinely interesting new category: "cognitive" questions testing implicit inference rather than factual recall. These use cue-trigger pairs with deliberate semantic disconnect; the system must connect "I just adopted a rescue dog" (cue) to "what kind of pet food should I buy?" (trigger) across sessions without lexical overlap. The concept is sound and addresses a real gap in existing evaluation.

The issues: It inherits all 1,540 original LoCoMo questions unchanged, including the 99 score-corrupting errors documented above. The improved judging methodology (task-specific prompts, three-tier scoring, 0.80+ human-LLM agreement) was only validated on the new cognitive questions. The original five categories retain the same broken ground truth with no revalidation. The judge model defaults to gpt-4o-mini. Same lack of pipeline standardization. The new cognitive category is a meaningful contribution. The inherited evaluation infrastructure retains the problems described above.

Requirements for meaningful long-term memory evaluation

Based on this analysis, we see several requirements for benchmarks that can meaningfully
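The "last Saturday" error in the audit comes down to plain date arithmetic. A minimal Python check (the specific Thursday is my illustrative choice, not a LoCoMo date):

```python
from datetime import date, timedelta

def last_saturday(today):
    """Most recent Saturday strictly before `today` (Monday=0 ... Saturday=5)."""
    back = (today.weekday() - 5) % 7 or 7  # step back 1..7 days
    return today - timedelta(days=back)

# From a Thursday, "last Saturday" is five days earlier -- never a Sunday.
thursday = date(2023, 5, 25)        # a Thursday
print(last_saturday(thursday))      # 2023-05-20, a Saturday
```

A system that resolves the reference this way contradicts the answer key and is marked wrong, which is exactly the penalty the audit describes.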
Show HN: Forkrun – NUMA-aware shell parallelizer (50×–400× faster than parallel)
forkrun is the culmination of a 10-year-long journey focused on "how to make shell parallelization fast". What started as a standard "fork jobs in a loop" has turned into a lock-free, CAS-retry-loop-free, SIMD-accelerated, self-tuning, NUMA-aware, shell-based stream parallelization engine that is (mostly) a drop-in replacement for xargs -P and GNU parallel.

On my 14-core/28-thread i9-7940x, forkrun achieves:

* 200,000+ batch dispatches/sec (vs ~500 for GNU Parallel)
* ~95–99% CPU utilization across all 28 logical cores, even when the workload is non-existent (bash no-ops / `:`) (vs ~6% for GNU Parallel). These benchmarks are intentionally worst-case (near-zero work per task) because they measure the capability of the parallelization framework itself, not how much work an external tool can do.
* Typically 50×–400× faster on real high-frequency low-latency workloads (vs GNU Parallel)

A few of the techniques that make this possible:

* Born-local NUMA: stdin is splice()'d into a shared memfd, then pages are placed on the target NUMA node via set_mempolicy(MPOL_BIND) before any worker touches them, making the memfd NUMA-spliced. Each NUMA node only claims work that is *already* born-local on its node. Stealing from other nodes is permitted under some conditions when no local work exists.
* SIMD scanning: per-node indexers/scanners use AVX2/NEON to find line boundaries (delimiters) at speeds approaching memory bandwidth, and publish byte-offsets and line-counts into per-node lock-free rings.
* Lock-free claiming: workers claim batches with a single atomic_fetch_add; no locks, no CAS retry loops, and contention is reduced to a single atomic on one cache line.
* Memory management: a background thread uses fallocate(PUNCH_HOLE) to reclaim space without breaking the logical offset system.

…and that's just the surface. The implementation uses many additional systems-level techniques (phase-aware tail handling, adaptive batching, early-flush detection, etc.) to eliminate overhead, increase throughput, and reduce latency at every stage.

In its fastest (-b) mode (fixed-size batches, minimal processing), it can exceed 1B lines/sec.

forkrun ships as a single bash file with an embedded, self-extracting C extension: no Perl, no Python, no install, and full native support for parallelizing arbitrary shell functions. The binary is built in public GitHub Actions so you can trace it back to CI (see the GitHub "Blame" on the line containing the base64 embeddings). Trying it is literally two commands:

    . frun.bash
    frun shell_func_or_cmd < inputs

For benchmarking scripts and results, see the BENCHMARKS dir in the GitHub repo. For an architecture deep-dive, see the DOCS dir in the GitHub repo.

Happy to answer questions.
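The SIMD scanning step above publishes an index of line boundaries that workers then claim in batches. A scalar Python equivalent of the index it builds (illustrative only; nowhere near AVX2 speeds, and the names are mine):

```python
def line_offsets(buf, delim=b"\n"):
    """Byte offsets where each line starts, plus complete-line count --
    the same information the per-node scanners publish to their rings."""
    offsets, pos = [0], 0
    while True:
        hit = buf.find(delim, pos)   # scalar stand-in for the AVX2/NEON scan
        if hit == -1:
            break
        offsets.append(hit + 1)      # the next line begins after the delimiter
        pos = hit + 1
    return offsets, len(offsets) - 1

offs, n = line_offsets(b"alpha\nbeta\ngamma\n")
print(offs, n)   # [0, 6, 11, 17] 3
```

With such an offset table, a worker can claim "lines i..j" with one atomic counter increment and slice the shared buffer directly, which is what makes the lock-free claiming step in the post a single atomic_fetch_add.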
Show HN: Threadprocs – executables sharing one address space (0-copy pointers)
This project launches multiple independent programs into a single shared virtual address space, while still behaving like separate processes (independent binaries, globals, and lifetimes). When threadprocs share their address space, pointers are valid across them with no code changes for well-behaved Linux binaries.

Unlike threads, each threadproc is a standalone and semi-isolated process. Unlike dlopen-based plugin systems, threadprocs run traditional executables with a `main()` function. Unlike POSIX processes, pointers remain valid across threadprocs because they share the same address space.

This means that idiomatic pointer-based data structures like `std::string` or `std::unordered_map` can be passed between threadprocs and accessed directly (with the usual data race considerations).

This accomplishes a programming model somewhere between pthreads and multi-process shared memory IPC.

The implementation relies on directing ASLR and virtual address layout at load time and implementing a user-space analogue of `exec()`, as well as careful manipulation of threadproc file descriptors, signals, etc. It is implemented entirely in unprivileged user space code: <https://github.com/jer-irl/threadprocs/blob/main/docs/02-implementation.md>

There is a simple demo demonstrating “cross-threadproc” memory dereferencing, including a high-level diagram, at <https://github.com/jer-irl/threadprocs/tree/main?tab=readme-ov-file#demo>.

This is relevant to systems of multiple processes with shared memory (often ring buffers or flat tables). These designs often require serialization or copying, and tend away from idiomatic C++ or Rust data structures. Pointer-based data structures cannot be passed directly.

There are significant limitations and edge cases, and it’s not clear this is a practical model, but the project explores a way to relax traditional process memory boundaries while still structuring a system as independently launched components.
Flash-KMeans Dropped and It Makes sklearn Look Slow
Flash-KMeans brings Flash Attention-style optimizations to K-Means clustering — 5-16x faster with less memory. Here's what it means for your ML pipelines.
Building an AI Coding Practice Mentor with Persistent Memory using Hindsight
🚀 Introduction Preparing for coding interviews is a challenging journey for students. Most coding...
[P] Zero-code runtime visibility for PyTorch training
I added a zero-code mode to TraceML (oss):

    traceml watch train.py

It gives a live terminal view of system + process metrics during PyTorch training, with normal stdout/stderr still visible. Built for the case where a run feels slow and you want a quick first-pass view before adding instrumentation or reaching for a heavier profiler. Current limitation: not for multi-node launches yet. Repo: https://github.com/traceopt-ai/traceml/
Calling OpenAI from a PHP framework the same way you query Redis or Memcache
Temma is a PHP MVC framework designed to be easy to pick up and use. It sits between micro-frameworks...
Open-source memory layer for OpenAI apps. Your chatbot can now remember things between sessions and say "I don't know" when it should.
If you're building apps with the OpenAI API, you've probably hit this: your chatbot forgets everything between sessions. You either stuff the entire conversation history into the context window (expensive, slow) or lose it all.

I built widemem to fix this. It's an open-source memory layer that sits between your app and the API. It extracts important facts from conversations, scores them by importance, and retrieves only what's relevant for the next query. Instead of sending 20k tokens of chat history, you send 500 tokens of actual relevant memories.

Just shipped v1.4 with confidence scoring. The system now knows when it doesn't have useful context and can say "I don't know" instead of hallucinating from low-quality vector matches. Three modes:

- Strict: only answers when confident
- Helpful: answers normally, flags uncertain stuff
- Creative: "I can guess if you want"

Also added retrieval modes (fast/balanced/deep) so you can choose your accuracy vs cost tradeoff, and mem.pin() for facts that should never be forgotten. Works with GPT-4o-mini, GPT-4o, or any OpenAI model. Also supports Anthropic and Ollama if you want alternatives.

GitHub: https://github.com/remete618/widemem-ai
Install: pip install widemem-ai

Would appreciate any feedback or suggestions. Thanks!
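The importance-scored retrieval plus confidence-gated "I don't know" pattern described above can be sketched generically. This is my illustration of the idea, not widemem's actual API; a real system would score relevance with embeddings rather than word overlap:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    importance: float  # 0..1, assigned at fact-extraction time

def retrieve(memories, score_fn, query, k=3, min_confidence=0.5):
    """Rank stored facts by relevance x importance; return (facts, confident).
    When even the best match is weak, report low confidence instead of
    letting the model hallucinate from poor matches."""
    ranked = sorted(memories,
                    key=lambda m: score_fn(query, m.text) * m.importance,
                    reverse=True)
    top = ranked[:k]
    confident = bool(top) and score_fn(query, top[0].text) >= min_confidence
    return [m.text for m in top], confident

# Toy relevance function: plain word overlap.
def overlap(q, t):
    qs, ts = set(q.lower().split()), set(t.lower().split())
    return len(qs & ts) / max(len(qs), 1)

mems = [Memory("user prefers dark mode", 0.9),
        Memory("user lives in Berlin", 0.7)]
facts, ok = retrieve(mems, overlap, "does the user prefer dark mode")
```

In a "strict" mode, `ok == False` would translate to answering "I don't know"; in a "helpful" mode it would flag the answer as uncertain.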
I gave a local LLM memory, moods, and a task loop. Then it wrote a philosophical book.
The Problem with AI Today: Goldfish Memory Most of our interactions with AI today look...
Mem uses a tiered pricing model. Visit their website for current pricing details.
Based on user reviews and social mentions, the most common pain points are: LLMs, API costs, large language models, and AI agents.
Based on 78 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.
Sentdex
Creator at Python & AI YouTube
2 mentions