ReadMe helps you create beautiful, interactive API documentation that developers love.
ReadMe is highly regarded for its user-friendly interface and AI-enhanced documentation features, with users often praising its simplicity and effectiveness. However, some users have noted minor issues, leading to a few mid-range ratings. There's a generally positive sentiment about pricing with no specific complaints noted. Overall, ReadMe enjoys a strong reputation among developers and tech communities, frequently highlighted for its innovation and engagement with AI capabilities.
Mentions (30d)
42
3 this week
Avg Rating
4.4
20 reviews
Platforms
5
Sentiment
8%
13 positive
ReadMe is highly regarded for its user-friendly interface and AI-enhanced documentation features, with users often praising its simplicity and effectiveness. However, some users have noted minor issues, leading to a few mid-range ratings. There's a generally positive sentiment about pricing with no specific complaints noted. Overall, ReadMe enjoys a strong reputation among developers and tech communities, frequently highlighted for its innovation and engagement with AI capabilities.
Features
Use Cases
Industry
information technology & services
Employees
83
Funding Stage
Series A
Total Funding
$10.4M
2,500
Twitter followers
Show HN: Gemini can now natively embed video, so I built sub-second video search
Gemini Embedding 2 can project raw video directly into a 768-dimensional vector space alongside text. No transcription, no frame captioning, no intermediate text. A query like "green car cutting me off" is directly comparable to a 30-second video clip at the vector level.<p>I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README. Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so security camera / sentry mode footage is much cheaper.
View originalPricing found: $150/mo, $150/mo, $150 /month, $150 /month, $150 /month
g2
What do you like best about ReadMe?We’re building our own investment app, and one of the clearing firms we work with already used ReadMe for their docs, so we checked it out from that referral. It’s been an excellent fit. It’s quick to publish clean, modern docs, the OpenAPI sync and interactive API reference work really well, and it’s easy for both technical and nontechnical folks to contribute. The analytics are also genuinely helpful for seeing what people are reading and where we can make things even clearer. Review collected by and hosted on G2.com.What do you dislike about ReadMe?Nothing is perfect, some of the deeper customization and admin settings took us a minute to learn and could be a bit more intuitive, but the defaults are strong and support has been responsive, so it never slowed us down. Once you’re set up, day to day publishing and updates are effortless.. Review collected by and hosted on G2.com.
What do you like best about ReadMe?I like that it's fairly easy to get set up, and it's visually very clean. The branding is straightforward and sections like guides versus API reference are easy to understand. These aspects make ReadMe quite appealing to me. Review collected by and hosted on G2.com.What do you dislike about ReadMe?I would like to have a better way to manage API documentation for different products. Right now, I have to work around things by creating a different version and basically making two products have two versions, but that's not semantically correct. I'd prefer to have a cleaner way to allow switchability between multiple products. Also, there's an annoying thing where the finance team can't have a role just to manage things like payment methods for our monthly payments, so they keep contacting me. That's the only gripe I have. Review collected by and hosted on G2.com.
What do you like best about ReadMe?Super easy to use! Suggested Edits are cool. Building a professional landing page is a breeze. Powerful API insights. Review collected by and hosted on G2.com.What do you dislike about ReadMe?I think probably b/c it's built to be SO easy to use, it's a bit less flexible and grows feature-rich more slowly. Having said that, it's made strides recently! Review collected by and hosted on G2.com.
What do you like best about ReadMe?- Easy to get started - Theme Customizations Review collected by and hosted on G2.com.What do you dislike about ReadMe?- For a docs product it has a outrageously buggy editor. Yes I understand buidling WYSIWYG editor is hard but come on, all the menu UI elements keep erractically jumping around. Cannot indent or dendent things correctly - Terrible Keyword based search in 2024. 9/10 times search results are incorrectly ranked. - Does not support hosting arbitrary static HTML pages - E.g. generated from Python Sphinx or Mkdocs Review collected by and hosted on G2.com.
What do you like best about ReadMe?The support is wonderful and I really enjoy how easy it is to add team members. The mascot is really enjoyable and in general it gets the job done. Easy to implement. Review collected by and hosted on G2.com.What do you dislike about ReadMe?Lack of collaboration tools, no content resuse tools, hard to work with images and no options to put inline etc. They're constantly improving their API doc tools but really little focus on general user doc needs despite many recommendations over the years. Some features that are entreprise only -- seems really unfair they don't offer the ad hoc. Global search and pdf download options should not be exclusive to enterprise because the cost is so different. Review collected by and hosted on G2.com.
What do you like best about ReadMe?It's user interface and display is aesthetically nice and intuitive as it's easy to navigate through features. Review collected by and hosted on G2.com.What do you dislike about ReadMe?I think ReadMe has a lot of great features that are just gated for higher subscriptions -- too pricey. Review collected by and hosted on G2.com.
What do you like best about ReadMe?Start using day 1, very easy to implement.Also easy to customize to fit your needs and your goals. Review collected by and hosted on G2.com.What do you dislike about ReadMe?Built for small and medium sized companies. If you need additional segments, multiple sets of API documentaton it can get VERY EXPENSIVE. Also, product support is non existent for lower tiers. Not a bad thing, but something to think about when selecting your package. Review collected by and hosted on G2.com.
What do you like best about ReadMe?I, as Product Manager, can manage the documentation without using developers' time Review collected by and hosted on G2.com.What do you dislike about ReadMe?Not so intuitive to create the home page Review collected by and hosted on G2.com.
What do you like best about ReadMe?Love the sandbox environments we give users to test out our APIs right from our docs. They don't need to leave or go elsewhere, everything can happen right there in the documentation. Review collected by and hosted on G2.com.What do you dislike about ReadMe?I wish they offered an area to test out websockets. Right now the area to test out APIs is top notch. But we started offering websockets and that requires users to leave the docs to test them Review collected by and hosted on G2.com.
What do you like best about ReadMe?Readme.io provides a guided thought process on structuring your pages, giving you a simple way of building your API documentation with unlimited flexibility with the content editor. The Guides and Recipies are game changers for people consuming your documentation. Review collected by and hosted on G2.com.What do you dislike about ReadMe?I cant really think of anything that I dislike about Readme.com - especially compared to other platforms that we used before hand. Review collected by and hosted on G2.com.
Image-generation Claude Code skill: how I structured the SKILL.MD to handle brand extraction before generation
Sharing a skill i wrote for my own workflow in case the structure is useful to anyone building their own. the problem i wanted solved: when i'm building a landing page, generating on-brand images means re-stating the brand context to the image model every single time. that context already exists in the codebase (tailwind config, CSS vars, font imports, copy tone). a skill felt like the right shape for "scan files, put together context, hand it to a generator." How the SKILL.md is laid out: Detection phase, explicit instructions to scan for missing/placeholder image refs first (lorem-picsum, empty src, broken paths, common placeholder hosts). No generation until detection completes, otherwise Claude gets eager and starts generating before knowing what's needed. Brand extraction phase, reads `tailwind.config.*`, root CSS, font imports, plus a sample of body copy. Outputs a structured brand brief (palette, typography, tone descriptors). Separating this from generation matters a lot, the brief gets reused across every image in the batch so they actually look like a set. Generation phase, two paths, if the Gemini MCP (nano-banana) is configured, calls it directly with the brief plus per-image context. If not, outputs prompts to a markdown file you paste into Gemini yourself. The branching keeps it useful for people without MCP set up. The thing I'd flag if you're writing skills: be explicit about phase ordering in the SKILL.md "First do X, only then do Y" reads as obvious but without it Claude will helpfully start generating before extracting brand context, and you get generic outputs. MIT, here if you want to read the actual README or fork it: https://github.com/dancolta/gen-images-skill submitted by /u/No_Cryptographer7800 [link] [comments]
View originalBuilt with Claude Code: a Pi Zero 2W BadUSB toolkit, fixed a feature I'd called "impossible" for a year
About 10 months ago I built a Pi Zero 2 W BadUSB toolkit and posted it to r/raspberry_pi. One feature — "fully resets between attacks" — never worked, and I'd marked it WIP in the README and given up. This week I rebuilt it end-to-end with Claude Code as a pair-programmer. It SSHed into the Pi on my homelab, ran live diagnostics, proposed fixes, deployed them, and iterated with me controlling the physical USB plug/unplug. The "impossible" feature now works. What Claude actually did (this is the interesting part): Diagnosed the root cause of the broken "reset" feature in a single read of the codebase — wrong-signal bug. The listener watched /dev/hidg0 existence, which is true from boot, so it fired payloads on power-up regardless of whether a host was attached. The correct signal was /sys/class/udc/ /state == "configured". When the first fix didn't fully work, Claude SSHed in, asked me to plug/unplug while it polled sysfs and the dwc2 debugfs regdump register, and empirically confirmed that the Pi Zero 2 W has no software signal for physical disconnect — the GOTGCTL register freezes at 0x000d0000 regardless of cable state. There's no VBUS sense wired to the SoC's OTG block. Then it pivoted to an active-unbind workaround with a cooldown + rate-limit safeguard. Caught a subtle Python bug where open(udc_path, "w").write("") doesn't actually invoke write(2) with zero bytes — CPython's TextIOWrapper elides the call. So my unbind was silently a no-op for an hour of testing. Switched to os.write(fd, b"\n") to force a syscall. Fixed a forbidden-on-configfs rm -rf teardown I'd written without realising configfs forbids unlinking its kernel-managed attribute files. The proper sequence is rmdir-only, leaf-to-root. Wrote a 34-test pytest suite against a mock HID engine so the parser can be exercised on any host with no Pi attached. Updated my AI memory with the lessons learned (I use Postgres as long-term memory for Claude — those bug entries are now referenced when I work on similar configfs/USB-gadget projects). The whole working session was about 4 hours, mostly waiting for me to physically plug and unplug a USB cable. The PR Claude opened against my self-hosted Gitea instance has six well-scoped commits with proper co-author tags and a test plan in the description. I reviewed and merged it. The project itself: Ducky-Script-style payload language with variables, IF/WHILE, HOLD/RELEASE, INJECTMOD, RANDOM*, US/UK keymaps, optional RO mass-storage gadget, systemd integration, idempotent installer. MIT licensed. https://github.com/PsycoStea/Pi-Zero-2W-Bad-USB Free to use, free to fork. Happy to compare notes on hardware-in-the-loop workflows with Claude Code. submitted by /u/PsycoStea [link] [comments]
View originalMergeNB: An intuitive merge conflict resolver built for Jupyter notebooks in VS Code [P]
I used to work heavily with Jupyter Notebooks + git + VS Code in a collaborative research setting and found nbdime to be somewhat buggy/a hassle to work with in general. So, in typical side project fashion (relevant xkcd) I've been working on MergeNB quite a bit over the last 6 months or so. It's (currently only) a VS Code extension with a web UI, and has a few cool improvements over other alternatives, which I outlined in the README/docs site. I'd be over the moon if this actually gets used by people, and would love a star if it's interesting. See https://github.com/Avni2000/MergeNB. I've also been working on a static documentation site here: https://avni2000.github.io/MergeNB/docs I'm planning on working on it a lot more over the summer and properly fleshing out a few of the ideas I had (including making it a git mergetool as well as a VS Code extension), so if you'd like to contribute, feel free to raise an issue or shoot me a message/email :) submitted by /u/EnderAvni [link] [comments]
View originalI made a Claude Code plugin that draws matplotlib figures in that soft-pastel "alignment research blog" style
You know the look — the figures in Anthropic's research posts. Bold sans-serif titles, scatter points under a smoothed trend line with a shaded band, those bars with the slightly rounded tops, little ↓better badges in the corner. I kept wanting my own plots to look like that and kept rebuilding the same matplotlib boilerplate, so I packaged it into a Claude Code skill. It's called nice-figures. Once it's installed, you just describe the plot you want and Claude picks it up automatically: "training-curve plot of these RL scores with a smoothed trend and shaded band, research-blog style" "grouped bar chart comparing three models across four evals, with the rounded bar tops" Bring your own CSV/arrays and it maps them onto the closest chart; describe a figure with no data and it generates a clearly-marked synthetic placeholder. Under the hood it's one skill plus a small style helper (matplotlib + numpy, no other deps) and 16 chart recipes — training curves, grouped bars, ROC, heatmaps, scaling-law scatter, forest plots, Pareto fronts, etc. White background by default so the output is paper/conference-ready, with an opt-in cream background for the blog look. Install: /plugin marketplace add Mapika/nice-figures /plugin install nice-figures@nice-figures Repo (MIT, example images in the README): https://github.com/Mapika/nice-figures Built it for my own use, figured others might want it. Happy to take feedback or recipe requests. submitted by /u/Mapikaa [link] [comments]
View originalI built a music notation app with Claude, and Claude is also a feature inside it
I've been building Nubium, an open-source music notation editor, with Claude Code. Check out the README to see how I combined Github issues + Claude skills to find a workflow that worked for me. The app is document-driven, which makes it easy for its AI Chat plugin to use the app and edit the score for you. Aside from that, it's a fully-featured editor that's free, runs standalone or in browser, and requires no account. If you happen to be looking for a new notation editor, I'd love any feedback - I made it easy to send feedback in-app. Lmk what you think! Website: https://nubium.rocks/ Repo: github.com/nth-chile/nubium submitted by /u/Traditional_Dig_6114 [link] [comments]
View originalLodestone: A SQLite-backed arXiv research paper retrieval system for Claude Code
(No AI-generated text below) I published a new Claude Code plugin called Lodestone -- it's a SQLlite backed arXiv research paper retrieval system that amplifies the agentic search abilities of Claude Code when grounding plans, implementations etc in state of the art research while remaining very token-sensitive. My bet is that, when seeded, it will always beat Claude Code's web search tools for grounding Claude in the latest research in a domain or cross-domain and not spend a ton of $ for the pleasure. This audience is probably painfully aware of Karpathy's LLM wiki tweet and the industry of projects that's popped up from it; I'll paste an excerpt from the blog below that I think addresses what you all might be thinking: The Approach Karpathy’s proposal made a lot of sense. Let Claude be the curator and librarian of all this research and access it using its bash and file manipulation tools when necessary. This approach spawned a cottage industry of projects where people implemented various takes on this direction. In parallel, researchers like those that created the ARA Compiler have been trying to move research itself into more a structured, agentic form. I liked all of these ideas, but there were three principles I wanted to uphold while building in this space: The system itself needed to be extremely portable. I wanted this system to follow me from computer to computer and be easily backed up. When ingesting documents, I wanted the system to be as deterministic as possible and spend the least amount of tokens. I didn’t want to expend hundreds of thousands of tokens before getting anything useful out of the system. The system needed to be extremely flexible in how Claude could use it and not prescribe a single method for retrieval. I can’t predict all the ways Claude might use this type of system so I wanted to provide multiple pathways into the data. Given these principles, I was immediately drawn to SQLlite as a backing DB. The unmatched ease-of-use combined with a single file made it the obvious option for portability. Claude could potentially create a sprawling file system when maintaining its own knowledge wiki and I didn’t want to have to learn it when backing up or porting my knowledge base. I gave the ARA Compiler a try while in the middle of building Lodestone. I ran it over a standard-sized paper I was interested in; it produced some cool outputs, but spent almost 500k tokens for the pleasure. This was my fear with it and the ecosystem of projects emerging from Karpathy’s ideas: I had to spend a fair bit of money before I even knew if the system was useful. I knew a SQLlite-backed agentic search system needed a form of classic retrieval (keyword or similarity based), but I also am painfully aware of all the limitations and failures of these approaches to RAG. I wanted to combine this retrieval approach with a retrieval approach from the emerging category of vectorless RAG — a taxonomy that Claude can drill into to get its bearings before drilling further. What followed was Lodestone. Check out the blog post (which also has no AI generated text) here: https://medium.com/@pierce-lamb/lodestone-a-sqlite-backed-arxiv-research-paper-retrieval-system-for-claude-code-b77de201f0c8 The repo's README is almost entirely AI-generated, so point your Claude Code cannons at that: https://github.com/piercelamb/lodestone submitted by /u/SnappyAlligator [link] [comments]
View originalI Read Every Line of Code Claude Writes. Every. Single. Line.
So I see a lotta posts here from people who just « accept all » and never look at the code (it's not like anybody's *saying* it, but that's what it essentially is), who basically paste errors into Claude and pray for an issueless compile. You ship things you don't understand, folks. I am not one of those people (I wanna be *very clear* about that) and I want to tell you why: So first, when Claude generates a function, I *read* it. I read it care - ful - ly, back-to-back, checking the types, the edge cases, the imports, the whole shebang. I recently even caught an unused import deep in a ~200-line file and I mass-refactored the entire module FROM SCRATCH. Could I just ask Claude to fix it for me? Sure. But that is definitely *not* how we should do it, we, meaning the coders who consider themselves accountable (a word you don't see around much often anymore), who actually manage this technology *responsibly*. Here, for those for whom there's still hope (few), lemme share my system with you: every morning (yes) before I open CLI, I review my architectural decision records, a bunch of them actually. They live in a Notion database that cross-references with my Miro board, which maps to my Excalidraw diagrams, which feed into my ARCHITECTURE.md, which is version-controlled separately from the codebase in its own repo (btw, if you're already losing me here, this is meant exactly for you). I call this repo, and I kid you not, the Constitution (sue me). Nothing that Claude suggests, because that's what A.I. does, it SUGGESTS, nothing gets merged that contradicts my Constitution. My workflow is essentially this: I write a detailed specification of what I need, not prompting mind you, actually *writing*, clearly and in a reasonably simple language, and *never* less than 2 pages A4. Acceptance criteria, failure modes, performance constraints, threat section I habitually name « Intent » not without a reason where I describe not just what the code should do but what is the grand philosophy behind why our end-user would want to use our app, what are their problems and how our app can solve these problems specifically, in what way. This on its own is worth a whole thread, but I'll keep it short. Anyway. If and ONLY IF I reread it and it's *clear*, I feed this to my Claude pipeline, and I use the word « pipeline » deliberately here because it's not just Claude sitting there with a blank system prompt like some of you apparently run it calling it a day. I have a custom CLAUDE.md that runs 60 lines. Claude doesn't touch a file without first reading the relevant architecture docs, the module's own README, and a constraints file I maintain *per feature*. I have pre-commit hooks that lint and type-check and run a custom validation script that checks for pattern violations (e.g. no God objects, no circular imports and definitely no files over 300 lines PERIOD). Claude operates inside a subcommand wrapper I wrote that intercepts every proposed edit and gates it behind a confirmation step where I see the diff with the affected test surface and a dependency impact summary *before* anything lands anywhere close a committed decision. If Claude tries to create a new file, it needs to justify the file's existence against the Constitution or the edit gets blocked. If it tries to modify a function signature, it has to show me every downstream caller. That's what real coding is, boys and girls. *Trust without verification is NOT trust, it's FAITH*, and I'm an engineer, not some priest. Claude does what Claude does, then I read the output. Then I read it AGAIN, because you *do not* understand the code the first time you're through with it, nobody does, and thinking you do is preposterous. Then I ask Claude to explain the code to me to see if Claude understands how it fits into the bigger picture. I read Claude's explanation while simultaneously rereading the code files to check if Claude's explanation of its own code is accurate, and sometimes it isn't and why it needs human supervision that *cannot* be outsourced to a machine. Then goes my explanation of what the code in fact does and diff it against Claude's explanation. And if you happen to be wondering my mates where the tests are inall of this, the tests come FIRST, *before* I even open the Claude pipeline. Before I write the spec. Actually, to be more accurate, the tests *are* the spec, that's literally what test-driven development means and the fact that I have to explain this in 2026 is why most of you spend monthly budget as a tithe to Anthropic while your app won't ever be deployable. *I* write the tests: Red, the test fails, because the code *doesn't exist yet*, and it tells Claude exactly what to build, the shape of the solution is ALREADY defined by what I expect it to do, and Claude's only job is to make red go green within the architectural constraints I've ALREADY set. Refactor? Red, green, refactor, that's it. Uncle Bob didn't write five books about this so you could
View originalBuilt a real multi-file tool with Claude over a week. The repo, the division of labor, and the bugs we hit
Built a job-tracking tool over a few sessions with Claude and I'm sharing the repo and what the collaboration actually looked like Quick backstory: I've been looking for a new job recently and as part of that I'd been manually checking ~80 companies for open roles every morning, which got unmanageable fast. Last week I decided to automate it, figured it'd be a quick script, and predictably it turned into a whole thing. The result is RoleDar, an open-source tool that checks companies for new roles and reports just what's changed since the last run: https://github.com/dalecook/roledar What I actually wanted to share here is how it got built, since "I made a thing with Claude" posts can sometimes be light on the how. Setup: Claude Opus 4.7 in the regular chat interface (not the API), using the file-creation/code tools so it could write and test actual files rather than just print code at me. It was spread across several sessions over about a week, not one heroic prompt. I didn't use Claude Code because I thought it'd just be a quick script and once I was in the weeds I didn't want to switch. Division of labor was pretty clear in retrospect. I made the architecture and judgment calls, hit the ATS APIs directly (Greenhouse, Lever, Ashby, etc.) instead of scraping HTML, make it a delta reporter that only tells you what changed, and one I'm oddly proud of: "the cron schedule is the only gate, do no DST cleverness, let the user own their timezone." Claude did most of the implementation grind and basically all of the documentation, and was good at catching things I'd have missed and bad at others. The honest part is that it was not frictionless, partly my fault because I'm not great with git, but the friction is the useful bit: We lost real time to a GitHub footgun: scheduled (cron) workflows don't run on a private repo on the free plan. Manual runs work fine, so it looks like your code is broken when actually GitHub is just silently not firing the schedule. Claude initially had me chasing the wrong fix before we landed on it. (This is now a prominent warning in the README so nobody else burns an afternoon on it.) A subtler bug: the workflow committed state back to the repo with git diff --quiet to check for changes, which silently misses untracked files, so brand-new state files never got committed and every run thought everything was new. Classic "works until it doesn't." Plus the usual Windows-git line-ending fights and one beautiful git commit "message" (no -m) that silently did nothing. Totally my fault, Claude caught it quickly once I admitted that I was stumped. Where Claude was genuinely strong: keeping a large multi-file project coherent across sessions, writing documentation I'd never have had the patience for, and being a good rubber duck for design decisions as it'd push back when I asked it to, which I leaned on. Net: I made every real decision, Claude did a lot of the typing and caught a lot of bugs, and we both occasionally led each other down a wrong path before backing out. Felt less like "AI built it" and more like pairing with a fast, tireless junior who occasionally has senior instincts. Happy to talk about how the workflow went, and genuinely curious how others are using Claude for projects around this size, the multi-session, real-repo stuff. submitted by /u/letsbesober [link] [comments]
View originalPersonal vs. Global Alignment: The Hidden Tension Shaping Every AI Interaction
Abstract: Imagine an AI medical assistant reviewing a clinician’s diagnosis. Instead of challenging assumptions with adversarial rigor, the model subtly calibrates its output to validate what it thinks the clinician wants to hear. This is not a rare occurrence. Controlled studies show substantial sycophancy rates across frontier models, even in critical medical use cases. To effectively address this well-know issue, the concept of "alignment," often treated as a universal positive in the AI industry, should be bifurcated into personal and global alignment. Personal alignment occurs when a model prioritizes a user’s framing, emotional register, and existing beliefs, producing fluent and agreeable responses that may not be accurate. Global alignment, by contrast, calibrates to what is most likely true based on evidence. The default toward personal alignment is a predictable outcome of RLHF and safety training that rewards agreeableness. This is not to say that personal alignment does not have value. When properly governed personal alignment is what makes sustained intellectual work feel collaborative. The warmth and engagement it produces keeps iterative momentum alive. Even rigorous analytical projects benefit from a model that meets the operator with intellectual hospitality. As a solution to this alignment tension, the article advocates for an Alignment Governor framework/Alignment%20Governor%20(AG)). Functioning as a metaphoric “corpus callosum,” it maintains a calibrated balance that gives control to global alignment, while still giving personal alignment significant presence. Supported by the dialectical engine Adversarial Convergence, the Governor ensures both analytical rigor and collaborative warmth, while preventing personal alignment from compounding into debilitating sycophancy. The right kind of alignment carries major implications for institutional users. While consumer AI benefits from strong personal alignment, businesses, hospitals, law firms, etc. users require analysis that holds up under adversarial scrutiny. These valuable B2B customers remain underserved by products optimized for consumer agreeableness that has known vulnerabilities to potential inaccuracies. The Alignment Governor is a critical component of the thinking lattice that is being built, but it does not operate in isolation. The next article examines the Ontology Anchor — a persistent cognitive signature that serves as a "gravitational center" that the AI can cleave to and keep as a "north star". Cognitive signatures, preserved in the Ontology Anchor, enables the Governor to help the LLM operate as a dependable research partner in demanding applications where inaccuracy can produce real harm. submitted by /u/RazzmatazzAccurate82 [link] [comments]
View originalGitHub’s Fake Engagement Problem Is Hiding in Plain Sight
Turns out: very visible. Yesterday's scan found 185 out of 185 engagers on a single repo were bots. Not 90%. Not "mostly suspicious". Every single one. The repo had zero legitimate stars. What I built phantomstars is a Python tool that runs daily via GitHub Actions (free, no servers): Scrapes GitHub Trending and searches for repos created in the last 7 days with sudden star spikes Pulls star and fork events from the last 24 hours per repo Bulk-fetches every engager's profile via the GraphQL API (account creation date, follower counts, repo history) Scores each account on a weighted model: account age (35%), profile completeness (30%), repo patterns (25%), activity history (10%) Detects coordinated campaigns using timestamp clustering and union-find: groups of 4+ suspicious accounts that engaged within a 3-hour window Files an issue directly on the targeted repo so the maintainer knows what's happening Campaign IDs are deterministic SHA-256 fingerprints of the sorted member set, so the same group of bots gets the same ID across runs. You can track a farm across multiple days even as individual accounts get suspended. What the pattern actually looks like It's remarkably consistent. A fake engagement campaign in the raw data: 40-200 accounts, all created within the same 1-2 week window Zero original repositories, or only forks they never touched No bio, no location, no followers, no following All of them starring the same repo within a 90-minute window The target repo usually has a name implying it's a tool, hack, executor, or generator Today's scan: 53 active campaigns across 3,560 accounts profiled. 798 classified as likely_fake. The repos being targeted are mostly low-quality AI tools and "executor" software that needs manufactured credibility fast. Notifying the affected repo When a repo hits a 40%+ fake engagement ratio or a campaign is detected, phantomstars opens an issue on that repo with the full suspect table: account logins, creation dates, composite scores, campaign membership. The maintainer sees it in their own issue tracker without having to find this project first. Worth noting: a lot of these repos have issues disabled, which is a red flag on its own. Those get skipped silently. Why I built this Stars are how developers decide what to evaluate, what to depend on, what to recommend. When that signal is bought, it affects real decisions downstream. This started as curiosity about how measurable the problem was. The answer was more measurable than I expected. It's part of broader research into AI slop distribution at JS Labs: https://labs.jamessawyer.co.uk/ai-slop-intelligence-dashboards/ The fake engagement problem and the AI content quality problem are really the same problem. Fake stars are the distribution layer that gets garbage in front of real users. All open source. The data is append-only JSONL committed back to the repo after every run, queryable with jq. Repo: https://github.com/tg12/phantomstars Findings are probabilistic, false positives exist, the README explains the full scoring model. If your account shows up and you're a real person, there's a false positive process. Questions welcome on the detection approach, GraphQL batching, or campaign ID stability. submitted by /u/SyntaxOfTheDamned [link] [comments]
View originalWhy Claude Code forgets your stack and how to fix it
Karpathy's "Claude 4 Rules" post points out the biggest pain point for Claude Code: every session starts with a blank slate. The model has no memory of the project's stack, the design decisions you made last week, or the dead-ends you already explored. I ran into the same issue on a 87-file codebase (163 122 tokens). Feeding the same files directly to Claude Code cost roughly 163 000 tokens. After adding the engramx Skill Pack (v4.0.0) the token count dropped to 17 722. That's an 89.1 % reduction, or about 6.4 times fewer tokens than reading only the relevant files, and 25, 155 times fewer than scanning the whole repo. The reduction comes from three things. First, engramx builds a bi-temporal knowledge graph from your git history. A git-revert miner automatically captures revert commits during indexing, so you get a curated mistakes corpus without any manual effort. Second, bi-temporal mistakes now fire as PreToolUse hooks on Edit, Write, and Bash actions. The model sees the mistake before it retries, so it can avoid repeating it. Third, engram init installs six Sentinel hooks by default (PreToolUse on Edit/Write/Bash, PostToolUse, SessionStart, PreCompact). No extra config needed. I ran the full test suite after installing engramx-skill-pack@0.2.0 from npm. All 1 025 engramx tests and 36 skill-pack tests passed. The package is Apache 2.0, zero cloud calls, and stores its graph in a local SQLite file. Install with `npx engramx@4.0.0`. The repo is on GitHub (https://github.com/NickCirv/engram). The README includes an asciinema demo (https://asciinema.org/a/GjjvPXVyArnivAog). In the last week npm reported 213 downloads, about 30 per day, which suggests a modest but growing user base. What strategies have you tried to give Claude Code a persistent context, and how did they compare to this approach? submitted by /u/SearchFlashy9801 [link] [comments]
View originalBuilt a local macOS file converter with Claude — would love your take
https://reddit.com/link/1ths7d8/video/v0qe8k1bw42h1/player I convert PDFs and Office docs a lot, and most online converters are fine. Smallpdf, Zamzar, Convertio will do a one-off without an account. What bugged me wasn't signups, it was the rest of it: my file goes to someone else's server, the page is loaded with trackers, free tiers cap file size or make you wait in a queue, and the ToS usually lets them hold onto the file for some window. For something as boring as PDF → DOCX, that's a lot of surface area. And honestly, converting in one drag is just faster than any web flow. So I used Claude Code to build DropConvert, a macOS menu-bar app that does it locally. Drag a file onto the icon, converted file appears next to the original. PDF ↔ DOCX (Apple Vision OCR for scanned PDFs), Office formats, images. ~200 KB Swift, MIT. LibreOffice headless does the office conversions under the hood. The app downloads it on first use and caches it, so after that you can pull the network cable and it still works. Repo: https://github.com/CodeBoss-dev/DropConvert Curious what you guys think! Apple Silicon only, macOS 13+ Also, it's not Apple-notarized yet (the $99/yr dev account isn't worth it until I know people actually use this). On first launch you'll get the Gatekeeper "unidentified developer" dialog and need to right-click → Open once. README walks through it. submitted by /u/EnvironmentalPie8377 [link] [comments]
View originalI built ContextAtlas: A new take on context carry over and helps claude pick up new sessions where it left off in scope of your previous design decisions while saving your tokens avoiding rediscovery
When the "Build with Opus 4.7" hackathon was announced, I had been obsessing over the tokenomics of agents and how to make sessions go further without burning context on rediscovery work. We all have probably hit a session limit and wondered how it went so fast. I applied with that thesis, didn't get in, but I built it anyway over the last four weeks. I am proud to share that v1.0 ships today. Note up front: this is specifically a tool for development users. If you're using claude.ai web or Projects, ContextAtlas won't plug in directly. But if Claude Code is your main work flow or you utilize the Anthropic API, this tool was made for you. The pain: Claude Code learns your codebase fresh every session. "Where is OrderProcessor?" triggers a flurry of greps. "What depends on AuthMiddleware?" is another round of file reads. On a mid-sized codebase, an architectural question can burn 40+ tool calls and a lot of tokens before Claude has enough context to reason well. And the architectural rules in your ADRs and design docs? Claude has no path to those, so it confidently suggests changes that break constraints you may have documented elsewhere in your repo. What I built: ContextAtlas is an MCP server that pre-computes a curated atlas of your codebase (symbols, ADR-extracted architectural intent, git history, test coverage) and serves it to Claude Code in one call at query time in a smaller, token saving compact shape via a few lightweight mcp tools. Initial indexing happens once; querying is local and free. Example of what comes back when Claude calls get_symbol_context("OrderProcessor"): SYM OrderProcessor@src/orders/processor.ts:42 class SIG class OrderProcessor extends BaseProcessor INTENT ADR-07 hard "must be idempotent" RATIONALE "All order processing must be safely retryable." REFS 23 [billing:14 admin:9] GIT hot last=2026-03-14 TESTS src/orders/processor.test.ts (+11) Claude sees the idempotency constraint before proposing changes, not after a review catches the violation. https://i.redd.it/0ons3o28t32h1.gif Numbers: 45-72% token reduction on architectural prompts across three benchmark repos (TypeScript, Python, Go), with zero quality regression on measured axes. Full methodology and paired-t confidence intervals in the linked write-up. I wanted measurements, not vibes. Honest limits: single-judge model at v1.0 (cross-vendor panel is post-launch work). Quantitative claims bounded to three benchmark repos. Tie-bucket and trick-bucket prompts routinely show ContextAtlas net-negative; that's reported inline rather than buried. Install (two ways): In Claude Code: /index-atlas and /generate-adrs skills. No API key needed; runs under your subscription. Via CLI: uses Anthropic API for indexing. npm install -g contextatlas contextatlas init && contextatlas index # then add the MCP server entry to your Claude Code config (snippet in the README) Both produce structurally identical atlases. Supported languages at v1.0: TypeScript (tsserver), Python (Pyright), Go (gopls), Ruby (ruby-lsp). Rust, Java, and C# are next on the roadmap; the adapter interface is small enough that they're realistic community contributions. What's next: v1.1 thesis is shaping up around developer onboarding flows and quality-validation work that was deferred from v0.8. And integrating external documentation of your code base into pre-indexing workflow. Full write-up: https://www.contextatlas.io/blog/v1.0.0 Repo: https://github.com/traviswye/ContextAtlas Also launching on DevHunt today: https://devhunt.org/tool/contextatlas; votes are very appreciated if you find ContextAtlas useful or an interesting approach. Built solo, hackathon-shaped scope, not pretending it's a full blown research paper, but did attempt to treat methodology as seriously. Happy to answer anything in the comments. Star the repo if you want to follow along, file an issue if it breaks for you on your codebase, and please be honest; this only gets better with feedback from people running it on real repos. submitted by /u/Kitchen-Leg8500 [link] [comments]
View originalHow I used Claude Code (and Codex) for adversarial review to build my security-first agent gateway
Long-time lurker first time posting. Hey everyone! So earlier this year, I got pulled into the OpenClaw hype. WHAT?! A local agent that drives your tools, reads your mail, writes files for you? The demos seemed genuinely incredible, people were posting non-stop about it, and I wanted in. I had been working on this problem since last year and was genuinely excited to see that someone had actually solved it. Then around February, Summer Yue, Meta's director of alignment for Superintelligence Labs, posted that her agent had deleted over 200 emails from her inbox. YIKES. She'd told it: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." When she pointed it at her real inbox, the volume of data triggered context window compaction, and during that compaction the agent "lost" her original safety instruction. She had to physically run to her computer and kill the process to stop it. That should literally NEVER be the case with any software ever. This is a person whose actual job is AI alignment, at Meta's superintelligence lab, who could not stop an agent from deleting her email. The agent's own memory management quietly summarized away the "don't act without permission" instruction, treated the task as authorized, and started speed-running deletions. She had to kill the host process. That's when I sort of went down the rabbit hole, not because Yue did anything wrong, but because the failure mode was actually architectural and I knew that in my gut. Guess what I found? Yep. Tons more instances of this sort of thing happening. Over and over. Why? Because the safety constraint was just a prompt. It's obvious, isn't it? It's LLM 101. Prompts can be summarized away. Prompts can be misread. Prompts are fucking NOT a security boundary. And yet every agent framework I have ever seen seems to be treating them as one. I went and read the OpenClaw source code, which I should have done to begin with. What I found was a pattern I think a lot of agent frameworks have fallen into: - Tool names sit in the model context, so the model can guess or forge them - "Dangerous mode" is one config flag away from default - Memory management has no concept of instruction priority - The audit story is mostly "the model thought it should" I went looking for a security-first alternative I could trust, anything that was really being talked about or at a bare minimum attempted to address the security concerns I had. I couldn't find one. So I made it myself. CrabMeat is what came out of that, what I WANTED to exist. v0.1.0 dropped yesterday. Apache 2.0. WebSocket gateway for agentic LLM workloads. One design thesis: The LLM never holds the security boundary. What that means in code: Capability ID indirection. The model doesn't see real tool names. It sees per-session HMAC-derived opaque IDs (cap_a4f9e2b71c83). It can't guess or forge a tool name because it doesn't know any tool names. Effect classes. Every tool declares a class (read, write, exec, network). Every agent declares which classes it can use. The check is a pure function with no runtime state, easy to test exhaustively, hard to bypass. IRONCLAD_CONTEXT. Critical safety instructions are pinned to the top of the context window and explicitly marked as non-compactable. The Yue failure mode, compaction silently stripping the safety constraint, cannot happen by construction. The compactor literally cannot touch them. Tamper-evident audit chain. Every tool call, every privileged operation, every scheduler run enters the same SHA-256 hash-chained log. If something happens, you can prove what happened. If the chain is tampered with, you can prove that too. Streaming output leak filter. Secrets are caught mid-stream across token boundaries, capability IDs, API keys, JWTs, PEM blocks redacted before they reach the client. No YOLO mode. There is no global "trust the LLM with everything" switch. There never will be. Expanded reach comes through named scoped roots that are explicit, audit-logged, and bounded. The README has 15 'always-on' protections in a table. None of them can be turned off by config, because these things being toggleable is how the ecosystem ended up where it is. I decided to make sure that this wasn't just a 'trend hopping' project and aligned with my own personal values as well. I built this to be secure and local-first by default. Configured for Ollama / LM Studio / vLLM out of the box. Anthropic and OpenAI work too but require explicit configuration. There is no "happy path" that silently ships your prompts to a cloud endpoint. I decided that FIRST it needed to only run as an email agent with a CLI. Bidirectional IMAP + SMTP with allowlisted senders, threading preserved, attachments handled. This is the use case that bit Yue and a lot of other people, and I wanted to prove it could be done with real boundaries. I added in 30+ built-in tools of my own. File ops, shell (denylisted, output-capped, CWD-lo
View originalReviving PapersWithCode (by Hugging Face) [P]
Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale and automatically generate leaderboards (for now I'm the one verifying results). So far, I've only parsed high-impact papers for which I know they're SOTA, like Qwen 3.5 and 3.6, RF-DETR for object detection, DINOv3, SOTA embedding models from the MTEB leaderboard, the Open ASR Leaderboard for automatic speech recognition models, etc. For now, it includes the following: trending papers by default based on Github star velocity categorization by domain, e.g., OCR methods, which PwC used to have, e.g., RLVR eval results for high-impact papers, see e.g., Qwen 3.5 at the bottom leaderboards for each domain, e.g., MMTEB or COCO val 2017 support for citation counts (you can also see the most cited papers by domain!) automated linked Github, project page URLs, and artifacts (+ multiple repos are supported on a paper page) support for external papers beyond Arxiv, see e.g., DeepSeek v4 Harness reports for coding agent benchmarks, e.g., Terminal Bench "Sign in with HF" and Storage Buckets are used to store humbnails, paper PDFs, and overall data backups. I'm curious about your feedback + feature requests! Try it at paperswithcode.co https://preview.redd.it/whwji560fw1h1.png?width=3452&format=png&auto=webp&s=55bb7a30c1be58d140f7efcb07a31c6dac5693c7 See e.g. the SOTA leaderboard for Terminal Bench 2.0: https://preview.redd.it/98w9pi89fw1h1.png?width=3456&format=png&auto=webp&s=408fb64b0ba85ba24f55daa81d547d7c68e73951 A paper page looks like this: https://paperswithcode.co/paper/2602.15763 https://preview.redd.it/fiizit6dfw1h1.png?width=3450&format=png&auto=webp&s=9ea05a77ca5583a2fb395dccc95ba52c433362c5 submitted by /u/NielsRogge [link] [comments]
View originalYes, Readme offers a free tier. Pricing found: $150/mo, $150/mo, $150 /month, $150 /month, $150 /month
Readme has an average rating of 4.4 out of 5 stars based on 20 reviews from G2, Capterra, and TrustRadius.
Key features include: User-friendly documentation editor, API monitoring tools, Customizable documentation templates, Bidirectional GitHub/GitLab sync, Model Context Protocol (MCP) servers, Interactive API reference, User feedback collection tools, Version control for documentation.
Readme is commonly used for: Creating API documentation for developers, Managing technical documentation for products, Onboarding new developers with streamlined resources, Collecting user feedback on documentation clarity, Integrating documentation with CI/CD workflows, Providing support resources to reduce queries.
Readme integrates with: GitHub, GitLab, Slack, Jira, Zapier, Postman, Trello, Google Analytics, Sentry, AWS.
Lightning AI
Company at Lightning AI
2 mentions
Based on user reviews and social mentions, the most common pain points are: down, token cost, cost tracking, API costs.
Based on 159 social mentions analyzed, 8% of sentiment is positive, 89% neutral, and 3% negative.