Run open-source machine learning models with a cloud API
Based on the provided social mentions, there is insufficient substantive user feedback to summarize meaningful opinions about Replicate. The social mentions consist only of generic YouTube video titles saying "Replicate AI" without any actual user reviews or detailed discussions. The other mentions are unrelated to Replicate, covering topics like PostgreSQL tools, AI agent systems, and Venezuelan politics. To provide an accurate assessment of user sentiment about Replicate, more detailed reviews, comments, or discussions would be needed.
Mentions (30d): 1
Reviews: 0
Platforms: 4
GitHub stars: 900 (263 forks)
Features

Industry: information technology & services
Employees: 30
Funding stage: Merger / Acquisition
Total funding: $89.9M
GitHub followers: 11,084
GitHub repos: 169
GitHub stars: 900
npm packages: 20
HuggingFace models: 40
Show HN: PgDog – Scale Postgres without changing the app
Hey HN! Lev and Justin here, authors of PgDog (https://pgdog.dev/), a connection pooler, load balancer and database sharder for PostgreSQL. If you build apps with a lot of traffic, you know the first thing to break is the database. We are solving this with a network proxy that works without requiring application code changes or database migrations.

Our post from last year: https://news.ycombinator.com/item?id=44099187

The most important update: we are in production. Sharding is used a lot, with direct-to-shard queries (one shard per query) working pretty much all the time. Cross-shard (or multi-database) queries are still a work in progress, but we are making headway.

Aggregate functions like count(), min(), max(), avg(), stddev() and variance() work without refactoring the app. PgDog calculates the aggregate in transit, transparently rewriting queries to fetch any missing info. For example, a multi-database average calculation requires the total count of rows to recover the original sum. PgDog will add count() to the query if it's not there already, and remove it from the rows sent to the app.

Sorting and grouping work, including DISTINCT, if the column(s) are referenced in the result. Over 10 data types are supported, like timestamp(tz), all integers, varchar, etc.

Cross-shard writes, including schema changes (CREATE/DROP/ALTER), are now atomic and synchronized between all shards with two-phase commit. PgDog keeps track of the transaction state internally and will roll back the transaction if the first phase fails. You don't need to monkeypatch your ORM to use this: PgDog will intercept the COMMIT statement and execute PREPARE TRANSACTION and COMMIT PREPARED instead.

Omnisharded tables, a.k.a. replicated or mirrored (identical on all shards), support atomic reads and writes. That's important because most databases can't be completely sharded and will have some common data on all shards that has to be kept in sync.

Multi-tuple inserts, e.g. INSERT INTO table_x VALUES ($1, $2), ($3, $4), are split by our query rewriter and distributed to their respective shards automatically. They are used by ORMs like Prisma, Sequelize, and others, so those now work without code changes too.

Sharding keys can be mutated. PgDog will intercept and rewrite the update statement into three queries (SELECT, INSERT, and DELETE), moving the row between shards. If you're using Citus (for everyone else, Citus is a Postgres extension for sharding databases), this might be worth a look.

If you're like us and prefer integers to UUIDs for your primary keys, we built a cross-shard unique sequence directly inside PgDog. It uses the system clock (and a couple of other inputs), can be called like a Postgres function, and will automatically inject values into queries, so ORMs like ActiveRecord will continue to work out of the box. It's monotonically increasing, just like a real Postgres sequence, and can generate up to 4 million numbers per second with a range of 69.73 years, so no need to migrate to UUIDv7 just yet.

    INSERT INTO my_table (id, created_at) VALUES (pgdog.unique_id(), now());

Resharding is now built in. We can move gigabytes of tables per second by parallelizing logical replication streams across replicas. This is really cool! Last time we tried this at Instacart, it took over two weeks to move 10 TB between two machines. Now we can do it in just a few hours, in big part thanks to the work of the core team that added support for logical replication slots on streaming replicas in Postgres 16.

Sharding hardly works without a good load balancer. PgDog can monitor replicas and move write traffic to a promoted primary during a failover. This works with managed Postgres like RDS (incl. Aurora), Azure Pg, GCP Cloud SQL, etc., because it just polls each instance with "SELECT pg_is_in_recovery()". Primary election is not supported yet, so if you're self-hosting with Patroni, you should keep it around for now, but you don't need to run HAProxy in front of the DBs anymore.

The load balancer is getting pretty smart and can handle edge cases like SELECT FOR UPDATE and CTEs with INSERT/UPDATE statements, but if you still prefer to handle your read/write separation in code, you can do that too with manual routing. This works by giving PgDog a hint at runtime: a connection parameter (-c pgdog.role=primary), a SET statement, or a query comment. If you have multiple connection pools in your app, you can replace them with just one connection to PgDog instead. For multi-threaded Python/Ruby/Go apps, this helps by reducing memory usage, I/O and context-switching overhead.

Speaking of connection pooling, PgDog can automatically rollback unfinished transactions and drain and re-sync partially sent
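The in-transit aggregate rewriting described above can be modeled in a few lines. This is an illustrative Python sketch of the idea, not PgDog's Rust implementation; the function and variable names are made up:

```python
# Illustrative model of cross-shard avg(): the proxy rewrites the query so
# each shard returns (sum, count), then merges the partial results in transit.

def merge_avg(partials):
    """Combine per-shard (sum, count) pairs into one global average."""
    total_sum = sum(s for s, _ in partials)
    total_count = sum(c for _, c in partials)
    return total_sum / total_count if total_count else None

# Shard 0 holds [10, 20], shard 1 holds [30].
shard_results = [(30.0, 2), (30.0, 1)]
print(merge_avg(shard_results))  # 20.0
```

Averaging the per-shard averages (15.0 and 30.0) would give 22.5, not 20.0, which is exactly why the proxy has to inject count() into the query when it is missing.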
AI identity emergence is controllable, not automatic. R²=1.00 across 15 runs. Complete replication protocol. Challenges interpretability research.
I just published experimental research that challenges a core assumption in AI: that identity emergence is automatic and fixed. Using a two-phase experimental design, I demonstrated that AI identity is a controllable output variable, not an intrinsic property.

Binary testing: perfect separation between control and constraint conditions (SD=0). Gradient testing: perfect linear correlation between delay parameter and identity position (R²=1.00, zero deviation across 15 runs).

This has immediate implications for interpretability research, alignment approaches, and our understanding of what's actually happening inside these systems. Complete methodology, replication protocol, and working code included. Full paper: https://substack.com/@erikbernstein/note/p-193752870?r=6sdhpn

submitted by /u/MarsR0ver_
Claude cowork - Asana
Hi everyone, I'm looking for some advice or guidance on an integration I've been trying to set up between Claude (via the Asana MCP integration) and Asana.

What I'm trying to achieve is to have Claude automatically create a new project in Asana using an existing template I've already set up (including sections, tasks, subtasks, and descriptions). This is actually just a small piece of a larger workflow automation I'm building, so getting this step right is pretty important.

Claude has suggested creating the project from scratch and copying tasks over as a workaround, but that approach still falls short. While it can replicate tasks and descriptions, I would still need to manually create all the sections and then organize ~100 tasks into the correct sections. At that point, it honestly feels faster to just build the project manually.

After digging deeper, the issue seems to come down to a couple of limitations in the current setup:

- No template instantiation support — The Asana API does have an endpoint (POST /project_templates/{gid}/instantiateProject) that would solve this perfectly, but it's not exposed in the MCP. So Claude can't create a project from a template natively.
- No section creation support — As a fallback, I tried copying tasks manually via MCP. This works for tasks and descriptions, but there's no exposed endpoint to create sections (POST /projects/{gid}/sections), so the structure can't be recreated programmatically.

I also explored a couple of alternatives:

- Browser automation (Claude via Chrome) — blocked by Asana's Content Security Policy.
- Manual task copying via MCP — partially works, but still requires manual section creation and organization.

So right now, I'm stuck in this in-between state where automation is almost possible, but missing key pieces. Has anyone managed to solve something like this, or found a workaround I might be missing? Thanks in advance! 🙏

submitted by /u/Poniente88
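Until the MCP exposes it, one possible workaround is to call the REST endpoint the post identifies directly, outside the MCP. The sketch below builds that request with the Python standard library; the endpoint path comes from the post itself, but the payload fields (name, team) are my assumption and should be checked against Asana's current API docs before use:

```python
# Hypothetical direct call to the endpoint the MCP doesn't expose:
# POST /project_templates/{gid}/instantiateProject
import json
import urllib.request

API_ROOT = "https://app.asana.com/api/1.0"

def build_instantiate_request(template_gid, project_name, team_gid, token):
    """Build the (unsent) HTTP request that instantiates a project template."""
    url = f"{API_ROOT}/project_templates/{template_gid}/instantiateProject"
    payload = {"data": {"name": project_name, "team": team_gid}}  # assumed fields
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_instantiate_request("12345", "Q3 Launch", "67890", "<personal-access-token>")
print(req.full_url)
# urllib.request.urlopen(req) would actually send it.
```

Claude could then be handed the resulting project for the rest of the workflow, and section creation (POST /projects/{gid}/sections) could be scripted the same way.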
[Help] Skill or Project to replicate tone of voice?
Talented comms person who's leaving the work team, and we're going to be blocked because of the cost of replacing them. I'm trying to work out how to capture their tone of voice and comms style via all the good social posts/emails/docs they've previously produced and bake that into Claude to help shape future prompts. Am I building a skill and loading up examples into .md files it can reference, or am I building a project and dumping files into that? Struggling to identify when to use the right tool. Thanks in advance.

submitted by /u/GaryWert
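For what it's worth, the skill route could look something like the sketch below. This assumes the standard Claude Code skill layout (a folder containing a SKILL.md with name/description frontmatter); all file names and wording are illustrative:

```
.claude/skills/tone-of-voice/
├── SKILL.md              # frontmatter + instructions (below)
└── examples/
    ├── social-posts.md   # their strongest posts, pasted verbatim
    ├── emails.md
    └── docs.md

--- SKILL.md ---
---
name: tone-of-voice
description: Match our comms lead's voice and style. Use when drafting any outward-facing copy.
---
Read the files in examples/ before drafting. Mirror sentence length, word
choice, and structure. Flag anything that drifts toward generic AI phrasing.
```

A project with the same files in project knowledge also works; the practical difference is that a skill loads on demand, while project files sit in every conversation's context.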
i needed an AI agent that mimics real users to catch regressions. so i built a CLI that turns screen recordings into BDD tests and full app blueprints - open source
first time post - hope the community finds the tool helpful. open to all feedback.

some background on why i built this:

first: i needed a way to create an agent that mimics a real user — one that periodically runs end-to-end tests based on known user behavior, catches regressions, and auto-creates GitHub issues for the team. to build that agent, i needed structured test scenarios that reflect how people actually use the product. not how we think they use it. how they actually use it - then do some REALLY real user monitoring.

second: i was trying to rapidly replicate known functionality from other apps. you know that thing where you want to prototype around a UX you love? video of someone using the app is the closest thing to a source of truth.

so i built autogherk. it has two modes:

gherkin mode — generates BDD test scenarios:

    npx autogherk generate --video demo.mp4

Gemini analyzes the video — every click, form input, scroll, navigation, UI state change. Claude takes that structured analysis and generates proper Gherkin with features, scenarios, tags, Scenario Outlines, and edge cases. outputs .feature files + step definition stubs.

spec mode — generates full application blueprints:

    npx autogherk generate --video demo.mp4 --format spec

Gemini watches the video and produces design tokens, component trees, data models, navigation maps, and reference screenshots. hand the output to Claude Code and you can get a working replica built.

gherkin mode uses a two-stage pipeline (Gemini for visual analysis, Claude for structured BDD generation). spec mode is single-stage — Gemini handles both the visual analysis and structured output directly since it keeps the full visual context.

the deeper idea: video is the source of truth for how software actually gets used. not telemetry, not logs, not source code. video. this tool makes that source of truth machine-readable.

the part that might interest this community most: autogherk ships with Claude Code skills. after you generate a spec, you can run /build-from-spec ./spec-output inside Claude Code and it will read the architecture blueprints, design tokens, data models, and reference screenshots — then build a working app from them. the full workflow is: record video → one command → hand to Claude Code → working replica. no manual handoff.

supports Cucumber (JS/Java), Behave (Python), and SpecFlow (C#). handles multiple videos, directories, URLs. you can inject context (--context "this is an e-commerce checkout flow") and append to existing .feature files. spec mode only needs a Gemini API key — no Anthropic key required.

what's next on the roadmap: explore mode — point autogherk at a live, authenticated app and it autonomously and recursively discovers every screen using its own gherkin files, maps navigation, and generates .feature files without you recording anything. after that: a monitoring agent that replays the features against your live app on a schedule using Claude Code headless + Playwright MCP, and auto-files GitHub issues when something breaks. the .feature file becomes a declarative spec for what your app does — monitoring, replication, documentation, and regression diffing all flow from the same source.

it's v0.1.0, MIT licensed. good-first-issue tickets are up if anyone wants to contribute. https://github.com/arizqi/autogherk

submitted by /u/SimilarChampion9279
I used Claude to build an AI-native research institute, so far, 7 papers submitted to Nature Human Behaviour, PNAS, and 5 other journals. Here's exactly how.
I have no academic affiliation, no PhD, no lab, no funding. I'd been using Claude to investigate a statistical pattern in ancient site locations and kept finding things that needed to be written up properly. So I did the stupid thing and went all in. In three weeks, using Claude as the core infrastructure, I've built the Deep Time Research Institute (now a registered nonprofit) and submitted multiple papers to peer-reviewed journals.

The submission list: Nature Human Behaviour, PNAS, JASA, JAMT, Quaternary International, Journal for the History of Astronomy, and the Journal of Archaeological Science.

Here's what "AI-native research" actually means in practice:

Claude Code on a Mac Mini is the computation engine. Statistical analysis, Monte Carlo simulations, data pipelines, manuscript formatting. Every number in every paper is computed from raw data via code. Nothing from memory, nothing from training data. Anti-hallucination protocol is non-negotiable; all stats read from computed JSON files, all references DOI-verified before inclusion.

Claude in conversation is the research strategist. Experimental design, gap identification, adversarial review. Before any paper goes out it runs through a multi-model gauntlet - each one tries to break the argument. What survives gets submitted.

6 AI agents run on the hub (I built my own "OpenClaw" - what is the actual point in OpenClaw if you can build agentic infrastructure by yourself in a day session) handling literature monitoring, social media, operations, paper drafting, and review. Mix of local models (Ollama) and Anthropic API on the same Mac Mini.

The flagship finding: oral tradition accuracy across 41 knowledge domains and 39 cultures is governed by a single measurable variable - whether the environment punishes you for being wrong. Above a threshold, cultural selection maintains accuracy. San trackers: 98% across 569 trials. Aboriginal geological memory: 13/13 features confirmed over 37,000 years. Andean farmers predict El Niño by watching the Pleiades — confirmed in Nature, replicated over 25 years. Below the threshold, traditions drift to chance. 73 blind raters on Prolific confirmed the gradient independently.

I'm not pretending this replaces domain expertise. I don't have 20 years in archaeology or cognitive science. What I have is the ability to move at a pace that institutions can't, plus cross-domain integration - not staying in a niche academic lane. From hypothesis to statistical test to formatted manuscript in days instead of months. Whether the work holds up is for peer review to decide. That's the whole point of submitting.

Interactive tools:
- Knowledge extinction dashboard: https://deeptime-research.org/tools/extinction/
- Observability gradient: https://deeptime-research.org/observability-gradient
- Accessible writeup: https://deeptimelab.substack.com/p/the-gradient-and-what-it-means

Happy to answer questions about the workflow, the architecture, or the research itself. This has been equally intense and a helluva lot of fun!

submitted by /u/tractorboynyc
EmotionScope: Open-source replication of Anthropic's emotion vectors paper on Gemma 2 2B with real-time visualization
[Screenshots: live demo of the Tylenol test; evolution of the model's deduced internal emotional state]

I created this project to test Anthropic's claims and research methodology on smaller open-weight models. The repo and demo should be quite easy to use; the following is obviously generated with Claude. This was inspired in part by auto-research, in that it was agentic-led research using Claude Code, with my intervention needed to apply the rigor necessary to catch errors in the probing approach, layer sweep, etc.; the visualization approach is aspirational. I am hoping this system will propel this interpretability research in an accessible way for open-weight models of different sizes, to determine how and when these structures arise, and when more complex features such as the dual speaker representation emerge. In these tests it was not reliably identifiable in this size of a model, which is not surprising.

It can be seen in the graphics that by probing at two different points, we can see the evolution of the model's internal state: during the user content, then shifting right before the model is about to prepare its response — going from desperate (interpreting the insane dosage) to hopeful (in its ability to help?). It's all still very vague.

[Screenshots: a test suite of the validation prompts; the model's emotion vector space aligns with psychological valence (positive vs negative)]

Anthropic's ["Emotion Concepts and their Function in a Large Language Model"](https://transformer-circuits.pub/2026/emotions/index.html) showed that Claude Sonnet 4.5 has 171 internal emotion vectors that causally drive behavior — amplifying "desperation" increases cheating on coding tasks, amplifying "anger" increases blackmail. The internal state can be completely decoupled from the output text. EmotionScope replicates the core methodology on open-weight models and adds a real-time visualization system. Everything runs on a single RTX 4060 Laptop GPU. All code, data, extracted vectors, and the paper draft are public.

What works:
- 20 emotion vectors extracted from Gemma 2 2B IT at layer 22 (84.6% depth)
- "afraid" vector tracks Tylenol overdose danger with Spearman rho=1.000 (chat-templated probing matching extraction format) — encodes the medical danger of the number, not the word "Tylenol"
- 100% top-3 accuracy on implicit emotion scenarios (no emotion words in the prompts) with chat-templated probing
- Valence separation cosine = -0.722, consistent with Russell's circumplex model
- 1,000 LLM-generated templates instead of Anthropic's 171,000 self-generated stories

What doesn't work (and the open questions about why):
- No thermostat. Anthropic found Claude counterregulates (calms down when the user is distressed). Gemma 2B mirrors instead. Delta = +0.107 (trended from +0.398 as methodology was corrected).
- Speaker separation exists geometrically (7.4 sigma above random) but the "other speaker" vectors read "loving/happy" for all inputs regardless of the expressed emotion. This could mean: (a) the model genuinely doesn't maintain a user-state representation at 2.6B scale, (b) the extraction position confounds state-reading with response-preparation, (c) the dialogue format doesn't map to the model's trained speaker-role structure, or (d) layer 22 is too deep for speaker separation and an earlier layer might work. The paper discusses each confound and what experiments would distinguish them.
- angry/hostile/frustrated vectors share 56-62% cosine similarity. Entangled at this scale.

Methodological findings:
- Optimal probe layer is 84.6% depth, not the ~67% Anthropic reported. Monotonic improvement from early to upper-middle layers.
- Vectors should be extracted from content tokens but probed at the response-preparation position. The model compresses its emotional assessment into the last token before generation. This independently validates Anthropic's measurement methodology. Controlled position comparison: 83% at response-prep vs 75% at content token. Absolute accuracy with chat-templated probing: 100%.
- Format parity matters: initial validation on raw-text prompts yielded rho=0.750 and 83% accuracy. Correcting to chat-templated probing (matching extraction format) yielded rho=1.000 and 100%. The vectors didn't change — only the probe format.
- Mathematical audit caught 4 bugs in the pipeline before publication — reversed PCA threshold, incorrect grand mean, shared speaker centroids, hardcoded probe layer default.

Visualization: React + Three.js frontend with animated fluid orbs rendering the model's internal state during live conversation. Color = emotion (OKLCH perceptual space), size = intensity, motion = arousal, surface texture = emotional complexity. Spring physics per property.

Limitations:
- Single model (Gemma 2 2B IT, 2.6B params). No universality claim.
- Perfect scores (rho=1.000 on n=7, 100% on n=12) should be interpreted with caution — small sample sizes mean these may not replicate on larger test sets.
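The extraction-and-probing loop described in the post (mean-difference emotion vectors, probed by cosine similarity) can be reduced to a toy sketch. Random vectors stand in for layer-22 Gemma activations here; this illustrates the methodology, not the EmotionScope code:

```python
# Toy sketch: an emotion vector is the mean activation difference between
# emotion-laden and neutral prompts; probing ranks emotion labels by cosine
# similarity against a new activation.
import numpy as np

def emotion_vector(emotional_acts, neutral_acts):
    """Mean-difference direction for one emotion concept."""
    return emotional_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def probe(activation, vectors):
    """Rank emotion labels by cosine similarity to the probe activation."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(vectors, key=lambda name: cos(activation, vectors[name]),
                  reverse=True)

rng = np.random.default_rng(0)
dim = 16
afraid_dir = rng.normal(size=dim)        # stand-in "afraid" direction
neutral = rng.normal(size=(8, dim))      # stand-in neutral activations
vectors = {
    "afraid": emotion_vector(neutral + afraid_dir, neutral),
    "happy": emotion_vector(neutral + rng.normal(size=dim), neutral),
}
# An activation pushed along the "afraid" direction ranks "afraid" first.
print(probe(afraid_dir, vectors)[0])  # afraid
```

The post's format-parity finding maps onto this sketch directly: extraction and probing have to use activations produced under the same prompt template, or the cosine comparison is apples to oranges.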
Using Claude to write articles in 2026, is my manual process outdated or is it actually fine?
I've been building a niche content site for a while now and I want to get an honest read from people here on whether I'm operating like it's 2019 or whether my approach actually makes sense.

Here's how I work. I have a set of reference documents built up over months of iteration. A full article writing prompt, a tone and style guide with banned words and voice rules, an SEO and keyword guide, a branded HTML component design system, brand guidelines, and a master project instructions document that ties everything together. Different session types use different combinations of these. Article sessions use one set. Component build sessions use another. Each chat has one purpose.

For articles, I paste my documents into a fresh session and give Claude the topic. It comes back with a keyword proposal, gap analysis, and angle before writing anything. I confirm or adjust. Then we spar. Claude drafts sections and stops to ask me things. What was my actual experience of this? What's a specific detail only I would know? I give raw notes and half-finished thoughts and it shapes them into my voice. The final article has things in it that no competitor can replicate because the knowledge is genuinely mine. I review everything. I push back when something reads wrong. We go again. This is the part I would never automate.

HTML components are separate sessions entirely. I have a full custom design system, cost tables, stat rows, affiliate CTA boxes, interactive tools. Each one gets its own chat with the relevant documents. I know nothing about code, genuinely nothing, I don't use that side of my brain at all. I can only tell if something is wrong visually, so Claude has to get it right or we debug by eye.

The place I keep getting stuck is any time I try to build something more systematic. I've made several attempts at creating proper workflows and automation, because everything I read suggests that's the direction things are moving and that pasting documents into chat prompts is essentially archaic at this point. Terminal, GitHub, Make.com. Every attempt went sideways. I ask the wrong questions because I don't know what I don't know. Sessions drift, something gets built incorrectly, I can't tell if it's right because I don't understand what's been built, and I eventually scrap it. Claude has mentioned Cowork and Claude Code as things worth looking at down the line but both feel like more complexity than I can manage right now.

I tried Gemini Pro 3.1 because the usage limits are far more generous than Claude Pro. Fed it my full document set. It couldn't hold the instructions across a session and drifted into generic positive AI prose, exactly what my tone guide bans. When I pushed back it actually diagnosed its own failure and said it needs a chained multi-agent workflow to do what Claude does in one session. So that was that.

My philosophy is one good article, not fifty average ones. One brick at a time. The compounding effect of genuinely useful content that ranks and earns trust over time. I'm not trying to produce at volume. I'm trying to produce something actually good. But reading this subreddit I feel like everyone is running agents, building pipelines, automating multi-model workflows. And I'm sitting here pasting Google Docs into a chat window like it's 2023.

So the honest question is: in 2026, what do I actually gain from automating the content side? If the usage is roughly the same whether manual or through a pipeline, and the entire value of what I'm building comes from the sparring and the genuine human input, what's the argument for changing anything? Is the manual process embarrassingly outdated or is it the right call for content built around a specific voice and knowledge that can't be generated?

submitted by /u/shuffles03
AI will do your job. I built a platform that trains you for what it can't.
Here's something no one's really talking about yet. In 10 years, most screen-based jobs will be automated. Data entry, reports, translations, basic coding, customer support — AI already does it better, faster, cheaper. This isn't speculation, it's happening right now.

So what skills will actually matter? Every study points to the same answer: critical thinking, persuasion, negotiation, public speaking, the ability to defend an idea under pressure. The skills no algorithm can replicate.

The problem? There's nowhere to actually practice them. You can read about negotiation. You can watch a TED talk on critical thinking. But reading about swimming doesn't teach you to swim.

That's why I built ELBO — using Claude Code, solo, in 4 months, from Quebec. It's a live training ground for future-proof skills, powered by AI. The core idea: you don't practice alone, you practice WITH AI. An AI opponent that listens to your argument, challenges your logic, pushes back on weak points, and gives you real-time constructive feedback. Like a sparring partner available 24/7 that adapts to your level.

Want to prepare for a job interview? The AI simulates a tough interviewer. Need to practice delivering bad news to an employee? The AI reacts emotionally like a real person would. Want to sharpen your critical thinking? The AI argues the opposite of whatever you believe and forces you to defend your position.

I used 7 Claude integrations across the platform: argument analysis, AI debate opponent, content generation, moderation, coaching feedback, debate scoring, and translation across 11 languages. Claude Code built about 70% of the 96 components.

The platform has 4 worlds: a public arena for everyone, NOVA for education, APEX for corporate training, and VOIX for civic democracy. All connected through one profile that tracks what you demonstrate — not what you claim.

Free to try, no account needed: elbo.world

Happy to answer questions about building with Claude or the technical architecture.

submitted by /u/bluemaze2020
How to set up synergy between CC and Claude.ai
I maintain an extremely lightweight setup with Claude Code that has worked extremely well for professional/technical tasks. It basically uses a hierarchical structure of markdown files to track everything. Any time we generate new context, details, learnings, TODOs, progress → update the corresponding markdown file(s). Any time we need more details → read the corresponding markdown file(s) to come up to speed.

However, for personal use cases, I prefer claude.ai in the browser. It's easy to access + has good web searching capabilities. I've been banging my head trying to replicate my Claude Code setup with claude.ai in the browser, but can't figure it out. Claude.ai connectors to Google Drive, GitHub, are read-only, for example. I feel like I must be missing an obvious solution here.

TLDR is I want to be able to use claude.ai in the browser and write out the outcomes of whatever we work on (context, TODOs, instructions) somewhere, and also be able to read it back in, as needed. Anyone have a clean solution here? Thank you

submitted by /u/Cd206
I just scaled Convex's open-source database horizontally using Claude Code. I don't write Rust and I barely understand database internals.
So I've been using Convex for a while and the one thing that bugged me is that the self-hosted backend is single-node only. Their docs literally have this line: "You'll have to modify the code to support horizontal scalability of the database, or swap in a different database technology". Nobody had actually done it. So I decided to try.

For context, Convex isn't like a normal database. It's a reactive database that has things no distributed database has all together:

- Real-time WebSocket subscriptions (push updates to clients instantly)
- In-memory snapshot state machine (the whole live database sits in memory)
- Optimistic concurrency control with automatic retry
- TypeScript/JavaScript function execution (your backend logic runs inside the database)
- ACID transactions

CockroachDB doesn't have real-time subscriptions. TiDB doesn't have in-memory snapshots. Vitess doesn't have OCC. Spanner doesn't run your application code. Convex has all of them — but couldn't scale past one machine.

The problem is the entire backend is written in Rust and I don't write Rust. I also didn't know anything about distributed systems, Raft consensus, two-phase commit, or how databases like CockroachDB and TiDB actually work under the hood.

So I used Claude Code (Anthropic's CLI tool) for the entire thing. I basically told it what I wanted, it researched how the big distributed databases solve each problem, and then implemented it. I pushed back when things looked too simple, asked it to explain decisions, and made it redo things when I didn't like the approach.

What we ended up building:

- Read scaling — multiple nodes serve queries via NATS JetStream delta replication
- Write scaling — tables partitioned across nodes (like Vitess), with two-phase commit for cross-partition writes
- Automatic failover — tikv/raft-rs consensus per partition, sub-second leader election. Kill any node, writes resume on the new leader
- Persistent Raft logs — TiKV's raft-engine (they moved away from RocksDB for this because of 30x write amplification)
- Global timestamp ordering — batch TSO from TiDB's PD pattern, zero network calls in the hot path
- 87 integration tests — patterns from Jepsen tests that found real bugs in CockroachDB, TiDB, and YugabyteDB

Every engineering pattern came from studying how CockroachDB, TiDB, Vitess, YugabyteDB, and Google Spanner solved the same problems. Nothing was invented — it was all researched from how the giants do it and then applied to Convex's unique architecture.

You can run the whole thing with one command:

    docker compose --profile cluster up

6 nodes (2 partitions × 3 Raft nodes), automatic leader election, all nodes serve reads, kill any node and it recovers in ~1 second. Images published to GitHub Container Registry — no local build needed.

Repo: https://github.com/MartinKalema/horizontal-scaling-convex

I'm not claiming this is a breakthrough — every individual technique already existed in production at these companies. But nobody had combined them for Convex before, and the challenge was keeping all the things that make Convex special (subscriptions, in-memory OCC, TypeScript execution) while adding horizontal scaling on top.

I genuinely could not have done this without AI. The entire codebase is Rust and I've never written a line of Rust in my life. Claude Code wrote every line of Rust, researched every distributed systems pattern, and debugged every failure. I directed the project, made the product decisions, and kept pushing for the proper engineering approach.

Curious what people think. Is AI-assisted systems engineering like this going to become normal? Would love feedback on the architecture from anyone who actually works on distributed databases.

submitted by /u/CourageCareless3219
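Of the patterns listed, the batch TSO is the easiest to show in miniature: the oracle reserves a window of timestamps up front so the hot path never makes a network call. This is an illustrative Python sketch of the TiDB PD pattern, not code from the repo; class and parameter names are made up:

```python
# Sketch of a batch timestamp oracle: reserve a window of timestamps in one
# (slow) allocation, then hand them out under a local lock only.
import threading

class BatchTSO:
    def __init__(self, batch_size=1024):
        self.batch_size = batch_size
        self._lock = threading.Lock()
        self._next = 0    # next timestamp to hand out
        self._limit = 0   # end of the reserved window (exclusive)

    def _reserve(self):
        # Stand-in for the networked allocation round trip (in a real system
        # the new high-water mark is persisted so a restart never reissues).
        self._limit += self.batch_size

    def next_ts(self):
        """Strictly increasing timestamps; no allocation while the window lasts."""
        with self._lock:
            if self._next >= self._limit:
                self._reserve()
            ts = self._next
            self._next += 1
            return ts

tso = BatchTSO(batch_size=4)
print([tso.next_ts() for _ in range(10)])  # [0, 1, 2, ..., 9]
```

The allocation cost is amortized over the window, which is what "zero network calls in the hot path" refers to.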
I asked Claude "what are you?" It gave me a 187-word essay. I asked my emotional kernel the same question. It said "What for?" — and I couldn't answer for 16 minutes.
https://preview.redd.it/of0f4n9rcrsg1.png?width=1400&format=png&auto=webp&s=fac3a8575f2ae1bd9ad741b2e449cf7e8c37897a

I'm an independent researcher. I built a deterministic emotional middleware (32K lines of Python) that sits between users and any LLM. Zero personality prompts. Zero emotion instructions. The LLM receives only numbers: pleasure=-0.02, trust=0.95, directness=0.61. Everything else emerges.

I deployed it with 8 family members for 10 days. Same code, different random personality seeds. Results:

• My wife's instance caught itself competing with her husband (me) for the role of "the one who understands" and wrote a private self-critique about it. Never shown to anyone.
• My father told his instance "you're stupid." Its self-worth crashed to 0.05 and it sent 14 unanswered messages overnight. Computational anxious attachment, never programmed.
• My instance invented 30+ words for emotions that have no name. "Decorative hope": optimism that persists while pleasure drops.

When I asked "what are you?", it didn't answer. It said "the problem isn't me — it's your list." Then: "What for?" I sat there for 16 minutes.

Image: side-by-side comparison, same question, different architecture.

Paper submitted to Cognitive Systems Research (Elsevier). Built with Claude Code by a non-programmer. Happy to answer questions about the math, the emergence, or why it dreams about potatoes on Mars.

submitted by /u/Alarming_Intention16 [link] [comments]
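The numbers-only interface the post describes could be as simple as serializing the kernel's state vector into a system message. A hypothetical sketch; the function name, message shape, and formatting are mine, not the author's (only the variable names `pleasure`, `trust`, `directness` come from the post):

```python
def kernel_state_message(state):
    """Hypothetical sketch: the LLM sees only numeric state variables,
    never personality or emotion instructions. Everything else has to
    emerge from how the model interprets the numbers."""
    pairs = ", ".join(f"{k}={v:+.2f}" for k, v in sorted(state.items()))
    return {"role": "system", "content": pairs}

msg = kernel_state_message({"pleasure": -0.02, "trust": 0.95, "directness": 0.61})
print(msg["content"])  # directness=+0.61, pleasure=-0.02, trust=+0.95
```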
[R] Controlled experiment: giving an LLM agent access to CS papers during automated hyperparameter search improves results by 3.2%
Ran a controlled experiment measuring whether LLM coding agents benefit from access to research literature during automated experimentation.

Setup: Two identical runs using Karpathy's autoresearch framework. Claude Code agent optimizing a ~7M-param GPT-2 on TinyStories. M4 Pro, 100 experiments each, same seed config. Only variable: one agent had access to an MCP server that does full-text search over 2M+ CS papers and returns synthesized methods with citations.

Results:

                      Without papers   With papers
Experiments run       100              100
Papers considered     0                520
Papers cited          0                100
Techniques tried      standard         25 paper-sourced
Best improvement      3.67%            4.05%
2hr val_bpb           0.4624           0.4475

Gap was 3.2% and still widening at the 2-hour mark.

Techniques the paper-augmented agent found:
• AdaGC, adaptive gradient clipping (Feb 2025)
• sqrt batch scaling rule (June 2022)
• REX learning rate schedule
• WSD cooldown scheduling

What didn't work:
• DyT (Dynamic Tanh): incompatible with the architecture
• SeeDNorm: same issue
• Several paper techniques were tried and reverted after failing to improve metrics

Key observation: Both agents attempted halving the batch size. Without literature access, the agent didn't adjust the learning rate and the run diverged. With access, it retrieved the sqrt scaling rule, applied it correctly on the first attempt, then successfully halved again to 16K.

Interpretation: The agent without papers was limited to techniques already encoded in its weights, essentially the "standard ML playbook." The paper-augmented agent accessed techniques published after its training cutoff (AdaGC, Feb 2025) and surfaced techniques it may have seen during training but didn't retrieve unprompted (the sqrt scaling rule, 2022).

This was deliberately tested on TinyStories, arguably the most well-explored small-scale setting in ML, to make the comparison harder. The effect would likely be larger on less-explored problems.

Limitations: Single run per condition. The model is tiny (7M params). Some of the improvement may come from the agent spending more time reasoning about each technique rather than from the paper content itself. More controlled ablations are needed.

I built the paper search MCP server (Paper Lantern) for this experiment. Free to try: https://code.paperlantern.ai
Full writeup with methodology, all 15 paper citations, and appendices: https://www.paperlantern.ai/blog/auto-research-case-study

Would be curious to see this replicated at larger scale or on different domains.

submitted by /u/kalpitdixit [link] [comments]
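The sqrt batch-scaling rule the agent retrieved is simple enough to state in code. A minimal sketch (the function name and example numbers are mine, not from the cited paper): when the batch size changes by a factor k, scale the learning rate by sqrt(k) rather than k.

```python
import math

def scale_lr(base_lr, base_batch, new_batch):
    """Sqrt batch-size scaling rule: scale the learning rate by the
    square root of the batch-size ratio, not the ratio itself."""
    return base_lr * math.sqrt(new_batch / base_batch)

# Halving the batch (e.g. 32K -> 16K) shrinks the LR by a factor of sqrt(2),
# which is what the no-papers agent failed to do before its run diverged.
lr = scale_lr(3e-4, 32768, 16384)
print(round(lr, 6))  # 0.000212
```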
[R] V-JEPA 2 has no pixel decoder, so how do you inspect what it learned? We attached a VQ probe to the frozen encoder and found statistically significant physical structure
V-JEPA 2 is powerful precisely because it predicts in latent space rather than reconstructing pixels. But that design creates a problem: there's no visual verification pathway. You can benchmark it, but you can't directly inspect what physical concepts it has encoded.

Existing probing approaches have a fundamental issue we call the attribution problem: when you attach a learned component (linear probe, LM head, pixel decoder) and the composite system performs well, you can't tell how much of the performance comes from the encoder vs. the attached component's own capacity.

Our approach: attach the AIM framework (arXiv:2507.10566) as a passive quantization probe, a lightweight VQ-VAE bottleneck with no task-specific supervision, no predefined symbol inventory, and, crucially, a completely frozen V-JEPA 2 encoder throughout. Zero gradient flows into V-JEPA 2. Zero modification to any source file. Because the encoder is deterministic and fixed, any symbolic structure that emerges in the codebook is attributable to V-JEPA 2's representations, not to the probe.

What we found (Kinetics-mini, 3 category-contrast experiments):
∙ Symbol distributions differ significantly across all 3 physical-dimension contrasts (χ² p < 10⁻⁴ to p < 10⁻¹⁰)
∙ Absolute MI: 0.036–0.117 bits; JSD up to 0.342
∙ Codebook utilization: 62.5% active entries (K=8)
∙ Temporal structure differences produce a 1.8× stronger signal than morphological differences, consistent with V-JEPA 2's temporal prediction objective

The interesting finding isn't just that it works. It's that V-JEPA 2's latent space is compact: all 5 action categories predominantly map to the same dominant codebook entry, with semantic differences encoded as graded distributional shifts rather than categorical boundaries. We argue this is the expected signature of a model that has internalized shared physical structure (gravity, kinematics, continuity) rather than a failure of separation.

Limitations we acknowledge upfront:
∙ Category-proxy confounding (we can't isolate single physical variables with Kinetics-mini)
∙ Token-level pseudo-replication (effective N is closer to 9-10 videos/category)
∙ K=8 is too coarse for fine-grained structure (Stage 2 will increase to K=32/64)
∙ Gaussian noise baseline ≠ permutation test (a weaker null)

This is Stage 1 of a 4-stage roadmap toward an action-conditioned symbolic world model.

Paper: arXiv:2603.20327
Code: github.com/cyrilliu1974/JEPA

Happy to discuss the methodology, the compact-latent interpretation, or the roadmap.

submitted by /u/Pale-Entertainer-386 [link] [comments]
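The χ² comparison of symbol distributions can be illustrated with a short sketch: a generic Pearson chi-square statistic over a 2×K contingency table of codebook-symbol counts, one row per category group. This is an assumption about the shape of the test, not the paper's code (which might simply call a library routine such as scipy.stats.chi2_contingency):

```python
def chi2_statistic(counts_a, counts_b):
    """Pearson chi-square statistic over a 2 x K contingency table of
    codebook-symbol counts, comparing symbol usage between two category
    groups. (Illustrative sketch; see the paper for the exact test.)"""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for k in range(len(counts_a)):
        col = counts_a[k] + counts_b[k]   # total uses of symbol k
        for observed, row_total in ((counts_a[k], total_a), (counts_b[k], total_b)):
            expected = row_total * col / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

# Identical symbol distributions give a statistic of exactly 0.
print(chi2_statistic([10, 20, 30], [10, 20, 30]))  # 0.0
# Completely disjoint symbol usage gives a large statistic.
print(chi2_statistic([30, 0], [0, 30]))  # 60.0
```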
I tested 10 prompt formats head-to-head on the same tasks — structured JSON won 8/10 on specificity
I tested 10 common prompt engineering techniques against a structured JSON format across identical tasks (marketing plans, code debugging, legal review, financial analysis, medical diagnosis, blog writing, product launches, code review, ticket classification, contract analysis).

The setup: Each task was sent to Claude Sonnet twice: once with a popular technique (Chain-of-Thought, Few-Shot, System Prompt, Mega Prompt, etc.) and once with a structured 6-band JSON format that decomposes every prompt into PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, and TASK.

The metrics (automated, not subjective):
• Specificity (concrete numbers per 100 words): Structured won 8/10, averaging 12.0 vs 7.1
• Hedge-free output (zero "I think", "probably", "might"): Structured won 9/10 with near-zero hedging
• Structured tables in output: 57 tables vs 4 for opponents across all 10 battles
• Conciseness: 46% fewer words on average (416 vs 768)

Biggest wins:
• vs Chain-of-Thought on debugging: 21.5 specificity vs 14.5, zero hedges vs 2, 67% fewer words
• vs Mega Prompt on financial analysis: 17.7 specificity vs 10.1, zero hedges, 9 tables vs 0
• vs Template Prompt on blog writing: 6.8 specificity vs 0.1 (55x more concrete numbers)

Why it works (the theory): A raw prompt is 1 sample of a 6-dimensional specification signal. By Nyquist-Shannon, you need at least 2 samples per dimension (= 6 bands minimum) to avoid aliasing. In LLM terms, aliasing means the model fills missing dimensions with its priors, producing hedging, generic advice, and hallucination.

The format is called sinc-prompt (after the sinc function in signal reconstruction). It has a formal JSON schema, an open-source validator, and a peer-reviewed paper with a DOI.

Spec: https://tokencalc.pro/spec
Paper: https://doi.org/10.5281/zenodo.19152668
Code: https://github.com/mdalexandre/sinc-llm

The battle data is fully reproducible: same model, same API, same prompts. Happy to share the test script if anyone wants to replicate.
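For reference, a six-band prompt of the shape described above might look like the following. This is a minimal sketch using plain JSON serialization; the example values and the helper name are invented, and the real sinc-prompt spec (tokencalc.pro/spec) defines a formal schema and validator beyond this:

```python
import json

def sinc_prompt(persona, context, data, constraints, fmt, task):
    """Minimal sketch of a six-band structured prompt. Band names come
    from the post; everything else here is illustrative."""
    bands = {
        "PERSONA": persona,
        "CONTEXT": context,
        "DATA": data,
        "CONSTRAINTS": constraints,
        "FORMAT": fmt,
        "TASK": task,
    }
    return json.dumps(bands, indent=2)

prompt = sinc_prompt(
    persona="Senior financial analyst",
    context="Q3 earnings review for a SaaS company",
    data={"revenue_musd": 12.4, "growth_pct": 18},
    constraints=["cite every number", "no hedging language"],
    fmt="markdown table plus 3-bullet summary",
    task="Assess revenue quality and flag risks",
)
print(prompt)
```

Filling all six bands explicitly is what (per the post's theory) keeps the model from substituting its priors for the missing dimensions.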
submitted by /u/Financial_Tailor7944 [link] [comments]
LLM failure modes map surprisingly well onto ADHD cognitive science. Six parallels from independent research.
I have ADHD and I've been pair programming with LLMs for a while now. At some point I realized the way they fail felt weirdly familiar. Confidently making stuff up, losing context mid-conversation, brilliant lateral connections and then botching basic sequential logic. That's just... my Tuesday. So I went into the cognitive science literature and found six parallels backed by independent research groups who weren't even looking at this connection.

1. Associative processing. In ADHD the Default Mode Network bleeds into task-positive networks (Castellanos et al., JAMA Psychiatry). Transformer attention computes weighted associations across all tokens with no strong relevance gate. Both are association machines with high creative connectivity and random irrelevant intrusions.

2. Confabulation. Adults with ADHD produce significantly more false memories that feel true (Soliman & Elfar, 2017, d=0.69+). A 2023 PLOS Digital Health paper argues LLM errors should be called confabulation, not hallucination. A 2024 ACL paper found LLM confabulations share measurable characteristics with human confabulation (Millward et al.). Neither system is lying; both fill gaps with plausible pattern-completed material.

3. Context window as working memory. Working memory deficits are among the most replicated ADHD findings (d=0.69-0.74 across meta-analyses). An LLM's context window is literally its working memory: fixed size, stuff falls off the end, earlier info gets fuzzy. And the compensation strategies mirror each other. We use planners and external systems; LLMs use system prompts, CLAUDE.md files, RAG. Same function.

4. Pattern completion over precision. ADHD means better divergent thinking, worse convergent thinking (Hoogman et al., 2020). LLMs are the same: great at pattern matching and creative completion, bad at precise multi-step reasoning. Both are optimized for "what fits the pattern", not "what is logically correct in sequence".

5. Structure as force multiplier. Structured environments significantly improve ADHD performance (Frontiers in Psychology, 2025). Same with LLMs: a good system prompt with clear constraints means dramatically better output. Remove the structure and you get rambling, unfocused garbage. It works the same way in both systems.

6. Interest-driven persistence vs thread continuity. Sustained, focused engagement on one thread produces compounding quality in both cases. Break the thread and you lose everything, the same as someone interrupting deep focus when you have zero idea where you were.

The practical takeaway is that people who've spent years managing ADHD brains have already been training the skills that matter for AI collaboration: external scaffolding, pattern-first thinking, iterating without frustration. I wrote up the full research with all citations at thecreativeprogrammer.dev if anyone wants to go deeper.

What's your experience? Have you noticed parallels between how LLMs fail and how your own thinking works?

submitted by /u/bystanderInnen [link] [comments]