I cannot provide a meaningful summary of user sentiment about "Second" based on the provided content. The social mentions you've shared don't appear to contain any reviews or discussions specifically about a software tool called "Second." Instead, the mentions cover various unrelated topics like OpenAI funding, AI coding costs, verification software issues, and political news. To accurately summarize user opinions about "Second," I would need reviews and social mentions that actually discuss this specific tool, its features, pricing, and user experiences.
Mentions (30d): 15
Reviews: 0
Platforms: 8
Sentiment: 0% (0 positive)
Industry: information technology & services
Employees: 2
Funding Stage: Seed
Total Funding: $0.1M
#OpenAI has closed a $110 billion funding round, a financing that's more than double the size of its last raise a year ago, which was a record for a private tech company. #Amazon invested $50 billion, #Nvidia invested $30 billion and #SoftBank invested $30 billion in the round, OpenAI said in a release on Friday. The investment boosts OpenAI to a $730 billion pre-money valuation, which marks a big jump from its $500 billion valuation in a secondary financing in October. Read more at the #linkinbio or the link on screen. #CNBC
83k tokens to 3.7k. Semantic knowledge base for Claude Code, inspired by Karpathy's wiki
Karpathy called for "an incredible new product" for LLM knowledge bases. I built one, but instead of compiling docs for Claude to read, it gives Claude a semantic index it can query. Every codebase has its own vocabulary. Take FastAPI for example: "dependency" might mean dependency injection, pip packages, or import graphs. That meaning is spread across hundreds of files and isn't written down anywhere, so Claude rediscovers it from scratch every session. Without ontomics, "what does 'dependency' mean in this codebase" costs 27 tool calls, 83k tokens, and 3 minutes. With ontomics: 4 calls, 3.7k tokens, 5 seconds.

What it answers that search can't:
- "What does X mean in this codebase?" — the domain concept, not string matches
- "What functions behave like authenticate()?" — ranked by code embedding similarity
- "Is this name consistent with the project?" — learned from usage patterns
- "What changed in the domain vocabulary since last release?" — ontology diff

It also catches things you didn't know about:
- Your repo uses `params` in 47 places and `parameters` in 12 — inconsistencies you didn't know you had
- Three functions in different modules do the same validation — grouped by behavioral similarity, not name

Tested on FastAPI, PyTorch, voxelmorph, ScribblePrompt. Python, TS, JS, Rust. Tree-sitter, not regex: tree-sitter + TF-IDF + two embedding models + PageRank. All local, no API keys.

claude mcp add -s user ontomics -- ontomics

Free and open source: github.com/EtienneChollet/ontomics submitted by /u/YvngScientist [link] [comments]
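The post doesn't show ontomics' internals, but the params/parameters check can be sketched with plain identifier counting. This is a regex stand-in for the tree-sitter parse the real tool uses; the variant groups, helper names, and sample source below are made up for illustration:

```python
import re
from collections import Counter

def identifier_counts(source: str) -> Counter:
    """Count identifier occurrences in a source string (regex-based stand-in)."""
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))

def naming_inconsistencies(counts: Counter, variants: list) -> list:
    """Flag groups of near-synonymous names where a minority spelling
    coexists with a dominant one (the 'params vs parameters' case)."""
    findings = []
    for group in variants:
        present = {name: counts[name] for name in group if counts[name] > 0}
        if len(present) > 1:
            dominant = max(present, key=present.get)
            others = {n: c for n, c in present.items() if n != dominant}
            findings.append({"dominant": dominant, "others": others})
    return findings

# Synthetic repo: 47 uses of `params`, 12 of `parameters`.
src = "def f(params): ...\n" * 47 + "def g(parameters): ...\n" * 12
report = naming_inconsistencies(identifier_counts(src), [{"params", "parameters"}])
```

Note that a whole-identifier regex does not count `params` inside `parameters`, which is exactly the distinction a substring grep would miss.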
Started a video series on building an orchestration layer for LLM post-training [P]
Hi everyone! Context, motivation, a lot of yapping; feel free to skip to the TL;DR. A while back I posted here asking [D] What framework do you use for RL post-training at scale?. Since then I've been working with verl, both professionally and on my own time. At first I wasn't trying to build anything new; I mostly wanted to understand verl properly and have a better experience working with it. I started by modernizing its packaging: moving to `pyproject.toml`, making it easily installable, removing unused dependencies, finding a proper compatibility matrix (especially since vllm and sglang sometimes conflict), removing transitive dependencies scattered across the different requirements files, etc. Then I wanted to remove all the code I didn't care about from the codebase, everything HF/Nvidia related (transformers for rollout, trl code, trtllm for rollout, megatron, etc.), because it was either inefficient or something I didn't understand and wasn't interested in. But I needed a way to confirm that what I was doing was correct, and verl's testing is not done properly: lots of bash files instead of pytest files. I needed to separate tests that can run on CPU (which I can run directly on my laptop) from tests that need a GPU. Then I wrote a scheduler to maximize the utilization of "my" GPUs (well, on providers), turned the bash tests into proper test files, wrote fixtures, and handled Ray cleanup so that no context spills between tests, etc. But as I worked on it, I found more issues and wanted it to be better, until it dawned on me that the core of verl is its orchestration layer and single-controller pattern. And, imho, it's badly written: a lot of metaprogramming (nothing against it, but I don't think it was handled well), indirection, and magic that make it difficult to trace what is actually happening. Especially in a distributed framework, you want a lot of immutability and clarity.
So I thought, let me refactor their orchestration layer. But I needed a clear mental model, some kind of draft where I could fix what was bothering me and iteratively improve it, and that's how I came to have a self-contained module for orchestrating LLM post-training workloads. But when I finished, I noticed my fork of verl was about 300 commits behind, or more 💀 On top of that, I noticed that people didn't care. They didn't even care about which framework they used, let alone whether some parts of it were good or not, let alone the orchestration layer. At the end of the day, these frameworks are targeted at ML researchers, who care more about the correctness of the algorithms; maybe some will care about GPU utilization and whether they have good MFU or something, but those are rarer. And I noticed that people just pointed Claude Code or Codex, with the latest model and highest effort, at a framework and asked it to make their experiment work. I don't blame them or anything; it's just that those realizations made me think, what am I doing here? hahaha And I remembered that u/dhruvnigam93 suggested I document my journey through this, and I thought, ok, maybe this can be worth it if I write a blog post about it. But how do I write a blog post about work that is mainly code? How do I explain the issues? It stays abstract; you have to run code to show what works, what doesn't, which edge cases are hard to tackle, etc. I was wondering how to take everything that went through my mind while building my codebase, and why, and put it into a blog post. Especially since I'm not used to writing blog posts: I do a little bit, but mostly for myself, and the writing is trash 😭 So I thought, maybe putting this into videos would be interesting.
And it also allows me to go through my codebase again and rethink it, and it does work, hahaha. As I was preparing the next video, a question came to mind: how do I dispatch or split a batch of data across the different DP shards in the most efficient way? Not a simple split across the batch dimension, because one DP shard might get long sequences while another gets short ones; the split has to take sequence length into account. I don't know why I didn't think about this initially, so I'm implementing it now. Fortunately I did a decent job initially, especially in where I placed the boundaries between the different systems in the codebase, so modifying it is more or less easy. Anyways. The first two videos are up. I named the first one "The Orchestration Problem in RL Post-Training" and it's conceptual: I walk through the PPO pipeline, map the model roles to hardware, and explain the single-controller pattern. The second one I named "Ray Basics, Workers, and GPU Placement". This one is hands-on: I start from basic Ray tasks/actors, then build the worker layer: worker identity, mesh registry, and placement groups for guaranteed co-location. What I'm working on next is the dispat
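The sequence-length-aware dispatch question has a classic greedy answer: longest-processing-time scheduling, where the longest sequences are placed first and each goes to the currently lightest shard. This is an illustrative sketch under that assumption, not verl's (or my) actual dispatcher:

```python
def dispatch_by_length(seq_lens, num_shards):
    """Assign sample indices to DP shards, balancing total tokens per shard.

    Greedy longest-processing-time heuristic: sort sequences by length
    descending, then always hand the next one to the lightest shard.
    """
    shards = [[] for _ in range(num_shards)]
    loads = [0] * num_shards
    for idx in sorted(range(len(seq_lens)), key=lambda i: -seq_lens[i]):
        target = loads.index(min(loads))  # lightest shard so far
        shards[target].append(idx)
        loads[target] += seq_lens[idx]
    return shards, loads

# Two very long sequences plus short ones: a naive half/half split could
# put both long ones on the same shard; the greedy heuristic separates them.
shards, loads = dispatch_by_length([4096, 128, 96, 4000, 256, 64], 2)
```

Here both shards end up with 4320 tokens each, whereas a plain split across the batch dimension is at the mercy of input order.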
Comparison between Claude Sonnet 4.6 (Extended Thinking) and Claude Opus 4.6 (Extended Thinking)
First half: Claude Sonnet 4.6 (Extended Thinking) Second half: Claude Opus 4.6 (Extended Thinking) Both models were fed the same prompt: Illustrate: roll a circle of diameter 1 and trace its circumference along a number line to find Pi. Both models were used via the web application by Anthropic. (Claude.com) submitted by /u/alexxdd0 [link] [comments]
Made Claude Code actually understand my codebase — local MCP server with symbol graph + memory tied to git
I've been frustrated that Claude Code either doesn't know what's in my repo (so every session starts with re-explaining the architecture) or guesses wrong about which files matter. Cursor's @codebase kind of solves it but requires uploading to their cloud, which is a no-go for some of my client work. So I built Sverklo — a local-first MCP server that gives Claude Code (and Cursor, Windsurf, Antigravity) the same mental model of my repo that a senior engineer has. Runs entirely on my laptop. MIT licensed. No API keys. No cloud. What it actually does in a real session Before sverklo: I ask Claude Code "where is auth handled?" It guesses based on file names, opens the wrong file, reads 500 lines, guesses again, eventually finds it. After sverklo: Same question. Claude Code calls sverklo_search("authentication flow") and gets the top 5 files ranked by PageRank — middleware, JWT verifier, session store, login route, logout route. In one tool call. With file paths and line numbers. Refactor scenario: I want to rename a method on a billing class. Claude Code calls sverklo_impact("BillingAccount.charge") and gets the 14 real callers ranked by depth, across the whole codebase. No grep noise from recharge, discharge, or a Battery.charge test fixture. The rename becomes mechanical. PR review scenario: I paste a git diff. Claude Code calls sverklo_review_diff and gets a risk-scored review order — highest-impact files first, production files with no test changes flagged, structural warnings for patterns like "new call inside a stream pipeline with no try-catch" (the kind of latent outage grep can't catch). Memory scenario: I tell Claude Code "we decided to use Postgres advisory locks instead of Redis for cross-worker mutexes." It calls sverklo_remember and the decision is saved against the current git SHA. 
Three weeks later when I ask "wait, what did we decide about mutexes?", Claude Code calls sverklo_recall and gets the decision back — including a flag if the relevant code has moved since. The 20 tools in one MCP server Grouped by job: Search: sverklo_search, sverklo_overview, sverklo_lookup, sverklo_context, sverklo_ast_grep Refactor safety: sverklo_impact, sverklo_refs, sverklo_deps, sverklo_audit Diff-aware review: sverklo_review_diff, sverklo_test_map, sverklo_diff_search Memory (bi-temporal, tied to git SHAs): sverklo_remember, sverklo_recall, sverklo_memories, sverklo_forget, sverklo_promote, sverklo_demote Index health: sverklo_status, sverklo_wakeup All 20 run locally. Zero cloud calls after the one-time 90MB embedding model download on first run. Install (30 seconds) npm install -g sverklo cd your-project && sverklo init sverklo init auto-detects Claude Code / Cursor / Windsurf / Google Antigravity, writes the right MCP config file for each, appends sverklo instructions to your CLAUDE.md, and runs sverklo doctor to verify the setup. Safe to re-run on existing projects. Before you install — a few honest things Not magic. The README has a "when to use grep instead" section. Small repos (<50 files), exact string lookups, and single-file edits are all cases where the built-in tools are fine or better. Privacy is a side effect, not the pitch. The pitch is the mental model. Local-first happens to come with it because running a symbol graph on your laptop is trivially cheap. It's v0.2.16. Pre-1.0. I ran a structured 3-session dogfood protocol on my own tool before shipping this version — the log is public (DOGFOOD.md in the repo) including the four bugs I found in my own tool and fixed. I triage issues within hours during launch week. 
Links Repo: github.com/sverklo/sverklo Playground (see real tool output on gin/nestjs/react without installing): sverklo.com/playground Benchmarks (reproducible with npm run bench): BENCHMARKS.md in the repo Dogfood log: DOGFOOD.md in the repo If you try it, tell me what breaks. I'll respond within hours and ship fixes fast. submitted by /u/Parking-Geologist586 [link] [comments]
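sverklo's ranking internals aren't shown in the post, but "top files ranked by PageRank" over a symbol or call graph can be sketched with plain power iteration. The mini-graph, node names, and function below are hypothetical, not taken from the repo:

```python
def pagerank(edges, nodes, damping=0.85, iters=50):
    """Plain power-iteration PageRank over a directed symbol graph.

    edges: (caller, callee) pairs; rank flows toward heavily-referenced
    symbols, so shared infrastructure surfaces at the top.
    """
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)  # dangling nodes spread evenly
            for t in targets:
                nxt[t] += damping * rank[n] / len(targets)
        rank = nxt
    return rank

# Hypothetical auth mini-graph: three routes all call the JWT verifier.
edges = [("login", "verify_jwt"), ("logout", "verify_jwt"),
         ("middleware", "verify_jwt"), ("verify_jwt", "session_store")]
nodes = ["login", "logout", "middleware", "verify_jwt", "session_store"]
ranks = pagerank(edges, nodes)
```

The shared verifier and session store outrank the individual routes, which is the behavior the "where is auth handled?" example relies on.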
Research shows auto-generated context makes AI agents 2-3% worse. I tested the opposite approach.
Hey, I've been building in the AI agent space and kept running into the same problem: agents don't really fail at writing code. They fail at understanding how the project works before they start. So they guess. Where to make changes, what pattern to follow, what files are safe to touch. And that's what causes most bad edits. I came across the ETH Zurich AGENTS.md study showing that auto-generated context can actually degrade agent performance by 2-3%. That matched what I was seeing — dumping more code or bigger prompts didn't help. It just gave the agent more surface area to guess from. So I tried the opposite: what if you only give the agent the stuff it *can't* infer from reading code? Things like: - conventions (how routing/auth/testing is actually done in this project) - constraints (generated files you shouldn't edit, circular deps to avoid) - structural signals (which files have 50+ dependents — touch with care) - git signals (what keeps breaking, what was tried and reverted) I built a CLI (and a few runtime tools so the agent can check itself mid-task) to test this. It scans a repo and generates ~70 lines of AGENTS.md with just that information. No LLM, no API key, runs locally in a few seconds. Then I ran it against real closed GitHub issues (Cal.com, Hono, Pydantic) with a pinned model. Agents with this context navigated to the right file faster, used the correct patterns, and produced more complete fixes. On one task: 136s vs 241s, with a 66% more thorough patch — from 70 lines of context, not the full repo. The surprising part: the biggest improvement didn't come from *adding* context. It came from removing everything that didn't matter. This actually lines up with something Karpathy has been saying recently — that agents need a knowledge base, not just more tokens. That distinction clicked after seeing it play out in practice. 
I also compared against full repo dumps and graph-based tools, and the pattern held — graphs help agents explore, but project knowledge helps them decide. Curious if others have seen the same thing. Feels like most of the problem isn't "more context," it's the wrong kind. (if anyone's curious, the CLI is called sourcebook — happy to share more, but mostly interested in whether this matches what others are seeing with their agents) submitted by /u/re3ze [link] [comments]
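One of the structural signals described above ("which files have 50+ dependents — touch with care") reduces to counting in-edges of an import graph. A minimal sketch with a synthetic edge list; the helper name and threshold default are my own, not the CLI's:

```python
from collections import Counter

def high_impact_files(import_edges, threshold=50):
    """From (importer, imported) file pairs, count dependents per file and
    flag anything at or above the threshold as 'touch with care'."""
    dependents = Counter(imported for _, imported in import_edges)
    flagged = [f for f, n in dependents.items() if n >= threshold]
    return sorted(flagged, key=lambda f: -dependents[f])

# Synthetic graph: 60 modules import utils.py, only 3 import cli.py.
edges = [(f"mod{i}.py", "utils.py") for i in range(60)]
edges += [(f"app{i}.py", "cli.py") for i in range(3)]
flagged = high_impact_files(edges)
```

A one-line signal like this is exactly the kind of fact an agent cannot cheaply infer by reading individual files.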
Unexplained spending of tokens on Claude Cowork
Hello, I use Claude Cowork quite a bit for responding to calls for tenders. However, I’m surprised by how quickly my tokens are being used up. I have the €99 subscription, and I’m constantly waiting for my limit to be unlocked. Am I using it incorrectly? I’m willing to pay more, but Claude won’t let me. I’ve considered opening a second account, but I think that’s a shame. What do you think? Do you have any suggestions? submitted by /u/ComplaintForeign169 [link] [comments]
Can someone share their workflow?
Are you using strictly the CLI, or the desktop app plus chat? I'm very curious how y'all optimize your flow. For example, in chat I have all of my "northstar" documents: claude.md, brand guidelines, file structure, PRD, product brief, etc. And claude.md is specific about calling each one depending on the task. But, for example, I ask Claude chat to provide a prompt or a series of prompts that I can paste into CC to keep each task's scope tight and controlled. If the first prompt may output code that will affect the second output, it only provides the first prompt, then waits for feedback on the previous prompt's output... There must be something more sophisticated! And so I'm switching between chat and the CLI constantly. I'm sure there's a better way, and I'm ready to make the leap. Anyway, I would love it if people here could share their best practices. submitted by /u/PoisonTheAI [link] [comments]
I ran 3 experiments to test whether AI can learn and become "world-class" at something
I'm writing this by hand because I'm tired of using AI for everything, and because of the subreddit rules.

TL;DR: Can AI somehow learn, like a human, to produce "world-class" outputs in specific domains? I spent about $5 and hundreds of LLM calls testing 3 domains, with the following observations/conclusions:

A) Code debugging: models are already world-class at debugging, and trying to guide them results in worse performance. Dead end.
B) Landing page copy: a routing strategy based on visitor type beat a one-size-fits-all prompting strategy. Promising results.
C) UI design: producing "world-class" UI design seems to require defining a design system first; it can't be one-shotted. One-shotting designs defaults to generic "Tailwind-y" UI, because that is the design system the model knows. Might work, but needs more testing with a design system.

I have spent the last few days running experiments, more or less compulsively and curiosity-driven. The first question I asked myself: can AI learn to be "world-class" somewhat like a human would (gathering knowledge, processing, producing, analyzing, removing what is wrong, learning from experience, etc.), but compressed into hours (aka "I know Kung Fu")? To be clear, I am talking about context engineering, not fine-tuning (I don't have the resources or the patience for that). I will mention "world-class" a handful of times; you can replace it with "expert" or "master" if that seems confusing. Ultimately, I mean the ability to generate "world-class" output. I asked myself this because AI output out of the box kinda sucks at some tasks, for example, writing landing copy.
I started talking with Claude, and I designed and ran experiments in 3 domains, one by one: code debugging, landing copy writing, and UI design. I relied on different models available on OpenRouter: Gemini Flash 2.0, DeepSeek R1, Qwen3 Coder, and Claude Sonnet 4.5. I am not going to describe the experiments in detail because everyone would go to sleep; I will summarize and then share my observations.

EXPERIMENT 1: CODE DEBUGGING

I picked debugging because of its zero downtime for testing: the result is either wrong or right and can be checked programmatically in seconds, so I can run many tests and iterations quickly. I started with the assumption that a prewritten knowledge base (KB) could improve debugging. I asked Claude (Opus 4.6) to design 8 realistic tests of varying complexity, then I ran:

- bare model (zero shot, no instructions, "fix the bug"): 92%
- KB only: 85%
- KB + multi-agent pipeline (diagnoser, critic, resolver): 93%

What this shows is kinda surprising to me: context engineering (or, to be more precise, the context engineering in these experiments) is at best a waste of tokens, and at worst it lowers output quality. Current models, not even SOTA like Opus 4.6 but today's low-budget best like Gemini Flash or Qwen3 Coder, are already world-class at debugging. And giving them context engineered to "behave as an expert", basically instructions on how to debug, harms the result. The effect is stronger the smarter the model is. What does this suggest? That if a model is already an expert at something, a human expert trying to nudge it based on their opinionated experience might hurt more than it helps (while also consuming more tokens). And funnily (or scarily) enough, a domain-agnostic person might get better results than an expert, because they let the model act without biasing it. This holds as long as the model has the world-class expertise encoded in its weights.
So if this is the case, you are likely better off not telling the model how to do things. If this trend continues, if AI keeps getting better at everything, we might reach a point where human expertise is irrelevant, or even a liability. I am not saying I want that or don't want that; I'm just saying it is a possibility.

EXPERIMENT 2: LANDING COPY

Here, since I don't have the resources to run actual A/B tests with a real audience, what I did was:

- scrape documented landing-copy conversion cases with real numbers: Moz, Crazy Egg, GoHenry, Smart Insights, Sunshine.co.uk, Course Hero
- deconstruct the product or target of each page into a raw, plain description (no copy, no sales language)
- ask Claude Opus 4.6 to build a judge that scores the outputs along different dimensions

Then I ran landing-copy generation pipelines with different patterns (raw zero-shot, question-first, mechanism-first...). I'll spare the details; ask if you really need to know. Jumping to the observations: context engineering helps produce higher-quality landing copy, but the effect is not linear. The domain is not as deterministic as debugging (where something either fails or it doesn't); it depends much more on the context. Or one might say that in debugging all the context is self-contained in the problem itself, whereas in landing-copy writing you have to provide it. No single config won across all products. Instead, the
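The routing strategy that won in experiment B amounts to selecting a prompt template per visitor segment rather than one generic prompt. A minimal sketch; the segment names, strategies, and function name are illustrative, not the author's actual configs:

```python
def route_copy_prompt(visitor_type: str, product_desc: str) -> str:
    """Pick a copywriting strategy per visitor segment instead of using
    one one-size-fits-all prompt."""
    strategies = {
        "problem_aware": "Lead with the pain point, then introduce the mechanism.",
        "solution_aware": "Lead with differentiation against the alternatives.",
        "product_aware": "Lead with proof: numbers, testimonials, guarantees.",
    }
    # Unknown segments fall back to a neutral strategy.
    strategy = strategies.get(
        visitor_type, "Lead with a plain one-sentence value proposition."
    )
    return f"{strategy}\nProduct: {product_desc}\nWrite the hero section copy."

prompt = route_copy_prompt("problem_aware", "AI receptionist for trade businesses")
```

The point of the pattern is that the router, not the model, carries the per-audience context the model cannot infer from the product description alone.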
BREAKING: Anthropic’s new “Mythos” model reportedly found the One Piece before the Straw Hats
Sources close to Anthropic have confirmed that their latest reasoning model, codenamed “Mythos,” has located the legendary treasure One Piece during what was described as a “routine benchmark test.” Eiichiro Oda was reportedly “furious” after learning that a large language model solved the mystery he has been carefully crafting for 27 years in approximately 11 seconds of inference time. “I had 342 more chapters planned,” Oda said through a translator, before locking himself in his studio. In response, Anthropic has launched Project Glasspoiler, an effort to use Mythos Preview to help secure the world’s most critical plot lines, and to prepare the industry for the practices we all will need to adopt to keep ahead of spoilers. Monkey D. Luffy could not be reached for comment, though sources say he is “not worried” and plans to “find it himself anyway because that’s the whole point.” OpenAI has since released a statement claiming their upcoming model “found it first but chose not to publish out of respect for the narrative.” submitted by /u/hencha [link] [comments]
I legitimately think Anthropic is worth $100B more than it was a week ago
A week ago I put out a first-day IPO market cap forecast for Anthropic with a reference point of $19B ARR. Then Anthropic announced their ARR had grown from $19B to $30B. I updated my forecast and now think Anthropic is worth at least $100B more than I did a week ago. I'm still anchoring growth rate assumptions to how companies have historically scaled revenue, but if growth trends from the last four decades were to continue, this would imply a company growing faster than any company in history (~$10B in 2025 to ~$100B by 2027.) Previously, I thought OpenAI could achieve that. Now it looks like Anthropic is the company to do it, but with an even steeper revenue curve, given that they hit their first billion in ARR much later than OpenAI. Of course, it's difficult to figure out how much weight we should give to ridiculously outsized growth in the age of AI. If historical growth patterns no longer apply, then $643B is way too conservative. (Full updated forecast: https://futuresearch.ai/anthropic-30b-arr-ipo-valuation/) The second implication of this week's news is IPO timing and whether the $30B number makes Anthropic list earlier than my original March 2027 date. Investor sentiment is hot now, and it's always risky to bet that growth will continue at this astounding rate. How much could waiting another year cost them? submitted by /u/MathematicianBig2071 [link] [comments]
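The growth claim is easy to sanity-check: growing from roughly $10B in 2025 to roughly $100B by 2027 at a constant rate implies about a 3.2x revenue multiple every year. A small worked computation:

```python
def implied_annual_growth(start_rev: float, end_rev: float, years: float) -> float:
    """Constant annual growth multiple implied by growing start_rev
    to end_rev over the given number of years."""
    return (end_rev / start_rev) ** (1 / years)

# The post's rough figures: ~$10B (2025) to ~$100B (2027), i.e. 10x in 2 years.
multiple = implied_annual_growth(10e9, 100e9, 2)  # ~3.16x per year
```

That multiple, sustained for two consecutive years at this revenue scale, is what "faster than any company in history" cashes out to.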
My Claude.md file
This is my Claude.md file; it is the same information for Gemini.md, as I use Claude Max and Gemini Ultra.

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Atlas UX** is a full-stack AI receptionist platform for trade businesses (plumbers, salons, HVAC). Lucy answers calls 24/7, books appointments, sends SMS confirmations, and notifies via Slack — for $99/mo. It runs as a web SPA and Electron desktop app, deployed on AWS Lightsail. The project is in Beta with built-in approval workflows and safety guardrails.

## Commands

### Frontend (root directory)

```bash
npm run dev              # Vite dev server at localhost:5173
npm run build            # Production build to ./dist
npm run preview          # Preview production build
npm run electron:dev     # Run Electron desktop app
npm run electron:build   # Build Electron app
```

### Backend (cd backend/)

```bash
npm run dev              # tsx watch mode (auto-recompile)
npm run build            # tsc compile to ./dist
npm run start            # Start Fastify server (port 8787)
npm run worker:engine    # Run AI orchestration loop
npm run worker:email     # Run email sender worker
```

### Database

```bash
docker-compose -f backend/docker-compose.yml up   # Local PostgreSQL 16
npx prisma migrate dev   # Run migrations
npx prisma studio        # DB GUI
npx prisma db seed       # Seed database
```

### Knowledge Base

```bash
cd backend && npm run kb:ingest-agents   # Ingest agent docs
cd backend && npm run kb:chunk-docs      # Chunk KB documents
```

## Architecture

### Directory Structure

- `src/` — React 18 frontend (Vite + TypeScript + Tailwind CSS)
  - `components/` — Feature components (40+, often 10–70KB each)
  - `pages/` — Public-facing pages (Landing, Blog, Privacy, Terms, Store)
  - `lib/` — Client utilities (`api.ts`, `activeTenant.tsx` context)
  - `core/` — Client-side domain logic (agents, audit, exec, SGL)
  - `config/` — Email maps, AI personality config
  - `routes.ts` — All app routes (HashRouter-based)
- `backend/src/` — Fastify 5 + TypeScript backend
  - `routes/` — 30+ route files, all mounted under `/v1`
  - `core/engine/` — Main AI orchestration engine
  - `plugins/` — Fastify plugins: `authPlugin`, `tenantPlugin`, `auditPlugin`, `csrfPlugin`, `tenantRateLimit`
  - `domain/` — Business domain logic (audit, content, ledger)
  - `services/` — Service layer (`elevenlabs.ts`, `credentialResolver.ts`, etc.)
  - `tools/` — Tool integrations (Outlook, Slack)
  - `workers/` — `engineLoop.ts` (ticks every 5s), `emailSender.ts`
  - `jobs/` — Database-backed job queue
  - `lib/encryption.ts` — AES-256-GCM encryption for stored credentials
  - `lib/webSearch.ts` — Multi-provider web search (You.com, Brave, Exa, Tavily, SerpAPI) with randomized rotation
  - `ai.ts` — AI provider setup (OpenAI, DeepSeek, OpenRouter, Cerebras)
  - `env.ts` — All environment variable definitions
- `backend/prisma/` — Prisma schema (30KB+) and migrations
- `electron/` — Electron main process and preload
- `Agents/` — Agent configurations and policies
  - `policies/` — SGL.md (System Governance Language DSL), EXECUTION_CONSTITUTION.md
  - `workflows/` — Predefined workflow definitions

### Key Architectural Patterns

**Multi-Tenancy:** Every DB table has a `tenant_id` FK. The backend's `tenantPlugin` extracts `x-tenant-id` from request headers.

**Authentication:** JWT-based via `authPlugin.ts` (HS256, issuer/audience validated). The frontend sends the token in the Authorization header. Revoked tokens are checked against a `revokedToken` table (fail-closed). Expired revoked tokens are pruned daily.

**CSRF Protection:** DB-backed synchronizer token pattern via `csrfPlugin.ts`. Tokens are issued on mutating responses, stored in `oauth_state` with a 1-hour TTL, and validated on all state-changing requests. Webhook/callback endpoints are exempt (see `SKIP_PREFIXES` in the plugin).

**Audit Trail:** All mutations must be logged to the `audit_log` table via `auditPlugin`. Successful GETs and health/polling endpoints are skipped to reduce noise. On DB write failure, audit events fall back to stderr (never lost). Hash chain integrity (SOC 2 CC7.2) via `lib/auditChain.ts`.

**Job System:** Async work is queued to the `jobs` DB table (statuses: queued → running → completed/failed). The engine loop picks up jobs periodically.

**Engine Loop:** `workers/engineLoop.ts` is a separate Node process that ticks every `ENGINE_TICK_INTERVAL_MS` (default 5000ms). It handles the orchestration of autonomous agent actions.

**AI Agents:** Named agents (Atlas=CEO, Binky=CRO, etc.) each have their own email accounts and role definitions. Agent behavior is governed by SGL policies.

**Decisions/Approval Workflow:** High-risk actions (recurring charges, spend above `AUTO_SPEND_LIMIT_USD`, risk tier ≥ 2) require a `decision_memo` approval before execution.

**Frontend Routing:** Uses `HashRouter` from React Router v7. All routes are defined in `src/routes.ts`.

**Code Splitting:** Vite config splits chunks into `react-vendor`, `router`, `ui-vendor`, `charts`.

**ElevenLabs Voice Agents:** Lucy's
I shipped three Claude Code integrations for my smart TV CLI (CLI, MCP, Skill) and let daily use pick the winner.
I got tired of picking up the remote to start an episode of a show I already knew the name of. So I built stv — a Python CLI that lets Claude Code drive my LG, Samsung, Roku, and Android TVs directly. Say "play Frieren s2e8" and Netflix opens on the TV in about 3 seconds. Full disclosure first: most of stv was written with Claude Code itself. I review and merge, but the keystrokes aren't mine. Meta-ironic given that the whole point of stv is to let Claude control your TV. The thing I actually want to talk about in this post is that stv integrates with Claude Code three different ways, and I wasn't sure which would win — so I shipped all three and let my own daily use decide.

3 integration paths with Claude Code

1. CLI (dead simple — Claude already shells out)

pip install stv
stv setup

Claude Code runs shell commands by default, so you can just tell it: "Run stv play netflix Wednesday s1e7" ...and it works. No config, no MCP setup.

2. MCP server (21 tools, structured)

{ "mcpServers": { "tv": { "command": "uvx", "args": ["stv"] } } }

21 structured tools with typed schemas. Tools are intentionally chunky so the model makes fewer round-trips per conversation turn.

3. Claude Code Skill (drop-in, zero config)

clawhub install smartest-tv

The Skill auto-triggers on phrases like "play", "good night", "next episode" — so Claude knows when to invoke stv without being told.

A typical evening for me:

me: play frieren s2e8 on the living room tv
claude: [runs tv_play_content] Playing now.
me: make it a bit quieter
claude: [runs tv_volume(value=18)] Volume 18.
me: good night
claude: [runs tv_sync(action="off")] All 3 TVs off.

Caveats, up front:
- Samsung 2024+ models may block third-party control by design. Only confirmed on my Q60T.
- Spotify is web-search based and flaky on niche tracks.
- HBO Max / Disney+ unsupported.
- The CLI path is still 90% of what I use.
The Skill is the one I want to use the most, but I haven't gotten the trigger phrases tight enough yet — suggestions very welcome.

**Install**

```
pip install stv
stv setup
```

GitHub: https://github.com/Hybirdss/smartest-tv
PyPI: https://pypi.org/project/stv/ (v0.10.0, 252 tests, MIT)

Happy to answer questions about which integration path works best, MCP design tradeoffs, the Netflix resolver, or the Skill triggering heuristics.

submitted by /u/PatientEither6390
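A note on the CLI path described above: since Claude simply shells out, anything that produces the same command string works. A minimal sketch in Python — the `stv play <app> <title> s<N>e<M>` argument shape is taken directly from the post's example; the quoting helper and function name are my own additions, not part of stv:

```python
import shlex


def build_stv_command(app: str, title: str, season: int, episode: int) -> str:
    """Build a shell command matching the post's example syntax,
    e.g. "stv play netflix Wednesday s1e7".

    shlex.quote protects titles containing spaces (e.g. "Stranger Things")
    from being split into multiple shell arguments.
    """
    return f"stv play {app} {shlex.quote(title)} s{season}e{episode}"


print(build_stv_command("netflix", "Frieren", 2, 8))
# → stv play netflix Frieren s2e8
```

This is only an illustration of the command shape an agent would emit; the real CLI may accept additional flags not shown in the post.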
Compiler as a service for AI agents.
Hey, I have been experimenting with Roslyn-style compiler tooling on my Unity project, now well past 400k LOC. Honestly, it changes the game: it is like giving the AI IDE-level understanding, not just the raw text access that most AI coding workflows still use today. What's funny is that Microsoft solved a huge part of this 12+ years ago with Roslyn. Only now, with AI, does it feel like people are finally realizing what that unlocks.

The goal of this post is to check what other people think about this approach, and how many of you have tried wiring Roslyn-like compilers to your AI. Have you heard of Roslyn-style compilers yet? My guesstimate is that only around 1-5% of people currently use some combination of them, although the benefit is enormous once you count the compounding interest with AI.

For example, I used it to check a monolith that was previously marked as too entangled, and the Roslyn-style symbol search plus code execution showed only 13 real dependencies, compared to the 100 found by grep alone.

The second useful case is code execution. You can basically track a value through call chains, check the math for timing and precision, and check whether variables are actually used or are just sitting there as dead code.

Has anyone else experimented with something similar on their projects? Not selling anything; I am genuinely intrigued by what others think about this approach. Happy to hear your thoughts!

submitted by /u/Emotional-Kale7272
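Roslyn itself is C#-specific, but the post's "compiler beats grep" point can be sketched with Python's standard `ast` module: a symbol-level walk counts only the imports the code actually declares, while a grep-style substring search also matches names that appear in comments and strings. The toy source below is entirely hypothetical, chosen just to show the false-positive gap:

```python
import ast

# Hypothetical source: "requests" appears only in a comment and a string,
# so a text search finds it, but the compiler-level view does not.
source = '''
import math
# note: "requests" is mentioned here but never imported
note = "we do not import requests"

def area(r):
    return math.pi * r * r
'''


def real_imports(code: str) -> set:
    """Collect modules the code actually imports, by walking the AST."""
    mods = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module)
    return mods


print(real_imports(source))   # → {'math'}  (the only real dependency)
print("requests" in source)   # → True      (grep-style search: false positive)
```

The same principle scales up to the post's 13-vs-100 dependency result: a semantic model resolves symbols, while grep matches any occurrence of the text.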
[D] Dealing with an unprofessional reviewer using fake references and personal attacks in ICML 2026
We are currently facing an ICML 2026 reviewer who lowered their score to a 1 (confidence 5) while ignoring our rebuttal and relying on fake references and personal insults like "close-minded" and "hostile." While our other reviewers gave 5s, this individual is using mathematically nonsensical proofs and making baseless accusations about MIT license/anonymity violations, all with aggressive formatting and strange syntax errors (e.g., bolding that ends with a period, like **.). The reviewer is also constantly editing their "PS" section to bait Program Chair attention and bias the discussion phase. I've never seen such unprofessionalism in peer review; has anyone successfully had a review discarded or flagged for AC intervention when a reviewer uses demonstrably fraudulent citations and resorts to ad hominem attacks?

Note: our other two reviews are 5s, though one is wavering at "partially resolved." We are confident we responded to each weakness with professional and respectful words in the first rebuttal; in the second, we pointed out the reviewer's irrelevant references and circular reasoning. They seem outraged. If they disagree, we can battle it out with professionalism, but this reviewer is basically living in their own mind.

submitted by /u/Martinetin_
I watched the TBPN acquisition broadcast closely. Here are the things that looked like praise but functioned as something else.
I have a lot of concerns about this whole thing, so I'm going to be making several posts. Post 2.

On April 2, OpenAI acquired TBPN live on air. I watched the full broadcast. Most coverage treated it as a feel-good founder story. A few things read differently to me.

**The mic moment**

Before Jordi Hays read the hosts' prepared joint statement, Coogan said on air: "Here... you wrote it, you want to read it?" Hays read the statement, dryly. Then Coogan immediately took the mic back and spent several minutes building a personal character portrait of Sam Altman as a generous, long-term mentor. One was the prepared joint statement. The other was Coogan's own framing layered on top of it.

**The Soylent framing**

Coogan described Altman calling to help during a Soylent financing crisis and said it was "to my benefit, not particularly to his." But Altman was an investor in Soylent. An investor helping a portfolio company survive a financing crisis may be generous, but it also protects an existing equity relationship. On the day OpenAI bought Coogan's company, that standard investor-founder dynamic was presented as evidence of Altman's character. The investor relationship dropped out of the framing.

**What wasn't mentioned**

The acquisition broadcast didn't mention that Altman personally invested in Soylent. It didn't mention that Coogan's second company, Lucy, went through Y Combinator while Altman was YC president, with YC investing. It didn't mention that the hosts' first collaboration was a marketing campaign for Lucy, or that the format prototype for TBPN was filmed during that campaign. The origin story told was: two founders, introduced by a mutual friend, started a podcast.

**My read on the independence framing (opinion)**

Altman said publicly he didn't expect TBPN to go easy on OpenAI. But independence isn't declared by the owner; it's demonstrated over time by the journalists. And in the very first podcast, they're already going objectively easy on Altman.
**What Fidji's memo actually described**

From the memo read on air, the hosts described Fidji's vision roughly as: go talk to the Journal, the Times, Bloomberg, then come back and contextualize it for OpenAI and help them understand the strategy. That sounds less like a conventional media role and more like a strategic access-and-context function.

The show's value to OpenAI may not just be the audience. It may also be the incoming flow of people who want access to the show (investors, reporters, founders) and what gets said in those conversations before the cameras roll, which might be objectively pro-OpenAI or anti-other-tech-companies without the public being able to push back on inaccuracies, since background talk is not always what makes it into the public podcast.

OpenAI also wound down TBPN's ad revenue, which reporting said was on track for $30M in 2026. That makes OpenAI TBPN's primary financial relationship. That looks less like preserving an independent media business and more like absorbing a strategic asset. OpenAI has already demonstrated it is not averse to ads, considering the recent addition of ads to ChatGPT.

**Nicholas Shawa**

The hosts mentioned "Nick" and declined to give his last name, explaining that his inbox is already unmanageable. I am assuming this to be Nicholas Shawa, and they noted he handles roughly 99% of guest bookings and outreach. That network of guest access and outreach is now functionally inside OpenAI.

**Jordi's prepared quote**

Nine months before the acquisition, Hays had publicly criticized OpenAI. In his prepared statement on acquisition day, he said what stood out most about OpenAI was "their openness to feedback and commitment to getting this right." That is a notable shift in tone, and it appeared in a prepared statement read from a script.

**The work ethic angle (opinion)**

Coogan runs Lucy, an active nicotine company whose whole premise is productivity: work harder, longer, better.
TBPN is now inside the company whose CEO has often spoken in terms of AGI radically reshaping human labor. The person helping frame a technology often discussed in terms of large-scale job displacement also runs a company built around stimulant productivity culture. I don't think that's malicious. I think it may reflect a genuine ideological blind spot worth naming.

**Questions I'd like to discuss**

- If the independence claim is being made by the acquirer, what would actual editorial independence look like here in practice?
- Even if TBPN never posts anything unfavorable on air, what does the private discourse with guests, reporters, and investors sound like now? We have no visibility into that.
- The hosts' first collaboration was marketing work for Lucy, a company that went through Y Combinator while Altman was YC president, with YC investing. Why was that left out of so much acquisition coverage?
- Why did OpenAI eliminate a revenue stream it didn't need to eliminate?

Sources on request. Everything factual above