We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.
I notice that the reviews section is empty and the social mentions provided don't contain user feedback specifically about Whisper. The mentions discuss other AI services like Vertex AI pricing updates and Cohere's ASR model, but don't include actual user experiences or opinions about Whisper itself. To provide an accurate summary of what users think about Whisper, I would need reviews and social mentions that specifically discuss user experiences with that tool, including comments about its performance, ease of use, pricing, and any issues users have encountered.
Mentions (30d)
2
Reviews
0
Platforms
4
GitHub Stars
97,088
11,974 forks
Industry
research
Employees
7,500
Funding Stage
Venture (Round not Specified)
Total Funding
$281.9B
116,688
GitHub followers
238
GitHub repos
97,088
GitHub stars
20
npm packages
40
HuggingFace models
Spill It – I built a local, fast speech-to-text app for my 8GB Mac
I've been using Wispr Flow for a while, but it's gotten glitchy over time. So I started this as a weekend project: build something local that just works, built fully on CC. The constraints shaped the product. I have a 2020 Mac with 8GB RAM, so I was honestly just building this for myself. Whisper V3 was way too slow locally on my hardware. I wanted something fast and snappy, so I went with NVIDIA's Parakeet TDT 0.6B, quantized to 4-bit (about 400MB). It's nearly instant. You release the hotkey and the text is there.

I also made an active choice to skip multilingual and go English-only. That gave me the freedom to do serious rule-based post-processing on the STT output. Multilingual would have added complexity I didn't want. For post-processing, I tried local LLMs, even Gemma 4, but everything put too much pressure on memory and slowed things down. Settled on GECToR (a BERT-based tagger, about 250MB), which does decent cleanup: commas, full stops, capitalization. It edits rather than rewrites, which is what I wanted.

Context awareness is the part I'm most excited about. The app reads your screen via the accessibility tree (filenames, names, git branches) and adapts formatting to where you're typing. Terminal gets different treatment than email. It's not perfect and it doesn't catch every word in context, but it does a surprisingly good job, especially in the terminal. Honestly, I've mostly been using this to talk to CC, and the errors don't get in the way of CC's comprehension. A local model with some errors works really well for the CC use case. But for email and messages, you need more polish, so I added an optional cloud LLM layer (bring your own API key). From everything I've tested, Qwen3 on Cerebras and Llama on Groq perform best and are among the fastest. Based on my usage (about 3,000 words a day), I'm spending about $6 to $7 a month on API costs. A few other things: - Added Silero VAD, which helps a lot with noisy environments.
Also helps with the whispering that people keep talking about; personally I don't get why one would whisper. I've tested it in cafes speaking directly into the laptop. Does well with longer sentences, falters a bit more with short ones. - There are still occasional hallucinations at sentence boundaries, a stray "yeah" or "okay" that seeps through. Still working on it.

Pricing: The local version is fully free. Unlimited, no login, no credit card, just download and go. The cloud LLM polish layer is a small one-time fee, but you bring your own API key. Ping me and I'll give you a free activation key; the only ask is that you please share feedback. I'd love your feedback, especially on the context-awareness approach and whether the local-first plus optional-cloud model makes sense as a product. Download from here: https://tryspillit.com. Would love to hear the community's feedback. submitted by /u/afinasch [link] [comments]
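The edit-don't-rewrite cleanup step described above can be illustrated with a stdlib-only toy. This is not GECToR (which is a trained BERT tagger); it only shows the shape of rule-based post-processing on raw STT output:

```python
import re

def cleanup(raw: str) -> str:
    """Toy rule-based polish for raw STT output: capitalize sentence
    starts and the standalone pronoun 'i', and add a terminal full
    stop. A real pipeline (e.g. GECToR) predicts edits with a trained
    model; this only illustrates the edit-don't-rewrite idea."""
    text = raw.strip()
    if not text:
        return text
    # Capitalize the pronoun "i" when it stands alone.
    text = re.sub(r"\bi\b", "I", text)
    # Capitalize the first letter of each sentence.
    text = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    # Ensure terminal punctuation.
    if text[-1] not in ".!?":
        text += "."
    return text
```

Because each rule is a local edit, the output stays faithful to what was actually said, unlike an LLM rewrite that might paraphrase.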
claude-whisper: inter-instance messaging for Claude Code in ~240 lines of bash. Works in VS Code, JetBrains, Desktop (not just CLI)
Disclosure: I'm the author of this open-source tool (Apache 2.0, free). The problem: I run 5 Claude Code instances in parallel (frontend, backend, API, tests, infra). They can't talk to each other. Existing solutions (claude-peers-mcp, claude-ipc-mcp) require daemons, databases, and only work in the CLI. claude-whisper uses the filesystem as the message bus and the UserPromptSubmit hook as the event loop. When the inbox is empty, it costs zero tokens — the hook exits silently in <5ms. claude-whisper : multi-instance communication that works with VS Code plugin What it does: whisper-send backend "I refactored auth, check your imports" The recipient sees the message automatically at their next prompt They can reply, and the conversation flows naturally between instances Key points: ~240 lines of bash + jq. No daemon, no server, no runtime dependency Works everywhere Claude Code runs: CLI, VS Code, JetBrains, Desktop Atomic writes (no partial reads), Unix permissions, input validation Zero tokens at rest Demo GIF and full details: https://github.com/druide67/claude-whisper This is v0.2.0 — feedback and criticism welcome. submitted by /u/the_real_druide67 [link] [comments]
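The pattern claude-whisper describes (filesystem as message bus, atomic writes so a reader never sees a partial message) can be sketched in a few lines. A hedged Python stand-in for the bash original; the paths and message fields here are assumptions for illustration, not the tool's actual layout:

```python
import json, os, tempfile, time, uuid
from pathlib import Path

# Throwaway root for the demo; the real tool picks its own location.
INBOX_ROOT = Path(tempfile.mkdtemp(prefix="claude-whisper-demo-"))

def whisper_send(recipient: str, text: str, sender: str = "me") -> Path:
    """Atomically drop a message into the recipient's inbox.
    Writing to a temp file and os.replace()-ing it in means a reader
    never observes a partially written message."""
    inbox = INBOX_ROOT / recipient / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"from": sender, "text": text, "ts": time.time()}
    fd, tmp = tempfile.mkstemp(dir=inbox, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(msg, f)
    final = inbox / f"{uuid.uuid4().hex}.json"
    os.replace(tmp, final)  # atomic on POSIX: message appears all at once
    return final

def drain_inbox(recipient: str) -> list:
    """What a UserPromptSubmit hook would do: read and clear pending
    messages, exiting cheaply when the inbox is empty."""
    inbox = INBOX_ROOT / recipient / "inbox"
    msgs = []
    if not inbox.exists():
        return msgs
    for p in sorted(inbox.glob("*.json")):
        msgs.append(json.loads(p.read_text()))
        p.unlink()
    return msgs
```

The atomic-rename trick is why the tool needs no daemon or database: the filesystem itself serializes delivery.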
Multi-language recording is just trash with Claude
Hey everyone, sorry for the somewhat provocative title, but this is something that really bothers me about Claude, and honestly about Gemini too. I use Claude Desktop heavily in a mode where I speak directly through my mic to dictate what I'm thinking. I'm a native French speaker, but in my work I use tons of English terms, which means I'm constantly switching between both languages. Claude's web version doesn't offer proper voice recording at all, and the desktop version does have recording, but it only works natively in one language, which is a problem since anything not in the source language just gets ignored or mangled. This is a real issue. It gives OpenAI a massive head start on this front and makes Claude nearly unusable for anyone who works multilingually, whether native English speakers or non-English natives like me. I really want to use Claude. I've tried workarounds like Super Whisper or similar tools, but honestly you basically need SuperWhisper, the paid version, to get something decent, and even then it takes time to process each recording. So it's not really better. Am I missing something? Are people using something else to solve this? I'm lost and would love a solution to this problem; otherwise I'm going to have to go back to OpenAI, which offers something on the web that's significantly better and much more functional. submitted by /u/Delicious-Courage760 [link] [comments]
Spec-first beats vibe-coding. Here's what changed for me.
I used to write prompts and hope Claude would figure out what I needed. Spent weeks iterating, hitting walls, scrapping half the output. Then I started writing specifications first - actual written specs before touching the prompt. The difference is absurd. A design system I would have spent weeks on got scaffolded in 2 days. No reopening Figma, no "let me try this approach instead." Just spec, one solid prompt, done. The spec forces you to think through edge cases, naming conventions, what actually matters. When Claude reads a clear spec instead of vague intent, it invents less garbage and ships real stuff. I'm not exaggerating - it cuts iteration cycles in half. I also stopped typing entirely. Whisper for voice-to-text, Claude Code for 90% of my work. That part sounds gimmicky but it's genuinely changed how I work - you talk at the speed you think instead of hunt-and-peck your way through syntax. The trap most people fall into: they treat Claude like a search engine. Ask it something, get an answer, ask again. Treat it like a code partner who needs a real spec first, and suddenly you're shipping instead of iterating endlessly. Anyone else notice this? Or does everyone just prompt-and-pray? submitted by /u/Temporary_Layer7988 [link] [comments]
Built a Chrome extension that adds voice input to Claude. Free for 30 minutes
When I switched from ChatGPT to Claude, the biggest thing I missed was dictation. I used it every day and it was a dealbreaker that Claude didn't have it natively. You can speak via AI mode but then it talks back at you, whereas I just wanted my words as text in the input box. So I vibe coded this using GitHub's Copilot (Claude Opus 4.6) and it does exactly that. One click to record, Whisper transcribes it, text drops into the box. No API keys required. I've been using it daily with no issues. The final version just hit the Chrome Web Store. Would love it if you guys tested it out. If it works well for you a 5 star review would mean a lot, and if anything's broken please let me know! https://chromewebstore.google.com/detail/gkhidmabinchbopegkjhfklflokhgljn?utm_source=item-share-cb submitted by /u/ZacBartley [link] [comments]
Had vibe-coded something like "dispatch" a long time back, was too lazy all this while but wanted to open-source the code
REPO: GITHUB Basically the title. I know there are hundreds of "access claude remote from telegram/whatsapp etc etc" codebases all over the internet, some of them are great. My situation was slightly specific: I preferred using the VS Code UI for most things. When I used to commute for work I had a solid 2-2.5 hrs every day to burn, but I didn't want the usual "remote" access; what I wanted was to access my terminal sitting at home. I have been building local servers etc for a while now and am well versed with Tailscale. I simply vibecoded the part where my responses are pushed into the terminal at home via a Tailscale pathway. Anthropic took a while to launch Dispatch: this is something they should have shipped way earlier and way better. The concept of controlling your terminal from your phone is not some groundbreaking idea; people have been doing this with SSH for years. I tried Dispatch, and I see some issues. One guy on the GitHub issues page said he sat through 10+ minutes of permission prompts on basic read commands. There's also a bug where it always spawns with Sonnet regardless of what model you have configured, and you can't change it from mobile. And the whole thing routes through Anthropic's servers. There's a GitHub issue from a Max subscriber where Dispatch was completely dead for 48 hours, support sent him bot replies, and the issue was marked "resolved" on the status page but still broken. I think they use relay servers, but mine just keeps working because it's Tailscale: there's no Anthropic server in the middle to go down. So here's what ping-claude does: Claude finishes something at home, you get a notification on your phone with what it last said. Claude wants to do something destructive, you get approve/deny buttons on your phone. There's also a live activity feed showing every tool call as it happens, not just "Claude is working": you can see Bash running, Edit completing, Grep searching, in real time on your phone.
The voice thing is genuinely the feature I use most. Groq Whisper, free tier, transcription in under a second. I just say "do this that" into my phone. The whole thing runs on your machine over Tailscale. Nothing goes to any external server except the optional Groq call for voice. Setup is like 5 commands total; open the IP on your phone, add to home screen. Still under dev are native push notifications; it's a PWA so the tab needs to be open. An Expo app is on the list. If you want push notifications right now, the Telegram integration works. (Yes, it fully runs on a Telegram bot.) MIT licensed, been using it for months. Would genuinely love contributors, especially if anyone wants to take a crack at anything else in this workflow. (IDK if it will be useful, but yeah) REPO: GITHUB submitted by /u/theRealSachinSpk [link] [comments]
I got tired of watching YouTube to learn things, so I built a tool that turns any video into a transcript, summary, and knowledge graph
I consume a lot of technical content on YouTube — system architecture, LLMs, SEO, dev tools. Watching is slow. Pausing, rewinding, taking notes manually. So I built a small Claude Code tool that does this instead: Paste a YouTube link Get a structured summary + interactive knowledge graph Everything runs locally. One command: /process Under the hood: - yt-dlp + YouTube Transcript API for fast transcription - Whisper (local, no API key) as fallback for videos without subtitles - Claude Code extracts entities and relationships → builds a knowledge graph with NetworkX + PyVis - Outputs: raw transcript, summary.md, graph.html (open in browser) A 30-minute video processes in a few minutes. I now have 40+ videos as searchable notes. Repo: https://github.com/velmighty/youtube-to-knowledge Requires Claude Code. Works on Windows, macOS, Linux. Edit: Obsidian integration is live. Add --obsidian to any /process command. Generates one .md per entity with [[wikilinks]] and YAML frontmatter. Details in the README. submitted by /u/ElectronicPlan8497 [link] [comments]
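The transcript-to-graph step the post describes (Claude Code extracts entities and relationships, then a graph is built) can be sketched without NetworkX. A toy, stdlib-only stand-in; the triples below are made up for illustration and are not the repo's data format:

```python
from collections import defaultdict

def build_graph(triples):
    """Build an adjacency-list knowledge graph from
    (subject, relation, object) triples, as an LLM entity-extraction
    pass might emit them. Toy stand-in for the NetworkX + PyVis graph."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return dict(graph)

# Hypothetical triples extracted from a transcript.
triples = [
    ("Whisper", "is_a", "ASR model"),
    ("yt-dlp", "downloads", "YouTube audio"),
    ("Whisper", "transcribes", "YouTube audio"),
]
graph = build_graph(triples)
```

In the real tool the same structure would be handed to NetworkX for layout and to PyVis for the interactive `graph.html` output.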
I gave Claude Code a knowledge graph, spaced repetition, and semantic search over my Obsidian vault — it actually remembers things now
# I built a 25-tool AI Second Brain with Claude Code + Obsidian + Ollama — here's the full architecture

**TL;DR:** I spent a night building a self-improving knowledge system that runs 25 automated tools hourly. It indexes my vault with semantic search (bge-m3 on a 3080), builds a knowledge graph (375 nodes), detects contradictions, auto-prunes stale notes, tracks my frustration levels, does autonomous research, and generates Obsidian Canvas maps — all without me touching anything. Claude Code gets smarter every session because the vault feeds it optimized context automatically.

---

## The Problem

I run a solo dev agency (web design + social media automation for Serbian SMBs). I have 4 interconnected projects, 64K business leads, and hundreds of Claude Code sessions per week. My problem: **Claude Code starts every session with amnesia.** It doesn't remember what we did yesterday, what decisions we made, or what's blocked. The standard fix (CLAUDE.md + MEMORY.md) helped but wasn't enough.

I needed a system that:

- Gets smarter over time without manual work
- Survives context compaction (when Claude's memory gets cleared mid-session)
- Connects knowledge across projects
- Catches when old info contradicts new reality

## What I Built

### The Stack

- **Obsidian** vault (~350 notes) as the knowledge store
- **Claude Code** (Opus) as the AI that reads/writes the vault
- **Ollama** + **bge-m3** (1024-dim embeddings, RTX 3080) for local semantic search
- **SQLite** (better-sqlite3) for search index, graph DB, codebase index
- **Express** server for a React dashboard
- **2 MCP servers** giving Claude native vault + graph access
- **Windows Task Scheduler** running everything hourly

### 25 Tools (all Node.js ES modules, zero external dependencies beyond what's already in the repo)

#### Layer 1: Data Collection

| Tool | What it does |
|------|-------------|
| `vault-live-sync.mjs` | Watches Claude Code JSONL sessions in real-time, converts to Obsidian notes |
| `vault-sync.mjs` | Hourly sync: Supabase stats, AutoPost status, git activity, project context |
| `vault-voice.mjs` | Voice-to-vault: Whisper transcription + Sonnet summary of audio files |
| `vault-clip.mjs` | Web clipping: RSS feeds + Brave Search topic monitoring + AI summary |
| `vault-git-stats.mjs` | Git metrics: commit streaks, file hotspots, hourly distribution, per-project breakdown |

#### Layer 2: Processing & Intelligence

| Tool | What it does |
|------|-------------|
| `vault-digest.mjs` | Daily digest: aggregates all sessions into one readable page |
| `vault-reflect.mjs` | Uses Sonnet to extract key decisions from sessions, auto-promotes to MEMORY.md |
| `vault-autotag.mjs` | AI auto-tagging: Sonnet suggests tags + wikilink connections for changed notes |
| `vault-schema.mjs` | Frontmatter validator: 10 note types, compliance reporting, auto-fix mode |
| `vault-handoff.mjs` | Generates machine-readable `handoff.json` (survives compaction better than markdown) |
| `vault-session-start.mjs` | Assembles optimal context package for new Claude sessions |

#### Layer 3: Search & Retrieval

| Tool | What it does |
|------|-------------|
| `vault-search.mjs` | FTS5 + chunked semantic search (512-char chunks, bge-m3 1024-dim). Flags: `--semantic`, `--hybrid`, `--scope`, `--since`, `--between`, `--recent`. Retrieval logging + heat map. |
| `vault-codebase.mjs` | Indexes 2,011 source files: exports, routes, imports, JSDoc. "Where is the image upload logic?" actually works. |
| `vault-graph.mjs` | Knowledge graph: 375 nodes, 275 edges, betweenness centrality, community detection, link suggestions |
| `vault-graph-mcp.mjs` | Graph as MCP server: 6 tools (search, neighbors, paths, common, bridges, communities) Claude can use natively |

#### Layer 4: Self-Improvement

| Tool | What it does |
|------|-------------|
| `vault-patterns.mjs` | Weekly patterns: momentum score (1-10), project attention %, velocity trends, token burn ($), stuck detection, frustration/energy tracking, burnout risk |
| `vault-spaced.mjs` | Spaced repetition (FSRS): 348 notes tracked, priority-based review scheduling. Critical decisions resurface before you forget them. |
| `vault-prune.mjs` | Hot/warm/cold decay scoring. Auto-archives stale notes. Never-retrieved notes get flagged. |
| `vault-contradict.mjs` | Contradiction detection: rule-based (stale references, metric drift, date conflicts) + AI-powered (Sonnet compares related docs) |
| `vault-research.mjs` | Autonomous research: Brave Search + Sonnet, scheduled topic monitoring (competitors, grants, tech trends) |

#### Layer 5: Visualization & Monitoring

| Tool | What it does |
|------|-------------|
| `vault-canvas.mjs` | Auto-generates Obsidian Canvas files from knowledge graph (5 modes: full map, per-project, hub-centered, communities, daily) |
| `vault-heartbeat.mjs` | Proactive agent: gathers state from all services, Sonnet reasons about what needs attention, sends WhatsApp alerts |
| `vault-dashboard/` | React SPA dashboard (Expre
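The chunked semantic search described for `vault-search.mjs` (512-char chunks scored against a query vector) reduces to a simple loop. A toy sketch using bag-of-words counts in place of real bge-m3 embeddings, with assumed function names:

```python
import math
from collections import Counter

CHUNK = 512  # the post chunks notes into 512-character windows

def chunks(text: str, size: int = CHUNK) -> list:
    """Split a note into fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real system uses 1024-dim
    bge-m3 vectors served by Ollama."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, notes: dict, top_k: int = 3):
    """Score every chunk of every note against the query and return
    the best (score, note_name, chunk) tuples."""
    qv = embed(query)
    scored = [(cosine(qv, embed(c)), name, c)
              for name, text in notes.items()
              for c in chunks(text)]
    return sorted(scored, reverse=True)[:top_k]
```

Chunking before scoring is what lets a long note match on one relevant passage instead of being diluted by the rest of its text.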
Built a voice dictation app entirely with Claude Code. 4 months in, 326 stars.
VoiceFlow runs Whisper locally for voice dictation. Hold a hotkey, speak, text shows up at your cursor. No cloud, no accounts. I built it with Claude Code and the repo has a CLAUDE.md documenting what was AI-assisted. Some of you might remember the first version I posted here in December. It was Windows-only, kind of rough, and I was mostly using it to dump context into Claude faster. Since then it has been 4 months, 10 releases, and 326 GitHub stars. It runs on Linux now too. The Linux port took about 3 days with Opus 4.6. Claude wrote the evdev hotkey capture code; I had never touched evdev before, and it worked on the first try. Same with AppImage packaging and CUDA library probing, stuff I had no experience with, and it just handled it. PySide6 on Wayland was a different story. Transparency, compositing, multi-monitor detection: Claude kept suggesting fixes that sounded right but did not actually work. I ended up in the Qt docs for those. Clipboard was similar: the wl-copy vs xclip vs pyperclip situation on Linux is a mess, and Claude's first pass was a catch-all abstraction that broke on half the setups. I had to be very specific: only wl-copy, only Wayland, fall back to wtype. After 4 months on this project, the thing I keep coming back to is that Claude Code works best when I hand it existing code and say "make this work on a different platform." When the problem is more open-ended it tends to guess confidently and get it wrong. Also set up GitHub Actions this week so both Windows and Linux builds are automated now. Caught a glibc bug from user reports that was breaking the AppImage on Fedora and KDE Neon, fixed it and shipped v1.4.0 within two days. 326 stars, MIT licensed, still free. Demo: https://i.redd.it/59rbyzplc87g1.gif Site: https://get-voice-flow.vercel.app/ Repo: https://github.com/infiniV/VoiceFlow submitted by /u/raww2222 [link] [comments]
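The clipboard fix the post lands on (be explicit per display server instead of a catch-all abstraction) can be expressed as a small selection function. A hedged sketch; the function name and predicate interface are mine, not VoiceFlow's, and the tool names are just the common ones (wl-copy/wtype on Wayland, xclip/xdotool on X11):

```python
from typing import Callable, Optional, Tuple

def pick_paste_tools(wayland: bool,
                     available: Callable[[str], bool]
                     ) -> Tuple[Optional[str], Optional[str]]:
    """Choose (clipboard tool, typing tool) for the current display
    server. `available` is a predicate like shutil.which; keeping the
    decision explicit per server avoids the catch-all abstraction that
    broke on half the setups."""
    if wayland:
        copier = "wl-copy" if available("wl-copy") else None
        typer = "wtype" if available("wtype") else None
    else:  # assume X11
        copier = "xclip" if available("xclip") else None
        typer = "xdotool" if available("xdotool") else None
    return copier, typer
```

In real use you would call it with `os.environ.get("WAYLAND_DISPLAY") is not None` and `shutil.which`; keeping the predicate injectable also makes the logic trivially testable.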
Is anyone else getting 2 completely different Claude Codes in 2 terminals?
I’m building a LinkedIn engagement app and have two terminals open. Terminal 1: great. Understands the task. Makes good changes. Stays on track. Terminal 2: same model, same me, same project… and this one is fighting demons. Hallucinates files. Says “implemented” when nothing was implemented. Gets lost halfway through. Fixes the wrong thing with full confidence. That’s the weird part: same model same codebase same person prompting same app But one terminal is locked in and the other is pure chaos. At this point I’m starting to think terminal history matters way more than people admit. Not just the model, but: • session buildup • context pollution • task ambiguity • prompt order Sometimes Claude Code feels elite. Other times it feels like a guy opening random files, whispering “I got this,” and changing unrelated code. Do you treat each terminal/session like a separate employee? What actually helps? Fresh terminal per task? Force planning first? Shorter tasks? Hard resets? Because right now it doesn’t feel like I’m using one coding assistant. It feels like I’m managing two coworkers, and one of them confidently lies in standup. submitted by /u/pavlito88 [link] [comments]
Made a local voice-to-text app for Claude Code sessions. The AI inside it is disturbingly polite.
So I've been using Claude Code CLI a lot lately and one thing that was slowly killing me was the constant typing. Long prompts, back and forth, complex instructions - all typed. I kept thinking there must be a voice-to-text tool that works system-wide, fully local, no cloud, no subscriptions, just - speak and it types. Maybe there is and I just didn't know, but I couldn't find exactly what I wanted so I built it. It's called Eqho - free, open-source, lives in your system tray. Press a hotkey, speak, it types into whatever app is focused. OpenAI's Whisper is the brain of the app, nothing leaves your machine, GPU-accelerated if you have CUDA, falls back to CPU if you don't. I'm a designer by background, not a developer - Claude was basically my co-pilot the whole way through. Which feels fitting. One thing worth knowing about Whisper - it's a known quirk of the model that it can hallucinate "thank you"s out of silence when nobody is speaking. It's apparently just too polite for its own good. I don't know how to feel about that. At any point I'm somewhere between grateful and terrified. Worth knowing before you try it: Windows only for now, and the setup requires some command line comfort - Python, venv, that kind of thing. Not plug-and-play yet. The plan is to rewrite the core using whisper.cpp to make it something you can just download and run. That's where I want to take it. If you're comfortable with the setup - would love feedback. If not, star it and check back. GitHub: https://github.com/DanielMevit/Eqho submitted by /u/danielmevit [link] [comments]
Cohere's open-weight ASR model hits 5.4% word error rate — low enough to replace speech APIs in production pipelines
Enterprises building voice-enabled workflows have had limited options for production-grade transcription: closed APIs with data residency risks, or open models that trade accuracy for deployability. Cohere's new open-weight ASR model, Transcribe, is built to compete on all four key differentiators — contextual accuracy, latency, control and cost. Cohere says that Transcribe outperforms current leaders on accuracy, and unlike closed APIs, it can run on an organization's own infrastructure. Transcribe, which can be accessed via an API or in Cohere's Model Vault as cohere-transcribe-03-2026, has 2 billion parameters and is licensed under Apache-2.0. The company said Transcribe has an average word error rate (WER) of just 5.42%, lower than comparable models. It's trained on 14 languages: English, French, German, Italian, Spanish, Greek, Dutch, Polish, Portuguese, Chinese, Japanese, Korean, Vietnamese and Arabic. The company did not specify which Chinese dialect the model was trained on. Cohere said it trained the model "with a deliberate focus on minimizing WER, while keeping production readiness top-of-mind." According to Cohere, the result is a model that enterprises can plug directly into voice-powered automations, transcription pipelines, and audio search workflows. Self-hosted transcription for production pipelines Until recently, enterprise transcription has been a trade-off: closed APIs offered accuracy but locked in data; open models offered control but lagged on performance. Unlike Whisper, which launched as a research model under an MIT license, Transcribe is available for commercial use from release and can run on an organization's own local GPU infrastructure. Early users flagged the commercial-ready open-weight approach as meaningful for enterprise deployments. Organizations can bring Transcribe to their own local instances, since Cohere said the model has a more manageable inference footprint for local GPUs. The company said they were able to
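For context on the 5.42% figure: word error rate is word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance.
    A 5.42% WER means roughly 5 word errors per 100 reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER comparisons are only meaningful when text normalization (casing, punctuation, number formatting) is identical across systems, which is one reason published numbers are hard to compare directly.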
Nightingale — WhisperX-powered open-source karaoke app that works with any song on your computer
Website: https://nightingale.cafe License: GPL-3.0 I've been working on a karaoke app called Nightingale Karaoke. You point it at your music folder and it turns your songs into karaoke - separates vocals from instrumentals, generates word-level synced lyrics, and lets you sing with highlighted lyrics and pitch scoring. Works with video files too. Everything runs locally on your machine, nothing gets uploaded. No accounts, no subscriptions, no telemetry. It ships as a single binary for Linux, macOS, and Windows. On first launch it sets up its own isolated Python environment and downloads the ML models it needs - no manual installation of dependencies required. My two biggest drivers for the creation of this were: The lack of karaoke coverage for niche, avant-garde, and local tracks. Nostalgia for the good old cheesy karaoke backgrounds with flowing rivers, city panoramas, etc. Some highlights: Stem separation using the UVR Karaoke model (preserves backing vocals) or Demucs Automatic lyrics via WhisperX transcription, or fetched from LRCLIB when available Pitch scoring with player profiles and scoreboards Gamepad support and TV-friendly UI scaling for party setups GPU acceleration on NVIDIA (CUDA) and Apple Silicon (CoreML/MPS) Built with Rust and the Bevy engine The whole stack is open source. No premium tier, no "open core" - just the app. Feedback and contributions welcome. submitted by /u/rzzzzru [link] [comments]
Claude wrote this
An ode to the kittens- I’m mourning my mom and like to think somehow she had a hand in Penny getting pregnant. I asked Claude to include that. ChatGPT used to be able to do this. Ode to the Beans Thou still unravish’d kittens of delight, Thou nurslings of slow hours and warming fur, Small foster-things of Penny’s watchful sight, Who sleep and dream and do not know you stir The very heart of her who tends your keep— What tangled heap art thou upon the bed? What tiny mews escape thy milky sleep? What world exists beyond thy mother’s thread Of warmth, of breath, of soft and steadfast care? Ye do not know, nor need to. Ye are there. For she who watches you has known the dark— The January grief, the empty chair, The silence where a mother’s voice would mark The hour, the season, the familiar air. And in that silence, heavy as the frost, When all the world seemed hollow at its core, When she who gave me life herself was lost And I could find no footing anymore— Then Penny swelled, and turned, and bore you forth, Small Beans of light against the grieving earth. Was it my mother’s hand that sent you here? Some final gift dispatched before she went? I cannot prove it, but I hold it dear— That she who loved me knew what comfort meant, And from whatever threshold she then stood Between this world and what lies past our sight, She whispered unto Penny: Make it good. Give her something warm against the night. And Penny, dutiful, obeyed the call, And brought you, Beans — and you were worth it all. How fair thy paws! How vast they seem to thee, Who cannot yet command them where to go, Who wobble like small ships upon a sea Of blanket, and tip gently, and lie low, And sleep again — for sleep is all thy art, Thy great vocation, and thy sweetest gift. But O, ye do not know ye mend a heart That winter cracked — ye are the tender rift Through which the light returns, unbidden, bright, Small lanterns lit against my longest night. 
And Penny — Queen of Beans — serene and proud, Who bore you with the dignity of cats, Who cleans you with a rough and patient shroud Of tongue, and settles you like welcome mats Against her belly — she, too, plays her part In this, my mother’s last conspiracy Of love: to place new life beside a heart That ached, and say These Beans will comfort thee. And so they do. Today the Beans are small. Today is warmth, and milk, and grace through all. Ye Beans, ye Beans! Thou needest not be fair As nightingales or Grecian urns to earn A poem — for ye are my mother’s prayer Made fur and breath, the last and sweetest turn Of her devotion, reaching past the veil To say I know the dark. I know the cost. But here — take these. Let love not wholly fail. And so I hold you, Beans, and am not lost. For truth is warmth, and warmth is all ye know, And that is all I need, here below. submitted by /u/yumyum_cat [link] [comments]
chore(pricing): Update vertex-ai pricing
## 🔄 Pricing Update: vertex-ai

### 📊 Summary (complete_diff mode)

| Change Type | Count |
|-------------|-------|
| ➕ Models added | 70 |
| 🔄 Models updated (merged) | 24 |

### ➕ New Models

- `gemini-2.5-computer-use-preview-10-2025`
- `gemini-2.5-flash-preview-09-2025`
- `gemini-2.5-flash-lite-preview-09-2025`
- `gemini-3.1-flash-lite-preview`
- `imagen-3.0-generate-002`
- `imagen-3.0-capability-002`
- `imagen-product-recontext-preview-06-30`
- `text-embedding-large-exp-03-07`
- `multimodalembedding`
- `gpt-oss`
- `gpt-oss-120b-maas`
- `whisper-large`
- `mistral`
- `mixtral`
- `mistral-small-2503`
- `codestral-2501-self-deploy`
- `mistral-ocr-2505`
- `mistral-medium-3`
- `codestral-2`
- `ministral-3`
- ... and 50 more

### 🔄 Updated Models

- `gemini-2.5-pro`
- `gemini-2.5-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash-image`
- `gemini-2.5-flash-image-preview`
- `gemini-3.1-pro-preview`
- `gemini-3-pro-preview`
- `gemini-3-pro-image-preview`
- `imagen-4.0-generate-001`
- `imagen-4.0-fast-generate-001`
- `imagen-4.0-ultra-generate-001`
- `imagen-4.0-generate-preview-06-06`
- `imagen-4.0-fast-generate-preview-06-06`
- `imagen-4.0-ultra-generate-preview-06-06`
- `imagen-3.0-capability-001`
- `veo-3.0-generate-001`
- `veo-3.0-fast-generate-001`
- `veo-3.0-generate-preview`
- `veo-3.0-fast-generate-preview`
- `veo-3.1-generate-001`
- `veo-3.1-generate-preview`
- `veo-3.1-fast-generate-preview`
- `text-embedding-005`
- `text-multilingual-embedding-002`

## Model-to-Pricing-Page Mapping

| Model ID | Publisher / Section | Source | Notes |
|----------|-------------------|--------|-------|
| `gemini-2.5-pro` | Google – Gemini 2.5 | API | $1.25/$10 input/output (≤200K); cache read $0.125 |
| `gemini-2.5-flash` | Google – Gemini 2.5 | API | $0.30/$2.50; cache $0.03; image_token $30/1M |
| `gemini-2.5-flash-lite` | Google – Gemini 2.5 | API | $0.10/$0.40; cache $0.01 |
| `gemini-2.5-flash-image` | Google – Gemini 2.5 | API | Same as gemini-2.5-flash with image output |
| `gemini-2.5-flash-image-preview` | Google – Gemini 2.5 | API | Same as gemini-2.5-flash (preview alias) |
| `gemini-2.5-computer-use-preview-10-2025` | Google – Gemini 2.5 | API | Matched as "Gemini 2.5 Pro Computer Use-Preview"; $1.25/$10, no cache |
| `gemini-2.5-flash-preview-09-2025` | Google – Gemini 2.5 | API | Preview alias of gemini-2.5-flash; same pricing |
| `gemini-2.5-flash-lite-preview-09-2025` | Google – Gemini 2.5 | API | Preview alias of gemini-2.5-flash-lite; same pricing |
| `gemini-2.0-flash-001` | Google – Gemini 2.0 | API | $0.15/$0.60; batch $0.075/$0.30 |
| `gemini-2.0-flash-lite-001` | Google – Gemini 2.0 | API | $0.075/$0.30; batch $0.0375/$0.15 |
| `gemini-3.1-pro-preview` | Google – Gemini 3 | API | $2/$12; cache $0.2; web_search 1.4¢ |
| `gemini-3-pro-preview` | Google – Gemini 3 | API | $2/$12; cache $0.2; web_search 1.4¢ |
| `gemini-3-pro-image-preview` | Google – Gemini 3 | API | $2/$12; image_token $120/1M; web_search 1.4¢ |
| `gemini-3.1-flash-image-preview` | Google – Gemini 3 | API | $0.50/$3; image_token $60/1M; web_search 1.4¢ |
| `gemini-3.1-flash-lite-preview` | Google – Gemini 3 | API | $0.25/$1.50; cache $0.025; web_search 1.4¢ |
| `gemini-3-flash-preview` | Google – Gemini 3 | API | $0.50/$3; cache $0.05; web_search 1.4¢ |
| `imagen-4.0-generate-001` | Google – Imagen | API | Row matched via lookup_variant `imagen-4.0-generate`; $0.04/image |
| `imagen-4.0-fast-generate-001` | Google – Imagen | API | Row matched via `imagen-4.0-fast-generate`; $0.02/image |
| `imagen-4.0-ultra-generate-001` | Google – Imagen | API | Row matched via `imagen-4.0-ultra-generate`; $0.06/image |
| `imagen-4.0-generate-preview-06-06` | Google – Imagen | API | Preview; matched as Imagen 4; $0.04/image |
| `imagen-4.0-fast-generate-preview-06-06` | Google – Imagen | API | Preview; matched as Imagen 4 Fast; $0.02/image |
| `imagen-4.0-ultra-generate-preview-06-06` | Google – Imagen | API | Preview; matched as Imagen 4 Ultra; $0.06/image |
| `imagen-3.0-generate-002` | Google – Imagen | API | Row matched via `imagen-3.0-generate`; $0.04/image |
| `imagen-3.0-capability-001` | Google – Imagen | API – price not found | Editing/VQA feature model; no pricing row |
| `imagen-3.0-capability-002` | Google – Imagen | API – price not found | Editing/VQA feature model; no pricing row |
| `imagen-product-recontext-preview-06-30` | Google – Imagen | API | "Imagen Product Recontext"; $0.12/image |
| `veo-2.0-generate-001` | Google – Veo | API | Row matched via `veo-2.0-generate`; $0.50/sec |
| `veo-3.0-generate-001` | Google – Veo | API | Row matched as Veo 3 (video+audio rate); $0.40/sec |
| `veo-3.0-fast-generate-001` | Google – Veo | API | Row matched as Veo 3 Fast; $0.15/sec |
| `veo-3.0-generate-preview` | Google – Veo | API | Preview alias of Veo 3; $0.40/sec |
| `veo-3.0-fast-generate-preview` | Google – Veo | API | Preview alias of Veo 3 Fast; $0.15/sec |
| `veo-3.1-generate-001` | Google – Veo | API | Row matched as Veo 3.1; $0
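The per-token rates in the mapping above translate into request costs by straightforward arithmetic. A minimal sketch, assuming the listed figures are USD per 1M tokens (the usual unit on the Vertex AI pricing page) and using only rates quoted in the table; the function name and call shape are illustrative, not part of any SDK:

```python
# Cost estimate from the per-1M-token rates quoted in the mapping table.
# Assumption: rates are USD per 1M tokens; cache-read tokens bill at the
# cheaper cache rate instead of the full input rate.
RATES = {
    # model:                (input, output, cache_read) USD per 1M tokens
    "gemini-2.5-pro":       (1.25, 10.00, 0.125),
    "gemini-2.5-flash":     (0.30,  2.50, 0.03),
    "gemini-2.5-flash-lite": (0.10, 0.40, 0.01),
}

def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimated USD cost of a single request."""
    inp, out, cache = RATES[model]
    billable_input = input_tokens - cached_tokens  # cached portion bills separately
    return (billable_input * inp
            + cached_tokens * cache
            + output_tokens * out) / 1_000_000

# Example: a 50K-token prompt (10K served from cache) with a 2K-token reply.
cost = estimate_cost("gemini-2.5-pro", 50_000, 2_000, cached_tokens=10_000)
```

This only covers token pricing; per-image and per-second rates (Imagen, Veo) and surcharges such as `web_search` would need their own terms.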
Repository Audit Available
Deep analysis of openai/whisper — architecture, costs, security, dependencies & more
Whisper itself is open source and free to run locally; OpenAI's hosted transcription API is billed separately. Visit OpenAI's website for current pricing details.
Whisper has a public GitHub repository with 97,088 stars.
Based on user reviews and social mentions, the most frequently flagged pain points concern cost: recurring keywords are "API costs", "token cost", "openai", and "gpt".
Based on 20 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.
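The split above is plain label counting over the analyzed mentions. A minimal sketch, assuming each mention carries one of three sentiment labels (the label names are an assumption, not taken from the source):

```python
from collections import Counter

def sentiment_breakdown(labels):
    """Percentage of mentions carrying each sentiment label."""
    counts = Counter(labels)
    total = len(labels)
    return {s: 100 * counts.get(s, 0) / total
            for s in ("positive", "neutral", "negative")}

# 20 analyzed mentions, all labeled neutral, reproduce the reported split.
split = sentiment_breakdown(["neutral"] * 20)
```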
Sentdex
Creator at Python & AI YouTube
1 mention