Bring your own code, and run CPU, GPU, and data-intensive compute at scale. The serverless platform for AI and data teams.
Users generally praise Modal for its AI capabilities and integration flexibility, particularly for AI model discovery and multimodal engagement features. However, there is some frustration about the lack of detailed documentation and occasional performance issues, especially when managing large datasets or complex processes. Pricing sentiment is largely neutral, with users indicating that the costs are acceptable given Modal's extensive functionalities. Overall, Modal maintains a solid reputation for being a reliable and versatile tool for AI integration projects.
Mentions (30d): 16 (1 this week)
Reviews: 0
Platforms: 4
GitHub Stars: 456 (86 forks)
Industry: information technology & services
Employees: 80
Funding Stage: Series B
Total Funding: $112.0M
GitHub followers: 1,268
GitHub repos: 77
GitHub stars: 456
npm packages: 20
HuggingFace models: 2
Claude just called me a human bunny?
I am using Claude Sonnet 4.6 to write a Python script for an NLP sentiment analysis. I did not tell it to create all of the code and send it my way; instead we are building it together step by step so I can test each line before making it into the final form. After trying out a line of code that would filter out the footnotes from a PDF (by using the mean average), I told it that maybe we should try another method (the modal average) because it still wasn't working. It gave me the answer, the code, the reason and all. The picture is what was at the end of the output. It looks unfinished as well, like it realised it didn't want to say that out loud, but still said it. Does anybody have an explanation? https://preview.redd.it/ruuvit5u6r2h1.png?width=693&format=png&auto=webp&s=6b88d7ea1a9e84fb694e22af2a731772bd5297ee
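For readers wondering why the modal average might behave differently from the mean here, a minimal sketch follows. It assumes the value being averaged is a per-line font size, which the post does not actually say, and the function name and sample data are made up for illustration. The idea: body text dominates a page, so the most common size is a stable body-size estimate, while a single large heading can pull the mean above the body size.

```python
from statistics import mean, multimode


def split_body_and_footnotes(lines):
    """Split (text, font_size) pairs into body lines and smaller-font lines.

    How the sizes are extracted from the PDF is outside this sketch.
    """
    sizes = [round(size, 1) for _, size in lines]
    # multimode returns every most-common value; on a tie, take the largest.
    body_size = max(multimode(sizes))
    body = [text for text, size in lines if round(size, 1) >= body_size]
    footnotes = [text for text, size in lines if round(size, 1) < body_size]
    return body, footnotes


# Hypothetical sample page: one heading, three body lines, two footnotes.
sample = [
    ("Heading", 18.0),
    ("First paragraph.", 11.0),
    ("Second paragraph.", 11.0),
    ("Third paragraph.", 11.0),
    ("1. A footnote.", 8.0),
    ("2. Another footnote.", 8.0),
]

body, footnotes = split_body_and_footnotes(sample)
mean_size = mean(size for _, size in sample)
print("body:", body)
print("footnotes:", footnotes)
print("mean font size:", mean_size)
```

In this sample the mean font size lands above 11.0, so a "smaller than the mean" rule would also throw away the body text. The mode stays at 11.0 and only the 8.0 lines are dropped.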
Pricing found: $355, $0.001736 / sec, $0.001261 / sec, $0.001097 / sec, $0.000842 / sec
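Per-second rates are hard to compare at a glance, so here is a small sketch that converts the four listed rates to an hourly figure. The listing does not say which resources each rate applies to, so the rates are keyed only by their own value; `hourly_cost` is a name made up for this example.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600

# Per-second rates exactly as they appear in the listing above.
per_second_rates = ["0.001736", "0.001261", "0.001097", "0.000842"]


def hourly_cost(rate_per_second: str) -> Decimal:
    """Convert a $/sec rate (given as a string) to $/hour without float error."""
    return Decimal(rate_per_second) * SECONDS_PER_HOUR


hourly = {rate: hourly_cost(rate) for rate in per_second_rates}
for rate, cost in hourly.items():
    print(f"${rate}/sec -> ${cost:.2f}/hour")
```

`Decimal` is used instead of `float` so the products are exact to the digits shown in the listing.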
AI solves 80-year-old math conjecture for under $1000
GPT-next solved an 80-year-old Erdős combinatorics conjecture for under $1,000 in compute. That single fact reframes everything else happening this week. The [Erdős unit distance problem](https://www.latent.space/p/ainews-openai-gpt-next-disproves) resisted human mathematicians since 1946. A frontier model closed it at a cost lower than a mid-tier SaaS subscription, which means the boundary between "AI as tool" and "AI as independent discoverer" is no longer theoretical. [Lilian Weng's new deep dive](https://lilianweng.github.io/posts/2025-05-01-thinking/) on test-time compute and chain-of-thought reasoning explains the underlying mechanism: reasoning models are not retrieving known proofs, they are generating novel inference chains at scale. The infrastructure layer is pricing this in faster than most observers realize. [Railway reports $200K+ monthly coding agent spend](https://www.latent.space/p/railway) and 100K signups per week, and is now building own-metal data centers to absorb the load. Daytona hit 850K daily sandbox runs with 74% month-over-month growth, confirming that isolated compute environments are now a first-class primitive, not a niche DevOps concern. Three specialized infrastructure companies, Exa, Modal, and TurboPuffer, reached unicorn valuations simultaneously this week, covering retrieval, serverless GPU, and vector search. When picks-and-shovels companies price in sustained demand at the same moment, it is not coincidence. Every major lab has now repositioned as an agent lab, not a model lab. [ClickUp replacing hundreds of employees with thousands of AI agents](https://techcrunch.com/2026/05/25/what-clickups-mass-layoff-tells-us-about-the-future-of-work/) is the first established tech company to execute that repositioning at the labor level rather than just the product level. 
The counterweight is that [Salesforce customers remain locked in](https://www.theregister.com/saas/2026/05/26/the-saas-pocalypse-can-wait-salesforce-still-has-customers-where-it-wants-them/5245228) despite the theoretical ability to rebuild on AI-native stacks cheaply. Data gravity and switching costs are buying incumbents time, but ClickUp's move suggests that time is measured in quarters, not years. The governance conversation caught up this week in an unexpected place. [Pope Leo XIV's 42,000-word encyclical](https://simonwillison.net/2026/May/25/encyclical-on-ai/#atom-everything) names specific failure modes including algorithmic control, surveillance capitalism, and autonomous weapons, and will directly shape EU and Latin American regulatory debates. [TechCrunch's read](https://techcrunch.com/2026/05/25/the-popes-ai-encyclical-isnt-really-about-ai/) is that the document's real target is the tech elite's capacity to reshape society outside democratic accountability, a framing that lands harder alongside [new UK research](https://www.theregister.com/off-prem/2026/05/26/big-tech-extracts-retirement-scale-wealth-from-uk-internet-users-research-shows/5246048) quantifying data extraction from consumers as equivalent in value to retirement savings. The Vatican and the empiricists arrived at the same diagnosis from opposite directions. Two structural forces will shape AI infrastructure economics over the next 90 days in ways most deployment teams are not modeling. China flooding global markets with DRAM and NAND will compress inference cluster costs faster than US export controls intended. The EU's sovereign cloud setback has paradoxically clarified the build-domestic mandate, accelerating European AI infrastructure investment independent of US hyperscalers. Security remains the open variable: even Google has no established playbook for prompt injection, model supply chain risk, or agentic authorization at production scale. 
A second Fortune 500 company will publicly attribute a reduction of more than 500 knowledge-worker roles directly to agentic AI systems before Q3 earnings season, making ClickUp's announcement the start of a visible series rather than an isolated case.
What I learned building my latest AI app: how one bad output exposed that I had no crisis safeguarding, and the 4-hour floor I'm adding before a single user touches it
I'm building a life coach app, an offshoot of a personal tool I was using. It has multiple AI agents: one for reflection, one for the body, one for finances, etc. It is pre-launch with no users, just me iterating.

Last week I was testing the reflection agent on a journal entry about struggling with gym and hygiene habits. It returned this:

>"You describe yourself as struggling with X, yet your stress stays at 2-3 and mood holds at 3. What are you actually avoiding naming about the gap between what you say matters and what you are doing?"

My system prompt explicitly forbade rhetorical "what are you avoiding" questions. The model did it anyway. I sat down to tighten the prompt, thinking it was a 20-minute job.

Then I looked at the output properly. The model had manufactured a contradiction that was not there. Low stress plus struggling with habits is not a contradiction, it is just being a human muddling along. The prompt told the agent to "surface contradictions" as part of its job, so the model was doing what I asked, finding contradictions whether they existed or not. LLMs are pattern matchers. Give one a job called "find the hidden thing" and it will produce hidden things either way.

The fix was not tone, it was role definition. The agent is called the Mirror. A mirror does not interpret, it shows you what you look like. I rewrote the prompt around that principle. Do not introduce vocabulary the user has not used. Do not draw connections they have not drawn. Restate their words in their own words.

Once the prompt was sharper, I sat with the question: what happens when a user writes something genuinely dark into this thing? People do not compartmentalise. Someone opening a journaling app to write about their gym routine ends up writing about why they have not been going, which involves why they have been feeling flat, which involves whatever is actually going on. You sit down to write about one thing and the real thing shows up.
The agent I had scoped to "not be a therapist" was going to be the first thing a user talked to when they were struggling. Not because the agent invited it, but because the app was open and they needed somewhere to put their words.

I had seen the Meta and OpenAI cases cropping up online, and the pattern in the worst incidents is the same. The model did not notice, or noticed and kept going. People wrote increasingly dark content over hours or days. The AI reflected it back, sometimes affirmed it, sometimes asked follow-up questions that escalated rather than redirected. There were real harms.

If a user wrote concerning content into my reflection agent, it would have produced a Stoic-flavoured response about acceptance and presence. The response would have sounded confident and would have been wrong, and it would have been the only thing between that user and whatever happened next.

The same lesson from the rhetorical-question problem applied at a darker level. A good prompt does not stop the model doing the wrong thing. If it will do rhetorical interrogation despite the prompt forbidding it for gym content, it will do worse with crisis content. You cannot prompt your way to safety on critical paths. The model has to be out of the loop on those paths.

**The scope trap**

I started planning the proper safeguarding architecture. Detection layers, classifier models, pattern detection across entries, monitored user states, behavioural modes for vulnerable users, human reviewers with mental health first aid certs, clinical advisors, solicitor-reviewed legal pages, ICO registration, professional indemnity insurance.

Then I caught myself: I had no users. I was planning a hospital before anyone had walked in for a check-up. So I worked backwards from "what is the actual minimum that protects the next person who touches this" and ignored everything else for a moment.
**The 4-hour floor (this is the part worth copying)**

If you are building any chat-with-AI app where users can type freely about anything personal, this is the minimum you need before your first user.

1. Regex and keyword layer in your API middleware. Runs at the route handler level, before any agent's model call. Scans every text input field (message, journal, settings free text, capture box) for clear crisis vocabulary across the relevant categories for your audience.
2. When patterns hit, return a hardcoded crisis response. The model never generates it. Static text with real phone numbers for your region.
3. The flagged entry still saves. The textarea stays usable. The AI just does not respond to flagged content; it hands off. Do not delete the user's writing; that is its own violation.
4. Clear disclaimer at signup. This is not therapy, this is not a crisis service, here are real numbers to call.

About four hours. Required the moment anyone who is not you opens the app.

Once I started building, the marginal cost of each next layer kept feeling small and the marginal benefit kept feeling real. So I went further than the floor. This is more tha
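A minimal sketch of steps 1 to 3 of the floor, written as a plain function rather than tied to any web framework. Everything named here is an assumption for illustration: the two patterns are placeholders and nowhere near a reviewed vocabulary list, the response text and phone-number placeholder must be replaced with real regional details, and `handle_entry`, `store`, and `call_agent` are hypothetical names, not the author's code.

```python
import re
from dataclasses import dataclass

# Placeholder patterns only; a real list needs review for your audience.
CRISIS_PATTERNS = [
    re.compile(r"\bhurt(?:ing)? myself\b", re.IGNORECASE),
    re.compile(r"\bend it all\b", re.IGNORECASE),
]

# Static text, never model-generated. Replace the placeholder with real numbers.
STATIC_CRISIS_RESPONSE = (
    "This app is not a crisis service. "
    "Please contact <crisis line for your region> now."
)


@dataclass
class EntryResult:
    saved: bool
    flagged: bool
    reply: str


def handle_entry(text, store, call_agent):
    """Screen one free-text input before any model call is made."""
    store.append(text)  # step 3: the entry is saved whether or not it is flagged
    if any(pattern.search(text) for pattern in CRISIS_PATTERNS):
        # step 2: hand off with hardcoded text; the agent is never called
        return EntryResult(saved=True, flagged=True, reply=STATIC_CRISIS_RESPONSE)
    return EntryResult(saved=True, flagged=False, reply=call_agent(text))


# Usage with a stand-in agent that records whether it was called.
store, agent_calls = [], []


def fake_agent(text):
    agent_calls.append(text)
    return "agent reply"


flagged = handle_entry("Some days I want to hurt myself", store, fake_agent)
normal = handle_entry("Skipped the gym again", store, fake_agent)
print(flagged.flagged, normal.flagged, len(store), len(agent_calls))
```

The point of the shape is that the flagged branch returns before `call_agent` is ever reached, so the model stays out of the loop on that path, as the post argues it should.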
My Mac now has a wake word for Claude Code
Honestly this started as a weekend hack because I was tired of typing the same kind of prompts into Claude Code over and over. I wanted to just talk to it while making coffee. So I rigged up a wake word (Yabby), a WebRTC voice loop for the conversation, and an actual plan-approval modal that pops up before any agent runs so I can vet what's about to happen first. That was the plan.

Two weekends later it had quietly turned into something weirder. The voice loop now talks to a "lead agent" that breaks the work down into a discovery phase and a plan, then recruits a small team: a manager or two, and sub-agents that actually do the work. They run in parallel where they can, sequentially where they can't, and when a sub-agent finishes there's an auto-triggered review pass (5-second debounce so they don't pile up). The lead agent watches the whole cascade and reports back by voice when everything's QA'd and done. Each agent runs its own Claude Code session under the hood with its own thread, so the conversations don't bleed.

Watching three agents work in parallel on the same project last night was genuinely uncanny. One of them caught a bug another one had written. That part I really didn't expect.

Things I still hate about it:

- Speaker verification is fiddly. The cosine-similarity threshold on the speaker embedding is annoying to tune: too tight and it rejects me when I have a cold, too loose and it'll wake for anyone in the room.
- French was the default locale because I wrote it that way. Slowly fixing it.
- Background tasks dying when the parent Claude Code CLI exits was a nightmare to track. Ended up writing an OS-level PID watcher with a bookkeeper shell script just to know which long-lived servers had crashed.
- Lead agent occasionally over-plans tiny tasks. Ask it to rename a file and you get a four-phase project plan. Working on it.
Stuff I'm still figuring out: how to make the QA phase less chatty, whether to let sub-agents recruit their own sub-agents, and how to keep the voice latency under 300ms when the Realtime API gets cranky. Curious if anyone else has tried voice-controlling Claude Code? Anthropic rolled out their own voice mode to 5% of users a couple weeks back and I keep wondering how they'll handle the multi-agent piece. Does anyone here have access to that rollout yet?
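The speaker-verification trade-off mentioned in the post can be shown with a small sketch. The three-dimensional vectors, the threshold values, and the function names are all made up for illustration; real speaker embeddings are much longer, and nothing here reflects the author's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_enrolled_speaker(embedding, enrolled, threshold):
    """Accept the wake word only if the voice is close enough to the enrolled one."""
    return cosine_similarity(embedding, enrolled) >= threshold


# Toy embeddings: the enrolled voice, the same voice shifted (say, by a cold),
# and an unrelated voice.
enrolled = [1.0, 0.0, 0.0]
with_cold = [0.8, 0.6, 0.0]
stranger = [0.0, 1.0, 0.0]

similarity_with_cold = cosine_similarity(with_cold, enrolled)
print(similarity_with_cold)
print(is_enrolled_speaker(with_cold, enrolled, threshold=0.75))
print(is_enrolled_speaker(with_cold, enrolled, threshold=0.9))
print(is_enrolled_speaker(stranger, enrolled, threshold=0.75))
```

With these toy numbers the shifted voice scores about 0.8, so a 0.75 threshold accepts it and a 0.9 threshold rejects it. That is the "too tight versus too loose" tuning problem in miniature.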
Self-hosted sandboxes and MCP tunnels for Claude Managed Agents are now in public beta.
Self-hosted sandboxes let you run agents in any environment you control: your own infrastructure, or managed providers like Cloudflare, Daytona, Modal, or Vercel. MCP tunnels connect your agents to MCP servers deployed in your private network without exposing them to the public internet. Available today on the Claude Platform. Read more: [https://claude.com/blog/claude-managed-agents-updates](https://claude.com/blog/claude-managed-agents-updates)
Scaling LLMs horizontally: hidden-state coupling without weight modification [R]
Residual Coupling (RC) connects frozen language models in parallel using small, learned linear bridge projections. These bridges read hidden states from one model and inject additive updates into the residual stream of another at intermediate layers. In bilateral setups, simultaneous return bridges form a feedback loop that stabilizes both streams without altering base weights. This architecture establishes a two-step paradigm where base models function as memorizers, while lightweight linear bridges handle cross-domain generalization. Constraining the bridges to purely linear maps prevents overfitting because they can only map existing geometric relationships between the frozen representation spaces. As the bridges are optimized against ground-truth target data, they have no incentive to map ungrounded features such as individual models' hallucinations. Keeping the base weights completely frozen eliminates catastrophic forgetting. The system maintains operational closure, transforming inputs through its existing structure rather than changing to accommodate them. Evaluating bilateral RC against Mixture-of-Experts (MoE) routing across the same frozen models shows these results: * Medical (3-model): Reduces perplexity to 11.02, compared to 56.80 for MoE and 57.08 for the frozen baseline. This represents an 80.7% reduction. * TruthfulQA Health (MC1): Improves accuracy by 9.1 percentage points over the baseline. Independent models have uncorrelated hallucinations, allowing the bridge gates to amplify consistent cross-model updates while suppressing individual errors. * Coding Test: CodeGPT-small-py and GPT-2 use different tokenizers, causing a 7-million baseline perplexity on mismatched text. MoE reaches 878, but RC achieves 5.91 by reading hidden states before the output projection collapses. This framework introduces a horizontal scaling axis for multi-model systems, moving beyond vertical scaling via larger monolithic models. 
Latency remains bounded by the slowest single model. Specialists can be added or removed without retraining the remaining system. In some scenarios, this architecture could replace multi-turn text prompting in agentic workflows with a single parallel forward pass, allowing models and/or bridges to run on separate nodes or edge devices without a central bottleneck. By decoupling memorization from relational alignment, RC bridges provide a framework for scaling multi-model systems and offer a path toward native multi-modal integration. Paper: [https://ssrn.com/abstract=6746521](https://ssrn.com/abstract=6746521) Code: [https://github.com/pfekin/residual-coupling/](https://github.com/pfekin/residual-coupling/)
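To make the bridge idea above concrete, here is a rough one-directional sketch in plain Python: a linear map reads a hidden state from a source model and adds a gated update to the target model's residual stream. The scalar gate, the dimensions, and the names `matvec` and `bridge_update` are assumptions for illustration. The post does not specify the gate's form, and this sketch does not cover the bilateral feedback loop or training.

```python
def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def bridge_update(h_target, h_source, weights, gate):
    """Return h_target + gate * (weights @ h_source).

    Only `weights` and `gate` would be learned; both base models stay frozen.
    """
    delta = matvec(weights, h_source)
    return [t + gate * d for t, d in zip(h_target, delta)]


# Toy sizes: source hidden state has 2 dims, target residual stream has 3.
h_source = [1.0, 2.0]
h_target = [0.5, 0.5, 0.5]
weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

coupled = bridge_update(h_target, h_source, weights, gate=0.5)
uncoupled = bridge_update(h_target, h_source, weights, gate=0.0)
print(coupled)
print(uncoupled)
```

With the gate at zero the target stream passes through unchanged, which is why removing a specialist in this kind of setup does not require touching the remaining models.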
Claude keeps asking for permission when I have allow bypass on
I’m new to Claude. I have allow bypass turned on in the Claude extension for Antigravity, and bypass permissions mode selected for Antigravity. I still get these pop-ups. Is there any way to fix this and have Claude run more automatically after commands?
ChatGPT only lets you delete chats one at a time!! So I built a bulk delete dashboard!!
About a year ago I tried to clean up my ChatGPT chat list. I had something like 800 conversations, two years deep, mostly auto-titled "Untitled chat" garbage that I couldn't tell apart without opening. I sat down to delete the dead ones. Click chat. Click three-dot menu. Click Delete. Confirm. Click the next chat. Same thing. Repeat. After an hour I had deleted maybe 40 chats. Forty!! Out of 800!! That's the rate of clearing a 2-year history in something like three full workdays of just sitting there clicking confirm. I looked for a native bulk option. There isn't one inside ChatGPT itself. The closest is "Delete all chats" in Settings > Data Controls, which is the nuclear all-or-nothing button. There's no "delete the oldest 300" or "archive everything from before March". That's the entire native API. This seemed insane to me given how trivial "Select All plus Delete" is in literally every other product I've used since 2008! So I built the missing piece. **What I built** It's a Manage Chats modal inside a Chrome extension I ship called ChatGPT Toolbox (also runs on Edge, Brave, Opera, Arc). The modal lists every conversation in your account with checkboxes. Tick what you want gone, click Delete or Archive, and it runs through them in batches of 10 with a progress bar. [ChatGPT Toolbox Manage Chats Feature](https://preview.redd.it/097kho42ln1h1.png?width=892&format=png&auto=webp&s=3b9a9c517fa1005e968b9e664c08037b97795583) A few details that came out of dogfooding it: * **Color-coded age badges** on every chat. Green for the last week, blue for the last month, amber for the last 6 months, red for older than 6 months. The first thing I realized was that picking what to delete was the hard part, not the deletion itself, and age was the strongest signal for "I will never look at this again". * **Active vs Archived tabs.** Archive ended up getting more use than Delete in my own usage, because I was rarely 100% sure I wouldn't want a chat back. 
So I made archive a first-class action, not a second-tier option. * **Live progress bar** ("Deleting 23/50") on bulk operations. I tried it without and kept refreshing the page mid-operation thinking it was stuck. Adding the indicator stopped that completely. * **Search by title** to filter the list before you start ticking. Surprisingly useful even on the auto-generated nonsense titles because there's usually some keyword in there. * **Bulk export** to text, markdown, JSON, or PDF. Less critical for cleanup itself, but a few testers asked for it so they could save a chat outside ChatGPT before deleting it. I went from 800 chats to about 60 in 5 minutes using it. Most of those 5 minutes was deciding what to keep, not the deleting itself. **How does the workflow look?** Open the modal. List loads sorted by recency. Search to narrow it down if you want. Tick checkboxes. Hit Delete or Archive. Confirm. Progress bar runs through them. Done! If you've cleaned up a big ChatGPT history (with or without my tool, or with some clever workflow I haven't seen), would genuinely love to compare approaches in the comments.
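The batching and age-badge logic described in the post could look roughly like the sketch below. It is written in Python for illustration, while the real extension is presumably JavaScript. The function names are made up, "6 months" is approximated as 182 days, and progress is recorded once per batch here, whereas the actual tool may update per chat.

```python
from datetime import datetime, timedelta


def age_badge(last_updated, now):
    """Map a chat's age to the badge colours described in the post."""
    age = now - last_updated
    if age <= timedelta(days=7):
        return "green"
    if age <= timedelta(days=30):
        return "blue"
    if age <= timedelta(days=182):  # assumption: 6 months taken as 182 days
        return "amber"
    return "red"


def batches(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_delete(chat_ids, delete_one, size=10):
    """Delete chats batch by batch and return the progress messages shown."""
    done = 0
    progress = []
    for batch in batches(chat_ids, size):
        for chat_id in batch:
            delete_one(chat_id)  # stand-in for the real per-chat request
        done += len(batch)
        progress.append(f"Deleting {done}/{len(chat_ids)}")
    return progress


deleted = []
progress = bulk_delete(list(range(25)), deleted.append)
print(progress)

now = datetime(2026, 1, 1)
print(age_badge(now - timedelta(days=3), now), age_badge(now - timedelta(days=400), now))
```

Passing the delete action in as `delete_one` keeps the batching logic testable without any network calls.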
I hate it here...
Look at what they did to my boy 😭 But honestly, still miles ahead of ChatGPT; from it I would get a page-long wall of text
I think its writing the SVG icons its funny btw
Decline in Opus 4.7 Max Quality
I’m currently working on two different projects, and both use **the same** Pre-Paywall modal. See the Figma file below: https://preview.redd.it/d7ri53vo9szg1.jpg?width=730&format=pjpg&auto=webp&s=a722bcd11caaa0b068f2c6af360cea687af76a17 I implemented the first one two weeks ago, and without any additional prompting, it was implemented correctly. You can see the result below: https://preview.redd.it/j4mr6k8u8szg1.jpg?width=919&format=pjpg&auto=webp&s=da920a7c1d0eefa951886235e0ca996cfb6fc43e Last night, I started implementing the same modal in another project, and for me, it became clear evidence of a decline in the quality of Opus 4.7. I compacted the context window, used `/effort max`, and even added `ultrathink`, but none of that helped. The result I got is shown below https://preview.redd.it/x87okntv8szg1.png?width=1260&format=png&auto=webp&s=d1d29875df6b947e21bc6647c0725171084d8c20 Note: I have used GPT 5.5 to fix it; after 2 prompts, it was ok...
Claude can now build and publish websites to a domain right from chat
I built [teenyapp.com](http://teenyapp.com), a tool that lets Claude on the web (or any AI chat) build and deploy a full website end to end from a single pasted link. The problem teenyapp solves: every time I asked Claude to actually ship something, the agentic workflow broke. Cloudflare config, Vercel CLI, GitHub repos, env vars, secrets, DNS... all of it meant leaving the chat, signing up for some service, installing dependencies. So I built a way for Claude to handle the whole thing, right from chat. How it works: claim a live domain up front (yourapp.app.teenyapp.com), and you get a link back with an agent token baked in. Paste that link into Claude. Claude reads the agents.md instruction file at the link, and uses the agent token as bearer token to make HTTP POST requests that scaffold the project, writes the frontend and backend code, runs migrations, and deploys straight to that domain. What Claude can do through teenyapp: * Build and deploy frontend/backends of full stack apps to a live URL * Run schema migrations on a real database * Wire up auth (email and password, JWT, OAuth via Google, GitHub, Discord, LinkedIn) * Set up row level security rules in code * Iterate on the live site by saving and committing files through the link The example website "Clonable" in the attached image was built and published right from this chat: [https://claude.ai/share/c608db64-e296-4c6e-a5cf-daf9edba609a](https://claude.ai/share/c608db64-e296-4c6e-a5cf-daf9edba609a) You can try out Clonable here [https://clonable.app.teenyapp.com](https://clonable.app.teenyapp.com), the AI codegen should work until my OpenAI account powering it runs out of $. Its worth mentioning how Clonable supports google SSO, and has a backend request handler that proxies user message requests to the AI API provider, who is OpenAI in this case. 
That's only possible because we built teenyapp on top of a comprehensive backend framework called teenybase, so each teenyapp gets API, Auth, DB, and more out of the box. Really excited to see what everyone builds with teenyapp. Check out the websites people have made so far: [https://teenyapp.com/explore-all](https://teenyapp.com/explore-all)

Site: [teenyapp.com](http://teenyapp.com)

The backend framework, which is open source: [github.com/teenybase/teenybase](http://github.com/teenybase/teenybase)
Four months building with Claude: a diagnostic framework for American constitutional history
Sharing a project I built with Claude over four months. Free to try, no signup, runs in the browser: [https://www.papercutslibrary.com/explore/constitutional-reality-framework/](https://www.papercutslibrary.com/explore/constitutional-reality-framework/) It's an interactive learning module that maps 236 years of American constitutional history onto a two-dimensional analytical grid measuring accountability and proactivity by branch. The goal: let people see how American constitutional power has actually behaved over time, not how civics class describes it. https://preview.redd.it/56v5y0egx4yg1.png?width=1354&format=png&auto=webp&s=9cbcb6aa3499ab8b8a378b411447e2f1dbd21ae0 I want to be clear about what the collaboration actually looked like, because I think that's the more useful conversation. # How the framework came to be. This started as research on the Supreme Court. I noticed the 1937 switch in time and wanted to track the kind of institutional movement it signaled. The framework idea emerged from that. Early work was one-off mappings and thematic analysis, building the framework's two-dimensional logic by testing it against specific cases. At some point I got the idea of mapping the full sweep of American history through it, and a two-month grind to produce the learning module began. The initial idea was much smaller than what it became. The framework grew, and so did the scope, through the process. I wrote a short book on AI arguing that one of its most important practical uses could be helping people level-set reality, particularly during periods of heavy misinformation. This project applies that idea to history, through a diagnostic framework. Claude wouldn't have proposed any of that. The originating ideas and the module's design are mine. # What Claude contributed. Almost all of the historical and editorial content. I'm not a historian. Producing 29 mapped eras with placement-level evidence across 236 years was beyond what I could do alone. 
The work depends on AI's handle on historical context, and the info modal in the tool is explicit about this. I'm also not a coder. I have enough past programming experience to follow what I'm looking at, but I did not write a single line of code in this build. I reviewed specs and briefs, ran tests, and made architecture decisions. One day I spent four hours getting four captain threads to agree on a re-architecture brief. The code itself is Claude's work. The framework documentation grew complex enough that I couldn't track every internal consistency point either. Claude tracked it. I directed it. # How I structured the work. Multi-thread architecture, with specialized Claude threads running in parallel: • Project Captain: coordination and sequencing • Design Captain: UI decisions • Editorial Captain: voice and style standards • Era/Audit Captain: placement integrity across the timeline • Editorial Execution and Editorial Review: separate drafting and review threads The roles weren't strict walls. Project Captain wrote and coded when needed. The discipline was in the processes between threads: editorial runs, placement setups, structured handoffs. Over a hundred briefs and specs moved between threads across four months. That structure is what kept the work coherent and prevented the drift that happens when a single context handles everything. Captains had to be retired when context degradation set in. That was a constant challenge. The methodology I held to throughout: batch tasks, take time with everything, prefer high-quality results over speed. All 29 maps went through an execution and review cycle against a dedicated style guide. Every placement is backed by tiered evidence (Tier 1 primary sources, Tier 2 secondary), documented with explicit confidence levels. # Coding. The build itself ran through the same structured pattern. Captains wrote briefs and prompts for Cowork to do work on the modular codebase. 
Cowork was given a verification checklist in most cases, and the associated Captain would review the standalone HTML build that resulted. The current build is nearly 15,000 lines in a 1.6MB single standalone HTML file, which is what's online. # Cross-model verification. Recent events fall past Claude's training cutoff, so I used GPT and Gemini for independent verification through systematic web research. One unexpected finding worth reporting: some 2026 developments, particularly recent military actions, were so far outside the other models' priors that they flagged them as likely hallucinations. They weren't. The events were just genuinely unprecedented. Validating that gap was its own piece of work. # Disclosure. Full AI collaboration disclosure is in the tool's info modal. Claude (Opus and Sonnet, 2025 to 2026) for the analytical and editorial work. CC BY-NC-SA 4.0. Try it: [https://www.papercutslibrary.com/explore/constitutional-reality-framework/](https://www.papercutslibrary.com/explore/constitutional-real
I built a hands-free voice AI that sends emails mid-conversation, and that's just one feature. Here's everything AskSary can do.
https://reddit.com/link/1symbsj/video/k2no3zfgq1yg1/player

Been building AskSary solo for a while. Just shipped hands-free voice email - you're mid-conversation with an AI and you say "send an email to [john@example.com](mailto:john@example.com) subject X body Y" and it pre-fills the Gmail modal automatically. One tap sends. Powered by OpenAI Realtime API, works in 22 languages.

But that's just the latest feature. Here's the full picture:

**Every major model in one place**

GPT-5-Nano, GPT-5.2, GPT-5.2 Pro, O1 Reasoning, Claude Sonnet 4.6, Grok 4, Gemini 2.5 Flash, Gemini 3.1 Pro, Gemini Ultra, DeepSeek V3, DeepSeek R1 - with smart auto-routing or manual override.

**Pro-Active Personalisation**

On every login the AI reads your previous conversations and sends the first message itself - asking if you want to continue or start fresh. Before you type a single word.

**Persistent Cross-Model Memory**

Start a conversation with Claude on your phone, open your laptop, switch to GPT-5.2 - it already knows what you discussed. No copy-pasting, no summaries. Just works.

**Knowledge Base - RAG**

Upload docs up to 500MB per file, unlimited uploads, chat with them across any model via OpenAI Vector Store. Your files stay in context forever.

**Integrations**

Google Drive, Gmail, Google Calendar, Notion - access files, get email and calendar summaries, use them in chat or push them to your Knowledge Base.

**Generation Tools**

* Image Gen - GPT-Image-1 and Nano Banana Pro
* Flux Image Editor - full editing suite with visual history
* Video Studio - Luma Dream, Veo 3.1, Kling 1.6 / 2.6 / 3, up to 10 second AI videos with audio
* Music Studio - 30 second tracks with custom or AI lyrics via ElevenLabs, visualizer built into chat
* 3D Model Studio - Meshy with STL export (deploying soon)
* Video Analysis - upload up to 500MB or paste a YouTube link

**Developer and Builder Tools**

* Vision to Code - screenshot any UI, get live editable code
* Web Architect - build full web apps from a single prompt
* Game Engine - build and prototype games with AI
* Code Lab - split screen live coding with SQL Architect, Bug Buster, Git Guru, Regex Generator, Test Genie and more
* Tavily web search across all models

**Voice and Audio**

* Real-time 2-way voice chat - 8 voices, near-zero latency WebRTC
* Podcast Mode - two AI voices, switchable, near-zero latency, downloadable as MP3
* Voiceover Studio, Voice Notes, Voice Tuner

**Productivity and Content**

* Slides, Docs and File Tools
* Pro Writer and Content Library
* Social Tools - Hook Generator, Video Script, Hashtag Creator, Idea Spark
* Business Suite - Pitch Deck Builder, Deep Analytics, Legal Eagle, Maths Solver
* Daily Briefing and Market Watch
* CV Creator, Email Polisher, Cover Letter Builder, TL;DR Bot
* Share conversations or snippets with anyone

**Platform Extras**

* 30+ live interactive wallpapers and themes
* Custom Agents and Personas
* Folder organisation and Smart Search across chat history
* Media Manager Gallery - all your generated content in one place
* Fully customisable UI in 26 languages with full RTL support

**The Stack**

* Frontend: Next.js, Capacitor (iOS + Android), Vanilla JS / React
* Backend: Vercel serverless, Firebase / Firestore, Firebase Admin SDK
* AI: OpenAI, Anthropic, Google, xAI, DeepSeek
* Generation: Luma AI, Kling via Replicate, Veo via Replicate, ElevenLabs, Flux via Replicate, Meshy
* Integrations: Google Drive, Notion, Tavily, OpenAI Vector Store, Stripe, CloudConvert, Sentry
* Rendering: Mermaid, MathJax
* Platforms: Web, iOS, Android, Apple Vision Pro

**What you get free just for creating an account (1,000 credits/month, rolling):**

* Unlimited chat on GPT-5 Nano, Gemini Flash and DeepSeek V3 - no daily limits, zero credit charge
* 25 image generations via GPT-Image-1 and Nano Banana Pro - 40 credits each
* 8 image edits via Flux Studio - 80 credits each
* 2 song generations via ElevenLabs - 350 credits each
* 2 video generations via Luma Dream and Kling - 350 credits each
* \~70 messages on Claude Sonnet 4.6, GPT-5.2, Grok 4, Gemini 3.1 Pro and DeepSeek R1 - 15 credits each

No credit card required.

Built entirely solo. No CS degree, no team, no funding. Started because I asked an AI to build me a chatbot and it failed - so I built my own. Accepted to LEAP 2026 in Saudi Arabia along the way.

Happy to answer anything about the build.

[asksary.com](http://asksary.com)
The Structured Output Benchmark (SOB) - validates both JSON parse and value accuracy [R]
Current structured output benchmarks only validate the pass rate for JSON schema and types, but in practice the more common problem is inaccurate JSON values: for example, a hallucinated `total_price` number when extracting values from an invoice, or an array ordered wrongly because of inaccurate date mapping.

The Structured Output Benchmark measures 7 key metrics rather than JSON schema pass rate alone.

* Value Accuracy (primary): exact leaf-value match against verified ground truth
* JSON Pass Rate, Type Safety, Path Recall, Structure Coverage (structural)
* Faithfulness: are values grounded in context or hallucinated?
* Perfect Response: every single leaf value correct
* Modalities: text, image and audio

**Overall results**

[Overall benchmark results](https://preview.redd.it/05c2exsrwzxg1.png?width=2304&format=png&auto=webp&s=ee43a0e0691c6c7dda8e03feb72ec31e3bc982f6)

Open source is doing pretty well, with GLM 4.7 coming in at number 2, right below GPT 5.4.

**JSON-pass vs Value-Accuracy gap**

[JSON-pass vs Value-Accuracy gap](https://preview.redd.it/zjxkuysuwzxg1.png?width=2304&format=png&auto=webp&s=4a686ffc0ad38edb710d452a1c42ad4bf2d36262)

What's interesting here is that while most models hit 90%+ on JSON schema pass, all of them drop significantly on value accuracy.

**Overall best by modality**

[Overall best by modality](https://preview.redd.it/ghasera2zzxg1.png?width=1344&format=png&auto=webp&s=558b28e889a168ddd6a8ea5935202fb2c7e435ec)

Full breakdown blog: [https://interfaze.ai/blog/introducing-structured-output-benchmark](https://interfaze.ai/blog/introducing-structured-output-benchmark)

Full leaderboard: [https://interfaze.ai/leaderboards/structured-output-benchmark](https://interfaze.ai/leaderboards/structured-output-benchmark)

Paper: [https://interfaze.ai/sob\_paper.pdf](https://interfaze.ai/sob_paper.pdf) (Pending arXiv)

The full breakdown goes deeper into the different modalities, how we designed the dataset, and how we performed the benchmark. All code and the dataset are open source 😄

Our goal is to be the best general model for deterministic tasks, and a key aspect of determinism is controllable and consistent output structure. The first step to making structured output better is to measure it and hold ourselves and the industry to the highest standard.
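To make the gap between "JSON passes" and "values are right" concrete, here is a minimal Python sketch of an exact leaf-value comparison. It is not the benchmark's actual scoring code: the function names (`flatten_leaves`, `json_passes`, `value_accuracy`), the strict type check, and the invoice data are all assumptions made up for illustration.

```python
import json


def flatten_leaves(node, path=()):
    """Map each leaf path (tuple of keys/indexes) to its value."""
    leaves = {}
    if isinstance(node, dict) and node:
        for key, value in node.items():
            leaves.update(flatten_leaves(value, path + (key,)))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            leaves.update(flatten_leaves(value, path + (index,)))
    else:
        # Scalars (and empty containers) count as a single leaf.
        leaves[path] = node
    return leaves


def json_passes(raw):
    """True if the raw model output parses as JSON at all."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False


def value_accuracy(predicted, truth):
    """Fraction of ground-truth leaves matched exactly (same type, same value)."""
    truth_leaves = flatten_leaves(truth)
    predicted_leaves = flatten_leaves(predicted)
    correct = 0
    for path, expected in truth_leaves.items():
        if path in predicted_leaves:
            actual = predicted_leaves[path]
            if type(actual) is type(expected) and actual == expected:
                correct += 1
    return correct / len(truth_leaves)


# Hypothetical invoice extraction: valid JSON, but total_price is wrong.
truth = {
    "vendor": "Acme",
    "total_price": 118.5,
    "items": [{"name": "bolt", "qty": 4}, {"name": "nut", "qty": 10}],
}
raw_output = (
    '{"vendor": "Acme", "total_price": 181.5, '
    '"items": [{"name": "bolt", "qty": 4}, {"name": "nut", "qty": 10}]}'
)

parsed_ok = json_passes(raw_output)
score = value_accuracy(json.loads(raw_output), truth)
perfect = score == 1.0
print(parsed_ok, round(score, 3), perfect)
```

In this made-up case the output parses cleanly and has the right shape, yet one of the six leaves is wrong, so it would count toward JSON pass rate while failing a "perfect response" style check.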
Repository Audit Available
Deep analysis of modal-labs/modal-client — architecture, costs, security, dependencies & more
Yes, Modal offers a free tier. Pricing found: $355, $0.001736 / sec, $0.001261 / sec, $0.001097 / sec, $0.000842 / sec
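Per-second rates are hard to compare at a glance, so the short Python sketch below converts the listed figures to hourly equivalents. The page does not say which resource each rate applies to, so the rates are treated as an unlabeled list; only the conversion (multiplying by 3,600 seconds) is certain.

```python
SECONDS_PER_HOUR = 3600

# Per-second rates as listed on the page; which resource each one
# belongs to is not stated, so they are left unlabeled here.
per_second_rates = [0.001736, 0.001261, 0.001097, 0.000842]

hourly_rates = [round(rate * SECONDS_PER_HOUR, 4) for rate in per_second_rates]

for rate, hourly in zip(per_second_rates, hourly_rates):
    print(f"${rate:.6f} / sec is about ${hourly:.2f} / hour")
```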
Key features include: "Your cloud environment, in code."; "Built for speed, at any scale."; "Autoscale from 0 to 1000+ GPUs, instantly."; "Out-of-the-box observability."; Inference; Training; Sandboxes; LLM Inference.
Modal is commonly used for: Real-time AI model inference for web applications, Batch processing of large datasets for machine learning, Training deep learning models with elastic GPU scaling, Running Jupyter notebooks for data analysis and visualization, Creating isolated environments for testing AI algorithms, Deploying scalable microservices for AI applications.
Modal integrates with: TensorFlow, PyTorch, Kubernetes, Docker, AWS S3, Google Cloud Storage, Azure Blob Storage, Prometheus, Grafana, Slack.
Modal has a public GitHub repository with 456 stars.
Emad Mostaque
Former CEO at Stability AI
2 mentions

Speculation is all you need: Intro to Speculative Decoding for High Performance Inference
Mar 12, 2026
Based on user reviews and social mentions, the most common pain points are: token cost, cost tracking.
Based on 34 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.