Inference performance drives profitability.
Users of FriendliAI highlight its impressive ability to expedite software development, as evidenced by creators building numerous apps and projects rapidly, without writing code themselves. However, there are complaints about excessive resource consumption, particularly regarding token usage costs, which some find prohibitive after substantial interaction. Pricing sentiment seems mixed, with some citing efficient cost savings, while others lament over spending beyond their expectations. Overall, FriendliAI has a solid reputation for enhancing productivity and creativity in AI-driven projects, but resource management and costs are areas pointed out for improvement.
Mentions (30d): 33
Reviews: 0
Platforms: 2
Sentiment: 20%, 27 positive
Features
Use Cases
Industry: information technology & services
Employees: 50
Funding Stage: Venture (Round not Specified)
Total Funding: $26.7M
Pricing found: $1.4, $0.26, $4.4, $0.14, $0.4
The famous METR AI time horizons graph contains numerous severe errors [D]
Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: It is impossible to draw meaningful conclusions from METR’s Long Tasks benchmark — in particular once one realizes that its numerous flaws are probably compounding in unpredictable ways. The appropriate response to a study of this kind is not to assume it can be saved via back-of-the-envelope adjustments, or to comfort oneself that other anecdotal evidence implies that it is probably correct anyway. It is to cut one’s losses and move on in search of higher-quality information. … The METR graph cannot be saved. For all its sleekness and complexity, it contains far too many compounding errors to excuse. Among them is generalizing to the entire species data collected from a small group of the authors’ peers. Coming up with ever more dramatic ways to make this mistake has become a kind of sport among AI researchers. If the field has a central pathology, it is to aggressively overindex on a mix of anecdotal data from power-users, alongside a long list of benchmarks even more compromised than METR’s. One hopes that as the field matures, its participants will learn to stop making these mistakes. 
The errors include:
- Some of the human baseline data was not measured or collected from any empirical source; it was just guesstimated by the authors
- A key variable in the data is how long it takes humans to complete certain tasks, but when METR did actually measure this, it paid its human benchmarkers hourly, meaning they were incentivized with cash to take longer
- The sample of human benchmarkers was biased toward METR employees’ friends, acquaintances, and former colleagues (who are likely unrepresentative and possibly biased)
- Humans familiar with a codebase and a specific coding task were 5-18x faster at completing it, but METR used data from humans who were much slower because they had to spend time familiarizing themselves with the codebase and the task at hand
- Test-training data contamination occurred because some of the tasks had published solutions online, which most likely would have been included in LLMs’ training datasets
- And many more

Please read the full post. It’s not too long and it’s accessible to a general audience. It’s worthwhile to read the whole post and see how many errors were made in the creation of the METR graph and just how bad they are. If you want to read about even more errors in the METR graph not covered in Nathan Witkin’s post, read this post by the AI researchers Gary Marcus and Ernest Davis.

The METR graph is a great example of why scientific standards and best practices are so important, and why enforcing them through processes like peer review is necessary to prevent us from drowning in bad information. It’s extremely dangerous to rely on information that only superficially appears scientific but wasn’t actually produced with the rigour normally required of scientific research.

submitted by /u/common_yarrow
How I protect my health when using Claude (and how I didn't before)
Tagged as productivity because without your health, what can you do?

All of a sudden, I just felt tired, and I had this banging headache. I thought, okay. It's just a headache. And then I got home, and I knew it was more. Looking back now, it was a combination of many things, but one of the core constants was that the way I worked had changed over the last 12 months. And I think it just caught up with me.

Until the beginning of this year I'd been working away as an IT consultant. I had a project, working for a medical company, that had gone on for about two years, and I was building (mostly internal) AI solutions. During that time I'd seen an influx of AI and personally, as I'm sure many of you have, I'd increased the number of sessions and the amount of context switching. However, since recent waves of Claude, this seemed somewhat manageable to me, or at least the full effects hadn't kicked in yet...

Then at the beginning of this year the project finished and I was on my own, working on my own projects. Great! Right? Well, maybe. There's freedom, a lot of freedom, but no team signing off each day, no expectations to work on certain projects at certain times. Maybe it was just time management, I thought. So I decided to just work when I was feeling good, but this didn't really work because I felt like I needed to make this work for myself. Hustle now, chill later.

There were maybe five or six different projects on at a time (even now, tbh), and I was context switching between all of them. Not only that, I was drifting in and out of reddit or playing chess as a break (which is a terrible idea fyi - speaking to myself!). It almost felt like I was slowly drifting into exhaustion, but because it was only one more prompt to write it was hard to see. I think this had a much bigger impact on me than I realized.
Disclaimer: obviously I'm not a (Reddit) doctor and this isn't advice, but it felt important to share this post in an effort to help people understand the early signs I was having, how to recover, and what I'm now doing going forward. I took some time to put these into the order they first appeared.

Early Signs:
- Constant urge to check, respond or research stuff
- Difficulty relaxing even after stopping work
- Reduced ability to focus on one thing (because I rarely was)
- Feeling mentally full all the time
- Slight irritability / emotional sensitivity
- More compulsive context switching
- Increased doomscrolling during a 'research' session

Mid-Stage Signs:
- Wired but exhausted
- Brain fog
- Forgetting small things or losing train of thought
- Needing more stimulation to stay engaged
- Struggling to enjoy offline activities
- Feeling restless during quiet moments
- Sensitivity to noise, notifications, or interruptions

Later Signs:
- Tired even after sleeping
- Eating less, prioritising work over nutrition
- Waking up already mentally fatigued
- Emotional flatness and less excitement
- Feeling detached from my body and the places I normally feel happy / safe 😞
- Small tasks were starting to feel overwhelming

Bigger Warning Signs:
- Anxiety spikes
- Persistent headaches
- My body and mind shutting down
- Feeling emotionally numb
- Inability to stop working even when exhausted
- Physical symptoms continuing for days

The recovery: I was out with my friends at a nice sushi restaurant and I didn't want to eat. I LOVE sushi. Headache, fatigue, irritation, sensitivity - I needed to go. So I went home and the girl I'm seeing looked after me whilst I was basically non-verbal. She said it was nice because I'm usually so self-sufficient (thanks Claude). We did the obligatory AI checks, they all agreed, I needed rest (physically and mentally) and re-hydration.
What I did was stay in a cool house with NO INTERACTIONS with Claude after the initial research (which was somewhat annoying tbh). I went to bed and could hardly sleep at all in the beginning, but I was resetting my dopamine system (I think) and only came out for water, rehydration tablets and food.

The aftermath: It would have been easy to pass this off as a fever or whatever, but I took a long hard look at what was happening and realised I had to look after myself more (if only to spend more quality time with Claude). But seriously, now I'm starting each day away from the computer and each session with a clear plan (also made away from the computer), time boxing sessions to work on single tasks and taking smaller breaks in between. If there's dead time whilst the agent is working, I'll clean the dishes I was ignoring or grab the clothes that have been drying for 4 days (you get the point). For reddit I'm using a custom tool to avoid too much time on the platform (still love you boo), and overall I'm just paying more attention to myself and my needs.

Sorry this has gone on a bit long. But I feel this is important, and if you made it this far I hope something sits with you and you don't end up where I was.

submitted by /u/BuffaloConscious7919
Stop Claude from wasting tokens exploring your codebase [archmcp]
AI coding agents spend a surprising amount of time:
- crawling files
- guessing architecture
- tracing dependencies
- rebuilding context every session

So my friend built archmcp, a local MCP server that generates a compact architectural snapshot of a repository before the agent reads a single file. Instead of starting blind, Claude Code gets structured context about:
- modules
- symbols
- dependencies
- routes
- architectural patterns

It’s giving AI agents enough architectural awareness to stop wasting tokens and time rediscovering the codebase from scratch. It also supports multi-repo setups, so agents can reason across systems like:
- Go backend
- TypeScript frontend
- Python FastAPI services
- mobile apps
- shared libraries

Repo: archmcp on GitHub

Would love feedback from people who give it a go.

submitted by /u/yellow-llama1
Why We Build
One silver lining to the dead internet we're living in, today, is that it's very quickly teaching us that we can't rely on our senses as much as we believe we can. It's not healthy to always live in skepticism, but it is necessary in a world where you don't know what's up or down anymore. That's why we need great minds to focus their attention on solving the problems associated with credible information sharing without it becoming some centralized playground designed to look like the free-flowing exchange of ideas. If we don't solve for that, then I guess we're heading into a future that a small handful of people want because elections or public opinion will no longer matter. One of the biggest focuses in AI should be in figuring out how to get it to provide deep credible knowledge in specific domains that can be best applied to the problems we're trying to solve. Sure, it can do this with enough finagling, but what I really mean is having something easy for everyone to use like Perplexity or Gemini, only it doesn't simply find consensus information from the internet using all these black box methods that are owned by major corporations. Instead, it should use direct knowledge from domain experts who structure and cite their material and as users, we should be able to backtrack all of it, including the original author. And all of this should be achievable by simply engaging with a chatbot agent that can reliably go out and help me discover all of these things. Also, we shouldn't have to simply trust that the application works. We should be able to go in and see exactly how it's working. This way, the public can audit the systems we're relying on for grounding our worldviews. That, to me, is where we should be if we really want to break from the chains of propaganda and reclaim our genuine thoughts about how we ought to live.
The alternative independent media space was co-opted long ago and now all of the feeds keep us in a state of perpetual dislocation from our friends, family, communities, new solutions, and better approximations to the truth. We exist in a walled-off digital pasture. But if regular people who are smart and capable enough decide to leverage this new technology, then we can break through the fencing and finally live in a world where discovery-based researching and learning can be easier than Google, which could eventually individuate society again, like how it was before, instead of keeping us clustered into specific groups based on our viewing preferences. That's why my brother and I got into this business. Yeah, sure, we also wanna make a buck so we can retire with dignity. That's true. But the drive has always stemmed from wanting to figure out a better way for people to share hidden insights and create things that are bigger than they thought they could handle. We have a long way to go, but we're making the first small steps, even if it isn't obvious, just yet. Bottom line, though? Humanity must figure out a way to help us master the means and methods of discovery-based knowledge acquisition, execution, and immediate distribution of information based on relevancy and needs from those who search instead of those who passively soak information in from the curated feeds. And all of this needs to be easy enough for a 12-year-old to do. If anyone else is working on this problem, we'd love to hear your thoughts, even if it's through a DM. We're living in the most exciting times, but with adventure comes danger. So maybe, idk. Let's make it more fun and less hazardous, so that we can, at least, live long enough to re-tell this great story that we're all a part of.

submitted by /u/CyborgWriter
Small victory using Cloudflare for simple hosting of generated HTML/mini-websites
Something many people are running into: You, or a teammate, have created some kind of mini-website app out of Claude and now want to share it with the rest of the company, without overbaking the hosting solution (e.g. not setting up new Azure app services or containers, etc). Maybe you also need some basic data storage for persistence. And how do you do all of that securely? We recently went down this rabbit hole, while looking at all the major players: Vercel/V0, Lovable, Netlify, Coolify, Dokploy, Github Pages.. and even considered baking together our own hosting app solution using Azure or AWS as the backend. Our target audience is non-technical users in the team, so I was looking for something with drag-n-drop style deployment (no git required), and I really wanted to have SSO for protecting application access, along with some type of DB storage. The main issue I ran into was SSO authentication support being gated behind enterprise-level pricing plans for hosting systems like Netlify (which I'd otherwise highly recommend for a small public project). Netlify's enterprise level quickly gets quite a bit more expensive than their base tiers. I also didn't want to purchase yet another AI platform (e.g. Lovable, where really they're pushing an end-to-end AI development platform where you buy token credits through them). I wanted to host things we're already creating in our own Claude environment. Finally, I ended up on Cloudflare, which I've otherwise not really used before professionally. It's not as non-technical-friendly as Netlify, but it's pretty close. You can deploy Cloudflare Pages content via drag-n-drop. It has button-click databases available for integration, and most critically for us, the SSO integration is completely free for under 50 users. Their free hosting tier is also extremely generous and basically unlimited for completely static apps. Noting that SSO goes up to $7 USD/user/month for over 50 users, so your org size can really make a difference. 
If you have 500 users and the same use case for "hosting little mini apps", I'd go back to Netlify or another offering where SSO is more of a fixed fee.

The other big win was that Cloudflare has a solid MCP server that works perfectly with Claude Cowork. We integrated that in and then wrote up some skills to assist with app building and deployment, including prompts for whether a database backend is needed (using Cloudflare D1) and whether the app should be public or internal only with SSO protection. All working perfectly with minimal technical experience required for the end user.

I'm not at all associated with Cloudflare, just thought I'd share how we got a win for this use case. I'd be interested to hear if anyone else solved the same problem in a different way.

submitted by /u/flck
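The org-size trade-off in the post comes down to simple arithmetic. Here is a rough Python sketch using the two figures quoted above (free up to 50 users, $7 USD per user per month beyond that). The post does not say whether every seat or only the seats beyond 50 are billed once an org crosses the threshold, so both readings are shown. The function names and the treatment of exactly 50 users as free are assumptions for illustration, not Cloudflare's published billing rules.

```python
FREE_SEATS = 50        # SSO is free up to this many users (figure from the post)
PRICE_PER_SEAT = 7.0   # USD per user per month above the free tier (from the post)


def sso_cost_all_seats(users: int) -> float:
    """Assumption A: once over the free tier, every seat is billed."""
    return 0.0 if users <= FREE_SEATS else users * PRICE_PER_SEAT


def sso_cost_extra_seats(users: int) -> float:
    """Assumption B: only the seats beyond the free tier are billed."""
    return max(0, users - FREE_SEATS) * PRICE_PER_SEAT


for users in (30, 500):
    print(users, sso_cost_all_seats(users), sso_cost_extra_seats(users))
```

Under either reading, a 500-user org pays a few thousand dollars a month for SSO alone, which is the point at which a fixed-fee offering starts to look better.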
$2,500/mo AI Budget: My friend just burned through 62M Opus 4.7 tokens in 24 hours.
My buddy works for a small international company based in Vietnam, and their AI perks are absolutely insane. Management actively encourages heavy API usage and hands everyone a massive $2,500 USD monthly budget. The screenshot? That’s his dashboard after burning through 62M tokens on Opus 4.7 in a single day. He mentioned some of his colleagues are chewing through even more with 'fast' mode turned on. Honestly, prove me wrong, but I’m pretty sure this small company is offering a bigger AI allowance than most Big Tech giants in the US right now. Anyone at FAANG getting this kind of blank check for API usage?

submitted by /u/No-Wheel5791
Repurposed my old work ThinkPad as a dedicated personal AI workstation — looking for ideas from people who’ve done something similar
Apologies if formatting comes out weird; I am on mobile.

My old employer let me keep a ThinkPad when I left. Rather than let it collect dust, I’m turning it into a dedicated personal AI environment (wiping it, installing Linux) and using it specifically for two things: life admin automation and building personal software tools.

The core setup I’m planning:
• Claude Desktop with MCP servers running persistently as Docker services
• Tailscale so I can access everything securely from my phone when I’m not home
• Open WebUI as a mobile-friendly chat interface
• Code-server (VS Code in the browser) so I can actually write and run code from my phone
• A dedicated Gmail account that acts as the “identity” for this Claude instance, wired into Google Drive, Calendar, and potentially an email-triggered agent pipeline
• A local RAG system for personal documents (contracts, notes, research) so Claude has persistent context about my life

The idea is that this becomes an ambient personal intelligence layer: always on, always up to date on my documents and projects, accessible from anywhere via Tailscale. Not a cloud subscription, not shared with anything work-related. Fully mine.

On the software side, I’m planning to use Claude Code + Lovable to build local-first personal apps for my own pain points: things that don’t exist in the market the way I want them, or where I don’t want my data in someone else’s cloud. The ThinkPad is the runtime; Lovable builds the frontend, Claude Code builds the backend, and everything talks over a local API.

What I’m curious about from people who’ve built something like this:
• What MCP servers have actually been worth setting up vs. overhyped?
• Has anyone built a reliable file-drop-to-RAG pipeline that actually stays current?
• Is Open WebUI the right mobile interface or is there something better now?
• Anyone using a dedicated “agent identity” email account: what workflows have you actually automated?
• Claude Code + local backend: what’s your stack? FastAPI? SQLite? Something else?
• Any gotchas with running Claude Desktop persistently on Linux?

Genuinely trying to build something useful here rather than a tech demo. Would love to hear from people who’ve gone down this road.

submitted by /u/Nashvillain12
LLMs are just giant probability machines pretending to think
It’s fascinating that simple mathematics between tokens can eventually become a machine that writes essays, code, poetry, and even reasoning. We usually think probability means uncertainty. But LLMs show something strange: if probability + context + mathematical matching are scaled enough, uncertainty itself starts producing intelligent-looking outputs.

To understand this better, I tried breaking down an LLM from first principles using only 4 tiny training sentences. Example:
- The boat floated down to the bank.
- The investor walked into the bank to open a new account.
- The fisherman walked along the bank to cast his net.
- The bank has a vault.

Then I asked: “The investor walked to the bank to lock his money in …” Why does the model predict “vault” instead of river-related words? That single question reveals almost the entire architecture of modern LLMs.

The most underrated concept here is the LM Head. Most explanations immediately jump into transformers and attention, but almost nobody explains that the LM Head essentially scores a gigantic token vocabulary containing all possible next-token candidates the model can output. So internally the model is basically solving: “Out of all known tokens, which one best matches this context mathematically?”

Then different layers help solve that problem:
- Embeddings: convert words into mathematical vectors
- Positional encoding: preserves word order
- Attention layer: figures out which words are related to each other in context (“investor”, “money”, “bank” become strongly connected)
- Feed forward neural networks: act somewhat like massive learned if/else decision systems refining patterns internally

https://preview.redd.it/wxmpf00g7t2h1.jpg?width=2299&format=pjpg&auto=webp&s=a214113263cf008a759740474fbda4e0b8394ba5

And finally the LM Head converts all of that into probabilities for the next token.

What surprised me most is: there is no hidden magic moment where the AI “becomes conscious”. It’s an enormous probability engine continuously finding the best contextual token match from its vocabulary.

I made a beginner-friendly walkthrough explaining this visually without unnecessary jargon. https://www.youtube.com/watch?v=YTV5qUCpu2c

Would genuinely love feedback from people learning transformers/LLMs from scratch.

submitted by /u/abhishekkumar333
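The LM Head step described in this post can be sketched in a few lines of Python. In real models the LM Head is typically a learned linear projection from the final hidden state to one score (logit) per vocabulary token, followed by a softmax. The sketch below mimics that with a dot product against per-token vectors. The four-word vocabulary, the 3-dimensional vectors, and the context vector are all hand-picked, hypothetical values for illustration; nothing here is learned or taken from the post's video.

```python
import math

# Hypothetical output vectors, hand-picked for illustration only.
# The three dimensions loosely mean: [finance, river, building].
OUTPUT_EMBEDDINGS = {
    "vault": [0.9, 0.0, 0.8],
    "net": [0.0, 0.9, 0.1],
    "river": [0.0, 1.0, 0.0],
    "account": [1.0, 0.0, 0.2],
}


def lm_head(hidden_state, output_embeddings):
    """Score every vocabulary token against the context, then apply softmax."""
    logits = {
        token: sum(h * w for h, w in zip(hidden_state, vector))
        for token, vector in output_embeddings.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Pretend the earlier layers turned "The investor walked to the bank to lock
# his money in ..." into a context vector leaning toward finance and buildings.
context = [0.8, 0.0, 0.9]
probabilities = lm_head(context, OUTPUT_EMBEDDINGS)
best_token = max(probabilities, key=probabilities.get)
print(best_token)  # prints "vault"
```

With these made-up vectors, "vault" wins because its vector lines up with both the finance and building directions of the context, while "net" and "river" only line up with the river direction, which the context does not activate.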
I built an app with Claude Code that converts any text into high-quality audio. It works with PDFs, blog posts, Substack and Medium links, and even photos of text.
I’m excited to share a project I’ve been building over the past few months, created entirely using Claude Code! It’s a mobile app that turns any text into high-quality audio. Whether it’s a webpage, a Substack or Medium article, a PDF, or just copied text, it converts it into clear, natural-sounding speech. You can listen to it like a podcast or audiobook, even with the app running in the background.

The app is privacy-friendly and doesn’t request any permissions by default. It only asks for access if you choose to share files from your device for audio conversion. You can also take or upload a photo of any text, and the app will extract and read it aloud.

- React Native (expo)
- NodeJS, react (web)
- Framer Landing

The app is called Frateca. You can find it on Google Play and the App Store. I'm also working on a web version; it's already live.

- Free iPhone app
- Free Android app on Google Play
- Free web version, works in any browser (on desktop or laptop)

Thanks for your support, I’d love to hear what you think!

submitted by /u/OneMoreSuperUser
AI Can Provide Constructive Feedback on Your Written Work. You Just Need to Understand a Little Bit of Psychology. Same Exact Thing Applies to Human Feedback
Good feedback from AI is not that different from receiving feedback from people around you. My brother and I once threw a lot of money into a proof-of-concept film because we were blinded by the encouragement and agreeableness that people around us were expressing. We weren't recognizing that they were just trying to be nice to us and not hurt our feelings. They were active screenwriters and filmmakers just like us and, just like us, they would need our help when the time came. That's why all of our feedback was watered down heavily. Only one of our friends told us the truth and you know what we did? We respectfully ignored the advice. Film-wise, it turned out great because the team was amazingly talented. But the story fell significantly short of what it could have been, if only we had turned our egos off for a second and insisted that people give us their complete, gloves-off opinion.

It's the same when engaging with AI, but actually easier to handle since you're just working with your own mental barriers instead of two people's. Bottom line: you just gotta come into it with the understanding that it will be a yes man. You can do prompting and that can really help if you design it well, but even then, it pales in comparison to a guy like Dov Siemen who is hilariously legendary when it comes to wrecking screenplays and bursting people's bubbles. That's honestly why I don't often ask for its opinion. Instead, I might ask it to compare a scene to all the other movies that are out there and spot the cliches. If I ask questions with the implicit assumption that whatever I wrote is garbage, it'll riff off of that and assume with me, which causes it to focus less on justifying why my story is so great and more on what could be wrong.

It's the same with people. If you simply ask for their input, they'll water it down with praise. You have to specifically instruct people to find the problems and to put the truth ahead of sparing your feelings. Do the same with AI and you'll have far fewer problems with feedback. So, don't ask questions like, "Is this good?" or "Will people understand this?" Ask questions like, "This dialogue is terrible. How can we fix it?" or "This scene feels draggy and boring. We need to find what's missing." Come into it with the assumption that your work is poor, even if it isn't. Force it to identify the problems. Otherwise, it'll suck your....Well, you know.

submitted by /u/CyborgWriter
I am building a chess analyzing program for my games on chess.com - i need help to further improve it, i am basically 100% using claude and feel bad with my prompts
Been grinding on a personal project where I built a chess analysis app for my own Chess.com games. Most of the coding/planning has honestly been done through Claude helping me step-by-step, but I’m starting to feel like my prompts are holding the project back more than the AI itself.

Right now it can:
- analyze games with Stockfish
- show move accuracy / eval swings
- give natural-language feedback on mistakes
- visualize engine lines + review flow

But the codebase is getting messy and I feel like I’m brute forcing development instead of structuring it properly. If someone DMs me with some helpful tips that would be great, and I could even share the program on Google Drive.

Just to clarify, I am making this chess program just for myself and maybe my friends; this is not an advertisement of any kind.

submitted by /u/xd_Fabian
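One way to keep a project like this from getting messy is to isolate the "eval swings" logic in a small pure function that takes engine evaluations as plain numbers, so it can be tested without Stockfish running. The Python sketch below assumes the evaluations have already been collected in centipawns from White's point of view, with the first entry being the position before the first move. The function names and the 100 / 300 centipawn thresholds are hypothetical choices, not taken from the poster's program.

```python
def eval_swings(evals_cp):
    """Change in evaluation caused by each move (White's point of view)."""
    return [after - before for before, after in zip(evals_cp, evals_cp[1:])]


def label_moves(evals_cp, mistake_cp=100, blunder_cp=300):
    """Label each ply by how much evaluation the side that moved gave away."""
    labels = []
    for ply, swing in enumerate(eval_swings(evals_cp)):
        white_moved = ply % 2 == 0
        # A drop hurts White; a rise hurts Black.
        loss = -swing if white_moved else swing
        if loss >= blunder_cp:
            labels.append("blunder")
        elif loss >= mistake_cp:
            labels.append("mistake")
        else:
            labels.append("ok")
    return labels


# Evaluations before move 1 and after each of four plies (made-up numbers).
evals = [20, 30, 25, -150, 200]
print(label_moves(evals))  # prints ['ok', 'ok', 'mistake', 'blunder']
```

Keeping the engine call, the scoring, and the natural-language feedback in separate functions like this also makes it easier to give Claude narrow, well-scoped prompts.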
Four backend concepts for Product Managers using Claude Code
You don't need to write backend code. But if you understand how backend systems behave, your prompts get dramatically better because you're speaking the same language as the system.

Async vs Sync: a user clicks "generate," you call OpenAI, it takes 3-5 seconds. If that's synchronous, the entire UI freezes. Nothing responds. The fix is to make the call async. Show a loading state immediately, let the user keep interacting, update the screen when the response arrives. Tell Claude Code "handle this asynchronously" and watch the output quality jump.

Race conditions: two users click "claim this spot" on the last available slot at the same second. The backend reads the database, sees one spot, confirms both. Now you have a double booking. You don't need to write the fix, but you need to spot this pattern in your specs. Anytime a user action reads a value then updates it, ask one question: what happens if two users do this at the same time? The fix is an atomic transaction: the read and the write happen as one indivisible operation.

Idempotency: a user submits a form and the internet cuts out for half a second. Did it go through? They don't know, so they click again. Without idempotency, you now have two records. With it, the second request returns the same result without creating a duplicate. The fix is an idempotency key: a unique ID generated on the frontend and sent with every request. The backend checks if it already processed that key. Stripe uses this for every payment call.

Graceful degradation: your app calls OpenAI and the API is down. If you haven't planned for this, users see a blank screen or a raw error code. Every feature needs three states: happy path (everything works), loading state (we're waiting), error state (something failed). Retry up to three times. If it still fails, show a friendly message and keep the rest of the page working. Never let one dependency take down the whole experience.
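The race-condition and idempotency fixes described above can be sketched together in Python. This is a minimal in-memory stand-in, assuming a single process where a `threading.Lock` plays the role of the atomic database transaction; the `BookingService` class and its method names are hypothetical, and a real backend would rely on its database's transactions and a persistent key store instead.

```python
import threading
import uuid


class BookingService:
    """In-memory stand-in for a backend; a real system would use a database."""

    def __init__(self, spots: int) -> None:
        self._spots = spots
        self._lock = threading.Lock()
        self._results = {}  # idempotency key -> result already returned

    def claim_spot(self, idempotency_key: str) -> str:
        # The lock makes check-and-update one indivisible step, so two
        # simultaneous requests cannot both see the last free spot.
        with self._lock:
            if idempotency_key in self._results:
                # Repeat of a request we already handled: replay the result.
                return self._results[idempotency_key]
            if self._spots > 0:
                self._spots -= 1
                result = "confirmed"
            else:
                result = "sold_out"
            self._results[idempotency_key] = result
            return result


service = BookingService(spots=1)
key_a = str(uuid.uuid4())  # generated once per user action on the frontend
key_b = str(uuid.uuid4())

first = service.claim_spot(key_a)
retry = service.claim_spot(key_a)  # same user clicked again after a blip
other = service.claim_spot(key_b)  # a different user wants the same spot
print(first, retry, other)  # prints "confirmed confirmed sold_out"
```

The retry reuses the first key, so it gets the stored answer back instead of consuming another spot, and the second user is cleanly refused rather than double-booked.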
TLDR: Next time you're in Claude Code, try using these terms in your prompt: "handle this asynchronously," "make this endpoint idempotent," "add graceful degradation." The output gets significantly better when you speak the system's language.

Post inspired by this video; you can check out SkillAgents AI on YouTube for similar content.

submitted by /u/InfamousInvestigator
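The "add graceful degradation" prompt from this post maps onto a small pattern: retry a few times, then fall back to a friendly message instead of letting the error escape. A rough Python sketch follows; the helper name `call_with_fallback`, the result dictionary shape, and the two fake API functions are hypothetical and exist only to exercise the pattern.

```python
import time


def call_with_fallback(fetch, retries=3, delay_seconds=0.0,
                       fallback="Something went wrong. Please try again shortly."):
    """Try `fetch` up to `retries` times; degrade to a friendly message."""
    for attempt in range(1, retries + 1):
        try:
            return {"state": "ok", "data": fetch()}
        except Exception:
            if attempt < retries:
                time.sleep(delay_seconds * attempt)  # simple linear backoff
    return {"state": "error", "data": fallback}


calls = {"count": 0}


def flaky_api():
    """Fails twice, then succeeds (simulates a brief upstream outage)."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("upstream unavailable")
    return "generated text"


def dead_api():
    """Always fails (simulates the provider being down)."""
    raise ConnectionError("upstream unavailable")


recovered = call_with_fallback(flaky_api)
degraded = call_with_fallback(dead_api)
print(recovered["state"], degraded["state"])  # prints "ok error"
```

The caller always gets a result it can render, so the rest of the page keeps working whichever state comes back.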
Build agentic orchestrators in minutes NOT months.
Some of you might remember BoneScript, my LLM-friendly declarative backend compiler. MarrowScript is the next version and the big addition is a full LLM harness built into the language itself.

The problem I kept running into: every project that calls an LLM ends up with the same pile of glue code. Retry logic, response validation, caching, cost tracking, provider switching, confidence routing. You write it once, copy it to the next project, tweak it, and it slowly rots. None of it is your actual product logic but it takes up half your backend.

So I made it declarative. In MarrowScript you declare your models, prompts, and routers as first-class concepts in the spec file. The compiler generates all the infrastructure around them. What that looks like in practice:

You declare a model. Provider, endpoint, context window, cost class. Works with any OpenAI-compatible endpoint. LM Studio, Ollama, vLLM, OpenRouter, whatever you're running locally.

You declare a prompt. Input types, output type, which model to use, validation mode, what to do when validation fails, retry policy, cache TTL. The compiler generates a typed function you call from your routes. Under the hood it handles retries, caches responses in Postgres, validates the output against your schema, and if validation fails it can automatically fire a repair prompt to fix the response.

You declare a router. It picks which model to use based on input characteristics. Short simple inputs go to your tiny local model. Complex inputs escalate to something bigger. Confidence thresholds control when to retry or escalate. All deterministic at compile time.
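To make the router idea concrete, here is a small Python sketch of a static, threshold-based router with confidence escalation. It is not MarrowScript syntax and not what the compiler generates; the model names, the input-length metric, the 500-character cutoff, and the 0.7 confidence threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    max_input_chars: int  # inputs up to this length start at this model


# Hypothetical tiers, ordered from cheapest to most capable.
ROUTES = (
    Route(model="tiny-local", max_input_chars=500),
    Route(model="large-remote", max_input_chars=10**9),
)
MIN_CONFIDENCE = 0.7  # hypothetical threshold for accepting an answer


def pick_route_index(text: str) -> int:
    """Static decision: the first tier whose length limit fits the input."""
    for index, route in enumerate(ROUTES):
        if len(text) <= route.max_input_chars:
            return index
    return len(ROUTES) - 1


def run(text, call_model):
    """call_model(model, text) -> (answer, confidence); escalate if unsure."""
    index = pick_route_index(text)
    while True:
        answer, confidence = call_model(ROUTES[index].model, text)
        if confidence >= MIN_CONFIDENCE or index == len(ROUTES) - 1:
            return ROUTES[index].model, answer
        index += 1  # low confidence: move up to the next tier


def fake_call_model(model, text):
    """Stand-in for a real provider call, used only to exercise the router."""
    if model == "tiny-local" and "why" in text.lower():
        return "not sure", 0.4
    return f"answer from {model}", 0.9


simple = run("Translate 'hello' to French.", fake_call_model)
hard = run("Why did the deployment fail?", fake_call_model)
print(simple[0], hard[0])  # prints "tiny-local large-remote"
```

Because the thresholds are plain constants, the routing decision for a given input and confidence score is fully deterministic, which is what makes offline tuning from recorded traces possible.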
Some examples of what it generates:
- Provider adapters for openai_compat, ollama, llamacpp, koboldcpp, and raw http
- SSRF protection on all outbound LLM calls (allowlist-based, blocks private ranges by default)
- Prompt cache backed by Postgres with configurable TTL
- Per-trace and per-tenant token/cost budgets with hard cutoffs
- Cognition traces stored in Postgres (or in-memory for dev) with OTLP export
- Response validation (schema check or full AST compilation check for code generation)
- Repair prompts that fire automatically when validation fails
- Confidence scoring from logprobs (on providers that support it)
- A CLI command to convert recorded traces into regression tests

The part I'm most interested in feedback on is the router concept. Right now it's a static decision tree. You set thresholds at compile time based on an input metric. There's a marrowc tune-router command that reads recorded traces and tells you if your thresholds are wrong, but it doesn't auto-rewrite them yet.

The whole thing is designed around local-first inference. The default setup in the examples uses LM Studio on the LAN as the primary model and OpenRouter as the escalation tier. Most requests stay local and free. Only the ones that fail confidence checks hit the paid API.

It's on GitHub and npm. The compiler is TypeScript, runs on Node 18+. There's a VS Code extension you can compile and edit to your needs.

What I want to know: for those of you running local models in production or semi-production, what's the infrastructure pain that eats the most time? Is it the retry/validation loop? Cost tracking? Provider switching? Something else entirely?

submitted by /u/Glittering_Focus1538
wedding planner charleston. 4 years business owner. didn't expect claude to be the tool that changed my business this year.
charleston SC. wedding planner. 4 years. 18-22 weddings per year. average wedding budget $48k. team of 3 (me + 2 day-of coordinators).

i don't usually post on this sub because i'm not technical. wanted to share because if claude is useful for a wedding planner in south carolina, it's probably useful for more service-business operators than the typical r/ClaudeAI audience.

how i actually use claude.

client comms. weddings involve emotional decisions. brides text me at 11pm asking about vendor concerns or family drama. before claude i'd respond in the morning and the bride would have been spiraling for 8 hours. now i type my rough response into claude at night, ask it to soften my tone (i'm direct, brides need warmth), and send the response immediately. response time per emotional message: 90 seconds. brides feel heard. nobody spirals overnight.

vendor negotiations. emails to florists, caterers, photographers. i tell claude what i need to negotiate (price, change orders, scheduling conflicts) and the vendor relationship context. claude drafts a firm-but-warm version. i edit. send. saves me ~5 hours a week of vendor email i used to dread.

timeline writing. each wedding needs a 14-hour day-of timeline. used to take me 6-8 hours per wedding. now claude takes my notes from the venue walkthrough + the couple's prefs + the vendor schedules and produces a draft. i edit. 2 hours instead of 6.

proposal writing. when i'm bidding on a new wedding, claude drafts a proposal based on the consultation call. consistent quality. doesn't depend on whether i'm having a good week.

emotional decisions, my side. i'm a wedding planner. clients have meltdowns. i absorb a lot. claude is my journal at the end of hard days. i type out what happened, what i'm feeling, what i should do differently next time. claude reflects back. it's not therapy. it's processing.

what surprised me.

claude works for non-technical service businesses. i'd been told by friends in tech that claude was "for coders." it's not. it's for anyone who writes things and makes decisions.

it gives me back hours i didn't know i was losing. wedding planning is emotional labor as much as logistical labor. claude takes the logistical labor down significantly, which means i have more energy for the emotional labor that actually requires me.

my brides notice. they don't know about claude. they notice that my responses are quicker, my timelines are more thorough, my emails sound warmer. they refer me to friends at higher rates than they did before.

revenue impact (i tracked this carefully): 2024: ~$184k from 19 weddings. 2025: ~$247k from 22 weddings. partly more weddings. partly higher average wedding budget. some of it is claude. i'd guess 30-40% of the improvement is directly attributable to claude saving me time so i could take on better-fit clients.

for other service business operators who think AI is "for tech people." it's not. open the app. talk to it about your business this week. report back here in 60 days.

submitted by /u/Temporary-Prior7384
Claude is improving my RV rental business but working me to death 😅
Long story short but long. I own an RV rental business. I used to be a Mechanical Engineer but got tired of the office/government life and started renting my personal RV on the side 9 years ago. That turned into a small fleet of Winnebagos I rent out of Los Angeles so I quit my job to do this full time out of a random ass whim. I have 20 units that have never, ever failed a single customer. I send all 20 to Burning Man every year and they all come back with no issues whatsoever. If you've never been, the alkaline dust kills everything, including your soul if you don't prepare well enough. I have however neglected my gig as of late. Everything is more expensive, too many variables to keep up with and two months ago I just decided to finally sit down and see if this is even worth continuing with. I have major ADHD so I started looking for any AI apps that help you organize your brainfarted life and ran into Claude. I don't know if I just fell into an endless dopamine trap but here I am, redesigning the interior of one of our units. I've sourced cabinet quality plywood for cheap, done precision cuts to substitute old particle board. I've always hated to paint but I got clowned into spray painting to a decent AF level. I used Claude to help me make interior design decisions as well as help me with our website, ads, tool decisions, etc. I'm probably wasting my time here cause I could just sell this unit and get a newer one, but the overall picture I've gotten... The ease of learning new skills, understanding roles I typically sub out so I can at least make sure I'm hiring the right people. The sudden engagement I've gotten into my own little gig... I am dead tired from this rollercoaster ride my brain has gone down into but I have to admit... This fucking Skynet shit is helping me focus and make it easy to complete tasks I've neglected forever. 
Skynet is coming or I guess it's here already and I'm not sure that's entirely a bad thing, a worse thing, a worserererer thing or an actual positive addition to one's life. Possibly a mix of both but fuck I haven't been this locked in for anything else other than the hobby that keeps my brain gears greased (2000 🪂 skydives and counting).

Edit: I am not using Claude to make any structural designs, I'm just using it to recommend a less expensive way to remodel the interior of an RV which came up with replacing lights for more modern ones, replacing cabinet handles, curtains, etc. Then I asked if I should replace cabinet doors or paint them. I just don't like how painted cabinets look but the issue I was having visually is that brush painted cabinets look terrible imo, spray painted ones look sleek. So down I went with a ton of questions on how to get a factory finish look on my cabinets with a spray gun.

Which gun to get was an entire day asking a ton of questions. Claude, GPT and almost every AI will give you answers that point towards products that have heavy marketing on youtube, and even on some reddit posts. I knew it was pointing me to a cheap trash product that will cause me a lot of frustration so I had to guide it not to give me anything with happy influencer bullshit that will never yield good results. I wanted to get a budget friendly beginner spray gun that will get me really close to a professional finish and I asked it to look on professional painter forums and confirm any findings with other forum-like sources. Then I bounced those results with other LLMs to arrive at my current setup.

Paint was another day of selecting which paint would work best for cabinets that won't scratch easily. That was yet another rabbit hole because not all cabinet paints are easy to spray with. Some are very forgiving for beginners like myself because they level easier and they also dry faster so I could do this with minimum downtime of a single unit I'm testing this on.
Workflow? I wish I knew anything as organized as workflow. I'm just agent chaos here drilling down to the very last detail asking questions that get me to where I need to be. But next month I will be playing with agents to see if I can achieve something remotely close to a decent workflow that makes this process faster.

Our landscaper came up today, saw my furniture pieces and asked if I could help him paint his classic car project so I guess I'm doing something right lol.

submitted by /u/PVPirates
Yes, FriendliAI offers a free tier. Pricing found: $1.4, $0.26, $4.4, $0.14, $0.4
Key features include: Ship faster with production‑grade defaults, Scale seamlessly, Spend less, Drop‑in OpenAI compatibility, Blazing‑fast inference, Seamless scaling, Always‑on reliability, Multi‑modality.
FriendliAI is commonly used for: Real-time data analysis for e-commerce platforms, Automated customer support chatbots, Content generation for marketing campaigns, Personalized recommendations for streaming services, Sentiment analysis for social media monitoring, Image recognition for security systems.
FriendliAI integrates with: Slack, Zapier, Salesforce, Shopify, WordPress, Google Cloud, AWS Lambda, Microsoft Azure, Twilio, Jira.
Based on user reviews and social mentions, the most common pain points are: token usage, cost tracking, spending too much, token cost.

Deploy Hugging Face Models on Friendli Endpoints!
Feb 7, 2025
Based on 134 social mentions analyzed, 20% of sentiment is positive, 76% neutral, and 4% negative.