What is the overall sentiment around MultiOn?

Based on 175 social mentions analyzed, 2% of sentiment is positive, 98% neutral, and 0% negative.

MultiOn

ai · ai-agents · tiered

Designing everyday AGI.

Users generally appreciate MultiOn for its versatility in facilitating multi-agent execution and its ability to handle structured work efficiently under governance rules. However, some users express concerns about potential conflicts or data overwriting when multiple agents engage simultaneously. The pricing sentiment is mixed, as some value the capabilities provided, while others find it challenging to justify the cost. Overall, MultiOn is seen as a robust tool with a good reputation among those needing structured AI management solutions, but it may require improvements in conflict resolution and cost transparency.

Website

Mentions (30d)

102

20 this week

Reviews

Platforms

Sentiment

3 positive

Pain Score: 1/100 · 15 integrations · 10 features · Seed

Voices Discussing MultiOn

Div Garg

CEO at MultiOn

12 mentions


AI Summary

Features & Use Cases

Features

* Our Investors and Partners
* Recent News
* The World’s Most Capable Mobile Agent
* Media Features
* Careers
* AI Product Engineer
* AI Researcher
* Backend Engineer
* ML Platform Infrastructure Engineer
* Product Designer

Use Cases

* Personalized virtual assistants for daily task management
* Automated customer support agents for businesses
* AI-driven content creation tools for marketers
* Intelligent scheduling assistants for professionals
* Real-time language translation during conversations
* Smart home management systems integrating various devices
* Data analysis and reporting tools for enterprises
* Virtual tutors for personalized education experiences

Company Intel

Industry

information technology & services

Employees

Funding Stage

Seed

Total Funding

$20.0M

Top Mention

reddit · @axendo26 engagement · 5/6/2026

eTPS — Effective Tokens Per Second: A Better Way to Measure Local LLM Performance

We're obsessed with raw tokens per second. Every hardware post leads with it. Every quantization comparison is ranked by it. It's the one number everyone agrees to report. It's also measuring the wrong thing.

Raw TPS tells you how fast tokens hit the screen. It tells you almost nothing about how quickly you get a correct, usable answer. On sustained, multi-turn workflows, that gap becomes massive. A faster model that hallucinates, requires multiple corrections, and forgets context you gave it earlier can easily be less useful than a slower model that gets it right the first time.

**eTPS (Effective Tokens Per Second)** is a complementary metric that measures actual progress toward a useful answer, not just token throughput. The basic idea: weight the final accepted output by how clean the path to that answer was (first-pass correct scores highest), then divide by total time. Correction loops, hallucinations, and repeated explanations all reduce the score. A response that never reaches a correct answer scores zero regardless of speed.

It doesn't replace raw TPS. It sits next to it.

**Results (same prompt, four runs, same hardware):**

* gemma-4-e2b (4.6B): 53.2 raw TPS → eTPS 53.18 ✓
* qwen3.5-0.8b: 173.1 raw TPS → eTPS 86.57 ✗ partial
* qwen3.5-9b (optimized): 1.8 raw TPS → eTPS 1.78 ✓
* qwen3.5-9b (baseline): 0.5 raw TPS → eTPS 0.32 ✗ partial

The 0.8B leads on raw speed by a wide margin and still lost. Raw TPS said it won. eTPS said it didn't.

**Hardware:** RTX 5060 Laptop, 8GB VRAM. eTPS scores aren't portable across hardware, so always report your full setup.

**Known limitations (v0.1):**

* Scoring requires human judgment. The line between "needed clarification" and "was factually wrong" isn't always clean. Code generation with objective pass/fail criteria is a cleaner target and the focus of the next benchmark run.
* One task isn't representative of sustained multi-turn workflows. That's where the metric gets most interesting and where I'm headed next.
* Easy to game without full system prompt logging. The spec will require it.

These are acknowledged constraints, not hidden flaws. Full specification coming soon covering methodology, task library, scoring protocol, and reproducibility standards.

Before I lock the final weights I'd genuinely like input on two open questions: How should the penalty differ between a model that confidently states something false versus one that's just vague enough you had to ask a follow-up? And should hardware normalization live in the core formula or be reported separately?

Thoughts welcome.
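The post describes eTPS only in words (weight the accepted output by path quality, then divide by total time) and says the final weights are not yet locked. The following is a minimal Python sketch of that idea, not the author's specification. The outcome labels, the `PATH_WEIGHTS` values, the `Run` fields, and the function names are all hypothetical, chosen only to illustrate the shape of the calculation.

```python
from dataclasses import dataclass

# Hypothetical path-quality weights. The post has not published final
# weights, so these values are placeholders for illustration only.
PATH_WEIGHTS = {
    "first_pass": 1.0,            # correct on the first attempt
    "needed_clarification": 0.75, # user had to ask a follow-up
    "needed_correction": 0.5,     # model was wrong and had to be corrected
    "never_correct": 0.0,         # never reached a usable answer
}


@dataclass
class Run:
    generated_tokens: int  # every token the model emitted, including retries
    accepted_tokens: int   # tokens in the final accepted answer
    total_seconds: float   # wall-clock time for the whole exchange
    outcome: str           # one of the PATH_WEIGHTS keys


def raw_tps(run: Run) -> float:
    """Plain throughput: all generated tokens over total time."""
    if run.total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return run.generated_tokens / run.total_seconds


def etps(run: Run) -> float:
    """Weighted accepted tokens over total time (assumed form of eTPS)."""
    if run.total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return PATH_WEIGHTS[run.outcome] * run.accepted_tokens / run.total_seconds


# Made-up numbers, not the benchmark results from the post.
clean = Run(generated_tokens=1000, accepted_tokens=1000,
            total_seconds=10.0, outcome="first_pass")
messy = Run(generated_tokens=2000, accepted_tokens=1000,
            total_seconds=10.0, outcome="needed_correction")

print(raw_tps(clean), etps(clean))
print(raw_tps(messy), etps(messy))
```

With these placeholder numbers the second run has twice the raw throughput of the first but half the effective score, which is the kind of reversal the post reports. Keeping raw TPS and eTPS as separate functions mirrors the post's point that the two metrics are meant to be reported side by side.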
