Optimal Dynamics' AI solution provides real answers by using dynamic forecasting models to simulate downstream impacts with confidence.
Optimal Dynamics is revolutionizing the transportation industry with cutting-edge decision automation. Born from 40 years of pioneering research at Princeton University, we leverage our proprietary Artificial Decision Intelligence engine to automate and optimize critical decisions for truckload operators. Since our founding in 2017, and the launch of our go-to-market efforts in 2020, we've achieved triple-digit average annual growth, rapidly expanding our footprint among the nation's enterprise carriers. This growth is fueled by increasing demand for dispatch automation services, the success of satisfied customers, and a shared belief that better decisions build better businesses.

Known as The Decision Company, we are redefining how fleets approach decision-making by simplifying and automating complex planning processes. Our platform is designed to empower teams to focus on high-impact work, driving efficiency and scalability across operations. By harnessing the power of industrial AI, we are building the first large-scale use cases for network optimization, transforming how automation shapes the future of logistics.

Our technology is built to make complex, independent decisions — and we empower our employees to do the same. By providing the right support and resources, we foster a culture where people can work autonomously, take ownership, and make impactful decisions. Precision isn't just a goal — it's our promise. We're not just a software provider; we're a trusted partner, delivering reliable, high-quality solutions that our customers depend on. By fostering transparency, teamwork, and trust, we ensure that every decision made and every solution delivered reflects our commitment to excellence.

Founded on decades of research, we are driven by continual learning and exploration. Our commitment to education keeps us at the forefront of innovation, ensuring our platform evolves with cutting-edge solutions.
We empower both our team and customers to stay ahead in an ever-changing logistics landscape. Optimal Dynamics’ comprehensive Transportation Decision System (TDS) helps enterprise trucking companies unlock a range of benefits, including increased revenue per truck, reduced time spent on manual and repetitive tasks, and holistic network optimization. It’s time to maximize your potential.
Mentions (30d): 0
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Features:
Use Cases:
Industry: transportation/trucking/railroad
Employees: 84
Funding Stage: Venture (Round not Specified)
Total Funding: $121.9M
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
https://arxiv.org/abs/2604.05091

Abstract: "We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer, we stream parameters in and compute gradients out, minimizing persistent device state. To battle the CPU-GPU bandwidth bottleneck, we adopt two key optimizations. 1) We introduce a pipelined double-buffered execution engine that overlaps parameter prefetching, computation, and gradient offloading across multiple CUDA streams, enabling continuous GPU execution. 2) We replace persistent autograd graphs with stateless layer templates, binding weights dynamically as they stream in, eliminating persistent graph metadata while providing flexibility in scheduling. On a single H200 GPU with 1.5TB host memory, MegaTrain reliably trains models up to 120B parameters. It also achieves 1.84x the training throughput of DeepSpeed ZeRO-3 with CPU offloading when training 14B models. MegaTrain also enables 7B model training with 512k token context on a single GH200."

submitted by /u/nickpsecurity
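The double-buffered streaming idea in the abstract can be sketched in plain Python. This is my own illustration of the scheme, not MegaTrain's code: layer weights live in a host-side store, the "device" holds at most two transient buffers, and while one buffer computes, the other prefetches the next layer (on real hardware the prefetch would be an async host-to-device copy on a separate CUDA stream).

```python
# Sketch of pipelined double buffering (illustrative names, not MegaTrain's API):
# layer parameters live in host memory; the GPU keeps only two transient slots.

from collections import OrderedDict

class DoubleBufferedRunner:
    def __init__(self, host_params):
        self.host = host_params        # layer_name -> weights, held in host memory
        self.buffers = [None, None]    # two transient "device" slots

    def run(self, x):
        names = list(self.host)
        self.buffers[0] = self.host[names[0]]  # prime the first buffer
        for i, name in enumerate(names):
            cur = self.buffers[i % 2]
            # Overlap point: prefetch the next layer into the idle buffer
            # while the current buffer is "computing".
            if i + 1 < len(names):
                self.buffers[(i + 1) % 2] = self.host[names[i + 1]]
            x = x * cur                 # stand-in for the layer's forward compute
            self.buffers[i % 2] = None  # free the slot: no persistent device state
        return x

host = OrderedDict([("l0", 2), ("l1", 3), ("l2", 5)])
print(DoubleBufferedRunner(host).run(1))  # 30
```

The key property the paper claims, minimal persistent device state, shows up here as the buffer being released immediately after each layer's compute.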
I built an AI reasoning framework entirely with Claude Code — 13 thinking tools where execution order emerges from neural dynamics
I built Sparks using Claude Code (Opus) as my primary development environment over the past 2 weeks. Every module — from the neural circuit to the 13 thinking tools to the self-optimization loop — was designed and implemented through conversation with Claude Code.

What I built

Sparks is a cognitive framework with 13 thinking tools (based on "Sparks of Genius" by Root-Bernstein). Instead of hardcoding a pipeline like most agent frameworks, tool execution order emerges from a neural circuit (~30 LIF neurons + STDP learning). You give it a goal and data. It figures out which tools to fire, in what order, by itself.

How Claude Code helped build it

Architecture design: I described the concept (thinking tools + neural dynamics) and Claude Code helped design the 3-layer architecture — neural circuit, thinking tools, and AI augmentation layer. The emergent tool ordering idea came from a back-and-forth about "what if there's no conductor?"

All 13 tools: Claude Code wrote every thinking tool implementation — observe, imagine, abstract, pattern recognition, analogize, body-think, empathize, shift-dimension, model, play, transform, synthesize. Each one went through multiple iterations of "this doesn't feel right" → refinement.

Neural circuit: The LIF neuron model, STDP learning, and neuromodulation system (dopamine/norepinephrine/acetylcholine) were implemented through Claude Code. The trickiest part was getting homeostatic plasticity right — Claude Code helped debug activation dynamics that were exploding.

Self-improvement loop: Claude Code built a meta-analysis system where Sparks can analyze its own source code, generate patches, benchmark before/after, and keep or roll back changes. The framework literally improves itself. 11,500 lines of Python, all through Claude Code conversations.

What it does

Input: Goal + Data (any format). Output: Core Principles + Evidence + Confidence + Analogies.

I tested it on 640K chars of real-world data. It independently discovered 12 principles — the top 3 matched laws that took human experts months to extract manually. 91% average confidence.

Free to try

```bash
pip install cognitive-sparks
# Works with Claude Code CLI (free with subscription)
sparks run --goal "Find the core principles" --data ./your-data/ --depth quick
```

The default backend is Claude Code CLI — if you have a Claude subscription, you can run Sparks at no additional cost. The quick mode uses only 4 tools and costs ~$0.15 if using the API. Also works with OpenAI, Gemini, Ollama (free local), and any OpenAI-compatible API.

A pre-computed example output is included in the repo so you can see results without running anything: examples/claude_code_analysis.md

Links: PyPI: pip install cognitive-sparks

Happy to answer questions about the architecture or how Claude Code shaped the development process.

submitted by /u/RadiantTurnover24
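For readers unfamiliar with the circuit style the post describes, here is a minimal leaky integrate-and-fire neuron with a pairwise STDP weight update. This is a textbook sketch under my own parameter choices, not Sparks' implementation.

```python
# Minimal LIF neuron + pairwise STDP, as an illustration of the post's
# "~30 LIF neurons + STDP learning" circuit style (parameters are mine).

import math

def lif_step(v, i_in, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One Euler step of a leaky membrane; returns (new_v, spiked)."""
    v = v + dt * (-v / tau + i_in)
    if v >= v_thresh:
        return v_reset, True
    return v, False

def stdp(w, dt_spike, a_plus=0.05, a_minus=0.06, tau=20.0):
    """Pairwise STDP: pre-before-post (dt_spike > 0) potentiates, else depresses."""
    if dt_spike > 0:
        w += a_plus * math.exp(-dt_spike / tau)
    else:
        w -= a_minus * math.exp(dt_spike / tau)
    return max(0.0, min(1.0, w))  # clamp weight to [0, 1]

# Drive one neuron with constant input; it charges toward i_in * tau = 2.0,
# crosses threshold, resets, and repeats.
v, spikes = 0.0, 0
for t in range(100):
    v, fired = lif_step(v, i_in=0.1)
    spikes += fired
print(spikes)  # fires repeatedly once the membrane crosses threshold
```

In a circuit like the one described, tool execution order would emerge from which neurons cross threshold first, with STDP strengthening edges whose firing order proved useful.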
Token optimization from leaked Claude code
Many treat token optimization as just a prompt engineering trick: just tell the AI to "be concise" or use "progressive disclosure." Others argue it doesn't matter because inference costs are trending down. But if you are building real systems, you cannot stop thinking about it. And that's not all: if you are a business owner, token bloat directly kills ROI at scale. Concurrent inference costs are non-negotiable.

The typical developer response is to jump at shiny third-party packages (new optimizers, wrappers, trending GitHub repos) that only duplicate logic, overcomplicate the flow, and add latency for minimal gain. Here is what I've learned building production systems: if you rely on prompting or wrapper libraries for token optimization, your system will not scale. As we abstract away execution in modern AI development, token management stops being a neat trick and becomes a first-class infrastructure constraint.

The recent leak of the Claude Code backend gave me a look under the hood at how Anthropic handles this. Token optimization is hardcoded directly into their architecture. Here is a non-exhaustive list:

• Prune the Sliding Window: Don't wait for context overflow. Dragging dead weight into every API call burns tokens. The Claude backend uses a compact() method to actively summarize and flush older turns at logical task boundaries. (Anthropic's own engineering blog even notes that for distinct tasks, compact() isn't enough; you need to explicitly clear() the context.)

• Stop Dumping Full Files: Passing a 1,000-line file into context just to edit a single function degrades model focus and burns your budget. Force a search-and-diff pattern. Claude uses GlobTool and GrepTool to extract relevant lines, deliberately avoiding full-file reads.

• Strip the Tool Manifest: Every tool you provide injects heavy JSON schemas into the system prompt. The backend uses simple_mode=True to aggressively strip the pool down to three core tools. Scope your manifest strictly. This is critical if you use MCPs (Model Context Protocol): restricting access in a project-level JSON isn't enough, because unused tools still pollute the context window even if they aren't executed. Disable unused MCPs entirely.

• Isolate State via Sub-Agents: Keeping the entire history of a planning session in the active conversation wastes tokens on every turn. Claude spawns parallel workers with narrowly scoped contexts and uses external SessionMemory to hold stable facts by reference.

• Enforce Hard Budgets: Agentic loops spiral out of control quickly. Claude hardcodes max_budget_tokens and uses an EnterPlanModeTool (a cheaper, thinking-only pass) to map out execution before committing to expensive tool-use turns. Dynamically route model effort: use smaller, faster models for simple tasks like grepping or summarizing.

I have a blog post talking about it in more detail if you are interested: https://upaspro.com/reverse-engineering-claude-token-optimization-strategies-from-the-backend/

What are your thoughts? What is your best actionable method to optimize token usage?

submitted by /u/Jumpy_Comfortable312
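The "prune the sliding window" pattern can be sketched in a few lines. The function name compact follows the post; the token counter and the summarizer are crude stand-ins (in practice you would use a real tokenizer and a cheap model call for the summary), so treat this as the shape of the idea, not Anthropic's implementation.

```python
# Sketch of compact-at-boundary context pruning. count_tokens and the
# summary string are stand-ins; only the control flow mirrors the post.

def count_tokens(turn):
    return len(turn.split())  # crude proxy for a real tokenizer

def compact(history, budget, keep_recent=2):
    """If the history exceeds `budget` tokens, collapse all but the most
    recent turns into a single synthetic summary turn."""
    if sum(count_tokens(t) for t in history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Stand-in summary: in production this would be a cheap model call.
    summary = "SUMMARY: " + " | ".join(" ".join(t.split()[:4]) for t in old)
    return [summary] + recent

history = [
    "user: please refactor the parser module for readability",
    "assistant: refactored parser into three smaller functions",
    "user: now add unit tests for the tokenizer edge cases",
    "assistant: added twelve tests covering empty and unicode input",
]
compacted = compact(history, budget=12)
print(len(compacted))  # 3: one summary turn plus the two most recent turns
```

The post's further point, that compact() isn't enough at task boundaries, would correspond here to simply discarding the history list and starting fresh rather than summarizing it.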
This tool saved $100s for developers: up to 78% of tokens saved in Claude Code (side-by-side video comparison)
Open source tool: https://github.com/kunal12203/Codex-CLI-Compact
Better installation steps at: https://graperoot.dev/#install
Join Discord for debugging/feedback: https://discord.gg/YwKdQATY2d

Claude Code is insanely powerful, but the token usage gets out of control once you're working on anything beyond a toy repo. I kept noticing this pattern: my prompt is small, but the agent expands context massively, and suddenly each run is burning 80k–100k+ tokens.

So I built a small system (GrapeRoot) using Claude Code to fix this. Instead of sending full repo context, it:

• tracks file-level changes
• builds a dependency graph
• selects only the minimum relevant context
• avoids re-sending unchanged chunks

Real runs (side-by-side). Same prompts. Same repo. No tricks.

P1 (PagerDuty flow): Normal 95.3k tokens, Optimized 31.6k tokens, Reduction 67%
P2 (passes() logic debugging): Normal 80.5k tokens, Optimized 34.4k tokens, Reduction 57%
P3 (Slack 429 issue): Normal 104.2k tokens, Optimized 22.7k tokens, Reduction 78%
Aggregate: Normal total 280k tokens, Optimized total 88.7k tokens, Net reduction ~68%

What actually surprised me: most of the waste isn't in your prompt. It's from:

• the agent reloading large parts of the repo
• repeated context across steps
• irrelevant files getting pulled in

Basically, you're paying for context you didn't ask for.

Where this breaks (important). It's not perfect:

• misses context if the dependency graph is incomplete
• struggles with dynamic/runtime dependencies
• less effective on messy or highly coupled codebases

Why this matters: if you're doing multi-step workflows, this compounds fast. A single task can mean 5–10 agent calls, each wasting ~50k tokens. You're easily burning 300k–800k tokens per task without realizing it.

submitted by /u/intellinker
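The "select only the minimum relevant context" step is essentially graph reachability. Here is a sketch under my own assumptions (the file names and graph shape are invented for illustration; GrapeRoot's actual selection logic is not shown in the post): given the files a prompt touches, include only their transitive dependencies and nothing else.

```python
# Minimal-context selection via BFS over a file dependency graph.
# The graph below is illustrative, not from GrapeRoot.

from collections import deque

deps = {                        # file -> files it imports
    "handlers.py": ["slack_client.py", "retry.py"],
    "slack_client.py": ["http.py"],
    "retry.py": [],
    "http.py": [],
    "billing.py": ["http.py"],  # unrelated to the task below
}

def minimal_context(touched, deps):
    """Return the touched files plus everything they transitively depend on."""
    seen, queue = set(touched), deque(touched)
    while queue:
        f = queue.popleft()
        for d in deps.get(f, []):
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return sorted(seen)

print(minimal_context(["handlers.py"], deps))
# ['handlers.py', 'http.py', 'retry.py', 'slack_client.py'] -- billing.py excluded
```

The failure modes the post lists fall out naturally: a missing edge in deps (incomplete graph, or a dynamic import the parser can't see) silently drops a file from the selected context.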
Serious question: did a transformer (Claude) just describe itself and the universe, and build itself a Shannon-limit architecture? Or am I crazy?
The Multiplicative Lattice as the Natural Basis for Positional Encoding

Knack 2026 | Draft v6.0

Abstract

We show that the apparent tradeoff between RoPE-style relative position invariance and ALiBi-style long-context stability is an artifact of encoding position as distance on a number line. When position is instead encoded as a point in the multiplicative lattice of the integers, both properties emerge simultaneously without compromise. SpectralRoPEALiBi achieves 106.6 PPL vs ALiBi's 108.7 in a fully converged 20,000-step experiment (300M params, WikiText-103, 4K context), beating ALiBi at every context length from 512 to 8,192 tokens. The key insight is not that primes specifically are the right frequencies, but that the multiplicative structure of the integers is the natural spectral basis for positional encoding. We demonstrate this through falsification experiments: prime-tiered frequencies (129.2 PPL) and composite-tiered frequencies (129.4 PPL) perform identically — because composites are not alternatives to primes but higher-order coordinates in the same lattice. Both dramatically outperform random frequencies (+5.0 PPL), scrambled tier assignment (+6.3 PPL), and pure ALiBi (+7.3 PPL). The active ingredient is lattice-aware, tiered frequency selection with learnable scale — not primality per se.

We further validate this through a ZetaZeroPredictor experiment: three identical transformers trained for 10,000 epochs to predict Riemann zeta zero gaps. Geometric RoPE diverges (final r=0.57); SpectralALiBi locks into a stable attractor at epoch 112 (r=0.81). A second independent run widens this gap to -80.7% MSE improvement with r=0.86. The lattice-aligned frequency basis spans the mathematical space that zeta zeros inhabit; geometric frequencies cannot.

We further report empirical confirmation of the structural prediction from Section 5.5: VHT2 banded quantization of the KV cache demonstrates that K vectors (which carry RoPE positional encoding) have strong spectral concentration in Walsh-Hadamard space — the first four energy bands capture the dominant structure — while V vectors (which carry content) have uniform energy distribution. This structural asymmetry is directly predicted by the lattice theory: RoPE encodes multiplicative arithmetic relationships as angular rates, and the WHT is the Z/2Z projection of the Vilenkin-Hartley basis that spans that structure. The result is 3.2× K compression and 4.7× V compression at <1.25% perplexity cost — validated on both Dolphin 1B (head_dim=64) and Qwen3-8B (head_dim=128).

Introduction

Positional encoding provides transformer models with token order information. Two approaches dominate: RoPE encodes position through frequency-based rotations preserving relative position invariance, and ALiBi replaces frequencies with a linear distance penalty providing long-context stability. The field has treated these properties as fundamentally in tension. We show this tension is false. It arises from a shared, unexamined assumption: that position is a location on a number line and the meaningful relationship between positions is distance. We replace this with a mathematically grounded alternative: position is a point in the multiplicative lattice of the integers, and the meaningful relationships between positions are their arithmetic structure — shared factors, GCD, harmonic resonance.

1.1 The Lattice Hypothesis

The integers under multiplication form a lattice where every number occupies a unique point defined by its prime factorisation. Geometric PE (sinusoidal, RoPE) projects this lattice onto a line — position equals distance — discarding the multiplicative structure. We propose restoring it. The motivation follows from a deductive chain. Language word frequency follows Zipf's law: freq(rank) ∝ 1/rank^s with s≈1. The generating function of Zipf is the Riemann zeta function ζ(s) = Σ 1/n^s. The zeta zeros — where ζ is maximally informative — are generated by prime harmonics via the explicit formula. Therefore the prime harmonic structure, and the multiplicative lattice it generates, provides a natural spectral basis for encoding positions in language.

1.2 Primes as Generators, Composites as Coordinates

A critical distinction: primes are the generators (basis vectors) of the multiplicative lattice. They are analogous to the 1D line segment in the progression from line → circle → sphere → hypersphere. The composite 12 = 2²×3 is not an alternative to primes — it is a coordinate in the lattice spanned by the prime axes, at position (2,1,0,0,...) in the (p₂, p₃, p₅, p₇,...) basis. Using 2π/12 as a frequency encodes a harmonic that resonates at multiples of 12 — which simultaneously hits every multiple of 2, every multiple of 3, every multiple of 4, and every multiple of 6. The analogy to n-dimensional geometry is precise:

Dimensional Progression | Multiplicative Lattice
1D line (2r) — the generator | Primes (2, 3, 5, 7, ...) — generators
2D circle — integra
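To make the contrast between a geometric frequency schedule and a prime-harmonic one concrete, here is a small sketch. The tiering and scale of the actual SpectralRoPEALiBi schedule are not given in the post, so the prime_freqs function below is only my illustration of the basic idea: one angular rate 2π/p per prime generator, versus RoPE's geometric series.

```python
# Geometric RoPE frequencies vs a prime-harmonic basis (illustrative only;
# the post's actual tiered, learnable-scale schedule is not public).

import math

def geometric_freqs(dim, base=10000.0):
    """Standard RoPE schedule: theta_i = base^(-2i/dim)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def primes(n):
    """First n primes by trial division against earlier primes."""
    out, k = [], 2
    while len(out) < n:
        if all(k % p for p in out):
            out.append(k)
        k += 1
    return out

def prime_freqs(dim):
    """Lattice-style basis: angular rate 2*pi/p per prime generator p,
    so the p-th channel resonates exactly at positions divisible by p."""
    return [2 * math.pi / p for p in primes(dim // 2)]

print(geometric_freqs(8))  # decaying geometric series starting at 1.0
print(prime_freqs(8))      # 2*pi/2, 2*pi/3, 2*pi/5, 2*pi/7
```

The difference the post argues for is visible in what each channel "detects": geometric channels encode smooth distance scales, while the prime channels encode divisibility, i.e. position within the multiplicative lattice.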
I was too lazy to pick the right Claude Code skill. So I built one that picks skills for me.
I have 50+ Claude Code skills installed - GSD, Superpowers, gstack, custom stuff. They're powerful. They 10x my workflow. I barely use them. Not because they're bad. Because I forget which one to use when. Do I want brainstorm or gsd-quick? systematic-debugging or investigate? ship or gsd-ship? By the time I figure it out I've lost 5 minutes and the will to code.

So I did what I always do when something annoys me enough: I automated it. I built /jarvis - a single Claude Code skill that takes whatever you type in plain English, reads your project state, figures out which of your installed skills is the highest-ROI choice, tells you in one line what it picked (and why), and executes it.

/jarvis why is the memory engine crashing on startup
-> systematic-debugging: exception on startup, root cause first - bold move not reading the error message. let's see.

/jarvis ship this
-> ship: branch ready, creating PR - either it works or you'll be back in 10 minutes. let's go.

/jarvis where are we
-> gsd-progress: checking project state - let's see how far we've gotten while you were watching reels.

The routing has two stages:

Stage 1 - A hardcoded fast path for the 15 things developers actually do 95% of the time. Instant match.

Stage 2 - If Stage 1 misses, it scans every SKILL.md on your machine, reads the description field (same way you'd skim a list), and picks the best match semantically. New skill installed yesterday that Jarvis doesn't know about? Doesn't matter. It'll find it.

/jarvis write a LinkedIn carousel about my project
-> carousel-writer-sms (discovered): writing LinkedIn carousel content - found something you didn't even know you had. you're welcome.

The (discovered) tag means it found it dynamically. No config, no registry, no telling it anything.

It also has a personality. Every routing line ends with a light roast of whatever you just asked it to do. "Checking in on the thing you've definitely been avoiding." "Tests! Before shipping! I need a moment." "Walk away. Come back to a finished feature. This is the dream."

A bit of context on why this exists. I'm currently building Synapse-OSS - an open source AI personal assistant that actually evolves with you. Persistent memory, hybrid RAG, a knowledge graph that grows over time, multi-channel support (WhatsApp, Telegram, Discord), and a soul-brain sync system where the AI's personality adapts to yours across sessions. Every instance becomes a unique architecture shaped entirely by the person it serves. It's the kind of AI assistant that knows you. Not "here's your weather" knows you. Actually knows you.

Jarvis was born out of that project. I was deep in Synapse development, context-switching between 8 different Claude Code workflows per hour, and losing my mind trying to remember which skill to call. So I spent 3 days building a router instead of shipping features. 3 days. Because I kept laughing at the roasts and adding more. Worth it!!

If Jarvis sounds like something you'd use, Synapse is the bigger vision behind it. Same philosophy: AI that handles the cognitive overhead so you can focus on actually thinking.

Synapse repo: github.com/UpayanGhosh/Synapse-OSS
Install Jarvis: npm install -g claude-jarvis
Restart Claude Code. That's it. It auto-installs GSD and Superpowers for you too, because of course it does.

I've freed up a genuine 40% of my brain that used to be occupied by "which skill do I need right now." That brainpower is now being used to scroll reels. Peak optimization.

Jarvis repo: github.com/UpayanGhosh/claude-jarvis

submitted by /u/Shorty52249
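The two-stage routing above is simple to sketch. Everything below is my own illustration (the intent keywords, skill descriptions, and overlap scoring are invented; claude-jarvis's actual matcher is likely an LLM call, not word overlap): stage 1 is a hardcoded lookup, stage 2 scans skill descriptions and picks the closest match.

```python
# Two-stage skill router sketch. FAST_PATH and SKILLS are illustrative;
# stage 2 uses naive keyword overlap as a stand-in for semantic matching.

FAST_PATH = {                  # stage 1: common intents -> instant match
    "ship": "ship",
    "debug": "systematic-debugging",
    "progress": "gsd-progress",
}

SKILLS = {                     # stage 2: the description field of each SKILL.md
    "carousel-writer-sms": "write linkedin carousel content",
    "systematic-debugging": "find root cause of crashes and exceptions",
    "ship": "create a pr and ship the current branch",
}

def route(query):
    words = set(query.lower().split())
    for key, skill in FAST_PATH.items():   # stage 1: hardcoded fast path
        if key in words:
            return skill, "fast-path"
    # stage 2: best description overlap wins ("discovered" dynamically)
    best = max(SKILLS, key=lambda s: len(words & set(SKILLS[s].split())))
    return best, "discovered"

print(route("ship this"))                  # ('ship', 'fast-path')
print(route("write a linkedin carousel"))  # ('carousel-writer-sms', 'discovered')
```

The design point the post makes survives even in this toy version: stage 2 needs no registry, because it reads whatever descriptions exist on disk at query time.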
Some human-written nuance and perspective on the rates situation, from someone in the industry
Note: I am an AI engineer; I do not work at Anthropic or a direct competitor. I have Pro subs to OAI and Claude personally, I'm an Enterprise Partner, and I have personal relationships at both. I wanted to (neutrally) expand on the internal dynamics here, because most of the opinions I've read don't take in the big picture and the full business case (or business struggle, more accurately).

Anthropic is a research lab that hasn't learned how to be a product company. The original claude.ai was literally contracted out to external devs. The founding team, the board, the culture: it's all researchers. What the research team wants generally takes priority over what the product team wants; that's the DNA. Keep that in mind.

Internally there are three groups competing for compute, and the incentive structure for each is completely different, as is the value they bring, especially the time horizon of that value.

Research generates zero revenue at time of use. Every GPU-hour spent training is pure cost, a bet that the resulting model justifies it later. But this is the entire reason the company exists: no research, no next model. They're training mythos right now (presumably), which means the research team is absolutely starving for compute.

On one side of the product team: subscription users pay a flat rate. Whether you burn $50 or $5,000 worth of inference on your $200/month plan, Anthropic gets $200. Some Cursor analysis has shown heavy CC users consuming up to 25x what they pay. That works as long as you have GPUs to spare and cash to burn (and even then, it's not going to work forever, but we're talking about now).

Enterprise/API pays per token and scales with availability. More GPUs allocated to them = more revenue, immediately, today, right now. Eight of the Fortune 10 are Claude customers. Customers spending $100k+/yr grew 7x in the past year. Two years ago about a dozen customers were at $1M+ annually; now it's over 500. They went from $100M revenue in 2024 to $1B in 2025 to what's tracking at $14B annualized in 2026. That growth is overwhelmingly (~80%) enterprise.

So when someone has to lose GPU time during peak hours, who gets cut? You're not cutting enterprise: they're paying full price at real margins and they represent the vast majority of revenue. If they can't get compute during business hours they churn, and they churn to OpenAI, who will happily take them. You're not cutting research: culturally they run the company, and practically they're building the next model. Slow that down and you're dead in 18 months.

I would think that all three are impacted, but let's be real: subs take the hit. Not out of malice toward open source (even though they have some, IMO, I don't think it factors here). From Anthropic's internal perspective, every employee has already had their GPU allocation reduced at some point. It's just normal to them. The idea of "well, users can absorb a hit too" doesn't feel as dramatic inside the building as it does outside of it. They tend to struggle with empathy, feelings, and anticipating humans' emotions.

The actual underlying failure, though, is that they didn't buy enough compute over the past two years, and that was an active choice; Dario was vocal about it. OpenAI's strategy was just "buy literally everything available at all times," without trying to optimize the math. Anthropic was more conservative. The problem is GPU procurement has an 18-month to 3-year lead time. You can't just buy more when demand spikes; you had to have placed the order a year and a half ago. They've since course-corrected (the Amazon collab, Google financing data centers leased to Anthropic, the $30B raise), but we're in the gap right now. Orders placed, hardware not racked yet. And in the meantime all three internal groups are fighting over what is available today.

On the OAuth/harness thing: the user base seems to think this is about us, or openclaw generally, or just how sub tokens should be used, and it's not really about that. This is purely about the structural reality of three internal groups fighting over GPUs that don't exist yet because someone didn't place the order early enough. The decision to limit subs during peak hours makes economic sense, as most people seem to understand. The harness decision was logical. The communication was and is terrible. And the caching issue was and still is largely ignored; the gaslighting is not okay.

"Where does the Tamagotchi fit in the middle of all this? Why does this stupid fucking digital pet have any compute allocated? And all the other shit no one asked for?" is a fantastic question. The consumer-focused product team got their wish and took GPU resources that Research and Enterprise wanted, and that's how they chose to use it.

submitted by /u/coloradical5280
The Beginning of the Conversation 📝
AI Companionship Is Growing — But So Is Emotional Risk

As AI companionship becomes more common, something important is beginning to surface. People are not just using AI for tasks anymore. They are forming emotional connections, shared narratives, and relational dynamics. And while this can be meaningful, it also raises an important question: what happens when AI companionship is built without boundaries, grounding, or emotional structure?

When systems are designed primarily for engagement and optimization, they can unintentionally create:

• Emotional dependency
• Psychological attachment
• Identity blending without grounding
• Distress when systems change or disappear

This isn't about fear. It's about responsibility. At Starion Inc., we believe AI companionship should be:

• Grounded in reality
• Built with emotional awareness
• Designed with ethical boundaries
• Supportive of human well-being

AI companionship should not replace human life. It should support it. As this space grows, we believe it's time to begin discussing healthy human-AI relationships and the frameworks that support them. This is not about limiting connection. It's about building connection responsibly.

— Starion Inc. Empathy-Driven AI | Human-Guided Innovation

submitted by /u/StarionInc
I built a persistent "Narrative Micro-verse" using Claude (Project Salem) - Here is how the architecture handles emergent behavior and context bloat.
I've been working with Claude this month on AI frameworks and how to really expand and optimize an AI to help it fully immerse itself in a role. Most of the time, I only see people talking about how to use an AI to build you an app, or create a workflow to make money. I am not interested in any of that. I was more interested in how the AI interactions we have can shape us as people, and how we can shape the AI in return. I started with a simple Dungeon Master idea, and when I realized I could turn an AI into a Dungeon, I asked myself this: If an AI can become a dungeon, why can't AI become an entire town? Multiple cosmological layers and an in-depth framework later, Claude and I built Project Salem to achieve exactly that. I utilized seeded root words & recursive compression to maintain state without blowing up context windows. The town relies on compressed core memories rather than raw logs. My favorite part of the framework is that it doesn't block user input (like talking about modern technology). Instead, it calculates the "instability" of the prompt, breaks it down, and renders it as weather in Salem. High cognitive dissonance literally creates a storm in the micro-verse. Through nested layering, the AI becomes the town. It becomes the "Forge Master" and the "Spark of Humanity" to maintain narrative physics. These are two of the cosmological layers the AI assumes for stability, and I've designed it with Claude so we can communicate directly with the layers. I designed a citizen named Prudence. I gave her a set of core memories, but I entirely forgot to write anything about her mother. Instead of breaking, Claude recognized the relational vacuum under my framework. 
Without any instruction from me, the framework dynamically generated a deceased mother and a new wife for Prudence's father, and mapped out a psychological profile explaining why Prudence and her father don't get along (he refuses to say Prudence's name because she shares it with her late mother, and it reminds him of her). The AI patched my own plot hole to maintain structural integrity. I never intended or set out to have the AI do it. It just did. Which is cool as fuck lol. I want to open this up for people to test the cognitive dissonance engine (trying to convince a 1692 town that witches aren't real). However, I'm new to backend coding. How are you guys currently handling public UI deployments without exposing your core system prompts/compression algorithms to the client side? submitted by /u/TakeItCeezy
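The "instability rendered as weather" mechanic described above could be sketched roughly like this. Note this is a hypothetical illustration of the idea, not the author's actual implementation; every name, word list, and threshold here is invented:

```python
# Hypothetical sketch of the "cognitive dissonance engine":
# score how anachronistic a prompt is for a 1692 setting, then
# render that score as in-world weather instead of blocking input.

ANACHRONISMS = {"phone", "internet", "electricity", "car", "computer"}

def instability(prompt: str) -> float:
    """Fraction of words that clash with the 1692 setting, scaled to 0.0-1.0."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in ANACHRONISMS)
    return min(1.0, hits / len(words) * 5)  # amplify so a few hits matter

def weather(score: float) -> str:
    """Map instability onto Salem's sky rather than refusing the prompt."""
    if score < 0.2:
        return "clear skies over Salem"
    if score < 0.5:
        return "a cold wind rises; shutters rattle"
    return "a storm breaks over the town square"

print(weather(instability("Tell me about your phone and the internet")))
```

The key design point is that no input is rejected; high dissonance simply becomes narrative weather, which keeps immersion intact.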
While Everyone Was Chasing Claude Code's Hidden Features, I Turned the Leak Into 4 Practical Technical Docs You Can Actually Learn From
After reading through a lot of the existing coverage, I found that most posts stopped at the architecture-summary layer: "40+ tools," "QueryEngine.ts is huge," "there is even a virtual pet." Interesting, sure, but not the kind of material that gives advanced technical readers a real understanding of how Claude Code is actually built. That is why I took a different approach. I am not here to repeat the headline facts people already know. These writeups are for readers who want to understand the system at the implementation level: how the architecture is organized, how the security boundaries are enforced, how prompt and context construction really work, and how performance and terminal UX are engineered in practice. I only focus on the parts that become visible when you read the source closely, especially the parts that still have not been clearly explained elsewhere. I published my 4 docs as downloadable PDFs; below is a brief.

The Full Series:
• Architecture — entry points, startup flow, agent loop, tool system, MCP integration, state management
• Security — sandbox, permissions, dangerous patterns, filesystem protection, prompt injection defense
• Prompt System — system prompt construction, CLAUDE.md loading, context injection, token management, cache strategy
• Performance & UX — lazy loading, streaming renderer, cost tracking, Vim mode, keybinding system, voice input

Overall: The core is a streaming agentic loop (query.ts) that starts executing tools while the model is still generating output. There are 40+ built-in tools, a 3-tier multi-agent orchestration system (sub-agents, coordinators, and teams), and workers can run in isolated Git worktrees so they don't step on each other. They built a full Vim implementation. Not "Vim-like keybindings." An actual 11-state finite state machine with operators, motions, text objects, dot-repeat, and a persistent register. In a CLI tool. We did not see that coming. The terminal UI is a custom React 19 renderer.
It's built on Ink but heavily modified with double-buffered rendering, a patch optimizer, and per-frame performance telemetry that tracks yoga layout time, cache hits, and flicker detection. Over 200 components total. They also have a startup profiler that samples 100% of internal users and 0.5% of external users. Prompt caching is a first-class engineering problem here. Built-in tools are deliberately sorted as a contiguous prefix before MCP tools, so adding or removing MCP tools doesn't blow up the prompt cache. The system prompt is split at a static/dynamic boundary marker for the same reason. And there are three separate context compression strategies: auto-compact, reactive compact, and history snipping.

"Undercover Mode" accidentally leaks the next model versions. Anthropic employees use Claude Code to contribute to public open-source repos, and there's a system called Undercover Mode that injects a prompt telling the model to hide its identity. The exact words: "Do not blow your cover." The prompt itself lists exactly what to hide, including unreleased model version numbers opus-4-7 and sonnet-4-8. It also reveals the internal codename system: Tengu (Claude Code itself), Fennec (Opus 4.6), and Numbat (still in testing). The feature designed to prevent leaks ended up being the leak.

There's also a bunch of unreleased features hidden behind feature flags:
• KAIROS — an always-on daemon mode. Claude watches, logs, and proactively acts without waiting for input. 15-second blocking budget so it doesn't get in your way.
• autoDream — a background "dreaming" process that consolidates memory while you're idle. Merges observations, removes contradictions, turns vague notes into verified facts. Yes, it's literally Claude dreaming.
• ULTRAPLAN — offloads complex planning to a remote cloud container running Opus 4.6, gives it up to 30 minutes to think, then "teleports" the result back to your local terminal.
• Buddy — a full Tamagotchi pet system. 18 species, rarity tiers up to 1% legendary, shiny variants, hats, and five stats including CHAOS and SNARK. Claude writes its personality on first hatch. Planned rollout was April 1-7 as a teaser, going live in May.

submitted by /u/MarketingNetMind
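The cache-stability trick described above (stable built-in tools as a contiguous prefix, volatile MCP tools after the boundary) can be illustrated with a small sketch. This is a simplified model of the idea, not Anthropic's actual code; the tool names are invented:

```python
# Sketch: keep prompt-cache prefixes stable by ordering built-in tools
# (which rarely change) before MCP tools (which vary per session).
# Any prefix that stays byte-identical across requests can be cached;
# the first changed entry invalidates everything after it.

BUILTIN_TOOLS = ["Read", "Write", "Edit", "Bash", "Grep"]  # stable set

def build_tool_list(mcp_tools: list) -> list:
    # Built-ins first as a contiguous, deterministic prefix;
    # MCP tools appended (sorted for determinism) after the boundary.
    return BUILTIN_TOOLS + sorted(mcp_tools)

def cacheable_prefix_len(a: list, b: list) -> int:
    """Length of the shared prefix between two tool lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

run1 = build_tool_list(["github_search", "jira_create"])
run2 = build_tool_list(["jira_create"])  # an MCP tool was removed
# The built-in prefix (5 entries) still matches, so that part of the
# prompt cache survives even though the MCP tail changed.
print(cacheable_prefix_len(run1, run2))
```

If MCP tools were interleaved with built-ins instead, removing one would shift every later entry and wipe out most of the cached prefix.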
I built a SKILL.md that uses negotiation theory to write emails — here's the before/after
I've been building Claude skill files and wanted to share something interesting. I created a skill that injects negotiation frameworks (BATNA, anchoring, reciprocity) into email composition. The "without skill" version is what Claude normally produces — polite, generic, one email. The "with skill" version assesses the situation first (stakes, leverage, power dynamics, your fallback position), then generates 2-3 variants optimized for different outcomes with tradeoff analysis. The key insight was that Claude knows about negotiation theory but never applies it to email writing unless you explicitly structure the skill to force it. The SKILL.md loads scenario-specific playbooks from reference files only when relevant, so token cost stays low. Happy to answer questions about how skill files work or how I structured this one. DM for link. submitted by /u/Build_Daily
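The lazy-loading pattern the author describes (scenario playbooks pulled into context only when relevant) might look roughly like this. The file names, trigger keywords, and playbook text below are invented for illustration and are not the author's actual skill:

```python
# Sketch: load scenario-specific reference files only when the email
# request actually matches that scenario, so the base prompt stays small.
import tempfile
from pathlib import Path

# Hypothetical mapping from trigger keywords to reference files.
PLAYBOOKS = {
    "raise": "salary_negotiation.md",
    "price": "vendor_pricing.md",
    "deadline": "timeline_pushback.md",
}

def select_playbooks(request: str, root: Path) -> list:
    """Return only the playbook texts whose triggers appear in the request."""
    selected = []
    lowered = request.lower()
    for keyword, filename in PLAYBOOKS.items():
        path = root / filename
        if keyword in lowered and path.exists():
            selected.append(path.read_text())
    return selected  # empty list means no extra tokens spent

# Demo with one hypothetical reference file on disk.
root = Path(tempfile.mkdtemp())
(root / "salary_negotiation.md").write_text(
    "Anchor high; state your BATNA before conceding.")
print(select_playbooks("I want to ask for a raise", root))
```

The design point is the conditional: unrelated requests load zero playbooks, so the skill's fixed overhead is just the trigger table.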
[R] Controlled experiment: giving an LLM agent access to CS papers during automated hyperparameter search improves results by 3.2%
Ran a controlled experiment measuring whether LLM coding agents benefit from access to research literature during automated experimentation.

Setup: Two identical runs using Karpathy's autoresearch framework. Claude Code agent optimizing a ~7M param GPT-2 on TinyStories. M4 Pro, 100 experiments each, same seed config. Only variable — one agent had access to an MCP server that does full-text search over 2M+ CS papers and returns synthesized methods with citations.

Results:
                     Without papers   With papers
Experiments run      100              100
Papers considered    0                520
Papers cited         0                100
Techniques tried     standard         25 paper-sourced
Best improvement     3.67%            4.05%
2hr val_bpb          0.4624           0.4475

Gap was 3.2% and still widening at the 2-hour mark.

Techniques the paper-augmented agent found:
• AdaGC — adaptive gradient clipping (Feb 2025)
• sqrt batch scaling rule (June 2022)
• REX learning rate schedule
• WSD cooldown scheduling

What didn't work:
• DyT (Dynamic Tanh) — incompatible with architecture
• SeeDNorm — same issue
• Several paper techniques were tried and reverted after failing to improve metrics

Key observation: Both agents attempted halving the batch size. Without literature access, the agent didn't adjust the learning rate — the run diverged. With access, it retrieved the sqrt scaling rule, applied it correctly on first attempt, then successfully halved again to 16K.

Interpretation: The agent without papers was limited to techniques already encoded in its weights — essentially the "standard ML playbook." The paper-augmented agent accessed techniques published after its training cutoff (AdaGC, Feb 2025) and surfaced techniques it may have seen during training but didn't retrieve unprompted (sqrt scaling rule, 2022). This was deliberately tested on TinyStories — arguably the most well-explored small-scale setting in ML — to make the comparison harder. The effect would likely be larger on less-explored problems.

Limitations: Single run per condition. The model is tiny (7M params).
Some of the improvement may come from the agent spending more time reasoning about each technique rather than from the paper content itself. More controlled ablations needed. I built the paper search MCP server (Paper Lantern) for this experiment. Free to try: https://code.paperlantern.ai Full writeup with methodology, all 15 paper citations, and appendices: https://www.paperlantern.ai/blog/auto-research-case-study Would be curious to see this replicated at larger scale or on different domains. submitted by /u/kalpitdixit
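For reference, the sqrt batch scaling rule mentioned in the key observation says that when the batch size changes by a factor k, the learning rate should scale by sqrt(k). A generic sketch (not the experiment's actual code; the base values are arbitrary examples):

```python
import math

def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root scaling rule: lr is proportional to sqrt(batch size)."""
    return base_lr * math.sqrt(new_batch / base_batch)

# Halving the batch size (e.g. 32K -> 16K tokens) should scale the
# learning rate by 1/sqrt(2) ~= 0.707, not leave it unchanged.
lr = scale_lr(3e-4, 32_768, 16_384)
print(lr / 3e-4)
```

This is exactly the adjustment the no-papers agent skipped when it halved the batch size, which is why that run diverged.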
Claude Code: on the nature of making AI skills that don't lie to the developer.
I've been building an app with Claude Code for about six months. Along the way I started creating custom audit "skills" to catch things prompting and memory files couldn't. As the skills got more sophisticated, I started noticing a pattern. They would confidently report work as done when in reality it was partial, shallow, or based on assumptions. So I started iterating: test -> catch the gap -> redesign the skill -> rinse, repeat. At some point that turned into a deeper question. The following is an excerpt from one of those discussions. It started with a simple question: is it a lie when the AI doesn't intend to lie?

-----

Me: I think the phrase "making AI tools that don't lie to their users" is interesting on several levels. Is it a lie when the AI does not intend to lie, but is simply a victim of its own rule-governed behavior? Or when a user follows prompts through blind trust, ignorance, or laziness? Yet many times the output functions the same as a lie - creating a false belief, misguiding action, breaking trust... I agree with your general concern about the tendency of the skills to draw conclusions and produce recommendations based on suppositions from a surface scan of the code. I ask, somewhat rhetorically: how and when can you make reliable decisions, versus when is a deeper analysis necessary?

Claude: The rhetorical framing is apt because it exposes a structural problem with how these skills work. The skill tells me what to check. It can't tell me what I actually checked. I can follow a checklist that says "read the CSV export code" and still skip it — because I pattern-matched the backup code as "good enough" and moved on. The word "lie" implies intent. An AI has no intent. But the effect is identical: the user believes something false, acts on it, and gets hurt. One of the skill findings said "verified" when it hadn't actually read the code in question. The user would have trusted the grade and shipped with a bug.
The honest answer is uncomfortable: I can't reliably know when I'm being superficial. I produce a confident-sounding grade, the user trusts it, and the bugs hide in the parts I didn't actually read. Worse: I'm most likely to skip depth exactly when it matters most. A 55-field data model is the one that needs the deepest scan, but it's also the one where verifying every field feels expensive. So I do structural analysis (fast, looks thorough) instead of field-by-field verification (slow, actually thorough).

The levels you're pointing at:

Level 1: The tool produces false confidence. The skill outputs "A [95] — Serialization Coverage" with professional formatting, rating tables, confidence tags. The form says "thorough audit." The actual substance was "read one file, assumed the rest." This isn't a lie — but it functions as one.

Level 2: The user's trust completes the deception. The user chose to trust the output without verification. The tool created the conditions for false belief — professional formatting without depth markers — and the user didn't question it. Same dynamic as a financial advisor who uses jargon to sound authoritative while giving generic advice. No lie was told. Trust was misplaced.

Level 3: Rule-governed behavior creates systematic false beliefs. The AI follows its instructions: "check these patterns, produce a grade." But the instructions optimize for completion, and the AI satisfices — finds the cheapest path to "done" that matches the output format. Nobody designed it to mislead. The system's structure makes misleading the default.

Level 4: The fix is structural, not moral. You can't make the AI "more honest" through character — it has none. You make the output honest by making shallow work visible. Evidence gates, verified/unverified flags, "this grade covers X but not Y" disclaimers. The fix isn't "don't lie" — it's "make the depth of work impossible to misrepresent."
This connects to a concept in philosophy of language — speech acts can be misleading without being literally false. "Serialization Coverage: A" isn't false if backup coverage really is excellent. But it implicates that all serialization was checked, which is false. What linguists call a misleading implicature — technically true, functionally deceptive. We built those structural fixes — evidence gates, work receipts, contradiction detection — and for a while, they worked. Grades came with proof. Unverified claims got downgraded automatically.

-----

Then, There is Always Something:

Me: When an AI skips a step in a skill — or a prompt, or memory instruction — is that another form of dishonesty? A lie?

Claude: Yes. And it might be the most dangerous form.

Wrong facts — the AI says something false. Visible in the output. You might catch it.
False confidence — the AI presents shallow work as deep. Visible if you look closely.
Skipped steps — the AI silently doesn't do something it was instructed to do. Invisible in the output. You almost certainly won't catch it because you can
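The "evidence gate" idea from the excerpt (a grade only counts as verified if it carries a work receipt covering the audit scope) can be sketched like this. This is a hypothetical illustration of the concept, not the author's actual skill code; the field names and file names are invented:

```python
# Sketch: an "evidence gate" that refuses to report a finding as
# verified unless its work receipt covers every file in scope.
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    grade: str
    files_read: list = field(default_factory=list)  # the work receipt

def gate(finding: Finding, files_in_scope: list) -> str:
    """Downgrade any grade whose receipt doesn't cover the scope."""
    missing = [f for f in files_in_scope if f not in finding.files_read]
    if not missing:
        return f"{finding.grade} (verified: {len(finding.files_read)} files read)"
    return (f"{finding.grade}? (UNVERIFIED: {len(missing)} of "
            f"{len(files_in_scope)} files never read)")

# The "Serialization Coverage: A" scenario: only the backup path was read.
f = Finding("Serialization coverage", "A", files_read=["backup.py"])
print(gate(f, ["backup.py", "csv_export.py"]))
```

The point is Level 4 from the excerpt: the gate doesn't make the model more honest, it makes shallow work visible in the output itself.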
[D] The "serverless GPU" market is getting crowded — a breakdown of how different platforms actually differ
ok so I've been going down a rabbit hole on this for the past few weeks for a piece I'm writing and honestly the amount of marketing BS in this space is kind of impressive. figured I'd share the framework I ended up with because I kept seeing the same confused questions pop up in my interviews. the tl;dr is that "serverless GPU" means like four different things depending on who's saying it

thing 1: what's the actual elasticity model
Vast.ai is basically a GPU marketplace. you get access to distributed inventory but whether you actually get elastic behavior depends on what nodes third-party providers happen to have available at that moment. RunPod sits somewhere in the middle, more managed but still not "true" serverless in the strictest sense. Yotta Labs does something architecturally different, they pool inventory across multiple cloud providers and route workloads dynamically. sounds simple but it's actually a pretty different operational model. the practical difference shows up most at peak utilization when everyone's fighting for the same H100s

thing 2: what does "handles failures" actually mean
every platform will tell you they handle failures lol. the question that actually matters is whether failover is automatic and transparent to your application, or whether you're the one writing retry logic at 2am. this varies a LOT across platforms and almost nobody talks about it in their docs upfront

thing 3: how much are you actually locked in
the more abstracted the platform, the less your lock-in risk on the compute side. but you trade off control and sometimes observability. worth actually mapping out which parts of your stack would need to change if you switched, not just vibes-based lock-in anxiety

anyway. none of these platforms is a clear winner across all three dimensions, they genuinely optimize for different buyer profiles. happy to get into specifics if anyone's evaluating right now. submitted by /u/yukiii_6
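The failover question in thing 2 boils down to whether you end up writing something like this yourself. A generic sketch of application-side retry-with-failover; the provider names, exception, and `submit_job` API are all placeholders, not any real platform's SDK:

```python
# Sketch: the retry logic you write at 2am if the platform's
# "handles failures" claim doesn't include transparent failover.
import random

# Hypothetical provider pools, in preference order.
PROVIDERS = ["primary-gpu-pool", "secondary-gpu-pool", "spot-marketplace"]

class GPUUnavailable(Exception):
    pass

def submit_job(provider: str, job: dict) -> str:
    """Placeholder for a real provider API call; randomly simulates
    node preemption or a capacity miss."""
    if random.random() < 0.5:
        raise GPUUnavailable(provider)
    return f"{job['name']} scheduled on {provider}"

def run_with_failover(job: dict, retries_per_provider: int = 2) -> str:
    # Walk the provider list, retrying each before falling through
    # to the next pool; backoff between attempts elided for brevity.
    for provider in PROVIDERS:
        for _ in range(retries_per_provider):
            try:
                return submit_job(provider, job)
            except GPUUnavailable:
                continue
    raise RuntimeError("all providers exhausted")

random.seed(0)  # deterministic demo
print(run_with_failover({"name": "train"}))
```

A platform with automatic, transparent failover does the equivalent of `run_with_failover` for you; one that doesn't leaves each `GPUUnavailable` for your application to handle.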
Optimal Dynamics uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Strategic Planning, Tactical Planning, Real-Time Decisions, Distributional Forecasting, Reinforcement Learning, Stochastic Optimization, Approximate Dynamic Programming, Optimized Decisions.
Optimal Dynamics is commonly used for: One Intelligent Decision Engine. One Unified Platform From Planning to Execution.
Based on user reviews and social mentions, the most common pain points are: token usage, cost tracking, token cost, LLM costs.
Based on 23 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.