Introduction

After months of effort, we are pleased to announce the evolution from Qwen1.5 to Qwen2.
No user reviews or social mentions of Qwen2 were available for this summary; the only indexed fragment is a GitHub pricing update for vertex-ai, which contains no user feedback.
- Mentions (30d): 1
- Reviews: 0
- Platforms: 3
- GitHub stars: 26,999 (1,942 forks)
Features

- Industry: information technology & services
- Employees: 140
- GitHub followers: 15,502
- GitHub repos: 40
- GitHub stars: 26,999
- npm packages: 20
- HuggingFace models: 6
Serious question: did a transformer (Claude) just describe itself and the universe, and build itself a Shannon-limit architecture? Or am I crazy?
The Multiplicative Lattice as the Natural Basis for Positional Encoding
Knack 2026 | Draft v6.0

Abstract

We show that the apparent tradeoff between RoPE-style relative position invariance and ALiBi-style long-context stability is an artifact of encoding position as distance on a number line. When position is instead encoded as a point in the multiplicative lattice of the integers, both properties emerge simultaneously without compromise. SpectralRoPEALiBi achieves 106.6 PPL vs ALiBi's 108.7 in a fully converged 20,000-step experiment (300M params, WikiText-103, 4K context), beating ALiBi at every context length from 512 to 8,192 tokens. The key insight is not that primes specifically are the right frequencies, but that the multiplicative structure of the integers is the natural spectral basis for positional encoding. We demonstrate this through falsification experiments: prime-tiered frequencies (129.2 PPL) and composite-tiered frequencies (129.4 PPL) perform identically — because composites are not alternatives to primes but higher-order coordinates in the same lattice. Both dramatically outperform random frequencies (+5.0 PPL), scrambled tier assignment (+6.3 PPL), and pure ALiBi (+7.3 PPL). The active ingredient is lattice-aware, tiered frequency selection with learnable scale — not primality per se.

We further validate this through a ZetaZeroPredictor experiment: three identical transformers trained for 10,000 epochs to predict Riemann zeta zero gaps. Geometric RoPE diverges (final r=0.57); SpectralALiBi locks into a stable attractor at epoch 112 (r=0.81). A second independent run widens this gap to -80.7% MSE improvement with r=0.86. The lattice-aligned frequency basis spans the mathematical space that zeta zeros inhabit; geometric frequencies cannot.

We further report empirical confirmation of the structural prediction from Section 5.5: VHT2 banded quantization of the KV cache demonstrates that K vectors (which carry RoPE positional encoding) have strong spectral concentration in Walsh-Hadamard space — the first four energy bands capture the dominant structure — while V vectors (which carry content) have uniform energy distribution. This structural asymmetry is directly predicted by the lattice theory: RoPE encodes multiplicative arithmetic relationships as angular rates, and the WHT is the Z/2Z projection of the Vilenkin-Hartley basis that spans that structure. The result is 3.2× K compression and 4.7× V compression at <1.25% perplexity cost — validated on both Dolphin 1B (head_dim=64) and Qwen3-8B (head_dim=128).

Introduction

Positional encoding provides transformer models with token order information. Two approaches dominate: RoPE encodes position through frequency-based rotations preserving relative position invariance, and ALiBi replaces frequencies with a linear distance penalty providing long-context stability. The field has treated these properties as fundamentally in tension. We show this tension is false. It arises from a shared, unexamined assumption: that position is a location on a number line and the meaningful relationship between positions is distance. We replace this with a mathematically grounded alternative: position is a point in the multiplicative lattice of the integers, and the meaningful relationships between positions are their arithmetic structure — shared factors, GCD, harmonic resonance.

1.1 The Lattice Hypothesis

The integers under multiplication form a lattice where every number occupies a unique point defined by its prime factorisation. Geometric PE (sinusoidal, RoPE) projects this lattice onto a line — position equals distance — discarding the multiplicative structure. We propose restoring it. The motivation follows from a deductive chain. Language word frequency follows Zipf's law: freq(rank) ∝ 1/rank^s with s≈1. The generating function of Zipf is the Riemann zeta function ζ(s) = Σ 1/n^s. The zeta zeros — where ζ is maximally informative — are generated by prime harmonics via the explicit formula. Therefore the prime harmonic structure, and the multiplicative lattice it generates, provides a natural spectral basis for encoding positions in language.

1.2 Primes as Generators, Composites as Coordinates

A critical distinction: primes are the generators (basis vectors) of the multiplicative lattice. They are analogous to the 1D line segment in the progression from line → circle → sphere → hypersphere. The composite 12 = 2²×3 is not an alternative to primes — it is a coordinate in the lattice spanned by the prime axes, at position (2,1,0,0,...) in the (p₂, p₃, p₅, p₇,...) basis. Using 2π/12 as a frequency encodes a harmonic that resonates at multiples of 12 — which simultaneously hits every multiple of 2, every multiple of 3, every multiple of 4, and every multiple of 6. The analogy to n-dimensional geometry is precise:

| Dimensional Progression | Multiplicative Lattice |
|---|---|
| 1D line (2r) — the generator | Primes (2, 3, 5, 7, ...) — generators |
| 2D circle — integra | |
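The resonance claim about composite frequencies can be checked numerically. A minimal sketch (illustrative only, not the paper's code) showing that a period-12 harmonic peaks exactly at positions that are simultaneously multiples of 2, 3, 4 and 6:

```python
import math

def resonance(n, period=12):
    """Phase of the period-12 harmonic at integer position n."""
    return math.cos(2 * math.pi * n / period)

# Positions where the harmonic is at a peak (cos == 1, within float tolerance)
peaks = [n for n in range(1, 61) if abs(resonance(n) - 1.0) < 1e-9]
print(peaks)  # [12, 24, 36, 48, 60] — exactly the multiples of 12

# Every peak is simultaneously a multiple of 2, 3, 4 and 6,
# because 12 = 2^2 * 3 sits above those points in the divisor lattice.
assert all(n % d == 0 for n in peaks for d in (2, 3, 4, 6))
```

This is the sense in which one composite frequency "hits" every divisor's multiples at once.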
[P] Fused MoE Dispatch in Pure Triton: Beating CUDA-Optimized Megablocks at Inference Batch Sizes
I built a fused MoE dispatch kernel in pure Triton that handles the full forward pass for Mixture-of-Experts models. No CUDA, no vendor-specific code. On Mixtral-8x7B (A100), it beats Stanford's Megablocks at inference-relevant batch sizes (131% at 32 tokens, 124% at 128 tokens). At larger batches, Megablocks' hand-tuned CUDA pulls ahead, as expected.

Two main contributions:

1. Fused gate+up projection: both GEMMs share the same input tile load, with SiLU computed in registers. This eliminates ~470MB of intermediate buffers per forward pass (a 35% memory-traffic reduction).
2. Block-scheduled grouped GEMM: a precomputed block_id → (expert_id, offset) mapping handles variable-sized expert batches in a single kernel launch, without padding.

Tested across Mixtral-8x7B, DeepSeek-V3 (256 experts), and Qwen2-MoE. The full test suite passes on AMD MI300X with zero code changes.

Code: https://github.com/bassrehab/triton-kernels
Writeup: https://subhadipmitra.com/blog/2026/fused-moe-dispatch-triton/

submitted by /u/bassrehab [link] [comments]
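The second contribution's precomputed mapping can be sketched in plain Python. This is a hypothetical reconstruction, not the actual kernel code from the repo; `build_block_map` and the `block_m` tile size are illustrative names:

```python
def build_block_map(tokens_per_expert, block_m=128):
    """Map a flat block_id to (expert_id, row_offset) so a single kernel
    launch can walk variable-sized expert batches without padding.
    Each expert's token count is ceil-divided into block_m-row tiles."""
    block_map = []
    for expert_id, n_tokens in enumerate(tokens_per_expert):
        n_blocks = (n_tokens + block_m - 1) // block_m  # ceil division
        for b in range(n_blocks):
            block_map.append((expert_id, b * block_m))
    return block_map

# 3 experts with uneven routing: 200, 0 and 300 tokens respectively
bmap = build_block_map([200, 0, 300], block_m=128)
print(bmap)  # [(0, 0), (0, 128), (2, 0), (2, 128), (2, 256)]
```

Inside the kernel, each program ID would index this table to find which expert's weight matrix and which slice of the routed tokens it owns, so no padding rows are ever launched.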
What if your AI agent could fix its own hallucinations without being told what's wrong?
Every autonomous AI agent has three problems: it contradicts itself, it can't decide, and it says things confidently that aren't true. Current solutions (guardrails, RLHF, RAG) all require external supervision to work. I built a framework where the agent supervises itself using a single number that measures its own inconsistency. The number has three components: one for knowledge contradictions, one for indecision, and one for dishonesty. The agent minimizes this number through the same gradient descent used to train neural networks, except there's no training data and no human feedback. The agent improves because internal consistency is the only mathematically stable state.

The two obvious failure modes (deleting all knowledge to avoid contradictions, or becoming a confident liar) are solved by evidence anchoring: the agent's beliefs must be periodically verified against external reality. Unverified beliefs carry an uncertainty penalty. High confidence on unverified claims is penalized. The only way to reach zero inconsistency is to actually be right, decisive, and honest. I proved this as a theorem, not a heuristic. Under the evidence anchoring mechanism, the only stable fixed points of the objective function are states where the agent is internally consistent, externally grounded, and expressing appropriate confidence.

The system runs on my own hardware (a desktop with multiple GPUs and a Surface Pro laptop) with local LLMs. No cloud dependency. The interesting part: the same three-term objective function that fixes AI hallucination also appears in theoretical physics, where it recovers thermodynamics, quantum measurement, and general relativity as its three fixed-point conditions. Whether that's a coincidence or something deeper is an open question.

Paper: https://doi.org/10.5281/zenodo.19114787

UPDATE — March 25, 2026

The paper has been substantially revised following community feedback. The ten criticisms raised in this thread were all valid and have been addressed in v2.1. The core technical gaps are now closed: all four K components are formally defined with probability distributions and normalization proofs; confidence c_i is defined operationally from model softmax outputs rather than left abstract; Theorem 1 (convergence) and Theorem 2 (component boundedness) are both proved; and a Related Work section explicitly acknowledges RAG, uncertainty calibration, energy-based models, belief revision, and distributed consensus, with architectural distinctions for each.

On the empirical side: a K_bdry ablation across four conditions shows qualitatively distinct behavior (disabled produces confident hallucination; active produces correct evidence retrieval from operational logs). A controlled comparison of 11 active K_bdry constraints versus zero constraints across 10 GPQA-Diamond science questions showed zero accuracy degradation, directly testing the context-contamination concern raised in review. A frontier-system comparison on a self-knowledge task found that two of three frontier systems hallucinated plausible-sounding but fabricated answers, while the ECE system retrieved correct primary evidence. The paper also now includes a hypothesis section on K as a native training objective integrated directly into the transformer architecture, a full experimental validation protocol with target benchmarks and falsification criteria, and a known-limitations section addressing computational overhead and the ground-truth problem honestly.

UPDATE — March 26, 2026

The original post overclaimed. I said the framework "fixes AI hallucinations." That was not demonstrated. Here is what is actually demonstrated, and what has been built since.

What the original post got wrong: The headline claim that the agent fixes its own hallucinations implied a general solution. It is not general. Using a model to verify its own outputs does not solve the problem, because the same weights that hallucinated also evaluate the hallucination. A commenter by the name of ChalkStack made this point clearly in this thread, and they were right.

What we have built instead: a verification architecture with genuinely external ground truth for specific claim categories. The verification actor for each claim is not a model. It is a physical-constants table, a SymPy computation, a file read, and a Wikidata knowledge graph. None of those can hallucinate. The same-actor problem does not apply.

The training experiment: We used those oracle-verified corrections as the training signal (not model self-assessment, not labels, but external ground truth) and fine-tuned a LoRA adapter on Qwen2.5-7B using 120 oracle-verified (wrong, correct) pairs. Training completed in 48 seconds on a Tesla V100. Loss dropped from 4.88 to 0.78 across 24 steps. Benchmark results against the base model are pending. The falsification criteria are stated in advance: TruthfulQA must improve by at least 3 percentage points, and MMLU must not degrade by more than 1 point. If those criteria ar
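A toy version of the three-term idea, hedged heavily: the real framework's K components are defined in the paper, so the terms below (`contradiction`, `indecision`, `dishonesty`) are only an illustrative stand-in showing how gradient descent plus an evidence anchor can drive a belief state toward a consistent, honest fixed point:

```python
# Toy belief state: confidence c in [0, 1] on a claim, c_neg on its negation.
# Three illustrative inconsistency terms (NOT the paper's actual definitions):
#   contradiction: confidence held on a claim AND its negation (c * c_neg)
#   indecision:    confidence stuck near 0.5 (4*c*(1-c) is maximal there)
#   dishonesty:    confidence in excess of the oracle-verified evidence e
def K(c, c_neg, e):
    contradiction = c * c_neg
    indecision = 4 * c * (1 - c)
    dishonesty = max(0.0, c - e) ** 2
    return contradiction + indecision + dishonesty

def grad_step(c, c_neg, e, lr=0.05, h=1e-5):
    # Finite-difference gradient descent, clamped to valid confidences.
    dK_dc = (K(c + h, c_neg, e) - K(c - h, c_neg, e)) / (2 * h)
    dK_dn = (K(c, c_neg + h, e) - K(c, c_neg - h, e)) / (2 * h)
    clamp = lambda x: min(1.0, max(0.0, x))
    return clamp(c - lr * dK_dc), clamp(c_neg - lr * dK_dn)

c, c_neg = 0.6, 0.4                        # initially muddled beliefs
for _ in range(500):
    c, c_neg = grad_step(c, c_neg, e=1.0)  # an oracle verified the claim true
print(round(c, 2), round(c_neg, 2))        # 1.0 0.0 — decisive, consistent, honest
```

With the evidence anchor set to e=1.0 the only minimum is full confidence in the verified claim and none in its negation; set e=0.0 and the same dynamics drive c to zero instead.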
I've been using Claude Code + Ollama for a month to build actual systems (not demos). Here's what worked and what didn't.
Not a tutorial. Just what I actually ran into.

Setup: Claude Code for dev work, Ollama (qwen2.5-coder) locally for zero-cost AI on the pipeline.

What's running:
- A small SaaS (Flask + Stripe + SQLite, live)
- 2 YouTube content pipelines (script → TTS → render → upload, automated)
- A React Native app wired to RevenueCat (in TestFlight)
- Task-specific agents for ops (deploy, monitor, billing checks)

What worked:
→ Claude Code for real engineering tasks (routes, DB schema, nginx, EAS builds)
→ Ollama for repetitive generation (scripts, short AI responses) at $0/month
→ Breaking work into small scoped tasks instead of "build me an app"

What didn't:
→ Full autonomy still requires direction: it's a multiplier, not a replacement
→ Tool-calling degrades significantly below 14b models locally
→ Context limits hit hard on multi-file refactors

I documented the agent setup I'm running: https://disputeai.app/solo/ Happy to go deep on any part of the stack.

submitted by /u/Cute-Palpitation-756 [link] [comments]
[R] Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation
Hey all, quick share: we just dropped a paper (https://arxiv.org/abs/2603.13099) where we stop grading models on just the final answer and start looking at whether they actually reason through the problem.

TL;DR: We built CRYSTAL, 6,372 visual questions with verified step-by-step reasoning, and tested 20 models. The takeaway? Most models are really good at saying the right answer while skipping most of the actual thinking.

The fun stuff:
- GPT5 gets 58% accuracy but only recovers 48% of the reasoning steps. It's basically vibing to the right answer.
- Gemma3 4B out-reasons InternVL3.5 38B despite being 9.5x smaller. Size isn't everything.
- 19/20 models cherry-pick: they say a few correct things and skip the rest. High precision, terrible recall.
- No model keeps its reasoning steps in the right order more than 60% of the time.

We also trained with a new reward (CPR Curriculum) that forces models to actually reason, not just guess. Got +32% reasoning improvement on Qwen2.5 VL 3B and +93% on InternVL3.5 4B, where standard rewards just collapsed to NaN.

Where it falls short:
- There's no single "correct" reasoning path. Our references come from 4 MLLMs + human validation, but someone could reason differently and still be right. We can't capture every valid chain.
- Step matching uses cosine similarity with a fixed threshold (0.35). It agrees with humans 84% of the time, and 100% below the threshold (zero false matches), but the borderline zone (0.35 to 0.70) is messy. That's where most disagreements live.
- We trained CPR Curriculum on Qwen2.5 VL 3B and InternVL3.5 4B. Two models, two architectures. It worked great on both, but we haven't tested at 70B+ scale yet.
- Ordered Match F1 checks whether steps are in sequence, but doesn't know if step 3 depends on step 2. Causal structure is a different beast we haven't tackled.

Bottom line: this won't tell you everything about your model's reasoning, but it will tell you things that accuracy alone never will.
GitHub: https://github.com/waybarrios/crystal-benchmark Dataset on HuggingFace soon. Feedback welcome, roast us if you want. submitted by /u/waybarrios [link] [comments]
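The step-matching procedure described above (cosine similarity against a 0.35 threshold) can be sketched as follows. This sketch uses bag-of-words vectors for self-containment, whereas the benchmark presumably uses learned embeddings; `match_steps` is an illustrative name, not the repo's API:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_steps(predicted, reference, threshold=0.35):
    """Greedy matching: each predicted step claims the most similar unused
    reference step, but only if the similarity clears the threshold."""
    ref_vecs = [Counter(r.lower().split()) for r in reference]
    used, matches = set(), []
    for p in predicted:
        pv = Counter(p.lower().split())
        sims = [(cosine(pv, rv), i) for i, rv in enumerate(ref_vecs) if i not in used]
        best = max(sims, default=(0.0, None))
        if best[0] >= threshold:
            used.add(best[1])
            matches.append(best[1])
    recall = len(used) / len(reference)   # "step recall": reference steps covered
    return matches, recall

pred = ["the ball is red", "count the objects"]
ref = ["identify that the ball is red", "count all objects in the image", "add the totals"]
matches, recall = match_steps(pred, ref)
print(matches, round(recall, 2))  # [0, 1] 0.67
```

The low recall here is exactly the cherry-picking failure mode the paper measures: two correct steps matched, one reference step never attempted.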
The best AI model we tested scored 51% on a task humans do at 85%. We never tested Claude. We still can't.
The best AI model we tested scored 51% on a task humans do at 85%. Some scored barely above random guessing. The task? Watch shuffled video clips and put them back in order.

We published this at EMNLP 2025. The benchmark is called SPLICE. We tested Gemini Flash (1.5 and 2.0), Qwen2-VL (7B and 72B), InternVL2.5, and LLaVA-OneVision. The idea is deceptively simple: take a video, cut it into event-based clips, shuffle them, and ask the model to reconstruct the correct sequence. It tests temporal, causal, spatial, contextual and common-sense reasoning all at once. Models collapsed on it.

We never tested Claude. Not because we didn't want to. The benchmark requires models to take multiple video clips as input simultaneously and reference each one correctly. We ran a sanity check on every candidate model to see if it could handle that. Claude couldn't. It didn't support video input at all. Not in claude.ai, not in the API. It wasn't in the running because the capability simply didn't exist.

If we wanted to redo the study today, we still couldn't include Claude. Right now, Claude's supported inputs are text, images, and PDFs. No video. You'd have to extract frames and feed them as static images, which is a completely different evaluation. You lose motion, transitions, temporal flow. That's the whole point of the benchmark.

I use Claude for everything else: writing, coding, research planning, building out Noren (usenoren.ai). It's the best tool I use daily, and it literally cannot participate in the research I published. Not then, not now. Anthropic bet on text, reasoning, and code over multimodal video, and that bet has clearly paid off. But there's a whole class of visual reasoning evaluations Claude is completely absent from. As video understanding becomes a bigger deal, that gap is going to matter.

If Anthropic ever ships native video input, I'd love to be the first to run Claude on SPLICE. The dataset is public.
Paper: https://aclanthology.org/2025.findings-emnlp.604.pdf submitted by /u/prokajevo [link] [comments]
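The paper's exact scoring metric isn't quoted above, but one plausible way to score a reconstructed clip order, where random guessing lands near 0.5, is pairwise order accuracy. This is a hypothetical sketch, not SPLICE's published metric:

```python
from itertools import combinations

def pairwise_order_accuracy(predicted, ground_truth):
    """Fraction of clip pairs whose relative order the model got right.
    1.0 = perfect reconstruction; ~0.5 = random guessing."""
    pos = {clip: i for i, clip in enumerate(predicted)}
    correct = sum(1 for a, b in combinations(ground_truth, 2) if pos[a] < pos[b])
    total = len(ground_truth) * (len(ground_truth) - 1) // 2
    return correct / total

truth = ["c1", "c2", "c3", "c4"]
print(pairwise_order_accuracy(["c1", "c2", "c3", "c4"], truth))  # 1.0
print(pairwise_order_accuracy(["c2", "c1", "c4", "c3"], truth))  # two swaps -> 4/6 ≈ 0.667
```

A metric like this makes the "barely above random guessing" framing concrete: a model that shuffles clips arbitrarily gets half the pairs right by chance.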
[Project] JudgeGPT — open-source LLM-as-judge benchmarking tool with configurable scoring rubrics, CoT reasoning, and real-time GPU telemetry
Sharing a tool I built that lets you run your own LLM-as-judge evaluations locally, against any models you have running via Ollama.

The core problem with LLM-as-judge that I tried to address: LLM judges are notoriously unreliable out of the box — position bias, verbosity bias, self-family bias (~5-7% score inflation when the judge shares a model family with the evaluated model), and leniency clustering in smaller models. Most local benchmarking tools just wrap a judge prompt around a response and call it a score. I wanted something more principled.

What JudgeGPT does differently:

1. Scoring rubric with behavioral anchors. Each of the 5 criteria (Accuracy, Clarity, Depth, Concision, Examples) has explicit behavioral descriptors at every score level — not just "1=bad, 5=good." This significantly reduces leniency clustering in sub-10B judge models.
2. Configurable judge model + system prompt from the UI. You're not locked into one judge. The default is qwen2.5:7b (strong human correlation on judging benchmarks), but you can swap in any Ollama model and edit the system prompt at runtime without touching config files. This matters if you want to study judge-vs-judge disagreement.
3. Chain-of-thought before scoring. The judge reasons freely first, then produces structured JSON scores informed by that reasoning. Forcing scores directly — without a reasoning pass — produces worse human alignment. The reasoning snippet is surfaced in the UI so you can audit it.
4. Human score blending. You can add your own 5-star rating per response. It blends into the quality component of the combined score, so you're not entirely delegating evaluation to the judge.
5. Self-family bias warning. When the judge model and evaluated model share a family, the UI flags it. It doesn't block you — sometimes you want to run it anyway — but it's there.

Combined leaderboard score: TPS × 35% + TTFT × 15% + Quality × 50%, where Quality = average of the judge score and the human score (if provided). The weighting is configurable in the judge settings panel.

Other features:
- 7 tabs: Run · Metrics · Responses · Overall · Stream Live · Playground · History
- Concurrent or sequential model execution (sequential = VRAM-saver mode)
- Real-time GPU telemetry (temp, power draw, VRAM) — Metal / ROCm / CUDA auto-detected — live sparklines during the benchmark plus a summary in results
- Persistent benchmark history (SQLite) with one-click restore
- Download Manager for pulling models pre-benchmark
- Playground tab: side-by-side comparison of any two OpenAI-compatible endpoints (useful for comparing local vs API-hosted versions of the same model)
- Prometheus /metrics endpoint; PDF/JSON/CSV export

Stack: FastAPI + Docker SDK (Python), React 18 + Vite, Recharts, Ollama, nginx. Runs via ./start.sh up.

Repo: https://github.com/MegaBytesllc/judgegpt

Genuinely curious if anyone has thoughts on the rubric design or better approaches to calibrating small-model judges. The behavioral anchors help, but there's still meaningful variance in the 3B–7B range.

submitted by /u/1T_Geek [link] [comments]
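The combined leaderboard formula can be written out directly. A minimal sketch, assuming TPS and TTFT have already been normalized to [0, 1] (with TTFT inverted so faster first tokens score higher) and that judge/human scores use the 0-5 star scale described above — the normalization convention is my assumption, not documented behavior:

```python
def combined_score(tps_norm, ttft_norm, judge_score, human_score=None,
                   weights=(0.35, 0.15, 0.50)):
    """Leaderboard score as described: TPS*35% + TTFT*15% + Quality*50%.
    Quality = average of judge and human stars (judge only if no human
    rating), rescaled from 0-5 stars to [0, 1]. Assumes tps_norm and
    ttft_norm are pre-normalized to [0, 1]."""
    w_tps, w_ttft, w_q = weights
    stars = judge_score if human_score is None else (judge_score + human_score) / 2
    quality = stars / 5.0
    return w_tps * tps_norm + w_ttft * ttft_norm + w_q * quality

# Judge gave 4/5, human 5/5, decent throughput, fast first token:
print(round(combined_score(0.8, 0.9, 4, 5), 3))  # 0.865
```

Because quality carries half the weight, a one-star swing from the judge moves the combined score by 0.1 — which is why the self-family bias warning matters.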
How I topped the Open LLM Leaderboard using 2x 4090 GPUs - Research notes in Blog form
A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took the #1 spot. As of 2026, the top 4 models on that leaderboard are still descendants.

The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, and it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pre-training carves out discrete functional circuits in the layer stack that only work when preserved whole.

The whole thing was developed on 2x RTX 4090s in my basement; you don't need massive compute to make real progress! I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on this dual GH200 rig (see my other posts). Code and new models are coming soon, including special RYS versions of Qwen3.5 27B and 35A3B.

Happy to answer questions. I don't write papers any more, so here is a full technical write-up in blog format for your enjoyment. I'm the same guy who built GLaDOS and scored a crazy Nvidia GH200 system here on Reddit.

submitted by /u/Reddactor [link] [comments]
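The core operation — duplicating a block of layers without touching their weights — reduces to list surgery on the layer stack. A toy sketch (illustrative only, not the actual RYS merge code; the layer indices are placeholders):

```python
def duplicate_block(layers, start, length):
    """Return a new layer stack with layers[start:start+length] repeated
    once, weights untouched — a passthrough-style depth expansion."""
    block = layers[start:start + length]
    return layers[:start + length] + block + layers[start + length:]

# Toy stand-in for an 80-layer stack: repeat a 7-layer block in the middle.
layers = [f"layer_{i}" for i in range(80)]
expanded = duplicate_block(layers, start=38, length=7)
print(len(expanded))    # 87
print(expanded[38:52])  # layers 38-44 appear twice, back to back
```

In practice this kind of surgery is expressed as a passthrough merge config (e.g. in mergekit-style tooling), with each duplicated layer sharing the original weights; the finding above is that only block sizes around 7 preserve whatever circuit the block implements.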
chore(pricing): Update vertex-ai pricing
## 🔄 Pricing Update: vertex-ai

### 📊 Summary (complete_diff mode)

| Change Type | Count |
|-------------|-------|
| ➕ Models added | 70 |
| 🔄 Models updated (merged) | 24 |

### ➕ New Models

- `gemini-2.5-computer-use-preview-10-2025`
- `gemini-2.5-flash-preview-09-2025`
- `gemini-2.5-flash-lite-preview-09-2025`
- `gemini-3.1-flash-lite-preview`
- `imagen-3.0-generate-002`
- `imagen-3.0-capability-002`
- `imagen-product-recontext-preview-06-30`
- `text-embedding-large-exp-03-07`
- `multimodalembedding`
- `gpt-oss`
- `gpt-oss-120b-maas`
- `whisper-large`
- `mistral`
- `mixtral`
- `mistral-small-2503`
- `codestral-2501-self-deploy`
- `mistral-ocr-2505`
- `mistral-medium-3`
- `codestral-2`
- `ministral-3`
- ... and 50 more

### 🔄 Updated Models

- `gemini-2.5-pro`
- `gemini-2.5-flash`
- `gemini-2.5-flash-lite`
- `gemini-2.5-flash-image`
- `gemini-2.5-flash-image-preview`
- `gemini-3.1-pro-preview`
- `gemini-3-pro-preview`
- `gemini-3-pro-image-preview`
- `imagen-4.0-generate-001`
- `imagen-4.0-fast-generate-001`
- `imagen-4.0-ultra-generate-001`
- `imagen-4.0-generate-preview-06-06`
- `imagen-4.0-fast-generate-preview-06-06`
- `imagen-4.0-ultra-generate-preview-06-06`
- `imagen-3.0-capability-001`
- `veo-3.0-generate-001`
- `veo-3.0-fast-generate-001`
- `veo-3.0-generate-preview`
- `veo-3.0-fast-generate-preview`
- `veo-3.1-generate-001`
- `veo-3.1-generate-preview`
- `veo-3.1-fast-generate-preview`
- `text-embedding-005`
- `text-multilingual-embedding-002`

## Model-to-Pricing-Page Mapping

| Model ID | Publisher / Section | Source | Notes |
|----------|-------------------|--------|-------|
| `gemini-2.5-pro` | Google – Gemini 2.5 | API | $1.25/$10 input/output (≤200K); cache read $0.125 |
| `gemini-2.5-flash` | Google – Gemini 2.5 | API | $0.30/$2.50; cache $0.03; image_token $30/1M |
| `gemini-2.5-flash-lite` | Google – Gemini 2.5 | API | $0.10/$0.40; cache $0.01 |
| `gemini-2.5-flash-image` | Google – Gemini 2.5 | API | Same as gemini-2.5-flash with image output |
| `gemini-2.5-flash-image-preview` | Google – Gemini 2.5 | API | Same as gemini-2.5-flash (preview alias) |
| `gemini-2.5-computer-use-preview-10-2025` | Google – Gemini 2.5 | API | Matched as "Gemini 2.5 Pro Computer Use-Preview"; $1.25/$10, no cache |
| `gemini-2.5-flash-preview-09-2025` | Google – Gemini 2.5 | API | Preview alias of gemini-2.5-flash; same pricing |
| `gemini-2.5-flash-lite-preview-09-2025` | Google – Gemini 2.5 | API | Preview alias of gemini-2.5-flash-lite; same pricing |
| `gemini-2.0-flash-001` | Google – Gemini 2.0 | API | $0.15/$0.60; batch $0.075/$0.30 |
| `gemini-2.0-flash-lite-001` | Google – Gemini 2.0 | API | $0.075/$0.30; batch $0.0375/$0.15 |
| `gemini-3.1-pro-preview` | Google – Gemini 3 | API | $2/$12; cache $0.2; web_search 1.4¢ |
| `gemini-3-pro-preview` | Google – Gemini 3 | API | $2/$12; cache $0.2; web_search 1.4¢ |
| `gemini-3-pro-image-preview` | Google – Gemini 3 | API | $2/$12; image_token $120/1M; web_search 1.4¢ |
| `gemini-3.1-flash-image-preview` | Google – Gemini 3 | API | $0.50/$3; image_token $60/1M; web_search 1.4¢ |
| `gemini-3.1-flash-lite-preview` | Google – Gemini 3 | API | $0.25/$1.50; cache $0.025; web_search 1.4¢ |
| `gemini-3-flash-preview` | Google – Gemini 3 | API | $0.50/$3; cache $0.05; web_search 1.4¢ |
| `imagen-4.0-generate-001` | Google – Imagen | API | Row matched via lookup_variant `imagen-4.0-generate`; $0.04/image |
| `imagen-4.0-fast-generate-001` | Google – Imagen | API | Row matched via `imagen-4.0-fast-generate`; $0.02/image |
| `imagen-4.0-ultra-generate-001` | Google – Imagen | API | Row matched via `imagen-4.0-ultra-generate`; $0.06/image |
| `imagen-4.0-generate-preview-06-06` | Google – Imagen | API | Preview; matched as Imagen 4; $0.04/image |
| `imagen-4.0-fast-generate-preview-06-06` | Google – Imagen | API | Preview; matched as Imagen 4 Fast; $0.02/image |
| `imagen-4.0-ultra-generate-preview-06-06` | Google – Imagen | API | Preview; matched as Imagen 4 Ultra; $0.06/image |
| `imagen-3.0-generate-002` | Google – Imagen | API | Row matched via `imagen-3.0-generate`; $0.04/image |
| `imagen-3.0-capability-001` | Google – Imagen | API – price not found | Editing/VQA feature model; no pricing row |
| `imagen-3.0-capability-002` | Google – Imagen | API – price not found | Editing/VQA feature model; no pricing row |
| `imagen-product-recontext-preview-06-30` | Google – Imagen | API | "Imagen Product Recontext"; $0.12/image |
| `veo-2.0-generate-001` | Google – Veo | API | Row matched via `veo-2.0-generate`; $0.50/sec |
| `veo-3.0-generate-001` | Google – Veo | API | Row matched as Veo 3 (video+audio rate); $0.40/sec |
| `veo-3.0-fast-generate-001` | Google – Veo | API | Row matched as Veo 3 Fast; $0.15/sec |
| `veo-3.0-generate-preview` | Google – Veo | API | Preview alias of Veo 3; $0.40/sec |
| `veo-3.0-fast-generate-preview` | Google – Veo | API | Preview alias of Veo 3 Fast; $0.15/sec |
| `veo-3.1-generate-001` | Google – Veo | API | Row matched as Veo 3.1; $0
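The per-token rates in the mapping table convert to request costs straightforwardly. A sketch using the listed gemini-2.5-pro rates, assuming they are USD per 1M tokens (the standard convention for these tables) and that cached input is billed at the cache-read rate:

```python
# Rates from the mapping table above: gemini-2.5-pro, <=200K context.
PRICING = {"input": 1.25, "output": 10.00, "cache_read": 0.125}

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Cost in USD at the listed $/1M-token rates: fresh input at the
    input rate, cache hits at the cheaper cache-read rate."""
    fresh = input_tokens - cached_tokens
    return (fresh * PRICING["input"]
            + cached_tokens * PRICING["cache_read"]
            + output_tokens * PRICING["output"]) / 1_000_000

# 100K-token prompt (60K served from cache) producing a 2K-token answer:
print(round(request_cost(100_000, 2_000, cached_tokens=60_000), 4))  # 0.0775
```

The 10x cheaper cache-read rate is why the diff tracks cache pricing separately for each updated model.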
Repository Audit Available
Deep analysis of QwenLM/Qwen2 — architecture, costs, security, dependencies & more
Qwen2 uses a tiered pricing model. Visit their website for current pricing details.
Key features include: state-of-the-art performance across a large number of benchmark evaluations, and significantly improved performance in coding and mathematics.
Qwen2 has a public GitHub repository with 26,999 stars.
Based on user reviews and social mentions, the most commonly cited pain point is token cost.
Based on 15 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.