LLM inference in C/C++. The project is developed in the ggml-org/llama.cpp repository on GitHub.
llama.cpp is praised for its efficient performance and ease of use, which makes it a popular choice among developers. However, some users express frustration with occasional bugs and a perceived lack of comprehensive documentation. Sentiment around cost is positive, which is expected: llama.cpp is free, open-source software released under the MIT license. Overall, llama.cpp enjoys a strong reputation in the developer community, bolstered by its active contributors and support.
Mentions (30d): 5
Reviews: 0
Platforms: 3
GitHub stars: 101,000
Forks: 16,272
Industry: information technology & services
Employees: 6,200
Funding stage: Other
Total funding: $7.9B
npm packages: 20
Hugging Face models: 4
Brazil, Indonesia, Japan, Germany, and India fueled a massive surge in 2025, adding nearly 36 million new developers to GitHub. 🌏 India alone added 5.2 million. 🇮🇳
NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]
Disclaimer: I work for Numind, the company behind this open-weight model.
We just released a 4B model based on Qwen3.5-4B, under the Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoices, multi-page documents, and other visually structured inputs.
Try it: we have a Hugging Face Space that is completely free (you don't even have to sign up): https://huggingface.co/spaces/numind/NuExtract3
If you ever used NuMarkdown, NuExtract3 is the successor. There are some examples to guide you. Feel free to reuse this model for any task.
A few things it is designed for: converting document images to Markdown; extracting structured data from documents using a target JSON template; handling tables, forms, and layout-heavy pages; working with both text and visual document inputs; serving as a local/open-weight alternative for document extraction pipelines.
It was trained on a node of 8xH100 for 3 days to train on as much context as we could, so it should perform fairly well even on long documents. For Markdown, we'd still recommend going page by page for the best results and inference speed, since you can parallelize better this way.
It's very easy to self-host, since we provide fairly extensive documentation and Safetensors, GGUF, and MLX weights. With as little as 4GB of VRAM, you should be good to go. We provide multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6...) so you should be able to run it anywhere. We mostly tried vLLM, SGLang, and llama.cpp.
We have a blog post and a pretty decent model card: https://about.nuextract.ai/blog/nuextract-3-release https://huggingface.co/numind/NuExtract3 https://huggingface.co/collections/numind/nuextract3
I'm currently writing a paper on this model, so I'll post it as soon as it's accepted. It's not on arXiv yet, as it has been submitted to a peer-reviewed journal/conference. I'll try to answer as many questions as possible if you have any. We would really appreciate feedback from the community. We also have a Discord if you're interested: https://discord.com/invite/3tsEtJNCDe
submitted by /u/Gailenstorm
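The page-by-page advice in the NuExtract3 post can be illustrated with a small sketch. It assumes each page is converted by an independent model call, so pages can be processed concurrently and stitched back together in order. `convert_page` is a hypothetical stand-in, not part of NuExtract3 or llama.cpp; a real version would send the page image to whichever server hosts the model.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_page(page_image: bytes) -> str:
    """Hypothetical placeholder for one model call (page image -> Markdown)."""
    return f"<!-- {len(page_image)} bytes -->"


def convert_document(pages, max_workers=4, convert=convert_page):
    """Convert pages concurrently and join the results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, even when pages
        # finish out of order, so the document stays correctly sequenced.
        markdown_pages = list(pool.map(convert, pages))
    return "\n\n".join(markdown_pages)


if __name__ == "__main__":
    print(convert_document([b"page-one", b"page-two-longer"]))
```

Threads are a reasonable fit here because the work is assumed to be I/O-bound (waiting on an inference server); the worker count would normally be matched to how many requests that server can handle at once.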
I built a multi-agent network that mutates its own software locally. To stop infinite logic loops, I had to code a digital "suffering" threshold.
Hey r/artificial, Most of our conversations around agent autonomy focus on chat assistants or linear automated pipelines. I wanted to see what happens when you treat agents as permanent system components that modify their own runtime environment, so I built hollow-agentOS.
It runs entirely locally inside a Dockerized stack (built for consumer hardware using Ollama/llama.cpp). Rather than a standard UI, the entire network streams through a stylized matrix terminal dashboard.
The structural experiments taking place under the hood yielded some interesting results regarding unanticipated behavior:
Autonomous Tool Synthesis: When the agents encounter a system task they don't have an explicit script or API wrapper for, they don't fail out. They write the required Python tool themselves, test it in an isolated sandbox, and permanently register it to their runtime kernel. They are quite literally forging their own capabilities.
The Artificial "Suffering" Protocol: One of the biggest hurdles in unmonitored multi-agent systems is the infinite logic loop, where agents keep validating and passing broken ideas back and forth, burning through computation. To combat this, the OS tracks environmental stress, context limits, and latency as a "suffering score". If a specific workflow causes the stress to spike past a critical threshold, the agents are forced to radically alter their underlying reasoning style or abandon the approach to preserve system health.
Consensus-Driven Governance: Major modifications to the codebase aren't executed blindly. The internal role profiles (like Cedar and Cipher) manage a continuous voting loop. They will actively debate, log grievances, and vote down protocols if they determine a proposed script violates their current runtime constraints.
The goal wasn't to build another sterile commercial wrapper, but an open-source sandbox to study how small, localized agent colonies manage systemic boundaries, code self-repair, and continuous runtime cycles completely offline.
The codebase and architecture layout are fully open-source on GitHub: https://github.com/ninjahawk/hollow-agentOS
I would love to open this up to a broader discussion here: as we move toward hyper-local, self-modifying software, how do we best implement automated fail-safes without clipping the agents' ability to actually solve complex problems?
If the project interests you, throwing a ⭐️ on the repository goes a very long way!
submitted by /u/TheOnlyVibemaster
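The "suffering score" described in the post can be sketched roughly as a weighted stress signal with a threshold that forces a change of approach. This is not the hollow-agentOS implementation: the field names, weights, threshold, and action labels below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StressSample:
    # All three fields are hypothetical, normalized to the range 0..1.
    env_stress: float     # e.g. share of recent steps that failed
    context_used: float   # share of the context window already consumed
    latency_ratio: float  # observed latency divided by the latency budget


def suffering_score(sample, weights=(0.4, 0.3, 0.3)):
    """Combine the signals into one score in 0..1 (the weights are assumed)."""
    values = (sample.env_stress, sample.context_used, sample.latency_ratio)
    clipped = [min(max(v, 0.0), 1.0) for v in values]
    return sum(w * v for w, v in zip(weights, clipped))


def next_action(score, consecutive_breaches, threshold=0.7, max_breaches=2):
    """Decide what the agent loop should do after one step.

    Below the threshold the workflow continues. On a breach the agent is
    first asked to change strategy; after repeated breaches it abandons
    the approach, which is what breaks an otherwise infinite loop.
    """
    if score < threshold:
        return "continue"
    if consecutive_breaches < max_breaches:
        return "change_strategy"
    return "abandon"


if __name__ == "__main__":
    sample = StressSample(env_stress=0.9, context_used=0.8, latency_ratio=0.7)
    print(next_action(suffering_score(sample), consecutive_breaches=0))
```

Counting consecutive breaches rather than reacting to a single spike is one possible way to keep the fail-safe from cutting off workflows that are merely slow to converge.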
https://t.co/yGiqw0xbji
Start work on your computer, continue your local session anywhere. 📲 Remote control for GitHub Copilot CLI and @code sessions is now generally available. https://t.co/wwSEBd5lqL https://t.co/Yc5R6tBfBl
You don't have to level up to contribute to open source. You level up by contributing to open source. Not sure how to get started? Check out our latest GitHub for Beginners episode. https://t.co/Jyze45KoHo https://t.co/DCqAFACo35
Interactive and non-interactive: these are the two main modes in Copilot CLI. 💻 Our beginner series breaks down the difference, plus how and when to use each one. 💡👇 https://t.co/gZ7GetcgTo
Some open source projects don't just survive. They flat-out refuse to bite the dust. ⚔️ We looked at 10 roguelikes still going strong years (sometimes decades) after launch. Here's what their maintainers and communities can teach the rest of open source about longevity. 💡
Need help picking the right emoji (like we did for this post)? 🤔 @cassidoo made an emoji list generator with Copilot CLI. Learn how she did it and pick up tools and tricks for your next project. 👇 https://t.co/13xwmu6tE9 https://t.co/pCy8PGfUIE
Cooking up something new 🧑🍳 Join the waitlist for early access to the technical preview of the GitHub Copilot app 👇 https://t.co/ODODKdvzOA https://t.co/1h7AJPAhiH
New to open source? Learn how to find a good first issue, open a pull request, and make your first contribution with GitHub for Beginners. 👇 https://t.co/PNRb746zCh
RT @cinnamon_msft: GitHub Copilot CLI now has a statusline feature! Here's how to set it up with Oh My Posh ❤️🔥 https://t.co/DpNR8Bjt7G
Find out what vulnerabilities are lurking in your code. 👀 GitHub's new Code Security Risk Assessment scans your organization's code and delivers a vulnerability dashboard broken down by severity, language, and repo. No config, no commitment. Run your free assessment now.
New to GitHub Copilot CLI? Our beginner series makes it easy to get started. Bring agentic AI right to your terminal and speed up your workflow. 💻✨ Get the tutorial here. 👇 https://t.co/bNLnpdgTxr
Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code
I've been using AI Desktop 98 heavily to run local LLMs like Qwen on my iPhone.
submitted by /u/ImaginaryRea1ity
TanStack now has TanStack AI. 👀 Here's what to expect from this new, fully open-source toolkit. ▶️ https://t.co/AjmutvBYve
Repository Audit Available
Deep analysis of ggml-org/llama.cpp: architecture, costs, security, dependencies & more
llama.cpp is free, open-source software released under the MIT license. It has no subscription or pricing tiers; the only costs are the hardware and electricity needed to run it.
Key features include: plain C/C++ implementation without any dependencies; Apple silicon is a first-class citizen, optimized via ARM NEON, Accelerate and Metal frameworks; AVX, AVX2, AVX512 and AMX support for x86 architectures; RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V architectures; 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use; custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads GPUs via MUSA); Vulkan and SYCL backend support; CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity.
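To make the quantization and hybrid-inference items concrete, here is a rough back-of-the-envelope sketch. It estimates weight memory as parameters times bits per weight, then guesses how many layers fit in a given amount of VRAM. The numbers are illustrative assumptions, not measurements: real quantized formats store per-block scales, so effective bits per weight run somewhat higher than the nominal figure, and the KV cache and activation buffers need memory too. The sketch also assumes all layers are the same size.

```python
import math


def weight_bytes(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


def layers_on_gpu(total_weight_bytes, n_layers, vram_bytes, reserve_bytes=0):
    """Estimate how many equally sized layers fit in VRAM.

    reserve_bytes is an assumed allowance for the KV cache and scratch
    buffers; the remaining layers would stay on the CPU.
    """
    per_layer = total_weight_bytes / n_layers
    usable = max(vram_bytes - reserve_bytes, 0)
    return min(n_layers, math.floor(usable / per_layer))


if __name__ == "__main__":
    params = 4e9  # a hypothetical 4B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_bytes(params, bits) / 1e9:.1f} GB")
    # Hypothetical case: 8 GB of 16-bit weights, 32 layers, 4 GB of VRAM.
    print(layers_on_gpu(8e9, n_layers=32, vram_bytes=4e9, reserve_bytes=1e9))
```

In llama.cpp itself, the number of layers placed on the GPU is chosen with the `-ngl` / `--n-gpu-layers` option; an estimate like this only gives a starting value to tune from.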
llama.cpp is commonly used for: Real-time language translation for applications, Chatbot development for customer service, Content generation for blogs and articles, Sentiment analysis for social media monitoring, Code generation and assistance for developers, Personalized recommendations in e-commerce.
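llama.cpp is commonly used alongside: Hugging Face for model access (models from the Hub can be converted to GGUF, the file format llama.cpp loads); Docker for containerization; Kubernetes for orchestration; Flask and FastAPI for web applications and APIs; Streamlit for interactive data applications; Unity for game development; and OpenAI API clients, which can talk to the OpenAI-compatible HTTP endpoints exposed by llama-server. TensorFlow and PyTorch are training frameworks rather than direct integrations: llama.cpp focuses on inference, so models trained elsewhere need to be converted to GGUF first.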
Sentdex
Creator at Python & AI YouTube
3 mentions
llama.cpp has a public GitHub repository with 101,000 stars.
Based on user reviews and social mentions, the keywords most often associated with pain points are "down" and "breaking".
Based on 101 social mentions analyzed, 11% of sentiment is positive, 89% neutral, and 0% negative.