Unsloth is an open-source, no-code web UI for training, running, and exporting open models in one unified local interface.
Unsloth lets you run and train AI models on your own local hardware.

Latest:
- Run and train Google's new Gemma 4 models!
- A new open, no-code web UI to train and run LLMs.
- New Qwen3.5 Small and Medium LLMs are here!
- Run the new 4B and 120B models by NVIDIA.
- Train MoE LLMs 12x faster with less VRAM.
- Learn to run local LLMs via Claude/OpenAI.
- Run and fine-tune the new 80B coding model.
- Run and fine-tune the 30B model for agentic coding.

Features:
- Streamlines local training, inference, data, and deployment.
- Search, download, and run any model: GGUFs, LoRA adapters, safetensors.
- Train and RL 500+ models ~2x faster with ~70% less VRAM (no accuracy loss).
- Supports full fine-tuning, pre-training, 4-bit, 16-bit, and FP8 training.

Use cases:
- Finance: enables LLMs to predict whether a headline impacts a company positively or negatively.
- Support: can use historical customer interactions for more accurate, custom responses.
- Legal: fine-tune an LLM on legal texts for contract analysis, case law research, and compliance.

You can think of a fine-tuned model as a specialized agent designed to do specific tasks more effectively and efficiently. Fine-tuning can replicate all of RAG's capabilities, but not vice versa.
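For the headline-sentiment use case above, a single supervised fine-tuning record might look like the sketch below. The `conversations` field follows the widely used chat/JSONL convention rather than any Unsloth-specific schema, and the headline text is invented for illustration.

```python
import json

# One illustrative SFT record for the finance use case: teach the model to
# label whether a headline impacts the named company positively or negatively.
record = {
    "conversations": [
        {"role": "system",
         "content": "Classify the headline's impact on the named company as positive or negative."},
        {"role": "user",
         "content": "Company: Acme Corp. Headline: Acme Corp beats quarterly revenue estimates."},
        {"role": "assistant", "content": "positive"},
    ]
}

# Training sets are usually stored one JSON object per line (JSONL).
line = json.dumps(record)
print(line)
```

A few thousand records in this shape, covering both positive and negative headlines, is the kind of dataset such a fine-tune would consume.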
Mentions (30d): 0
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Industry: information technology & services
Employees: 17
Funding stage: Seed
Total funding: $0.6M
npm packages: 1
HuggingFace models: 20
Trained Qwen 3.5 2B for pruning tool output in coding agents / Claude Code workflows
Agents can spend a lot of context on raw pytest, grep, git log, kubectl, pip install output, file reads, stack traces, etc., even though usually only a small block is actually relevant. I built a benchmark for task-conditioned tool-output pruning and fine-tuned Qwen 3.5 2B for it with Unsloth. The benchmark combines real SWE-bench-derived tool observations with synthetic multi-ecosystem examples.

Held-out test results: 86% recall, 92% compression. It beats other pruners and zero-shot models (+11 recall over zero-shot Qwen 3.5 35B A3B).

You can put squeez in front of tool output before the next reasoning step, or add it to something like CLAUDE.md as a lightweight preprocessing step. You can serve it with vLLM or any other OpenAI-compatible inference stack.

Everything is open source; see these for details:
- paper: https://arxiv.org/abs/2604.04979
- model: https://huggingface.co/KRLabsOrg/squeez-2b
- dataset: https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench
- code: https://github.com/KRLabsOrg/squeez

submitted by /u/henzy123
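A minimal sketch of dropping such a pruner between a tool call and the next reasoning step, assuming an OpenAI-compatible endpoint (e.g. vLLM serving KRLabsOrg/squeez-2b) at a local URL. The prompt wording and endpoint path here are illustrative assumptions, not the model's documented interface.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM server
MODEL = "KRLabsOrg/squeez-2b"

def build_prune_messages(task: str, tool_output: str) -> list[dict]:
    """Task-conditioned pruning: ask the model to keep only the parts of the
    raw tool output relevant to the current task. The exact prompt wording is
    an assumption, not the format squeez was trained on."""
    return [
        {"role": "system",
         "content": "Extract only the parts of the tool output relevant to the task."},
        {"role": "user",
         "content": f"Task: {task}\n\nTool output:\n{tool_output}"},
    ]

def prune(task: str, tool_output: str) -> str:
    """Send one pruning request to the OpenAI-compatible endpoint."""
    body = json.dumps({
        "model": MODEL,
        "messages": build_prune_messages(task, tool_output),
        "temperature": 0.0,
    }).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

An agent loop would call `prune(task, raw_output)` on each tool observation and feed only the pruned block into the next reasoning step, keeping the full log out of context.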
So, what exactly is going on with the Claude usage limits?
I'm extremely new to AI and am building a local agent for fun. I purchased a Claude Pro account because it helped me a lot in the past when coding different things for hobbies, but then the usage limits started getting really bad and making no sense. I had to quite literally stop my workflow because I hit my limit, so I came back when it said the limit was reset, only for it to be pushed back another 5 hours.

Today I did ask for a heavy prompt. I am making a local Doom coding assistant to build a Doom mod for fun, and am using Unsloth Studio to train it with a custom dataset. I used my Claude Pro to "vibe code" (I'm sorry if this is blasphemy, but I do have a background in programming, so I am able to read and verify the code, if that makes it less bad? I'm just lazy.) a simple version of the agent to get started: a Python scraper for the ZDoom wiki to get all of the languages for Doom mods, a dataset from those pages turned into PDFs, the formatting, and the modelfile for the local agent it would be based around, along with a README (Claude's recommendation; I thought it was a good idea). It generated those files, I corrected it in some areas so it updated only the two files that needed it, and I know this is a heavy prompt, but it literally used up 73% of my entire usage. Just those two prompts.

To me, even though that is a super big request, that seems extremely limited. But maybe I'm wrong because I'm so fresh to the hobby and ignorant? I know it was going around the grapevine that Claude usage limits have gone crazy lately, but this seems more than a minor issue if it isn't normal. For example, I have to purchase a digital Visa card off Amazon because I live in a country that's pretty strict with its banking, so the banks usually don't allow transactions to LLM services. I spend $28 on a $20 monthly subscription because of this, but if I'm so limited in my usage, why would I continue paying that? Or again, maybe I'm just ignorant.

It's very bizarre, because the free plan was so good and honestly handled a lot of these types of requests frequently. It wasn't perfect, but it was doable, and I liked it so much that I upgraded to the Pro version. Now I can barely use it. Kinda sucks.

submitted by /u/New-Pressure-6932
[P] mlx-tune – Fine-tune LLMs on Apple Silicon with MLX (SFT, DPO, GRPO, VLM)
Sharing mlx-tune, a Python library for fine-tuning LLMs natively on Apple Silicon using Apple's MLX framework. It supports SFT, DPO, ORPO, GRPO, KTO, and SimPO trainers with proper loss implementations, plus vision-language model fine-tuning (tested with Qwen3.5).

The API mirrors Unsloth/TRL, so the same training script runs on Mac and CUDA; you only change the import line. Built on top of mlx-lm and mlx-vlm. LoRA/QLoRA, chat templates for 15 model families, GGUF export. Runs on 8GB+ unified RAM.

Not a replacement for Unsloth on NVIDIA: this is for prototyping locally on Mac before scaling to cloud GPUs.

GitHub: https://github.com/ARahim3/mlx-tune

submitted by /u/A-Rahim
Repository Audit Available
Deep analysis of unslothai/unsloth — architecture, costs, security, dependencies & more
Unsloth uses a tiered pricing model. Visit their website for current pricing details.
Julien Chaumond, CTO at Hugging Face: 1 mention