12 alternatives to ExLlamaV2 in the infrastructure category
ExLlamaV2
A fast inference library for running LLMs locally on modern consumer-class GPUs - turboderp-org/exllamav2
Inference performance drives profitability.
Make employees, applications, and networks faster and more secure everywhere, while reducing complexity and cost.
Train, deploy, observe, and evaluate LLMs from a single platform. Lower cost, faster latency, and dedicated support from Inference.net.
Bring your own code, and run CPU, GPU, and data-intensive compute at scale. The serverless platform for AI and data teams.
Serve and scale open-source and custom AI models on the fastest, most reliable inference platform.
An open source framework and developer platform for building, testing, deploying, scaling, and observing agents in production.