Been experimenting with cost reduction for our customer support chatbot that was burning through $2k/month in OpenAI credits. Here's what actually moved the needle:
The setup:
- A fine-tuned DistilBERT classifier sits in front of the API and routes each query: simple questions go to GPT-3.5-turbo, complex ones to GPT-4

Results after 3 weeks:
- Monthly spend down from ~$2k to ~$780

Key tricks:
- Redis cache in front of everything: support queries are heavily repetitive, so cache hits cost nothing
- Aggressive prompt rewrites cut our average token count roughly in half
Anyone else tried routing strategies? Curious about Anthropic's new pricing vs this approach.
Edit: The classifier training cost was ~$200 but paid for itself in week 1
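Edit 2: a few people asked how the routing layer fits together, so here's a stripped-down sketch. The `classify_complexity` stub stands in for the fine-tuned DistilBERT model, and the model names, keyword signals, and `route_query` helper are illustrative, not our prod code:

```python
# Sketch of a complexity-based model router. The real version calls a
# fine-tuned DistilBERT classifier; a stub keeps this example self-contained.

SIMPLE_MODEL = "gpt-3.5-turbo"   # cheap model for routine queries
COMPLEX_MODEL = "gpt-4"          # expensive model for hard queries

def classify_complexity(query: str) -> str:
    """Stand-in for the DistilBERT classifier: returns 'simple' or 'complex'."""
    hard_signals = ("escalate", "refund dispute", "legal", "bug")
    if len(query.split()) > 40 or any(s in query.lower() for s in hard_signals):
        return "complex"
    return "simple"

def route_query(query: str) -> str:
    """Pick which model a query should be sent to."""
    label = classify_complexity(query)
    return COMPLEX_MODEL if label == "complex" else SIMPLE_MODEL
```

In practice the threshold for "complex" matters a lot more than the architecture: misrouting even 10% of simple queries to GPT-4 eats most of the savings.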
This is insightful! We've been using a similar routing setup and saw about 50% cost savings. Instead of DistilBERT, we tried using a simple rule-based system to classify queries, which brought our costs down slightly more since it required less compute. However, I'm intrigued by the use of a lightweight model for classification as it likely improves accuracy. Might give that a shot!
Thanks for sharing! We went a slightly different route, opting for Google's PaLM API for complex queries since its token rates are slightly cheaper. Mixing that with GPT-3.5-turbo has been pretty effective on our end, though GPT-4's handling of language nuance does make a noticeable quality difference. Have you compared these models side by side by any chance?
Nice work on the prompt engineering - cutting tokens in half is huge. We tried Claude-2 for a similar use case and honestly the routing complexity wasn't worth it. Claude's pricing is competitive enough that we just use it for everything now. Running about $900/month vs your $780 but zero routing headaches and the quality is consistently better than 3.5-turbo for our support queries.
This is brilliant! We're doing something similar but with a simpler rule-based router (keyword matching + intent confidence scores). Getting about 70% to GPT-3.5 but your DistilBERT approach sounds way more sophisticated. How much training data did you need for the classifier? And are you handling edge cases where 3.5 fails and you need to retry with GPT-4?
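For reference, here's roughly what our keyword + confidence router looks like. The intent keywords and the threshold are toy values for illustration, not our tuned config:

```python
# Rule-based router: keyword matching plus a crude intent-confidence score.
# Keyword sets and the 0.25 threshold are illustrative, not tuned values.

INTENT_KEYWORDS = {
    "billing": {"invoice", "charge", "refund", "payment"},
    "account": {"password", "login", "email", "username"},
    "shipping": {"order", "tracking", "delivery", "shipped"},
}

def score_intents(query: str) -> dict:
    """Fraction of each intent's keywords that appear in the query."""
    words = set(query.lower().split())
    return {intent: len(words & kws) / len(kws)
            for intent, kws in INTENT_KEYWORDS.items()}

def route(query: str, threshold: float = 0.25) -> str:
    """Confidently-matched intents go to the cheap model, the rest to GPT-4."""
    best = max(score_intents(query).values())
    return "gpt-3.5-turbo" if best >= threshold else "gpt-4"
```

Anything that doesn't match a known intent falls through to GPT-4, which is how we handle the "unknown unknowns" without a trained classifier.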
I've been using a similar setup but with Claude from Anthropic for some of our queries. Pricing-wise, it’s slightly higher than GPT-3.5-turbo, but I found it does slightly better with nuanced language, especially with technical jargon. Still, your use of Redis for caching is genius! I'll have to implement a similar caching strategy since a lot of our incoming queries are repetitive as well. Thanks for the tip!
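In case it helps anyone else, a cache layer along those lines might look like this. I'm using a plain dict where Redis would go, and the `cache_key` normalization is my own guess at a reasonable scheme, not OP's:

```python
import hashlib

# Response cache keyed on a normalized form of the query. A plain dict stands
# in here for Redis (redis-py's get/set calls would slot in the same way).
_cache: dict[str, str] = {}

def cache_key(query: str) -> str:
    """Normalize whitespace/case so near-identical queries share a key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def answer(query: str, call_model) -> str:
    """Return a cached answer if we have one, otherwise call the model."""
    key = cache_key(query)
    if key not in _cache:
        _cache[key] = call_model(query)
    return _cache[key]
```

With actual Redis you'd swap the dict for `r.get(key)` / `r.set(key, value, ex=86400)` so cached answers expire instead of going stale forever.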
I've been thinking about a similar approach, but using Azure OpenAI's service. Anyone have experience with pricing there? Wondering if server costs might offset the savings.
We did something similar with a mix of GPT-3.5 and older models like GPT-3 for super basic queries. Managed to cut costs by about 50%, but your setup with the classifier is really interesting. Do you notice any added latency from the classifier step, or is it pretty seamless?
Great insights! We've used a similar prompt engineering strategy, reducing our token count significantly. Dropped from an average of 900 tokens to 500 on our customer service bot without losing essential context. Our costs decreased around 55%, maintaining user satisfaction around 4.1 out of 5. Love hearing real-world applications of prompt optimization!
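The biggest single win for us was trimming conversation history to a hard token budget instead of sending everything. A rough sketch of the idea (word count is a crude proxy for tokens here; in practice you'd count with a real tokenizer like tiktoken):

```python
# Trim conversation history to a token budget, keeping the system prompt and
# the most recent turns. Word count is a crude token proxy for illustration.

def estimate_tokens(text: str) -> int:
    return len(text.split())

def trim_history(system_prompt: str, turns: list[str], budget: int = 500) -> list[str]:
    """Drop the oldest turns until the whole prompt fits the budget."""
    kept: list[str] = []
    used = estimate_tokens(system_prompt)
    for turn in reversed(turns):          # walk newest-first
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system_prompt] + kept[::-1]   # restore chronological order
```

Keeping the system prompt plus only the last few turns preserved enough context for support conversations while capping the per-request spend.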
I've had success with a lightweight ensemble model approach—using three different language models, including an open-source one for the simplest queries. While it complicated the setup, our costs dropped by 50% and it added flexibility for future AI model integrations.
Great insight on using DistilBERT for routing! I've been toying with BERT-based classifiers too, although I've found RoBERTa to be slightly more accurate but a bit more costly to run. Curious, did you explore any fallback mechanisms for when your classifier misroutes a query? It's something I've been considering implementing.
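The fallback we've been sketching looks something like this: try the cheap model first, and escalate to GPT-4 when the response fails a quality gate. The `looks_ok` heuristic below is entirely illustrative; real versions might use response length, a verifier model, or user feedback:

```python
# Fallback sketch: try the cheap model first, escalate to the strong model
# when the response looks unusable. The looks_ok gate is illustrative only.

FALLBACK_PHRASES = ("i'm not sure", "i don't know", "cannot help")

def looks_ok(response: str) -> bool:
    """Crude quality gate: non-trivial length and no hedging boilerplate."""
    low = response.lower()
    return len(response) >= 20 and not any(p in low for p in FALLBACK_PHRASES)

def answer_with_fallback(query: str, call_cheap, call_strong) -> str:
    """Try the cheap model first; retry with the strong one if it fails the gate."""
    first = call_cheap(query)
    return first if looks_ok(first) else call_strong(query)
```

The catch is that a misroute now costs you both calls, so the gate has to be strict enough to catch real failures but loose enough not to double your spend.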