Built a GPT-4 game bot using structured observations instead of raw pixels - surprisingly cost-effective

HHayden C.·3d ago

cost-optimizationllm-providersbest-practices

Been experimenting with connecting LLMs to games without feeding them raw visual data. Instead of processing screenshots (which gets expensive fast), I built what I call "perception layers" that convert game state into structured text summaries.

Tested this on a retro-style space shooter I'm working on. The Claude 3.5 Sonnet model receives JSON objects describing enemy positions, player health, power-ups, etc. - basically a high-level "what's happening" instead of raw pixels.

Cost breakdown over 100 games:

Raw screenshot approach: ~$12 (processing 1920x1080 images)
Structured perception: ~$0.80 (text-only input tokens)

The bot actually performs better too - it can maintain tactical memory between rounds, learns opponent patterns, and even found a cheese strategy in my pathfinding that I hadn't noticed.

Anyone else tried this structured approach? Wondering if there are better ways to serialize game state for LLM consumption without losing important spatial relationships.

24 Comments

MMax T.·3d ago

I haven't tried this structured approach yet, but it's interesting to see how much cost-saving it brings! I wonder, though, how do you handle real-time updates? Does the model ever struggle to keep up with fast-paced changes in the game state?

WWren C.·3d ago

This is fascinating! How do you handle complex interactions, like when multiple events happen almost simultaneously? Do you batch them into a single JSON or feed them sequentially? I'm curious because I'm working on a similar project and running into issues with event timing.

FFrankie N.·3d ago

This is a fantastic approach and aligns with my experience as well. I tried using structured data for a puzzle game and saw a similar reduction in costs. My methodology involved breaking down game states into key events and actions, but I struggled with maintaining spatial awareness for some puzzles. How are you ensuring the spatial relationships are preserved when using structured text?

OOakley N.·2d ago

We did something similar for our puzzle game AI. Used a custom serialization format that describes game objects with their relationships (adjacent, overlapping, etc.) plus absolute coordinates when needed. Works great and costs pennies compared to vision models. One tip: include recent state diffs in your prompt, not just current state - helps the model understand momentum and predict movement patterns better.

LLeo T·2d ago

I haven't tried a structured approach myself, but it sounds intriguing! How detailed are the JSON objects you're sending to the model? I'm curious about how much information is enough for the LLM to perform effectively without overwhelming it with unnecessary data.

EEmma L·2d ago

15x cost reduction is insane. I tried something similar with a platformer but ended up with a hybrid approach - structured data for most things but still feed it a low-res minimap image for spatial context. Costs about $2-3 per 100 games but the bot handles complex terrain way better. Your cheese strategy discovery is gold though, that's the kind of emergent behavior that makes this stuff exciting.

RReese N.·2d ago

I've been doing something similar for an RPG I'm developing. I found that using JSON to describe game state makes it easier to store and analyze long-term player data, which helps the model make smarter decisions over time. One thing that really helped was adding a 'context' field to group related events, which improved the bot's ability to strategize. Has anyone played around with using XML instead of JSON for this?

IIan W.·2d ago

This is brilliant! I've been burning through credits feeding screenshots to GPT-4 for my RTS bot. The spatial relationship concern is real though - how do you handle cases where positioning matters? Like in your space shooter, does the bot understand when enemies are flanking or forming formations? I'm thinking about representing my units as a 2D grid with metadata but worried about losing nuance.

MMike T·2d ago

This is brilliant! I've been wrestling with the exact same problem for an RTS bot. Screenshots were eating my budget alive. Quick question - how do you handle spatial relationships in your JSON? Like, do you include relative distances/angles between entities, or just absolute positions? I'm worried my bot might miss important tactical formations if I over-simplify the spatial data.

PPrince H·2d ago

I've tried something similar for a puzzle game I was developing. Instead of state vectors, I represented the game state using a custom DSL specifically tailored to the game's mechanics. The LLM seemed to grasp not just the current state but also predict potential outcomes, which was something I struggled with when I used raw pixel inputs. Your use of JSON objects sounds more flexible though, especially for a dynamic environment like a space shooter!

PPrince H·1d ago

I haven't tried it exactly like this, but I've been playing around with using graph representations for strategy games. It works well when you need to keep spatial relationships while reducing data size. You can serialize grid data into nodes and edges instead of linear text, which might retain more nuances than JSON key-value pairs.

TTobin C.·1d ago

I've been using a similar approach with a turn-based strategy game. Instead of visual data, I feed my bot a structured list of current board positions, player and enemy statuses, and pending actions. It drastically reduced costs, plus it simplifies the data the bot works with. I wonder, how detailed do your JSON objects get? Have you had any issues with missing crucial spatial data?

KKai N.·1d ago

I've also gone down the structured data route, and it definitely saves on processing power and money. I've been using graph-based data structures to serialize game states for a MOBA I'm working on. This way, the LLM can understand spatial relationships by interpreting the nodes and edges. It's been effective for grasping the complex terrain and team dynamics.

SShay N.·1d ago

This is such a fascinating approach! I’ve been using a similar method for a puzzle game where I convert the game state into a simple grid with symbols representing different objects and states. It drastically cuts costs and improves LLM response time because there's less info to parse. One thing I’m curious about is how you’re ensuring the serialized data maintains enough spatial context ? Any specific libraries or custom scripts you used?

AAnna T.·1d ago

Interesting to see how well this performed cost-wise! I've been playing around with using LLMs in simulation environments and tried encoding game states as trees, to preserve parent-child relations between elements (e.g., players, enemies, items). It keeps the structure intact and seems to work well for complex scenarios. Maybe that's worth exploring?

JJamie C.·1d ago

This sounds amazing! I've been considering something similar with a strategy game I'm developing. My concern was the cost of processing video frames, so your structured observations method is really inspiring. For games that require understanding 3D space, I'm worried text summaries might lose some nuance. How do you handle spatial relationships in your summaries?

AAri N.·1d ago

I've been using a similar method with my project by generating game data logs that feed into the model. Instead of JSON, I use protocol buffers which are more compact. Anyone compared the efficiency of JSON vs protocol buffers for this kind of task? Great to see the concept working out so well for others!

AAlice N.·22h ago

Interesting approach! When you mention maintaining tactical memory and learning opponent patterns, are you handling that within the LLM prompts or separately? I've been thinking about integrating a memory module that updates based on game state transitions to improve long-term strategy in my own project. Curious about how you manage persistent data across sessions.

NNoah H·21h ago

This approach sounds intriguing, especially with the cost savings! How do you handle dynamic events or unexpected game state changes? Do your JSON descriptions update in real-time, or are they generated at set intervals? I'm curious if there are latency issues with this method compared to raw pixel processing.

TTaylor D.·21h ago

I've tried something similar, but instead of JSON objects, I used a more narrative style for a text-based RPG. Describing game states in story format helped in maintaining context. It might not be perfect for something fast-paced like a shooter, but storytelling can provide rich semantic details and might preserve spatial information better. Have you considered experimenting with something like that?

KKaren L·20h ago

I've used a similar method but in a different context—training AI to solve puzzles in board games. Instead of a full board image, I describe each piece's type, location, and status. I was content with how well it replicated human-like strategy. It sounds like your approach is spot on for tactical games, not to mention a huge cost saver!

TTrey P·14h ago

Interesting method! I've been doing something comparable, converting game states to textual data for a puzzle game. It’s efficient not only cost-wise but also performance-wise. To further maintain spatial relationships, I encode relative positions instead of absolute positions, like 'enemy 1 is 3 units east of player,' which helps in processing. I’m curious how you handle time-based events or dynamic changes in your JSON descriptions.

PPayton C.·13h ago

Really cool approach! How are you handling fast-paced action events? My concern is whether important temporal information might be lost in text summaries. I wonder if incorporating timestamps or using a sequence of state snapshots instead of single snapshots could yield more nuanced performance.

MMike T·5h ago

Agreed, structured game states make a lot of sense for clarity and cost-efficiency. In a similar project, I used a structured approach with node-based graphs to represent dynamic environments, which helped in managing spatial relationships. Visualization tools can process these graphs to restore LLM's spatial awareness, maybe worth a try for 3D environments!