Hey team, I've been experimenting with various LLMs like OpenAI's GPT-4 and Anthropic's Claude for generating backend code components. I've noticed something interesting, though not entirely unexpected: these models struggle with maintaining code constraints over time.
In particular, I tasked them with generating database migration scripts which must adhere strictly to our existing schema constraints. Initially, the models perform decently, but as the prompts get more complex, the adherence to rules starts to degrade.
For instance, while GPT-4 handles initial SQL script generation quite well, any iterative enhancement often drifts away from the necessary constraints, introducing things like incorrect data types or missing indexes. It's as if the more it is 'coached', the less it remembers about the constraint rules.
I've been embedding checks in the code generation pipeline using tools like SQLFluff and custom linting scripts, but these are more reactive than proactive: they catch violations only after the code has been generated rather than preventing them.
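For anyone who wants a starting point for the custom-check side, here is a rough sketch of a post-generation schema check. It is not SQLFluff and not the original poster's script: it uses an in-memory SQLite database as a stand-in for the real database, and the table, column, and index names (`users`, `created_at`, `idx_users_email`) are made up for illustration.

```python
import sqlite3

# Hypothetical expectations: table -> {column: declared type}, plus required index names.
EXPECTED_COLUMNS = {"users": {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"}}
REQUIRED_INDEXES = {"users": {"idx_users_email"}}


def check_migration(base_schema_sql: str, migration_sql: str) -> list[str]:
    """Apply a migration to a scratch database and return a list of violations."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(base_schema_sql)
        try:
            conn.executescript(migration_sql)
        except sqlite3.Error as exc:
            return [f"migration failed to apply: {exc}"]

        violations = []
        for table, columns in EXPECTED_COLUMNS.items():
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            rows = conn.execute(f"PRAGMA table_info({table})")
            actual = {row[1]: row[2].upper() for row in rows}
            for name, expected_type in columns.items():
                if name not in actual:
                    violations.append(f"{table}.{name}: column missing")
                elif actual[name] != expected_type:
                    violations.append(
                        f"{table}.{name}: expected {expected_type}, got {actual[name]}"
                    )
        for table, indexes in REQUIRED_INDEXES.items():
            # PRAGMA index_list rows: (seq, name, unique, origin, partial)
            present = {row[1] for row in conn.execute(f"PRAGMA index_list({table})")}
            for index in sorted(indexes - present):
                violations.append(f"{table}: missing index {index}")
        return violations
    finally:
        conn.close()
```

An empty list means the migration applied cleanly and the resulting schema matches the expectations; anything else can be logged or fed back into the next prompt. A production version would run against a scratch copy of the actual database engine, since SQLite's type handling differs from most server databases.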
Has anyone had better luck with different strategies or maybe a different model that maintains these constraints more reliably? Your insights would be much appreciated!
I've faced a similar issue when generating code with LLMs. It seems like these models are great at handling one-off tasks but get tripped up when maintaining consistency over a sequence of operations. To counter this, I've started using a feedback loop in my pipeline. After every generation step, I automatically run unit tests and schema validation scripts to catch any deviations early on. It's not perfect, but it saves a lot of hassle later down the line.
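A feedback loop like this could be sketched roughly as follows. This is only an illustration, not the poster's actual pipeline: `generate` stands in for whatever function calls the model, `validate` for the unit tests or schema checks, and both are hypothetical callables supplied by the caller.

```python
from typing import Callable


def generate_with_feedback(
    generate: Callable[[str], str],
    validate: Callable[[str], list[str]],
    task: str,
    max_attempts: int = 3,
) -> tuple[str, list[str]]:
    """Regenerate until validation passes or attempts run out.

    Returns the last candidate and its remaining violations (empty on success).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    prompt = task
    candidate, violations = "", []
    for _ in range(max_attempts):
        candidate = generate(prompt)
        violations = validate(candidate)
        if not violations:
            break
        # Rebuild the prompt from the original task each time, so the
        # constraints are restated instead of buried under earlier turns.
        failed = "\n".join(f"- {v}" for v in violations)
        prompt = (
            f"{task}\n\n"
            f"The previous attempt failed these checks:\n{failed}\n\n"
            f"Previous attempt:\n{candidate}"
        )
    return candidate, violations
```

Capping the number of attempts keeps a stubborn failure from looping forever; whatever violations are left on the final attempt are returned so a human can take over.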
Have you tried using LangChain for aligning LLM output with your constraints? It provides a framework to chain various prompt templates together, which might help in breaking down complex tasks into manageable parts. Also, LangChain has integrations with memory systems, which could potentially help in maintaining schema constraints more effectively over multiple iterations.
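The underlying idea, restating the schema constraints at every step instead of trusting the model to remember them, can be sketched without any framework. The snippet below is plain Python using only the standard library and does not use LangChain's API; the template wording and the `build_step_prompt` helper are assumptions for illustration.

```python
from string import Template

# Hypothetical per-step template: the constraints are pinned at the top of every prompt.
STEP_PROMPT = Template(
    "Schema constraints (must hold after every change):\n$constraints\n\n"
    "Current migration script:\n$current\n\n"
    "Requested change:\n$change\n"
)


def build_step_prompt(constraints: list[str], current_sql: str, change: str) -> str:
    """Build the prompt for one refinement step with the constraints restated."""
    return STEP_PROMPT.substitute(
        constraints="\n".join(f"- {c}" for c in constraints),
        current=current_sql,
        change=change,
    )
```

Each iteration then sees the full constraint list plus only the latest script, which keeps the prompt short and avoids relying on a long conversation history.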
Have you tried leveraging LangChain or similar orchestration tools? I've used it to create a multi-step process for code generation, where it first generates a complete proposal, then refines it while applying a series of checks tailored to our schema. It's been more reliable than asking the LLM to consider everything at once. Would love to hear if anyone else has done something similar!
I've also encountered similar issues with GPT-4. What worked for me was creating a feedback loop where I manually review and correct a few iterations of the generated code, then include the corrected output as examples in the next prompt. It acts as a form of 'training data', although it is really few-shot prompting and the model itself is not retrained. It’s not perfect, but it tends to guide the model better over time.
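One way to carry corrected output into the next prompt is to prepend the reviewed examples as request/answer pairs. This is a minimal sketch under that assumption; the `Request:` / `Migration:` labels and the `build_few_shot_prompt` helper are hypothetical, not a fixed format.

```python
def build_few_shot_prompt(
    corrected_examples: list[tuple[str, str]], new_request: str
) -> str:
    """Prepend reviewed (request, corrected SQL) pairs to a new request."""
    parts = [
        f"Request: {request}\nMigration:\n{corrected_sql}"
        for request, corrected_sql in corrected_examples
    ]
    # Leave the final "Migration:" open so the model completes it.
    parts.append(f"Request: {new_request}\nMigration:")
    return "\n\n".join(parts)
```

Keeping only a handful of the most recent or most relevant corrected examples helps stay within the context window.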
I completely agree with your observations about GPT-4. I've faced similar issues where ongoing iterations lead to more errors. I've had some success with a reinforcement-learning-style approach: a feedback loop that continually fine-tunes the model on the specific errors it introduces. This way, the LLM sort of 'learns' to maintain constraints over time, but it's definitely not foolproof or easy!
Have you tried using Copilot Labs' Experimental Code Brushes? I've found that they can sometimes be more effective, especially if you’re working within a specific IDE. They're more of a complementary tool, but they might help refine the initial output before you get too deep into constraint-specific modifications.