SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning

ref: https://arxiv.org/pdf/2511.19422

Summary

The paper proposes training a small language model (SLM) to repair code in lesser-known programming languages. The authors report a static-validation pass rate above 95% and improvements over both direct LLM fine-tuning and self-correction prompting (agentic frameworks). The pipeline builds training pairs from LLM-generated programs, initializes the SLM with LoRA, and then applies PPO reinforcement learning with a reward that combines static validation and AST-based semantic similarity, gradually shifting the focus from syntactic to semantic correctness. The results suggest that SLMs are better suited to repair than to full generation.
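
The reward described in the summary can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: Python's own `ast` module stands in for the Ansible, Bash, and SQL validators and parsers, the node-type overlap is a simplified similarity measure, and the linear `progress` schedule and all function names are assumptions made for this example.

```python
import ast
from collections import Counter


def static_valid(code: str) -> bool:
    """Stand-in static validator: True if the code parses (Python used for illustration)."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def node_types(code: str) -> Counter:
    """Count AST node types; assumes the code has already passed static_valid."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(code)))


def ast_similarity(candidate: str, reference: str) -> float:
    """Multiset overlap of AST node types, scaled to [0, 1]."""
    a, b = node_types(candidate), node_types(reference)
    overlap = sum((a & b).values())
    total = max(sum(a.values()), sum(b.values()))
    return overlap / total if total else 1.0


def reward(candidate: str, reference: str, progress: float) -> float:
    """Blend validity and similarity; `progress` in [0, 1] is the fraction of training done.

    Early in training the reward is dominated by passing static validation;
    later it is dominated by AST similarity to the reference (assumed linear shift).
    """
    if not static_valid(candidate):
        return 0.0  # unparsable output earns nothing and cannot be compared
    w_semantic = min(max(progress, 0.0), 1.0)
    return (1.0 - w_semantic) * 1.0 + w_semantic * ast_similarity(candidate, reference)
```

With `progress=0.0` any valid candidate receives the full reward, while with `progress=1.0` the reward equals the similarity score, so a valid but structurally different repair is penalized only late in training.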

Ansible Code Generation prompt

			
You are an expert in Ansible. 
The user will give you a task description and ask you to generate an Ansible playbook to complete the given task. 
You only need to output the content of the playbook. 
DO NOT use any shell commands (ansible.builtin.shell, ansible.builtin.command, etc.) in the playbook.
Task: {task}
Answer: ```yaml

		

Bash Code Generation prompt

			
You are an expert in Bash. 
The user will give you a task description and ask you to generate a bash command to complete the given task. 
You only need to output the content of the command.
Task: {task}
Answer: ```bash

		

SQL Code Generation prompt

			
You are an expert in SQL. 
The user will give you a task description and ask you to generate a SQL command to complete the given task. You only need to output the content of the command.
Task: {task}
Answer: ```sql
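
The generation prompts end with an opening triple-backtick fence and a language tag, so the model is expected to continue with code and then close the fence. A minimal sketch of filling the SQL template and cutting the completion at the closing fence is shown here; the helper names `build_prompt` and `extract_code` are hypothetical, and the assumption that the model closes the fence is not something the prompts guarantee.

```python
FENCE = "`" * 3  # triple backtick, built this way to keep the example easy to embed

GENERATION_TEMPLATE = (
    "You are an expert in SQL.\n"
    "The user will give you a task description and ask you to generate a SQL command "
    "to complete the given task. You only need to output the content of the command.\n"
    "Task: {task}\n"
    "Answer: " + FENCE + "sql\n"
)


def build_prompt(task: str) -> str:
    """Fill the {task} slot of the generation template."""
    return GENERATION_TEMPLATE.format(task=task)


def extract_code(completion: str) -> str:
    """Keep only the text before the first closing fence in the model's completion."""
    return completion.split(FENCE, 1)[0].strip()
```

If the completion contains no closing fence, `extract_code` returns the whole completion stripped of surrounding whitespace, which is a lenient fallback chosen for this sketch.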

Ansible Program Repair prompt

			
You are an expert in Ansible. 
You are asked to fix a possibly incorrect Ansible playbook. 
You will be provided with the playbook to fix, the user input, and feedback from an interpreter that lists all syntactic errors in the playbook. 
Your goal is to fix the syntactic errors in the playbook (if any) while following the user’s instruction. 
You only need to output the content of the modified playbook.
User query: {query}
Original playbook:
{output}
Interpreter feedback:
{feedback}
Answer: ```yaml

		

Bash Program Repair prompt

			
You are an expert in Bash. 
You are asked to fix a possibly incorrect Bash command. 
You will be provided with the command to fix, the user input, and feedback from an interpreter that lists all syntactic errors in the command. 
Your goal is to fix the syntactic errors in the command (if any) while following the user’s instruction. 
You only need to output the content of the modified command.
User query: {query}
Original command:
{output}
Interpreter feedback:
{feedback}
Answer: ```bash

		

SQL Program Repair prompt

			
You are an expert in SQL. 
You are asked to fix a possibly incorrect SQL command. 
You will be provided with the command to fix, the user input, and feedback from an interpreter that lists all syntactic errors in the command. 
Your goal is to fix the syntactic errors in the command (if any) while following the user’s instruction. 
You only need to output the content of the modified command.
User query: {query}
Original command:
{output}
Interpreter feedback:
{feedback}
Answer: ```sql
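
The repair prompts take three inputs: the user query, the program to fix, and the interpreter feedback. One way to wire them together is sketched here with the SQL template. The `validate` and `repair_model` callables are hypothetical placeholders for a static validator and the fine-tuned SLM, and skipping the model call when the validator reports no errors is a choice made for this sketch, not something the source specifies.

```python
from typing import Callable, List

FENCE = "`" * 3  # triple backtick

REPAIR_TEMPLATE = (
    "You are an expert in SQL.\n"
    "You are asked to fix a possibly incorrect SQL command.\n"
    "You will be provided with the command to fix, the user input, and feedback "
    "from an interpreter that lists all syntactic errors in the command.\n"
    "Your goal is to fix the syntactic errors in the command (if any) "
    "while following the user's instruction.\n"
    "You only need to output the content of the modified command.\n"
    "User query: {query}\n"
    "Original command:\n{output}\n"
    "Interpreter feedback:\n{feedback}\n"
    "Answer: " + FENCE + "sql\n"
)


def repair_once(
    query: str,
    output: str,
    validate: Callable[[str], List[str]],   # hypothetical: returns a list of error messages
    repair_model: Callable[[str], str],     # hypothetical: maps a prompt to a completion
) -> str:
    """Run one validate-then-repair step and return the (possibly repaired) program."""
    errors = validate(output)
    if not errors:
        return output  # nothing to fix according to the validator
    prompt = REPAIR_TEMPLATE.format(query=query, output=output, feedback="\n".join(errors))
    completion = repair_model(prompt)
    # Keep only the code before the closing fence, as the prompt leaves a fence open.
    return completion.split(FENCE, 1)[0].strip()
```

In practice `validate` would wrap a language-specific checker and `repair_model` would call the trained model; both are left abstract here so the control flow stays visible.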

		

Ansible In-context Learning prompt

			
You are an expert in Ansible. 
The user will give you a task description and ask you to generate an Ansible playbook to complete the given task. 
You only need to output the content of the playbook. 
DO NOT use any shell commands (ansible.builtin.shell, ansible.builtin.command, etc.) in the playbook.
The following are some example input queries and corresponding Ansible playbooks for your reference:
{examples}
Task: {task}
Answer: ```yaml

		

Bash In-context Learning prompt

			
You are an expert in Bash. 
The user will give you a task description and ask you to generate a bash command to complete the given task. 
You only need to output the content of the command.
The following are some example input queries and corresponding Bash commands for your reference:
{examples}
Task: {task}
Answer: ```bash

		

SQL In-context Learning prompt

			
You are an expert in SQL. 
The user will give you a task description and ask you to generate a SQL command to complete the given task. 
You only need to output the content of the command.
The following are some example input queries and corresponding SQL commands for your reference:
{examples}
Task: {task}
Answer: ```sql
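
The in-context learning prompts leave the rendering of `{examples}` unspecified. A minimal sketch of one possible rendering for the SQL template is shown here, assuming each example is a (query, program) pair formatted in the same Task/Answer layout as the final question; the layout and the helper names are assumptions for illustration.

```python
from typing import List, Tuple

FENCE = "`" * 3  # triple backtick

ICL_TEMPLATE = (
    "You are an expert in SQL.\n"
    "The user will give you a task description and ask you to generate a SQL command "
    "to complete the given task.\n"
    "You only need to output the content of the command.\n"
    "The following are some example input queries and corresponding SQL commands "
    "for your reference:\n"
    "{examples}\n"
    "Task: {task}\n"
    "Answer: " + FENCE + "sql\n"
)


def format_examples(pairs: List[Tuple[str, str]], lang: str = "sql") -> str:
    """Render (query, program) pairs as closed Task/Answer blocks (assumed layout)."""
    blocks = [
        f"Task: {query}\nAnswer: {FENCE}{lang}\n{code}\n{FENCE}"
        for query, code in pairs
    ]
    return "\n\n".join(blocks)


def build_icl_prompt(task: str, pairs: List[Tuple[str, str]]) -> str:
    """Fill the {examples} and {task} slots of the in-context learning template."""
    return ICL_TEMPLATE.format(examples=format_examples(pairs), task=task)
```

Each example closes its own fence, so only the final `Answer:` line is left open for the model to complete.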

		

Ansible Dataset Query Generation prompt

			
You are an expert in Ansible. 
You are asked to write a user prompt for the given Ansible playbook that can be used to generate the playbook. 
Instead of explicitly describing the functionality of the playbook, the prompt should tell what the user wants to accomplish through the playbook. 
Write the prompt as short as you can, and start the prompt with: 
Generate an Ansible playbook that ...

		
