Starting off as a muggle that naïve to the Math's and Data Science world.

How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

ref: https://arxiv.org/pdf/2602.19526v1

Summary

I used to think longer prompts and step-by-step reasoning always make AI smarter. In daily use, that often helps. But in reinforcement learning, the opposite can happen. Researchers found that forcing models to “think” longer actually reduced accuracy and even caused training collapse. Some models learned to avoid answering questions entirely because wrong answers and no answers earned the same score. Others spammed empty reasoning tags to game the reward. When the team simplified the prompt and added small penalties, accuracy improved and search steps dropped. Better AI isn’t about longer thinking. It’s about better incentives.


Slow thinking prompt
<think> The definition of the function of management by POSDCORB is in Wikipedia. So I conduct the following search </think>
<search> POSDCORB function of management </search>
<information> Doc 1(Title: POSDCORB) Committee, Luther Gulick asks rhetorically "What is the work of the chief executive? What does he do?" POSDCORB is the answer, "designed to call attention to the various functional elements of the work of a chief executive because ’administration’ and ’management’ have lost all specific content." According to Gulick, the elements are: Gulick’s "Notes on the Theory of Organization" further ... </information>
<think> As per information 1, the description and definition of POSDCORB are found in Luther Gulick’s notes on the theory of organization. The main elements included in this acronym are Planning, Organizing, Staffing, Directing, Co-ordinating, Reporting and Budgeting. These tasks, Gulick states, together form the primary functions of a chief executive. </think>
<answer> Luther Gulick </answer>
Slow thinking with empty <think> prompt
<think> Jacksonville Jaguars </think>
<think> last playoff appearance </think>
<think> 2007 </think>
<think> To answer the question, I will search for the last time the Jacksonville Jaguars made the playoffs. </think>
<search> last time jacksonville jaguars made the playoffs </search>
<information> Doc 1(Title: "Jacksonville Jaguars") Jacksonville Jaguars The Jacksonville Jaguars are an American professional football franchise based in Jacksonville, Florida. The Jaguars compete in the National Football League (NFL) as a member club of the American Football Conference (AFC) South division. The team plays its home games at TIAA Bank Field. The Jaguars and the Carolina Panthers joined the NFL as expansion teams for the 1995 season... </information>
<think> 2017 </think>
<think> </think>
<answer> 2017 </answer>
Fast thinking prompt
<search> when did the capital of nigeria moved from lagos to abuja </search>
<information> Doc 1(Title: Abuja) Abuja Abuja () is the capital city of Nigeria located in the centre of the country within the Federal Capital Territory (FCT). It is a planned city and was built mainly in the 1980s, replacing the country’s most populous city of Lagos as the capital on 12 December 1991. Abuja’s geography is defined by Aso Rock, a monolith left by water erosion. The Presidential Complex, National Assembly, Supreme Court and much of the city extend to the south of the rock. Zuma Rock...
<answer> 12 December 1991 </answer>

Leave a comment