TROLL: Trust Regions improve Reinforcement Learning for Large Language Models
Philipp Becker, Niklas Freymuth, Serge Thilges, Fabian Otto, Gerhard Neumann
Published in International Conference on Learning Representations (ICLR), Oral, 2026
