Approximation Algorithm Examples

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

We propose the Trust Region Preference Approximation (TRPA) algorithm, which integrates rule-based optimization with preference-based optimization for LLM reasoning tasks. As a ...

IEEE

Toward General Function Approximation in Nonstationary Reinforcement Learning

Abstract: Function approximation has experienced significant success in the field of reinforcement learning (RL). Despite some progress on developing theory for nonstationary RL with function ...

Queen Mary University of London

A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries

Mark Jerrum, Alistair Sinclair (UC Berkeley) and Eric Vigoda (Georgia Tech) received the Association for Computing Machinery (ACM) Test of Time Award at a virtual ceremony on Wednesday 23 June at the ...
