On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding

May 19, 2025
Haoyuan Wu, Rui Ming, Jilong Gao, Hangyu Zhao, Xueyi Chen, Yikai Yang, Haisheng Zheng, Zhuolun He, Bei Yu
Type
Publication
arXiv preprint arXiv:2505.12723 (2025)