On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
May 19, 2025

Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu