I am a first-year PhD student at the University of Illinois Urbana-Champaign, advised by Prof. Hao Peng. Previously, I was fortunate to work with Prof. Zhiyuan Liu at THUNLP and Prof. Heng Ji at UIUC.

My research seeks scalable training and oversight for intelligent language models. To this end, I work on:

  • Scalable data synthesis that makes it possible to keep scaling compute to improve LLMs [UltraFeedback, UltraInteract/Eurus].
  • Scalable evaluation that unlocks and amplifies LLMs’ ability to provide feedback for both training and inference [Implicit PRM].
  • Scalable training algorithms that incorporate such feedback to enhance LLMs and, in return, help improve feedback quality [PRIME/Eurus-2].

🔥 News

  • 2025.05: We identify that entropy collapse impedes the scaling of RL, and that, without intervention, the performance ceiling is surprisingly predictable.
  • 2025.05: We find that simply minimizing entropy to squeeze capability out of LLMs works surprisingly well. We therefore call attention to the importance of base models and urge a second thought on the recent fever over zero/few-shot RL.
  • 2025.05: We reveal that RL intrinsically leads to sparse parameter updates, while SFT updates densely. Check out our paper here.
  • 2025.05: Implicit PRM has been accepted to ICML.
  • 2025.01: Eurus has been accepted to ICLR.
  • 2025.01: We introduce PRIME, a scalable RL solution for advanced reasoning through implicit process rewards! We also release Eurus-2, which is trained from Qwen2.5-Math-Base to surpass Qwen2.5-Math-Instruct using only 1/10 of the data.
  • 2024.12: We release Implicit PRM: get free process rewards for your model without process labels! Alongside it, we release the SOTA Llama-3.1-8B-based PRMs!
  • 2024.09: NCA has been accepted to NeurIPS, and CPO to EMNLP.

📝 Publications

* denotes equal contribution

  • Selected


  • Reinforcement Learning Finetunes Small Subnetworks in Large Language Models [Paper]
    Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng.
    Preprint
  • Process Reinforcement through Implicit Rewards [Paper][Blog]
    Ganqu Cui*, Lifan Yuan*, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding*.
    (* denotes project leads)
    Preprint
  • Free Process Rewards without Process Labels [Paper]
    Lifan Yuan*, Wendi Li*, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng.
    ICML 2025
  • Advancing LLM Reasoning Generalists with Preference Trees [Paper]
    Lifan Yuan*, Ganqu Cui*, Hanbin Wang*, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun.
    ICLR 2025; ICML 2024 Workshop on AI4Math
  • Executable Code Actions Elicit Better LLM Agents [Paper]
    Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
    ICML 2024
  • UltraFeedback: Boosting Language Models with High-quality Feedback [Paper]
    Ganqu Cui*, Lifan Yuan*, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun.
    ICML 2024
  • Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [Paper]
    Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun.
    NeurIPS 2023 (Datasets and Benchmarks Track)

  • All

  • Preprints


  • Process Reinforcement through Implicit Rewards [Paper][Blog]
    Ganqu Cui*, Lifan Yuan*, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding*.
  • Reinforcement Learning Finetunes Small Subnetworks in Large Language Models [Paper]
    Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng.
  • The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning [Paper]
    Shivam Agarwal, Zimin Zhang, Lifan Yuan, Jiawei Han, Hao Peng.
  • The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models [Paper]
    Ganqu Cui*, Yuchen Zhang*, Jiacheng Chen*, Lifan Yuan, Zhi Wang, Yuxin Zuo, Haozhan Li, Yuchen Fan, Huayu Chen, Weize Chen, Zhiyuan Liu, Hao Peng, Lei Bai, Wanli Ouyang, Yu Cheng, Bowen Zhou, Ning Ding.
  • TTRL: Test-Time Reinforcement Learning [Paper]
    Yuxin Zuo*, Kaiyan Zhang*, Li Sheng, Shang Qu, Ganqu Cui, Xuekai Zhu, Haozhan Li, Yuchen Zhang, Xinwei Long, Ermo Hua, Biqing Qi, Youbang Sun, Zhiyuan Ma, Lifan Yuan, Ning Ding, Bowen Zhou.

  • 2025


  • Free Process Rewards without Process Labels [Paper]
    Lifan Yuan*, Wendi Li*, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng.
    ICML
  • Advancing LLM Reasoning Generalists with Preference Trees [Paper]
    Lifan Yuan*, Ganqu Cui*, Hanbin Wang*, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun.
    ICLR
  • The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning [Paper]
    Bingxiang He*, Ning Ding*, Cheng Qian*, Jia Deng, Ganqu Cui, Lifan Yuan, Haiwen Hong, Huan-ang Gao, Longtao Huang, Hui Xue, Huimin Chen, Zhiyuan Liu, Maosong Sun.
    ACL (Findings)

  • 2024


  • Noise Contrastive Alignment of Language Models with Explicit Rewards [Paper]
    Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, Jun Zhu.
    NeurIPS
  • Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [Paper]
    Yiju Guo*, Ganqu Cui*, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun.
    EMNLP
  • UltraFeedback: Boosting Language Models with High-quality Feedback [Paper]
    Ganqu Cui*, Lifan Yuan*, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun.
    ICML
  • Executable Code Actions Elicit Better LLM Agents [Paper]
    Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
    ICML
  • CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [Paper]
    Lifan Yuan*, Yangyi Chen*, Xingyao Wang, Yi R. Fung, Hao Peng, Heng Ji.
    ICLR
  • MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [Paper]
    Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji.
    ICLR

  • 2023


  • Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection via Querying ChatGPT [Paper]
    Biru Zhu, Lifan Yuan, Ganqu Cui, Yangyi Chen, Chong Fu, Bingxiang He, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu.
    EMNLP
  • Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [Paper]
    Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun.
    NeurIPS (Datasets and Benchmarks Track)
  • Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training [Paper]
    Biru Zhu*, Ganqu Cui*, Yangyi Chen, Yujia Qin, Lifan Yuan, Chong Fu, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu.
    TACL
  • A Close Look into the Calibration of Pre-trained Language Models [Paper]
    Yangyi Chen*, Lifan Yuan*, Ganqu Cui, Zhiyuan Liu, Heng Ji.
    ACL
  • Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [Paper]
    Lifan Yuan*, Yichi Zhang*, Yangyi Chen, Wei Wei.
    ACL (Findings)
  • From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework [Paper]
    Yangyi Chen*, Hongcheng Gao*, Ganqu Cui*, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji.
    ACL (Findings)

  • 2022


  • A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [Paper]
    Ganqu Cui*, Lifan Yuan*, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun.
    NeurIPS (Datasets and Benchmarks Track, Spotlight)
  • FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition [Paper]
    Lifan Yuan*, Linyi Yang*, Leyang Cui, Wenyang Gao, Yue Zhang.
    COLING (Oral)
  • Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis [Paper]
    Lirong Wu*, Lifan Yuan*, Guojiang Zhao, Haitao Lin, Stan Z. Li.
    IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

🔧 Projects

  • TinyZero: A Minimal Reproduction of Reasoning Models [GitHub Repo]

    12K+ Stars; Featured in CNBC, The Independent, Tom's Hardware, Daily Cal, Xinhua News, etc.

    Jiayi Pan, Junjie Zhang, Xingyao Wang, Lifan Yuan, Hao Peng, Alane Suhr.

💬 Invited Talks

  • 2025.02, Process Reinforcement through Implicit Rewards, Google DeepMind.

💻 Internships

  • 2025.06 - 2025.08, Student Researcher at Google DeepMind, Mountain View.

📄 Academic Services

Reviewer:

NeurIPS (2022-2025), ICLR (2024-2025), ICML (2024-2025), COLM (2025), ACL (2023), EMNLP (2022-2023), ARR (2022-2024)

Workshop Organizer:

  • The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR), COLM 2025.