I’m a Ph.D. student at the Institute of Artificial Intelligence, Peking University, advised by
Prof. Yaodong Yang (both a good teacher and a helpful friend in my life). In 2024, I was honored
to receive funding from the first batch of the National Natural Science Foundation Youth Student
Basic Research Project (for Ph.D. students), as the sole recipient from Peking University in the
field of intelligence. Before this, I conducted research on safe reinforcement learning and won
the championship of the NeurIPS 2022 MyoChallenge for robotic dexterous manipulation.
AI Alignment: Given the biases and discrimination that may exist in pre-training data,
large models (LMs) may exhibit unintended behaviors. I am interested in alignment methods
such as Reinforcement Learning from Human Feedback (RLHF), as well as post-hoc alignment
methods, to ensure the safety and trustworthiness of LLMs.
Theoretical Explanations and Mechanism Design for Alignment: Effectively aligning AI systems
(e.g., LLMs) to ensure consistency with human intentions and values (though some question
whether universal values exist) is a significant current challenge. I am particularly
interested in establishing the feasibility of alignment methods through both theoretical
analysis and practical mechanism design.
Applications (LM + X): I am interested in the application of large models across domains
such as healthcare and education, and in the potential impact of the rapid industry
development and iteration that large models bring about.
News
2024-09 Aligner has been accepted as an Oral presentation at NeurIPS 2024!
2024-09 ProgressGym has been accepted as a Spotlight at the NeurIPS 2024 Datasets and
Benchmarks (DB) Track, and SafeSora has been accepted as a Poster.
2024-09 Our framework OmniSafe has been accepted by JMLR 2024 (the most popular
open-source safe reinforcement learning framework).
2024-06 We released the PKU-SafeRLHF dataset, the second version of BeaverTails
(800K+ total downloads).
2024-05 We released Language Models Resist Alignment (exploring Hooke's Law in large
models: a theoretical analysis of the fragility of alignment).
2024-01 Two papers accepted to ICLR 2024: Safe RLHF (Spotlight) and SafeDreamer.
2023-10 Big News! We released AI Alignment: A Comprehensive
Survey.
Awards
2025-03 Apple Scholars in AI/ML (one of only two recipients nationwide).
2024-12 CIE-Tencent Doctoral Research Incentive Project (inaugural cohort of the Chinese
Institute of Electronics and Tencent program; 17 recipients nationwide, with a research
grant of 100,000 RMB).
2024-05 Peking University President Scholarship, the highest honor for doctoral research.
2024-05 National Natural Science Foundation Youth Student Basic Research Project for
Ph.D. students (first batch; the sole recipient in Peking University's field of intelligence).