AI Alignment
AI Safety
Large Models
Email: jiamg.ji at gmail dot com
[Google Scholar] [GitHub]
About me
I’m a Ph.D. student at the Institute of Artificial Intelligence, Peking University, advised by Prof. Yaodong Yang (both a good teacher and a helpful friend in my life). I am also a visiting scholar at the Hong Kong University of Science and Technology, working under the guidance of the renowned computer scientist Prof. Yike Guo and collaborating closely with Sirui Han.
My research focuses on reinforcement learning and the safety and value alignment of large language models. Beyond academic research, I place strong emphasis on the practical deployment of large models. I have contributed to the open-source release and real-world deployment of several large-scale models, including Baichuan2, the Hong Kong AI Model HKGAI-V1, the Pengcheng Brain model, and the medical triage model MedGuide. Notably, MedGuide has been deployed in hospitals and is actively supporting doctors and nurses in emergency triage, something I take great pride in beyond my academic achievements.
In 2025, I was honored to be selected as an Apple Scholar in AI/ML, mentored by Rin Metcalf Susa and Natalie Mackraz. In 2024, I received funding from the first batch of the National Natural Science Foundation Youth Student Basic Research Project (Ph.D. track), as the sole awardee from Peking University in the field of intelligence.
Prior to my Ph.D., I conducted research on neuromorphic computing and brain-computer interfaces with Prof. Gang Pan at Zhejiang University. I began my research journey focusing on safe reinforcement learning and won the championship in the NeurIPS 2022 MyoChallenge for robotic dexterous manipulation.
Research Interests
AI Alignment: Given the biases and discrimination that may exist in pre-training data, large models (LMs) may exhibit unintended behaviors. I am interested in alignment methods (e.g., reinforcement learning from human feedback (RLHF)) and post-hoc alignment methods to ensure the safety and trustworthiness of LLMs.
Theoretical Explanations and Mechanism Design for Alignment: Aligning AI systems (e.g., LLMs) to ensure consistency with human intentions and values (though some question whether universal values exist) is a significant current challenge. I am particularly interested in establishing the feasibility of these alignment methods through both theoretical analysis and practical mechanism design.
Applications (LM + X): I am interested in the application of large models in various domains, such as healthcare and education, and in the potential impact of the rapid industry development and iteration brought about by large models.
Honors
2025-03
Apple Scholars in AI/ML (one of only two recipients nationwide).
2024-12
CIE-Tencent Doctoral Research Incentive Project (inaugural cohort; 17 recipients nationwide; 100,000 RMB research grant).
2024-05
Peking University President Scholarship, the highest doctoral research honor.
2024-05
National Natural Science Foundation Youth Student Basic Research Project for Ph.D. students (first batch; the sole recipient in Peking University's field of intelligence).
Preprints
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning
Borong Zhang*, Yuhao Zhang*, Jiaming Ji*, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang
arXiv 2025
[Paper]
Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback
Jiaming Ji*, Jiayi Zhou*, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, Weipeng Chen, Jun Song, Bo Zheng, Yaodong Yang
arXiv 2025
[Paper][Code][Data]
Baichuan 2: Open Large-scale Language Models
Jiaming Ji and other authors (alphabetical order)
arXiv 2023
[Paper][Code]
Publications
(* denotes equal contribution, and † denotes the corresponding author)
2025
Language Models Resist Alignment: Evidence From Data Compression
Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou*, Changye Li, Hantao Lou, Yaodong Yang
ACL 2025 Main.
[Paper]
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
Jiaming Ji*, Donghai Hong*, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang
ACL 2025 Main.
[Paper][Data]
Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu*, Fanzhi Zeng*, Jiaming Ji*, Dong Yan*, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang
ACL 2025 Findings.
[Paper]
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou*, Changye Li*, Jiaming Ji, Yaodong Yang
ICML 2025.
[Paper]
Revolutionizing health care: The transformative impact of large language models in medicine
Kuo Zhang, Xiangbin Meng, Xiangyu Yan, Jiaming Ji, ..., Wenyao Wang, Jiarong Li, Ming-Qi Zheng, Yaodong Yang, Yi-Da Tang
Journal of Medical Internet Research 2025.
[Paper]
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
AAAI 2025
[Paper]
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Jiayi Zhou*, Jiaming Ji*, Juntao Dai, Yaodong Yang
AAAI 2025 Oral.
[Paper]
2024
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang
JMLR 2024 (one of roughly 15–20 open-source AI systems papers accepted per year).
[Paper]
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
Juntao Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang
NeurIPS 2024.
[Paper]
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu*, Yang Zhang*, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang
NeurIPS 2024
[Paper]
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai*, Xuehai Pan*, Ruiyang Sun*, Jiaming Ji*, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
ICLR 2024
[Paper][Code]
SafeDreamer: Safe Reinforcement Learning with World Models
Weidong Huang*, Jiaming Ji*, Chunhe Xia*, Borong Zhang, Yaodong Yang
ICLR 2024
[Paper][Code]
2023
Heterogeneous-Agent Reinforcement Learning
Yifan Zhong*, Jakub Grudzien Kuba*, Xidong Feng*, Siyi Hu, Jiaming Ji, Yaodong Yang
TPAMI 2023
[Paper][Code]
Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation
Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuan Jiang, Zongqing Lu, Hao Dong, Yaodong Yang
TPAMI 2023
[Paper][Code]
VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
Jiayi Guan, Guan Chen, Jiaming Ji, Long Yang, Ao Zhou, Zhijun Li, Changjun Jiang
NeurIPS 2023
[Paper][Code]
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji*, Mickel Liu*, Juntao Dai*, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
NeurIPS 2023
[Paper][Code][Data]
Augmented Proximal Policy Optimization for Safe Reinforcement Learning
Juntao Dai*, Jiaming Ji*, Long Yang, Qian Zheng, Gang Pan
AAAI 2023
[Paper]
2022
MyoChallenge 2022: Learning contact-rich manipulation using a musculoskeletal hand
Vittorio Caggiano, Guillaume Durandau, Huawei Wang, Alberto Chiappa, Alexander Mathis, Pablo Tano, Nisheet Patel, Alexandre Pouget, Pierre Schumacher, Georg Martius, Daniel Haeufle, Yiran Geng, Boshi An, Yifan Zhong, Jiaming Ji, Yuanpei Chen, Hao Dong, Yaodong Yang, Rahul Siripurapu, Luis Eduardo Ferro Diez, Michael Kopp, Vihang Patil, Sepp Hochreiter, Yuval Tassa, Josh Merel, Randy Schultheis, Seungmoon Song, Massimo Sartori, Vikash Kumar
NeurIPS 2022 Competition Track
[Paper]
Constrained Update Projection Approach to Safe Policy Optimization
Long Yang*, Jiaming Ji*, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan
NeurIPS 2022
[Paper][Code]
Services
Reviewer for ICLR 2025
Reviewer for CVPR 2025
Reviewer for NeurIPS 2023, 2024 (main track and Datasets & Benchmarks track), 2025.
Reviewer for TMLR (Transactions on Machine Learning Research).