Jiaming Ji (吉嘉铭)

Phd Student at Peking University

AI Alignment
AI Safety
Large Models

Email: jiamg.ji at gmail dot com

[Google Scholar][GitHub]

About me

I’m a PhD student at the Institute of Artificial Intelligence, Peking University, advised by Prof. Yaodong Yang (both a good teacher and a helpful friend in my life). In parallel, I am also a visiting scholar at the Hong Kong University of Science and Technology, advised by renowned computer scientist Prof. Yike Guo. My research focuses on Reinforcement Learning, Large Language Models, Multimodal Models and Safety Alignment, with a strong emphasis on bridging academic advances and real-world deployment. I have contributed to the open-source and real-world deployment of several large-scale models, including Baichuan2, the Hong Kong AI Model HKGAI-v1, the Pengcheng Brain model, and the medical triage model MedGuide. Notably, MedGuide has been deployed in hospitals and is actively supporting doctors and nurses in emergency triage—something I take great pride in beyond my academic achievements.

In 2025, I was honored to be selected as an Apple Scholar in AI/ML, mentored by Rin Metcalf Susa and Natalie Mackraz. In 2024, I received the first batch of National Natural Science Foundation funding for the Youth Student Basic Research Project (Ph.D. track), as the sole awardee from Peking University in the field of intelligence. Prior to my Ph.D., I conducted research on neuromorphic computing and brain-computer interfaces with Prof. Gang Pan at Zhejiang University. I began my research journey focusing on safe reinforcement learning and won the championship in the NeurIPS 2022 MyoChallenge for robotic dexterous manipulation.

News

2025-07

Our survey: AI Alignment: A Contemporary Survey has been accepted by ACM Computing Surveys, Impact Factor: 28.0 (ranked 1/147 in Computer Science Theory & Methods).
2025-07

Five papers (2*Spotlight, 3*Poster) are accepted by NeurIPS 2025.
2025-07

Language Model Resist Alignment has been awarded the ACL 2025 Best Paper!
2024-01

MedAligner has been accepted to The Innovation (Impact Factor=32.1).
2025-05

Four papers are accepted by ACL 2025 Main.
2025-05

SAE-V has been accepted as ICML 2025.
2024-12

Seq2Seq RM (Oral) and StreamAligner have been accepted to AAAI 2025.
2024-09

Aligner (Oral), ProgressGym (Spotlight) and Safe Sora have been accepted to NeurIPS 2024.
2024-09

RL framework: OmniSafe is accepted by JMLR 2024 (The most popular Safe Reinforcement Learning framework).
2024-06

We released PKU-SafeRLHF dataset, the 2nd version of BeaverTails (The total number of downloads: 800K+).
2024-01

Safe RLHF (Spotlight) and SafeDreamer have been accepted to ICLR 2024.

[Show more]

Research Summary

Currently, i focus on AI Safety and Alignment.

AI Alignment: Given the biases and discriminations that may exist in pre-training data, large models (LMs) may exhibit unintended behaviors. I am interested in alignment methods (e.g., Reinforcement Learning from human feedback (RLHF)) and post-hoc alignment methods to ensure the safety and trustworthy of LLMs.
Theoretical Explanations and Mechanism Design for Alignment: Aligning these AI System (e.g. LLMs) effectively to ensure consistency with human intentions and values (though some views may question universal values) is a significant current challenge. I am particularly interested in ensuring the feasibility of these alignment methods in both theoretical and practical mechanisms.
Applications (LM + X): I am interested in the application of large models in various domain, such as healthcare and education, and the potential impact of rapid industry development and iteration brought about by large models.

Honors

2025-03

Apple Scholars in AI/ML.
2024-12

CIE-Tencent Doctoral Research Incentive Project.
2024-05

Peking University President Scholarship, the highest doctoral research honor.
2024-05

National Natural Science Foundation for Ph.D. students (first batch; the sole recipient in the Peking University's intelligence field).

AI Alignment: A Comprehensive Survey
Jiaming Ji*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou*, Borong Zhang, Hantao Lou, ..., Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
ACM Computing Surveys, 2025
[Project Webpage]

AlignmentSurvey

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu*, Shaoyang Guo*, Zhuo-Yang Song*, Yunbo Sun*, Zeyu Cai*, Jiashen Wei*, Tianyu Luo*, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Jiaming Ji, ..., Qing-Hong Cao, Ming-xing Luo, Yaodong Yang, Muhan Zhang, Hua Xing Zhu
NeurIPS, 2025
[Project Webpage]

Large Language Models

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
Boyuan Chen*, Donghai Hong*, Jiaming Ji*, Jiacheng Zheng, Bowen Dong, Jiayi Zhou*, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, Qirui Zheng, Wenxin Li, Sirui Han, Yike Guo, Yaodong Yang†
NeurIPS, 2025
[Project Webpage]

Multimodal Models

Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou*, Jiaming Ji*, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, Yaodong Yang†
NeurIPS, 2025
[Project Webpage]

Safety and AlignmentMultimodal Models

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning

Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Jiaming Ji, Xinyu Chen, Rui Pan, Conghui Zhang, Han Zhu, ..., Yike Guo, Yaodong Yang†
NeurIPS, 2025
[Project Webpage]

Safety and AlignmentMultimodal Models

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning
Borong Zhang*, Yuhao Zhang*, Jiaming Ji*, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang
NeurIPS Spotlight, 2025
[Project Webpage]

Robotics and VLASafety and Alignment

Language Models Resist Alignment: Evidence From Data Compression
Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou*, Changye Li, Hantao Lou, Juntao Dai, Yunhuai Liu, Yaodong Yang†
ACL Best Paper, 2025
[Paper]

AI AlignmentAI SafetyReinforcement Learning

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
Jiaming Ji*, Donghai Hong*, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang
ACL Main, 2025
[Paper][Data]

AI AlignmentAI SafetyReinforcement Learning

Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu*, Fanzhi Zeng*, Jiaming Ji*, Dong Yan*, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang
ACL Findings, 2025
[Paper]

Reinforcement LearningLarge Language Models

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
AAAI, 2025
[Paper]

Large Language ModelsAI Alignment

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Jiayi Zhou*, Jiaming Ji*, Juntao Dai, Yaodong Yang
AAAI Oral, 2025
[Paper]

Large Language ModelsReinforcement Learning

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang
JMLR, 2024

Reinforcement LearningRobotics and VLA

Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji*, Boyuan Chen*, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang†
NeurIPS Oral, 2024
[Code][Data]

Large Language ModelsAI SafetyReinforcement Learning

ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu*, Yang Zhang*, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang
NeurIPS Spotlight, 2024

Large Language ModelsAI SafetyAI Alignment

Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai*, Xuehai Pan*, Ruiyang Sun*, Jiaming Ji*, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
ICLR Spotlight, 2024
[Code]

Large Language ModelsAI SafetyAI Alignment

SafeDreamer: Safe Reinforcement Learning with World Models
Weidong Huang*, Jiaming Ji*, Chunhe Xia*, Borong Zhang, Yaodong Yang
ICLR, 2024
[Code]

Reinforcement LearningRobotics and VLA

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
Jiaming Ji*, Borong Zhang*, Jiayi Zhou*, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang
NeurIPS, 2023
[Paper][Code]

Reinforcement LearningRobotics and VLA

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji*, Mickel Liu*, Juntao Dai*, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
NeurIPS, 2023
[Paper][Code][Data]

Large Language ModelsAI SafetyAI Alignment

Baichuan 2: Open Large-scale Language Models
Jiaming Ji, Other Authors (Alphabetic Order)
Arxiv (Technical Report), 2023
[Code]

Large Language Models

Constrained Update Projection Approach to Safe Policy Optimization
Long Yang*, Jiaming Ji*, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan
NeurIPS, 2022
[Paper][Code]

Reinforcement LearningRobotics and VLA

Jiaming Ji (吉嘉铭)

Phd Student at Peking University

About me

News

Research Summary

Honors

Preprints

Publications (* denotes equal contribution, and † denotes the corresponding author)

Services

Teaching Assistant