Jiaming Ji


Hello! I’m a first-year PhD student at the Institute of Artificial Intelligence, Peking University, advised by Prof. Yaodong Yang (both a good teacher and a helpful friend in my life). In 2024, I was honored to receive funding from the first batch of the National Natural Science Foundation’s Youth Student Basic Research Project (Ph.D. students), as the sole recipient from Peking University in the field of intelligence. I am also a recipient of the Peking University President’s Scholarship. Before this, I conducted research on safe reinforcement learning and won the championship in the NeurIPS 2022 MyoChallenge for robotic dexterous manipulation. Currently, my core research interest lies in AI Safety and Alignment, particularly the safety and value alignment of large models:

  • Safety Alignment of Large Language Models: Given the biases and discrimination that may exist in pre-training data, LLMs may exhibit unintended behaviors. I am interested in alignment methods (e.g., Reinforcement Learning from Human Feedback (RLHF)) and post-hoc alignment methods to ensure the safety and trustworthiness of LLMs.

  • Theoretical Explanations and Mechanism Design for Alignment: Effectively aligning AI systems (e.g., LLMs) to ensure consistency with human intentions and values (though some views question whether values are universal) is a significant current challenge. I am particularly interested in establishing the feasibility of these alignment methods through both theoretical analysis and practical mechanism design.

  • Large Models and Cross-Domain Applications (LM + X): I am interested in the application of large models in various domains, such as healthcare and education, and in the rapid industry development and iteration that large models may bring about.


Feb 01, 2024 We released Aligner: a new, efficient alignment paradigm that bypasses the whole RLHF process.
Jan 16, 2024 Two papers were accepted to ICLR 2024: Safe RLHF (Spotlight) and SafeDreamer.
Dec 05, 2023 One paper was accepted to JMLR 2023: Heterogeneous-Agent Reinforcement Learning.
Nov 01, 2023 Big News! We released AI Alignment: A Comprehensive Survey.
Oct 21, 2023 We released Safe RLHF: Safe Reinforcement Learning from Human Feedback.

selected publications

  1. arXiv
    Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction
    Jiaming Ji*, Boyuan Chen*, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, and Yaodong Yang
    In Preprint, 2024
  2. arXiv
    Baichuan 2: Open Large-scale Language Models
    Jiaming Ji, and Other Authors (Alphabetic Order)
    In Preprint, 2023
  3. ICLR Spotlight
    Safe RLHF: Safe Reinforcement Learning from Human Feedback
    Josef Dai*, Xuehai Pan*, Ruiyang Sun*, Jiaming Ji*, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang
    In International Conference on Learning Representations, 2024
  4. arXiv
    OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
    Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang
    In Preprint, 2023
  5. NeurIPS
    BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
    Jiaming Ji*, Mickel Liu*, Juntao Dai*, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, and Yaodong Yang
    Advances in Neural Information Processing Systems, 2023