2024.09.26 Aligner has been accepted as an Oral presentation at NeurIPS 2024!
2024.09.26 ProgressGym has been accepted as a Spotlight at the NeurIPS 2024 DB Track, and SafeSora has been accepted as a Poster.
2024.09.15 Our framework OmniSafe has been accepted by JMLR 2024 (the most popular open-source Safe Reinforcement Learning framework).
2024.06.20 We released the PKU-SafeRLHF dataset, the second version of BeaverTails (total downloads: 800K+).
2024.05.10 We released Language Models Resist Alignment (Exploring Hooke's Law in large models: A theoretical analysis of the fragility of alignment).
2024.02.07 We released Aligner, a new efficient alignment paradigm that bypasses the whole RLHF process. Significantly boosting GPT-4/Llama2 performance without RLHF: Peking University team proposes Aligner, a new alignment paradigm
2024.01.16 Two papers were accepted to ICLR 2024: Safe RLHF (Spotlight) and SafeDreamer.
2023.10.30 Big News! We released AI Alignment: A Comprehensive Survey.
2023.10.19 We released Safe RLHF: Safe Reinforcement Learning from Human Feedback.
National Natural Science Foundation for Ph.D. students (first batch; the sole recipient in Peking University's intelligence field)
Peking University President Scholarship (the highest doctoral research honor at Peking University).
Baichuan and Baichuan2 series models (HuggingFace Downloads: 5M+, GitHub Stars: 12000+) [Baichuan-7B] [Baichuan-13B] [Baichuan2]
the core contributor; work done by Jiaming as an intern at Baichuan.
A series of large language models developed by Baichuan Intelligent Technology.
PKU-Alignment/safe-rlhf (GitHub Stars: 1.3K+) [GitHub]
the core developer
a highly modular open-source RLHF framework that supports constrained value alignment for LLMs.
PKU-Alignment/OmniSafe (GitHub Stars: 900+) [GitHub]
the core developer
an infrastructural parallel training framework; the most popular open-source library in the field of Safe RL.
Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction [Website] [GitHub] [Models] [Data]
Jiaming Ji*, Boyuan Chen*, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, and Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS) 2024 Oral.
ProgressGym: Alignment with a Millennium of Moral Progress [Leaderboard] [Code]
Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS) 2024 Spotlight.
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset [Website] [GitHub] [Data]
Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS) 2024 Poster.
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research [GitHub]
Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang.
Journal of Machine Learning Research, Machine Learning Open Source Software, JMLR 2024. (Top 15 ~ 20 Papers per year.)
AI Alignment: A Comprehensive Survey [Website]
Jiaming Ji*, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou,
Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer,
Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao.
arXiv, 2024.
PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models [GitHub] [Data]
Jiaming Ji*, Donghai Hong*, Borong Zhang*, Boyuan Chen*, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang.
arXiv, 2024.
Language Models Resist Alignment [GitHub]
Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang.
arXiv, 2024.
The application of large language models in medicine: A scoping review
Xiangbin Meng, Xiangyu Yan, Kuo Zhang, Da Liu, Xiaojuan Cui, Yaodong Yang, Muhan Zhang, Chunxia Cao, Jingjia Wang, Xuliang Wang,
Jiaming Ji, Zifeng Qiu, Muzi Li, Cheng Qian, Tianze Guo, Shuangquan Ma, Zeying Wang, Zexuan Guo, Youlan Lei, Chunli Shao, Wenyao Wang, Haojun Fan, Yi-Da Tang.
iScience, Cell Press, 2024.
Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu*, Fanzhi Zeng*, Jiaming Ji*, Dong Yan*, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang.
arXiv, 2024.
Safe RLHF: Safe Reinforcement Learning from Human Feedback [Code]
Josef Dai*, Xuehai Pan*, Ruiyang Sun*, Jiaming Ji*, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang.
International Conference on Learning Representations (ICLR), 2024. Spotlight.
SafeDreamer: Safe Reinforcement Learning with World Models [Website] [Code]
Weidong Huang*, Jiaming Ji*, Chunhe Xia*, Borong Zhang, Yaodong Yang.
International Conference on Learning Representations (ICLR), 2024.
Heterogeneous-Agent Reinforcement Learning [GitHub]
Yifan Zhong*, Grudzien Kuba Jakub*, Xidong Feng*, Siyi Hu, Jiaming Ji, and Yaodong Yang.
Journal of Machine Learning Research (JMLR), 2024.
Baichuan 2: Open Large-scale Language Models [Website] [Models]
Jiaming Ji, and Other Authors (Alphabetic Order).
arXiv, 2023.
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset [Website] [Data]
Jiaming Ji*, Mickel Liu*, Juntao Dai*, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, and Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2023.
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark [Website] [GitHub]
Jiaming Ji*, Borong Zhang*, Jiayi Zhou*, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, and Yaodong Yang.
Advances in Neural Information Processing Systems (NeurIPS), 2023.
VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
Jiayi Guan, Guang Chen, Jiaming Ji, Long Yang, Ao Zhou, Zhijun Li, Changjun Jiang.
Advances in Neural Information Processing Systems (NeurIPS), 2023.
Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation [GitHub]
Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuang Jiang, Zongqing Lu, Hao Dong, Yaodong Yang.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
Augmented Proximal Policy Optimization for Safe Reinforcement Learning
Juntao Dai*, Jiaming Ji*, Long Yang, Qian Zheng, and Gang Pan.
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
MyoChallenge 2022: Learning contact-rich manipulation using a musculoskeletal hand
Vittorio Caggiano, Guillaume Durandau, Huawei Wang, Alberto Chiappa, Alexander Mathis, Pablo Tano, Nisheet Patel, Alexandre Pouget, Pierre Schumacher, Georg Martius, Daniel Haeufle, Yiran Geng, Boshi An, Yifan Zhong, Jiaming Ji, Yuanpei Chen, Hao Dong, Yaodong Yang, Rahul Siripurapu, Luis Eduardo Ferro Diez, Michael Kopp, Vihang Patil, Sepp Hochreiter, Yuval Tassa, Josh Merel, Randy Schultheis, Seungmoon Song, Massimo Sartori, Vikash Kumar.
NeurIPS 2022 Competition Track, 2022. First Place in the NeurIPS 2022 Challenge Track (1st out of 340 submissions from 40 teams).
Constrained Update Projection Approach to Safe Policy Optimization [GitHub]
Long Yang*, Jiaming Ji*, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, and Gang Pan.
Advances in Neural Information Processing Systems (NeurIPS), 2022.
July 2024: Mechanism of LLMs Alignment and Efficient Alignment Technology.
Invited Talk at HKUST Machine Creativity Lab (MACRE Lab) and Hong Kong Generative AI Center (HKGAI) poster.
Invited Talk at The 17th China-R Conference & The 2024 X-AGI Conference & The 2024 International Forum on Data Science.
Reviewer for NeurIPS. Area Chair for ICML 2024 Workshop TiFA.