P1 Base ykeohane
A set of end-to-end RL training recipes built on verl.
verl: Volcano Engine Reinforcement Learning for LLMs