TikTok interview question

Describe the GRPO loss and other RL algorithms.