1. Inference
   - Your code uses the ART client to perform an agentic workflow (usually executing several rollouts in parallel to gather data faster).
   - Completion requests are routed to the ART backend, which runs the model’s latest LoRA in vLLM.
   - As the agent executes, each `system`, `user`, and `assistant` message is stored in a Trajectory.
   - After your rollouts finish, your code assigns a `reward` to each Trajectory, with higher rewards indicating better performance.
2. Training
   - When all rollouts have finished, Trajectories are grouped and sent to the backend. Inference is blocked while training executes.
   - The backend trains your model using GRPO, initializing from the latest checkpoint (or an empty LoRA on the first iteration).
   - The backend saves the newly trained LoRA to a local directory and loads it into vLLM.
   - Inference is unblocked and the loop resumes at step 1.
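The two-step loop above can be sketched in plain Python. Everything here is an illustrative stand-in, not the ART API: `Trajectory`, `run_rollout`, and `train_step` are hypothetical names, and the real inference and training work happens in vLLM and the GRPO trainer rather than in these stubs.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    # system/user/assistant messages recorded during one rollout
    messages: list = field(default_factory=list)
    reward: float = 0.0

def run_rollout(task_id: int) -> Trajectory:
    # Stand-in for an agentic rollout: in ART, completion requests
    # would be routed to the backend's vLLM server, which serves the
    # model's latest LoRA.
    traj = Trajectory()
    traj.messages.append({"role": "system", "content": "You are an agent."})
    traj.messages.append({"role": "user", "content": f"task {task_id}"})
    traj.messages.append({"role": "assistant", "content": "answer"})
    return traj

def train_step(group: list) -> None:
    # Stand-in for the GRPO update: inference stays blocked here until
    # the new LoRA checkpoint is saved and reloaded into vLLM.
    pass

for step in range(3):                               # the loop
    group = [run_rollout(i) for i in range(4)]      # 1. inference: several rollouts
    for traj in group:
        traj.reward = random.random()               #    your code scores each Trajectory
    train_step(group)                               # 2. training: grouped Trajectories sent to backend
```

The key design point the sketch preserves is the alternation: a batch of scored Trajectories is fully collected before the blocking training step runs, and inference only resumes once the updated LoRA is back in place.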
ART Client
The client is responsible for interfacing between your code and the ART
backend.
ART Backend
The backend is responsible for generating tokens and training your models.
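The division of labor between client and backend can be summarized as a minimal interface. The class names and method signatures below are hypothetical, chosen only to mirror the two responsibilities named above (generating tokens and training), and are not ART's actual classes.

```python
from typing import Protocol

class Backend(Protocol):
    """What the backend owes the client: token generation and training."""

    def complete(self, messages: list) -> str:
        """Generate an assistant message using the latest LoRA (via vLLM)."""
        ...

    def train(self, trajectory_groups: list) -> None:
        """Run a GRPO step, save the new LoRA, and reload it for inference."""
        ...

class EchoBackend:
    """Toy in-process backend used only to exercise the interface."""

    def __init__(self) -> None:
        self.train_calls = 0

    def complete(self, messages: list) -> str:
        # Echo the last message instead of sampling from a model.
        return "ok: " + messages[-1]["content"]

    def train(self, trajectory_groups: list) -> None:
        self.train_calls += 1
```

Because the client only talks to this narrow surface, the same agent code can run against a local backend or a remote one without changes.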