Commit Graph

  • 2e1a1a6603 [BREAKING] [rollout] chore: remove default rollout selection (#2757) Chi Zhang 2025-07-27 01:11:24 +08:00
  • ea4442470e [algo] refactor: don't special-case compute_policy_loss (#2701) Frederick Robinson 2025-07-26 10:09:42 -07:00
  • 0f5ab5c854 [doc] feat: add retool blog (#2761) H 2025-07-25 22:13:55 -07:00
  • 92e81cfcfd [perf] feat: add optional role selection in discrete mode for NPU Profiler (#2750) YumiMom 2025-07-25 21:53:09 +08:00
  • f107800837 [rollout] feat: remove chat scheduler (#2725) Joel 2025-07-25 21:46:35 +08:00
  • 58d698e04b [trainer] refactor: Make sure to keep the type checking (#2634) Yeonwoo Sung 2025-07-25 14:32:07 +09:00
  • caec858ebb [doc] style: change resize handle from gradient to plain color (#2746) Tingberer 2025-07-25 12:20:07 +08:00
  • f407887414 [CI] feat: add mypy to pre-commit (#2614) Frederick Robinson 2025-07-24 20:36:34 -07:00
  • dc8b5076c3 [megatron] feat: a bunch of optimizations on vram, sequence packing (#2678) Yan Bai 2025-07-25 10:34:33 +08:00
  • 4879d619fc [docker] feat: upgrade to torch 2.7, sglang 0.4.8 (#2617) Blue Space 2025-07-25 05:53:24 +08:00
  • bcd336fd46 [doc] feat: add resizable sidebar and improve layout (#2577) Tingberer 2025-07-25 05:46:38 +08:00
  • 1df03f3abf [ci] fix: release ascend test time, fix one step off-policy CI (#2731) Blue Space 2025-07-24 16:58:16 +08:00
  • a0248a8f17 [recipe] chore: add retool training script (#2732) Joel 2025-07-24 16:34:10 +08:00
  • 8adcffa25a [ci] fix: checkpoint_convertor ci miss a hf model download (#2730) Blue Space 2025-07-24 15:56:08 +08:00
  • 88c084c4f3 [doc] feat: Add agent-lightning in the list of "awesome works using verl" (#2726) Wang Zilong 2025-07-24 14:49:27 +08:00
  • dc3015e9af [tool] fix: geo3k create return str instead of tuple (#2714) Nan Jiang 2025-07-23 22:56:13 -07:00
  • 73fc53f600 [megatron] fix: resolve backward propagation error in megatron_actor due to shared logits tensor in-place modification (#2484) Bowei Song 2025-07-24 13:37:18 +08:00
  • d57bfb02b3 [misc] chore: bump main branch version to v0.5.0.dev (#2718) H 2025-07-23 19:46:16 -07:00
  • 0eed7124fc [sglang] fix: Adding strict naming sanity for sglang (#2719) Chayenne 2025-07-24 10:45:57 +08:00
  • 1862f748e5 [ray] feat: RayWorkerGroup support set worker env (#2685) Jason Chen 2025-07-24 10:07:35 +08:00
  • 6a9a1b872d [ci] test: add CriticWorker unit test, make some util CPU friendly (#2717) H 2025-07-23 15:36:10 -07:00
  • 4de3ecf0f0 [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code (#2621) H 2025-07-23 11:45:14 -07:00
  • 8fdc4d3f20 [misc] chore: bump version to v0.5.0 (#2716) v0.5.0 H 2025-07-23 10:57:10 -07:00
  • e13863e463 [ci] fix: auto-download model in Megatron-related CI tests (#2698) Shawn/Yuxuan Tong 2025-07-24 01:49:09 +08:00
  • f926dc90b0 [sglang] fix: fix is_vlm issue (issue #2639) (#2667) Nan Jiang 2025-07-23 10:45:57 -07:00
  • 4ed106698b [megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS in ray error (#2709) Blue Space 2025-07-23 18:57:57 +08:00
  • 5bfb58e35d [recipe] fix: fix dapo cannot save the checkpoint of last step (#2619) Zhirong Chen 2025-07-23 17:26:35 +08:00
  • e9072c58fa [ci] feat: CI request via Feishu (#2699) Shawn/Yuxuan Tong 2025-07-23 14:54:15 +08:00
  • 0404956290 [training_utils] fix: align tensorboard default dir for val_log_generation (#2696) Xihuai Wang 2025-07-23 14:09:58 +08:00
  • c95c9ef701 [fsdp,megatron,sglang] fix: Fix torch reduce to speed up update weights (#2692) Stefan He 2025-07-22 22:40:41 -07:00
  • dc1599b7e4 [rollout] fix: bug in init_engine Method of AsyncSglangServer (#2664) OC 2025-07-23 13:09:37 +08:00
  • 4792b70dd4 [megatron] fix: reset recompute_granularity and add backward compatibility fix (#2693) Blue Space 2025-07-23 11:16:23 +08:00
  • 4c10dddf74 [fsdp] fix: use torch 2.7 state dict api for torch 2.6 to resolve OOM (#2606) Wei (Will) Feng 2025-07-22 19:54:33 -07:00
  • d20e5e07e1 [fsdp, ckpt] fix: Wrap GenerationConfig.from_pretrained with try-except to avoid crashes. (#2659) rj42 2025-07-23 05:18:35 +03:00
  • 8888122a89 [megatron] fix: remove the demising model.enable_gradient_checkpointing flags in the script (#2691) H 2025-07-22 18:25:30 -07:00
  • f252da34cf [megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS not taking effect (#2687) Blue Space 2025-07-22 20:51:12 +08:00
  • 244481ac8f [misc] fix: main pre-commit and API change (#2675) Blue Space 2025-07-22 15:01:20 +08:00
  • c5b189a1af [BREAKING][megatron] refactor: activation checkpointing APIs (#2651) Blue Space 2025-07-22 10:24:28 +08:00
  • 72cae971d0 [sglang] fix: rename Sglang to SGLang following SGLang's fashion (#2672) Chayenne 2025-07-22 09:11:20 +08:00
  • d062314a18 [data, recipe] fix: remove redundant json parsing (#2671) Zhihui Xie 2025-07-21 18:09:10 -07:00
  • 2bcc5d1212 [misc] fix: fix prompt and response key in gemma7b example (#2610) Lin Yuan 2025-07-21 16:06:52 -07:00
  • e5f0b2aa80 [perf] feat: mistral and gemma3_text mfu compute support (#2622) Xihuai Wang 2025-07-21 16:54:11 +08:00
  • ac826e0558 [tool] chore: Add log for AsyncRolloutRequest ID, and rollout viewer to support request id display and search (#2636) Hecate 2025-07-20 21:01:37 -07:00
  • 3f6cd47926 [rollout,vllm] fix: A major issue in random sampling of vllm engine (#2646) Guanning Zeng 2025-07-21 00:00:28 -04:00
  • ac414d95c4 [recipe] feat: add QWen 30b moe dapo script that can run on a single 80GB node (#2645) Chi Zhang 2025-07-21 09:49:21 +08:00
  • 5d5ae81cdb [sglang] fix: update response handling and scoring method in GSM8K interaction (#2428) Aaron Yee 2025-07-21 08:06:46 +08:00
  • fcb1e191b7 [doc] fix: non-standardized path references (#2637) beep-bebop 2025-07-20 18:49:16 +08:00
  • 7fc3029a1e [doc] fix: add options to enable agent loop (#2624) OC 2025-07-20 06:03:06 +08:00
  • 5d52d15fd3 [trainer] feat: Add FSDPCheckpointManager for SFTtrainer, support resume training, manage the number of CKPTS in keep (#2292) shaofei hu 2025-07-19 12:15:23 +08:00
  • 69a467f934 [docker] fix: downgrade TransformerEngine version 2.2.1 to allow mcore image using rope fusion and provide another set of v0.5 image (#2611) Blue Space 2025-07-18 17:23:19 +08:00
  • 9d7cba4e12 [trainer] refactor: Training Engine Interface and Development Plan (#1977) Ziheng Jiang 2025-07-17 22:05:21 -07:00
  • 223caf7022 [single_controller] fix: padding for kwargs (#2585) Le Xue 2025-07-18 10:10:49 +08:00
  • fb810355f3 [tool] fix: supports variable arguments for marked_timer (#2576) X. HU 2025-07-18 04:35:36 +08:00
  • 2b2aa9d3fd [tool] chore: introduce RolloutViewer TUI tools (#2469) 杨睿 2025-07-18 04:30:41 +08:00
  • 7459131411 [hardware] refactor: replace device_name with config.trainer.device (#2542) Cheetah 2025-07-18 04:29:01 +08:00
  • 2adedb77b4 [doc] chore: add agent loop design doc (#2598) Joel 2025-07-18 04:27:27 +08:00
  • 332c7d53c1 [cfg] refactor: add flatten megatron trainer config generation and verification script (#2582) H 2025-07-17 08:08:45 -07:00
  • 0b62a6ece1 [cfg] feat: add critic config class (#2583) H 2025-07-17 00:59:47 -07:00
  • 40d638c63b [doc] fix: typo in perf_tuning.rst (#2590) Xihuai Wang 2025-07-17 15:58:34 +08:00
  • 648e3c95cc [doc] fix: fix some contents for one step off policy (#2591) meituan-search 2025-07-17 15:54:06 +08:00
  • 1775bd638f [trainer] fix: maybe_filter_out_long_prompts on image and video (#2553) Qifan Zhang 2025-07-17 14:17:20 +08:00
  • d51c52f754 [ci] chore: add codeowner for role/engine (#2587) H 2025-07-16 22:05:04 -07:00
  • 64601e418c set use_kl_in_reward=True in reinforce_plus_plus (#2580) Titanpku 2025-07-17 12:10:54 +08:00
  • 503ea75f53 [trainer, fsdp, vllm, recipe] feat: one step off async training recipe (#2231) imh966 2025-07-17 10:45:53 +08:00
  • ef3fffc3a2 [trainer] refactor: no need to call load_reward_manager in compute_reward_async (#2557) H 2025-07-16 18:52:36 -07:00
  • f0964b6650 [rollout] fix: fix bug for remax when the rollout mode is async (#2574) none0663 2025-07-16 22:45:09 +08:00
  • 40b2ebe9fd Merge branch 'volcengine:main' into recipe/async_training recipe/one_step_off_async arron 2025-07-16 19:24:55 +08:00
  • 8e5b714f0c Merge pull request #3 from imh966/recipe/async_training_rollout_nodes arron 2025-07-16 16:58:16 +08:00
  • e3db358fee Merge branch 'recipe/async_training' into recipe/async_training_rollout_nodes ArronHZG 2025-07-16 16:50:38 +08:00
  • 174d94af20 Merge branch 'recipe/async_training' of https://github.com/imh966/verl into recipe/async_training ArronHZG 2025-07-16 16:50:15 +08:00
  • c56467fa80 update docs ArronHZG 2025-07-16 16:49:06 +08:00
  • 3f63715a96 [doc] fix: fix non-existing tag of base image in docs (#2569) Yuchen Cheng 2025-07-16 15:59:40 +08:00
  • 1837fc7389 update code and doc by comments ArronHZG 2025-07-16 15:57:04 +08:00
  • 8df1c1bef1 ruff ArronHZG 2025-07-16 15:49:48 +08:00
  • 754cfaead1 update code and doc by comments ArronHZG 2025-07-16 15:45:39 +08:00
  • 1ed49c71e4 rollout.nnodes ArronHZG 2025-07-16 15:16:13 +08:00
  • 96b730bbed [megatron] fix: wrong response_mask for megatron + sglang multi-turn (#2543) 杨睿 2025-07-16 14:27:07 +08:00
  • da2ab088d9 [doc] fix: correct link in agentic RL doc (#2567) OC 2025-07-16 14:26:02 +08:00
  • 152c599303 [perf] feat: Clip gsm8k solution string to optimize reward calculation (#2568) Huapeng Zhou 2025-07-16 01:51:44 -04:00
  • 7aabfc437b [rollout] feat: add ReactAgentLoop based on LangGraph (#2463) Joel 2025-07-16 13:41:04 +08:00
  • 6e21c0a625 [megatron] feat: support distributed megatron model converter and merger (#2281) 杨睿 2025-07-16 13:36:33 +08:00
  • 1a89141222 [training_utils] fix: uneven support in split (#2560) Yuge Zhang 2025-07-16 13:29:27 +08:00
  • e300d0f099 [doc] feat: add document for agentic RL related features (#2563) OC 2025-07-16 12:51:16 +08:00
  • 3f0773259c [tool] fix: correctly convert 'None' to null in sandbox fusion _process_single_case (#2409) Mathew Han 2025-07-15 20:53:39 -07:00
  • 5f687b211d [sglang] fix: adding missing param for sgl async unit test (#2561) Chayenne 2025-07-15 20:22:43 -07:00
  • 218298720f [ci] chore: add single-controller reviewer (#2554) H 2025-07-15 17:59:45 -07:00
  • f0d4c76ed6 [sglang] feat: update weights in batch with FSDP (#2559) Chayenne 2025-07-15 16:57:20 -07:00
  • 1fe5daf7f1 [sglang, megatron, perf] feat: speed up megatron sglang weight update by 10x (#2418) 杨睿 2025-07-16 05:46:45 +08:00
  • 9b5646abcc Fix pre-commit error: sort imports in async_main_ppo.py openhands 2025-07-15 20:43:50 +00:00
  • a63243b0dd [fsdp] fix: change geo3k model name from non-vl to vl (#2555) Nan Jiang 2025-07-15 12:07:42 -07:00
  • 166d91a62e [trainer] refactor: minor code cleanup (#2537) H 2025-07-15 09:24:49 -07:00
  • 2c0ae781d9 [ray] fix: strip [] for ipv6 address (#2545) Joel 2025-07-15 20:29:45 +08:00
  • 2dea2598a1 [data] fix: Add missing init files in verl experimental data folders (#2548) Joost van Doorn 2025-07-15 14:29:29 +02:00
  • 10f4eb8cfc [misc] chore: fix typo in function name (#2525) ShareLer 2025-07-15 19:06:20 +08:00
  • 473d8ff0c1 [env] fix: bump tensordict to 0.9.1 (#2541) Yuge Zhang 2025-07-15 19:04:07 +08:00
  • 659b108007 update ruff ArronHZG 2025-07-15 18:44:14 +08:00
  • d8dd8b020b Merge branch 'volcengine:main' into recipe/async_training arron 2025-07-15 14:37:25 +08:00
  • c8468e6d8c update comments ArronHZG 2025-07-15 14:21:22 +08:00
  • bbd1288353 [data, megatron] feat: add dynamic batching computational workload balance (#2452) Simiao Zhang 2025-07-15 14:17:28 +08:00
  • 83d6a80ac0 [fsdp] fix: vlm dynamic batch & unify dynamic batch api (#2524) Yaowei Zheng 2025-07-15 14:07:41 +08:00