Commit Graph

  • 4f1c489e45 [algo] fix: remove torch.quantile-based percentile metrics to resolve tensor size limit error (#3810) main Yingru Li 2025-10-20 13:04:57 +08:00
  • 53aed3eea1 [doc] fix: update install instruction and retool readme (#3824) OC 2025-10-20 11:43:11 +08:00
  • 65eb019a81 [trainer] fix: Add data.seed to config (#3815) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-20 04:57:14 +03:00
  • 8235425094 Revert "[worker] fix: create a new event loop if none exists when building rollouts" (#3820) Chi Zhang 2025-10-20 09:19:49 +08:00
  • 1546ce23ae [rollout, vllm] fix: make LoRA with async vLLM work properly (#3821) listar2000 2025-10-19 20:18:35 -05:00
  • 25060e9f63 Revert "[trainer] fix: address serialization issues when using async reward f…" revert-3769-fix-async-reward Chi Zhang 2025-10-19 07:40:05 +08:00
  • f209c6f656 [ci] fix: Install mlflow dependency (#3817) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-19 02:21:54 +03:00
  • 4da0d3d318 [misc] fix: Sanitize MLFlow metric names (#3736) Pratik Sharma 2025-10-17 19:12:05 -07:00
  • 5b417da543 [megatron] fix: fix logits process error when disable pack_seqs (#3777) HaochenYuan 2025-10-18 10:11:36 +08:00
  • f0539a5121 [trainer] fix: address serialization issues when using async reward function and ray ppo trainer (#3769) ben 2025-10-17 17:22:59 -07:00
  • e0e352b566 [worker] fix: create a new event loop if none exists when building rollouts (#3803) ChangyWen 2025-10-18 08:20:57 +08:00
  • 85d5b2ee2e [doc] feat: update fully async experiment message (#3804) arron 2025-10-18 06:20:01 +08:00
  • b25bb7d4f3 [trainer, recipe] feat: fully async training recipe (#2981) arron 2025-10-17 22:29:18 +08:00
  • dd8864f9ee [megatron] feat: script of qwen3vl 235b (#3799) Yan Bai 2025-10-17 16:46:45 +08:00
  • ae5d8504d4 [trainer] feat: ReMax support using reward model for baseline (#3780) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-17 07:07:05 +03:00
  • a80ed95e70 [trainer] fix: batch size mismatch with n>1 when gen_max for ReMax (#3779) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-17 05:05:12 +03:00
  • 9078a533c6 [vllm] fix: catch exception of vllm async engine (#3789) 杨睿 2025-10-17 09:50:34 +08:00
  • 4abae2d77a [doc] chore: add agent loop get started tutorial (#3790) Joel 2025-10-17 08:30:10 +08:00
  • 7e3898fef2 [recipe] fix: fix the gpt-oss-20b training script for agent loop recipe (#3793) HEJIAN SANG 2025-10-16 17:09:45 -07:00
  • 65b8bf1bc0 [misc] fix: SFT E2E CI test failure due to megatron engine (#3786) Houmin Wei 2025-10-17 06:27:39 +08:00
  • acfcf98ed0 [doc] fix: actor_rollout_ref.critic is not correct (#3778) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-16 06:12:45 +03:00
  • e81e7db725 [docker] feat: update Dockerfile.rocm7 (#3781) vickytsang 2025-10-15 20:02:43 -07:00
  • 061535208c [recipe] feat: Add example for gpt-oss training using agent loop (#3774) HEJIAN SANG 2025-10-15 01:45:11 -07:00
  • 55f651c94d [misc] feat: bump version to 0.7.0.dev (#3772) Chi Zhang 2025-10-15 13:40:12 +08:00
  • ddd86f527a [misc] chore: bump version to v0.6.0 (#3773) v0.6.0 v0.6.x Chi Zhang 2025-10-15 13:19:38 +08:00
  • 22d082f9a4 [recipe] feat: add open math reasoning (#3767) Chi Zhang 2025-10-15 12:11:41 +08:00
  • 8ec9bf64a1 [ci] fix: fix test_engine ci (#3771) Chi Zhang 2025-10-15 12:11:17 +08:00
  • 231d725f69 Revert "[trainer] feat: set interleave to False in dapo trainer" (#3770) Chi Zhang 2025-10-15 11:41:33 +08:00
  • d69164e1cb [misc] feat: bump version to 0.6.0.dev (#3768) Chi Zhang 2025-10-15 10:47:13 +08:00
  • 2181d5b33a [recipe] fix: update readme for gmpo-trainer (#3764) Liu Yue 2025-10-15 10:24:24 +08:00
  • 33eb86f54f [megatron] feat: support qwen3vl (#3763) Yan Bai 2025-10-15 10:19:22 +08:00
  • 67f9a21b8e [trainer] feat: set interleave to False in dapo trainer (#3760) jiaqiw09 2025-10-14 21:13:57 +08:00
  • d2c51dc186 Add Meta-Bandit-LLM, a long-horizon multiturn interactive awesome use case of verl (#3756) Sanxing Chen 2025-10-14 00:01:13 -04:00
  • 16c2a21064 Add ARES and Revisual-R1 two awesome multimodal reasoning work using verl. (#3755) 2025-10-14 10:51:32 +08:00
  • 3abcc09d44 [sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy (#3531) KAMiPan 2025-10-14 10:48:29 +08:00
  • 5d378b5f95 [rollout] refactor: rename "clip" mode back to "mask" mode (#3750) Yingru Li 2025-10-14 02:06:36 +08:00
  • 3bee096da2 build(deps): bump sglang[all] from 0.5.2 to 0.5.3.post1 dependabot/pip/sglang-all--0.5.3.post1 dependabot[bot] 2025-10-13 17:14:16 +00:00
  • 21271aabb9 [BREAKING][rollout, trainer, algo] feat: comprehensive rollout importance sampling implementation (#3694) Yingru Li 2025-10-13 17:05:29 +08:00
  • 7f27789961 [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739) yangbaoxing 2025-10-13 15:58:59 +08:00
  • e9ee6b39c6 [model] fix: qwen3vl models shape mismatch error with SP (#3735) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-13 08:09:10 +03:00
  • 9d4554b931 [model] fix: qwen3vl training stuck with mixed text-image data (#3734) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-13 08:08:13 +03:00
  • 71cf69e7ad [ci] feat: increase sft e2e time (#3738) Chi Zhang 2025-10-13 11:29:39 +08:00
  • 7ddb9b29f0 [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 3 (#3600) Houmin Wei 2025-10-13 08:18:09 +08:00
  • 8cc9e3af67 [misc] feat: support offline generation with server mode (#3732) Chi Zhang 2025-10-12 11:00:33 +08:00
  • f07596c02e [misc] feat: support build DataProto from TensorDict (#3726) Huazhong 2025-10-11 17:28:18 +08:00
  • 656f4e6705 [rollout] chore: Misc changes for extending internal compatibility (#3701) Peng Wu 2025-10-11 01:08:39 -07:00
  • d36d3b9cbe [rollout] feat: add default agent name for agent loop (#3716) Joel 2025-10-11 14:45:30 +08:00
  • e960fbaeab [rollout] feat: Add gpt-oss tool parser to enable agent loop training for gpt-oss models (#3705) HEJIAN SANG 2025-10-10 20:53:10 -07:00
  • d87602432c [fsdp] fix: Handle dict type for per_tensor_param in LoRA weight sync (#3712) Pouria Mistani 2025-10-10 06:58:30 -07:00
  • e01376663b [megatron] feat: add ascend megatron merge support (#3722) jiaqiw09 2025-10-10 21:54:27 +08:00
  • 152ce6a1de [misc] fix: Allow HF model ID with use_shm (#3663) EduardDurech 2025-10-10 07:44:53 +02:00
  • 2d72c52e1b [misc] fix: model reassign to inner model in vllm patch file (#3668) Changlong Yu 2025-10-09 21:13:49 -07:00
  • eb06fda2a9 [data] fix: merge metrics from all workers in DataProto.concat() (#3699) Yingru Li 2025-10-10 11:45:08 +08:00
  • 7ffd413734 [megatron, model] fix: VLMs using mbridge together with fused kernels (#3700) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-10 06:05:32 +03:00
  • cf619d68d4 [recipe] fix: move all collabllm files into recipe directory (#3706) OC 2025-10-09 18:50:37 +08:00
  • 23877bcc64 [worker] fix: create a new event loop if none exists (#3703) Huazhong 2025-10-09 17:11:58 +08:00
  • e56e3df071 [worker] refactor: Add kwargs to checkpoint related functions in BaseEngine and its subclasses (#3662) Hongpeng Guo 2025-10-08 23:56:22 -07:00
  • 54fed7fec7 [rollout] feat: support async mode for multimodal data inference (#3702) xichengpro 2025-10-09 14:11:09 +08:00
  • f06ef09f1c [rollout] fix: Add LoRA datatype based on rollout model type to the LoRA config (#3675) mgilmore-relace 2025-10-08 20:48:32 -07:00
  • fc489dbaef [rollout] fix: add batch_data_id default value check in AsyncRolloutRequest (#3657) Pandeng Yao 2025-10-09 10:56:10 +08:00
  • d45d04946b [rollout,sglang] fix: get_tool_call_parser_type for gpt-oss models in sglang rollout (#3661) HEJIAN SANG 2025-10-08 19:51:37 -07:00
  • baf7506cff [worker] fix: support for vllm V0 deprecation version (#3687) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-09 05:44:31 +03:00
  • 798a6f8ba0 [trainer] feat: Enabled fused adamw (#3692) Puneesh Khanna 2025-10-07 23:13:46 +04:00
  • ab10eb2671 [model] fix: qwen3vl patch (#3686) Yaowei Zheng 2025-10-07 03:32:53 +08:00
  • 7904d0b672 [ci] fix: fix checkpoint converter ci (#3685) Chi Zhang 2025-10-06 19:42:47 +13:00
  • 1216ce4599 [ci] fix: merge pre-commit-full into pre-commit (#3684) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-06 05:56:11 +03:00
  • 42c55ac6b3 [model] feat: add qwen3vl (#3681) Yaowei Zheng 2025-10-06 10:21:19 +08:00
  • 327e813136 [rollout] fix: qwen2_vl position_ids shape mismatch (#3653) m-Just 2025-10-05 16:03:12 +08:00
  • 83aebcc133 [ci] fix: disable workflows with self-host machines to run on fork (#3677) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-04 12:02:41 +03:00
  • 4e9faafc94 [model] fix: stuck issue with mixed text-image data (#3670) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-04 02:47:09 +03:00
  • f50e5c2e8f [sglang] feat: add preparation for sglang+verl (#3506) lbk-sys 2025-09-29 10:21:01 +08:00
  • aa19c1afc4 [recipe] feat: add multiturn scripts for vllm backend; fix progress bar in dapo (#3644) jiaqiw09 2025-09-28 20:28:25 +08:00
  • 9e2072d120 [megatron, training_utils] fix: encoder pp is removed in mcore >= 0.14 (#3640) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-09-28 07:59:32 +03:00
  • 39e531f29e [rollout,vllm] fix: Add LoRA Loading to Async vLLM (#3639) Kion Fallah 2025-09-27 19:13:40 -07:00
  • abca659ec7 [megatron, worker] fix: use extract_multi_modal_inputs method for handling multi_modal_inputs (#3641) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-09-28 05:08:51 +03:00
  • 4ff3ce2fed [algo, perf] feat: Vectorize GRPO Advantage Estimator - 13~26x Speedup (#3635) CedricHuang 2025-09-27 17:21:08 +08:00
  • c03dcb0f8f [model] feat: add glm4v (#3291) Lambert 2025-09-27 04:12:14 +08:00
  • 84d5619f99 [2/N][rollout] feat: support vllm/sglang DP+EP in server mode (#3530) Joel 2025-09-26 21:52:03 +08:00
  • 64a9860be2 [trainer] fix: Ref to #3596. More import fix for transformers version higher than 4.55.0 (#3608) A1waysBeenHere 2025-09-26 21:37:46 +08:00
  • e51305883d [rollout] refactor: Update rollout and reward configs to reuse vllm/sglang replicas (#3625) Yuyang Ding 2025-09-26 17:43:45 +08:00
  • 2234810235 [megatron] feat: add mindspeed engine and support sft (#3599) Huazhong 2025-09-26 14:39:10 +08:00
  • 377bbb84f0 [recipe] fix: Fix a Typo in One_Step_Off_Policy and Add async of Generative Reward Model in Response Generation (#3369) Zhichao Wang 2025-09-25 22:22:00 -07:00
  • 096ab6dc1b [CI] fix: changed the model used in the PPO test case to Qwen2.5-0.5B to avoid the huggingface download error (#3631) Huazhong 2025-09-26 13:20:40 +08:00
  • 231e18948d [tool] feat: support load local datasets when preparing datasets (#3621) Huazhong 2025-09-26 11:42:53 +08:00
  • fbfdc81f9a [ci] feat: increase timeout of e2e_sft (#3630) Chi Zhang 2025-09-26 10:23:25 +08:00
  • 6ff2b43d13 [ci] feat: upgrade sglang to 0.5.2 (#3613) Joel 2025-09-26 09:25:53 +08:00
  • 14c397f474 [doc] feat: Adding Table-R1 to the Awesome work (#3627) FlowRays 2025-09-25 23:26:26 +08:00
  • 21536f2b03 [ci] fix: fix sanity ci (#3626) Chi Zhang 2025-09-25 23:15:10 +08:00
  • 515f2255ac [ci] fix: use local models/configs/datasets to increase stability (#3616) Chi Zhang 2025-09-25 22:14:56 +08:00
  • bf7aac2fa7 [rollout, tool] feat: export rollout rewards to total rewards (#3563) Qizhi Chen 2025-09-25 17:33:03 +08:00
  • 616e933e29 [worker] fix: correctly determine is_vlm_model if sp > 1 (#3282) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-09-25 12:21:40 +03:00
  • 90154aeeb6 [doc] fix: fix doc (#3614) Chi Zhang 2025-09-25 16:11:43 +08:00
  • 7731c5c6ec [rollout] fix: remove code responsible for tool response duplication (#3604) mgilmore-relace 2025-09-25 01:10:36 -07:00
  • 4d0999c161 [ci] chore: Use local dataset and models in e2e_ascend CI (#3601) Zhen 2025-09-25 15:14:45 +08:00
  • 3dfa28ae32 [doc] feat: add model engine doc (#3611) Chi Zhang 2025-09-25 14:25:44 +08:00
  • 25d78fa913 [recipe] feat: CollabLLM integration for multiturn training (#3574) Shirley Wu 2025-09-24 18:53:39 -07:00
  • ba8555120a [trainer] fix: Import flash attn utils for Transformers higher than 4.55.0 (#3596) A1waysBeenHere 2025-09-24 23:27:48 +08:00
  • 634bd9352b [CI] chore: reopen ppo test in e2e_ascend CI (#3588) Zhen 2025-09-24 17:46:30 +08:00
  • 26a734e740 [algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup (#3555) EduardDurech 2025-09-24 11:36:41 +02:00
  • 69b0127b74 [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 2 (#3567) Houmin Wei 2025-09-24 17:12:31 +08:00