### What does this PR do?
- As title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
### What does this PR do?
- Add TensorDict utilities and tests to cover the current DataProto
functionalities.
- Add nested tensor example to remove padding throughout the system
- Add image example
- Upgrade tensordict to v0.10
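The nested-tensor idea can be sketched without tensordict: instead of padding every sequence to the batch max length, keep one flat value buffer plus per-sequence offsets (the layout jagged/nested tensors use under the hood). A minimal pure-Python illustration; the helper names are hypothetical, not verl or tensordict APIs:

```python
# Padding-free batch layout: one flat buffer plus per-sequence offsets,
# the same idea behind nested/jagged tensors. Illustrative only; these
# helpers are not part of verl's or tensordict's API.

def pack(sequences):
    """Pack variable-length sequences into (values, offsets)."""
    values, offsets = [], [0]
    for seq in sequences:
        values.extend(seq)
        offsets.append(len(values))
    return values, offsets

def unpack(values, offsets):
    """Recover the original sequences from the packed layout."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

seqs = [[1, 2, 3], [4], [5, 6]]
values, offsets = pack(seqs)  # stores 6 values, zero padding tokens
```

Because only real tokens are stored, downstream ops never waste compute on pad positions.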
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
### What does this PR do?
> This PR adds tensorboard as a dependency in requirements.txt, several
Dockerfiles (Dockerfile.ngc.vllm, Dockerfile.ngc.vllm0.8,
Dockerfile.ngc.vllm0.8.sagemaker), a setup script
(install_vllm_sglang_mcore.sh), and the main setup.py. It ensures the
tensorboard package is installed consistently, enabling visualization of
training metrics across configurations and deployment environments. This
is a maintenance change that improves the project's observability without
altering core functionality.
### Test
> This change is a dependency update and doesn't require specific
testing beyond confirming the installation is successful.
### API and Usage Example
> No API changes are introduced. The usage of TensorBoard would be
initiated by the user after installing the requirements.
```python
# No code snippet is applicable for this change
```
### What does this PR do?
Bump to tensordict 0.9.1 and ban 0.9.0, per the discussion in #2460.
This bug, https://github.com/pytorch/tensordict/issues/1374, affects
dp_actor, making it crash because of an incorrect batch size.
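A pin along these lines expresses "bump to 0.9.1, ban 0.9.0" (illustrative; the exact bounds in verl's requirements may differ):

```txt
# requirements.txt sketch: accept the fixed release, exclude the broken one
tensordict<=0.9.1,!=0.9.0
```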
### What does this PR do?
Upgrade tensordict to latest
### Checklist Before Starting
- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
- In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
- type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`
### What does this PR do?
Migrate images to verlai, upgrade CUDA support to 12.6, and support the
latest flash attention.
```txt
docker
├── README.md
├── verl0.4-cu124-torch2.6-fa2.7.4
│ ├── Dockerfile.app.sglang.vllm.mcore0.12
│ ├── Dockerfile.app.sglang.vllm.mcore0.13.preview
│ ├── Dockerfile.app.vllm.mcore0.12
│ ├── Dockerfile.app.vllm.mcore0.13.preview
│ ├── Dockerfile.base
│ └── README.md
├── verl0.5-cu126-torch2.7.1-fa2.8.0
│ ├── Dockerfile.app.sglang.mcore0.12
│ ├── Dockerfile.app.sglang.mcore0.13.preview
│ ├── Dockerfile.base.fi0.2.6
│ └── README.md
└── verl0.5-preview-cu128-torch2.7.1-fa2.8.0
├── Dockerfile.app.sglang.megatron
├── Dockerfile.base.fi0.2.6
└── README.md
```
- verlai/verl
  - verl0.4
    - base
    - app.sglang.vllm.mcore
    - app.vllm.mcore
  - verl0.5
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
  - verl0.5-preview
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
### What does this PR do?
#### Fix https://github.com/volcengine/verl/issues/2216
#### 1 Fix Config Reference in entropy_trainer.yaml
#### 2 Fix TypeError When Merging `reward_kwargs` and
`cfg_reward_kwargs`
### Specific Changes
> List the specific changes.
#### 1 Fix Config Reference in entropy_trainer.yaml
- Modified File : `recipe.entropy.config.entropy_trainer.yaml`
- Change:
```diff
- reward_model.reward_kwargs.overlong_buffer_cfg: $reward_model.overlong_buffer
+ reward_model.reward_kwargs.overlong_buffer_cfg: ${reward_model.overlong_buffer}
```
- Purpose: Ensures OmegaConf correctly resolves the reference as a
DictConfig object instead of interpreting it as a string.
#### 2 Fix TypeError When Merging `reward_kwargs` and
`cfg_reward_kwargs`
- Modified File : `recipe.entropy.main_entropy.py`
- Change :
```diff
- reward_fn = load_reward_manager(config, tokenizer, num_examine=0, **(merge_dict(reward_kwargs, cfg_reward_kwargs)))
+ reward_fn = load_reward_manager(config, tokenizer, num_examine=0, **OmegaConf.merge(OmegaConf.create(reward_kwargs), cfg_reward_kwargs))
```
- Purpose: Use `OmegaConf.merge()` to safely merge `dict` and `DictConfig`
types.
> Background:
> The DAPORewardManager class accesses the `enable` attribute from
`overlong_buffer_cfg`.
> This fails if `overlong_buffer_cfg` is a regular dict.
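The failure mode is easy to reproduce without OmegaConf: a plain `dict` has no attribute access, so `cfg.enable` raises `AttributeError`, while an attribute-addressable object (the role `DictConfig` plays) works. A pure-Python illustration, not the actual verl code:

```python
# Why a plain-dict merge breaks attribute-style access like cfg.enable.
from types import SimpleNamespace

plain = {"enable": True, "max_len": 512}
try:
    plain.enable  # attribute access on a regular dict raises
    raised = False
except AttributeError:
    raised = True

# An attribute-addressable object (DictConfig plays this role in verl).
cfg = SimpleNamespace(**plain)
```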
---------
Co-authored-by: H <linhaibin.eric@gmail.com>
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
Update the mcore image to use a vLLM that supports Qwen3, and rewrite the
installation from conda.
### Specific Changes
Docker image and docs
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: both
- **Inference**: both
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
> [!WARNING]
> We are [migrating to `ruff` as the linter and formatter and
`pre-commit` as the managing
tool](https://github.com/volcengine/verl/pull/1010).
>
> If your branch is based on a previous commit using `yapf` and
`pylint`, simply merging might trigger overwhelming linting errors,
while **you are only expected to resolve ones in the files related to
your PR**.
>
> To resolve this issue, please try the following workaround to only
include the files you **really changed** in the PR:
>
> 1. In your branch, fix linting and formatting with `ruff`: `ruff check
--fix && ruff format`
> 2. Squash into a single commit in a new branch: `git reset --soft
$(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."`
> 3. Merge with the latest main: `git merge origin/main`
> 4. Force push to your branch: `git push --force`
We add the reminder above to the documentation to tell contributors how
to avoid overwhelming linting errors.
### Motivation
Following the discussion in #896, this PR migrates from yapf & pylint to
ruff based on pre-commit, which allows unified version control and an
automatic hook on committing.
### Summary
The `pre-commit` hook and CI
- check staged / committed files in commits / PRs
- check all files each month (this is expected to fail until all files
conform to the ruff standard)
### Explanation for the Failing CI Workflow `pre-commit`
For now, we only apply `ruff format` and `ruff check --fix` **without
resolving all the errors**, since there are too many to resolve at once,
which causes the CI workflow `pre-commit` to fail.
Resolving the remaining errors is left to future commits.
Specifically, the `pre-commit` hook and CI will require every commit to
fix its related files with `ruff`, which will fix all the files
incrementally.
### Reviewing Suggestion
The commit
3d93f51ba8
is huge since we apply `ruff` to all the files. To review the main
changes, please check the commits before and after it.
HF Dataset provides better memory management and can handle larger
datasets. It also supports multi-process acceleration for map/filter
operations (whereas pandas requires version >2.0 for this).
Now `filter_overlong_prompts` can be enabled on large-scale datasets by
setting `filter_overlong_prompts_workers` to an appropriate number.
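The overlong-prompt filter itself is just a length predicate; HF Datasets parallelizes it across worker processes via `Dataset.filter(..., num_proc=N)`. A standalone sketch of the predicate (whitespace splitting stands in for the real tokenizer, and the names are illustrative):

```python
# Sketch of the overlong-prompt predicate. In verl, HF Datasets runs it
# via Dataset.filter(..., num_proc=filter_overlong_prompts_workers);
# whitespace splitting stands in for real tokenization here.

def keep_prompt(example, max_prompt_length):
    return len(example["prompt"].split()) <= max_prompt_length

rows = [
    {"prompt": "short question"},
    {"prompt": "a very very very long prompt that exceeds the limit"},
]
kept = [r for r in rows if keep_prompt(r, max_prompt_length=5)]
```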
---------
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
* add a workflow to run pylint
* add a section to `pyproject.toml` that blacklists all rules which
would trigger given the current code
* pin a version of pylint in `requirements.txt` for reproducibility
In a follow-up PR I will remove some rules from the blacklist and fix
some bugs.
https://github.com/volcengine/verl/issues/680
Changes:
- Move math-verify to the optional dependencies. Now it can be installed
via `cd verl && pip install -e .[math]`
- Revert using the naive verifier for the math dataset. Users can switch
to math-verify or provide a custom `compute_score` function.
Try to resolve this
[issue](https://github.com/volcengine/verl/issues/356).
As suggested in the issue discussion, I replace the default DataLoader
with StatefulDataLoader, which provides `state_dict` and
`load_state_dict` methods that support resuming the iterator position
when checkpointing mid-epoch.
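The mechanism can be sketched in pure Python: the loader tracks how far its iterator has advanced and exposes `state_dict()` / `load_state_dict()`, so a restored run resumes mid-epoch instead of restarting the epoch. (torchdata's StatefulDataLoader also restores sampler and worker state; this toy class models only the position.)

```python
# Minimal sketch of the state_dict/load_state_dict contract that
# StatefulDataLoader provides; only the iterator position is modeled.

class TinyStatefulLoader:
    def __init__(self, data):
        self.data = list(data)
        self._pos = 0

    def __iter__(self):
        while self._pos < len(self.data):
            item = self.data[self._pos]
            self._pos += 1
            yield item

    def state_dict(self):
        return {"pos": self._pos}

    def load_state_dict(self, state):
        self._pos = state["pos"]

loader = TinyStatefulLoader("abcd")
it = iter(loader)
first_two = [next(it), next(it)]   # consume part of the epoch
ckpt = loader.state_dict()         # checkpoint mid-epoch

restored = TinyStatefulLoader("abcd")
restored.load_state_dict(ckpt)
rest = list(restored)              # resumes at the saved position
```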
This PR aims to integrate vllm>=0.7.0 while preserving:
**Backward compatibility**: 0.3.1, 0.4.2, 0.5.4, and 0.6.3 are still
supported
**Forward compatibility**: future versions of vllm (>=0.7.0) will be
supported without requiring manual maintenance for each new release.
The readme of this Beta version is located at docs/README_vllm0.7.md,
where users can find the installation method and related features. This
readme is copied as below.
---
# Readme for verl(vllm>=0.7) version
## Installation
Note: This version of veRL supports **FSDP** for training and **vLLM**
for rollout. (Megatron-LM is not supported yet.)
```bash
# Create the conda environment
conda create -n verl python==3.10
conda activate verl
# Install verl
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
# Install vLLM>=0.7
pip3 install vllm==0.7.0
# Install flash-attn
pip3 install flash-attn --no-build-isolation
```
For existing stable vllm versions (<=0.7.2), you also need to apply a few
small manual patches to vllm (/path/to/site-packages/vllm after
installation) after the above steps:
- vllm/distributed/parallel_state.py: Remove the assertion below:
```python
if (world_size
!= tensor_model_parallel_size * pipeline_model_parallel_size):
raise RuntimeError(
f"world_size ({world_size}) is not equal to "
f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")
```
- vllm/executor/uniproc_executor.py: change `local_rank = rank` to
`local_rank = int(os.environ["LOCAL_RANK"])`
- vllm/model_executor/model_loader/weight_utils.py: remove the
`torch.cuda.empty_cache()` in `pt_weights_iterator`
These modifications have already been merged into the main branch of
vLLM. To avoid modifying these files manually, you can directly build
vLLM from source.
## Features
### Use CUDA graphs
After installation, the examples that use FSDP as the training backend
can be run. By default, `enforce_eager` is set to True, which disables
CUDA graphs. To enable CUDA graphs and the sleep mode of vLLM>=0.7, add
the following lines to the bash script:
```bash
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
```
For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh,
rollout generation takes 115 seconds with vLLM 0.6.3 versus 85 seconds
with vLLM 0.7.0. Enabling CUDA graphs further reduces the generation
time to 62 seconds.
**Note:** Currently, if `n` is greater than 1 in `SamplingParams` in
vLLM>=0.7, there is a potential stability issue with rollout generation
time (some iterations see bursts in generation time). We are working
with the vLLM team to investigate this issue.
### Other features in vLLM
1. **num_scheduler_step>1:** not supported yet (weight loading has not
been aligned with `MultiStepModelRunner`)
2. **Prefix caching:** not supported yet (vLLM sleep mode does not
support prefix caching)
3. **Chunked prefill:** supported
---------
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
## Summary
This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance`
to initialize an FSDP worker model.
## Main Changes
1. Add an option to use
`liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model
from pretrained weights, instead of the default
`transformers.AutoModelForCausalLM`
2. Add a test case using the configuration file
`tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`
## Related Issue
#96
## TODO
#97 optimize the memory usage when computing entropy & log_probs
6d96fda3d4/verl/workers/actor/dp_actor.py (L94-L106)
---------
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
This PR adds support for LoRA (Low-Rank Adaptation) for efficient model
fine-tuning.
### Changes
1. Added LoRA configuration support in trainer config
2. Modified FSDP wrapping policy to handle LoRA modules
3. Integrated with existing FSDP training infrastructure
4. Added peft dependency
5. Removed unused ring_attn_utils.py
### Features
- Configurable LoRA rank and alpha parameters
- Target module specification for selective adaptation
- Compatible with FSDP sharding strategy
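The rank and alpha parameters map directly onto the low-rank update LoRA applies: the frozen weight W is augmented with (alpha/r)·B·A, where A is r×in and B is out×r. A tiny pure-Python sketch of that arithmetic (illustrative only; verl delegates the real implementation to `peft`):

```python
# LoRA's effective weight: W' = W + (alpha / r) * B @ A, with A (r x in)
# and B (out x r) trainable while W stays frozen. Pure-Python matmul
# keeps the sketch dependency-free; peft does the real work in verl.

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

r, alpha = 2, 4
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]          # frozen base weight, out=2 x in=3
A = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0]]          # r x in
B = [[1.0, 0.0],
     [0.0, 1.0]]               # out x r

scale = alpha / r              # the alpha/rank scaling factor
delta = matmul(B, A)           # out x in, rank <= r
W_eff = [[w + scale * d for w, d in zip(wr, dr)]
         for wr, dr in zip(W, delta)]
```

Only A and B (r·(in+out) values) are trained, versus in·out for full fine-tuning, which is where the memory savings come from.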
### Testing
Tested with Qwen2.5-0.5B-Instruct model on GSM8K dataset using the
provided example script.
### Dependencies
- Added `peft` package to requirements.txt
This PR is based on commit 902ddbe6 and has been merged with the latest
upstream main branch.
---------
Co-authored-by: Jiayi Pan <i@jiayipan.me>
Co-authored-by: openhands <openhands@all-hands.dev>
* init commit of rmpad
* add rmpad test
* support rmpad in actor model
* add test for value model
* support rmpad in critic and rm
* fix actor return and fix num_labels and clean not used rmpad
* fix critic and benchmark
* update script
* fix critic
* lint
* fix util issue
* fix unnecessary unpad
* address issues
* fix args
* update test and update rmpad support model list
* fix typo
* fix typo and fix name
* rename rmpad to remove padding
* fix arch to model_type
* add ci for e2e rmpad and fix typo
* lint
* fix ci
* fix typo
* update tests for customize tokenizer in actor
* fix rmpad test
* update requirement of transformers as hf_rollout may have issue
* [megatron] style: clean up unused code in megatron
* update docs
* add install from docker section for docs
---------
Co-authored-by: Your Name <you@example.com>
* [ci] upload several tests
* [ci] add sanity and tensordict utility workflow
* [ci] fix workflow
* try fix import ci
* [dataproto] update repeat and unpad/pad
* fix rollout test to 2GPU
* add a fsdp vllm hybridengine script, which can be launched by torchrun
* fix import test
* update requirement.txt
* draft vllm fsdp test
* update label
* fix
* upload conda
* test conda
* test ci
* use docker
* test ci
* test ci
* test ci
* update ci
* test ci
* fix model loader
* fix model loader
* test ci
* test
* upload e2e digit completion test
* update running script for e2e test
* update test config
* fix path
* test
* fix import to register autotokenizer
* fix tokenizer
* fix create dataset
* fix
* fix reward model validate
* fix reward module of digit_completion
* fix reward module of digit_completion
* fix reward module of digit_completion
* fix reward module of digit_completion
* fix reward module of digit_completion
* can run but seems to have some test issue
* no problem, add check results
* add e2e training
* l20-0 seems has docker permission problem, test later
* fix
* test l20-0 and torchrun
* test l20-0 and torchrun
* fix
* fix
* fix
* fix
* fix
* tolerate difference
* tolerate difference with levenshtein
* lint
* add more test for ray
* delete
* use docker on l20
* use docker on l20
* add upgrade
* update ci
* delete code
* ignore test
* upgrade ray
* fix workerhelper method
* lint
* revert worker changes
* fix
* fix
* fix
* fix worker missing func