verl/ppo at main - verl

mirror of https://github.com/volcengine/verl.git synced 2025-10-20 13:43:50 +08:00


Yingru Li 4f1c489e45 [algo] fix: remove torch.quantile-based percentile metrics to resolve tensor size limit error (#3810)

## Summary

Fixes #3787 by removing `torch.quantile()`-based percentile metrics
(`rollout_is_p25`, `rollout_is_p50`, `rollout_is_p75`) that caused
`RuntimeError: quantile() input tensor is too large` when using large
batch sizes or response lengths.

## Problem

When using configurations with large tensor sizes (e.g.,
`max_response_length: 32k`, `rollout.n: 16`, `train_batch_size: 16`),
the `torch.quantile()` function fails with a runtime error due to
PyTorch's internal tensor size limitations (~2^24 to 2^27 elements
depending on version, GPU memory, and dtype).

The error occurred in `verl/trainer/ppo/mismatch_helper.py`:
```python
metrics["rollout_is_p25"] = torch.quantile(flat_weights, 0.25)
metrics["rollout_is_p50"] = torch.quantile(flat_weights, 0.50)
metrics["rollout_is_p75"] = torch.quantile(flat_weights, 0.75)
```

## Solution

Removed the three quantile-based percentile metrics from the Rollout IS
framework. The remaining metrics (`rollout_is_mean`, `rollout_is_std`,
`rollout_is_min`, `rollout_is_max`, `rollout_is_eff_sample_size`, etc.)
provide sufficient monitoring capabilities for importance sampling
health without triggering tensor size limitations.
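As a rough illustration of why the remaining statistics can still flag unhealthy importance weights, the sketch below computes a mean, standard deviation, min, max, and a normalized effective sample size without any quantile call. It is plain Python over a list rather than the tensor code in `mismatch_helper.py`. The function name, the dictionary keys, and the ESS definition `(sum w)^2 / (n * sum w^2)` are assumptions made for this example, not verl's implementation.

```python
import statistics


def summarize_is_weights(weights):
    """Summarize importance-sampling weights without calling a quantile routine.

    Hypothetical helper for illustration only; not part of verl.
    """
    n = len(weights)
    total = sum(weights)
    # Normalized effective sample size (assumed definition):
    # (sum w)^2 / (n * sum w^2), which lies in (0, 1] for positive weights.
    ess = (total ** 2) / (n * sum(w * w for w in weights))
    return {
        "mean": total / n,
        "std": statistics.pstdev(weights),  # population standard deviation
        "min": min(weights),
        "max": max(weights),
        "eff_sample_size": ess,
    }


# Uniform weights: every sample contributes equally.
uniform = summarize_is_weights([1.0, 1.0, 1.0, 1.0])
# Skewed weights: one sample dominates, so the effective sample size drops.
skewed = summarize_is_weights([0.1, 0.1, 0.1, 3.7])
```

In this toy case both weight sets have a mean near 1, but the skewed set shows a much larger max and a much smaller effective sample size. That is the kind of signal the percentile metrics would otherwise have helped surface.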

## Changes

- **Modified**: [verl/trainer/ppo/mismatch_helper.py](verl/trainer/ppo/mismatch_helper.py)
  - Removed `rollout_is_p25`, `rollout_is_p50`, `rollout_is_p75` metric calculations
  - All other rollout IS and mismatch metrics remain functional

## Testing

Verified that:
- Rollout IS framework continues to function correctly without
percentile metrics
- No runtime errors with large tensor configurations
- All other metrics (mean, std, min, max, ESS, veto fraction, etc.) are
computed correctly

Resolves #3787

2025-10-20 13:04:57 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [util] docs: add docstrings to metric util functions that recipes reuse (#1395) | 2025-05-12 08:49:14 +08:00 |
| `test_core_algos_on_cpu.py` | [algo, perf] feat: Vectorize GRPO Advantage Estimator - 13～26x Speedup (#3635) | 2025-09-27 17:21:08 +08:00 |
| `test_metric_utils_on_cpu.py` | [recipe] feat: add retool recipe (#2233) | 2025-07-02 20:05:43 +08:00 |
| `test_rollout_is_integration.py` | [rollout] refactor: rename "clip" mode back to "mask" mode (#3750) | 2025-10-13 11:06:36 -07:00 |
| `test_rollout_is.py` | [algo] fix: remove torch.quantile-based percentile metrics to resolve tensor size limit error (#3810) | 2025-10-20 13:04:57 +08:00 |