mirror of
https://github.com/volcengine/verl.git
synced 2025-10-20 13:43:50 +08:00
HF Dataset provides better memory management and can handle larger datasets. It also supports multi-process acceleration for map/filter operations (whereas pandas requires version >2.0). `filter_overlong_prompts` can now be enabled on large-scale datasets by setting `filter_overlong_prompts_workers` to an appropriate number.
---------
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
22 lines
321 B
Plaintext
# requirements.txt records the full set of dependencies for development
accelerate
codetiming
datasets
dill
flash-attn
hydra-core
numpy
pandas
datasets
peft
pyarrow>=15.0.0
pybind11
pylatexenc
ray[default]>=2.10
tensordict<=0.6.2
torchdata
torchvision
transformers
wandb
sglang[all]==0.4.4.post3
torch-memory-saver>=0.0.5