pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Mikayla Gawarecki 19207b9183 Allow more backend worker threads with each using a separate cuda stream (#116190)

Added a `--num_workers` option to `server.py` that allows more than one worker in the `ThreadPoolWorker` used for model predictions. Each worker uses its own `cuda.Stream()`, which is created when the worker thread is initialized.
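The per-worker stream setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from `server.py`: it assumes the pool is a standard `concurrent.futures.ThreadPoolExecutor` with an `initializer`, and the names `build_parser`, `init_worker`, `predict`, and `make_pool` are hypothetical. It falls back to plain execution when PyTorch or CUDA is unavailable.

```python
import argparse
import threading
from concurrent.futures import ThreadPoolExecutor

try:
    import torch
    HAS_CUDA = torch.cuda.is_available()
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None
    HAS_CUDA = False

# Thread-local storage so each worker thread keeps its own stream.
_local = threading.local()


def build_parser():
    # Hypothetical CLI mirroring the `--num_workers` option.
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_workers", type=int, default=1)
    return parser


def init_worker():
    # Runs once per worker thread, when the thread starts.
    _local.stream = torch.cuda.Stream() if HAS_CUDA else None


def predict(model, batch):
    stream = _local.stream
    if stream is None:
        # No CUDA available: run the model directly.
        return model(batch)
    # Queue this worker's GPU work on its own stream.
    with torch.cuda.stream(stream):
        out = model(batch)
    # Wait for this stream's work before handing the result back.
    stream.synchronize()
    return out


def make_pool(num_workers):
    # Assumption: a ThreadPoolExecutor whose initializer sets up the stream.
    return ThreadPoolExecutor(max_workers=num_workers, initializer=init_worker)
```

With a pool built this way, `pool.submit(predict, model, batch)` runs each request on whichever worker is free, using that worker's stream.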

Ran the benchmark with 2-4 workers and `compile=False`, since compile is not thread-safe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116190
Approved by: https://github.com/albanD
ghstack dependencies: #115286, #116187, #116188, #116189

2023-12-20 22:08:29 +00:00

avg_latency_plot.png    Allow more backend worker threads with each using a separate cuda stream (#116190)    2023-12-20 22:08:29 +00:00
throughput_plot.png    Allow more backend worker threads with each using a separate cuda stream (#116190)    2023-12-20 22:08:29 +00:00