Files
Mikayla Gawarecki 19207b9183 Allow more backend worker threads with each using a separate cuda stream (#116190)
Added a `--num_workers` option to `server.py` that allows more than 1 worker in the `ThreadPoolWorker` used for model predictions. Each worker uses its own `cuda.Stream()` that is created when the worker thread is initialized.

Ran benchmark for 2-4 workers with `compile=False` (since compile is not thread-safe)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116190
Approved by: https://github.com/albanD
ghstack dependencies: #115286, #116187, #116188, #116189
2023-12-20 22:08:29 +00:00
..