In various benchmarks scattered across the repo, the limits for FLOPs/second and memory bandwidth are usually hardcoded for a single device. This utility provides a more structured way to query device capabilities. If approved, we can use it when reporting FLOPs efficiency and bandwidth relative to peak in benchmarks and tests. The intent is to add more devices and more parameters (e.g. L2 cache bandwidth, NVLink) for both CPUs and accelerators.

Testing:
```
import torch

if torch.cuda.is_available():
    device = torch.cuda.current_device()
    mod = torch.get_device_module('cuda')
    hw = mod._device_limits.GPULimits(device)
    print(hw.get_tflops_per_second(torch.float16))
    print(hw.get_tflops_per_second(torch.float32))
    print(hw.get_tflops_per_second(torch.float64))
    print(hw.get_tflops_per_second(torch.bfloat16))
    print(hw.get_tflops_per_second(torch.int8))
    print(hw.get_memory_bandwidth_Bps() / 1e9)
    print(hw.get_shared_memory_bandwidth_Bps() / 1e9)

# Output on an H100 GPU
1070.53056
535.26528
66.90816
1070.53056
2141.06112
4893.696
33454.08
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162942
Approved by: https://github.com/ngimel
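As a sketch of the intended benchmark use case (reporting achieved throughput relative to peak), here is a minimal example. `GPULimits` and its getters come from this PR's testing snippet above; the `report_matmul_efficiency` helper, the kernel choice, and the timing loop are illustrative, not part of the PR:

```
import time

import torch

def report_matmul_efficiency(n: int = 8192, dtype=torch.float16) -> None:
    # Hypothetical benchmark: time an n x n matmul and compare achieved
    # TFLOPs/second against the device peak reported by GPULimits.
    device = torch.cuda.current_device()
    hw = torch.get_device_module('cuda')._device_limits.GPULimits(device)
    peak_tflops = hw.get_tflops_per_second(dtype)

    a = torch.randn(n, n, device='cuda', dtype=dtype)
    b = torch.randn(n, n, device='cuda', dtype=dtype)

    # Warm up, then time with explicit synchronization so that we measure
    # kernel execution rather than asynchronous launch overhead.
    for _ in range(3):
        a @ b
    torch.cuda.synchronize()
    iters = 10
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    # An n x n x n matmul performs 2 * n^3 floating-point operations.
    achieved_tflops = (2 * n**3) / elapsed / 1e12
    print(f'{achieved_tflops:.1f} / {peak_tflops:.1f} TFLOPs/s '
          f'({100 * achieved_tflops / peak_tflops:.1f}% of peak)')

if torch.cuda.is_available():
    report_matmul_efficiency()
```

The same pattern would apply to bandwidth-bound kernels, dividing measured bytes moved per second by `get_memory_bandwidth_Bps()` instead of the TFLOPs peak.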