Summary:
This PR adds `torch.float8_e4m3fn` support to cuSPARSELt and `to_sparse_semi_structured`.
This will let users run fp8 + 2:4 sparse matmuls on Hopper GPUs with
cuSPARSELt >= 0.6.2, via the `torch._scaled_mm` API.
```
# rand_sparse_semi_structured_mask and to_float8 are helpers from
# test/test_sparse_semi_structured.py; the shapes and out_dtype below are illustrative.
A = rand_sparse_semi_structured_mask(256, 128, dtype=torch.float16)
B = torch.rand((128, 128), device="cuda").to(torch.float16).t()
A_fp8, A_scale = to_float8(A)
B_fp8, B_scale = to_float8(B)
dense_result = torch._scaled_mm(
    A_fp8, B_fp8,
    scale_a=A_scale, scale_b=B_scale,
    out_dtype=torch.bfloat16,
)
A_fp8_sparse = to_sparse_semi_structured(A_fp8)
sparse_result = torch._scaled_mm(
    A_fp8_sparse, B_fp8,
    scale_a=A_scale, scale_b=B_scale,
    out_dtype=torch.bfloat16,
)
```
Note that to keep this consistent with normal torch behavior, calling
`torch.mm(A_fp8_sparse, B_fp8)` will raise a NotImplementedError.
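For illustration, a minimal sketch of that behavior (reusing the tensors from the snippet above):
```
# torch.mm on the fp8 sparse subclass is intentionally unsupported; fp8
# matmuls, sparse or dense, must go through torch._scaled_mm instead.
try:
    torch.mm(A_fp8_sparse, B_fp8)
except NotImplementedError:
    pass
```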
I also turned on cuSPARSELt by default and added CUSPARSELT_MAX_ID to the
backend to make the tests a bit cleaner.
Test Plan:
```
python test/test_sparse_semi_structured.py -k scaled_mm
python test/test_sparse_semi_structured.py -k fp8
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136397
Approved by: https://github.com/drisspg
Summary:
This PR updates cuSPARSELt to v0.6.2. I think we should land
https://github.com/pytorch/pytorch/pull/128534 first, though.
Most of this PR is just enabling tests to run when cuSPARSELt v0.6.2 is
available.
Unfortunately, I was running into a bug with fp32 support on Hopper, so I
removed fp32 support from the cuSPARSELt backend. This should be fine since
almost everybody uses the bfloat16/float16/int8 kernels.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134022
Approved by: https://github.com/jerryzh168, https://github.com/malfet
ghstack dependencies: #128534
Use `typing_extensions.deprecated` for the deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
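For illustration, a minimal sketch of the two patterns described above (the function names are hypothetical):
```
import warnings
from typing_extensions import deprecated

# Preferred: annotate the callable itself
@deprecated("old_fn() is deprecated, use new_fn() instead.", category=FutureWarning)
def old_fn():
    ...

# Otherwise: make sure an explicit category is passed to warnings.warn
def another_old_fn():
    warnings.warn(
        "another_old_fn() is deprecated, use new_fn() instead.",
        category=FutureWarning,
        stacklevel=2,
    )
```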
Resolves #126888
- #126888
This PR is split from PR #126898.
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
Use `typing_extensions.deprecated` for the deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
Resolves #126888
- #126888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
This PR adds fast semi-structured sparsification kernels to PyTorch.
These kernels allow for accelerated semi-structured sparsification in
PyTorch.
The kernels have been added as aten native functions.
In particular, three new functions have been added:
* `torch._sparse_semi_structured_tile`
This function will return the packed representation and metadata for
both X and X', as well as the thread masks. Note that this applies 2:4
sparsity in a 4x4 tile instead of a 1x4 strip as usual.
* `torch._sparse_semi_structured_apply`
This function takes in an input tensor and thread masks from the above
function and returns a packed representation and metadata from applying
the thread masks to the input tensor.
* `torch._sparse_semi_structured_apply_dense`
This function does the same thing as above, but instead of returning the
tensor in the sparse representation, it returns it in the dense
representation.
The subclasses have also been updated to add a new
`prune_dense_static_sort` classmethod to create sparse tensors with this
format. I've added some additional documentation on how to calculate the
compressed tensors needed to create a SparseSemiStructuredTensor oneself.
To this end, there are two new helper functions added:
* `sparse_semi_structured_tile`
* `compute_compressed_swizzled_bitmask`
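For illustration, a rough sketch of how these pieces fit together; the import location, tuple layout, and shapes/dtypes used here are assumptions based on the description above, not an exact reference:
```
import torch
from torch.sparse import SparseSemiStructuredTensorCUTLASS

x = torch.randn(128, 128, dtype=torch.float16, device="cuda")

# High-level path: prune x with the static-sort algorithm and build the
# sparse subclass in one call.
x_sparse = SparseSemiStructuredTensorCUTLASS.prune_dense_static_sort(x)

# Lower-level path: the aten ops the classmethod is built on. The tile op
# returns the packed representations, metadata and thread masks described
# above; the exact tuple layout (thread masks last) is assumed here.
tile_outputs = torch._sparse_semi_structured_tile(x)
thread_masks = tile_outputs[-1]
applied = torch._sparse_semi_structured_apply(x, thread_masks)
```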
Differential Revision: [D56190801](https://our.internmc.facebook.com/intern/diff/D56190801)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122350
Approved by: https://github.com/cpuhrsch
Summary:
This PR is a refactor of semi-structured sparsity support.
**deprecation**:
Before `torch.sparse.to_sparse_semi_structured` had a kwarg param
`transposed=False`, which has been removed. This kwarg was unused and
now throws a deprecation warning.
Namely, I've taken the subclassing implementation that xFormers has
created and brought it over to PyTorch, as part of our plan to upstream
runtime 2:4 sparsity.
I've also copied over all the op support that Daniel implemented that
did not depend on the fast sparsification routines into
`_sparse_semi_structured_ops.py`.
With this subclass, all of our internal tests pass, as well as those in
xFormers.
The main change is that we now define a base subclass,
`SparseSemiStructuredTensor` that is inherited from for each of the
specific backends.
We can also now arbitrarily override the sparse dispatch table with
`_load_dispatch_table()`, the idea being that this is still general enough
that users don't need to modify PyTorch source code to get their model
working.
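A rough sketch of what such an override could look like; the handler signature and the `_load_dispatch_table` entry point shown here are assumptions based on the description above, not the exact API:
```
import torch
from torch.sparse import SparseSemiStructuredTensor

# Hypothetical handler; signature assumed to mirror the handlers in
# _sparse_semi_structured_ops.py: (func, types, args, kwargs).
def custom_values(func, types, args=(), kwargs=None):
    self = args[0]
    return self.packed.detach()  # assumes the subclass stores a `packed` inner tensor

# Assumed entry point: merge a user-provided table into the subclass
# dispatch table without touching PyTorch source code.
SparseSemiStructuredTensor._load_dispatch_table(
    {torch.ops.aten.values.default: custom_values}
)
```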
This also adds in padding support and stores alg_id and fuse_transpose
as flags on the tensor, instead of hardcoding them.
There still remain two components in xFormers that will need to be
ported over eventually:
- the autograd functions (`Sparsify24`, `Sparsify24_like`)
- fast sparsification routines that they rely on
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117302
Approved by: https://github.com/alexsamardzic, https://github.com/HDCharles
Summary:
Both cuSPARSELt and CUTLASS support 1:2 semi-structured sparsity for
fp32, which this PR enables (thanks @alexsamardzic).
Furthermore, this PR also updates the sparse_config to take into account
the different shape constraints for sparse and dense matrices.
Technically, cuSPARSELt supports smaller sparse matrix constraints, as it
seems to pad to the CUTLASS constraints under the hood. However, in
practice small sparse matrices are not commonly used and we care more
about the dense constraints for LLM inference.
For now, we keep the CUTLASS constraints in place for both cuSPARSELt
and CUTLASS tensors.
This PR also reconnects the _FUSE_TRANSPOSE flag for cuSPARSELt tensors.
Test Plan:
```
python test/test_sparse_semi_structured.py
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115550
Approved by: https://github.com/cpuhrsch
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).
Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
* Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
* Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
* Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
* Signatures now (see also the minimal subclass sketch after this list):
```python
# attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
# ctx is anything useful for rebuilding the class we want to guard on
attrs, ctx = x.__tensor_flatten__()
...
# inner_tensors is a dict of {attr -> tensor}
# ctx is taken unmodified from flattening and (eventually) guarded on
# outer_size is the expected size of the output; possibly symbolic
# outer_stride is the expected strides of the output; possibly symbolic
y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)
# at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
# the assert simplifies symbols when there are relationships between outer and inner symbols
```
* Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
* Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
* Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors
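For illustration, a minimal wrapper-subclass sketch using the new signature; the class and attribute names are made up and `__torch_dispatch__` handling is omitted:
```python
import torch

class ScaledTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, data, scale):
        return torch.Tensor._make_wrapper_subclass(
            cls, data.shape, dtype=data.dtype, device=data.device
        )

    def __init__(self, data, scale):
        self._data = data
        self._scale = scale

    def __tensor_flatten__(self):
        # attrs: names of inner tensor attributes; ctx: rebuild/guard info
        return ["_data"], {"scale": self._scale}

    @staticmethod
    def __tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride):
        # outer_size / outer_stride may carry symbols allocated by PT2; the
        # returned tensor is expected to match them (asserted at the call-site)
        return ScaledTensor(inner_tensors["_data"], ctx["scale"])
```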
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
Summary:
This PR adds in support for passing in an alpha Tensor, which represents
a tensor of alpha values to fuse into the matmul.
```
cusparselt_sparse_mm = alpha * (A @ B) + bias
```
This operation is necessary for quantization, where we would like to
fuse one of the dequant matmuls into the sparse op.
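For illustration, a hedged sketch of passing the fused alpha through the private op; the keyword names and the expected shape of `alpha` are assumptions based on this description:
```
import torch

# 2:4-sparse fp16 operand built from a simple mask (illustrative)
A = torch.Tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()
B = torch.rand((128, 128), device="cuda").half()
alpha = torch.rand(128, device="cuda")        # per-row scales to fuse into the matmul
bias = torch.rand(128, device="cuda").half()

A_compressed = torch._cslt_compress(A)
out = torch._cslt_sparse_mm(A_compressed, B, alpha=alpha, bias=bias)
```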
Test Plan:
```
python test/test_sparse_semi_structured.py -k alpha
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112056
Approved by: https://github.com/cpuhrsch
Summary:
This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.
Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
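For illustration, a rough sketch of compiling a module whose weight has been swapped for the semi-structured sparse subclass (the mask and shapes are illustrative):
```
import torch
from torch.sparse import to_sparse_semi_structured

linear = torch.nn.Linear(128, 128, bias=False).half().cuda()
# impose a simple 2:4 pattern so the conversion is valid (illustrative mask)
mask = torch.Tensor([0, 0, 1, 1]).tile((128, 32)).cuda().bool()
linear.weight = torch.nn.Parameter(
    to_sparse_semi_structured(linear.weight.detach() * mask)
)

compiled = torch.compile(linear)
out = compiled(torch.randn(64, 128, dtype=torch.float16, device="cuda"))
```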
Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
Summary:
Currently we have shape constraints in semi-structured sparsity for both
CUTLASS and cuSPARSELt
These shape constraints unfortunately apply to both the dense and sparse
matrices in sparse-dense matmul.
This PR adds in support for calling `F.pad` in order to pad dense
matrices to the right size with zeros and then pull out the corresponding
rows from the resulting matrix.
We also throw a warning in this case.
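For illustration, a minimal sketch of the padding trick (the sizes and alignment constant are illustrative):
```
import torch
import torch.nn.functional as F

dense = torch.randn(3, 128, dtype=torch.float16, device="cuda")  # too few rows for the constraint
pad_rows = (-dense.shape[0]) % 8                                  # pad up to a multiple of 8
padded = F.pad(dense, (0, 0, 0, pad_rows), value=0)               # zero-pad along the row dim
# ... run the sparse-dense matmul with `padded` in place of `dense`,
# then drop the padded rows from the result:
# result = padded_result[: dense.shape[0]]
```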
The tests have also been updated to take in a dense_input_shape
parameter.
Test Plan:
```
python test/test_sparse_semi_structured.py
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110583
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
Summary:
Currently, PyTorch incorrectly calculates the size of the returned
matrix when we pass a non-contiguous batched (>2d) input to the
semi-structured sparse subclass.
This is most common in MLP layers, where we have 2 linear layers back to back.
This will lead to an error like the following:
```
RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size
62914560
```
The size of the sparse matmul result is off because we infer the
output shape from the wrong tensor shape.
This happens because of a bug where we did not update the subclass
tensor shape when doing transpose.
For semi-structured sparsity, transposing is a no-op where we just set
the boolean flag, but we forgot to also update the tensor shape.
Note that this error goes away in inference mode, since we avoid
decomposing the aten.linear op and handle shape folding ourselves,
which changes the execution path.
An alternative workaround is to set TORCH_FLATTEN_LINEAR_3D=True, which
also avoids this error.
Test Plan:
```
python test/test_sparse_semi_structured.py -k test_mlp
```
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110420
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
Summary:
This is a duplicate of PR #102133, which was reverted because it was
failing internal tests.
It seems that internal builds did not like my guard that checks whether
cuSPARSELt is available or not.
Test Plan: python test/test_sparse_semi_structured.py
Differential Revision: D48440330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107398
Approved by: https://github.com/cpuhrsch
This PR contains two new private ops, added for cuSPARSELt support.
These ops call into the cuSPARSELt kernels using the bindings they
provide. For more information, see the documentation
[here](https://docs.nvidia.com/cuda/cusparselt/index.html).
The two new private ops added are:
```
_cslt_compress()
_cslt_sparse_mm()
```
`_cslt_compress` is an op that returns the compressed matrix given a
sparse matrix that is passed in.
`_cslt_sparse_mm` is an op that expects a compressed matrix (the result of
`_cslt_compress`) and a dense matrix, and performs a sparse-dense matmul.
These ops will throw runtime errors if cuSPARSELt is not present.
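For illustration, a rough sketch of how the two ops compose (shapes and dtypes are illustrative, and the exact signatures may have evolved since this PR):
```
import torch

# fp16 matrix that already satisfies the 2:4 sparsity pattern
A = torch.Tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()
B = torch.rand((128, 128), device="cuda").half()

A_compressed = torch._cslt_compress(A)        # packed values + cuSPARSELt metadata
out = torch._cslt_sparse_mm(A_compressed, B)  # sparse @ dense matmul
```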
This PR also modifies the tests and tensor subclass to reflect the new
cuSPARSELt support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102133
Approved by: https://github.com/cpuhrsch